(or emacs irrelevant)

Light it up! Pygments for Emacs Lisp.

The Challenge

More than 2 years ago, the formidable @bbatsov of Emacs Redux had this to say:

Well, let's turn that #fail-frown upside down!

The Python

A quick search brought me to this page: Write your own lexer -- Pygments. It turns out that Pygments development takes place on Bitbucket, so I had to create an account there. Shortly after, I cloned the repository:

hg clone https://bitbucket.org/birkenfeld/pygments-main

Then I quickly copy-pasted some starting code:

__all__ = ['SchemeLexer', 'CommonLispLexer',
           'HyLexer', 'RacketLexer',
           'NewLispLexer', 'EmacsLispLexer']

class EmacsLispLexer(RegexLexer):
    """
    An ELisp lexer, parsing a stream and outputting the tokens
    needed to highlight elisp code.
    """
    name = 'ELisp'
    aliases = ['emacs', 'elisp']
    filenames = ['*.el']
    mimetypes = ['text/x-elisp']

    flags = re.MULTILINE

    # the rest of the code was copied from CommonLispLexer for now
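
For readers new to Pygments: a RegexLexer is driven by an ordered list of (regex, token type) rules that are tried at the current position in the text. Here is a minimal standalone sketch of that idea in plain Python, using only `re`. The rule set and the `tokenize` helper are hypothetical and far cruder than the rules borrowed from CommonLispLexer; this is not the Pygments API itself.

```python
import re

# Hypothetical, simplified rules: tried in order, first match wins.
RULES = [
    (re.compile(r";.*"), "comment"),
    (re.compile(r'"(\\.|[^"\\])*"'), "string"),
    (re.compile(r"[()]"), "paren"),
    (re.compile(r"\s+"), "whitespace"),
    (re.compile(r"[^\s()\";]+"), "symbol"),
]


def tokenize(text):
    """Return a list of (kind, value) pairs covering all of TEXT."""
    pos = 0
    tokens = []
    while pos < len(text):
        for pattern, kind in RULES:
            match = pattern.match(text, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            # No rule matched: emit one character as an error and move on.
            tokens.append(("error", text[pos]))
            pos += 1
    return tokens


for kind, value in tokenize('(foo "bar") ; hi'):
    print(kind, repr(value))
```

The order of the rules matters: the comment and string rules come first so that a semicolon or a quote is never swallowed by the catch-all symbol rule.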

Apparently, infrastructure-wise, I only needed to know two commands. The first one has to be run just once, so that Pygments is aware of the new lexer:

$ cd ~/git/pygments-main && make mapfiles

The second command (re-)generates /tmp/example.html:

$ cp ~/git/emacs/lisp/vc/ediff.el \
  ~/git/pygments-main/tests/examplefiles/
$ ./pygmentize -O full -f html -o /tmp/example.html \
  tests/examplefiles/ediff.el

I would repeat the last command after each update to the code, and then refresh the page in Firefox to see the result.

The Elisp

To finalize the lexer, the following tasks ensued:

  • get a list of built-in macros
  • get a list of special forms
  • get a list of built-in functions

In the process, I added two more lists:

  • a list of built-in functions that are highlighted with font-lock-keyword-face:

    'defvaralias', 'provide', 'require',
    'with-no-warnings', 'define-widget', 'with-electric-help',
    'throw', 'defalias', 'featurep'
    
  • a list of built-in functions and macros that are highlighted with font-lock-warning-face:

    'cl-assert', 'cl-check-type', 'error', 'signal',
    'user-error', 'warn'
    

To generate the other three lists, I started off writing things in *scratch*, but after a while my compulsion to C-x C-s kicked in and I saved the work to research.el. At least, thanks to @bbatsov, I'm not C-x C-s-ing that much since I added this:

(defun save-and-switch-buffer ()
  (interactive)
  (when (and (buffer-file-name)
             (not (bound-and-true-p archive-subfile-mode)))
    (save-buffer))
  (ido-switch-buffer))
(global-set-key "η" 'save-and-switch-buffer)

But it's time for the student to one-up the master, so here's a tip to improve even further:

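(defun oleh-ido-setup-hook ()
  (define-key ido-buffer-completion-map "η" 'ido-next-match))
;; the function only takes effect once it is added to ido's setup hook
(add-hook 'ido-setup-hook 'oleh-ido-setup-hook)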

This way I can cycle the buffers with the same shortcut that invokes save-and-switch-buffer. The defaults are C-s and C-r, in case you didn't know.

The C

Getting the list of built-in C functions and special forms obviously involved browsing the C source code. In case you don't (yet) have the Emacs sources, they're here:

$ git clone git://git.savannah.gnu.org/emacs.git

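I switched to the ./src directory and called M-x find-name-dired with *.c to build a list of all the sources. With the files marked in the resulting Dired buffer (the code reads them through dired-get-marked-files), I ran the following code from research.el: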

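(require 'cl-lib)                       ; for `cl-incf'

(defvar foo-c-functions nil)
(defvar foo-c-special-forms nil)

(defun c-research ()
  "Collect DEFUN names from the files marked in the current Dired buffer."
  (let ((files (dired-get-marked-files))
        (i 0))
    (dolist (file files)
      (message "%d" (cl-incf i))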
      (with-current-buffer (find-file-noselect file)
        (goto-char (point-min))
        (while (re-search-forward "^DEFUN (" nil t)
          (backward-char 1)
          (let ((beg (point))
                (end (save-excursion
                       (forward-list)
                       (point)))
                str)
            (forward-char 2)
            (search-forward "\"" nil t)
            (setq str (read (buffer-substring-no-properties
                             (+ beg 2) (1- (point)))))
            (if (re-search-forward "UNEVALLED" end t)
                (push str foo-c-special-forms)
              (push str foo-c-functions))))))))

It was beautiful, by the way, to be able to generate this sort of documentation from such well-formatted and well-documented C sources. Free Software FTW.
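
If you'd rather not drive this from Dired, roughly the same scan can be sketched in Python. This is a simplified variant, not a port of the Elisp above: instead of searching the whole DEFUN form for UNEVALLED, it assumes every definition starts a line as `DEFUN ("name", Fname, Sname, minargs, maxargs, ...` and checks only the maxargs field. The `scan_defuns` helper and the abbreviated `SAMPLE` text are hypothetical, not copied from the Emacs sources.

```python
import re

# Assumed layout: DEFUN ("lisp-name", Fname, Sname, minargs, maxargs, ...
DEFUN_RE = re.compile(
    r'^DEFUN \("([^"]+)",\s*\w+,\s*\w+,\s*\w+,\s*(\w+)',
    re.MULTILINE,
)


def scan_defuns(c_source):
    """Split the DEFUN names in C_SOURCE into (functions, special_forms)."""
    functions, special_forms = [], []
    for match in DEFUN_RE.finditer(c_source):
        name, max_args = match.groups()
        if max_args == "UNEVALLED":
            special_forms.append(name)
        else:
            functions.append(name)
    return functions, special_forms


# Abbreviated, made-up input in the shape of the Emacs C sources.
SAMPLE = '''
DEFUN ("if", Fif, Sif, 2, UNEVALLED, 0,
       doc: /* ... */)
  (Lisp_Object args)
{
}

DEFUN ("car", Fcar, Scar, 1, 1, 0,
       doc: /* ... */)
  (register Lisp_Object list)
{
}
'''

functions, special_forms = scan_defuns(SAMPLE)
print(functions)       # ['car']
print(special_forms)   # ['if']
```

To run it over a real checkout, you would read each .c file under ./src and feed its text to `scan_defuns`. The counts may differ from the ones below if a definition does not follow the assumed layout.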

If you're interested, there are 1294 built-in functions. Here's the list of the 23 special forms that I found:

and catch cond condition-case defconst
defvar function if interactive let let*
or prog1 prog2 progn quote
save-current-buffer save-excursion
save-restriction setq setq-default
unwind-protect while

You can read up on the special forms in the SICP. There's no node for them, so just use isearch.

The Result

You can see it here: ediff.html, as well as on the rest of the site, since I've switched it on everywhere.

The Impact

Unfortunately, this won't have an impact on the GitHub source code highlighter, since GitHub dropped Pygments recently.

But people who use the static blog generator Jekyll or the LaTeX package minted (the package that org-mode's PDF export can be configured to use for source blocks) will be able to get better Elisp highlighting. In fact, this blog is already using the new highlighter.

See the rest of the projects that use Pygments here.

The Bitbucket

So now, to share the new lexer with the world I just have to learn how to:

  • stage and commit in Mercurial
  • push Mercurial to Bitbucket
  • open a pull request on Bitbucket

I don't want to become a hipster, these things just happen.