.. -*- rst-mode -*-

Syntax Highlight
================

.. contents::
.. sectnum::

Syntax highlighting significantly enhances the readability of code. However,
the current version of docutils does not highlight literal blocks.

This sandbox project aims to add syntax highlighting of code blocks to the
capabilities of docutils. To find its way into the docutils core, it should
meet the requirements laid out in a mail on `Questions about writing
programming manuals and scientific documents`__, by the docutils main
developer David Goodger:

   I'd be happy to include Python source colouring support, and other
   languages would be welcome too. A multi-language solution would be
   useful, of course. My issue is providing support for all output formats
   -- HTML and LaTeX and XML and anything in the future -- simultaneously.
   Just HTML isn't good enough. Until there is a generic-output solution,
   this will be something users will have to put together themselves.

__ http://sourceforge.net/mailarchive/message.php?msg_id=12921194


State of the art
----------------

There are already docutils extensions providing syntax colouring, e.g.:

SilverCity_, 
  a C++ library and Python extension that can provide lexical
  analysis for over 20 different programming languages. A recipe__ for a
  "code-block" directive provides syntax highlighting with SilverCity.
  
__ http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/252170

`listings`_,
  a LaTeX package providing highly customisable and advanced syntax
  highlighting, though only for LaTeX (and LaTeX-derived PS/PDF) output.

  Since Docutils 0.5, the latex2e writer (and rst2latex.py) supports
  syntax highlighting of literal blocks with `listings` via the
  ``--literal-block-env=listings`` option. You need to provide a suitable
  stylesheet (see also `Pylit Examples`_).
  
.. _Pylit Examples: http://pylit.berlios.de/examples/index.html#latex-packages
  
Trac_ 
  has `reStructuredText support`__ and offers syntax highlighting with
  a "code-block" directive using GNU Enscript_, SilverCity_, or Pygments_.
  
__ http://trac.edgewall.org/wiki/WikiRestructuredText

rest2web_,
  the "site builder", provides the `colorize`__ macro (using the
  `Moin-Moin Python colorizer`_).

__ http://www.voidspace.org.uk/python/rest2web/macros.html#colorize

Pygments_ 
  is a generic syntax highlighter written completely in Python. 

  * Usable as a command-line tool and as a Python package.
  * A wide range of common `languages and markup formats`_ is supported.
  * Additionally, OpenOffice's ``*.odt`` is supported by the odtwriter_.
  * The layout is configurable by style sheets.
  * Several built-in styles and an option for line-numbering.
  * Built-in output formats include HTML, LaTeX, and RTF.
  * Support for new languages, formats, and styles is easy to add (modular
    structure, Python code, existing documentation).
  * Well documented and actively maintained.
  * The web site provides a recipe for `using Pygments in ReST documents`_.
    It is used in the `Pygments enhanced docutils front-ends`_ below.

Odtwriter_,
  an experimental writer for Docutils' OpenOffice export, supports syntax
  colouring using Pygments_. (See the section `Odtwriter syntax`_, which may
  be outdated.)

Pygments_ seems to be the most promising docutils highlighter. For printed
output, the listings_ package has its advantages too.


Pygments enhanced docutils front-ends
-------------------------------------

Syntax highlighting can be achieved with `front-end scripts`_ combining
docutils and pygments.

   "something users [will have to] put together themselves"

Advantages:
  + Easy implementation with no changes to the stock docutils_. 
  + Separation of code blocks and ordinary literal blocks.

Disadvantages:
  1. "code-block" content is formatted by `pygments`_ and inserted into the
     document tree as a "raw" node, making the approach writer-dependent.
  2. Documents are incompatible with standard docutils because of the
     locally defined directive.
  3. More "invasive" markup distracts from the content
     (no "minimal" code block marker: three additional lines per code block).
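
Why a "raw" node makes the result writer-dependent can be shown with a
small stdlib-only model. It does not use the docutils API: ``RawNode`` and
``render`` are hypothetical stand-ins for the "raw" element and a writer's
handling of it. A writer copies raw content only if its ``format`` matches
the writer's own output format and drops it otherwise::

  from dataclasses import dataclass


  @dataclass
  class RawNode:
      """Stand-in for a docutils "raw" node: pre-formatted text plus its format."""
      format: str
      text: str


  def render(node: RawNode, writer_format: str) -> str:
      """Model of a writer visiting a raw node.

      The text is copied verbatim only if the writer produces that format;
      every other writer silently drops it.
      """
      if writer_format in node.format.split():
          return node.text
      return ""


  # A front end that lets pygments produce HTML stores the result as HTML ...
  highlighted = RawNode("html", '<pre><span class="k">def</span> f(): pass</pre>')

  # ... so the HTML writer shows the code, but a LaTeX writer loses it.
  print(render(highlighted, "html"))
  print(repr(render(highlighted, "latex")))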


Points 1 and 2 lead to the `code-block directive proposal`_.

Point 3 becomes an issue in literate programming, where a code block is
the most used block markup. It is addressed in the proposal for a
`configurable literal block directive`_.


`code-block` directive proposal
-------------------------------

Syntax
""""""

This is a first draft of a reStructuredText directive definition, analogous
to the other directives in ``directives.txt``.

:Directive Type: "code-block"
:Doctree Element: literal_block
:Directive Arguments: One (`language`) or more (class names), optional.
:Directive Options: None.
:Directive Content: Becomes the body of the literal block.

The "code-block" directive constructs a literal block where the content is
parsed as source code and the syntax highlighting rules for `language` are
applied. If syntax rules for `language` are not known to docutils, the
block is rendered like an ordinary literal block.

For example, the following snippet is parsed and marked up as Python source
code. The actual rendering depends on the style sheet. ::

  .. code-block:: python

    def my_function():
        "just a test"
        print(8/2)

If the language argument is missing, a (configurable) default language
should be used. 

Additional arguments might be used, e.g. passed to the pygments_ parser or
(as class arguments) to the output document. Options could be added in the
same way, for example:

:number-lines: let pygments include line numbers
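
The argument rules above can be sketched in plain Python: the first
argument names the language, any further ones become class names, and a
configurable default applies when no argument is given.
``DEFAULT_LANGUAGE`` and ``split_code_block_arguments`` are hypothetical
names, not part of docutils::

  from typing import List, Optional, Tuple

  # Hypothetical configuration value: language used if none is given.
  DEFAULT_LANGUAGE = "python"


  def split_code_block_arguments(
      arguments: List[str], default_language: Optional[str] = DEFAULT_LANGUAGE
  ) -> Tuple[Optional[str], List[str]]:
      """Split "code-block" arguments into (language, extra class names)."""
      if not arguments:
          return default_language, []
      language, *class_names = arguments
      return language, class_names


  print(split_code_block_arguments(["python", "example"]))  # ('python', ['example'])
  print(split_code_block_arguments([]))                      # ('python', [])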

Include directive option
""""""""""""""""""""""""

The include directive should get a matching new option:

code : language 
  The entire included text is inserted into the document as if it were the
  content of a code-block directive (useful for program listings).

Implementation
""""""""""""""

Reading
'''''''

Felix Wiemann provided a `proof of concept`_ script that utilizes the
pygments_ parser to parse a source code string and store the result in
the document tree.

This concept is used in `pygments_code_block_directive`_ (source:
`pygments_code_block_directive.py`_) to define and register a "code-block"
directive.

* The ``DocutilsInterface`` class uses pygments to parse the content of the
  directive and classifies the tokens using short CSS class names identical
  to those in pygments' HTML output. If pygments is not available, the
  unparsed code is returned.

* The ``code_block_directive`` function inserts the tokens into a "rich"
  <literal_block> element with "classified" <inline> nodes.
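
The classification step can be sketched as follows. This is not the
sandbox code, only a minimal stand-in: ``classify_tokens`` and
``short_class`` are hypothetical names. The sketch uses Pygments'
``get_lexer_by_name`` and its ``STANDARD_TYPES`` table of short CSS class
names and, like the ``DocutilsInterface`` class described above, falls back
to the unparsed code if Pygments (or a lexer for the language) is not
available::

  from typing import Iterator, Tuple

  try:
      from pygments.lexers import get_lexer_by_name
      from pygments.token import STANDARD_TYPES
      from pygments.util import ClassNotFound
      HAVE_PYGMENTS = True
  except ImportError:  # no pygments: every block stays unparsed
      HAVE_PYGMENTS = False


  def short_class(ttype) -> str:
      """Short CSS class for a pygments token type, as in its HTML output.

      Token types without an entry of their own inherit their parent's
      class (a simplification of what pygments' HTML formatter does).
      """
      while ttype not in STANDARD_TYPES:
          ttype = ttype.parent
      return STANDARD_TYPES[ttype]


  def classify_tokens(code: str, language: str) -> Iterator[Tuple[str, str]]:
      """Yield (css_class, text) pairs; an empty class means plain text."""
      if not HAVE_PYGMENTS:
          yield "", code
          return
      try:
          lexer = get_lexer_by_name(language)
      except ClassNotFound:  # unknown language: behave like a literal block
          yield "", code
          return
      for ttype, value in lexer.get_tokens(code):
          yield short_class(ttype), value


  for css_class, text in classify_tokens("def f():\n    return 1\n", "python"):
      print(repr(css_class), repr(text))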

The XML rendering of the small example file `myfunction.py.txt`_ looks like
`myfunction.py.xml`_.
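
The shape of such a "rich" element can be reproduced with
``xml.etree.ElementTree`` from the standard library. The sketch below only
mimics docutils' XML output (a <literal_block> with <inline> children that
carry a ``classes`` attribute); ``rich_literal_block``, the hand-written
token list and the class values are illustrative assumptions, not taken
from `myfunction.py.xml`_::

  import xml.etree.ElementTree as ET

  # Illustrative classified tokens: (short CSS class, text); "" = plain text.
  tokens = [("k", "def"), ("", " "), ("nf", "my_function"), ("p", "():"), ("", "\n")]


  def rich_literal_block(tokens, classes="code-block python"):
      """Build <literal_block> with one <inline> child per classified token."""
      block = ET.Element("literal_block", {"classes": classes})
      last = None  # most recent <inline>; following plain text is its tail
      for css_class, text in tokens:
          if css_class:
              last = ET.SubElement(block, "inline", {"classes": css_class})
              last.text = text
          elif last is None:
              block.text = (block.text or "") + text
          else:
              last.tail = (last.tail or "") + text
      return block


  print(ET.tostring(rich_literal_block(tokens), encoding="unicode"))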

Writing
'''''''

The writers can use the class information in the <inline> elements to render
the tokens. They should ignore the class information if they are unable to
use it or to pass it on.
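
A minimal sketch of this rule, using plain ``(class, text)`` pairs instead
of real doctree nodes: a writer that can use the classes turns them into
markup, while a writer that cannot simply emits the text, so the code
survives either way. ``to_html``, ``to_plain`` and the ``code-block`` CSS
class are hypothetical::

  import html


  def to_html(tokens):
      """A writer that can use the classes: wrap classified tokens in <span>."""
      parts = []
      for css_class, text in tokens:
          escaped = html.escape(text)
          if css_class:
              parts.append(f'<span class="{html.escape(css_class)}">{escaped}</span>')
          else:
              parts.append(escaped)
      return '<pre class="code-block">' + "".join(parts) + "</pre>"


  def to_plain(tokens):
      """A writer that cannot use the classes: ignore them, keep the text."""
      return "".join(text for _, text in tokens)


  # Illustrative tokens for "print(8/2)" with pygments-style short classes.
  tokens = [("nb", "print"), ("p", "("), ("mi", "8"), ("o", "/"), ("mi", "2"), ("p", ")")]
  print(to_html(tokens))
  print(to_plain(tokens))  # print(8/2)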

HTML
  The "html" writer works out of the box. 

  * The rst2html-highlight_ front end registers the "code-block" directive and
    converts an input file to HTML.
  * Styling is done with the adapted CSS style sheet `pygments-default.css`_
    based on docutils' default stylesheet and the output of 
    ``pygmentize -S default -f html``.
  * The result looks like `myfunction.py.htm`_.
  
  The "s5" and "pep" writers are not tested yet.
  
XML
  The "xml" and "pseudoxml" writers work out of the box, too. See
  `myfunction.py.xml`_ and `myfunction.py.pseudoxml`_.

LaTeX
  LaTeX writers must be fixed to handle the "rich" <literal_block> element
  correctly:
  
  * The "latex" writer currently fails to handle "classified" <inline>
    doctree elements. The output `myfunction.py.tex`_ contains undefined
    control sequences ``\docutilsroleNone``.
  
  * The "newlatex2e" writer produces a valid LaTeX document
    (`myfunction.py.newlatex2e.tex`_). However, there is some garbage in the
    `pdflatex` output (`myfunction.py.newlatex2e.pdf`_).
    
  The pygments-produced LaTeX style file will not work with docutils' LaTeX
  output. The writer(s) will need to

  a) be extended in order to convert the "classified" <inline> doctree
     elements into LaTeX styling instructions, or

  b) reconstruct the original content and pass it to an ``lstlisting``
     environment.
    
OpenOffice 
  The unofficial odtwriter_ provides syntax highlighting with
  pygments but uses a different syntax and implementation.
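
Options a) and b) of the LaTeX item above can both be sketched on the same
``(class, text)`` token pairs. Option a) emits one command per classified
token inside an alltt_ environment; the ``\PY`` command is an assumption
modelled on pygments' own LaTeX output, and its definition would have to
come from a style file. Option b) drops the classes, reassembles the
original source, and leaves the highlighting to the listings_ package.
``escape_latex``, ``classified_to_latex`` and ``to_lstlisting`` are
hypothetical helpers::

  # Characters with a special meaning in LaTeX and their escaped forms.
  LATEX_SPECIALS = {
      "\\": r"\textbackslash{}",
      "{": r"\{",
      "}": r"\}",
      "$": r"\$",
      "&": r"\&",
      "#": r"\#",
      "%": r"\%",
      "_": r"\_",
      "^": r"\textasciicircum{}",
      "~": r"\textasciitilde{}",
  }


  def escape_latex(text):
      return "".join(LATEX_SPECIALS.get(char, char) for char in text)


  def classified_to_latex(tokens):
      """Option a): one (assumed) \\PY{class}{text} command per classified token."""
      body = "".join(
          "\\PY{" + css_class + "}{" + escape_latex(text) + "}" if css_class
          else escape_latex(text)
          for css_class, text in tokens
      )
      if not body.endswith("\n"):
          body += "\n"
      # alltt keeps spaces and line breaks but still interprets \, { and }.
      return "\\begin{alltt}\n" + body + "\\end{alltt}"


  def to_lstlisting(tokens, language="Python"):
      """Option b): reassemble the source and let listings highlight it."""
      source = "".join(text for _, text in tokens)
      if not source.endswith("\n"):
          source += "\n"
      # lstlisting content is verbatim, so no escaping is needed here.
      return "\\begin{lstlisting}[language=" + language + "]\n" + source + "\\end{lstlisting}"


  tokens = [("k", "def"), ("", " "), ("nf", "f_1"), ("p", "():"), ("", "\n")]
  print(classified_to_latex(tokens))
  print(to_lstlisting(tokens))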


TODO
""""

1. Let the latex2e writer ignore unknown class arguments.

2. Write a functional test case and a sample.

3. Minimal implementation: 

   * move the code from `pygments_code_block_directive.py`_ to "the right
     place".
     
   * add the CSS rules to the default style-sheet (see pygments-default.css_)
   
4. Enable the LaTeX writers to highlight code-blocks (more precisely: to
   write LaTeX code that lets the LaTeX engine highlight them).

5. Think about an interface for pygments' options (like "encoding" or
   "linenumbers").



.. _proof of concept:
     http://article.gmane.org/gmane.text.docutils.user/3689
.. _pygments_code_block_directive.py: ../pygments_code_block_directive.py
.. _pygments_code_block_directive: pygments_code_block_directive-bunt.py.htm
.. _pygments_docutils_interface.py: pygments_docutils_interface.py
.. _myfunction.py.txt: myfunction.py.txt
.. _myfunction.py.xml: myfunction.py.xml
.. _myfunction.py.htm: myfunction.py.htm
.. _myfunction.py.pseudoxml: myfunction.py.pseudoxml
.. _myfunction.py.tex: myfunction.py.tex
.. _myfunction.py.newlatex2e.tex: myfunction.py.newlatex2e.tex
.. _myfunction.py.newlatex2e.pdf: myfunction.py.newlatex2e.pdf
.. _rst2html-highlight: ../rst2html-highlight
.. _pygments-long.css: ../data/pygments-long.css



Configurable literal block directive
------------------------------------

Goal
""""

A clean and simple syntax for highlighted code blocks that preserves the
space-saving feature of the "minimised" literal block marker (``::`` at the
end of a text paragraph). This is especially desirable in documents with
many code blocks, like tutorials or literate programs.

Inline analogue
"""""""""""""""

The *role* of inline `interpreted text` can be customised with the
"default-role" directive. This allows the concise "backtick" syntax to be
used for the most frequently used role. For example, in a chemistry paper,
one could use::

  .. default-role:: subscript
  
  The triple point of H\ `2`\O is at 0.01 °C.

.. default-role:: subscript

to produce

  The triple point of H\ `2`\O is at 0.01 °C.

This customisation is currently not possible for block markup.

Proposal
""""""""

* Define a new "literal-block" directive for an ordinary literal
  block. It would simply insert the block content into the document
  tree as a <literal_block> element.

* Define a "default-literal-block" setting that controls which
  directive is called on a block following ``::``. The default would be the
  "literal-block" directive (backwards compatible).

Motivation
""""""""""

Analogous to customising the default role of "interpreted text" with the
"default-role" directive, the concise ``::`` literal-block markup could be
used for e.g.

* a "code-block" directive for syntax highlighting

* the "line-block" directive for poems or addresses

* the "parsed-literal" directive

Example::

  ordinary literal block::
  
     some text typeset in monospace

  .. default-literal-block::  code-block python
     
  this is colourful Python code::
     
     def hello():
         print("hello world")

  
Along the same lines, a "default-block-quote" setting or directive could be
considered to configure the role of a block quote.
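
The proposed "default-literal-block" setting amounts to a lookup table of
directives plus one configurable entry point for ``::`` blocks. The
stdlib-only model below illustrates that control flow; the handler
functions, ``DIRECTIVES`` and ``LiteralBlockDispatcher`` are hypothetical,
and real directives would return doctree nodes rather than tuples::

  def literal_block(content, *args):
      """Backwards-compatible default: an ordinary literal block."""
      return ("literal_block", content)


  def code_block(content, language=None, *classes):
      """Stand-in for the proposed "code-block" directive."""
      return ("code_block", language, content)


  def line_block(content, *args):
      """Stand-in for the "line-block" directive (poems, addresses)."""
      return ("line_block", content.splitlines())


  # Directives that may be chosen as the handler for "::" blocks.
  DIRECTIVES = {
      "literal-block": literal_block,
      "code-block": code_block,
      "line-block": line_block,
  }


  class LiteralBlockDispatcher:
      """Decides which directive handles a block introduced by "::"."""

      def __init__(self):
          self.name, self.args = "literal-block", ()

      def set_default(self, spec):
          """Model of ".. default-literal-block:: <directive> [arguments]"."""
          name, *args = spec.split()
          if name not in DIRECTIVES:
              raise ValueError(f"unknown directive: {name}")
          self.name, self.args = name, tuple(args)

      def handle(self, content):
          """Pass the content of a "::" block to the current default directive."""
          return DIRECTIVES[self.name](content, *self.args)


  dispatcher = LiteralBlockDispatcher()
  print(dispatcher.handle("some text typeset in monospace"))
  dispatcher.set_default("code-block python")
  print(dispatcher.handle('def hello():\n    print("hello world")'))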

Odtwriter syntax
----------------

Dave Kuhlman's odtwriter_ extension can add syntax highlighting
to ordinary literal blocks.

.. attention:: 
   The following might no longer be true for the current version of
   the odtwriter_.

The ``--add-syntax-highlighting`` command line flag activates syntax
highlighting in literal blocks. By default, the "python" lexer is used.

You can change this within your reST document with the `sourcecode`
directive::
  
  .. sourcecode:: off
  
  ordinary literal block::
  
     content set in teletype

  .. sourcecode:: on
  .. sourcecode:: python
     
  colourful Python code::
     
     def hello():
         print("hello world")


The "sourcecode" directive defined by the odtwriter is fundamentally
different from the "code-block" directive of ``rst2html-pygments``:
  
* The odtwriter directive does not have content. It is a switch.

* The syntax highlighting state and language/lexer set by this directive
  remain in effect until the next sourcecode directive is encountered in the
  reST document.
  
  ``.. sourcecode:: <newstate>``
       makes highlighting active or inactive.
       <newstate> is either ``on`` or ``off``.

  ``.. sourcecode:: <lexer>``
       changes the lexer parsing literal code blocks.
       <lexer> should be one of the aliases listed at Pygments' `languages and
       markup formats`_.

I.e. the odtwriter implements a `configurable literal block directive`_
(but with a slightly different syntax than my proposal above).
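
The switch semantics described above (a state that persists until the next
"sourcecode" directive) can be modelled as a small state machine. This is
not odtwriter code: ``SourcecodeState`` is a hypothetical class, and the
initial state assumes that ``--add-syntax-highlighting`` was given, so
highlighting starts switched on with the "python" lexer::

  class SourcecodeState:
      """Track what an odtwriter-style "sourcecode" switch would do."""

      def __init__(self, enabled=True, lexer="python"):
          # Assumed start: --add-syntax-highlighting given, "python" lexer.
          self.enabled = enabled
          self.lexer = lexer

      def directive(self, argument):
          """Apply ".. sourcecode:: <argument>" to the current state."""
          if argument == "on":
              self.enabled = True
          elif argument == "off":
              self.enabled = False
          else:  # any other argument is taken as a lexer alias
              self.lexer = argument

      def literal_block(self):
          """Return the lexer used for the next literal block, or None."""
          return self.lexer if self.enabled else None


  state = SourcecodeState()
  state.directive("off")
  print(state.literal_block())  # None: an ordinary literal block
  state.directive("on")
  state.directive("python")
  print(state.literal_block())  # python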


.. External links 
.. _pylit: http://pylit.berlios.de
.. _docutils: http://docutils.sourceforge.net/
.. _rest2web: http://www.voidspace.org.uk/python/rest2web/
.. _Enscript: http://www.gnu.org/software/enscript/enscript.html
.. _SilverCity: http://silvercity.sourceforge.net/
.. _Trac: http://trac.edgewall.org/
.. _Moin-Moin Python colorizer: 
    http://www.standards-schmandards.com/2005/fangs-093/
.. _odtwriter: http://www.rexx.com/~dkuhlman/odtwriter.html
.. _pygments: http://pygments.org/
.. _listings: 
    http://www.ctan.org/tex-archive/help/Catalogue/entries/listings.html
.. _fancyvrb: 
    http://www.ctan.org/tex-archive/help/Catalogue/entries/fancyvrb.html
.. _alltt: http://www.ctan.org/tex-archive/help/Catalogue/entries/alltt.html
.. _moreverb:
    http://www.ctan.org/tex-archive/help/Catalogue/entries/moreverb.html
.. _verbatim: 
    http://www.ctan.org/tex-archive/help/Catalogue/entries/verbatim.html
.. _languages and markup formats: http://pygments.org/languages
.. _Using Pygments in ReST documents: http://pygments.org/docs/rstdirective/
.. _Docutils Document Tree: 
    http://docutils.sf.net/docs/ref/doctree.html#classes
.. _latex-variants: http://docutils.sourceforge.net/sandbox/latex-variants/

.. Internal links
.. _front-end scripts: ../tools/pygments-enhanced-front-ends
.. _pygments-default.css: ../data/pygments-default.css

