author | goodger <goodger@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2002-04-20 03:01:52 +0000 |
---|---|---|
committer | goodger <goodger@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2002-04-20 03:01:52 +0000 |
commit | 101671ae44e1686680c80cd07b452aabeb88fb63 (patch) | |
tree | c3e859c167fc0259d708de65ec5e703293d63f68 | |
parent | 3522fa4da8a8aec1e7da46cf4eb4a5239ed17bfd (diff) | |
download | docutils-101671ae44e1686680c80cd07b452aabeb88fb63.tar.gz |
Initial revision
git-svn-id: http://svn.code.sf.net/p/docutils/code/trunk/docutils@18 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
105 files changed, 30284 insertions, 0 deletions
diff --git a/COPYING.txt b/COPYING.txt new file mode 100644 index 000000000..fb2c3f5ae --- /dev/null +++ b/COPYING.txt @@ -0,0 +1,34 @@ +================== + Copying Docutils +================== + +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Date: $Date$ +:Web-site: http://docutils.sourceforge.net/ + +Most of the files included in this project are in the public domain, +and therefore have no license requirement and no restrictions on +copying or usage. The two exceptions are: + +- docutils/roman.py, copyright 2001 by Mark Pilgrim, licensed under the + `Python 2.1.1 license`_. + +- test/difflib.py, copyright by the Python Software Foundation, + licensed under the `Python 2.2 license`_. This file is included for + compatibility with Python versions less than 2.2; if you have Python + 2.2 or higher, difflib.py is not needed and may be removed. (It's + only used to report test failures anyhow; it isn't installed + anywhere. The included file is a pre-generator version of the + difflib.py module included in Python 2.2.) + +(Disclaimer: I am not a lawyer.) The Python license is OSI-approved_ +and GPL-compatible_. Although complicated by multiple owners and lots +of legalese, it basically lets you copy, use, modify, and redistribute +files as long as you keep the copyright attribution intact, note any +changes you make, and don't use the owner's name in vain. + +.. _Python 2.1.1 license: http://www.python.org/2.1.1/license.html +.. _Python 2.2 license: http://www.python.org/2.2/license.html +.. _OSI-approved: http://opensource.org/licenses/ +.. 
_GPL-compatible: http://www.gnu.org/philosophy/license-list.html diff --git a/HISTORY.txt b/HISTORY.txt new file mode 100644 index 000000000..afbbaf41e --- /dev/null +++ b/HISTORY.txt @@ -0,0 +1,52 @@ +================== + Docutils History +================== + +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Date: $Date$ +:Website: http://docutils.sourceforge.net/ + + +Acknowledgements +================ + +I would like to acknowledge the people who have made a direct impact +on the Docutils project, knowingly or not, in terms of encouragement, +suggestions, criticism, bug reports, code contributions, and related +projects: + + David Ascher, Fred Drake, Jim Fulton, Peter Funk, Doug Hellmann, + Juergen Hermann, Tony Ibbs, Richard Jones, Garth Kidd, Daniel + Larsson, Marc-Andre Lemburg, Wolfgang Lipp, Edward Loper, Ken + Manheimer, Paul Moore, Michel Pelletier, Sam Penrose, Tim Peters, + Mark Pilgrim, Tavis Rudd, Ueli Schlaepfer, Bob Tolbert, Laurence + Tratt, Guido van Rossum, Barry Warsaw, Edward Welbourne, Ka-Ping + Yee, Moshe Zadka + +(I'm still waiting for contributions of computer equipment and cold +hard cash :-).) Hopefully I haven't forgotten anyone or misspelled +any names; apologies (and please let me know!) if I have. + + +Release 0.1 (2002-04-??) +======================== + +This is the first release of Docutils, merged from the now inactive +reStructuredText__ and `Docstring Processing System`__ projects. For +the pre-Docutils history, see the `reStructuredText HISTORY.txt`__ and +the `DPS HISTORY.txt`__ files. + +__ http://structuredtext.sourceforge.net/ +__ http://docstring.sourceforge.net/ +__ http://structuredtext.sourceforge.net/HISTORY.txt +__ http://docstring.sourceforge.net/HISTORY.txt + + +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/MANIFEST.in b/MANIFEST.in new file mode 100644 index 000000000..04191d2dc --- /dev/null +++ b/MANIFEST.in @@ -0,0 +1,6 @@ +include *.txt +include *.py +recursive-include docs * +recursive-include spec * +recursive-include test * +recursive-include tools * diff --git a/README.txt b/README.txt new file mode 100644 index 000000000..8b823b28d --- /dev/null +++ b/README.txt @@ -0,0 +1,147 @@ +================== + README: Docutils +================== + +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Date: $Date$ +:Website: http://docutils.sourceforge.net/ + +Thank you for downloading the Python Docutils project archive. As this +is a work in progress, please check the project website for updated +working files. + +To run the code, Python 2.0 or later must already be installed. You +can get Python from http://www.python.org/. + + +Project Files & Directories +=========================== + +* README.txt: You're reading it. + +* COPYING.txt: Copyright details for non-public-domain files (most are + PD). + +* HISTORY.txt: Release notes for the current and previous project + releases. + +* setup.py: Installation script. See "Installation" below. + +* install.py: Quick & dirty installation script. + +* docutils: The project source directory, installed as a Python + package. + +* docs: The project user documentation directory. The docs/rest + directory contains reStructuredText user docs. + +* spec: The project specification directory. Contains PEPs (Python + Enhancement Proposals), XML DTDs (document type definitions), and + other documents. The spec/rest directory contains the + reStructuredText specification. + +* tools: Directory for standalone scripts that use reStructuredText. + + - quicktest.py: Input reStructuredText, output pretty-printed + pseudo-XML and various other forms. 
+ + - publish.py: A minimal example of a complete Docutils system, using + the "standalone" reader and "pformat" writer. + + - html.py: Read standalone reStructuredText documents and write + HTML4/CSS1. Uses the default.css stylesheet. + +* test: Unit tests; ``test/alltests.py`` runs all the tests. Not + required to use the software, but very useful if you're planning to + modify it. + + +Installation +============ + +The first step is to expand the .tar.gz archive. It contains a +distutils setup file "setup.py". OS-specific installation +instructions follow. + +Linux, Unix, MacOS X +-------------------- + +1. Open a shell. + +2. Go to the directory created by expanding the archive:: + + cd <archive_directory_path> + +3. Install the package:: + + python setup.py install + + If the python executable isn't on your path, you'll have to specify + the complete path, such as /usr/local/bin/python. You may need + root permissions to complete this step. + +You can also just run install.py; it does the same thing. + +Windows +------- + +1. Open a DOS box (Command Shell, MSDOS Prompt, or whatever they're + calling it these days). + +2. Go to the directory created by expanding the archive:: + + cd <archive_directory_path> + +3. Install the package:: + + <path_to_python.exe>\python setup.py install + +If your system is set up to run Python when you double-click on .py +files, you can run install.py to do the same as the above. + +MacOS +----- + +1. Open the folder containing the expanded archive. + +2. Double-click on the file "setup.py", which should be a "Python + module" file. + + If the file isn't a "Python module", the line endings are probably + also wrong, and you will need to set up your system to recognize + ".py" file extensions as Python files. See + http://gotools.sourceforge.net/mac/python.html for detailed + instructions. Once set up, it's easiest to start over by expanding + the archive again. + +3. The distutils options window will appear. 
From the "Command" popup + list choose "install", click "Add", then click "OK". + +If install.py is a "Python module" (see step 2 above if it isn't), you +can run it instead of the above. The distutils options window will +not appear. + + +Usage +===== + +Start with the html.py and publish.py front-ends from the unpacked +"tools" subdirectory. Both tools take up to two arguments, the source +path and destination path, with STDIN and STDOUT being the defaults. + +The package modules are continually growing and evolving. The +``docutils.statemachine`` module is usable independently. It contains +extensive inline documentation (in reStructuredText format). + +The specs, the package structure, and the skeleton modules may also be +of interest to you. Contributions are welcome! + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/dev/pysource.dtd b/docs/dev/pysource.dtd new file mode 100644 index 000000000..463844a68 --- /dev/null +++ b/docs/dev/pysource.dtd @@ -0,0 +1,212 @@ +<!-- +====================================================================== + Docutils Python Source DTD +====================================================================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This DTD has been placed in the public domain. +:Filename: pysource.dtd + +This DTD (document type definition) extends the Generic DTD (see +below). + +More information about this DTD and the Docutils project can be found +at http://docutils.sourceforge.net/. The latest version of this DTD +is available from http://docutils.sourceforge.net/spec/pysource.dtd. 
+ +The proposed formal public identifier for this DTD is:: + + +//IDN python.org//DTD Docutils Python Source//EN//XML +--> + + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Parameter Entity Overrides +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!ENTITY % additional.structural.elements + " | package_section | module_section | class_section + | method_section | function_section | module_attribute_section + | class_attribute_section | instance_attribute_section "> + +<!ENTITY % additional.inline.elements + " | package | module | class | method | function + | variable | parameter | type + | module_attribute | class_attribute | instance_attribute + | exception_class | warning_class "> + + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Generic DTD +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This DTD extends the Docutils Generic DTD, available from +http://docutils.sourceforge.net/spec/docutils.dtd. 
+--> + +<!ENTITY % docutils PUBLIC + "+//IDN python.org//DTD Docutils Generic//EN//XML" + "docutils.dtd"> +%docutils; + + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Additional Structural Elements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!ELEMENT package_section (package, %structure.model;)> +<!ATTLIST package_section %basic.atts;> + +<!ELEMENT module_section (module, %structure.model;)> +<!ATTLIST module_section %basic.atts;> + +<!ELEMENT class_section + (class, inheritance_list?, parameter_list?, %structure.model;)> +<!ATTLIST class_section %basic.atts;> + +<!ELEMENT method_section (method, parameter_list?, %structure.model;)> +<!ATTLIST method_section %basic.atts;> + +<!ELEMENT function_section (function, parameter_list?, %structure.model;)> +<!ATTLIST function_section %basic.atts;> + +<!ELEMENT module_attribute_section + (module_attribute, initial_value?, %structure.model;)> +<!ATTLIST module_attribute_section %basic.atts;> + +<!ELEMENT class_attribute_section + (class_attribute, initial_value?, %structure.model;)> +<!ATTLIST class_attribute_section %basic.atts;> + +<!ELEMENT instance_attribute_section + (instance_attribute, initial_value?, %structure.model;)> +<!ATTLIST instance_attribute_section %basic.atts;> + +<!ELEMENT inheritance_list (class+)> +<!ATTLIST inheritance_list %basic.atts;> + +<!ELEMENT parameter_list + ((parameter_item+, optional_parameters*) | optional_parameters+)> +<!ATTLIST parameter_list %basic.atts;> + +<!ELEMENT parameter_item ((parameter | parameter_tuple), parameter_default?)> +<!ATTLIST parameter_item %basic.atts;> + +<!ELEMENT optional_parameters (parameter_item+, optional_parameters*)> +<!ATTLIST optional_parameters %basic.atts;> + +<!ELEMENT parameter_tuple (parameter | parameter_tuple)+> +<!ATTLIST parameter_tuple %basic.atts;> + +<!ELEMENT parameter_default (#PCDATA)> +<!ATTLIST parameter_default %basic.atts;> + +<!ELEMENT initial_value (#PCDATA)> 
+<!ATTLIST initial_value %basic.atts;> + + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Additional Inline Elements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!-- Also used as the `package_section` identifier/title. --> +<!ELEMENT package (#PCDATA)> +<!ATTLIST package + %basic.atts; + %link.atts;> + +<!-- Also used as the `module_section` identifier/title. --> +<!ELEMENT module (#PCDATA)> +<!ATTLIST module + %basic.atts; + %link.atts;> + +<!-- +Also used as the `class_section` identifier/title, and in the +`inheritance` element. +--> +<!ELEMENT class (#PCDATA)> +<!ATTLIST class + %basic.atts; + %link.atts;> + +<!-- Also used as the `method_section` identifier/title. --> +<!ELEMENT method (#PCDATA)> +<!ATTLIST method + %basic.atts; + %link.atts;> + +<!-- Also used as the `function_section` identifier/title. --> +<!ELEMENT function (#PCDATA)> +<!ATTLIST function + %basic.atts; + %link.atts;> + +<!-- +Also used as the `module_attribute_section` identifier/title. A module +attribute is an exported module-level global variable. +--> +<!ELEMENT module_attribute (#PCDATA)> +<!ATTLIST module_attribute + %basic.atts; + %link.atts;> + +<!-- Also used as the `class_attribute_section` identifier/title. --> +<!ELEMENT class_attribute (#PCDATA)> +<!ATTLIST class_attribute + %basic.atts; + %link.atts;> + +<!-- +Also used as the `instance_attribute_section` identifier/title. +--> +<!ELEMENT instance_attribute (#PCDATA)> +<!ATTLIST instance_attribute + %basic.atts; + %link.atts;> + +<!ELEMENT variable (#PCDATA)> +<!ATTLIST variable + %basic.atts; + %link.atts;> + +<!-- Also used in `parameter_list`. 
--> +<!ELEMENT parameter (#PCDATA)> +<!ATTLIST parameter + %basic.atts; + %link.atts; + excess_positional %yesorno; #IMPLIED + excess_keyword %yesorno; #IMPLIED> + +<!ELEMENT type (#PCDATA)> +<!ATTLIST type + %basic.atts; + %link.atts;> + +<!ELEMENT exception_class (#PCDATA)> +<!ATTLIST exception_class + %basic.atts; + %link.atts;> + +<!ELEMENT warning_class (#PCDATA)> +<!ATTLIST warning_class + %basic.atts; + %link.atts;> + + +<!-- +Local Variables: +mode: sgml +indent-tabs-mode: nil +fill-column: 70 +End: +--> diff --git a/docs/dev/pysource.txt b/docs/dev/pysource.txt new file mode 100644 index 000000000..a75fd1d4d --- /dev/null +++ b/docs/dev/pysource.txt @@ -0,0 +1,173 @@ +====================== + Python Source Reader +====================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ + +This document explores issues around extracting and processing +docstrings from Python modules. + +For definitive element hierarchy details, see the "Python Plaintext +Document Interface DTD" XML document type definition, pysource.dtd_ +(which modifies the generic docutils.dtd_). Descriptions below list +'DTD elements' (XML 'generic identifiers' or tag names) corresponding +to syntax constructs. + + +.. contents:: + + +Python Source Reader +==================== + +The Python Source Reader ("PySource") model that's evolving in my mind +goes something like this: + +1. Extract the docstring/namespace [#]_ tree from the module(s) and/or + package(s). + + .. [#] See `Docstring Extractor`_ above. + +2. Run the parser on each docstring in turn, producing a forest of + doctrees (per nodes.py). + +3. Join the docstring trees together into a single tree, running + transforms: + + - merge hyperlinks + - merge namespaces + - create various sections like "Module Attributes", "Functions", + "Classes", "Class Attributes", etc.; see spec/ppdi.dtd + - convert the above special sections to ordinary doctree nodes + +4. 
Run transforms on the combined doctree. Examples: resolving + cross-references/hyperlinks (including interpreted text on Python + identifiers); footnote auto-numbering; first field list -> + bibliographic elements. + + (Or should step 4's transforms come before step 3?) + +5. Pass the resulting unified tree to the writer/builder. + +I've had trouble reconciling the roles of input parser and output +writer with the idea of modes ("readers" or "directors"). Does the +mode govern the transformation of the input, the output, or both? +Perhaps the mode should be split into two. + +For example, say the source of our input is a Python module. Our +"input mode" should be the "Python Source Reader". It discovers (from +``__docformat__``) that the input parser is "reStructuredText". If we +want HTML, we'll specify the "HTML" output formatter. But there's a +piece missing. What *kind* or *style* of HTML output do we want? +PyDoc-style, LibRefMan style, etc. (many people will want to specify +and control their own style). Is the output style specific to a +particular output format (XML, HTML, etc.)? Is the style specific to +the input mode? Or can/should they be independent? + +I envision interaction between the input parser, an "input mode" , and +the output formatter. The same intermediate data format would be used +between each of these, being transformed as it progresses. + + +Docstring Extractor +=================== + +We need code that scans a parsed Python module, and returns an ordered +tree containing the names, docstrings (including attribute and +additional docstrings), and additional info (in parentheses below) of +all of the following objects: + +- packages +- modules +- module attributes (+ values) +- classes (+ inheritance) +- class attributes (+ values) +- instance attributes (+ values) +- methods (+ formal parameters & defaults) +- functions (+ formal parameters & defaults) + +(Extract comments too? 
For example, comments at the start of a module +would be a good place for bibliographic field lists.) + +In order to evaluate interpreted text cross-references, namespaces for +each of the above will also be required. + +See python-dev/docstring-develop thread "AST mining", started on +2001-08-14. + + +Interpreted Text +================ + +DTD elements: package, module, class, method, function, +module_attribute, class_attribute, instance_attribute, variable, +parameter, type, exception_class, warning_class. + +In Python docstrings, interpreted text is used to classify and mark up +program identifiers, such as the names of variables, functions, +classes, and modules. If the identifier alone is given, its role is +inferred implicitly according to the Python namespace lookup rules. +For functions and methods (even when dynamically assigned), +parentheses ('()') may be included:: + + This function uses `another()` to do its work. + +For class, instance and module attributes, dotted identifiers are used +when necessary:: + + class Keeper(Storer): + + """ + Extend `Storer`. Class attribute `instances` keeps track of + the number of `Keeper` objects instantiated. + """ + + instances = 0 + """How many `Keeper` objects are there?""" + + def __init__(self): + """ + Extend `Storer.__init__()` to keep track of instances. + + Keep count in `self.instances` and data in `self.data`. + """ + Storer.__init__(self) + self.instances += 1 + + self.data = [] + """Store data in a list, most recent last.""" + + def storedata(self, data): + """ + Extend `Storer.storedata()`; append new `data` to a list + (in `self.data`). + """ + self.data.append(data) + +To classify identifiers explicitly, the role is given along with the +identifier in either prefix or suffix form:: + + Use :method:`Keeper.storedata` to store the object's data in + `Keeper.data`:instance_attribute:. 
+ +The role may be one of 'package', 'module', 'class', 'method', +'function', 'module_attribute', 'class_attribute', +'instance_attribute', 'variable', 'parameter', 'type', +'exception_class', 'exception', 'warning_class', or 'warning'. Other +roles may be defined. + +.. _reStructuredText Markup Specification: + http://docutils.sourceforge.net/spec/rst/reStructuredText.html +.. _Docutils: http://docutils.sourceforge.net/ +.. _pysource.dtd: http://docutils.sourceforge.net/spec/pysource.dtd +.. _docutils.dtd: http://docutils.sourceforge.net/spec/docutils.dtd + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + fill-column: 70 + End: diff --git a/docs/dev/rst/alternatives.txt b/docs/dev/rst/alternatives.txt new file mode 100644 index 000000000..9bbe9a2f1 --- /dev/null +++ b/docs/dev/rst/alternatives.txt @@ -0,0 +1,1239 @@ +================================================== + A Record of reStructuredText Syntax Alternatives +================================================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ + +The following are ideas, alternatives, and justifications that were +considered for reStructuredText syntax, which did not originate with +Setext_ or StructuredText_. For an analysis of constructs which *did* +originate with StructuredText or Setext, please see `Problems With +StructuredText`_. See the `reStructuredText Markup Specification`_ +for full details of the established syntax. + +.. _Setext: http://docutils.sourceforge.net/mirror/setext.html +.. _StructuredText: + http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage +.. _Problems with StructuredText: problems.html +.. _reStructuredText Markup Specification: reStructuredText.html + + +.. contents:: + + +... Or Not To Do? +================= + +This is the realm of the possible but questionably probable. 
These +ideas are kept here as a record of what has been proposed, for +posterity and in case any of them prove to be useful. + + +Compound Enumerated Lists +------------------------- + +Allow for compound enumerators, such as '1.1.' or '1.a.' or '1(a)', to +allow for nested enumerated lists without indentation? + + +Auto-Numbered Enumerated Lists +------------------------------ + +Add these? Example:: + + #. Item 1. + #. Item 2. + #. Item 3. + +Arabic numerals only, or any sequence if first initialized? For +example:: + + a) Item a. + #) Item b. + #) Item c. + + +Sloppy Indentation of List Items +-------------------------------- + +Perhaps the indentation shouldn't be so strict. Currently, this is +required:: + + 1. First line, + second line. + +Anything wrong with this? :: + + 1. First line, + second line. + +Problem? + + 1. First para. + + Block quote. (no good: requires some indent relative to first + para) + + Second Para. + + 2. Have to carefully define where the literal block ends:: + + Literal block + + Literal block? + +Hmm... Non-strict indentation isn't such a good idea. + + +Lazy Indentation of List Items +------------------------------ + +Another approach: Going back to the first draft of reStructuredText +(2000-11-27 post to Doc-SIG):: + + - This is the fourth item of the main list (no blank line above). + The second line of this item is not indented relative to the + bullet, which precludes it from having a second paragraph. + +Change that to *require* a blank line above and below, to reduce +ambiguity. This "loosening" may be added later, once the parser's +been nailed down. However, a serious drawback of this approach is to +limit the content of each list item to a single paragraph. + + +David's Idea for Lazy Indentation +````````````````````````````````` + +Consider a paragraph in a word processor. It is a single logical line +of text which ends with a newline, soft-wrapped arbitrarily at the +right edge of the page or screen. 
We can think of a plaintext +paragraph in the same way, as a single logical line of text, ending +with two newlines (a blank line) instead of one, and which may contain +arbitrary line breaks (newlines) where it was accidentally +hard-wrapped by an application. We can compensate for the accidental +hard-wrapping by "unwrapping" every unindented second and subsequent +line. The indentation of the first line of a paragraph or list item +would determine the indentation for the entire element. Blank lines +would be required between list items when using lazy indentation. + +The following example shows the lazy indentation of multiple body +elements:: + + - This is the first paragraph + of the first list item. + + Here is the second paragraph + of the first list item. + + - This is the first paragraph + of the second list item. + + Here is the second paragraph + of the second list item. + +A more complex example shows the limitations of lazy indentation:: + + - This is the first paragraph + of the first list item. + + Next is a definition list item: + + Term + Definition. The indentation of the term is + required, as is the indentation of the definition's + first line. + + When the definition extends to more than + one line, lazy indentation may occur. (This is the second + paragraph of the definition.) + + - This is the first paragraph + of the second list item. + + - Here is the first paragraph of + the first item of a nested list. + + So this paragraph would be outside of the nested list, + but inside the second list item of the outer list. + + But this paragraph is not part of the list at all. + +And the ambiguity remains:: + + - Look at the hyphen at the beginning of the next line + - is it a second list item marker, or a dash in the text? + + Similarly, we may want to refer to numbers inside enumerated + lists: + + 1. How many socks in a pair? There are + 2. How many pants in a pair? Exactly + 1. Go figure. 
+ +Literal blocks and block quotes would still require consistent +indentation for all their lines. For block quotes, we might be able +to get away with only requiring that the first line of each contained +element be indented. For example:: + + Here's a paragraph. + + This is a paragraph inside a block quote. + Second and subsequent lines need not be indented at all. + + - A bullet list inside + the block quote. + + Second paragraph of the + bullet list inside the block quote. + +Although feasible, this form of lazy indentation has problems. The +document structure and hierarchy is not obvious from the indentation, +making the source plaintext difficult to read. This will also make +keeping track of the indentation while writing difficult and +error-prone. However, these problems may be acceptable for Wikis and +email mode, where we may be able to rely on less complex structure +(few nested lists, for example). + + +Field Lists +=========== + +Prior to the syntax for field lists being finalized, several +alternatives were proposed. + +1. Unadorned RFC822_ everywhere:: + + Author: Me + Version: 1 + + Advantages: clean, precedent (RFC822-compliant). Disadvantage: + ambiguous (these paragraphs are a prime example). + + Conclusion: rejected. + +2. Special case: use unadorned RFC822_ for the very first or very last + text block of a document:: + + """ + Author: Me + Version: 1 + + The rest of the document... + """ + + Advantages: clean, precedent (RFC822-compliant). Disadvantages: + special case, flat (unnested) field lists only, still ambiguous:: + + """ + Usage: cmdname [options] arg1 arg2 ... + + We obviously *don't* want the line above to be interpreted as a + field list item. Or do we? + """ + + Conclusion: rejected for the general case, accepted for specific + contexts (PEPs, email). + +3. Use a directive:: + + .. fields:: + + Author: Me + Version: 1 + + Advantages: explicit and unambiguous, RFC822-compliant. + Disadvantage: cumbersome. 
+ + Conclusion: rejected for the general case (but such a directive + could certainly be written). + +4. Use Javadoc-style:: + + @Author: Me + @Version: 1 + @param a: integer + + Advantages: unambiguous, precedent, flexible. Disadvantages: + non-intuitive, ugly, not RFC822-compliant. + + Conclusion: rejected. + +5. Use leading colons:: + + :Author: Me + :Version: 1 + + Advantages: unambiguous, obvious (*almost* RFC822-compliant), + flexible, perhaps even elegant. Disadvantages: no precedent, not + quite RFC822-compliant. + + Conclusion: accepted! + +6. Use double colons:: + + Author:: Me + Version:: 1 + + Advantages: unambiguous, obvious? (*almost* RFC822-compliant), + flexible, similar to syntax already used for literal blocks and + directives. Disadvantages: no precedent, not quite + RFC822-compliant, similar to syntax already used for literal blocks + and directives. + + Conclusion: rejected because of the syntax similarity & conflicts. + +Why is RFC822 compliance important? It's a universal Internet +standard, and super obvious. Also, I'd like to support the PEP format +(ulterior motive: get PEPs to use reStructuredText as their standard). +But it *would* be easy to get used to an alternative (easy even to +convert PEPs; probably harder to convert python-deviants ;-). + +Unfortunately, without well-defined context (such as in email headers: +RFC822 only applies before any blank lines), the RFC822 format is +ambiguous. It is very common in ordinary text. To implement field +lists unambiguously, we need explicit syntax. + +The following question was posed in a footnote: + + Should "bibliographic field lists" be defined at the parser level, + or at the DPS transformation level? In other words, are they + reStructuredText-specific, or would they also be applicable to + another (many/every other?) syntax? + +The answer is that bibliographic fields are a +reStructuredText-specific markup convention. Other syntaxes may +implement the bibliographic elements explicitly. 
For example, there +would be no need for such a transformation for an XML-based markup +syntax. + +.. _RFC822: http://www.rfc-editor.org/rfc/rfc822.txt + + +Interpreted Text "Roles" +======================== + +The original purpose of interpreted text was as a mechanism for +descriptive markup, to describe the nature or role of a word or +phrase. For example, in XML we could say "<function>len</function>" +to mark up "len" as a function. It is envisaged that within Python +docstrings (inline documentation in Python module source files, the +primary market for reStructuredText) the role of a piece of +interpreted text can be inferred implicitly from the context of the +docstring within the program source. For other applications, however, +the role may have to be indicated explicitly. + +Interpreted text is enclosed in single backquotes (`). + +1. Initially, it was proposed that an explicit role could be indicated + as a word or phrase within the enclosing backquotes: + + - As a prefix, separated by a colon and whitespace:: + + `role: interpreted text` + + - As a suffix, separated by whitespace and a colon:: + + `interpreted text :role` + + There are problems with the initial approach: + + - There could be ambiguity with interpreted text containing colons. + For example, an index entry of "Mission: Impossible" would + require a backslash-escaped colon. + + - The explicit role is descriptive markup, not content, and will + not be visible in the processed output. Putting it inside the + backquotes doesn't feel right; the *role* isn't being quoted. + +2. Tony Ibbs suggested that the role be placed outside the + backquotes:: + + role:`prefix` or `suffix`:role + + This removes the embedded-colons ambiguity, but limits the role + identifier to be a single word (whitespace would be illegal). + Since roles are not meant to be visible after processing, the lack + of whitespace support is not important. 
+ + The suggested syntax remains ambiguous with respect to ratios and + some writing styles. For example, suppose there is a "signal" + identifier, and we write:: + + ...calculate the `signal`:noise ratio. + + "noise" looks like a role. + +3. As an improvement on #2, we can bracket the role with colons:: + + :role:`prefix` or `suffix`:role: + + This syntax is similar to that of field lists, which is fine since + both are doing similar things: describing. + + This is the syntax chosen for reStructuredText. + +4. Another alternative is two colons instead of one:: + + role::`prefix` or `suffix`::role + + But this is used for analogies ("A:B::C:D": "A is to B as C is to + D"). + + Both alternative #2 and #4 lack delimiters on both sides of the + role, making it difficult to parse (by the reader). + +5. Some kind of bracketing could be used: + + - Parentheses:: + + (role)`prefix` or `suffix`(role) + + - Braces:: + + {role}`prefix` or `suffix`{role} + + - Square brackets:: + + [role]`prefix` or `suffix`[role] + + - Angle brackets:: + + <role>`prefix` or `suffix`<role> + + (The overlap of \*ML tags with angle brackets would be too + confusing and precludes their use.) + +Syntax #3 was chosen for reStructuredText. + + +Comments +======== + +A problem with comments (actually, with all indented constructs) is +that they cannot be followed by an indented block -- a block quote -- +without swallowing it up. + +I thought that perhaps comments should be one-liners only. But would +this mean that footnotes, hyperlink targets, and directives must then +also be one-liners? Not a good solution. + +Tony Ibbs suggested a "comment" directive. I added that we could +limit a comment to a single text block, and that a "multi-block +comment" could use "comment-start" and "comment-end" directives. This +would remove the indentation incompatibility. A "comment" directive +automatically suggests "footnote" and (hyperlink) "target" directives +as well. This could go on forever! Bad choice. 
+ +Garth Kidd suggested that an "empty comment", a ".." explicit markup +start with nothing on the first line (except possibly whitespace) and +a blank line immediately following, could serve as an "unindent". An +empty comment does **not** swallow up indented blocks following it, +so block quotes are safe. "A tiny but practical wart." Accepted. + + +Anonymous Hyperlinks +==================== + +Alan Jaffray came up with this idea, along with the following syntax:: + + Search the `Python DOC-SIG mailing list archives`{}_. + + .. _: http://mail.python.org/pipermail/doc-sig/ + +The idea is sound and useful. I suggested a "double underscore" +syntax:: + + Search the `Python DOC-SIG mailing list archives`__. + + .. __: http://mail.python.org/pipermail/doc-sig/ + +But perhaps single underscores are okay? The syntax looks better, but +the hyperlink itself doesn't explicitly say "anonymous":: + + Search the `Python DOC-SIG mailing list archives`_. + + .. _: http://mail.python.org/pipermail/doc-sig/ + +Mixing anonymous and named hyperlinks becomes confusing. The order of +targets is not significant for named hyperlinks, but it is for +anonymous hyperlinks:: + + Hyperlinks: anonymous_, named_, and another anonymous_. + + .. _named: named + .. _: anonymous1 + .. _: anonymous2 + +Without the extra syntax of double underscores, determining which +hyperlink references are anonymous may be difficult. We'd have to +check which references don't have corresponding targets, and match +those up with anonymous targets. Keeping to a simple consistent +ordering (as with auto-numbered footnotes) seems simplest. + +reStructuredText will use the explicit double-underscore syntax for +anonymous hyperlinks. An alternative (see `Reworking Explicit +Markup`_ below) for the somewhat awkward ".. __:" syntax is "__":: + + An anonymous__ reference. 
+ + __ http://anonymous + + +Reworking Explicit Markup +========================= + +Alan Jaffray came up with the idea of `anonymous hyperlinks`_, added +to reStructuredText. Subsequently it was asserted that hyperlinks +(especially anonymous hyperlinks) would play an increasingly important +role in reStructuredText documents, and therefore they require a +simpler and more concise syntax. This prompted a review of the +current and proposed explicit markup syntaxes with regards to +improving usability. + +1. Original syntax:: + + .. _blah: internal hyperlink target + .. _blah: http://somewhere external hyperlink target + .. _blah: blahblah_ indirect hyperlink target + .. __: anonymous internal target + .. __: http://somewhere anonymous external target + .. __: blahblah_ anonymous indirect target + .. [blah] http://somewhere footnote + .. blah:: http://somewhere directive + .. blah: http://somewhere comment + + .. Note:: + + The comment text was intentionally made to look like a hyperlink + target. + + Origins: + + * Except for the colon (a delimiter necessary to allow for + phrase-links), hyperlink target ``.. _blah:`` comes from Setext. + * Comment syntax from Setext. + * Footnote syntax from StructuredText ("named links"). + * Directives and anonymous hyperlinks original to reStructuredText. + + Advantages: + + + Consistent explicit markup indicator: "..". + + Consistent hyperlink syntax: ".. _" & ":". + + Disadvantages: + + - Anonymous target markup is awkward: ".. __:". + - The explicit markup indicator ("..") is excessively overloaded? + - Comment text is limited (can't look like a footnote, hyperlink, + or directive). But this is probably not important. + +2. 
Alan Jaffray's proposed syntax #1:: + + __ _blah internal hyperlink target + __ blah: http://somewhere external hyperlink target + __ blah: blahblah_ indirect hyperlink target + __ anonymous internal target + __ http://somewhere anonymous external target + __ blahblah_ anonymous indirect target + __ [blah] http://somewhere footnote + .. blah:: http://somewhere directive + .. blah: http://somewhere comment + + The hyperlink-connoted underscores have become first-level syntax. + + Advantages: + + + Anonymous targets are simpler. + + All hyperlink targets are one character shorter. + + Disadvantages: + + - Inconsistent internal hyperlink targets. Unlike all other named + hyperlink targets, there's no colon. There's an extra leading + underscore, but we can't drop it because without it, "blah" looks + like a relative URI. Unless we restore the colon:: + + __ blah: internal hyperlink target + + - Obtrusive markup? + +3. Alan Jaffray's proposed syntax #2:: + + .. _blah internal hyperlink target + .. blah: http://somewhere external hyperlink target + .. blah: blahblah_ indirect hyperlink target + .. anonymous internal target + .. http://somewhere anonymous external target + .. blahblah_ anonymous indirect target + .. [blah] http://somewhere footnote + !! blah: http://somewhere directive + ## blah: http://somewhere comment + + Leading underscores have been (almost) replaced by "..", while + comments and directives have gained their own syntax. + + Advantages: + + + Anonymous hyperlinks are simpler. + + Unique syntax for comments. Connotation of "comment" from + some programming languages (including our favorite). + + Unique syntax for directives. Connotation of "action!". + + Disadvantages: + + - Inconsistent internal hyperlink targets. Again, unlike all other + named hyperlink targets, there's no colon. There's a leading + underscore, matching the trailing underscores of references, + which no other hyperlink targets have. 
We can't drop that one + leading underscore though: without it, "blah" looks like a + relative URI. Again, unless we restore the colon:: + + .. blah: internal hyperlink target + + - All (except for internal) hyperlink targets lack their leading + underscores, losing the "hyperlink" connotation. + + - Obtrusive syntax for comments. Alternatives:: + + ;; blah: http://somewhere + (also comment syntax in Lisp & others) + ,, blah: http://somewhere + ("comma comma": sounds like "comment"!) + + - Iffy syntax for directives. Alternatives? + +4. Tony Ibbs' proposed syntax:: + + .. _blah: internal hyperlink target + .. _blah: http://somewhere external hyperlink target + .. _blah: blahblah_ indirect hyperlink target + .. anonymous internal target + .. http://somewhere anonymous external target + .. blahblah_ anonymous indirect target + .. [blah] http://somewhere footnote + .. blah:: http://somewhere directive + .. blah: http://somewhere comment + + This is the same as the current syntax, except for anonymous + targets which drop their "__: ". + + Advantage: + + + Anonymous targets are simpler. + + Disadvantages: + + - Anonymous targets lack their leading underscores, losing the + "hyperlink" connotation. + - Anonymous targets are almost indistinguishable from comments. + (Better to know "up front".) + +5. David Goodger's proposed syntax: Perhaps going back to one of + Alan's earlier suggestions might be the best solution. How about + simply adding "__ " as a synonym for ".. __: " in the original + syntax? These would become equivalent:: + + .. __: anonymous internal target + .. __: http://somewhere anonymous external target + .. __: blahblah_ anonymous indirect target + + __ anonymous internal target + __ http://somewhere anonymous external target + __ blahblah_ anonymous indirect target + +Alternative 5 has been adopted. + + +Backquotes in Phrase-Links +========================== + +[From a 2001-06-05 Doc-SIG post in reply to questions from Doug +Hellmann.] 
+ +The first draft of the spec, posted to the Doc-SIG in November 2000, +used square brackets for phrase-links. I changed my mind because: + +1. In the first draft, I had already decided on single-backquotes for + inline literal text. + +2. However, I wanted to minimize the necessity for backslash escapes, + for example when quoting Python repr-equivalent syntax that uses + backquotes. + +3. The processing of identifiers (function/method/attribute/module/etc. + names) into hyperlinks is a useful feature. PyDoc recognizes + identifiers heuristically, but it doesn't take much imagination to + come up with counter-examples where PyDoc's heuristics would result + in embarrassing failure. I wanted to do it deterministically, and + that called for syntax. I called this construct 'interpreted + text'. + +4. Leveraging off the ``*emphasis*/**strong**`` syntax led to the + idea of using double-backquotes as syntax. + +5. I worked out some rules for inline markup recognition. + +6. In combination with #5, double backquotes lent themselves to inline + literals, neatly satisfying #2, minimizing backslash escapes. In + fact, the spec says that no interpretation of any kind is done + within double-backquote inline literal text; backslashes do *no* + escaping within literal text. + +7. Single backquotes are then freed up for interpreted text. + +8. I already had square brackets required for footnote references. + +9. Since interpreted text will typically turn into hyperlinks, it was + a natural fit to use backquotes as the phrase-quoting syntax for + trailing-underscore hyperlinks. + +The original inspiration for the trailing underscore hyperlink syntax +was Setext. But for phrases Setext used a very cumbersome +``underscores_between_words_like_this_`` syntax. + +The underscores can be viewed as if they were right-pointing arrows: +``-->``. So ``hyperlink_`` points away from the reference, and +``.. _hyperlink:`` points toward the target.
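The division of labour described above (double backquotes for inline literals, single backquotes for interpreted text, single backquotes plus a trailing underscore for phrase-links) can be sketched in a few lines of Python. This is only an illustration, not the parser's code: the pattern ``INLINE`` and the function ``classify_inline`` are hypothetical names, and the patterns ignore the real inline markup recognition rules, roles, and anonymous (double-underscore) references.

```python
import re

# Hypothetical, simplified patterns; the real recognition rules are stricter.
INLINE = re.compile(
    r"""
    (?P<literal>``(?P<lit_text>.+?)``)        # double-backquoted inline literal
  | (?P<phrase_ref>`(?P<ref_text>[^`]+)`_)    # backquoted phrase with trailing underscore
  | (?P<interpreted>`(?P<int_text>[^`]+)`)    # single-backquoted interpreted text
    """,
    re.VERBOSE,
)


def classify_inline(text):
    """Return (kind, content) pairs for the backquoted constructs in text."""
    found = []
    for match in INLINE.finditer(text):
        if match.group("literal"):
            # Nothing inside an inline literal is interpreted, backquotes included.
            found.append(("literal", match.group("lit_text")))
        elif match.group("phrase_ref"):
            found.append(("phrase-link", match.group("ref_text")))
        else:
            found.append(("interpreted", match.group("int_text")))
    return found
```

Trying the literal alternative first is what lets Python's own backquote syntax sit inside an inline literal without backslash escapes, which is the point of items 2 and 6 above.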
+ + +Substitution Mechanism +====================== + +Substitutions arose out of a Doc-SIG thread begun on 2001-10-28 by +Alan Jaffray, "reStructuredText inline markup". It reminded me of a +missing piece of the reStructuredText puzzle, first referred to in my +contribution to "Documentation markup & processing / PEPs" (Doc-SIG +2001-06-21). + +Substitutions allow the power and flexibility of directives to be +shared by inline text. They are a way to allow arbitrarily complex +inline objects, while keeping the details out of the flow of text. +They are the equivalent of SGML/XML's named entities. For example, an +inline image (using reference syntax alternative 4d (vertical bars) +and definition alternative 3, the alternatives chosen for inclusion in +the spec):: + + The |biohazard| symbol must be used on containers used to dispose + of medical waste. + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + +The ``|biohazard|`` substitution reference will be replaced in-line by +whatever the ``.. |biohazard|`` substitution definition generates (in +this case, an image). A substitution definition contains the +substitution text bracketed with vertical bars, followed by an +embedded inline-compatible directive, such as "image". A transform is +required to complete the substitution. + +Syntax alternatives for the reference: + +1. Use the existing interpreted text syntax, with a predefined role + such as "sub":: + + The `biohazard`:sub: symbol... + + Advantages: existing syntax, explicit. Disadvantages: verbose, + obtrusive. + +2. Use a variant of the interpreted text syntax, with a new suffix + akin to the underscore in phrase-link references:: + + (a) `name`@ + (b) `name`# + (c) `name`& + (d) `name`/ + (e) `name`< + (f) `name`:: + (g) `name`: + + + Due to incompatibility with other constructs and ordinary text + usage, (f) and (g) are not possible. + +3. 
Use interpreted text syntax with a fixed internal format:: + + (a) `:name:` + (b) `name:` + (c) `name::` + (d) `::name::` + (e) `%name%` + (f) `#name#` + (g) `/name/` + (h) `&name&` + (i) `|name|` + (j) `[name]` + (k) `<name>` + (l) `&name;` + (m) `'name'` + + To avoid ML confusion (k) and (l) are definitely out. Square + brackets (j) won't work in the target (the substitution definition + would be indistinguishable from a footnote). + + The ```/name/``` syntax (g) is reminiscent of "s/find/sub" + substitution syntax in ed-like languages. However, it may have a + misleading association with regexps, and looks like an absolute + POSIX path. (i) is visually equivalent and lacking the + connotations. + + A disadvantage of all of these is that they limit interpreted text, + albeit only slightly. + +4. Use specialized syntax, something new:: + + (a) #name# + (b) @name@ + (c) /name/ + (d) |name| + (e) <<name>> + (f) //name// + (g) ||name|| + (h) ^name^ + (i) [[name]] + (j) ~name~ + (k) !name! + (l) =name= + (m) ?name? + (n) >name< + + "#" (a) and "@" (b) are obtrusive. "/" (c) without backquotes + looks just like a POSIX path; it is likely for such usage to appear + in text. + + "|" (d) and "^" (h) are feasible. + +5. Redefine the trailing underscore syntax. See definition syntax + alternative 4, below. + +Syntax alternatives for the definition: + +1. Use the existing directive syntax, with a predefined directive such + as "sub". It contains a further embedded directive resolving to an + inline-compatible object:: + + .. sub:: biohazard + .. image:: biohazard.png + [height=20 width=20] + + .. sub:: parrot + That bird wouldn't *voom* if you put 10,000,000 volts + through it! + + The advantages and disadvantages are the same as in inline + alternative 1. + +2. Use syntax as in #1, but with an embedded directive compressed:: + + .. sub:: biohazard image:: biohazard.png + [height=20 width=20] + + This is a bit better than alternative 1, but still too much. + +3. 
Use a variant of directive syntax, incorporating the substitution + text, obviating the need for a special "sub" directive name. If we + assume reference alternative 4d (vertical bars), the matching + definition would look like this:: + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + +4. (Suggested by Alan Jaffray on Doc-SIG from 2001-11-06.) + + Instead of adding new syntax, redefine the trailing underscore + syntax to mean "substitution reference" instead of "hyperlink + reference". Alan's example:: + + I had lunch with Jonathan_ today. We talked about Zope_. + + .. _Jonathan: lj [user=jhl] + .. _Zope: http://www.zope.org/ + + A problem with the proposed syntax is that URIs which look like + simple reference names (alphanum plus ".", "-", "_") would be + indistinguishable from substitution directive names. A more + consistent syntax would be:: + + I had lunch with Jonathan_ today. We talked about Zope_. + + .. _Jonathan: lj:: user=jhl + .. _Zope: http://www.zope.org/ + + (``::`` after ``.. _Jonathan: lj``.) + + The "Zope" target is a simple external hyperlink, but the + "Jonathan" target contains a directive. Alan proposed that the + reference text be replaced by whatever the referenced directive + (the "directive target") produces. A directive reference becomes a + hyperlink reference if the contents of the directive target resolve + to a hyperlink. If the directive target resolves to an icon, the + reference is replaced by an inline icon. + + This seems too indirect and complicated for easy comprehension. + + The reference in the text will sometimes become a link, sometimes + not. Sometimes the reference text will remain, sometimes not. We + don't know *at the reference*:: + + This is a `hyperlink reference`_; its text will remain. + This is an `inline icon`_; its text will disappear. + + That's a problem.
+ +The syntax that has been incorporated into the spec and parser is +reference alternative 4d with definition alternative 3:: + + The |biohazard| symbol... + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + +We can also combine substitution references with hyperlink references, +by appending a "_" (named hyperlink reference) or "__" (anonymous +hyperlink reference) suffix to the substitution reference. This +allows us to click on an image-link:: + + The |biohazard|_ symbol... + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + .. _biohazard: http://www.cdc.gov/ + +There have been several suggestions for the naming of these +constructs, originally called "substitution references" and +"substitutions". + +1. Candidate names for the reference construct: + + (a) substitution reference + (b) tagging reference + (c) inline directive reference + (d) directive reference + (e) indirect inline directive reference + (f) inline directive placeholder + (g) inline directive insertion reference + (h) directive insertion reference + (i) insertion reference + (j) directive macro reference + (k) macro reference + (l) substitution directive reference + +2. 
Candidate names for the definition construct: + + (a) substitution + (b) substitution directive + (c) tag + (d) tagged directive + (e) directive target + (f) inline directive + (g) inline directive definition + (h) referenced directive + (i) indirect directive + (j) indirect directive definition + (k) directive definition + (l) indirect inline directive + (m) named directive definition + (n) inline directive insertion definition + (o) directive insertion definition + (p) insertion definition + (q) insertion directive + (r) substitution definition + (s) directive macro definition + (t) macro definition + (u) substitution directive definition + (v) substitution definition + +"Inline directive reference" (1c) seems to be an appropriate term at +first, but the term "inline" is redundant in the case of the +reference. Its counterpart "inline directive definition" (2g) is +awkward, because the directive definition itself is not inline. + +"Directive reference" (1d) and "directive definition" (2k) are too +vague. "Directive definition" could be used to refer to any +directive, not just those used for inline substitutions. + +One meaning of the term "macro" (1k, 2s, 2t) is too +programming-language-specific. Also, macros are typically simple text +substitution mechanisms: the text is substituted first and evaluated +later. reStructuredText substitution definitions are evaluated in +place at parse time and substituted afterwards. + +"Insertion" (1h, 1i, 2n-2q) is almost right, but it implies that +something new is getting added rather than one construct being +replaced by another. + +Which brings us back to "substitution". The overall best names are +"substitution reference" (1a) and "substitution definition" (2v). A +long way to go to add one word! 
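The adopted pairing (reference alternative 4d with definition alternative 3) amounts to a lookup: collect the substitution definitions, then replace each vertical-bar reference with whatever its definition produces. The Python below is a rough sketch under stated assumptions, not the parser or the transform: ``collect_definitions``, ``substitute``, and ``render_placeholder`` are hypothetical names, definitions are assumed to fit on one line, and the bracketed attribute line from the examples is ignored.

```python
import re

# Hypothetical helpers for illustration; not the Docutils parser or transform.
DEFINITION = re.compile(
    r"^\.\. \|(?P<name>[^|]+)\| (?P<directive>\w+):: (?P<data>.*)$"
)
REFERENCE = re.compile(r"\|(?P<name>[^|\s][^|]*)\|")


def collect_definitions(lines):
    """Map each substitution name to its (directive, data) pair."""
    definitions = {}
    for line in lines:
        match = DEFINITION.match(line)
        if match:
            definitions[match.group("name")] = (
                match.group("directive"),
                match.group("data"),
            )
    return definitions


def substitute(text, definitions, render):
    """Replace each |name| reference with what its definition renders to."""
    def replace(match):
        name = match.group("name")
        if name not in definitions:
            return match.group(0)  # leave unknown references untouched
        directive, data = definitions[name]
        return render(directive, data)
    return REFERENCE.sub(replace, text)


def render_placeholder(directive, data):
    """Stand-in for running the embedded directive (an assumption of this sketch)."""
    return "<%s %s>" % (directive, data)
```

The definition is evaluated on its own and the result is dropped in where the reference was, which is why "substitution" fits better than "macro" or "insertion".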
+ + +Reworking Footnotes +=================== + +As a further wrinkle (see `Reworking Explicit Markup`_ above), in the +wee hours of 2002-02-28 I posted several ideas for changes to footnote +syntax: + + - Change footnote syntax from ``.. [1]`` to ``_[1]``? ... + - Differentiate (with new DTD elements) author-date "citations" + (``[GVR2002]``) from numbered footnotes? ... + - Render footnote references as superscripts without "[]"? ... + +These ideas are all related, and suggest changes in the +reStructuredText syntax as well as the docutils tree model. + +The footnote has been used for both true footnotes (asides expanding +on points or defining terms) and for citations (references to external +works). Rather than dealing with one amalgam construct, we could +separate the current footnote concept into strict footnotes and +citations. Citations could be interpreted and treated differently +from footnotes. Footnotes would be limited to numerical labels: +manual ("1") and auto-numbered (anonymous "#", named "#label"). + +The footnote is the only explicit markup construct (starts with ".. ") +that directly translates to a visible body element. I've always been +a little bit uncomfortable with the ".. " marker for footnotes because +of this; ".. " has a connotation of "special", but footnotes aren't +especially "special". Printed texts often put footnotes at the bottom +of the page where the reference occurs (thus "foot note"). Some HTML +designs would leave footnotes to be rendered the same positions where +they're defined. Other online and printed designs will gather +footnotes into a section near the end of the document, converting them +to "endnotes" (perhaps using a directive in our case); but this +"special processing" is not an intrinsic property of the footnote +itself, but a decision made by the document author or processing +system. + +Citations are almost invariably collected in a section at the end of a +document or section. 
Citations "disappear" from where they are +defined and are magically reinserted at some well-defined point. +There's more of a connection to the "special" connotation of the ".. " +syntax. The point at which the list of citations is inserted could be +defined manually by a directive (e.g., ".. citations::"), and/or have +default behavior (e.g., a section automatically inserted at the end of +the document) that might be influenced by options to the Writer. + +Syntax proposals: + ++ Footnotes: + + - Current syntax:: + + .. [1] Footnote 1 + .. [#] Auto-numbered footnote. + .. [#label] Auto-labeled footnote. + + - The syntax proposed in the original 2002-02-28 Doc-SIG post: + remove the ".. ", prefix a "_":: + + _[1] Footnote 1 + _[#] Auto-numbered footnote. + _[#label] Auto-labeled footnote. + + The leading underscore syntax (earlier dropped because + ``.. _[1]:`` was too verbose) is a useful reminder that footnotes + are hyperlink targets. + + - Minimal syntax: remove the ".. [" and "]", prefix a "_", and + suffix a ".":: + + _1. Footnote 1. + _#. Auto-numbered footnote. + _#label. Auto-labeled footnote. + + ``_1.``, ``_#.``, and ``_#label.`` are markers, + like list markers. + + Footnotes could be rendered something like this in HTML + + | 1. This is a footnote. The brackets could be dropped + | from the label, and a vertical bar could set them + | off from the rest of the document in the HTML. + + Two-way hyperlinks on the footnote marker ("1." above) would also + help to differentiate footnotes from enumerated lists. + + If converted to endnotes (by a directive/transform), a horizontal + half-line might be used instead. Page-oriented output formats + would typically use the horizontal line for true footnotes. 
+ ++ Footnote references: + + - Current syntax:: + + [1]_, [#]_, [#label]_ + + - Minimal syntax to match the minimal footnote syntax above:: + + 1_, #_, #label_ + + As a consequence, pure-numeric hyperlink references would not be + possible; they'd be interpreted as footnote references. + ++ Citation references: no change is proposed from the current footnote + reference syntax:: + + [GVR2001]_ + ++ Citations: + + - Current syntax (footnote syntax):: + + .. [GVR2001] Python Documentation; van Rossum, Drake, et al.; + http://www.python.org/doc/ + + - Possible new syntax:: + + _[GVR2001] Python Documentation; van Rossum, Drake, et al.; + http://www.python.org/doc/ + + _[DJG2002] + Docutils: Python Documentation Utilities project; Goodger + et al.; http://docutils.sourceforge.net/ + + Without the ".. " marker, subsequent lines would either have to + align as in one of the above, or we'd have to allow loose + alignment (I'd rather not):: + + _[GVR2001] Python Documentation; van Rossum, Drake, et al.; + http://www.python.org/doc/ + +I proposed adopting the "minimal" syntax for footnotes and footnote +references, and adding citations and citation references to +reStructuredText's repertoire. The current footnote syntax for +citations is better than the alternatives given. + +From a reply by Tony Ibbs on 2002-03-01: + + However, I think easier with examples, so let's create one:: + + Fans of Terry Pratchett are perhaps more likely to use + footnotes [1]_ in their own writings than other people + [2]_. Of course, in *general*, one only sees footnotes + in academic or technical writing - it's use in fiction + and letter writing is not normally considered good + style [4]_, particularly in emails (not a medium that + lends itself to footnotes). + + .. [1] That is, little bits of referenced text at the + bottom of the page. + .. [2] Because Terry himself does, of course [3]_. + .. 
[3] Although he has the distinction of being + *funny* when he does it, and his fans don't always + achieve that aim. + .. [4] Presumably because it detracts from linear + reading of the text - this is, of course, the point. + + and look at it with the second syntax proposal:: + + Fans of Terry Pratchett are perhaps more likely to use + footnotes [1]_ in their own writings than other people + [2]_. Of course, in *general*, one only sees footnotes + in academic or technical writing - it's use in fiction + and letter writing is not normally considered good + style [4]_, particularly in emails (not a medium that + lends itself to footnotes). + + _[1] That is, little bits of referenced text at the + bottom of the page. + _[2] Because Terry himself does, of course [3]_. + _[3] Although he has the distinction of being + *funny* when he does it, and his fans don't always + achieve that aim. + _[4] Presumably because it detracts from linear + reading of the text - this is, of course, the point. + + (I note here that if I have gotten the indentation of the + footnotes themselves correct, this is clearly not as nice. And if + the indentation should be to the left margin instead, I like that + even less). + + and the third (new) proposal:: + + Fans of Terry Pratchett are perhaps more likely to use + footnotes 1_ in their own writings than other people + 2_. Of course, in *general*, one only sees footnotes + in academic or technical writing - it's use in fiction + and letter writing is not normally considered good + style 4_, particularly in emails (not a medium that + lends itself to footnotes). + + _1. That is, little bits of referenced text at the + bottom of the page. + _2. Because Terry himself does, of course 3_. + _3. Although he has the distinction of being + *funny* when he does it, and his fans don't always + achieve that aim. + _4. Presumably because it detracts from linear + reading of the text - this is, of course, the point. 
+ + I think I don't, in practice, mind the targets too much (the use + of a dot after the number helps a lot here), but I do have a + problem with the body text, in that I don't naturally separate out + the footnotes as different than the rest of the text - instead I + keep wondering why there are numbers interspered in the text. The + use of brackets around the numbers ([ and ]) made me somehow parse + the footnote references as "odd" - i.e., not part of the body text + - and thus both easier to skip, and also (paradoxically) easier to + pick out so that I could follow them. + + Thus, for the moment (and as always susceptable to argument), I'd + say -1 on the new form of footnote reference (i.e., I much prefer + the existing ``[1]_`` over the proposed ``1_``), and ambivalent + over the proposed target change. + + That leaves David's problem of wanting to distinguish footnotes + and citations - and the only thing I can propose there is that + footnotes are numeric or # and citations are not (which, as a + human being, I can probably cope with!). + +From a reply by Paul Moore on 2002-03-01: + + I think the current footnote syntax ``[1]_`` is *exactly* the + right balance of distinctness vs unobtrusiveness. I very + definitely don't think this should change. + + On the target change, it doesn't matter much to me. + +From a further reply by Tony Ibbs on 2002-03-01, referring to the +"[1]" form and actual usage in email: + + Clearly this is a form people are used to, and thus we should + consider it strongly (in the same way that the usage of ``*..*`` + to mean emphasis was taken partly from email practise). + + Equally clearly, there is something "magical" for people in the + use of a similar form (i.e., ``[1]``) for both footnote reference + and footnote target - it seems natural to keep them similar. + + ... + + I think that this established plaintext usage leads me to strongly + believe we should retain square brackets at both ends of a + footnote. 
The markup of the reference end (a single trailing + underscore) seems about as minimal as we can get away with. The + markup of the target end depends on how one envisages the thing - + if ".." means "I am a target" (as I tend to see it), then that's + good, but one can also argue that the "_[1]" syntax has a neat + symmetry with the footnote reference itself, if one wishes (in + which case ".." presumably means "hidden/special" as David seems + to think, which is why one needs a ".." *and* a leading underline + for hyperlink targets. + +Given the persuading arguments voiced, we'll leave footnote & footnote +reference syntax alone. Except that these discussions gave rise to +the "auto-symbol footnote" concept, which has been added. Citations +and citation references have also been added. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/dev/rst/problems.txt b/docs/dev/rst/problems.txt new file mode 100644 index 000000000..f366bdf3f --- /dev/null +++ b/docs/dev/rst/problems.txt @@ -0,0 +1,761 @@ +============================== + Problems With StructuredText +============================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ + +There are several problems, unresolved issues, and areas of +controversy within StructuredText_ (Classic and Next Generation). In +order to resolve all these issues, this analysis brings all of the +issues out into the open, enumerates all the alternatives, and +proposes solutions to be incorporated into the reStructuredText_ +specification. + + +.. contents:: + + +Formal Specification +==================== + +The description in the original StructuredText.py has been criticized +for being vague. For practical purposes, "the code *is* the spec." +Tony Ibbs has been working on deducing a `detailed description`_ from +the documentation and code of StructuredTextNG_. 
Edward Loper's +STMinus_ is another attempt to formalize a spec. + +For this kind of a project, the specification should always precede +the code. Otherwise, the markup is a moving target which can never be +adopted as a standard. Of course, a specification may be revised +during the lifetime of the code, but without a spec there is no visible +control and thus no confidence. + + +Understanding and Extending the Code +==================================== + +The original StructuredText_ is a dense mass of sparsely commented +code and inscrutable regular expressions. It was not designed to be +extended and is very difficult to understand. StructuredTextNG_ has +been designed to allow input (syntax) and output extensions, but its +documentation (both internal [comments & docstrings], and external) is +inadequate for the complexity of the code itself. + +For reStructuredText to become truly useful, perhaps even part of +Python's standard library, it must have clear, understandable +documentation and implementation code. For the implementation of +reStructuredText to be taken seriously, it must be a sterling example +of the potential of docstrings; the implementation must practice what +the specification preaches. + + +Section Structure via Indentation +================================= + +Setext_ required that body text be indented by 2 spaces. The original +StructuredText_ and StructuredTextNG_ require that section structure +be indicated through indentation, as "inspired by Python". For +certain structures with a very limited, local extent (such as lists, +block quotes, and literal blocks), indentation naturally indicates +structure or hierarchy. For sections (which may have a very large +extent), structure via indentation is unnecessary, unnatural and +ambiguous. Rather, the syntax of the section title *itself* should +indicate that it is a section title.
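One way for a title's own syntax to mark it as a title is an underline of repeated punctuation, as the headings in this document use. A minimal Python sketch of such a check follows. ``is_section_title`` is a hypothetical helper, and the requirement that the underline be at least as long as the title is an assumption of the sketch, not something stated here.

```python
import string


def is_section_title(title_line, underline):
    """True if underline is a run of one punctuation character covering the title.

    Assumption for this sketch: the underline must be at least as long
    as the title text.
    """
    title = title_line.rstrip()
    adornment = underline.rstrip()
    if not title or not adornment:
        return False
    char = adornment[0]
    return (
        char in string.punctuation
        and adornment == char * len(adornment)
        and len(adornment) >= len(title)
    )
```

No indentation is consulted at all, so a one-line paragraph followed by a block quote cannot be mistaken for a header.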
+ +The original StructuredText states that "A single-line paragraph whose +immediately succeeding paragraphs are lower level is treated as a +header." Requiring indentation in this way is: + +- Unnecessary. The vast majority of docstrings and standalone + documents will have no more than one level of section structure. + Requiring indentation for such docstrings is unnecessary and + irritating. + +- Unnatural. Most published works use title style (type size, face, + weight, and position) and/or section/subsection numbering rather + than indentation to indicate hierarchy. This is a tradition with a + very long history. + +- Ambiguous. A StructuredText header is indistinguishable from a + one-line paragraph followed by a block quote (precluding the use of + block quotes). Enumerated section titles are ambiguous (is it a + header? is it a list item?). Some additional adornment must be + required to confirm the line's role as a title, both to a parser and + to the human reader of the source text. + +Python's use of significant whitespace is a wonderful (if not +original) innovation; however, requiring indentation in ordinary +written text is hypergeneralization. + +reStructuredText_ indicates section structure through title adornment +style (as exemplified by this document). This is far more natural. +In fact, it is already in widespread use in plain text documents, +including in Python's standard distribution (such as the toplevel +README_ file). + + +Character Escaping Mechanism +============================ + +No matter what characters are chosen for markup, some day someone will +want to write documentation *about* that markup or using markup +characters in a non-markup context. Therefore, any complete markup +language must have an escaping or encoding mechanism. For a +lightweight markup system, encoding mechanisms like SGML/XML's '&#42;' +are out. So an escaping mechanism is in.
However, with carefully +chosen markup, it should be necessary to use the escaping mechanism +only infrequently. + +reStructuredText_ needs an escaping mechanism: a way to treat +markup-significant characters as the characters themselves. Currently +there is no such mechanism (although ZWiki uses '!'). What are the +candidates? + +1. ``!`` (http://dev.zope.org/Members/jim/StructuredTextWiki/NGEscaping) +2. ``\`` +3. ``~`` +4. doubling of characters + +The best choice for this is the backslash (``\``). It's "the single +most popular escaping character in the world!", therefore familiar and +unsurprising. Since characters only need to be escaped under special +circumstances, which are typically those explaining technical +programming issues, the use of the backslash is natural and +understandable. Python docstrings can be raw (prefixed with an 'r', +as in 'r""'), which would obviate the need for gratuitous doubling-up +of backslashes. + +(On 2001-03-29 on the Doc-SIG mailing list, GvR endorsed backslash +escapes, saying, "'nuff said. Backslash it is." Although neither +legally binding nor irrevocable nor any kind of guarantee of anything, +it is a good sign.) + +The rule would be: An unescaped backslash followed by any markup +character escapes the character. The escaped character represents the +character itself, and is prevented from playing a role in any markup +interpretation. The backslash is removed from the output. A literal +backslash is represented by an "escaped backslash," two backslashes in +a row. + +A carefully constructed set of recognition rules for inline markup +will obviate the need for backslash-escapes in almost all cases; see +`Delimitation of Inline Markup`_ below. + +When an expression (requiring backslashes and other characters used +for markup) becomes too complicated and therefore unreadable, a +literal block may be used instead. 
Inside literal blocks, no markup +is recognized, therefore backslashes (for the purpose of escaping +markup) become unnecessary. + +We could allow backslashes preceding non-markup characters to remain +in the output. This would make describing regular expressions and +other uses of backslashes easier. However, this would complicate the +markup rules and would be confusing. + + +Blank Lines in Lists +==================== + +Oft-requested in Doc-SIG (the earliest reference is dated 1996-08-13) +is the ability to write lists without requiring blank lines between +items. In docstrings, space is at a premium. Authors want to convey +their API or usage information in as compact a form as possible. +StructuredText_ requires blank lines between all body elements, +including list items, even when boundaries are obvious from the markup +itself. + +In reStructuredText, blank lines are optional between list items. +However, in order to eliminate ambiguity, a blank line is required +before the first list item and after the last. Nested lists also +require blank lines before the list start and after the list end. + + +Bullet List Markup +================== + +StructuredText_ includes 'o' as a bullet character. This is dangerous +and counter to the language-independent nature of the markup. There +are many languages in which 'o' is a word. For example, in Spanish:: + + Llamame a la casa + o al trabajo. + + (Call me at home or at work.) + +And in Japanese (when romanized):: + + Senshuu no doyoubi ni tegami + o kakimashita. + + ([I] wrote a letter on Saturday last week.) + +If a paragraph containing an 'o' word wraps such that the 'o' is the +first text on a line, or if a paragraph begins with such a word, it +could be misinterpreted as a bullet list. + +In reStructuredText_, 'o' is not used as a bullet character. '-', +'*', and '+' are the possible bullet characters. 
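The hazard described in the Bullet List Markup section above can be illustrated with a short Python sketch. Both regular expressions and the helper name ``looks_like_bullet`` are assumptions made for this illustration; neither pattern is taken from the StructuredText or reStructuredText parser source.

```python
import re

# Assumed patterns, for illustration only (not copied from any parser).
ST_BULLET = re.compile(r'^\s*[-*o]\s+')    # StructuredText-style: 'o' is a bullet
RST_BULLET = re.compile(r'^\s*[-*+]\s+')   # reStructuredText: '-', '*', '+' only


def looks_like_bullet(line, pattern):
    """Return True if `line` would start a bullet list item under `pattern`."""
    return pattern.match(line) is not None


# Second line of the wrapped Spanish sentence quoted above.
wrapped = 'o al trabajo.'
print(looks_like_bullet(wrapped, ST_BULLET))
print(looks_like_bullet(wrapped, RST_BULLET))
```

With 'o' accepted as a bullet, the wrapped line is a candidate list item; with only '-', '*', and '+' accepted, it remains ordinary paragraph text.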
+ + +Enumerated List Markup +====================== + +StructuredText enumerated lists are allowed to begin with numbers and +letters followed by a period or right-parenthesis, then whitespace. +This has surprising consequences for writing styles. For example, +this is recognized as an enumerated list item by StructuredText:: + + Mr. Creosote. + +People will write enumerated lists in all different ways. It is folly +to try to come up with the "perfect" format for an enumerated list, +and limit the docstring parser's recognition to that one format only. + +Rather, the parser should recognize a variety of enumerator styles, +marking each block as a potential enumerated list item (PELI), and +interpret the enumerators of adjacent PELIs to decide whether they +make up a consistent enumerated list. + +If a PELI is labeled with a "1.", and is immediately followed by a +PELI labeled with a "2.", we've got an enumerated list. Or "(A)" +followed by "(B)". Or "i)" followed by "ii)", etc. The chances of +accidentally recognizing two adjacent and consistently labeled PELIs, +are acceptably small. + +For an enumerated list to be recognized, the following must be true: + +- the list must consist of multiple adjacent list items (2 or more) +- the enumerators must all have the same format +- the enumerators must be sequential + +It is also recommended that the enumerator of the first list item be +ordinal-1 ('1', 'A', 'a', 'I', or 'i'), as output formats may not be +able to begin a list at an arbitrary enumeration. + + +Definition List Markup +====================== + +StructuredText uses ' -- ' (whitespace, two hyphens, whitespace) on +the first line of a paragraph to indicate a definition list item. The +' -- ' serves to separate the term (on the left) from the definition +(on the right). + +Many people use ' -- ' as an em-dash in their text, conflicting with +the StructuredText usage. 
Although the Chicago Manual of Style says +that spaces should not be used around an em-dash, Peter Funk pointed +out that this is standard usage in German (according to the Duden, the +official German reference), and possibly in other languages as well. +The widespread use of ' -- ' precludes its use for definition lists; +it would violate the "unsurprising" criterion. + +A simpler, and at least equally visually distinctive construct +(proposed by Guido van Rossum, who incidentally is a frequent user of +' -- ') would do just as well:: + + term 1 + Definition. + + term 2 + Definition 2, paragraph 1. + + Definition 2, paragraph 2. + +A reStructuredText definition list item consists of a term and a +definition. A term is a simple one-line paragraph. A definition is a +block indented relative to the term, and may contain multiple +paragraphs and other body elements. No blank line precedes a +definition (this distinguishes definition lists from block quotes). + + +Literal Blocks +============== +The StructuredText_ specification has literal blocks indicated by +'example', 'examples', or '::' ending the preceding paragraph. STNG +only recognizes '::'; 'example'/'examples' are not implemented. This +is good; it fixes an unnecessary language dependency. The problem is +what to do with the sometimes- unwanted '::'. + +In reStructuredText_ '::' at the end of a paragraph indicates that +subsequent *indented* blocks are treated as literal text. No further +markup interpretation is done within literal blocks (not even +backslash-escapes). If the '::' is preceded by whitespace, '::' is +omitted from the output; if '::' was the sole content of a paragraph, +the entire paragraph is removed (no 'empty' paragraph remains). If +'::' is preceded by a non-whitespace character, '::' is replaced by +':' (i.e., the extra colon is removed). 
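The three '::' cases described above can be sketched in Python. This is a minimal illustration, not the Docutils implementation: the function name ``strip_literal_marker`` is hypothetical, and returning ``None`` to mean "the paragraph is removed" is a convention chosen for this example.

```python
def strip_literal_marker(paragraph):
    """Apply the '::' rule to the text of a paragraph.

    Hypothetical helper for illustration; not part of Docutils.
    Returns the paragraph text as it would appear in the output, or
    None if the whole paragraph should be removed.
    """
    text = paragraph.rstrip()
    if not text.endswith('::'):
        return paragraph           # no literal block marker: unchanged
    head = text[:-2]
    if not head.strip():
        return None                # '::' was the sole content
    if head[-1].isspace():
        return head.rstrip()       # '::' preceded by whitespace: omitted
    return head + ':'              # otherwise '::' becomes ':'


for sample in ('Example::', 'Example ::', '::'):
    print(repr(sample), '->', repr(strip_literal_marker(sample)))
```

In every case where the marker is present, the indented blocks that follow would then be treated as literal text.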
+ +Thus, a section could begin with a literal block as follows:: + + Section Title + ------------- + + :: + + print "this is example literal" + + +Tables +====== + +The table markup scheme in classic StructuredText was horrible. Its +omission from StructuredTextNG is welcome, and its markup will not be +repeated here. However, tables themselves are useful in +documentation. Alternatives: + +1. This format is the most natural and obvious. It was independently + invented (no great feat of creation!), and later found to be the + format supported by the `Emacs table mode`_:: + + +------------+------------+------------+--------------+ + | Header 1 | Header 2 | Header 3 | Header 4 | + +============+============+============+==============+ + | Column 1 | Column 2 | Column 3 & 4 span (Row 1) | + +------------+------------+------------+--------------+ + | Column 1 & 2 span | Column 3 | - Column 4 | + +------------+------------+------------+ - Row 2 & 3 | + | 1 | 2 | 3 | - span | + +------------+------------+------------+--------------+ + + Tables are described with a visual outline made up of the + characters '-', '=', '|', and '+': + + - The hyphen ('-') is used for horizontal lines (row separators). + - The equals sign ('=') is optionally used as a header separator + (as of version 1.5.24, this is not supported by the Emacs table + mode). + - The vertical bar ('|') is used for vertical lines (column + separators). + - The plus sign ('+') is used for intersections of horizontal and + vertical lines. + + Row and column spans are possible simply by omitting the column or + row separators, respectively. The header row separator must be + complete; in other words, a header cell may not span into the table + body. Each cell contains body elements, and may have multiple + paragraphs, lists, etc. Initial spaces for a left margin are + allowed; the first line of text in a cell determines its left + margin. + +2. Below is a minimalist possibility. 
It may be better suited to + manual input than alternative #1, but there is no Emacs editing + mode available. One disadvantage is that it resembles section + titles; a one-column table would look exactly like section & + subsection titles. :: + + ============ ============ ============ ============== + Header 1 Header 2 Header 3 Header 4 + ============ ============ ============ ============== + Column 1 Column 2 Column 3 & 4 span (Row 1) + ------------ ------------ --------------------------- + Column 1 & 2 span Column 3 - Column 4 + ------------------------- ------------ - Row 2 & 3 + 1 2 3 - span + ============ ============ ============ ============== + + The table begins with a top border of equals signs with a space at + each column boundary (regardless of spans). Each row is + underlined. Internal row separators are underlines of '-', with + spaces at column boundaries. The last of the optional head rows is + underlined with '=', again with spaces at column boundaries. + Column spans have no spaces in their underline. Row spans simply + lack an underline at the row boundary. The bottom boundary of the + table consists of '=' underlines. A blank line is required + following a table. + +Alternative #1 is the choice adopted by reStructuredText. + + +Delimitation of Inline Markup +============================= + +StructuredText specifies that inline markup must begin with +whitespace, precluding such constructs as parenthesized or quoted +emphatic text:: + + "**What?**" she cried. (*exit stage left*) + +The `reStructuredText markup specification`_ allows for such +constructs and disambiguates inline markup through a set of +recognition rules. These recognition rules define the context of +markup start-strings and end-strings, allowing markup characters to be +used in most non-markup contexts without a problem (or a backslash). +So we can say, "Use asterisks (*) around words or phrases to +*emphasize* them." The '(*)' will not be recognized as markup. 
This +reduces the need for markup escaping to the point where an escape +character is *almost* (but not quite!) unnecessary. + + +Underlining +=========== + +StructuredText uses '_text_' to indicate underlining. To quote David +Ascher in his 2000-01-21 Doc-SIG mailing list post, "Docstring +grammar: a very revised proposal": + + The tagging of underlined text with _'s is suboptimal. Underlines + shouldn't be used from a typographic perspective (underlines were + designed to be used in manuscripts to communicate to the + typesetter that the text should be italicized -- no well-typeset + book ever uses underlines), and conflict with double-underscored + Python variable names (__init__ and the like), which would get + truncated and underlined when that effect is not desired. Note + that while *complete* markup would prevent that truncation + ('__init__'), I think of docstring markups much like I think of + type annotations -- they should be optional and above all do no + harm. In this case the underline markup does harm. + +Underlining is not part of the reStructuredText specification. + + +Inline Literals +=============== + +StructuredText's markup for inline literals (text left as-is, +verbatim, usually in a monospaced font; as in HTML <TT>) is single +quotes ('literals'). The problem with single quotes is that they are +too often used for other purposes: + +- Apostrophes: "Don't blame me, 'cause it ain't mine, it's Chris'."; + +- Quoting text: + + First Bruce: "Well Bruce, I heard the prime minister use it. + 'S'hot enough to boil a monkey's bum in 'ere your Majesty,' he + said, and she smiled quietly to herself." + + In the UK, single quotes are used for dialogue in published works. + +- String literals: s = '' + +Alternatives:: + + 'text' \'text\' ''text'' "text" \"text\" ""text"" + #text# @text@ `text` ^text^ ``text'' ``text`` + +The examples below contain inline literals, quoted text, and +apostrophes. 
Each example should evaluate to the following HTML:: + + Some <TT>code</TT>, with a 'quote', "double", ain't it grand? + Does <TT>a[b] = 'c' + "d" + `2^3`</TT> work? + + 0. Some code, with a quote, double, ain't it grand? + Does a[b] = 'c' + "d" + `2^3` work? + 1. Some 'code', with a \'quote\', "double", ain\'t it grand? + Does 'a[b] = \'c\' + "d" + `2^3`' work? + 2. Some \'code\', with a 'quote', "double", ain't it grand? + Does \'a[b] = 'c' + "d" + `2^3`\' work? + 3. Some ''code'', with a 'quote', "double", ain't it grand? + Does ''a[b] = 'c' + "d" + `2^3`'' work? + 4. Some "code", with a 'quote', \"double\", ain't it grand? + Does "a[b] = 'c' + "d" + `2^3`" work? + 5. Some \"code\", with a 'quote', "double", ain't it grand? + Does \"a[b] = 'c' + "d" + `2^3`\" work? + 6. Some ""code"", with a 'quote', "double", ain't it grand? + Does ""a[b] = 'c' + "d" + `2^3`"" work? + 7. Some #code#, with a 'quote', "double", ain't it grand? + Does #a[b] = 'c' + "d" + `2^3`# work? + 8. Some @code@, with a 'quote', "double", ain't it grand? + Does @a[b] = 'c' + "d" + `2^3`@ work? + 9. Some `code`, with a 'quote', "double", ain't it grand? + Does `a[b] = 'c' + "d" + \`2^3\`` work? + 10. Some ^code^, with a 'quote', "double", ain't it grand? + Does ^a[b] = 'c' + "d" + `2\^3`^ work? + 11. Some ``code'', with a 'quote', "double", ain't it grand? + Does ``a[b] = 'c' + "d" + `2^3`'' work? + 12. Some ``code``, with a 'quote', "double", ain't it grand? + Does ``a[b] = 'c' + "d" + `2^3\``` work? + +Backquotes (#9 & #12) are the best choice. They are unobtrusive and +relatively rarely used (more rarely than ' or ", anyhow). Backquotes +have the connotation of 'quotes', which other options (like carets, +#10) don't. + +Analogously with ``*emph*`` & ``**strong**``, double-backquotes (#12) +could be used for inline literals. 
If single-backquotes are used for +'interpreted text' (context-sensitive domain-specific descriptive +markup) such as function name hyperlinks in Python docstrings, then +double-backquotes could be used for absolute-literals, wherein no +processing whatsoever takes place. An advantage of double-backquotes +would be that backslash-escaping would no longer be necessary for +embedded single-backquotes; however, embedded double-backquotes (in an +end-string context) would be illegal. See `Backquotes in +Phrase-Links`__ in `Record of reStructuredText Syntax Alternatives`__. + +__ alternatives.html#backquotes-in-phrase-links +__ alternatives.html + +Alternative choices are carets (#10) and TeX-style quotes (#11). For +examples of TeX-style quoting, see +http://www.zope.org/Members/jim/StructuredTextWiki/CustomizingTheDocumentProcessor. + +Some existing uses of backquotes: + +1. As a synonym for repr() in Python. +2. For command-interpolation in shell scripts. +3. Used as open-quotes in TeX code (and carried over into plaintext + by TeXies). + +The inline markup start-string and end-string recognition rules +defined by the `reStructuredText markup specification`_ would allow +all of these cases inside inline literals, with very few exceptions. +As a fallback, literal blocks could handle all cases. + +Outside of inline literals, the above uses of backquotes would require +backslash-escaping. However, these are all prime examples of text +that should be marked up with inline literals. + +If either backquotes or straight single-quotes are used as markup, +TeX-quotes are too troublesome to support, so no special-casing of +TeX-quotes should be done (at least at first). If TeX-quotes have to +be used outside of literals, a single backslash-escaped would suffice: +\``TeX quote''. Ugly, true, but very infrequently used. 
+ +Using literal blocks is a fallback option which removes the need for +backslash-escaping:: + + like this:: + + Here, we can do ``absolutely'' anything `'`'\|/|\ we like! + +No mechanism for inline literals is perfect, just as no escaping +mechanism is perfect. No matter what we use, complicated inline +expressions involving the inline literal quote and/or the backslash +will end up looking ugly. We can only choose the least often ugly +option. + +reStructuredText will use double backquotes for inline literals, and +single backquotes for interpreted text. + + +Hyperlinks +========== + +There are three forms of hyperlink currently in StructuredText_: + +1. (Absolute & relative URIs.) Text enclosed by double quotes + followed by a colon, a URI, and concluded by punctuation plus white + space, or just white space, is treated as a hyperlink:: + + "Python":http://www.python.org/ + +2. (Absolute URIs only.) Text enclosed by double quotes followed by a + comma, one or more spaces, an absolute URI and concluded by + punctuation plus white space, or just white space, is treated as a + hyperlink:: + + "mail me", mailto:me@mail.com + +3. (Endnotes.) Text enclosed by brackets links to an endnote at the + end of the document: at the beginning of the line, two dots, a + space, and the same text in brackets, followed by the end note + itself:: + + Please refer to the fine manual [GVR2001]. + + .. [GVR2001] Python Documentation, Release 2.1, van Rossum, + Drake, et al., http://www.python.org/doc/ + +The problem with forms 1 and 2 is that they are neither intuitive nor +unobtrusive (they break design goals 5 & 2). They overload +double-quotes, which are too often used in ordinary text (potentially +breaking design goal 4). The brackets in form 3 are also too common +in ordinary text (such as [nested] asides and Python lists like [12]). + +Alternatives: + +1. Have no special markup for hyperlinks. + +2. A. 
Interpret and mark up hyperlinks as any contiguous text + containing '://' or ':...@' (absolute URI) or '@' (email + address) after an alphanumeric word. To de-emphasize the URI, + simply enclose it in parentheses: + + Python (http://www.python.org/) + + B. Leave special hyperlink markup as a domain-specific extension. + Hyperlinks in ordinary reStructuredText documents would be + required to be standalone (i.e. the URI text inline in the + document text). Processed hyperlinks (where the URI text is + hidden behind the link) are important enough to warrant syntax. + +3. The original Setext_ introduced a mechanism of indirect hyperlinks. + A source link word ('hot word') in the text was given a trailing + underscore:: + + Here is some text with a hyperlink_ built in. + + The hyperlink itself appeared at the end of the document on a line + by itself, beginning with two dots, a space, the link word with a + leading underscore, whitespace, and the URI itself:: + + .. _hyperlink http://www.123.xyz + + Setext used ``underscores_instead_of_spaces_`` for phrase links. + +With some modification, alternative 3 best satisfies the design goals. +It has the advantage of being readable and relatively unobtrusive. +Since each source link must match up to a target, the odd variable +ending in an underscore can be spared being marked up (although it +should generate a "no such link target" warning). The only +disadvantage is that phrase-links aren't possible without some +obtrusive syntax. + +We could achieve phrase-links if we enclose the link text: + +1. in double quotes:: + + "like this"_ + +2. in brackets:: + + [like this]_ + +3. or in backquotes:: + + `like this`_ + +Each gives us somewhat obtrusive markup, but that is unavoidable. The +bracketed syntax (#2) is reminiscent of links on many web pages +(intuitive), although it is somewhat obtrusive. 
Alternative #3 is +much less obtrusive, and is consistent with interpreted text: the +trailing underscore indicates the interpretation of the phrase, as a +hyperlink. #3 also disambiguates hyperlinks from footnote references. +Alternative #3 wins. + +The same trailing underscore markup can also be used for footnote and +citation references, removing the problem with ordinary bracketed text +and Python lists:: + + Please refer to the fine manual [GVR2000]_. + + .. [GVR2000] Python Documentation, van Rossum, Drake, et al., + http://www.python.org/doc/ + +The two-dots-and-a-space syntax was generalized by Setext for +comments, which are removed from the (visible) processed output. +reStructuredText uses this syntax for comments, footnotes, and link +target, collectively termed "explicit markup". For link targets, in +order to eliminate ambiguity with comments and footnotes, +reStructuredText specifies that a colon always follow the link target +word/phrase. The colon denotes 'maps to'. There is no reason to +restrict target links to the end of the document; they could just as +easily be interspersed. + +Internal hyperlinks (links from one point to another within a single +document) can be expressed by a source link as before, and a target +link with a colon but no URI. In effect, these targets 'map to' the +element immediately following. + +As an added bonus, we now have a perfect candidate for +reStructuredText directives, a simple extension mechanism: explicit +markup containing a single word followed by two colons and whitespace. +The interpretation of subsequent data on the directive line or +following is directive-dependent. + +To summarize:: + + .. This is a comment. + + .. The line below is an example of a directive. + .. version:: 1 + + This is a footnote [1]_. + + This internal hyperlink will take us to the footnotes_ area below. + + Here is a one-word_ external hyperlink. + + Here is `a hyperlink phrase`_. + + .. _footnotes: + .. [1] Footnote text goes here. 
+ + .. external hyperlink target mappings: + .. _one-word: http://www.123.xyz + .. _a hyperlink phrase: http://www.123.xyz + +The presence or absence of a colon after the target link +differentiates an indirect hyperlink from a footnote, respectively. A +footnote requires brackets. Backquotes around a target link word or +phrase are required if the phrase contains a colon, optional +otherwise. + +Below are examples using no markup, the two StructuredText hypertext +styles, and the reStructuredText hypertext style. Each example +contains an indirect link, a direct link, a footnote/endnote, and +bracketed text. In HTML, each example should evaluate to:: + + <P>A <A HREF="http://spam.org">URI</A>, see <A HREF="#eggs2000"> + [eggs2000]</A> (in Bacon [Publisher]). Also see + <A HREF="http://eggs.org">http://eggs.org</A>.</P> + + <P><A NAME="eggs2000">[eggs2000]</A> "Spam, Spam, Spam, Eggs, + Bacon, and Spam"</P> + +1. No markup:: + + A URI http://spam.org, see eggs2000 (in Bacon [Publisher]). + Also see http://eggs.org. + + eggs2000 "Spam, Spam, Spam, Eggs, Bacon, and Spam" + +2. StructuredText absolute/relative URI syntax + ("text":http://www.url.org):: + + A "URI":http://spam.org, see [eggs2000] (in Bacon [Publisher]). + Also see "http://eggs.org":http://eggs.org. + + .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" + + Note that StructuredText does not recognize standalone URIs, + forcing doubling up as shown in the second line of the example + above. + +3. StructuredText absolute-only URI syntax + ("text", mailto:you@your.com):: + + A "URI", http://spam.org, see [eggs2000] (in Bacon + [Publisher]). Also see "http://eggs.org", http://eggs.org. + + .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" + +4. reStructuredText syntax:: + + A URI_, see [eggs2000]_ (in Bacon [Publisher]). + Also see http://eggs.org. + + .. _URI: http://spam.org + .. 
[eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" + +The bracketed text '[Publisher]' may be problematic with +StructuredText (syntax 2 & 3). + +reStructuredText's syntax (#4) is definitely the most readable. The +text is separated from the link URI and the footnote, resulting in +cleanly readable text. + +.. _StructuredText: + http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage +.. _Setext: http://docutils.sourceforge.net/mirror/setext.html +.. _reStructuredText: http://docutils.sourceforge.net/rst.html +.. _detailed description: + http://www.tibsnjoan.demon.co.uk/STNG-format.html +.. _STMinus: http://www.cis.upenn.edu/~edloper/pydoc/stminus.html +.. _StructuredTextNG: + http://dev.zope.org/Members/jim/StructuredTextWiki/StructuredTextNG +.. _README: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/ + python/python/dist/src/README +.. _Emacs table mode: http://table.sourceforge.net/ +.. _reStructuredText Markup Specification: reStructuredText.html + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/dev/todo.txt b/docs/dev/todo.txt new file mode 100644 index 000000000..311751f37 --- /dev/null +++ b/docs/dev/todo.txt @@ -0,0 +1,385 @@ +================ + Docutils Notes +================ +:Date: $Date$ +:Revision: $Revision$ + +.. contents:: + +To Do +===== + +General +------- + +- Document! + + - Internal module documentation. + + - User docs. + + - Doctree nodes (DTD element) semantics: + + - External (public) attributes (node.attributes). + - Internal attributes (node.*). + - Linking mechanism. + +- Refactor + + - Rename methods & variables according to the `coding conventions`_ + below. + + - The name->id conversion and hyperlink resolution code needs to be + checked for correctness and refactored. I'm afraid it's a bit of + a spaghetti mess now. + +- Add validation? See http://pytrex.sourceforge.net, RELAX NG. 
+ +- Ask Python-dev for opinions (GvR for a pronouncement) on special + variables (__author__, __version__, etc.): convenience vs. namespace + pollution. Ask opinions on whether or not Docutils should recognize + & use them. + +- Provide a mechanism to pass options to Readers, Writers, and Parsers + through docutils.core.publish/Publisher? Or create custom + Reader/Writer/Parser objects first, and pass *them* to + publish/Publisher? + +- In reader.get_reader_class (& parser & writer too), should we be + importing 'standalone' or 'docutils.readers.standalone'? (This would + avoid importing top-level modules if the module name is not in + docutils/readers. Potential nastiness.) + +- Perhaps store a name->id mapping file? This could be stored + permanently, read by subsequent processing runs, and updated with + new entries. ("Persistent ID mapping"?) + +- The "Docutils System Messages" section appears even if no actual + system messages are there. They must be below the threshold. The + transform should be fixed. + +- TOC transform: use alt-text for inline images. + + +Specification +------------- + +- Complete PEP 258 Docutils Design Specification. + + - Fill in the blanks in API details. + + - Specify the nodes.py internal data structure implementation. + + [Tibs:] Eventually we need to have direct documentation in + there on how it all hangs together - the DTD is not enough + (indeed, is it still meant to be correct? [Yes, it is.]). + +- Rework PEP 257, separating style from spec from tools, wrt Docutils? + See Doc-SIG from 2001-06-19/20. + +- Add layout component to framework? Or part of the formatter? + +- Once doctree.txt is fleshed out, how about breaking (most of) it up + and putting it into nodes.py as docstrings? + + +reStructuredText Parser +----------------------- + +- Add motivation sections for constructs in spec. + +- Allow very long titles (on two or more lines)? 
+ +- And for the sake of completeness, should definition list terms be + allowed to be very long (two or more lines) also? + +- Allow hyperlink references to targets in other documents? Not in an + HTML-centric way, though (it's trivial to say + ``http://www.whatever.com/doc#name``, and useless in non-HTML + contexts). XLink/XPointer? ``.. baseref::``? See Doc-SIG + 2001-08-10. + +- Add character processing? For example: + + - ``--`` -> em-dash (or ``--`` -> en-dash, and ``---`` -> em-dash). + (Look for pre-existing conventions.) + - Convert quotes to curly quote entities. (Essentially impossible + for HTML? Unnecessary for TeX. An output issue?) + - Various forms of ``:-)`` to smiley icons. + - ``"\ "`` -> &nbsp;. + - Escaped newlines -> <BR>. + - Escaped period or quote as a disappearing catalyst to allow + character-level inline markup? + - Others? + + How to represent character entities in the text though? Probably as + Unicode. + + Which component is responsible for this, the parser, the reader, or + the writer? + +- Implement the header row separator modification to table.el. (Wrote + to Takaaki Ota & the table.el mailing list on 2001-08-12, suggesting + support for '=====' header rows. On 2001-08-17 he replied, saying + he'd put it on his to-do list, but "don't hold your breath".) + +- Tony says inline markup rule 7 could do with a *little* more + exposition in the spec, to make clear what is going on for people + with head colds. + +- Alan Jaffray suggested (and I agree) that it would be sensible to: + + - have a directive to specify a default role for interpreted text + - allow the reST processor to take an argument for the default role + - issue a warning when processing documents with no default role + which contain interpreted text with no explicitly specified role + +- Fix the parser's indentation handling to conform with the stricter + definition in the spec. (Explicit markup blocks should be strict or + forgiving?) 
+ +- Tighten up the spec for indentation of "constructs using complex + markers": field lists and option lists? Bodies may begin on the + same line as the marker or on a subsequent line (with blank lines + optional). Require that for bodies beginning on the same line as + the marker, all lines be in strict alignment. Currently, this is + acceptable:: + + :Field-name-of-medium-length: Field body beginning on the same + line as the field name. + + This proposal would make the above example illegal, instead + requiring strict alignment. A field body may either begin on the + same line:: + + :Field-name-of-medium-length: Field body beginning on the same + line as the field name. + + Or it may begin on a subsequent line:: + + :Field-name-of-medium-length: + Field body beginning on a line subsequent to that of the + field name. + + This would be especially relevant in degenerate cases like this:: + + :Number-of-African-swallows-required-to-carry-a-coconut: + It would be very difficult to align the field body with + the left edge of the first line if it began on the same + line as the field name. + +- Allow syntax constructs to be added or disabled at run-time. + +- Make footnotes two-way, GNU-style? What if there are multiple + references to a single footnote? + +- Add RFC-2822 header parsing (for PEP, email Readers). + +- Change ``.. meta::`` to use a "pending" element, only activated for + HTML writers. + +- Allow for variant styles by interpreting indented lists as if they + weren't indented? For example, currently the list below will be + parsed as a list within a block quote:: + + paragraph + + * list item 1 + * list item 2 + + But a lot of people seem to write that way, and HTML browsers make + it look as if that's the way it should be. The parser could check + the contents of block quotes, and if they contain only a single + list, remove the block quote wrapper. There would be two problems: + + 1. What if we actually *do* want a list inside a block quote? + + 2. 
What if such a list comes immediately after an indented + construct, such as a literal block? + + Both could be solved using empty comments (problem 2 already exists + for a block quote after a literal block). But that's a hack. + + See the Doc-SIG discussion starting 2001-04-18 with Ed Loper's + "Structuring: a summary; and an attempt at EBNF", item 4. + +- Produce a better system message when a list ends abruptly. Input:: + + -1 Option "1" + -2 + + Produces:: + + Reporter: WARNING (2) Unindent without blank line at line 2. + + But it should produce:: + + Reporter: WARNING (2) List ends without blank line at line 2. + + +Directives +`````````` + +- Allow directives to be added at run-time. + +- Use the language module for directive attribute names? + +- Add more attributes to the image directive: align, border? + +- Implement directives: + + - html.imagemap + + - components.endnotes, .citations, .topic, .sectnum (section + numbering; add support to .contents; could be cmdline option also) + + - misc.raw + + - misc.include: ``#include`` one file in another. But how to + parse wrt sections, reference names, conflicts? + + - misc.exec: Execute Python code & insert the results. Perhaps + dangerous? + + - misc.eval: Evaluate an expression & insert the text. At parse + time or at substitution time? + + - block.qa: Questions & Answers. Implement as a generic two-column + marked list? Or as a standalone construct? + + - block.columns: Multi-column table/list, with number of columns as + argument. + + - block.verse: Paragraphs with linebreaks preserved. A directive + would be easy; what about a literal-block-like prefix, perhaps + ';;'? E.g.:: + + Take it away, Eric the orchestra leader! ;; + + Half a bee, + Philosophically, + Must ipso-facto + Half not be. + You see? + + ... + + - colorize.python: Colorize Python code. Fine for HTML output, but + what about other formats? Revert to a literal block? Do we need + some kind of "alternate" mechanism? 
Perhaps use a "pending" + transform, which could switch its output based on the "format" in + use. Use a factory function "transformFF()" which returns either + "HTMLTransform()" instance or "GenericTransform" instance? + + - text.date: Datestamp. For substitutions. + + - Combined with misc.include, implement canned macros? + + +Unimplemented Transforms +------------------------ + +- Footnote Gathering + + Collect and move footnotes to the end of a document. + +- Hyperlink Target Gathering + + It probably comes in two phases, because in a Python context we need + to *resolve* them on a per-docstring basis [do we? --DG], but if the + user is trying to do the callout form of presentation, they would + then want to group them all at the end of the document. + +- Reference Merging + + When merging two or more subdocuments (such as docstrings), + conflicting references may need to be resolved. There may be: + + - duplicate reference and/or substitution names that need to be made + unique; and/or + - duplicate footnote numbers that need to be renumbered. + + Should this be done before or after reference-resolving transforms + are applied? What about references from within one subdocument to + inside another? + +- Document Splitting + + If the processed document is written to multiple files (possibly in + a directory tree), it will need to be split up. References will + have to be adjusted. + + (HTML only?) + +- Navigation + + If a document is split up, each segment will need navigation links: + parent, children (small TOC), previous (preorder), next (preorder). + +- Index + + +HTML Writer +----------- + +- Considerations for an HTML Writer [#]_: + + - Boolean attributes. ``<element boolean>`` is good, ``<element + boolean="boolean">`` is bad. Use a special value in attribute + mappings, such as ``None``? + + - Escape double-dashes inside comments. + + - Put the language code into an appropriate element's LANG + attribute (<HTML>?). 
+ + - Docutils identifiers (the "class" and "id" attributes) will + conform to the regular expression ``[a-z][-a-z0-9]*``. See + ``docutils.utils.id()``. + + .. _HTML 4.01 spec: http://www.w3.org/TR/html401 + .. _CSS1 spec: http://www.w3.org/TR/REC-CSS1 + .. [#] Source: `HTML 4.0 in Netscape and Explorer`__. + __ http://www.webreference.com/dev/html4nsie/index.html + +- Allow for style sheet info to be passed in, either as a <LINK>, or + as embedded style info. + +- Construct a templating system, as in ht2html/yaptu, using directives + and substitutions for dynamic stuff. + +- Improve the granularity of document parts in the HTML writer, so + that one could just grab the parts needed. + + +Coding Conventions +================== + +This project shall follow the generic coding conventions as specified +in the `Style Guide for Python Code`__ and `Docstring Conventions`__ +PEPs, with the following clarifications: + +- 4 spaces per indentation level. No tabs. +- No one-liner compound statements (i.e., no ``if x: return``: use two + lines & indentation), except for degenerate class or method + definitions (i.e., ``class X: pass`` is O.K.). +- Lines should be no more than 78 or 79 characters long. +- "CamelCase" shall be used for class names. +- Use "lowercase" or "lowercase_with_underscores" for function, + method, and variable names. For short names, maximum two joined + words, use lowercase (e.g. 'tagname'). For long names with three or + more joined words, or where it's hard to parse the split between two + words, use lowercase_with_underscores (e.g., 'note_explicit_target', + 'explicit_target'). + +__ http://www.python.org/peps/pep-0008.html +__ http://www.python.org/peps/pep-0257.html + + +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/peps/pep-0256.txt b/docs/peps/pep-0256.txt new file mode 100644 index 000000000..92c8e7f61 --- /dev/null +++ b/docs/peps/pep-0256.txt @@ -0,0 +1,253 @@ +PEP: 256 +Title: Docstring Processing System Framework +Version: $Revision$ +Last-Modified: $Date$ +Author: goodger@users.sourceforge.net (David Goodger) +Discussions-To: doc-sig@python.org +Status: Draft +Type: Standards Track +Created: 01-Jun-2001 +Post-History: 13-Jun-2001 + + +Abstract + + Python lends itself to inline documentation. With its built-in + docstring syntax, a limited form of Literate Programming [1]_ is + easy to do in Python. However, there are no satisfactory standard + tools for extracting and processing Python docstrings. The lack + of a standard toolset is a significant gap in Python's + infrastructure; this PEP aims to fill the gap. + + The issues surrounding docstring processing have been contentious + and difficult to resolve. This PEP proposes a generic Docstring + Processing System (DPS) framework, which separates out the + components (program and conceptual), enabling the resolution of + individual issues either through consensus (one solution) or + through divergence (many). It promotes standard interfaces which + will allow a variety of plug-in components (input context readers, + markup parsers, and output format writers) to be used. + + The concepts of a DPS framework are presented independently of + implementation details. + + +Rationale + + There are standard inline documentation systems for some other + languages. For example, Perl has POD [2]_ and Java has Javadoc + [3]_, but neither of these mesh with the Pythonic way. POD syntax + is very explicit, but takes after Perl in terms of readability. + Javadoc is HTML-centric; except for '@field' tags, raw HTML is + used for markup. 
There are also general tools such as Autoduck + [4]_ and Web (Tangle & Weave) [5]_, useful for multiple languages. + + There have been many attempts to write auto-documentation systems + for Python (not an exhaustive list): + + - Marc-Andre Lemburg's doc.py [6]_ + + - Daniel Larsson's pythondoc & gendoc [7]_ + + - Doug Hellmann's HappyDoc [8]_ + + - Laurence Tratt's Crystal [9]_ + + - Ka-Ping Yee's htmldoc & pydoc [10]_ (pydoc.py is now part of the + Python standard library; see below) + + - Tony Ibbs' docutils [11]_ + + - Edward Loper's STminus formalization and related efforts [12]_ + + These systems, each with different goals, have had varying degrees + of success. A problem with many of the above systems was + over-ambition combined with inflexibility. They provided a + self-contained set of components: a docstring extraction system, + a markup parser, an internal processing system and one or more + output format writers. Inevitably, one or more aspects of each + system had serious shortcomings, and they were not easily extended + or modified, preventing them from being adopted as standard tools. + + It has become clear (to this author, at least) that the "all or + nothing" approach cannot succeed, since no monolithic + self-contained system could possibly be agreed upon by all + interested parties. A modular component approach designed for + extension, where components may be multiply implemented, may be + the only chance for success. By separating out the issues, we can + form consensus more easily (smaller fights ;-), and accept + divergence more readily. + + Each of the components of a docstring processing system should be + developed independently. A 'best of breed' system should be + chosen, either merged from existing systems, and/or developed + anew. This system should be included in Python's standard + library. + + +PyDoc & Other Existing Systems + + PyDoc became part of the Python standard library as of release + 2.1. 
It extracts and displays docstrings from within the Python + interactive interpreter, from the shell command line, and from a + GUI window into a web browser (HTML). Although a very useful + tool, PyDoc has several deficiencies, including: + + - In the case of the GUI/HTML, except for some heuristic + hyperlinking of identifier names, no formatting of the + docstrings is done. They are presented within <p><small><tt> + tags to avoid unwanted line wrapping. Unfortunately, the result + is not attractive. + + - PyDoc extracts docstrings and structural information (class + identifiers, method signatures, etc.) from imported module + objects. There are security issues involved with importing + untrusted code. Also, information from the source is lost when + importing, such as comments, "additional docstrings" (string + literals in non-docstring contexts; see PEP 258 [13]_), and the + order of definitions. + + The functionality proposed in this PEP could be added to or used + by PyDoc when serving HTML pages. The proposed docstring + processing system's functionality is much more than PyDoc needs in + its current form. Either an independent tool will be developed + (which PyDoc may or may not use), or PyDoc could be expanded to + encompass this functionality and *become* the docstring processing + system (or one such system). That decision is beyond the scope of + this PEP. + + Similarly for other existing docstring processing systems, their + authors may or may not choose compatibility with this framework. + However, if this framework is accepted and adopted as the Python + standard, compatibility will become an important consideration in + these systems' future. + + +Specification + + The docstring processing system framework consists of components, + as follows:: + + 1. Docstring conventions. Documents issues such as: + + - What should be documented where. + + - First line is a one-line synopsis. 
+ + PEP 257, Docstring Conventions [14]_, documents some of these + issues. + + 2. Docstring processing system design specification. Documents + issues such as: + + - High-level spec: what a DPS does. + + - Command-line interface for executable script. + + - System Python API. + + - Docstring extraction rules. + + - Readers, which encapsulate the input context. + + - Parsers. + + - Document tree: the intermediate internal data structure. The + output of the Parser and Reader, and the input to the Writer + all share the same data structure. + + - Transforms, which modify the document tree. + + - Writers for output formats. + + - Distributors, which handle output management (one file, many + files, or objects in memory). + + These issues are applicable to any docstring processing system + implementation. PEP 258, Docutils Design Specification [13]_, + documents these issues. + + 3. Docstring processing system implementation. + + 4. Input markup specifications: docstring syntax. PEP 2xx, + reStructuredText Standard Docstring Format [15]_, proposes a + standard syntax. + + 5. Input parser implementations. + + 6. Input context readers ("modes": Python source code, PEP, + standalone text file, email, etc.) and implementations. + + 7. Output formats (HTML, XML, TeX, DocBook, info, etc.) and writer + implementations. + + Components 1, 2/3, and 4/5 are the subject of individual companion + PEPs. If there is another implementation of the framework or + syntax/parser, additional PEPs may be required. Multiple + implementations of each of components 6 and 7 will be required; + the PEP mechanism may be overkill for these components. + + +Project Web Site + + A SourceForge project has been set up for this work at + http://docutils.sourceforge.net/. 
+ + +References and Footnotes + + [1] http://www.literateprogramming.com/ + + [2] Perl "Plain Old Documentation" + http://www.perldoc.com/perl5.6/pod/perlpod.html + + [3] http://java.sun.com/j2se/javadoc/ + + [4] http://www.helpmaster.com/hlp-developmentaids-autoduck.htm + + [5] http://www-cs-faculty.stanford.edu/~knuth/cweb.html + + [6] http://www.lemburg.com/files/python/SoftwareDescriptions.html#doc.py + + [7] http://starship.python.net/crew/danilo/pythondoc/ + + [8] http://happydoc.sourceforge.net/ + + [9] http://www.btinternet.com/~tratt/comp/python/crystal/ + + [10] http://www.python.org/doc/current/lib/module-pydoc.html + + [11] http://homepage.ntlworld.com/tibsnjoan/docutils/ + + [12] http://www.cis.upenn.edu/~edloper/pydoc/ + + [13] PEP 258, Docutils Design Specification, Goodger + http://www.python.org/peps/pep-0258.html + + [14] PEP 257, Docstring Conventions, Goodger, Van Rossum + http://www.python.org/peps/pep-0257.html + + [15] PEP 287, reStructuredText Standard Docstring Format, Goodger + http://www.python.org/peps/pep-0287.html + + [16] http://www.python.org/sigs/doc-sig/ + + +Copyright + + This document has been placed in the public domain. + + +Acknowledgements + + This document borrows ideas from the archives of the Python + Doc-SIG [16]_. Thanks to all members past & present. 
+ + + +Local Variables: +mode: indented-text +indent-tabs-mode: nil +fill-column: 70 +sentence-end-double-space: t +End: diff --git a/docs/peps/pep-0257.txt b/docs/peps/pep-0257.txt new file mode 100644 index 000000000..48425d9cc --- /dev/null +++ b/docs/peps/pep-0257.txt @@ -0,0 +1,248 @@ +PEP: 257 +Title: Docstring Conventions +Version: $Revision$ +Last-Modified: $Date$ +Author: goodger@users.sourceforge.net (David Goodger), + guido@python.org (Guido van Rossum) +Discussions-To: doc-sig@python.org +Status: Active +Type: Informational +Created: 29-May-2001 +Post-History: 13-Jun-2001 + + +Abstract + + This PEP documents the semantics and conventions associated with + Python docstrings. + + +Rationale + + The aim of this PEP is to standardize the high-level structure of + docstrings: what they should contain, and how to say it (without + touching on any markup syntax within docstrings). The PEP + contains conventions, not laws or syntax. + + "A universal convention supplies all of maintainability, + clarity, consistency, and a foundation for good programming + habits too. What it doesn't do is insist that you follow it + against your will. That's Python!" + + --Tim Peters on comp.lang.python, 2001-06-16 + + If you violate the conventions, the worst you'll get is some dirty + looks. But some software (such as the Docutils docstring + processing system [1] [2]) will be aware of the conventions, so + following them will get you the best results. + + +Specification + + What is a Docstring? + -------------------- + + A docstring is a string literal that occurs as the first statement + in a module, function, class, or method definition. Such a + docstring becomes the __doc__ special attribute of that object. + + All modules should normally have docstrings, and all functions and + classes exported by a module should also have docstrings. Public + methods (including the __init__ constructor) should also have + docstrings. 
A package may be documented in the module docstring + of the __init__.py file in the package directory. + + String literals occurring elsewhere in Python code may also act as + documentation. They are not recognized by the Python bytecode + compiler and are not accessible as runtime object attributes + (i.e. not assigned to __doc__), but two types of extra docstrings + may be extracted by software tools: + + 1. String literals occurring immediately after a simple assignment + at the top level of a module, class, or __init__ method + are called "attribute docstrings". + + 2. String literals occurring immediately after another docstring + are called "additional docstrings". + + Please see PEP 258 "Docutils Design Specification" [2] for a + detailed description of attribute and additional docstrings. + + XXX Mention docstrings of 2.2 properties. + + For consistency, always use """triple double quotes""" around + docstrings. Use r"""raw triple double quotes""" if you use any + backslashes in your docstrings. For Unicode docstrings, use + u"""Unicode triple-quoted strings""". + + There are two forms of docstrings: one-liners and multi-line + docstrings. + + One-line Docstrings + -------------------- + + One-liners are for really obvious cases. They should really fit + on one line. For example:: + + def kos_root(): + """Return the pathname of the KOS root directory.""" + global _kos_root + if _kos_root: return _kos_root + ... + + Notes: + + - Triple quotes are used even though the string fits on one line. + This makes it easy to later expand it. + + - The closing quotes are on the same line as the opening quotes. + This looks better for one-liners. + + - There's no blank line either before or after the docstring. + + - The docstring is a phrase ending in a period. It prescribes the + function or method's effect as a command ("Do this", "Return + that"), not as a description: e.g. don't write "Returns the + pathname ..." 
+ + - The one-line docstring should NOT be a "signature" reiterating + the function/method parameters (which can be obtained by + introspection). Don't do:: + + def function(a, b): + """function(a, b) -> list""" + + This type of docstring is only appropriate for C functions (such + as built-ins), where introspection is not possible. + + Multi-line Docstrings + ---------------------- + + Multi-line docstrings consist of a summary line just like a + one-line docstring, followed by a blank line, followed by a more + elaborate description. The summary line may be used by automatic + indexing tools; it is important that it fits on one line and is + separated from the rest of the docstring by a blank line. The + summary line may be on the same line as the opening quotes or on + the next line. + + The entire docstring is indented the same as the quotes at its + first line (see example below). Docstring processing tools will + strip an amount of indentation from the second and further lines + of the docstring equal to the indentation of the first non-blank + line after the first line of the docstring. Relative indentation + of later lines in the docstring is retained. + + Insert a blank line before and after all docstrings (one-line or + multi-line) that document a class -- generally speaking, the + class's methods are separated from each other by a single blank + line, and the docstring needs to be offset from the first method + by a blank line; for symmetry, put a blank line between the class + header and the docstring. Docstrings documenting functions or + methods generally don't have this requirement, unless the function + or method's body is written as a number of blank-line separated + sections -- in this case, treat the docstring as another section, + and precede it with a blank line. 
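The indentation rule described above for multi-line docstrings can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from this commit: the function name strip_docstring_indent is hypothetical, tabs are expanded before measuring, and a line indented less than the reference line simply loses all of its leading whitespace (the text above does not say what should happen in that case).

```python
def strip_docstring_indent(docstring):
    """Strip common indentation from a docstring (illustrative sketch).

    The amount removed from the second and further lines equals the
    indentation of the first non-blank line after the first line.
    """
    lines = docstring.expandtabs().splitlines()
    if len(lines) < 2:
        return docstring.strip()

    # Find the indentation of the first non-blank line after line one.
    indent = 0
    for line in lines[1:]:
        if line.strip():
            indent = len(line) - len(line.lstrip())
            break

    result = [lines[0].strip()]
    for line in lines[1:]:
        if line[:indent].strip():
            # Assumption: a line indented less than the reference line
            # just loses whatever leading whitespace it has.
            result.append(line.strip())
        else:
            # Remove exactly `indent` columns; relative indentation of
            # later lines is retained.
            result.append(line[indent:].rstrip())

    # Drop trailing blank lines left over from the closing quotes.
    while result and not result[-1]:
        result.pop()
    return "\n".join(result)
```

Applied to a docstring whose body is indented four spaces, the summary line is left alone, the body lines lose four spaces each, and any deeper indentation within the body survives.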
+ + The docstring of a script (a stand-alone program) should be usable + as its "usage" message, printed when the script is invoked with + incorrect or missing arguments (or perhaps with a "-h" option, for + "help"). Such a docstring should document the script's function + and command line syntax, environment variables, and files. Usage + messages can be fairly elaborate (several screens full) and should + be sufficient for a new user to use the command properly, as well + as a complete quick reference to all options and arguments for the + sophisticated user. + + The docstring for a module should generally list the classes, + exceptions and functions (and any other objects) that are exported + by the module, with a one-line summary of each. (These summaries + generally give less detail than the summary line in the object's + docstring.) The docstring for a package (i.e., the docstring of + the package's __init__.py module) should also list the modules and + subpackages exported by the package. + + The docstring for a function or method should summarize its + behavior and document its arguments, return value(s), side + effects, exceptions raised, and restrictions on when it can be + called (all if applicable). Optional arguments should be + indicated. It should be documented whether keyword arguments are + part of the interface. + + The docstring for a class should summarize its behavior and list + the public methods and instance variables. If the class is + intended to be subclassed, and has an additional interface for + subclasses, this interface should be listed separately (in the + docstring). The class constructor should be documented in the + docstring for its __init__ method. Individual methods should be + documented by their own docstring. + + If a class subclasses another class and its behavior is mostly + inherited from that class, its docstring should mention this and + summarize the differences. 
Use the verb "override" to indicate + that a subclass method replaces a superclass method and does not + call the superclass method; use the verb "extend" to indicate that + a subclass method calls the superclass method (in addition to its + own behavior). + + *Do not* use the Emacs convention of mentioning the arguments of + functions or methods in upper case in running text. Python is + case sensitive and the argument names can be used for keyword + arguments, so the docstring should document the correct argument + names. It is best to list each argument on a separate line. For + example:: + + def complex(real=0.0, imag=0.0): + """Form a complex number. + + Keyword arguments: + real -- the real part (default 0.0) + imag -- the imaginary part (default 0.0) + + """ + if imag == 0.0 and real == 0.0: return complex_zero + ... + + The BDFL [3] recommends inserting a blank line between the last + paragraph in a multi-line docstring and its closing quotes, + placing the closing quotes on a line by themselves. This way, + Emacs' fill-paragraph command can be used on it. + + +References and Footnotes + + [1] PEP 256, Docstring Processing System Framework, Goodger + http://www.python.org/peps/pep-0256.html + + [2] PEP 258, Docutils Design Specification, Goodger + http://www.python.org/peps/pep-0258.html + + [3] Guido van Rossum, Python's creator and Benevolent Dictator For + Life. + + [4] http://www.python.org/doc/essays/styleguide.html + + [5] http://www.python.org/sigs/doc-sig/ + + +Copyright + + This document has been placed in the public domain. + + +Acknowledgements + + The "Specification" text comes mostly verbatim from the Python + Style Guide essay by Guido van Rossum [4]. + + This document borrows ideas from the archives of the Python + Doc-SIG [5]. Thanks to all members past and present. 
+ + + +Local Variables: +mode: indented-text +indent-tabs-mode: nil +fill-column: 70 +sentence-end-double-space: t +End: diff --git a/docs/peps/pep-0258.txt b/docs/peps/pep-0258.txt new file mode 100644 index 000000000..6a55e20de --- /dev/null +++ b/docs/peps/pep-0258.txt @@ -0,0 +1,662 @@ +PEP: 258 +Title: Docutils Design Specification +Version: $Revision$ +Last-Modified: $Date$ +Author: goodger@users.sourceforge.net (David Goodger) +Discussions-To: doc-sig@python.org +Status: Draft +Type: Standards Track +Requires: 256, 257 +Created: 31-May-2001 +Post-History: 13-Jun-2001 + + +Abstract + + This PEP documents design issues and implementation details for + Docutils, a Python Docstring Processing System (DPS). The + rationale and high-level concepts of a DPS are documented in PEP + 256, "Docstring Processing System Framework" [1]. + + No changes to the core Python language are required by this PEP. + Its deliverables consist of a package for the standard library and + its documentation. + + +Specification + + Docstring Extraction Rules + ========================== + + 1. What to examine: + + a) If the "__all__" variable is present in the module being + documented, only identifiers listed in "__all__" are + examined for docstrings. + + b) In the absence of "__all__", all identifiers are examined, + except those whose names are private (names begin with "_" + but don't begin and end with "__"). + + c) 1a and 1b can be overridden by a parameter or command-line + option. + + 2. Where: + + Docstrings are string literal expressions, and are recognized + in the following places within Python modules: + + a) At the beginning of a module, function definition, class + definition, or method definition, after any comments. This + is the standard for Python __doc__ attributes. + + b) Immediately following a simple assignment at the top level + of a module, class definition, or __init__ method + definition, after any comments. See "Attribute Docstrings" + below. 
+ + c) Additional string literals found immediately after the + docstrings in (a) and (b) will be recognized, extracted, and + concatenated. See "Additional Docstrings" below. + + d) @@@ 2.2-style "properties" with attribute docstrings? + + 3. How: + + Whenever possible, Python modules should be parsed by Docutils, + not imported. There are security reasons for not importing + untrusted code. Information from the source is lost when using + introspection to examine an imported module, such as comments + and the order of definitions. Also, docstrings are to be + recognized in places where the bytecode compiler ignores string + literal expressions (2b and 2c above), meaning importing the + module will lose these docstrings. Of course, standard Python + parsing tools such as the "parser" library module may be used. + + When the Python source code for a module is not available + (i.e. only the .pyc file exists) or for C extension modules, to + access docstrings the module can only be imported, and any + limitations must be lived with. + + Since attribute docstrings and additional docstrings are ignored + by the Python bytecode compiler, no namespace pollution or runtime + bloat will result from their use. They are not assigned to + __doc__ or to any other attribute. The initial parsing of a + module may take a slight performance hit. + + + Attribute Docstrings + -------------------- + + (This is a simplified version of PEP 224 [2] by Marc-Andre + Lemburg.) + + A string literal immediately following an assignment statement is + interpreted by the docstring extraction machinery as the docstring + of the target of the assignment statement, under the following + conditions: + + 1. The assignment must be in one of the following contexts: + + a) At the top level of a module (i.e., not nested inside a + compound statement such as a loop or conditional): a module + attribute. + + b) At the top level of a class definition: a class attribute. 
+ + c) At the top level of the "__init__" method definition of a + class: an instance attribute. + + Since each of the above contexts is at the top level (i.e., in + the outermost suite of a definition), it may be necessary to + place dummy assignments for attributes assigned conditionally + or in a loop. + + 2. The assignment must be to a single target, not to a list or a + tuple of targets. + + 3. The form of the target: + + a) For contexts 1a and 1b above, the target must be a simple + identifier (not a dotted identifier, a subscripted + expression, or a sliced expression). + + b) For context 1c above, the target must be of the form + "self.attrib", where "self" matches the "__init__" method's + first parameter (the instance parameter) and "attrib" is a + simple identifier as in 3a. + + Blank lines may be used after attribute docstrings to emphasize + the connection between the assignment and the docstring. + + Examples:: + + g = 'module attribute (module-global variable)' + """This is g's docstring.""" + + class AClass: + + c = 'class attribute' + """This is AClass.c's docstring.""" + + def __init__(self): + self.i = 'instance attribute' + """This is self.i's docstring.""" + + + Additional Docstrings + --------------------- + + (This idea was adapted from PEP 216, Docstring Format [3], by + Moshe Zadka.) + + Many programmers would like to make extensive use of docstrings + for API documentation. However, docstrings do take up space in + the running program, so some of these programmers are reluctant to + "bloat up" their code. Also, not all API documentation is + applicable to interactive environments, where __doc__ would be + displayed. + + The docstring processing system's extraction tools will + concatenate all string literal expressions which appear at the + beginning of a definition or after a simple assignment. 
Only the + first strings in definitions will be available as __doc__, and can + be used for brief usage text suitable for interactive sessions; + subsequent string literals and all attribute docstrings are + ignored by the Python bytecode compiler and may contain more + extensive API information. + + Example:: + + def function(arg): + """This is __doc__, function's docstring.""" + """ + This is an additional docstring, ignored by the bytecode + compiler, but extracted by the Docutils. + """ + pass + + Issue: This breaks "from __future__ import" statements in Python + 2.1 for multiple module docstrings. The Python Reference Manual + specifies: + + A future statement must appear near the top of the module. + The only lines that can appear before a future statement are: + + * the module docstring (if any), + * comments, + * blank lines, and + * other future statements. + + Resolution? + + 1. Should we search for docstrings after a __future__ statement? + Very ugly. + + 2. Redefine __future__ statements to allow multiple preceding + string literals? + + 3. Or should we not even worry about this? There shouldn't be + __future__ statements in production code, after all. Will + modules with __future__ statements simply have to put up with + the single-docstring limitation? + + + Choice of Docstring Format + ========================== + + Rather than force everyone to use a single docstring format, + multiple input formats are allowed by the processing system. A + special variable, __docformat__, may appear at the top level of a + module before any function or class definitions. Over time or + through decree, a standard format or set of formats should emerge. + + The __docformat__ variable is a string containing the name of the + format being used, a case-insensitive string matching the input + parser's module or package name (i.e., the same name as required + to "import" the module or package), or a registered alias. 
If no + __docformat__ is specified, the default format is "plaintext" for + now; this may be changed to the standard format once determined. + + The __docformat__ string may contain an optional second field, + separated from the format name (first field) by a single space: a + case-insensitive language identifier as defined in RFC 1766 [4]. + A typical language identifier consists of a 2-letter language code + from ISO 639 [5] (3-letter codes used only if no 2-letter code + exists; RFC 1766 is currently being revised to allow 3-letter + codes). If no language identifier is specified, the default is + "en" for English. The language identifier is passed to the parser + and can be used for language-dependent markup features. + + + Docutils Project Model + ====================== + + :: + + +--------------------------+ + | Docutils: | + | docutils.core.Publisher, | + | docutils.core.publish() | + +--------------------------+ + / \ + / \ + 1,3,5 / \ 6,8 + +--------+ +--------+ + | READER | =======================> | WRITER | + +--------+ +--------+ + // \ / \ + // \ / \ + 2 // 4 \ 7 / 9 \ + +--------+ +------------+ +------------+ +--------------+ + | PARSER |...| reader | | writer |...| DISTRIBUTOR? | + +--------+ | transforms | | transforms | | | + | | | | | - one file | + | - docinfo | | - styling | | - many files | + | - titles | | - writer- | | - objects in | + | - linking | | specific | | memory | + | - lookups | | - etc. | +--------------+ + | - reader- | +------------+ + | specific | + | - parser- | + | specific | + | - layout | + | - etc. | + +------------+ + + The numbers indicate the path a document would take through the + code. Double-width lines between reader & parser and between + reader & writer indicate that data sent along these paths + should be standard (pure & unextended) Docutils doc trees. 
+ Single-width lines signify that internal tree extensions or + completely unrelated representations are possible, but they must + be supported internally at both ends. + + + Publisher + --------- + + The "docutils.core" module contains a "Publisher" facade class and + "publish" convenience function. Publisher encapsulates the + high-level logic of a Docutils system. The Publisher.publish() + method passes its input to its Reader, then passes the resulting + document tree through its Writer to its destination. + + + Readers + ------- + + Readers understand the input context (where the data is coming + from), send the whole input or discrete "chunks" to the parser, + and provide the context to bind the chunks together back into a + cohesive whole. Using transforms_, Readers also resolve + references, footnote numbers, interpreted text processing, and + anything else that requires context-sensitive computation. + + Each reader is a module or package exporting a "Reader" class with + a "read" method. The base "Reader" class can be found in the + docutils/readers/__init__.py module. + + Most Readers will have to be told what parser to use. So far (see + the list of examples below), only the Python Source Reader + (PySource) will be able to determine the syntax on its own. + + Responsibilities: + + - Do raw input on the source ("Reader.scan()"). + + - Pass the raw text to the parser, along with a fresh doctree + root ("Reader.parse()"). + + - Run transforms over the doctree(s) ("Reader.transform()"). + + Examples: + + - Standalone/Raw/Plain: Just read a text file and process it. The + reader needs to be told which parser to use. Parser-specific + readers? + + - Python Source: See `Python Source Reader`_ above. + + - Email: RFC-822 headers, quoted excerpts, signatures, MIME parts. + + - PEP: RFC-822 headers, "PEP xxxx" and "RFC xxxx" conversion to + URIs. Either interpret PEPs' indented sections or convert + existing PEPs to reStructuredText (or both?). 
+ + - Wiki: Global reference lookups of "wiki links" incorporated into + transforms. (CamelCase only or unrestricted?) Lazy + indentation? + + - Web Page: As standalone, but recognize meta fields as meta tags. + Support for templates of some sort? (After <body>, before + </body>?) + + - FAQ: Structured "question & answer(s)" constructs. + + - Compound document: Merge chapters into a book. Master TOC file? + + + Parsers + ------- + + Parsers analyze their input and produce a Docutils `document + tree`_. They don't know or care anything about the source or + destination of the data. + + Each input parser is a module or package exporting a "Parser" + class with a "parse" method. The base "Parser" class can be found + in the docutils/parsers/__init__.py module. + + Responsibilities: Given raw input text and a doctree root node, + populate the doctree by parsing the input text. + + Example: The only parser implemented so far is for the + reStructuredText markup. + + + Transforms + ---------- + + Transforms change the document tree from one form to another, add + to the tree, or prune it. Transforms are run by Reader and Writer + objects. Some transforms are Reader-specific, some are + Parser-specific, and others are Writer-specific. The choice and + order of transforms are specified in the Reader and Writer objects. + + Each transform is a class in a module in the docutils/transforms + package, a subclass of docutils.transforms.Transform. + + Responsibilities: + + - Modify a doctree in-place, either purely transforming one + structure into another, or adding new structures based on the + doctree and/or external data. + + Examples (in "docutils.transforms"): + + - frontmatter.DocInfo: conversion of document metadata + (bibliographic information). + + - references.Hyperlinks: resolution of hyperlinks. + + - document.Merger: combining multiple populated doctrees into one. + + + Writers + ------- + + Writers produce the final output (HTML, XML, TeX, etc.).
Writers + translate the internal document tree structure into the final data + format, possibly running output-specific transforms_ first. + + Each writer is a module or package exporting a "Writer" class with + a "write" method. The base "Writer" class can be found in the + docutils/writers/__init__.py module. + + Responsibilities: + + - Run transforms over the doctree(s). + + - Translate doctree(s) into specific output formats. + + - Transform references into format-native forms. + + - Write output to the destination (possibly via a "Distributor"). + + Examples: + + - XML: Various forms, such as DocBook. Also, raw doctree XML. + + - HTML + + - TeX + + - Plain text + + - reStructuredText? + + + Distributors + ------------ + + Distributors will exist for each method of storing the results of + processing: + + - In a single file on disk. + + - In a tree of directories and files on disk. + + - In a single tree-shaped data structure in memory. + + - Some other set of data structures in memory. + + @@@ Distributors are currently just an idea; they may or may not + be practical. Issues: + + Is it better for the writer to control the distributor, or + vice versa? Or should they be equals? + + Looking at the list of writers, it seems that only HTML would + require anything other than monolithic output. Perhaps merge + the HTML "distributor" into "writer" variants? + + Perhaps translator/writer instead of writer/distributor? + + Responsibilities: + + - Do raw output to the destination. + + - Transform references per incarnation (method of distribution). + + Examples: + + - Single file. + + - Multiple files & directories. + + - Objects in memory. + + + Docutils Package Structure + ========================== + + - Package "docutils". + + - Module "docutils.core" contains facade class "Publisher" and + convenience function "publish()". See `Publisher API`_ below. 
+ + - Module "docutils.nodes" contains the Docutils document tree + element class library plus Visitor pattern base classes. See + `Document Tree`_ below. + + - Module "docutils.roman" contains Roman numeral conversion + routines. + + - Module "docutils.statemachine" contains a finite state machine + specialized for regular-expression-based text filters. The + reStructuredText parser implementation is based on this + module. + + - Module "docutils.urischemes" contains a mapping of known URI + schemes ("http", "ftp", "mail", etc.). + + - Module "docutils.utils" contains utility functions and + classes, including a logger class ("Reporter"; see `Error + Handling`_ below). + + - Package "docutils.parsers": markup parsers_. + + - Function "get_parser_class(parsername)" returns a parser + module by name. Class "Parser" is the base class of + specific parsers. (docutils/parsers/__init__.py) + + - Package "docutils.parsers.rst": the reStructuredText parser. + + - Alternate markup parsers may be added. + + - Package "docutils.readers": context-aware input readers. + + - Function "get_reader_class(readername)" returns a reader + module by name or alias. Class "Reader" is the base class + of specific readers. (docutils/readers/__init__.py) + + - Module "docutils.readers.standalone": reads independent + document files. + + - Readers to be added for: Python source code (structure & + docstrings), PEPs, email, FAQ, and perhaps Wiki and others. + + - Package "docutils.writers": output format writers. + + - Function "get_writer_class(writername)" returns a writer + module by name. Class "Writer" is the base class of + specific writers. (docutils/writers/__init__.py) + + - Module "docutils.writers.pprint" is a simple internal + document tree writer; it writes indented pseudo-XML. + + - Module "docutils.writers.html4css1" is a simple HyperText + Markup Language document tree writer for HTML 4.01 and CSS1. 
+ + - Writers to be added: HTML 3.2 or 4.01-loose, XML (various + forms, such as DocBook and the raw internal doctree), TeX, + plaintext, reStructuredText, and perhaps others. + + - Package "docutils.transforms": tree transform classes. + + - Class "Transform" is the base class of specific transforms; + see `Transform API`_ below. + (docutils/transforms/__init__.py) + + - Each module contains related transform classes. + + - Package "docutils.languages": Language modules contain + language-dependent strings and mappings. They are named for + their language identifier (as defined in `Choice of Docstring + Format`_ above), converting dashes to underscores. + + - Function "getlanguage(languagecode)", returns matching + language module. (docutils/languages/__init__.py) + + - Module "docutils.languages.en" (English). + + - Other languages to be added. + + + Front-End Tools + =============== + + @@@ To be determined. + + @@@ Document tools & summarize their command-line interfaces. + + + Document Tree + ============= + + A single intermediate data structure is used internally by + Docutils, in the interfaces between components; it is defined in + the docutils.nodes module. It is not required that this data + structure be used *internally* by any of the components, just + *between* components. This data structure is similar to a DOM + tree whose schema is documented in an XML DTD (eXtensible Markup + Language Document Type Definition), which comes in two parts: + + - the Docutils Generic DTD, docutils.dtd [6], and + + - the OASIS Exchange Table Model, soextbl.dtd [7]. + + The DTD defines a rich set of elements, suitable for many input + and output formats. The DTD retains all information necessary to + reconstruct the original input text, or a reasonable facsimile + thereof. + + + Error Handling + ============== + + When the parser encounters an error in markup, it inserts a system + message (DTD element "system_message"). 
There are five levels of + system messages: + + - Level-0, "DEBUG": an internal reporting issue. There is no + effect on the processing. Level-0 system messages are + handled separately from the others. + + - Level-1, "INFO": a minor issue that can be ignored. There is + little or no effect on the processing. Typically level-1 system + messages are not reported. + + - Level-2, "WARNING": an issue that should be addressed. If + ignored, there may be unpredictable problems with the output. + Typically level-2 system messages are reported but do not halt + processing. + + - Level-3, "ERROR": a major issue that should be addressed. If + ignored, the output will contain errors. Typically level-3 + system messages are reported but do not halt processing. + + - Level-4, "SEVERE": a critical error that must be addressed. + Typically level-4 system messages are turned into exceptions + which halt processing. If ignored, the output will contain + severe errors. + + Although the initial message levels were devised independently, + they have a strong correspondence to VMS error condition severity + levels [8]; the names in quotes for levels 1 through 4 were + borrowed from VMS. Error handling has since been influenced by + the log4j project [9].
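The five levels described above could be modeled roughly as follows. This is a minimal sketch in modern Python, not the Docutils implementation: the names Severity, SketchReporter, SystemMessageError, report_level, halt_level, and debug_log are assumptions made for illustration, and only the level numbers and level names are taken from the text.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The five system-message levels named in the text."""
    DEBUG = 0
    INFO = 1
    WARNING = 2
    ERROR = 3
    SEVERE = 4


class SystemMessageError(Exception):
    """Hypothetical exception raised when a message reaches the halt level."""


class SketchReporter:
    """Illustrative stand-in; not the real docutils.utils.Reporter API."""

    def __init__(self, report_level=Severity.WARNING,
                 halt_level=Severity.SEVERE):
        self.report_level = report_level  # lowest level that gets reported
        self.halt_level = halt_level      # lowest level that halts processing
        self.reported = []                # (level, text) pairs that were reported
        self.debug_log = []               # level-0 messages, kept separately

    def system_message(self, level, text):
        level = Severity(level)
        if level is Severity.DEBUG:
            # Level-0 messages are handled separately from the others.
            self.debug_log.append(text)
            return
        if level >= self.halt_level:
            # Typically level-4 messages become exceptions.
            raise SystemMessageError(f"{level.name}: {text}")
        if level >= self.report_level:
            self.reported.append((level, text))
```

With these assumed defaults, INFO messages are silently dropped, WARNING and ERROR messages are recorded without stopping, and a SEVERE message raises. Keeping both thresholds as parameters is one possible design choice; the text says only what "typically" happens at each level.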
+ + +References and Footnotes + + [1] PEP 256, Docstring Processing System Framework, Goodger + http://www.python.org/peps/pep-0256.html + + [2] PEP 224, Attribute Docstrings, Lemburg + http://www.python.org/peps/pep-0224.html + + [3] PEP 216, Docstring Format, Zadka + http://www.python.org/peps/pep-0216.html + + [4] http://www.rfc-editor.org/rfc/rfc1766.txt + + [5] http://lcweb.loc.gov/standards/iso639-2/englangn.html + + [6] http://docutils.sourceforge.net/spec/docutils.dtd + + [7] http://docstring.sourceforge.net/spec/soextblx.dtd + + [8] http://www.openvms.compaq.com:8000/73final/5841/ + 5841pro_027.html#error_cond_severity + + [9] http://jakarta.apache.org/log4j/ + + [10] http://www.python.org/sigs/doc-sig/ + + +Project Web Site + + A SourceForge project has been set up for this work at + http://docutils.sourceforge.net/. + + +Copyright + + This document has been placed in the public domain. + + +Acknowledgements + + This document borrows ideas from the archives of the Python + Doc-SIG [10]. Thanks to all members past & present. + + + +Local Variables: +mode: indented-text +indent-tabs-mode: nil +fill-column: 70 +sentence-end-double-space: t +End: diff --git a/docs/ref/doctree.txt b/docs/ref/doctree.txt new file mode 100644 index 000000000..90aea7054 --- /dev/null +++ b/docs/ref/doctree.txt @@ -0,0 +1,344 @@ +================================== + Docutils Document Tree Structure +================================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ + +This document describes the internal data structure representing +document trees in Docutils. The data structure is defined by the +hierarchy of classes in the ``docutils.nodes`` module. It is also +formally described by the `Docutils Generic DTD`_ XML document type +definition, docutils.dtd_, which is the definitive source for element +hierarchy details. 
+ +Below is a simplified diagram of the hierarchy of element types in the +Docutils document tree structure. An element may contain any other +elements immediately below it in the diagram. Text in square brackets +indicates notes. Element types in parentheses indicate recursive or +one-to-many relationships; sections may contain (sub)sections, tables +contain further body elements, etc. :: + + +--------------------------------------------------------------------+ + | document [may begin with a title, subtitle, docinfo] | + | +--------------------------------------+ + | | sections [each begins with a title] | + +-----------------------------+-------------------------+------------+ + | [body elements:] | (sections) | + | | - literal | - lists | | - hyperlink +------------+ + | | blocks | - tables | | targets | + | para- | - doctest | - block | foot- | - sub. defs | + | graphs | blocks | quotes | notes | - comments | + +---------+-----------+----------+-------+--------------+ + | [text]+ | [text] | (body elements) | [text] | + | (inline +-----------+------------------+--------------+ + | markup) | + +---------+ + + +------------------- + Element Hierarchy +------------------- + +A class hierarchy has been implemented in nodes.py where the position +of the element (the level at which it can occur) is significant: +for example, the Root, Structural, Body, and Inline classes. Certain +transformations will be easier because we can use isinstance() on +them.
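To illustrate how category classes make isinstance() checks useful, here is a small hedged sketch. The classes below are hypothetical stand-ins that only borrow the category names from the paragraph above; they are not the actual definitions in nodes.py, and the helper collect() is an assumption made for this example.

```python
class Node:
    """Hypothetical base class for everything in the tree."""


class Element(Node):
    """Hypothetical element that simply holds a list of child nodes."""

    def __init__(self, *children):
        self.children = list(children)


# Category mix-ins: they carry no behavior, only a position in the hierarchy.
class Root: pass
class Structural: pass
class Body: pass
class Inline: pass


# Element classes named after element types from the diagram.
class document(Root, Structural, Element): pass
class section(Structural, Element): pass
class paragraph(Body, Element): pass
class emphasis(Inline, Element): pass


def collect(node, category):
    """Return every node in the subtree that belongs to the given category."""
    found = [node] if isinstance(node, category) else []
    for child in getattr(node, "children", []):
        found.extend(collect(child, category))
    return found


tree = document(section(paragraph(emphasis()), paragraph()))
body_names = [type(n).__name__ for n in collect(tree, Body)]
```

A transformation that should touch only body-level elements can then filter on Body without listing every concrete element class, which is the convenience the paragraph alludes to.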
+ +The elements making up Docutils document trees can be categorized into +the following groups: + +- _`Root element`: document_ + +- _`Title elements`: title_, subtitle_ + +- _`Bibliographic elements`: docinfo_, author_, authors_, + organization_, contact_, version_, revision_, status_, date_, + copyright_ + +- _`Structural elements`: document_, section_, topic_, transition_ + +- _`Body elements`: + + - _`General body elements`: paragraph_, literal_block_, + block_quote_, doctest_block_, table_, figure_, image_, footnote_ + + - _`Lists`: bullet_list_, enumerated_list_, definition_list_, + field_list_, option_list_ + + - _`Admonitions`: note_, tip_, warning_, error_, caution_, danger_, + important_ + + - _`Special body elements`: target_, substitution_definition_, + comment_, system_warning_ + +- _`Inline elements`: emphasis_, strong_, interpreted_, literal_, + reference_, target_, footnote_reference_, substitution_reference_, + image_, problematic_ + + +``Node`` +======== + + +``Text`` +======== + + +``Element`` +=========== + + +``TextElement`` +=============== + + +------------------- + Element Reference +------------------- + +``document`` +============ +description + +contents + +External attributes +------------------- +`Common external attributes`_. + + +Internal attributes +------------------- +- `Common internal attributes`_. +- ``explicittargets`` +- ``implicittargets`` +- ``externaltargets`` +- ``indirecttargets`` +- ``refnames`` +- ``anonymoustargets`` +- ``anonymousrefs`` +- ``autofootnotes`` +- ``autofootnoterefs`` +- ``reporter`` + + +--------------------- + Attribute Reference +--------------------- + +External Attributes +=================== + +Through the `%basic.atts;`_ parameter entity, all elements share the +following _`common external attributes`: id_, name_, dupname_, +source_. 
+ + +``anonymous`` +------------- +The ``anonymous`` attribute + + +``auto`` +-------- +The ``auto`` attribute + + +``dupname`` +----------- +The ``dupname`` attribute + + +``id`` +------ +The ``id`` attribute + + +``name`` +-------- +The ``name`` attribute + + +``refid`` +--------- +The ``refid`` attribute + + +``refname`` +----------- +The ``refname`` attribute + + +``refuri`` +---------- +The ``refuri`` attribute + + +``source`` +---------- +The ``source`` attribute + + +``xml:space`` +------------- +The ``xml:space`` attribute + + +Internal Attributes +=================== + +All element objects share the following _`common internal attributes`: +rawsource_, children_, attributes_, tagname_. + + +------------------------ + DTD Parameter Entities +------------------------ + +``%basic.atts;`` +================ +The ``%basic.atts;`` parameter entity lists attributes common to all +elements. See `Common external attributes`_. + + +``%body.elements;`` +=================== +The ``%body.elements;`` parameter entity + + +``%inline.elements;`` +===================== +The ``%inline.elements;`` parameter entity + + +``%reference.atts;`` +==================== +The ``%reference.atts;`` parameter entity + + +``%structure.model;`` +===================== +The ``%structure.model;`` parameter entity + + +``%text.model;`` +================ +The ``%text.model;`` parameter entity + + +-------------------------------- + Appendix: Miscellaneous Topics +-------------------------------- + +Representation of Horizontal Rules +================================== + +Having added the "horizontal rule" construct to the reStructuredText_ +spec, a decision had to be made as to how to reflect the construct in +the implementation of the document tree. Given this source:: + + Document + ======== + + Paragraph + + -------- + + Paragraph + +The horizontal rule indicates a "transition" (in prose terms) or the +start of a new "division".
Before implementation, the parsed document +tree would be:: + + <document> + <section name="document"> + <title> + Document + <paragraph> + Paragraph + -------- <--- error here + <paragraph> + Paragraph + +There are several possibilities for the implementation. Solution 3 +was chosen. + +1. Implement horizontal rules as "divisions" or segments. A + "division" is a title-less, non-hierarchical section. The first + try at an implementation looked like this:: + + <document> + <section name="document"> + <title> + Document + <paragraph> + Paragraph + <division> + <paragraph> + Paragraph + + But the two paragraphs are really at the same level; they shouldn't + appear to be at different levels. There's really an invisible + "first division". The horizontal rule splits the document body + into two segments, which should be treated uniformly. + +2. Treating "divisions" uniformly brings us to the second + possibility:: + + <document> + <section name="document"> + <title> + Document + <division> + <paragraph> + Paragraph + <division> + <paragraph> + Paragraph + + With this change, documents and sections will directly contain + divisions and sections, but not body elements. Only divisions will + directly contain body elements. Even without a horizontal rule + anywhere, the body elements of a document or section would be + contained within a division element. This makes the document tree + deeper. This is similar to the way HTML treats document contents: + grouped within a <BODY> element. + +3. Implement them as "transitions", empty elements:: + + <document> + <section name="document"> + <title> + Document + <paragraph> + Paragraph + <transition> + <paragraph> + Paragraph + + A transition would be a "point element", not containing anything, + only identifying a point within the document structure. This keeps + the document tree flatter, but the idea of a "point element" like + "transition" smells bad. A transition isn't a thing itself, it's + the space between two divisions. 
+ + This solution has been chosen for incorporation into the document + tree. + + +.. _Docutils Generic DTD: +.. _docutils.dtd: http://docutils.sourceforge.net/spec/docutils.dtd +.. _reStructuredText: + http://docutils.sourceforge.net/spec/rst/reStructuredText.html + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/ref/docutils.dtd b/docs/ref/docutils.dtd new file mode 100644 index 000000000..d47238b4d --- /dev/null +++ b/docs/ref/docutils.dtd @@ -0,0 +1,514 @@ +<!-- +====================================================================== + Docutils Generic DTD +====================================================================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This DTD has been placed in the public domain. +:Filename: docutils.dtd + +More information about this DTD (document type definition) and the +Docutils project can be found at http://docutils.sourceforge.net/. +The latest version of this DTD is available from +http://docutils.sourceforge.net/spec/docutils.dtd. + +The proposed formal public identifier for this DTD is:: + + +//IDN python.org//DTD Docutils Generic//EN//XML +--> + + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Parameter Entities +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Parameter entities are used to simplify the DTD (reduce duplication) +and to allow the DTD to be customized by wrapper DTDs. Parameter +entities beginning with 'additional' are meant to allow easy extension +by wrapper DTDs. +--> + +<!-- Attributes +================================================================== --> + +<!-- Boolean: no if zero(s), yes if any other value. 
--> +<!ENTITY % yesorno "NMTOKEN"> + +<!ENTITY % additional.basic.atts ""> +<!-- +Attributes shared by all elements in this DTD: + +- `id` is a unique identifier, typically assigned by the system. +- `name` is an identifier assigned in the markup. +- `dupname` is the same as `name`, used when it's a duplicate. +- `source` is the name of the source of this document or fragment. +- `class` is used to transmit individuality information forward. +--> +<!ENTITY % basic.atts + " id ID #IMPLIED + name CDATA #IMPLIED + dupname CDATA #IMPLIED + source CDATA #IMPLIED + class CDATA #IMPLIED + %additional.basic.atts; "> + +<!-- External reference to a URI/URL. --> +<!ENTITY % refuri.att + " refuri CDATA #IMPLIED "> + +<!-- Internal reference to the `id` attribute of an element. --> +<!ENTITY % refid.att + " refid IDREF #IMPLIED "> + +<!-- Space-separated list of id references, for backlinks. --> +<!ENTITY % backrefs.att + " backrefs IDREFS #IMPLIED "> + +<!-- +Internal reference to the `name` attribute of an element. On a +'target' element, 'refname' indicates an indirect target which may +resolve to either an internal or external reference. +--> +<!ENTITY % refname.att + " refname CDATA #IMPLIED "> + +<!ENTITY % additional.reference.atts ""> +<!-- Collected hyperlink reference attributes. --> +<!ENTITY % reference.atts + " %refuri.att; + %refid.att; + %refname.att; + %additional.reference.atts; "> + +<!-- Unnamed hyperlink. --> +<!ENTITY % anonymous.att + " anonymous %yesorno; #IMPLIED "> + +<!-- Auto-numbered footnote. --> +<!ENTITY % auto.att + " auto %yesorno; #IMPLIED "> + +<!-- XML standard attribute for whitespace-preserving elements. 
--> +<!ENTITY % fixedspace.att + " xml:space (default | preserve) #FIXED 'preserve' "> + + +<!-- Element OR-Lists +============================================================= --> + +<!ENTITY % additional.bibliographic.elements ""> +<!ENTITY % bibliographic.elements + " author | authors | organization | contact + | version | revision | status | date | copyright + %additional.bibliographic.elements; "> + +<!ENTITY % additional.structural.elements ""> +<!ENTITY % structural.elements + " section + %additional.structural.elements; "> + +<!ENTITY % additional.body.elements ""> +<!ENTITY % body.elements + " paragraph | literal_block | block_quote | doctest_block| table + | figure | image | footnote | citation + | bullet_list | enumerated_list | definition_list | field_list + | option_list + | attention | caution | danger | error | hint | important | note + | tip | warning + | target | substitution_definition | comment | pending + | system_message | raw + %additional.body.elements; "> + +<!ENTITY % additional.inline.elements ""> +<!ENTITY % inline.elements + " emphasis | strong | interpreted | literal + | reference | footnote_reference | citation_reference + | substitution_reference | target | image | problematic | raw + %additional.inline.elements; "> + +<!-- Element Content Models +================================================================== --> + +<!ENTITY % structure.model + " ( ( (%body.elements; | topic)+, + (transition, (%body.elements; | topic)+ )*, + (%structural.elements;)* ) + | (%structural.elements;)+ ) "> + +<!ENTITY % text.model + " (#PCDATA | %inline.elements;)* "> + +<!-- Table Model +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This DTD uses the Exchange subset of the CALS-table model (OASIS +Technical Memorandum 9901:1999 "XML Exchange Table Model DTD", +http://www.oasis-open.org/html/tm9901.htm). 
+--> + +<!ENTITY % calstblx PUBLIC + "-//OASIS//DTD XML Exchange Table Model 19990315//EN" + "soextblx.dtd"> + +<!-- These parameter entities customize the table model DTD. --> +<!ENTITY % bodyatt " %basic.atts; "> <!-- table elt --> +<!ENTITY % tbl.tgroup.att " %basic.atts; "> +<!ENTITY % tbl.thead.att " %basic.atts; "> +<!ENTITY % tbl.tbody.att " %basic.atts; "> +<!ENTITY % tbl.colspec.att " %basic.atts; "> +<!ENTITY % tbl.row.att " %basic.atts; "> +<!ENTITY % tbl.entry.mdl " (%body.elements;)* "> +<!ENTITY % tbl.entry.att + " %basic.atts; + morecols NMTOKEN #IMPLIED "> + + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Root Element +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!-- Optional elements may be generated by internal processing. --> +<!ELEMENT document + ((title, subtitle?)?, docinfo?, %structure.model;)> +<!ATTLIST document %basic.atts;> + + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Title Elements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!ELEMENT title %text.model;> +<!ATTLIST title + %basic.atts; + %refid.att;> + +<!ELEMENT subtitle %text.model;> +<!ATTLIST subtitle %basic.atts;> + + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Bibliographic Elements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!-- Container for bibliographic elements. May not be empty. 
--> +<!ELEMENT docinfo (%bibliographic.elements;)+> +<!ATTLIST docinfo %basic.atts;> + +<!ELEMENT author %text.model;> +<!ATTLIST author %basic.atts;> + +<!ELEMENT authors ((author, organization?, contact?)+)> +<!ATTLIST authors %basic.atts;> + +<!ELEMENT organization %text.model;> +<!ATTLIST organization %basic.atts;> + +<!ELEMENT contact %text.model;> +<!ATTLIST contact %basic.atts;> + +<!ELEMENT version %text.model;> +<!ATTLIST version %basic.atts;> + +<!ELEMENT revision %text.model;> +<!ATTLIST revision %basic.atts;> + +<!ELEMENT status %text.model;> +<!ATTLIST status %basic.atts;> + +<!ELEMENT date %text.model;> +<!ATTLIST date %basic.atts;> + +<!ELEMENT copyright %text.model;> +<!ATTLIST copyright %basic.atts;> + + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Structural Elements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!ELEMENT section (title, %structure.model;)> +<!ATTLIST section %basic.atts;> + +<!ELEMENT topic (title?, (%body.elements;)+)> +<!ATTLIST topic %basic.atts;> + +<!ELEMENT transition EMPTY> +<!ATTLIST transition %basic.atts;> + + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Body Elements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!ELEMENT paragraph %text.model;> +<!ATTLIST paragraph %basic.atts;> + +<!ELEMENT bullet_list (list_item+)> +<!ATTLIST bullet_list + %basic.atts; + bullet CDATA #IMPLIED> + +<!ELEMENT enumerated_list (list_item+)> +<!ATTLIST enumerated_list + %basic.atts; + enumtype (arabic | loweralpha | upperalpha + | lowerroman | upperroman) + #IMPLIED + prefix CDATA #IMPLIED + suffix CDATA #IMPLIED + start NUMBER #IMPLIED> + +<!ELEMENT list_item (%body.elements;)+> +<!ATTLIST list_item %basic.atts;> + +<!ELEMENT definition_list (definition_list_item+)> +<!ATTLIST definition_list %basic.atts;> + +<!ELEMENT definition_list_item (term, classifier?, definition)> +<!ATTLIST 
definition_list_item %basic.atts;> + +<!ELEMENT term %text.model;> +<!ATTLIST term %basic.atts;> + +<!ELEMENT classifier %text.model;> +<!ATTLIST classifier %basic.atts;> + +<!ELEMENT definition (%body.elements;)+> +<!ATTLIST definition %basic.atts;> + +<!ELEMENT field_list (field+)> +<!ATTLIST field_list %basic.atts;> + +<!ELEMENT field (field_name, field_argument*, field_body)> +<!ATTLIST field %basic.atts;> + +<!ELEMENT field_name (#PCDATA)> +<!ATTLIST field_name %basic.atts;> + +<!-- Support for Javadoc-style tags with arguments. --> +<!ELEMENT field_argument (#PCDATA)> +<!ATTLIST field_argument %basic.atts;> + +<!ELEMENT field_body (%body.elements;)+> +<!ATTLIST field_body %basic.atts;> + +<!ELEMENT option_list (option_list_item+)> +<!ATTLIST option_list %basic.atts;> + +<!ELEMENT option_list_item (option_group, description)> +<!ATTLIST option_list_item %basic.atts;> + +<!ELEMENT option_group (option+)> +<!ATTLIST option_group %basic.atts;> + +<!ELEMENT option (option_string, option_argument*)> +<!ATTLIST option %basic.atts;> + +<!ELEMENT option_string (#PCDATA)> +<!ATTLIST option_string %basic.atts;> + +<!-- +`delimiter` contains the string preceding the `option_argument`: +either the string separating it from the `option` (typically either +"=" or " ") or the string between option arguments (typically either +"," or " "). 
+--> +<!ELEMENT option_argument (#PCDATA)> +<!ATTLIST option_argument + %basic.atts; + delimiter CDATA #IMPLIED> + +<!ELEMENT description (%body.elements;)+> +<!ATTLIST description %basic.atts;> + +<!ELEMENT literal_block (#PCDATA)> +<!ATTLIST literal_block + %basic.atts; + %fixedspace.att;> + +<!ELEMENT block_quote (%body.elements;)+> +<!ATTLIST block_quote %basic.atts;> + +<!ELEMENT doctest_block (#PCDATA)> +<!ATTLIST doctest_block + %basic.atts; + %fixedspace.att;> + +<!ELEMENT attention (%body.elements;)+> +<!ATTLIST attention %basic.atts;> + +<!ELEMENT caution (%body.elements;)+> +<!ATTLIST caution %basic.atts;> + +<!ELEMENT danger (%body.elements;)+> +<!ATTLIST danger %basic.atts;> + +<!ELEMENT error (%body.elements;)+> +<!ATTLIST error %basic.atts;> + +<!ELEMENT hint (%body.elements;)+> +<!ATTLIST hint %basic.atts;> + +<!ELEMENT important (%body.elements;)+> +<!ATTLIST important %basic.atts;> + +<!ELEMENT note (%body.elements;)+> +<!ATTLIST note %basic.atts;> + +<!ELEMENT tip (%body.elements;)+> +<!ATTLIST tip %basic.atts;> + +<!ELEMENT warning (%body.elements;)+> +<!ATTLIST warning %basic.atts;> + +<!ELEMENT footnote (label?, (%body.elements;)+)> +<!ATTLIST footnote + %basic.atts; + %backrefs.att; + %auto.att;> + +<!ELEMENT citation (label, (%body.elements;)+)> +<!ATTLIST citation + %basic.atts; + %backrefs.att;> + +<!ELEMENT label (#PCDATA)> +<!ATTLIST label %basic.atts;> + +<!-- Empty except when used as an inline element. --> +<!ELEMENT target (%text.model;)> +<!ATTLIST target + %basic.atts; + %reference.atts; + %anonymous.att;> + +<!ELEMENT substitution_definition (%text.model;)> +<!ATTLIST substitution_definition %basic.atts;> + +<!ELEMENT comment (#PCDATA)> +<!ATTLIST comment + %basic.atts; + %fixedspace.att;> + +<!ELEMENT pending EMPTY> +<!ATTLIST pending %basic.atts;> + +<!ELEMENT figure (image, ((caption, legend?) | legend)) > +<!ATTLIST figure %basic.atts;> + +<!-- Also an inline element.
--> +<!ELEMENT image EMPTY> +<!ATTLIST image + %basic.atts; + uri CDATA #REQUIRED + alt CDATA #IMPLIED + height NMTOKEN #IMPLIED + width NMTOKEN #IMPLIED + scale NMTOKEN #IMPLIED> + +<!ELEMENT caption %text.model;> +<!ATTLIST caption %basic.atts;> + +<!ELEMENT legend (%body.elements;)+> +<!ATTLIST legend %basic.atts;> + +<!-- +Table elements: table, tgroup, colspec, thead, tbody, row, entry. +--> +%calstblx; + +<!-- Used to record processing information. --> +<!ELEMENT system_message (%body.elements;)+> +<!ATTLIST system_message + %basic.atts; + %backrefs.att; + level NMTOKEN #IMPLIED + type CDATA #IMPLIED> + +<!-- Used to pass raw data through the system. Also inline. --> +<!ELEMENT raw %text.model;> +<!ATTLIST raw + %basic.atts; + %fixedspace.att; + format CDATA #IMPLIED> + + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Inline Elements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Inline elements occur within the text contents of body elements. Some +nesting of inline elements is allowed by these definitions, with the +following caveats: + +- An inline element may not contain a nested element of the same type + (e.g. <strong> may not contain another <strong>). +- Nested inline elements may or may not be supported by individual + applications using this DTD. +- The inline elements <footnote_reference>, <citation_reference>, + <literal>, and <image> do not support nesting. 
+--> + +<!ELEMENT emphasis (%text.model;)> +<!ATTLIST emphasis %basic.atts;> + +<!ELEMENT strong (%text.model;)> +<!ATTLIST strong %basic.atts;> + +<!ELEMENT interpreted (%text.model;)> +<!ATTLIST interpreted + %basic.atts; + type CDATA #IMPLIED> + +<!ELEMENT literal (#PCDATA)> +<!ATTLIST literal %basic.atts;> + +<!ELEMENT reference (%text.model;)> +<!ATTLIST reference + %basic.atts; + %reference.atts; + %anonymous.att;> + +<!ELEMENT footnote_reference (#PCDATA)> +<!ATTLIST footnote_reference + %basic.atts; + %reference.atts; + %auto.att;> + +<!ELEMENT citation_reference (#PCDATA)> +<!ATTLIST citation_reference + %basic.atts; + %reference.atts;> + +<!ELEMENT substitution_reference (%text.model;)> +<!ATTLIST substitution_reference + %basic.atts; + %refname.att;> + +<!ELEMENT problematic (%text.model;)> +<!ATTLIST problematic + %basic.atts; + %refid.att;> + + +<!-- +Local Variables: +mode: sgml +indent-tabs-mode: nil +fill-column: 70 +End: +--> diff --git a/docs/ref/rst/directives.txt b/docs/ref/rst/directives.txt new file mode 100644 index 000000000..cbb8b4609 --- /dev/null +++ b/docs/ref/rst/directives.txt @@ -0,0 +1,360 @@ +============================= + reStructuredText Directives +============================= +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ + +This document describes the directives implemented in the reference +reStructuredText parser. + + +.. contents:: + + +------------- + Admonitions +------------- + +DTD elements: attention, caution, danger, error, hint, important, +note, tip, warning. + +Directive block: directive data and all following indented text +are interpreted as body elements. + +Admonitions are specially marked "topics" that can appear anywhere an +ordinary body element can. They contain arbitrary body elements. +Typically, an admonition is rendered as an offset block in a document, +sometimes outlined or shaded, with a title matching the admonition +type. For example:: + + .. 
DANGER:: + Beware killer rabbits! + +This directive might be rendered something like this:: + + +------------------------+ + | !DANGER! | + | | + | Beware killer rabbits! | + +------------------------+ + +The following admonition directives have been implemented: + +- attention +- caution +- danger +- error +- hint +- important +- note +- tip +- warning + +Any text immediately following the directive indicator (on the same +line and/or indented on following lines) is interpreted as a directive +block and is parsed for normal body elements. For example, the +following "note" admonition directive contains one paragraph and a +bullet list consisting of two list items:: + + .. note:: This is a note admonition. + This is the second line of the first paragraph. + + - The note contains all indented body elements + following. + - It includes this bullet list. + + +-------- + Images +-------- + +There are two image directives: "image" and "figure". + + +Image +===== + +DTD element: image. + +Directive block: directive data and following indented lines (up to +the first blank line) are interpreted as image URI and optional +attributes. + +An "image" is a simple picture:: + + .. image:: picture.png + +The URI for the image source file is specified in the directive data. +As with hyperlink targets, the image URI may begin on the same line as +the explicit markup start and target name, or it may begin in an +indented text block immediately following, with no intervening blank +lines. If there are multiple lines in the link block, they are +stripped of leading and trailing whitespace and joined together. + +Optionally, the image link block may end with a flat field list, the +_`image attributes`. For example:: + + .. 
image:: picture.png + :height: 100 + :width: 200 + :scale: 50 + :alt: alternate text + +The following attributes are recognized: + +``alt`` : text + Alternate text: a short description of the image, displayed by + applications that cannot display images, or spoken by applications + for visually impaired users. +``height`` : integer + The height of the image in pixels, used to reserve space or scale + the image vertically. +``width`` : integer + The width of the image in pixels, used to reserve space or scale + the image horizontally. +``scale`` : integer + The uniform scaling factor of the image, a percentage (but no "%" + symbol is required or allowed). "100" means full-size. + + +Figure +====== + +DTD elements: figure, image, caption, legend. + +Directive block: directive data and all following indented text are +interpreted as an image URI, optional attributes, a caption, and an +optional legend. + +A "figure" consists of image_ data (optionally including `image +attributes`_), an optional caption (a single paragraph), and an +optional legend (arbitrary body elements):: + + .. figure:: picture.png + :scale: 50 + :alt: map to buried treasure + + This is the caption of the figure (a simple paragraph). + + The legend consists of all elements after the caption. In this + case, the legend consists of this paragraph and the following + table: + + +-----------------------+-----------------------+ + | Symbol | Meaning | + +=======================+=======================+ + | .. image:: tent.png | Campground | + +-----------------------+-----------------------+ + | .. image:: waves.png | Lake | + +-----------------------+-----------------------+ + | .. image:: peak.png | Mountain | + +-----------------------+-----------------------+ + +There must be a blank line before the caption paragraph and before the +legend. To specify a legend without a caption, use an empty comment +("..") in place of the caption. 
+ + +--------------------- + Document Components +--------------------- + +Table of Contents +================= + +DTD elements: pending, topic. + +Directive block: directive data and following indented lines (up to +the first blank line) are interpreted as the topic title and optional +attributes. + +The "contents" directive inserts a table of contents (TOC) in two +passes: initial parse and transform. During the initial parse, a +"pending" element is generated which acts as a placeholder, storing +the TOC title and any attributes internally. At a later stage in the +processing, the "pending" element is replaced by a "topic" element, a +title and the table of contents proper. + +The directive in its simplest form:: + + .. contents:: + +Language-dependent boilerplate text will be used for the title. The +English default title text is "Contents". + +An explicit title may be specified:: + + .. contents:: Table of Contents + +The title may span lines, although it is not recommended:: + + .. contents:: Here's a very long Table of + Contents title + +Attributes may be specified for the directive, using a field list:: + + .. contents:: Table of Contents + :depth: 2 + +If the default title is to be used, the attribute field list may begin +on the same line as the directive marker:: + + .. contents:: :depth: 2 + +The following attributes are recognized: + +``depth`` : integer + The number of section levels that are collected in the table of + contents. +``local`` : empty + Generate a local table of contents. Entries will only include + subsections of the section in which the directive is given. If no + explicit title is given, the table of contents will not be titled. + + +Footnotes +========= + +DTD elements: pending, topic. + +@@@ + + +Citations +========= + +DTD elements: pending, topic. + +@@@ + + +Topic +===== + +DTD element: topic. + +@@@ + + +--------------- + HTML-Specific +--------------- + +Meta +==== + +Non-standard element: meta. 
+ +Directive block: directive data and following indented lines (up to +the first blank line) are parsed for a flat field list. + +The "meta" directive is used to specify HTML metadata stored in HTML +META tags. "Metadata" is data about data, in this case data about web +pages. Metadata is used to describe and classify web pages in the +World Wide Web, in a form that is easy for search engines to extract +and collate. + +Within the directive block, a flat field list provides the syntax for +metadata. The field name becomes the contents of the "name" attribute +of the META tag, and the field body (interpreted as a single string +without inline markup) becomes the contents of the "content" +attribute. For example:: + + .. meta:: + :description: The reStructuredText plaintext markup language + :keywords: plaintext, markup language + +This would be converted to the following HTML:: + + <meta name="description" + content="The reStructuredText plaintext markup language"> + <meta name="keywords" content="plaintext, markup language"> + +Support for other META attributes ("http-equiv", "scheme", "lang", +"dir") is provided through field arguments, which must be of the form +"attr=value":: + + .. meta:: + :description lang=en: An amusing story + :description lang=fr: Une histoire amusante + +And their HTML equivalents:: + + <meta name="description" lang="en" content="An amusing story"> + <meta name="description" lang="fr" content="Une histoire amusante"> + +Some META tags use an "http-equiv" attribute instead of the "name" +attribute. To specify "http-equiv" META tags, simply omit the name:: + + .. meta:: + :http-equiv=Content-Type: text/html; charset=ISO-8859-1 + +HTML equivalent:: + + <meta http-equiv="Content-Type" + content="text/html; charset=ISO-8859-1"> + + +Imagemap +======== + +Non-standard element: imagemap. + + +--------------- + Miscellaneous +--------------- + +Raw Data Pass-Through +===================== + +DTD element: pending. 
+ +Directive block: the directive data is interpreted as an output format +type, and all following indented text is stored verbatim, +uninterpreted. + +The "raw" directive indicates non-reStructuredText data that is to be +passed untouched to the Writer. The name of the output format is +given in the directive data. During the initial parse, a "pending" +element is generated which acts as a placeholder, storing the format +and raw data internally. The interpretation of the code is up to the +Writer. A Writer may ignore any raw output not matching its format. + +For example, the following input would be passed untouched by an HTML +Writer:: + + .. raw:: html + <hr width=50 size=10> + +A LaTeX Writer could insert the following raw content into its +output stream:: + + .. raw:: latex + \documentclass[twocolumn]{article} + + +Restructuredtext-Test-Directive +=============================== + +DTD element: system_message. + +Directive block: directive data is stored, and all following indented +text is interpreted as a literal block. + +This directive is provided for test purposes only. (Nobody is +expected to type in a name *that* long!) It is converted into a +level-1 (info) system message showing the directive data, possibly +followed by a literal block containing the rest of the directive +block. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/ref/rst/introduction.txt b/docs/ref/rst/introduction.txt new file mode 100644 index 000000000..3d7cfc5f8 --- /dev/null +++ b/docs/ref/rst/introduction.txt @@ -0,0 +1,307 @@ +===================================== + An Introduction to reStructuredText +===================================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ + +reStructuredText_ is an easy-to-read, what-you-see-is-what-you-get +plaintext markup syntax and parser system. 
It is useful for in-line +program documentation (such as Python docstrings), for quickly +creating simple web pages, and for standalone documents. +reStructuredText_ is a proposed revision and reinterpretation of the +StructuredText_ and Setext_ lightweight markup systems. + +reStructuredText is designed for extensibility for specific +application domains. Its parser is a component of Docutils_. + +This document defines the goals_ of reStructuredText and provides a +history_ of the project. It is written using the reStructuredText +markup, and therefore serves as an example of its use. Please also +see an analysis of the `problems with StructuredText`_ and the +`reStructuredText markup specification`_ itself at the project's web page, +http://docutils.sourceforge.net/rst.html. + +.. _reStructuredText: http://docutils.sourceforge.net/rst.html +.. _StructuredText: + http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage +.. _Setext: http://docutils.sourceforge.net/mirror/setext.html +.. _Docutils: http://docutils.sourceforge.net/ +.. _Problems with StructuredText: problems.html +.. _reStructuredText Markup Specification: reStructuredText.html + + +Goals +===== + +The primary goal of reStructuredText_ is to define a markup syntax for +use in Python docstrings and other documentation domains, that is +readable and simple, yet powerful enough for non-trivial use. The +intended purpose of the reStructuredText markup is twofold: + +- the establishment of a set of standard conventions allowing the + expression of structure within plaintext, and + +- the conversion of such documents into useful structured data + formats. + +The secondary goal of reStructuredText is to be accepted by the Python +community (by way of being blessed by PythonLabs and the BDFL [#]_) as +a standard for Python inline documentation (possibly one of several +standards, to account for taste). + +.. [#] Python's creator and "Benevolent Dictator For Life", + Guido van Rossum. 
+ +To clarify the primary goal, here are specific design goals, in order, +beginning with the most important: + +1. Readable. The marked-up text must be easy to read without any + prior knowledge of the markup language. It should be as easily + read in raw form as in processed form. + +2. Unobtrusive. The markup that is used should be as simple and + unobtrusive as possible. The simplicity of markup constructs + should be roughly proportional to their frequency of use. The most + common constructs, with natural and obvious markup, should be the + simplest and most unobtrusive. Less common constructs, for which + there is no natural or obvious markup, should be distinctive. + +3. Unambiguous. The rules for markup must not be open for + interpretation. For any given input, there should be one and only + one possible output (including error output). + +4. Unsurprising. Markup constructs should not cause unexpected output + upon processing. As a fallback, there must be a way to prevent + unwanted markup processing when a markup construct is used in a + non-markup context (for example, when documenting the markup syntax + itself). + +5. Intuitive. Markup should be as obvious and easily remembered as + possible, for the author as well as for the reader. Constructs + should take their cues from such naturally occurring sources as + plaintext email messages, newsgroup postings, and text + documentation such as README.txt files. + +6. Easy. It should be easy to mark up text using any ordinary text + editor. + +7. Scalable. The markup should be applicable regardless of the length + of the text. + +8. Powerful. The markup should provide enough constructs to produce a + reasonably rich structured document. + +9. Language-neutral. The markup should apply to multiple natural (as + well as artificial) languages, not only English. + +10. Extensible. The markup should provide a simple syntax and + interface for adding more complex general markup, and custom + markup. + +11. 
Output-format-neutral. The markup will be appropriate for + processing to multiple output formats, and will not be biased + toward any particular format. + +The design goals above were used as criteria for accepting or +rejecting syntax, or selecting between alternatives. + +It is emphatically *not* the goal of reStructuredText to define +docstring semantics, such as docstring contents or docstring length. +These issues are orthogonal to the markup syntax and beyond the scope +of this specification. + +Also, it is not the goal of reStructuredText to maintain compatibility +with StructuredText_ or Setext_. reStructuredText shamelessly steals +their great ideas and ignores the not-so-great. + +Author's note: + + Due to the nature of the problem we're trying to solve (or, + perhaps, due to the nature of the proposed solution), the above + goals unavoidably conflict. I have tried to extract and distill + the wisdom accumulated over the years in the Python Doc-SIG_ + mailing list and elsewhere, to come up with a coherent and + consistent set of syntax rules, and the above goals by which to + measure them. + + There will inevitably be people who disagree with my particular + choices. Some desire finer control over their markup, others + prefer less. Some are concerned with very short docstrings, + others with full-length documents. This specification is an + effort to provide a reasonably rich set of markup constructs in a + reasonably simple form, that should satisfy a reasonably large + group of reasonable people. + + David Goodger (goodger@users.sourceforge.net), 2001-04-20 + +.. _Doc-SIG: http://www.python.org/sigs/doc-sig/ + + +History +======= + +reStructuredText_, the specification, is based on StructuredText_ and +Setext_. StructuredText was developed by Jim Fulton of `Zope +Corporation`_ (formerly Digital Creations) and first released in 1996. +It is now released as a part of the open-source 'Z Object Publishing +Environment' (ZOPE_). 
Ian Feldman's and Tony Sanders' earlier Setext_ +specification was either an influence on StructuredText or, by their +similarities, at least evidence of the correctness of this approach. + +I discovered StructuredText_ in late 1999 while searching for a way to +document the Python modules in one of my projects. Version 1.1 of +StructuredText was included in Daniel Larsson's pythondoc_. Although +I was not able to get pythondoc to work for me, I found StructuredText +to be almost ideal for my needs. I joined the Python Doc-SIG_ +(Documentation Special Interest Group) mailing list and found an +ongoing discussion of the shortcomings of the StructuredText +'standard'. This discussion has been going on since the inception of +the mailing list in 1996, and possibly predates it. + +I decided to modify the original module with my own extensions and +some suggested by the Doc-SIG members. I soon realized that the +module was not written with extension in mind, so I embarked upon a +general reworking, including adapting it to the 're' regular +expression module (the original inspiration for the name of this +project). Soon after I completed the modifications, I discovered that +StructuredText.py was up to version 1.23 in the ZOPE distribution. +Implementing the new syntax extensions from version 1.23 proved to be +an exercise in frustration, as the complexity of the module had become +overwhelming. + +In 2000, development on StructuredTextNG_ ("Next Generation") began at +`Zope Corporation`_ (then Digital Creations). It seems to have many +improvements, but still suffers from many of the problems of classic +StructuredText. + +I decided that a complete rewrite was in order, and even started a +`reStructuredText SourceForge project`_ (now inactive). My +motivations (the 'itches' I aim to 'scratch') are as follows: + +- I need a standard format for inline documentation of the programs I + write. 
This inline documentation has to be convertible to other + useful formats, such as HTML. I believe many others have the same + need. + +- I believe in the Setext/StructuredText idea and want to help + formalize the standard. However, I feel the current specifications + and implementations have flaws that desperately need fixing. + +- reStructuredText could form part of the foundation for a + documentation extraction and processing system, greatly benefitting + Python. But it is only a part, not the whole. reStructuredText is + a markup language specification and a reference parser + implementation, but it does not aspire to be the entire system. I + don't want reStructuredText or a hypothetical Python documentation + processor to die stillborn because of overambition. + +- Most of all, I want to help ease the documentation chore, the bane + of many a programmer. + +Unfortunately I was sidetracked and stopped working on this project. +In November 2000 I made the time to enumerate the problems of +StructuredText and possible solutions, and complete the first draft of +a specification. This first draft was posted to the Doc-SIG in three +parts: + +- `A Plan for Structured Text`__ +- `Problems With StructuredText`__ +- `reStructuredText: Revised Structured Text Specification`__ + +__ http://mail.python.org/pipermail/doc-sig/2000-November/001239.html +__ http://mail.python.org/pipermail/doc-sig/2000-November/001240.html +__ http://mail.python.org/pipermail/doc-sig/2000-November/001241.html + +In March 2001 a flurry of activity on the Doc-SIG spurred me to +further revise and refine my specification, the result of which you +are now reading. An offshoot of the reStructuredText project has been +the realization that a single markup scheme, no matter how well +thought out, may not be enough. In order to tame the endless debates +on Doc-SIG, a flexible `Docstring Processing System framework`_ needed +to be constructed. 
This framework has become the more important of +the two projects; reStructuredText_ has found its place as one +possible choice for a single component of the larger framework. + +The project web site and the first project release were rolled out in +June 2001, including posting the second draft of the spec [#spec-2]_ +and the first draft of PEPs 256, 257, and 258 [#peps-1]_ to the +Doc-SIG. These documents and the project implementation proceeded to +evolve at a rapid pace. Implementation history details can be found +in the project file, HISTORY.txt_. + +In November 2001, the reStructuredText parser was nearing completion. +Development of the parser continued with the addition of small +convenience features, improvements to the syntax, the filling in of +gaps, and bug fixes. After a long holiday break, in early 2002 most +development moved over to the other Docutils components, the +"Readers", "Writers", and "Transforms". A "standalone" reader +(processes standalone text file documents) was completed in February, +and a basic HTML writer (producing HTML 4.01, using CSS-1) was +completed in early March. + +`PEP 287`_, "reStructuredText Standard Docstring Format", was created +to formally propose reStructuredText as a standard format for Python +docstrings, PEPs, and other files. It was first posted to +comp.lang.python_ and the Python-dev_ mailing list on 2002-04-02. + +Version 0.4 of the reStructuredText__ and `Docstring Processing +System`_ projects were released in April 2002. The two projects were +immediately merged, renamed to "Docutils_", and a 0.1 release soon +followed. + +.. __: `reStructuredText SourceForge project`_ + +.. 
[#spec-2] + - `An Introduction to reStructuredText`__ + - `Problems With StructuredText`__ + - `reStructuredText Markup Specification`__ + - `Python Extensions to the reStructuredText Markup + Specification`__ + + __ http://mail.python.org/pipermail/doc-sig/2001-June/001858.html + __ http://mail.python.org/pipermail/doc-sig/2001-June/001859.html + __ http://mail.python.org/pipermail/doc-sig/2001-June/001860.html + __ http://mail.python.org/pipermail/doc-sig/2001-June/001861.html + +.. [#peps-1] + - `PEP 256: Docstring Processing System Framework`__ + - `PEP 258: DPS Generic Implementation Details`__ + - `PEP 257: Docstring Conventions`__ + + Current working versions of the PEPs can be found in + http://docutils.sourceforge.net/spec/, and official versions can be + found in the `master PEP repository`_. + + __ http://mail.python.org/pipermail/doc-sig/2001-June/001855.html + __ http://mail.python.org/pipermail/doc-sig/2001-June/001856.html + __ http://mail.python.org/pipermail/doc-sig/2001-June/001857.html + + +.. _Zope Corporation: http://www.zope.com +.. _ZOPE: http://www.zope.org +.. _reStructuredText SourceForge project: + http://structuredtext.sourceforge.net/ +.. _pythondoc: http://starship.python.net/crew/danilo/pythondoc/ +.. _StructuredTextNG: + http://dev.zope.org/Members/jim/StructuredTextWiki/StructuredTextNG +.. _HISTORY.txt: + http://docutils.sourceforge.net/HISTORY.txt +.. _PEP 287: http://docutils.sourceforge.net/spec/pep-0287.txt +.. _Docstring Processing System framework: + http://docutils.sourceforge.net/spec/pep-0256.txt +.. _comp.lang.python: news:comp.lang.python +.. _Python-dev: http://mail.python.org/pipermail/python-dev/ +.. _Docstring Processing System: http://docstring.sourceforge.net/ +.. _Docutils: http://docutils.sourceforge.net/ +.. _master PEP repository: http://www.python.org/peps/ + + +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/ref/rst/restructuredtext.txt b/docs/ref/rst/restructuredtext.txt new file mode 100644 index 000000000..149ef3fd4 --- /dev/null +++ b/docs/ref/rst/restructuredtext.txt @@ -0,0 +1,2344 @@ +======================================= + reStructuredText Markup Specification +======================================= +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ + +reStructuredText_ is plain text that uses simple and intuitive +constructs to indicate the structure of a document. These constructs +are equally easy to read in raw and processed forms. This document is +itself an example of reStructuredText (raw, if you are reading the +text file, or processed, if you are reading an HTML document, for +example). The reStructuredText parser is a component of Docutils_. + +Simple, implicit markup is used to indicate special constructs, such +as section headings, bullet lists, and emphasis. The markup used is +as minimal and unobtrusive as possible. Less often-used constructs +and extensions to the basic reStructuredText syntax may have more +elaborate or explicit markup. + +reStructuredText is applicable to documents of any length, from the +very small (such as inline program documentation fragments, e.g. +Python docstrings) to the quite large (this document). + +The first section gives a quick overview of the syntax of the +reStructuredText markup by example. A complete specification is given +in the `Syntax Details`_ section. + +`Literal blocks`_ (in which no markup processing is done) are used for +examples throughout this document, to illustrate the plain text +markup. + + +.. contents:: + + +----------------------- + Quick Syntax Overview +----------------------- + +A reStructuredText document is made up of body or block-level +elements, and may be structured into sections. 
Sections_ are +indicated through title style (underlines & optional overlines). +Sections contain body elements and/or subsections. Some body elements +contain further elements, such as lists containing list items, which +in turn may contain paragraphs and other body elements. Others, such +as paragraphs, contain text and `inline markup`_ elements. + +Here are examples of `body elements`_: + +- Paragraphs_ (and `inline markup`_):: + + Paragraphs contain text and may contain inline markup: + *emphasis*, **strong emphasis**, `interpreted text`, ``inline + literals``, standalone hyperlinks (http://www.python.org), + external hyperlinks (Python_), internal cross-references + (example_), footnote references ([1]_), citation references + ([CIT2002]_), substitution references (|example|), and _`inline + hyperlink targets`. + + Paragraphs are separated by blank lines and are left-aligned. + +- Five types of lists: + + 1. `Bullet lists`_:: + + - This is a bullet list. + + - Bullets can be "-", "*", or "+". + + 2. `Enumerated lists`_:: + + 1. This is an enumerated list. + + 2. Enumerators may be arabic numbers, letters, or roman + numerals. + + 3. `Definition lists`_:: + + what + Definition lists associate a term with a definition. + + how + The term is a one-line phrase, and the definition is one + or more paragraphs or body elements, indented relative to + the term. + + 4. `Field lists`_:: + + :what: Field lists map field names to field bodies, like + database records. They are often part of an extension + syntax. + + :how: The field marker is a colon, the field name, optional + field arguments, and a colon. + + The field body may contain one or more body elements, + indented relative to the field marker. + + 5. 
`Option lists`_, for listing command-line options:: + + -a command-line option "a" + -b file options can have arguments + and long descriptions + --long options can be long also + --input=file long options can also have + arguments + /V DOS/VMS-style options too + + There must be at least two spaces between the option and the + description. + +- `Literal blocks`_:: + + Literal blocks are indented, and indicated with a double-colon + ("::") at the end of the preceding paragraph (right here -->):: + + if literal_block: + text = 'is left as-is' + spaces_and_linebreaks = 'are preserved' + markup_processing = None + +- `Block quotes`_:: + + Block quotes consist of indented body elements: + + This theory, that is mine, is mine. + + Anne Elk (Miss) + +- `Doctest blocks`_:: + + >>> print 'Python-specific usage examples; begun with ">>>"' + Python-specific usage examples; begun with ">>>" + >>> print '(cut and pasted from interactive Python sessions)' + (cut and pasted from interactive Python sessions) + +- Tables_:: + + +------------------------+------------+----------+ + | Header row, column 1 | Header 2 | Header 3 | + +========================+============+==========+ + | body row 1, column 1 | column 2 | column 3 | + +------------------------+------------+----------+ + | body row 2 | Cells may span | + +------------------------+-----------------------+ + +- `Explicit markup blocks`_ all begin with an explicit block marker, + two periods and a space: + + - Footnotes_:: + + .. [1] A footnote contains body elements, consistently + indented by at least 3 spaces. + + - Citations_:: + + .. [CIT2002] Just like a footnote, except the label is + textual. + + - `Hyperlink targets`_:: + + .. _Python: http://www.python.org + + .. _example: + + The "_example" target above points to this paragraph. + + - Directives_:: + + .. image:: mylogo.png + + - `Substitution definitions`_:: + + .. |symbol here| image:: symbol.png + + - Comments_:: + + .. Comments begin with two dots and a space. 
Anything may + follow, except for the syntax of footnotes/citations, + hyperlink targets, directives, or substitution definitions. + + +---------------- + Syntax Details +---------------- + +Descriptions below list "DTD elements" (XML "generic identifiers") +corresponding to syntax constructs. For details on the hierarchy of +elements, please see `Docutils Document Tree Structure`_ and the +`Generic Plaintext Document Interface DTD`_ XML document type +definition. + + +Whitespace +========== + +Spaces are recommended for indentation_, but tabs may also be used. +Tabs will be converted to spaces. Tab stops are at every 8th column. + +Other whitespace characters (form feeds [chr(12)] and vertical tabs +[chr(11)]) are converted to single spaces before processing. + + +Blank Lines +----------- + +Blank lines are used to separate paragraphs and other elements. +Multiple successive blank lines are equivalent to a single blank line, +except within literal blocks (where all whitespace is preserved). +Blank lines may be omitted when the markup makes element separation +unambiguous, in conjunction with indentation. The first line of a +document is treated as if it is preceded by a blank line, and the last +line of a document is treated as if it is followed by a blank line. + + +Indentation +----------- + +Indentation is used to indicate, and is only significant in +indicating: + +- multi-line contents of list items, +- multiple body elements within a list item (including nested lists), +- the definition part of a definition list item, +- block quotes, +- the extent of literal blocks, and +- the extent of explicit markup blocks. + +Any text whose indentation is less than that of the current level +(i.e., unindented text or "dedents") ends the current level of +indentation. + +Since all indentation is significant, the level of indentation must be +consistent. For example, indentation is the sole markup indicator for +`block quotes`_:: + + This is a top-level paragraph. 
+ + This paragraph belongs to a first-level block quote. + + Paragraph 2 of the first-level block quote. + +Multiple levels of indentation within a block quote will result in +more complex structures:: + + This is a top-level paragraph. + + This paragraph belongs to a first-level block quote. + + This paragraph belongs to a second-level block quote. + + Another top-level paragraph. + + This paragraph belongs to a second-level block quote. + + This paragraph belongs to a first-level block quote. The + second-level block quote above is inside this first-level + block quote. + +When a paragraph or other construct consists of more than one line of +text, the lines must be left-aligned:: + + This is a paragraph. The lines of + this paragraph are aligned at the left. + + This paragraph has problems. The + lines are not left-aligned. In addition + to potential misinterpretation, warning + and/or error messages will be generated + by the parser. + +Several constructs begin with a marker, and the body of the construct +must be indented relative to the marker. For constructs using simple +markers (`bullet lists`_, `enumerated lists`_, footnotes_, citations_, +`hyperlink targets`_, directives_, and comments_), the level of +indentation of the body is determined by the position of the first +line of text, which begins on the same line as the marker. For +example, bullet list bodies must be indented by at least two columns +relative to the left edge of the bullet:: + + - This is the first line of a bullet list + item's paragraph. All lines must align + relative to the first line. [1]_ + + This indented paragraph is interpreted + as a block quote. + + Because it is not sufficiently indented, + this paragraph does not belong to the list + item. + + .. [1] Here's a footnote. The second line is aligned + with the beginning of the footnote label. The ".." + marker is what determines the indentation. 
+ +For constructs using complex markers (`field lists`_ and `option +lists`_), where the marker may contain arbitrary text, the indentation +of the first line *after* the marker determines the left edge of the +body. For example, field lists may have very long markers (containing +the field names):: + + :Hello: This field has a short field name, so aligning the field + body with the first line is feasible. + + :Number-of-African-swallows-required-to-carry-a-coconut: It would + be very difficult to align the field body with the left edge + of the first line. It may even be preferable not to begin the + body on the same line as the marker. + + +Escaping Mechanism +================== + +The character set universally available to plain text documents, 7-bit +ASCII, is limited. No matter what characters are used for markup, +they will already have multiple meanings in written text. Therefore +markup characters *will* sometimes appear in text **without being +intended as markup**. Any serious markup system requires an escaping +mechanism to override the default meaning of the characters used for +the markup. In reStructuredText we use the backslash, commonly used +as an escaping character in other domains. + +A backslash followed by any character escapes that character. The +escaped character represents the character itself, and is prevented +from playing a role in any markup interpretation. The backslash is +removed from the output. A literal backslash is represented by two +backslashes in a row (the first backslash "escapes" the second, +preventing it being interpreted in an "escaping" role). + +There are two contexts in which backslashes have no special meaning: +literal blocks and inline literals. In these contexts, a single +backslash represents a literal backslash, without having to double up.
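The backslash rule described above can be illustrated with a small sketch. It only models the removal of escaping backslashes outside literal contexts; it does not model how escaped characters are kept out of markup recognition. The function name ``unescape`` and the handling of a trailing lone backslash are assumptions made for this illustration, not part of the specification or of the Docutils parser.

```python
def unescape(text: str) -> str:
    """Drop escaping backslashes: a backslash followed by any
    character yields that character itself.

    Assumption for this sketch: a lone backslash at the very end of
    the text has nothing to escape and is simply dropped.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 1                    # skip the escaping backslash
            if i < len(text):
                out.append(text[i])   # keep the escaped character
        else:
            out.append(text[i])
        i += 1
    return "".join(out)
```

For instance, ``unescape(r"\*not emphasis\*")`` gives ``*not emphasis*``, and two backslashes in a row collapse to one literal backslash.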
+ +Please note that the reStructuredText specification and parser do not +address the issue of the representation or extraction of text input +(how and in what form the text actually *reaches* the parser). +Backslashes and other characters may serve a character-escaping +purpose in certain contexts and must be dealt with appropriately. For +example, Python uses backslashes in strings to escape certain +characters, but not others. The simplest solution when backslashes +appear in Python docstrings is to use raw docstrings:: + + r"""This is a raw docstring. Backslashes (\) are not touched.""" + + +Reference Names +=============== + +Simple reference names are single words consisting of alphanumerics +plus internal hyphens, underscores, and periods; no whitespace or other +characters are allowed. Footnote labels (Footnotes_ & `Footnote +References`_), citation labels (Citations_ & `Citation References`_), +`interpreted text`_ roles, and some `hyperlink references`_ use the +simple reference name syntax. + +Reference names using punctuation or whose names are phrases (two or +more space-separated words) are called "phrase-references". +Phrase-references are expressed by enclosing the phrase in backquotes +and treating the backquoted text as a reference name:: + + Want to learn about `my favorite programming language`_? + + .. _my favorite programming language: http://www.python.org + +Simple reference names may also optionally use backquotes. + +Reference names are whitespace-neutral and case-insensitive. When +resolving reference names internally: + +- whitespace is normalized (one or more spaces, horizontal or vertical + tabs, newlines, carriage returns, or form feeds, are interpreted as + a single space), and + +- case is normalized (all alphabetic characters are converted to + lowercase).
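The two normalization steps above can be sketched in a few lines of Python. The function name ``normalize_name`` is hypothetical, and the sketch is not taken from the Docutils source. Note that ``str.split()`` without arguments treats a slightly larger set of characters as whitespace than the list given above.

```python
def normalize_name(name: str) -> str:
    """Normalize a reference name for comparison.

    - case: all alphabetic characters are lowercased
    - whitespace: any run of whitespace (spaces, tabs, newlines,
      carriage returns, form feeds, vertical tabs) becomes one space,
      and leading/trailing whitespace is dropped
    """
    return " ".join(name.lower().split())
```

Two reference names refer to the same target when their normalized forms compare equal.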
+ +For example, the following `hyperlink references`_ are equivalent:: + + - `A HYPERLINK`_ + - `a hyperlink`_ + - `A + Hyperlink`_ + +Hyperlinks_, footnotes_, and citations_ all share the same namespace +for reference names. The labels of citations (simple reference names) +and manually-numbered footnotes (numbers) are entered into the same +database as other hyperlink names. This means that a footnote +(defined as "``.. [1]``") which can be referred to by a footnote +reference (``[1]_``), can also be referred to by a plain hyperlink +reference (1_). Of course, each type of reference (hyperlink, +footnote, citation) may be processed and rendered differently. Some +care should be taken to avoid reference name conflicts. + + +Document Structure +================== + +Document +-------- + +DTD element: document. + +The top-level element of a parsed reStructuredText document is the +"document" element. After initial parsing, the document element is a +simple container for a document fragment, consisting of `body +elements`_, transitions_, and sections_, but lacking a document title +or other bibliographic elements. The code that calls the parser may +choose to run one or more optional post-parse transforms_, +rearranging the document fragment into a complete document with a +title and possibly other metadata elements (author, date, etc.; see +`Bibliographic Fields`_). + +Specifically, there is no way to specify a document title and subtitle +explicitly in reStructuredText. Instead, a lone top-level section +title (see Sections_ below) can be treated as the document +title. Similarly, a lone second-level section title immediately after +the "document title" can become the document subtitle. See the +`DocTitle transform`_ for details. + + +Sections +-------- + +DTD elements: section, title. + +Sections are identified through their titles, which are marked up with +adornment: "underlines" below the title text, and, in some cases, +matching "overlines" above the title. 
An underline/overline is a +single repeated punctuation character that begins in column 1 and +forms a line extending at least as far as the right edge of the title +text. Specifically, an underline/overline character may be any +non-alphanumeric printable 7-bit ASCII character [#]_. An +underline/overline must be at least 4 characters long (to avoid +mistaking ellipses ["..."] for overlines). When an overline is used, +the length and character used must match the underline. There may be +any number of levels of section titles, although some output formats +may have limits (HTML has 6 levels). + +.. [#] The following are all valid section title adornment + characters:: + + ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~ + + Some characters are more suitable than others. The following are + recommended:: + + = - ` : ' " ~ ^ _ * + # < > + +Rather than imposing a fixed number and order of section title +adornment styles, the order enforced will be the order as encountered. +The first style encountered will be an outermost title (like HTML H1), +the second style will be a subtitle, the third will be a subsubtitle, +and so on. + +Below are examples of section title styles:: + + =============== + Section Title + =============== + + --------------- + Section Title + --------------- + + Section Title + ============= + + Section Title + ------------- + + Section Title + ````````````` + + Section Title + ''''''''''''' + + Section Title + ............. + + Section Title + ~~~~~~~~~~~~~ + + Section Title + ************* + + Section Title + +++++++++++++ + + Section Title + ^^^^^^^^^^^^^ + +When a title has both an underline and an overline, the title text may +be inset, as in the first two examples above. This is merely +aesthetic and not significant. Underline-only title text may *not* be +inset. + +A blank line after a title is optional. All text blocks up to the +next title of the same or higher level are included in a section (or +subsection, etc.). 
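The underline rules given earlier in this section (a single repeated non-alphanumeric printable 7-bit ASCII character, at least 4 characters long, extending at least as far as the title text) can be checked with a short sketch. The function name ``is_valid_underline`` is an assumption for illustration, and the sketch assumes that both strings start in column 1 and that each character occupies one column.

```python
import string


def is_valid_underline(title: str, underline: str) -> bool:
    """Check a section title underline against the rules above."""
    if len(underline) < 4:                   # avoids mistaking "..." for adornment
        return False
    char = underline[0]
    if char not in string.punctuation:       # non-alphanumeric printable ASCII
        return False
    if underline != char * len(underline):   # a single repeated character
        return False
    return len(underline) >= len(title)      # reaches the title's right edge
```

``string.punctuation`` contains exactly the 32 characters listed in the footnote above.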
+ +All section title styles need not be used, nor need any specific +section title style be used. However, a document must be consistent +in its use of section titles: once a hierarchy of title styles is +established, sections must use that hierarchy. + +Each section title automatically generates a hyperlink target pointing +to the section. The text of the hyperlink target (the "reference +name") is the same as that of the section title. See `Implicit +Hyperlink Targets`_ for a complete description. + +Sections may contain `body elements`_, transitions_, and nested +sections. + + +Transitions +----------- + +DTD element: transition. + + Instead of subheads, extra space or a type ornament between + paragraphs may be used to mark text divisions or to signal + changes in subject or emphasis. + + (The Chicago Manual of Style, 14th edition, section 1.80) + +Transitions are commonly seen in novels and short fiction, as a gap +spanning one or more lines, with or without a type ornament such as a +row of asterisks. Transitions separate other body elements. A +transition should not begin or end a section or document, nor should +two transitions be immediately adjacent. + +The syntax for a transition marker is a horizontal line of 4 or more +repeated punctuation characters. The syntax is the same as section +title underlines without title text. Transition markers require blank +lines before and after:: + + Para. + + ---------- + + Para. + +Unlike section title underlines, no hierarchy of transition markers is +enforced, nor do differences in transition markers accomplish +anything. It is recommended that a single consistent style be used. + +The processing system is free to render transitions in output in any +way it likes. For example, horizontal rules (``<HR>``) in HTML output +would be an obvious choice. + + +Body Elements +============= + +Paragraphs +---------- + +DTD element: paragraph. 
+ +Paragraphs consist of blocks of left-aligned text with no markup +indicating any other body element. Blank lines separate paragraphs +from each other and from other body elements. Paragraphs may contain +`inline markup`_. + +Syntax diagram:: + + +------------------------------+ + | paragraph | + | | + +------------------------------+ + + +------------------------------+ + | paragraph | + | | + +------------------------------+ + + +Bullet Lists +------------ + +DTD elements: bullet_list, list_item. + +A text block which begins with a "-", "*", or "+", followed by +whitespace, is a bullet list item (a.k.a. "unordered" list item). +List item bodies must be left-aligned and indented relative to the +bullet; the text immediately after the bullet determines the +indentation. For example:: + + - This is the first bullet list item. The blank line above the + first list item is required; blank lines between list items + (such as below this paragraph) are optional. + + - This is the first paragraph in the second item in the list. + + This is the second paragraph in the second item in the list. + The blank line above this paragraph is required. The left edge + of this paragraph lines up with the paragraph above, both + indented relative to the bullet. + + - This is a sublist. The bullet lines up with the left edge of + the text blocks above. A sublist is a new list so requires a + blank line above and below. + + - This is the third item of the main list. + + This paragraph is not part of the list. + +Here are examples of **incorrectly** formatted bullet lists:: + + - This first line is fine. + A blank line is required between list items and paragraphs. + (Warning) + + - The following line appears to be a new sublist, but it is not: + - This is a paragraph continuation, not a sublist (since there's + no blank line). This line is also incorrectly indented. + - Warnings may be issued by the implementation.
+ +Syntax diagram:: + + +------+-----------------------+ + | "- " | list item | + +------| (body elements)+ | + +-----------------------+ + + +Enumerated Lists +---------------- + +DTD elements: enumerated_list, list_item. + +Enumerated lists (a.k.a. "ordered" lists) are similar to bullet lists, +but use enumerators instead of bullets. An enumerator consists of an +enumeration sequence member and formatting, followed by whitespace. +The following enumeration sequences are recognized: + +- arabic numerals: 1, 2, 3, ... (no upper limit). +- uppercase alphabet characters: A, B, C, ..., Z. +- lower-case alphabet characters: a, b, c, ..., z. +- uppercase Roman numerals: I, II, III, IV, ..., MMMMCMXCIX (4999). +- lowercase Roman numerals: i, ii, iii, iv, ..., mmmmcmxcix (4999). + +The following formatting types are recognized: + +- suffixed with a period: "1.", "A.", "a.", "I.", "i.". +- surrounded by parentheses: "(1)", "(A)", "(a)", "(I)", "(i)". +- suffixed with a right-parenthesis: "1)", "A)", "a)", "I)", "i)". + +A system message will be generated for each of the following cases: + +- The enumerators do not all have the same format and sequence type. + +- The enumerators are not in sequence (i.e., "1.", "3." generates a + level-1 [info] system message and produces two separate lists). + +It is recommended that the enumerator of the first list item be +ordinal-1 ("1", "A", "a", "I", or "i"). Although other start-values +will be recognized, they may not be supported by the output format. + +Lists using Roman numerals must begin with "I"/"i" or a +multi-character value, such as "II" or "XV". Any other +single-character Roman numeral ("V", "X", etc.) will be interpreted as +a letter of the alphabet, not as a Roman numeral. Likewise, lists +using letters of the alphabet may not begin with "I"/"i", since these +are recognized as Roman numeral 1. + +Nested enumerated lists must be created with indentation. For +example:: + + 1. Item 1. + + a) Item 1a. + b) Item 1b. 
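The rule for telling Roman numerals apart from letters at the start of a list can be sketched as follows. This is an illustration under stated assumptions, not the Docutils implementation: the function name ``first_enumerator_type`` and the returned labels are hypothetical, the input is the bare sequence member (formatting such as "." or parentheses already stripped), and only the first enumerator of a list is classified.

```python
import re

# Roman numerals up to MMMMCMXCIX (4999), written in uppercase.
ROMAN = re.compile(
    r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)


def first_enumerator_type(text):
    """Classify the first enumerator of a list, or return None."""
    if text.isascii() and text.isdigit():
        return "arabic"
    if not (text.isascii() and text.isalpha()
            and (text.isupper() or text.islower())):
        return None
    case = "upper" if text.isupper() else "lower"
    # A single letter other than "I"/"i" is a letter of the alphabet,
    # even if it could also be read as a Roman numeral ("V", "X", ...).
    if len(text) == 1 and text.upper() != "I":
        return case + "alpha"
    if ROMAN.match(text.upper()):
        return case + "roman"
    return None
```

With this sketch, "I" and "XV" are classified as Roman numerals, while "V" and "X" on their own are classified as letters, matching the paragraph above.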
+ +Example syntax diagram:: + + +-------+----------------------+ + | "1. " | list item | + +-------| (body elements)+ | + +----------------------+ + + +Definition Lists +---------------- + +DTD elements: definition_list, definition_list_item, term, classifier, +definition. + +Each definition list item contains a term, an optional classifier, and +a definition. A term is a simple one-line word or phrase. An +optional classifier may follow the term on the same line, after " : " +(space, colon, space). A definition is a block indented relative to +the term, and may contain multiple paragraphs and other body elements. +There may be no blank line between a term and a definition (this +distinguishes definition lists from `block quotes`_). Blank lines are +required before the first and after the last definition list item, but +are optional in-between. For example:: + + term 1 + Definition 1. + + term 2 + Definition 2, paragraph 1. + + Definition 2, paragraph 2. + + term 3 : classifier + Definition 3. + +A definition list may be used in various ways, including: + +- As a dictionary or glossary. The term is the word itself, a + classifier may be used to indicate the usage of the term (noun, + verb, etc.), and the definition follows. + +- To describe program variables. The term is the variable name, a + classifier may be used to indicate the type of the variable (string, + integer, etc.), and the definition describes the variable's use in + the program. This usage of definition lists supports the classifier + syntax of Grouch_, a system for describing and enforcing a Python + object schema. + +Syntax diagram:: + + +---------------------------+ + | term [ " : " classifier ] | + +--+------------------------+--+ + | definition | + | (body elements)+ | + +---------------------------+ + + +Field Lists +----------- + +DTD elements: field_list, field, field_name, field_argument, +field_body. + +Field lists are mappings from field names to field bodies, modeled on +RFC822_ headers. 
A field name is made up of one or more letters, +numbers, and punctuation, except colons (":") and whitespace. Field +names are case-insensitive. There may be additional data separated +from the field name, called field arguments. The field name and +optional field argument(s), along with a single colon prefix and +suffix, together form the field marker. The field marker is followed +by whitespace and the field body. The field body may contain multiple +body elements, indented relative to the field marker. The first line +after the field name marker determines the indentation of the field +body. For example:: + + :Date: 2001-08-16 + :Version: 1 + :Authors: - Me + - Myself + - I + :Indentation: Since the field marker may be quite long, the second + and subsequent lines of the field body do not have to line up + with the first line, but they must be indented relative to the + field name marker, and they must line up with each other. + :Parameter i: integer + +Field arguments are separated from the field name and each other by +whitespace, and may not contain colons (":"). The interpretation of +field arguments is up to the application. For example:: + + :name1 word number=1: + Both "word" and "number=1" are single words. + +The syntax for field arguments may be extended in the future. For +example, quoted phrases may be treated as a single argument, and +direct support for the "name=value" syntax may be added. + +Applications of reStructuredText may recognize field names and +transform fields or field bodies in certain contexts; they are often +used as part of an extension syntax. See `Bibliographic Fields`_ +below for one example, or the "image" directive in `reStructuredText +Directives`_ for another. + +Standard RFC822 headers cannot be used for this construct because they +are ambiguous. A word followed by a colon at the beginning of a line +is common in written text. 
However, in well-defined contexts such as +when a field list invariably occurs at the beginning of a document +(PEPs and email messages), standard RFC822 headers could be used. + +Syntax diagram (simplified):: + + +------------------------------+------------+ + | ":" name (" " argument)* ":" | field body | + +-------+----------------------+ | + | (body elements)+ | + +-----------------------------------+ + + +Bibliographic Fields +```````````````````` + +DTD elements: docinfo, author, authors, organization, contact, +version, status, date, copyright, topic. + +When a field list is the first non-comment element in a document +(after the document title, if there is one), it may have certain +specific fields transformed to document bibliographic data. This +bibliographic data corresponds to the front matter of a book, such as +the title page and copyright page. + +Certain field names (listed below) are recognized and transformed to +the corresponding DTD elements, most becoming child elements of the +"docinfo" element. No ordering is required of these fields, although +they may be rearranged to fit the document structure, as noted. +Unless otherwise indicated in the list below, each of the +bibliographic elements' field bodies may contain a single paragraph +only. Field bodies may be checked for `RCS keywords`_ and cleaned up. +Any unrecognized fields will remain in a generic field list in the +document body. + +The registered bibliographic field names and their corresponding DTD +elements are as follows: + +- Field name "Author": author element. +- "Authors": authors. May contain either: a single paragraph + consisting of a list of authors, separated by ";" or ","; or a + bullet list whose elements each contain a single paragraph per + author. +- "Organization": organization. +- "Contact": contact. +- "Version": version. +- "Status": status. +- "Date": date. +- "Copyright": copyright. +- "Abstract": topic. May contain arbitrary body elements. 
Only one + abstract is allowed. The abstract becomes a topic element with + title "Abstract" (or language equivalent) immediately following the + docinfo element. + +This field-name-to-element mapping can be extended, or replaced for +other languages. See the `DocInfo transform`_ implementation +documentation for details. + + +RCS Keywords +```````````` + +`Bibliographic fields`_ recognized by the parser are normally checked +for RCS [#]_ keywords and cleaned up [#]_. RCS keywords may be +entered into source files as "$keyword$", and once stored under RCS or +CVS [#]_, they are expanded to "$keyword: expansion text $". For +example, a "Status" field will be transformed to a "status" element:: + + :Status: $keyword: expansion text $ + +.. [#] Revision Control System. +.. [#] RCS keyword processing can be turned off (unimplemented). +.. [#] Concurrent Versions System. CVS uses the same keywords as RCS. + +Processed, the "status" element's text will become simply "expansion +text". The dollar sign delimiters and leading RCS keyword name are +removed. + +The RCS keyword processing only kicks in when all of these conditions +hold: + +1. The field list is in bibliographic context (first non-comment + construct in the document, after a document title if there is + one). + +2. The field name is a recognized bibliographic field name. + +3. The sole contents of the field is an expanded RCS keyword, of the + form "$Keyword: data $". + + +Option Lists +------------ + +DTD elements: option_list, option_list_item, option_group, option, +option_string, option_argument, description. + +Option lists are two-column lists of command-line options and +descriptions, documenting a program's options. For example:: + + -a Output all. + -b Output both (this description is + quite long). + -c arg Output just arg. + --long Output all day long. + + -p This option has two paragraphs in the description. + This is the first. + + This is the second.
Blank lines may be omitted between + options (as above) or left in (as here and below). + + --very-long-option A VMS-style option. Note the adjustment for + the required two spaces. + + --an-even-longer-option + The description can also start on the next line. + + -2, --two This option has two variants. + + -f FILE, --file=FILE These two options are synonyms; both have + arguments. + + /V A VMS/DOS-style option. + +There are several types of options recognized by reStructuredText: + +- Short POSIX options consist of one dash and an option letter. +- Long POSIX options consist of two dashes and an option word; some + systems use a single dash. +- Old GNU-style "plus" options consist of one plus and an option + letter ("plus" options are deprecated now, their use discouraged). +- DOS/VMS options consist of a slash and an option letter or word. + +Please note that both POSIX-style and DOS/VMS-style options may be +used by DOS or Windows software. These and other variations are +sometimes used mixed together. The names above have been chosen for +convenience only. + +The syntax for short and long POSIX options is based on the syntax +supported by Python's getopt.py_ module, which implements an option +parser similar to the `GNU libc getopt_long()`_ function but with some +restrictions. There are many variant option systems, and +reStructuredText option lists do not support all of them. + +Although long POSIX and DOS/VMS option words may be allowed to be +truncated by the operating system or the application when used on the +command line, reStructuredText option lists do not show or support +this with any special syntax. The complete option word should be +given, supported by notes about truncation if and when applicable. + +Options may be followed by an argument placeholder, whose role and +syntax should be explained in the description text. Either a space or +an equals sign may be used as a delimiter between options and option +argument placeholders.
+ +Multiple option "synonyms" may be listed, sharing a single +description. They must be separated by comma-space. + +There must be at least two spaces between the option(s) and the +description. The description may contain multiple body elements. The +first line after the option marker determines the indentation of the +description. As with other types of lists, blank lines are required +before the first option list item and after the last, but are optional +between option entries. + +Syntax diagram (simplified):: + + +----------------------------+-------------+ + | option [" " argument] " " | description | + +-------+--------------------+ | + | (body elements)+ | + +----------------------------------+ + + +Literal Blocks +-------------- + +DTD element: literal_block. + +A paragraph consisting of two colons ("::") signifies that all +following **indented** text blocks comprise a literal block. No +markup processing is done within a literal block. It is left as-is, +and is typically rendered in a monospaced typeface:: + + This is a typical paragraph. A literal block follows. + + :: + + for a in [5,4,3,2,1]: # this is program code, shown as-is + print a + print "it's..." + # a literal block continues until the indentation ends + + This text has returned to the indentation of the first paragraph, + is outside of the literal block, and is therefore treated as an + ordinary paragraph. + +The paragraph containing only "::" will be completely removed from the +output; no empty paragraph will remain. + +As a convenience, the "::" is recognized at the end of any paragraph. +If immediately preceded by whitespace, both colons will be removed +from the output (this is the "partially minimized" form). When text +immediately precedes the "::", *one* colon will be removed from the +output, leaving only one colon visible (i.e., "::" will be replaced by +":"; this is the "fully minimized" form). 
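The three ways a "::" can end a paragraph, as described above, can be sketched as a small function. The name ``strip_literal_marker`` and its return convention (visible text plus a flag, with ``None`` standing for a paragraph that is removed entirely) are assumptions made for this illustration; the sketch is not the Docutils implementation.

```python
def strip_literal_marker(paragraph: str):
    """Return (visible_text, introduces_literal_block).

    visible_text is None when the paragraph consists only of "::"
    and is therefore removed from the output.
    """
    text = paragraph.rstrip()
    if not text.endswith("::"):
        return paragraph, False        # ordinary paragraph, untouched
    if text == "::":
        return None, True              # expanded form: paragraph removed
    head = text[:-2]
    if head[-1].isspace():
        return head.rstrip(), True     # partially minimized: both colons go
    return head + ":", True            # fully minimized: one colon remains
```

Under this sketch, "Paragraph: ::" and "Paragraph::" both render as "Paragraph:" and both introduce a literal block.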
+ +In other words, these are all equivalent (please pay attention to the +colons after "Paragraph"): + +1. Expanded form:: + + Paragraph: + + :: + + Literal block + +2. Partially minimized form:: + + Paragraph: :: + + Literal block + +3. Fully minimized form:: + + Paragraph:: + + Literal block + +The minimum leading whitespace will be removed from each line of the +literal block. Other than that, all whitespace (including line +breaks) is preserved. Blank lines are required before and after a +literal block, but these blank lines are not included as part of the +literal block. + +Syntax diagram:: + + +------------------------------+ + | paragraph | + | (ends with "::") | + +------------------------------+ + +---------------------------+ + | literal block | + +---------------------------+ + + +Block Quotes +------------ + +DTD element: block_quote. + +A text block that is indented relative to the preceding text, without +markup indicating it to be a literal block, is a block quote. All +markup processing (for body elements and inline markup) continues +within the block quote:: + + This is an ordinary paragraph, introducing a block quote. + + "It is my business to know things. That is my trade." + + -- Sherlock Holmes + +Blank lines are required before and after a block quote, but these +blank lines are not included as part of the block quote. + +Syntax diagram:: + + +------------------------------+ + | (current level of | + | indentation) | + +------------------------------+ + +---------------------------+ + | block quote | + | (body elements)+ | + +---------------------------+ + + +Doctest Blocks +-------------- + +DTD element: doctest_block. + +Doctest blocks are interactive Python sessions cut-and-pasted into +docstrings. They are meant to illustrate usage by example, and +provide an elegant and powerful testing environment via the `doctest +module`_ in the Python standard library. 
+ +Doctest blocks are text blocks which begin with ``">>> "``, the Python +interactive interpreter main prompt, and end with a blank line. +Doctest blocks are treated as a special case of literal blocks, +without requiring the literal block syntax. If both are present, the +literal block syntax takes priority over Doctest block syntax:: + + This is an ordinary paragraph. + + >>> print 'this is a Doctest block' + this is a Doctest block + + The following is a literal block:: + + >>> This is not recognized as a doctest block by + reStructuredText. It *will* be recognized by the doctest + module, though! + +Indentation is not required for doctest blocks. + + +Tables +------ + +DTD elements: table, tgroup, colspec, thead, tbody, row, entry. + +Tables are described with a visual outline made up of the characters +"-", "=", "|", and "+". The hyphen ("-") is used for horizontal lines +(row separators). The equals sign ("=") may be used to separate +optional header rows from the table body. The vertical bar ("|") is +used for vertical lines (column separators). The plus sign ("+") is +used for intersections of horizontal and vertical lines. + +Each table cell is treated as a miniature document; the top and bottom +cell boundaries act as delimiting blank lines. Each cell contains +zero or more body elements. Cell contents may include left and/or +right margins, which are removed before processing. Example:: + + +------------------------+------------+----------+----------+ + | Header row, column 1 | Header 2 | Header 3 | Header 4 | + | (header rows optional) | | | | + +========================+============+==========+==========+ + | body row 1, column 1 | column 2 | column 3 | column 4 | + +------------------------+------------+----------+----------+ + | body row 2 | Cells may span columns. | + +------------------------+------------+---------------------+ + | body row 3 | Cells may | - Table cells | + +------------------------+ span rows. 
| - contain | + | body row 4 | | - body elements. | + +------------------------+------------+---------------------+ + +As with other body elements, blank lines are required before and after +tables. Tables' left edges should align with the left edge of +preceding text blocks; otherwise, the table is considered to be part +of a block quote. + +Some care must be taken with tables to avoid undesired interactions +with cell text in rare cases. For example, the following table +contains a cell in row 2 spanning from column 2 to column 4:: + + +--------------+----------+-----------+-----------+ + | row 1, col 1 | column 2 | column 3 | column 4 | + +--------------+----------+-----------+-----------+ + | row 2 | | + +--------------+----------+-----------+-----------+ + | row 3 | | | | + +--------------+----------+-----------+-----------+ + +If a vertical bar is used in the text of that cell, it could have +unintended effects if accidentally aligned with column boundaries:: + + +--------------+----------+-----------+-----------+ + | row 1, col 1 | column 2 | column 3 | column 4 | + +--------------+----------+-----------+-----------+ + | row 2 | Use the command ``ls | more``. | + +--------------+----------+-----------+-----------+ + | row 3 | | | | + +--------------+----------+-----------+-----------+ + +Several solutions are possible. All that is needed is to break the +continuity of the cell outline rectangle. One possibility is to shift +the text by adding an extra space before:: + + +--------------+----------+-----------+-----------+ + | row 1, col 1 | column 2 | column 3 | column 4 | + +--------------+----------+-----------+-----------+ + | row 2 | Use the command ``ls | more``. 
| + +--------------+----------+-----------+-----------+ + | row 3 | | | | + +--------------+----------+-----------+-----------+ + +Another possibility is to add an extra line to row 2:: + + +--------------+----------+-----------+-----------+ + | row 1, col 1 | column 2 | column 3 | column 4 | + +--------------+----------+-----------+-----------+ + | row 2 | Use the command ``ls | more``. | + | | | + +--------------+----------+-----------+-----------+ + | row 3 | | | | + +--------------+----------+-----------+-----------+ + + +Explicit Markup Blocks +---------------------- + +An explicit markup block is a text block: + +- whose first line begins with ".." followed by whitespace (the + "explicit markup start"), +- whose second and subsequent lines (if any) are indented relative to + the first, and +- which ends before an unindented line. + +Explicit markup blocks are analogous to bullet list items, with ".." +as the bullet. The text immediately after the explicit markup start +determines the indentation of the block body. Blank lines are +required between explicit markup blocks and other elements, but are +optional between explicit markup blocks where unambiguous. + +The explicit markup syntax is used for footnotes, citations, hyperlink +targets, directives, and comments. + + +Footnotes +````````` + +DTD elements: footnote, label. + +Each footnote consists of an explicit markup start (".. "), a left +square bracket, the footnote label, a right square bracket, and +whitespace, followed by indented body elements. A footnote label can +be: + +- a whole decimal number consisting of one or more digits, + +- a single "#" (denoting `auto-numbered footnotes`_), + +- a "#" followed by a simple reference name (an `autonumber label`_), + or + +- a single "*" (denoting `auto-symbol footnotes`_). + +If the first body element within a footnote is a simple paragraph, it +may begin on the same line as the footnote label. 
Other elements must +begin on a new line, consistently indented (by at least 3 spaces) and +left-aligned. + +Footnotes may occur anywhere in the document, not only at the end. +Where or how they appear in the processed output depends on the +processing system. + +Here is a manually numbered footnote:: + + .. [1] Body elements go here. + +Each footnote automatically generates a hyperlink target pointing to +itself. The text of the hyperlink target name is the same as that of +the footnote label. `Auto-numbered footnotes`_ generate a number as +their footnote label and reference name. See `Implicit Hyperlink +Targets`_ for a complete description of the mechanism. + +Syntax diagram:: + + +-------+-------------------------+ + | ".. " | "[" label "]" footnote | + +-------+ | + | (body elements)+ | + +-------------------------+ + + +Auto-Numbered Footnotes +....................... + +A number sign ("#") may be used as the first character of a footnote +label to request automatic numbering of the footnote or footnote +reference. + +The first footnote to request automatic numbering is assigned the +label "1", the second is assigned the label "2", and so on (assuming +there are no manually numbered footnotes present; see `Mixed Manual +and Auto-Numbered Footnotes`_ below). A footnote which has +automatically received a label "1" generates an implicit hyperlink +target with name "1", just as if the label was explicitly specified. + +.. _autonumber label: `autonumber labels`_ + +A footnote may specify a label explicitly while at the same time +requesting automatic numbering: ``[#label]``. These labels are called +_`autonumber labels`. Autonumber labels do two things: + +- On the footnote itself, they generate a hyperlink target whose name + is the autonumber label (doesn't include the "#"). + +- They allow an automatically numbered footnote to be referred to more + than once, as a footnote reference or hyperlink reference. 
For + example:: + + If [#note]_ is the first footnote reference, it will show up as + "[1]". We can refer to it again as [#note]_ and again see + "[1]". We can also refer to it as note_ (an ordinary internal + hyperlink reference). + + .. [#note] This is the footnote labeled "note". + +The numbering is determined by the order of the footnotes, not by the +order of the references. For footnote references without autonumber +labels (``[#]_``), the footnotes and footnote references must be in +the same relative order but need not alternate in lock-step. For +example:: + + [#]_ is a reference to footnote 1, and [#]_ is a reference to + footnote 2. + + .. [#] This is footnote 1. + .. [#] This is footnote 2. + .. [#] This is footnote 3. + + [#]_ is a reference to footnote 3. + +Special care must be taken if footnotes themselves contain +auto-numbered footnote references, or if multiple references are made +in close proximity. Footnotes and references are noted in the order +they are encountered in the document, which is not necessarily the +same as the order in which a person would read them. + + +Auto-Symbol Footnotes +..................... + +An asterisk ("*") may be used for footnote labels to request automatic +symbol generation for footnotes and footnote references. The asterisk +may be the only character in the label. For example:: + + Here is a symbolic footnote reference: [*]_. + + .. [*] This is the footnote. + +A transform will insert symbols as labels into corresponding footnotes +and footnote references. + +The standard Docutils system uses the following symbols for footnote +marks [#]_: + +- asterisk/star ("*") +- dagger (HTML character entity "†") +- double dagger ("‡") +- section mark ("§") +- pilcrow or paragraph mark ("¶") +- number sign ("#") +- spade suit ("♠") +- heart suit ("♥") +- diamond suit ("♦") +- club suit ("♣") + +.. 
[#] This list was inspired by the list of symbols for "Note + Reference Marks" in The Chicago Manual of Style, 14th edition, + section 12.51. "Parallels" ("\|\|") were given in CMoS instead of + the pilcrow. The last four symbols (the card suits) were added + arbitrarily. + +If more than ten symbols are required, the same sequence will be +reused, doubled and then tripled, and so on ("**" etc.). + + +Mixed Manual and Auto-Numbered Footnotes +........................................ + +Manual and automatic footnote numbering may both be used within a +single document, although the results may not be expected. Manual +numbering takes priority. Only unused footnote numbers are assigned +to auto-numbered footnotes. The following example should be +illustrative:: + + [2]_ will be "2" (manually numbered), + [#]_ will be "3" (anonymous auto-numbered), and + [#label]_ will be "1" (labeled auto-numbered). + + .. [2] This footnote is labeled manually, so its number is fixed. + + .. [#label] This autonumber-labeled footnote will be labeled "1". + It is the first auto-numbered footnote and no other footnote + with label "1" exists. The order of the footnotes is used to + determine numbering, not the order of the footnote references. + + .. [#] This footnote will be labeled "3". It is the second + auto-numbered footnote, but footnote label "2" is already used. + + +Citations +````````` + +Citations are identical to footnotes except that they use only +non-numeric labels such as ``[note]`` or ``[GVR2001]``. Citation +labels are simple `reference names`_ (case-insensitive single words +consisting of alphanumerics plus internal hyphens, underscores, and +periods; no whitespace). Citations may be rendered separately and +differently from footnotes. For example:: + + Here is a citation reference: [CIT2002]_. + + .. [CIT2002] This is the citation. It's just like a footnote, + except the label is textual. + + +.. 
_hyperlinks: + +Hyperlink Targets +````````````````` + +DTD element: target. + +These are also called _`explicit hyperlink targets`, to differentiate +them from `implicit hyperlink targets`_ defined below. + +Hyperlink targets identify a location within or outside of a document, +which may be linked to by `hyperlink references`_. + +Hyperlink targets may be named or anonymous. Named hyperlink targets +consist of an explicit markup start (".. "), an underscore, the +reference name (no trailing underscore), a colon, whitespace, and a +link block:: + + .. _hyperlink-name: link-block + +Reference names are whitespace-neutral and case-insensitive. See +`Reference Names`_ for details and examples. + +Anonymous hyperlink targets consist of an explicit markup start +(".. "), two underscores, a colon, whitespace, and a link block; there +is no reference name:: + + .. __: anonymous-hyperlink-target-link-block + +An alternate syntax for anonymous hyperlinks consists of two +underscores, a space, and a link block:: + + __ anonymous-hyperlink-target-link-block + +See `Anonymous Hyperlinks`_ below. + +There are three types of hyperlink targets: internal, external, and +indirect. + +1. _`Internal hyperlink targets` have empty link blocks. They provide + an end point allowing a hyperlink to connect one place to another + within a document. An internal hyperlink target points to the + element following the target. For example:: + + Clicking on this internal hyperlink will take us to the target_ + below. + + .. _target: + + The hyperlink target above points to this paragraph. + + Internal hyperlink targets may be "chained". Multiple adjacent + internal hyperlink targets all point to the same element:: + + .. _target1: + .. _target2: + + The targets "target1" and "target2" are synonyms; they both + point to this paragraph. 
+ + If the element "pointed to" is an external hyperlink target (with a + URI in its link block; see #2 below) the URI from the external + hyperlink target is propagated to the internal hyperlink targets; + they will all "point to" the same URI. There is no need to + duplicate a URI. For example, all three of the following hyperlink + targets refer to the same URI:: + + .. _Python DOC-SIG mailing list archive: + .. _archive: + .. _Doc-SIG: http://mail.python.org/pipermail/doc-sig/ + + An inline form of internal hyperlink target is available; see + `Inline Hyperlink Targets`_. + +2. _`External hyperlink targets` have an absolute or relative URI in + their link blocks. For example, take the following input:: + + See the Python_ home page for info. + + .. _Python: http://www.python.org + + After processing into HTML, the hyperlink might be expressed as:: + + See the <A HREF="http://www.python.org">Python</A> home page + for info. + + An external hyperlink's URI may begin on the same line as the + explicit markup start and target name, or it may begin in an + indented text block immediately following, with no intervening + blank lines. If there are multiple lines in the link block, they + are stripped of leading and trailing whitespace and concatenated. + The following external hyperlink targets are equivalent:: + + .. _one-liner: http://docutils.sourceforge.net/rst.html + + .. _starts-on-this-line: http:// + docutils.sourceforge.net/rst.html + + .. _entirely-below: + http://docutils. + sourceforge.net/rst.html + + If an external hyperlink target's URI contains an underscore as its + last character, it must be escaped to avoid being mistaken for an + indirect hyperlink target:: + + This link_ refers to a file called ``underscore_``. + + .. _link: underscore\_ + +3. _`Indirect hyperlink targets` have a hyperlink reference in their + link blocks. 
In the following example, target "one" indirectly + references whatever target "two" references, and target "two" + references target "three", an internal hyperlink target. In + effect, all three reference the same thing:: + + .. _one: two_ + .. _two: three_ + .. _three: + + Just as with `hyperlink references`_ anywhere else in a document, + if a phrase-reference is used in the link block it must be enclosed + in backquotes. As with `external hyperlink targets`_, the link + block of an indirect hyperlink target may begin on the same line as + the explicit markup start or the next line. It may also be split + over multiple lines, in which case the lines are joined with + whitespace before being normalized. + + For example, the following indirect hyperlink targets are + equivalent:: + + .. _one-liner: `A HYPERLINK`_ + .. _entirely-below: + `a hyperlink`_ + .. _split: `A + Hyperlink`_ + +If a reference name contains a colon followed by whitespace, either: + +- the phrase must be enclosed in backquotes:: + + .. _`FAQTS: Computers: Programming: Languages: Python`: + http://python.faqts.com/ + +- or the colon(s) must be backslash-escaped in the link target:: + + .. _Chapter One\: "Tadpole Days": + + It's not easy being green... + +See `Implicit Hyperlink Targets`_ below for the resolution of +duplicate reference names. + +Syntax diagram:: + + +-------+----------------------+ + | ".. " | "_" name ":" link | + +-------+ block | + | | + +----------------------+ + + +Anonymous Hyperlinks +.................... + +The `World Wide Web Consortium`_ recommends in its `HTML Techniques +for Web Content Accessibility Guidelines`_ that authors should +"clearly identify the target of each link." Hyperlink references +should be as verbose as possible, but duplicating a verbose hyperlink +name in the target is onerous and error-prone. Anonymous hyperlinks +are designed to allow convenient verbose hyperlink references, and are +analogous to `Auto-Numbered Footnotes`_. 
They are particularly useful +in short or one-off documents. + +Anonymous `hyperlink references`_ are specified with two underscores +instead of one:: + + See `the web site of my favorite programming language`__. + +Anonymous targets begin with ".. __:"; no reference name is required +or allowed:: + + .. __: http://www.python.org + +As a convenient alternative, anonymous targets may begin with "__" +only:: + + __ http://www.python.org + +The reference name of the reference is not used to match the reference +to its target. Instead, the order of anonymous hyperlink references +and targets within the document is significant: the first anonymous +reference will link to the first anonymous target. The number of +anonymous hyperlink references in a document must match the number of +anonymous targets. + + +Directives +`````````` + +DTD elements: depend on the directive. + +Directives are indicated by an explicit markup start (".. ") followed +by the directive type, two colons, and whitespace. Directive types +are case-insensitive single words (alphanumerics plus internal +hyphens, underscores, and periods; no whitespace). Two colons are +used after the directive type for these reasons: + +- To avoid clashes with common comment text like:: + + .. Danger: modify at your own risk! + +- If an implementation of reStructuredText does not recognize a + directive (i.e., the directive-handler is not installed), the entire + directive block (including the directive itself) will be treated as + a literal block, and a level-3 (error) system message generated. + Thus "::" is a natural choice. + +Any text on the first line after the directive indicator is directive +data. The interpretation of directive data is up to the directive +code. Directive data may be interpreted as arguments to the +directive, or simply as the first line of the directive's text block. 
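The first-line rule for directives described above can be sketched with a regular expression. This is an illustrative approximation only, not the pattern used by the reference parser; the names `DIRECTIVE_RE` and `split_directive` are hypothetical.

```python
import re

# Explicit markup start, directive type, two colons, then optional data.
# The type is a single word: alphanumerics plus internal "-", "_", ".".
DIRECTIVE_RE = re.compile(
    r"\.\.\s+([A-Za-z0-9]+(?:[-_.][A-Za-z0-9]+)*)::(?:\s+(.*))?$"
)


def split_directive(line):
    """Return (directive_type, data), or None if line is not a directive."""
    match = DIRECTIVE_RE.match(line)
    if match is None:
        return None
    # Directive types are case-insensitive, so normalize to lower case.
    return match.group(1).lower(), match.group(2) or ""
```

Under this sketch, `split_directive(".. Danger: modify at your own risk!")` returns `None` because only one colon follows the word, which is the clash the double colon is meant to avoid.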
+ +Actions taken in response to directives and the interpretation of text +in the directive block or subsequent text block(s) are +directive-dependent. Indented text following a directive may be +interpreted as a directive block. Simple directives may not require +any text beyond the directive data (if that), and will not process any +following indented text. + +Directives which have been implemented and registered in the reference +reStructuredText parser are described in the `reStructuredText +Directives`_ document. Below are examples of implemented directives. + +Directives are meant for the arbitrary processing of their contents +(the directive data & text block), which can be transformed into +something possibly unrelated to the original text. Directives are +used as an extension mechanism for reStructuredText, a way of adding +support for new constructs without adding new syntax. For example, +here's how an image may be placed:: + + .. image:: mylogo.png + +A figure (a graphic with a caption) may be placed like this:: + + .. figure:: larch.png + The larch. + +An admonition (note, caution, etc.) contains other body elements:: + + .. note:: This is a paragraph + + - Here is a bullet list. + +It may also be possible for directives to be used as pragmas, to +modify the behavior of the parser, such as to experiment with +alternate syntax. There is no parser support for this functionality +at present; if a reasonable need for pragma directives is found, they +may be supported. + +Directives normally do not survive as "directive" elements past the +parsing stage; they are a *parser construct* only, and have no +intrinsic meaning outside of reStructuredText. Instead, the parser +will transform recognized directives into (possibly specialized) +document elements. Unknown directives will trigger level-3 (error) +system messages. + +Syntax diagram:: + + +-------+--------------------------+ + | "..
" | directive type "::" data | + +-------+ directive block | + | | + +--------------------------+ + + +Substitution Definitions +```````````````````````` + +DTD element: substitution_definition. + +Substitution definitions are indicated by an explicit markup start +(".. ") followed by a vertical bar, the substitution text, another +vertical bar, whitespace, and the definition block. Substitution text +may not begin or end with whitespace. A substitution definition block +contains an embedded inline-compatible directive (without the leading +".. "), such as an image. For example:: + + The |biohazard| symbol must be used on containers used to + dispose of medical waste. + + .. |biohazard| image:: biohazard.png + +It is an error for a substitution definition block to directly or +indirectly contain a circular substitution reference. + +`Substitution references`_ are replaced in-line by the processed +contents of the corresponding definition (linked by matching +substitution text). Substitution definitions allow the power and +flexibility of block-level directives_ to be shared by inline text. +They are a way to include arbitrarily complex inline structures within +text, while keeping the details out of the flow of text. They are the +equivalent of SGML/XML's named entities or programming language +macros. + +Without the substitution mechanism, every time someone wants an +application-specific new inline structure, they would have to petition +for a syntax change. In combination with existing directive syntax, +any inline structure can be coded without new syntax (except possibly +a new directive). + +Syntax diagram:: + + +-------+-----------------------------------------------------+ + | ".. " | "|" substitution text "| " directive type "::" data | + +-------+ directive block | + | | + +-----------------------------------------------------+ + +Following are some use cases for the substitution mechanism. 
Please +note that most of the embedded directives shown are examples only and +have not been implemented. + +Objects + Substitution references may be used to associate ambiguous text + with a unique object identifier. + + For example, many sites may wish to implement an inline "user" + directive:: + + |Michael| and |Jon| are our widget-wranglers. + + .. |Michael| user:: mjones + .. |Jon| user:: jhl + + Depending on the needs of the site, this may be used to index the + document for later searching, to hyperlink the inline text in + various ways (mailto, homepage, mouseover Javascript with profile + and contact information, etc.), or to customize presentation of + the text (include username in the inline text, include an icon + image with a link next to the text, make the text bold or a + different color, etc.). + + The same approach can be used in documents which frequently refer + to a particular type of objects with unique identifiers but + ambiguous common names. Movies, albums, books, photos, court + cases, and laws are possible. For example:: + + |The Transparent Society| offers a fascinating alternate view + on privacy issues. + + .. |The Transparent Society| book:: isbn=0738201448 + + Classes or functions, in contexts where the module or class names + are unclear and/or interpreted text cannot be used, are another + possibility:: + + 4XSLT has the convenience method |runString|, so you don't + have to mess with DOM objects if all you want is the + transformed output. + + .. |runString| function:: module=xml.xslt class=Processor + +Images + Images are a common use for substitution references:: + + West led the |H| 3, covered by dummy's |H| Q, East's |H| K, + and trumped in hand with the |S| 2. + + .. |H| image:: /images/heart.png + :height: 11 + :width: 11 + .. |S| image:: /images/spade.png + :height: 11 + :width: 11 + + * |Red light| means stop. + * |Green light| means go. + * |Yellow light| means go really fast. + + .. |Red light| image:: red_light.png + .. 
|Green light| image:: green_light.png + .. |Yellow light| image:: yellow_light.png + + |-><-| is the official symbol of POEE_. + + .. |-><-| image:: discord.png + .. _POEE: http://www.poee.org/ + + The "image" directive has been implemented. + +Styles [#]_ + Substitution references may be used to associate inline text with + an externally defined presentation style:: + + Even |the text in Texas| is big. + + .. |the text in Texas| style:: big + + The style name may be meaningful in the context of some particular + output format (CSS class name for HTML output, LaTeX style name + for LaTeX, etc), or may be ignored for other output formats (often + for plain text). + + .. @@@ This needs to be rethought & rewritten or removed: + + Interpreted text is unsuitable for this purpose because the set + of style names cannot be predefined - it is the domain of the + content author, not the author of the parser and output + formatter - and there is no way to associate a stylename + argument with an interpreted text style role. Also, it may be + desirable to use the same mechanism for styling blocks:: + + .. style:: motto + At Bob's Underwear Shop, we'll do anything to get in + your pants. + + .. style:: disclaimer + All rights reversed. Reprint what you like. + + .. [#] There may be sufficient need for a "style" mechanism to + warrant simpler syntax such as an extension to the interpreted + text role syntax. The substitution mechanism is cumbersome for + simple text styling. + +Templates + Inline markup may be used for later processing by a template + engine. For example, a Zope_ author might write:: + + Welcome back, |name|! + + .. |name| tal:: replace user/getUserName + + After processing, this ZPT output would result:: + + Welcome back, + <span tal:replace="user/getUserName">name</span>! + + Zope would then transform this to something like "Welcome back, + David!" during a session with an actual user. 
+ +Replacement text + The substitution mechanism may be used for simple macro + substitution. This may be appropriate when the replacement text + is repeated many times throughout one or more documents, + especially if it may need to change later. A short example is + unavoidably contrived:: + + |RST| is a little annoying to type over and over, especially + when writing about |RST| itself, and spelling out the + bicapitalized word |RST| every time isn't really necessary for + |RST| source readability. + + .. |RST| replace:: reStructuredText_ + .. _reStructuredText: http://docutils.sourceforge.net/rst.html + + Substitution is also appropriate when the replacement text cannot + be represented using other inline constructs, or is obtrusively + long:: + + But still, that's nothing compared to a name like + |j2ee-cas|__. + + .. |j2ee-cas| replace:: + the Java `TM`:super: 2 Platform, Enterprise Edition Client + Access Services + __ http://developer.java.sun.com/developer/earlyAccess/ + j2eecas/ + + +Comments +```````` + +DTD element: comment. + +Arbitrary indented text may follow the explicit markup start and will +be processed as a comment element. No further processing is done on +the comment block text; a comment contains a single "text blob". +Depending on the output formatter, comments may be removed from the +processed output. The only restriction on comments is that they not +use the same syntax as directives, footnotes, citations, or hyperlink +targets. + +An explicit markup start followed by a blank line and nothing else +(apart from whitespace) is an "empty comment". It serves to terminate +a preceding construct, and does **not** consume any indented text +following. To have a block quote follow a list or any indented +construct, insert an unindented empty comment in-between. + +Syntax diagram:: + + +-------+----------------------+ + | "..
" | comment | + +-------+ block | + | | + +----------------------+ + + +Implicit Hyperlink Targets +========================== + +Implicit hyperlink targets are generated by section titles, footnotes, +and citations, and may also be generated by extension constructs. +Implicit hyperlink targets otherwise behave identically to explicit +`hyperlink targets`_. + +Problems of ambiguity due to conflicting duplicate implicit and +explicit reference names are avoided by following this procedure: + +1. `Explicit hyperlink targets`_ override any implicit targets having + the same reference name. The implicit hyperlink targets are + removed, and level-1 (info) system messages are inserted. + +2. Duplicate implicit hyperlink targets are removed, and level-1 + (info) system messages inserted. For example, if two or more + sections have the same title (such as "Introduction" subsections of + a rigidly-structured document), there will be duplicate implicit + hyperlink targets. + +3. Duplicate explicit hyperlink targets are removed, and level-2 + (warning) system messages are inserted. Exception: duplicate + `external hyperlink targets`_ (identical hyperlink names and + referenced URIs) do not conflict, and are not removed. + +System messages are inserted where target links have been removed. +See "Error Handling" in `PEP 258`_. + +The parser must return a set of *unique* hyperlink targets. The +calling software (such as the Docutils_) can warn of unresolvable +links, giving reasons for the messages. + + +Inline Markup +============= + +In reStructuredText, inline markup applies to words or phrases within +a text block. The same whitespace and punctuation that serves to +delimit words in written text is used to delimit the inline markup +syntax constructs. The text within inline markup may not begin or end +with whitespace. Arbitrary character-level markup is not supported; +it is not possible to mark up individual characters within a word. +Inline markup cannot be nested. 
+ +There are nine inline markup constructs. Five of the constructs use +identical start-strings and end-strings to indicate the markup: + +- emphasis_: "*" +- `strong emphasis`_: "**" +- `interpreted text`_: "`" +- `inline literals`_: "``" +- `substitution references`_: "|" + +Three constructs use different start-strings and end-strings: + +- `inline hyperlink targets`_: "_`" and "`" +- `footnote references`_: "[" and "]_" +- `hyperlink references`_: "`" and "\`_" (phrases), or just a + trailing "_" (single words) + +`Standalone hyperlinks`_ are recognized implicitly, and use no extra +markup. + +The inline markup start-string and end-string recognition rules are as +follows. If any of the conditions are not met, the start-string or +end-string will not be recognized or processed. + +1. Inline markup start-strings must start a text block or be + immediately preceded by whitespace, single or double quotes, "(", + "[", "{", or "<". + +2. Inline markup start-strings must be immediately followed by + non-whitespace. + +3. Inline markup end-strings must be immediately preceded by + non-whitespace. + +4. Inline markup end-strings must end a text block or be immediately + followed by whitespace or one of:: + + ' " . , : ; ! ? - ) ] } > + +5. If an inline markup start-string is immediately preceded by a + single or double quote, "(", "[", "{", or "<", it must not be + immediately followed by the corresponding single or double quote, + ")", "]", "}", or ">". + +6. An inline markup end-string must be separated by at least one + character from the start-string. + +7. An unescaped backslash preceding a start-string or end-string will + disable markup recognition, except for the end-string of `inline + literals`_. See `Escaping Mechanism`_ above for details. + +For example, none of the following are recognized as containing inline +markup start-strings: " * ", '"*"', "'*'", "(*)", "(* ", "[*]", "{*}", +"\*", " ` ", etc. 
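The start-string rules above (1, 2, 5 and, as a side effect, 7) can be sketched as a small predicate. This is a simplified illustration under stated assumptions, not the Docutils parser: the names `OPENERS` and `is_start_string` are hypothetical, and end-string handling (rules 3, 4 and 6) is left out.

```python
# Opening characters from rule 1, mapped to the closers named in rule 5.
OPENERS = {"'": "'", '"': '"', "(": ")", "[": "]", "{": "}", "<": ">"}


def is_start_string(text, i, start="*"):
    """Return True if text[i:] begins a recognizable start-string."""
    if not text.startswith(start, i):
        return False
    before = text[i - 1] if i > 0 else ""
    after_index = i + len(start)
    after = text[after_index] if after_index < len(text) else ""
    # Rule 1: must start the text block or follow whitespace or an opener.
    # A preceding backslash (rule 7) fails this check as well.
    if before and not before.isspace() and before not in OPENERS:
        return False
    # Rule 2: must be immediately followed by non-whitespace.
    if not after or after.isspace():
        return False
    # Rule 5: an opener before must not be matched by its closer after.
    if before in OPENERS and after == OPENERS[before]:
        return False
    return True
```

With this predicate, the asterisk at index 8 of "This is \*emphasized text\*." qualifies as a start-string, while the asterisks in the non-markup examples listed above do not.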
+ +The inline markup recognition rules were devised intentionally to +allow 90% of non-markup uses of "*", "`", "_", and "|" *without* +resorting to backslashes. For 9 of the remaining 10%, use inline +literals or literal blocks:: + + "``\*``" -> "\*" (possibly in another font or quoted) + +Only those who understand the escaping and inline markup rules should +attempt the remaining 1%. ;-) + +Inline markup delimiter characters are used for multiple constructs, +so to avoid ambiguity there must be a specific recognition order for +each character. The inline markup recognition order is as follows: + +- Asterisks: `Strong emphasis`_ ("**") is recognized before emphasis_ + ("*"). + +- Backquotes: `Inline literals`_ ("``"), `inline hyperlink targets`_ + (leading "_`", trailing "`"), are mutually independent, and are + recognized before phrase `hyperlink references`_ (leading "`", + trailing "\`_") and `interpreted text`_ ("`"). + +- Trailing underscores: Footnote references ("[" + label + "]_") and + simple `hyperlink references`_ (name + trailing "_") are mutually + independent. + +- Vertical bars: `Substitution references`_ ("|") are independently + recognized. + +- `Standalone hyperlinks`_ are the last to be recognized. + + +Emphasis +-------- + +DTD element: emphasis. + +Start-string = end-string = "*". + +Text enclosed by single asterisk characters is emphasized:: + + This is *emphasized text*. + +Emphasized text is typically displayed in italics. + + +Strong Emphasis +--------------- + +DTD element: strong. + +Start-string = end-string = "**". + +Text enclosed by double-asterisks is emphasized strongly:: + + This is **strong text**. + +Strongly emphasized text is typically displayed in boldface. + + +Interpreted Text +---------------- + +DTD element: interpreted. + +Start-string = end-string = "`". + +Text enclosed by single backquote characters is interpreted:: + + This is `interpreted text`. 
+ +Interpreted text is text that is meant to be related, indexed, linked, +summarized, or otherwise processed, but the text itself is left +alone. The text is "tagged" directly, in-place. The semantics of +interpreted text are domain-dependent. It can be used as implicit or +explicit descriptive markup (such as for program identifiers, as in +the `Python Source Reader`_), for cross-reference interpretation (such +as index entries), or for other applications where context can be +inferred. + +The role of the interpreted text determines how the text is +interpreted. It is normally inferred implicitly. The role of the +interpreted text may also be indicated explicitly, using a role +marker, either as a prefix or as a suffix to the interpreted text, +depending on which reads better:: + + :role:`interpreted text` + + `interpreted text`:role: + +Roles are simply extensions of the available inline constructs; to +emphasis_, `strong emphasis`_, `inline literals`_, and `hyperlink +references`_, we can add "index entry", "acronym", "class", "red", +"blinking" or anything else we want. + +A role marker consists of a colon, the role name, and another colon. +A role name is a single word consisting of alphanumerics plus internal +hyphens, underscores, and periods; no whitespace or other characters +are allowed. + + +Inline Literals +--------------- + +DTD element: literal. + +Start-string = end-string = "``". + +Text enclosed by double-backquotes is treated as inline literals:: + + This text is an example of ``inline literals``. + +Inline literals may contain any characters except two adjacent +backquotes in an end-string context (according to the recognition +rules above). No markup interpretation (including backslash-escape +interpretation) is done within inline literals. + +Line breaks are *not* preserved in inline literals.
Although a +reStructuredText parser will preserve runs of spaces in its output, +the final representation of the processed document is dependent on the +output formatter, thus the preservation of whitespace cannot be +guaranteed. If the preservation of line breaks and/or other +whitespace is important, `literal blocks`_ should be used. + +Inline literals are useful for short code snippets. For example:: + + The regular expression ``[+-]?(\d+(\.\d*)?|\.\d+)`` matches + floating-point numbers (without exponents). + + +Hyperlink References +-------------------- + +DTD element: reference. + +- Named hyperlink references: + + - Start-string = "" (empty string), end-string = "_". + - Start-string = "`", end-string = "\`_". (Phrase references.) + +- Anonymous hyperlink references: + + - Start-string = "" (empty string), end-string = "__". + - Start-string = "`", end-string = "\`__". (Phrase references.) + +Hyperlink references are indicated by a trailing underscore, "_", +except for `standalone hyperlinks`_ which are recognized +independently. The underscore can be thought of as a right-pointing +arrow. The trailing underscores point away from hyperlink references, +and the leading underscores point toward `hyperlink targets`_. + +Hyperlinks consist of two parts. In the text body, there is a source +link, a reference name with a trailing underscore (or two underscores +for `anonymous hyperlinks`_):: + + See the Python_ home page for info. + +A target link with a matching reference name must exist somewhere else +in the document. (See `Hyperlink Targets`_ for a full description.) + +`Anonymous hyperlinks`_ (which see) do not use reference names to +match references to targets, but otherwise behave similarly to named +hyperlinks. + + +Inline Hyperlink Targets +------------------------ + +DTD element: target. + +Start-string = "_`", end-string = "`". + +Inline hyperlink targets are the equivalent of explicit `internal +hyperlink targets`_, but may appear within running text.
The syntax +begins with an underscore and a backquote, is followed by a hyperlink +name or phrase, and ends with a backquote. Inline hyperlink targets +may not be anonymous. + +For example, the following paragraph contains a hyperlink target named +"Norwegian Blue":: + + Oh yes, the _`Norwegian Blue`. What's, um, what's wrong with it? + +See `Implicit Hyperlink Targets`_ for the resolution of duplicate +reference names. + + +Footnote References +------------------- + +DTD element: footnote_reference. + +Start-string = "[", end-string = "]_". + +Each footnote reference consists of a square-bracketed label followed +by a trailing underscore. Footnote labels are one of: + +- one or more digits (i.e., a number), + +- a single "#" (denoting `auto-numbered footnotes`_), + +- a "#" followed by a simple reference name (an `autonumber label`_), + or + +- a single "*" (denoting `auto-symbol footnotes`_). + +For example:: + + Please RTFM [1]_. + + .. [1] Read The Fine Manual + + +Citation References +------------------- + +DTD element: citation_reference. + +Start-string = "[", end-string = "]_". + +Each citation reference consists of a square-bracketed label followed +by a trailing underscore. Citation labels are simple `reference +names`_ (case-insensitive single words, consisting of alphanumerics +plus internal hyphens, underscores, and periods; no whitespace). + +For example:: + + Here is a citation reference: [CIT2002]_. + +See Citations_ for the citation itself. + + +Substitution References +----------------------- + +DTD element: substitution_reference, reference. + +Start-string = "|", end-string = "|" (optionally followed by "_" or +"__"). + +Vertical bars are used to bracket the substitution reference text. A +substitution reference may also be a hyperlink reference by appending +a "_" (named) or "__" (anonymous) suffix; the substitution text is +used for the reference text in the named case. 
+ +The processing system replaces substitution references with the +processed contents of the corresponding `substitution definitions`_. +Substitution definitions produce inline-compatible elements. + +Examples:: + + This is a simple |substitution reference|. It will be replaced by + the processing system. + + This is a combination |substitution and hyperlink reference|_. In + addition to being replaced, the replacement text or element will + refer to the "substitution and hyperlink reference" target. + + +Standalone Hyperlinks +--------------------- + +DTD element: link. + +Start-string = end-string = "" (empty string). + +A URI (absolute URI [#URI]_ or standalone email address) within a text +block is treated as a general external hyperlink with the URI itself +as the link's text. For example:: + + See http://www.python.org for info. + +would be marked up in HTML as:: + + See <A HREF="http://www.python.org">http://www.python.org</A> for + info. + +Two forms of URI are recognized: + +1. Absolute URIs. These consist of a scheme, a colon (":"), and a + scheme-specific part whose interpretation depends on the scheme. + + The scheme is the name of the protocol, such as "http", "ftp", + "mailto", or "telnet". The scheme consists of an initial letter, + followed by letters, numbers, and/or "+", "-", ".". Recognition is + limited to known schemes, per the W3C's `Index of WWW Addressing + Schemes`_. + + The scheme-specific part of the resource identifier may be either + hierarchical or opaque: + + - Hierarchical identifiers begin with one or two slashes and may + use slashes to separate hierarchical components of the path. + Examples are web pages and FTP sites:: + + http://www.python.org + + ftp://ftp.python.org/pub/python + + - Opaque identifiers do not begin with slashes. Examples are + email addresses and newsgroups:: + + mailto:someone@somewhere.com + + news:comp.lang.python + + With queries, fragments, and %-escape sequences, URIs can become + quite complicated. 
A reStructuredText parser must be able to + recognize any absolute URI, as defined in RFC2396_ and RFC2732_. + +2. Standalone email addresses, which are treated as if they were + absolute URIs with a "mailto:" scheme. Example:: + + someone@somewhere.com + +Punctuation at the end of a URI is not considered part of the URI. + +.. [#URI] Uniform Resource Identifier. URIs are a general form of + URLs (Uniform Resource Locators). For the syntax of URIs see + RFC2396_ and RFC2732_. + + +---------------- + Error Handling +---------------- + +DTD element: system_message, problematic. + +Markup errors are handled according to the specification in `PEP +258`_. + + +.. _reStructuredText: http://docutils.sourceforge.net/rst.html +.. _Docutils: http://docutils.sourceforge.net/ +.. _Docutils Document Tree Structure: + http://docutils.sourceforge.net/spec/doctree.txt +.. _Generic Plaintext Document Interface DTD: + http://docutils.sourceforge.net/spec/gpdi.dtd +.. _transforms: + http://docutils.sourceforge.net/docutils/transforms/ +.. _Grouch: http://www.mems-exchange.org/software/grouch/ +.. _RFC822: http://www.rfc-editor.org/rfc/rfc822.txt +.. _DocTitle transform: +.. _DocInfo transform: + http://docutils.sourceforge.net/docutils/transforms/frontmatter.py +.. _doctest module: + http://www.python.org/doc/current/lib/module-doctest.html +.. _getopt.py: + http://www.python.org/doc/current/lib/module-getopt.html +.. _GNU libc getopt_long(): + http://www.gnu.org/manual/glibc-2.2.3/html_node/libc_516.html +.. _Index of WWW Addressing Schemes: + http://www.w3.org/Addressing/schemes.html +.. _World Wide Web Consortium: http://www.w3.org/ +.. _HTML Techniques for Web Content Accessibility Guidelines: + http://www.w3.org/TR/WCAG10-HTML-TECHS/#link-text +.. _reStructuredText Directives: directives.html +.. _Python Source Reader: + http://docutils.sourceforge.net/spec/pysource.txt +.. _RFC2396: http://www.rfc-editor.org/rfc/rfc2396.txt +..
_RFC2732: http://www.rfc-editor.org/rfc/rfc2732.txt +.. _Zope: http://www.zope.com/ +.. _PEP 258: http://docutils.sourceforge.net/spec/pep-0258.txt + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/ref/soextblx.dtd b/docs/ref/soextblx.dtd new file mode 100644 index 000000000..56ba311ba --- /dev/null +++ b/docs/ref/soextblx.dtd @@ -0,0 +1,312 @@ +<!-- +=========================================================================== + OASIS XML Exchange Table Model Declaration Module +=========================================================================== +:Date: 1999-03-15 +--> + +<!-- This set of declarations defines the XML version of the Exchange + Table Model as of the date shown in the Formal Public Identifier + (FPI) for this entity. + + This set of declarations may be referred to using a public external + entity declaration and reference as shown in the following three + lines: + + <!ENTITY % calstblx + PUBLIC "-//OASIS//DTD XML Exchange Table Model 19990315//EN"> + %calstblx; + + If various parameter entities used within this set of declarations + are to be given non-default values, the appropriate declarations + should be given before calling in this package (i.e., before the + "%calstblx;" reference). +--> + +<!-- The motivation for this XML version of the Exchange Table Model + is simply to create an XML version of the SGML Exchange Table + Model. By design, no effort has been made to "improve" the model. + + This XML version incorporates the logical bare minimum changes + necessary to make the Exchange Table Model a valid XML DTD. +--> + +<!-- The XML version of the Exchange Table Model differs from + the SGML version in the following ways: + + The following parameter entities have been removed: + + - tbl.table.excep, tbl.hdft.excep, tbl.row.excep, tbl.entry.excep + There are no exceptions in XML. 
The following normative statement + is made in lieu of exceptions: the exchange table model explicitly + forbids a table from occurring within another table. If the + content model of an entry includes a table element, then this + cannot be enforced by the DTD, but it is a deviation from the + exchange table model to include a table within a table. + + - tbl.hdft.name, tbl.hdft.mdl, tbl.hdft.excep, tbl.hdft.att + The motivation for these elements was to change the table + header/footer elements. Since XML does not allow element declarations + to contain name groups, and the exchange table model does not + allow a table to contain footers, the continued presence of these + attributes seems unnecessary. + + The following parameter entity has been added: + + - tbl.thead.att + This entity parameterizes the attributes on thead. It replaces + the tbl.hdft.att parameter entity. + + Other miscellaneous changes: + + - Tag omission indicators have been removed + - Comments have been removed from declarations + - NUMBER attributes have been changed to NMTOKEN + - NUTOKEN attributes have been changed to NMTOKEN + - Removed the grouping characters around the content model + parameter entry for the 'entry' element. This is necessary + so that an entry can contain #PCDATA and be defined as an + optional, repeatable OR group beginning with #PCDATA. +--> + +<!-- This entity includes a set of element and attribute declarations + that partially defines the Exchange table model. However, the model + is not well-defined without the accompanying natural language + description of the semantics (meanings) of these various elements, + attributes, and attribute values. The semantic writeup, also available + from SGML Open, should be used in conjunction with this entity. +--> + +<!-- In order to use the Exchange table model, various parameter entity + declarations are required.
A brief description is as follows: + + ENTITY NAME WHERE USED WHAT IT IS + + %yesorno In ATTLIST of: An attribute declared value + almost all elements for a "boolean" attribute + + %paracon In content model of: The "text" (logical content) + <entry> of the model group for <entry> + + %titles In content model of: The "title" part of the model + table element group for the table element + + %tbl.table.name In declaration of: The name of the "table" + table element element + + %tbl.table-titles.mdl In content model of: The model group for the title + table elements part of the content model for + table element + + %tbl.table.mdl In content model of: The model group for the content + table elements model for table element, + often (and by default) defined + in terms of %tbl.table-titles.mdl + and tgroup + + %tbl.table.att In ATTLIST of: Additional attributes on the + table element table element + + %bodyatt In ATTLIST of: Additional attributes on the + table element table element (for backward + compatibility with the SGML + model) + + %tbl.tgroup.mdl In content model of: The model group for the content + <tgroup> model for <tgroup> + + %tbl.tgroup.att In ATTLIST of: Additional attributes on the +4 <tgroup> <tgroup> element + + %tbl.thead.att In ATTLIST of: Additional attributes on the + <thead> <thead> element + + %tbl.tbody.att In ATTLIST of: Additional attributes on the + <tbody> <tbody> element + + %tbl.colspec.att In ATTLIST of: Additional attributes on the + <colspec> <colspec> element + + %tbl.row.mdl In content model of: The model group for the content + <row> model for <row> + + %tbl.row.att In ATTLIST of: Additional attributes on the + <row> <row> element + + %tbl.entry.mdl In content model of: The model group for the content + <entry> model for <entry> + + %tbl.entry.att In ATTLIST of: Additional attributes on the + <entry> <entry> element + + This set of declarations will use the default definitions shown below + for any of these parameter entities that are 
not declared before this + set of declarations is referenced. +--> + +<!-- These definitions are not directly related to the table model, but are + used in the default CALS table model and may be defined elsewhere (and + prior to the inclusion of this table module) in the referencing DTD. --> + +<!ENTITY % yesorno 'NMTOKEN'> <!-- no if zero(s), yes if any other value --> +<!ENTITY % titles 'title?'> +<!ENTITY % paracon '#PCDATA'> <!-- default for use in entry content --> + +<!-- +The parameter entities as defined below change and simplify the CALS table +model as published (as part of the Example DTD) in MIL-HDBK-28001. The +resulting simplified DTD has support from the SGML Open vendors and is +therefore more interoperable among different systems. + +These following declarations provide the Exchange default definitions +for these entities. However, these entities can be redefined (by giving +the appropriate parameter entity declaration(s) prior to the reference +to this Table Model declaration set entity) to fit the needs of the +current application. + +Note, however, that changes may have significant effect on the ability to +interchange table information. These changes may manifest themselves +in useability, presentation, and possible structure information degradation. +--> + +<!ENTITY % tbl.table.name "table"> +<!ENTITY % tbl.table-titles.mdl "%titles;,"> +<!ENTITY % tbl.table-main.mdl "tgroup+"> +<!ENTITY % tbl.table.mdl "%tbl.table-titles.mdl; %tbl.table-main.mdl;"> +<!ENTITY % tbl.table.att " + pgwide %yesorno; #IMPLIED "> +<!ENTITY % bodyatt ""> +<!ENTITY % tbl.tgroup.mdl "colspec*,thead?,tbody"> +<!ENTITY % tbl.tgroup.att ""> +<!ENTITY % tbl.thead.att ""> +<!ENTITY % tbl.tbody.att ""> +<!ENTITY % tbl.colspec.att ""> +<!ENTITY % tbl.row.mdl "entry+"> +<!ENTITY % tbl.row.att ""> +<!ENTITY % tbl.entry.mdl "(%paracon;)*"> +<!ENTITY % tbl.entry.att ""> + +<!-- ===== Element and attribute declarations follow. 
===== --> + +<!-- + Default declarations previously defined in this entity and + referenced below include: + ENTITY % tbl.table.name "table" + ENTITY % tbl.table-titles.mdl "%titles;," + ENTITY % tbl.table.mdl "%tbl.table-titles; tgroup+" + ENTITY % tbl.table.att " + pgwide %yesorno; #IMPLIED " +--> + +<!ELEMENT %tbl.table.name; (%tbl.table.mdl;)> + +<!ATTLIST %tbl.table.name; + frame (top|bottom|topbot|all|sides|none) #IMPLIED + colsep %yesorno; #IMPLIED + rowsep %yesorno; #IMPLIED + %tbl.table.att; + %bodyatt; +> + +<!-- + Default declarations previously defined in this entity and + referenced below include: + ENTITY % tbl.tgroup.mdl "colspec*,thead?,tbody" + ENTITY % tbl.tgroup.att "" +--> + +<!ELEMENT tgroup (%tbl.tgroup.mdl;) > + +<!ATTLIST tgroup + cols NMTOKEN #REQUIRED + colsep %yesorno; #IMPLIED + rowsep %yesorno; #IMPLIED + align (left|right|center|justify|char) #IMPLIED + %tbl.tgroup.att; +> + +<!-- + Default declarations previously defined in this entity and + referenced below include: + ENTITY % tbl.colspec.att "" +--> + +<!ELEMENT colspec EMPTY > + +<!ATTLIST colspec + colnum NMTOKEN #IMPLIED + colname NMTOKEN #IMPLIED + colwidth CDATA #IMPLIED + colsep %yesorno; #IMPLIED + rowsep %yesorno; #IMPLIED + align (left|right|center|justify|char) #IMPLIED + char CDATA #IMPLIED + charoff NMTOKEN #IMPLIED + %tbl.colspec.att; +> + +<!-- + Default declarations previously defined in this entity and + referenced below include: + ENTITY % tbl.thead.att "" +--> + +<!ELEMENT thead (row+)> + +<!ATTLIST thead + valign (top|middle|bottom) #IMPLIED + %tbl.thead.att; +> + +<!-- + Default declarations previously defined in this entity and + referenced below include: + ENTITY % tbl.tbody.att "" +--> + +<!ELEMENT tbody (row+)> + +<!ATTLIST tbody + valign (top|middle|bottom) #IMPLIED + %tbl.tbody.att; +> + +<!-- + Default declarations previously defined in this entity and + referenced below include: + ENTITY % tbl.row.mdl "entry+" + ENTITY % tbl.row.att "" +--> + +<!ELEMENT 
row (%tbl.row.mdl;)> + +<!ATTLIST row + rowsep %yesorno; #IMPLIED + valign (top|middle|bottom) #IMPLIED + %tbl.row.att; +> + + +<!-- + Default declarations previously defined in this entity and + referenced below include: + ENTITY % paracon "#PCDATA" + ENTITY % tbl.entry.mdl "(%paracon;)*" + ENTITY % tbl.entry.att "" +--> + +<!ELEMENT entry %tbl.entry.mdl;> + +<!ATTLIST entry + colname NMTOKEN #IMPLIED + namest NMTOKEN #IMPLIED + nameend NMTOKEN #IMPLIED + morerows NMTOKEN #IMPLIED + colsep %yesorno; #IMPLIED + rowsep %yesorno; #IMPLIED + align (left|right|center|justify|char) #IMPLIED + char CDATA #IMPLIED + charoff NMTOKEN #IMPLIED + valign (top|middle|bottom) #IMPLIED + %tbl.entry.att; +> diff --git a/docs/user/rst/images/ball1.gif b/docs/user/rst/images/ball1.gif Binary files differnew file mode 100644 index 000000000..3e14441d9 --- /dev/null +++ b/docs/user/rst/images/ball1.gif diff --git a/docs/user/rst/images/biohazard.bmp b/docs/user/rst/images/biohazard.bmp Binary files differnew file mode 100644 index 000000000..aceb52948 --- /dev/null +++ b/docs/user/rst/images/biohazard.bmp diff --git a/docs/user/rst/images/biohazard.gif b/docs/user/rst/images/biohazard.gif Binary files differnew file mode 100644 index 000000000..7e1ea34ed --- /dev/null +++ b/docs/user/rst/images/biohazard.gif diff --git a/docs/user/rst/images/biohazard.png b/docs/user/rst/images/biohazard.png Binary files differnew file mode 100644 index 000000000..ae4629d8b --- /dev/null +++ b/docs/user/rst/images/biohazard.png diff --git a/docs/user/rst/images/title.png b/docs/user/rst/images/title.png Binary files differnew file mode 100644 index 000000000..cc6218efe --- /dev/null +++ b/docs/user/rst/images/title.png diff --git a/docs/user/rst/quickref.html b/docs/user/rst/quickref.html new file mode 100644 index 000000000..886a02107 --- /dev/null +++ b/docs/user/rst/quickref.html @@ -0,0 +1,1096 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> +<html> + <head> + <title>Quick 
reStructuredText</title> + + </head> + + <body> + <h1>Quick <i>re</i><font size="+4"><tt>Structured</tt></font><i>Text</i></h1> + <!-- Hmm - does that (relative) font size work for you? --> + <!-- If David produces a smaller version of the reST title --> + <!-- page's title image, we could do something like: --> + <!-- <h1>Quick <img src="images/title.png" --> + <!-- alt="Quick reStructuredText"></h1> --> + <!-- which might be a better idea... --> + + <!-- Caveat: if you're reading the HTML for the examples, --> + <!-- beware that it was hand-generated, not by Docutils/ReST. --> + + <p align="right"><em><a href="http://docutils.sourceforge.net/docs/rst/quickref.html" + >http://docutils.sourceforge.net/docs/rst/quickref.html</a></em> + <br align="right"><em>Being a cheat-sheet for reStructuredText</em> + <br align="right"><em>Version 0.8 of 2002-04-19</em> + + + <p>The full details may be found on the + <a href="http://docutils.sourceforge.net/rest.html">reStructuredText</a> + page. This document is just intended as a reminder. + + <p>Links that look like "(<a href="#details">details?</a>)" point + into the HTML version of the full <a + href="../../spec/rst/reStructuredText.html">reStructuredText + specification</a> document. These are relative links; if they + don't work, please use the <a + href="http://docutils.sourceforge.net/docs/rst/quickref.html" + >master "Quick reStructuredText"</a> document. 
+ + <h2><a name="contents">Contents</a></h2> + + <ul> + <li><a href="#inline-markup">Inline Markup</a></li> + <li><a href="#escaping">Escaping with Backslashes</a></li> + <li><a href="#section-structure">Section Structure</a></li> + <li><a href="#paragraphs">Paragraphs</a></li> + <li><a href="#bullet-lists">Bullet Lists</a></li> + <li><a href="#enumerated-lists">Enumerated Lists</a></li> + <li><a href="#definition-lists">Definition Lists</a></li> + <li><a href="#field-lists">Field Lists</a></li> + <li><a href="#option-lists">Option Lists</a></li> + <li><a href="#literal-blocks">Literal Blocks</a></li> + <li><a href="#block-quotes">Block Quotes</a></li> + <li><a href="#doctest-blocks">Doctest Blocks</a></li> + <li><a href="#tables">Tables</a></li> + <li><a href="#transitions">Transitions</a></li> + <li><a href="#footnotes">Footnotes</a></li> + <li><a href="#citations">Citations</a></li> + <li><a href="#hyperlink-targets">Hyperlink Targets</a></li> + <ul> + <li><a href="#external-hyperlink-targets">External Hyperlink Targets</a></li> + <li><a href="#internal-hyperlink-targets">Internal Hyperlink Targets</a></li> + <li><a href="#indirect-hyperlink-targets">Indirect Hyperlink Targets</a></li> + <li><a href="#implicit-hyperlink-targets">Implicit Hyperlink Targets</a></li> + </ul> + <li><a href="#directives">Directives</a></li> + <li><a href="#substitution-references-and-definitions">Substitution References and Definitions</a></li> + <li><a href="#comments">Comments</a></li> + </ul> + + <h2><a href="#contents" name="inline-markup">Inline Markup</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#inline-markup">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th>Plain text + <th>Typical result + <th>Notes + </thead> + <tbody> + <tr valign="top"> + <td nowrap><samp>*emphasis*</samp> + <td><em>emphasis</em> + <td> + + <tr valign="top"> + <td nowrap><samp>**strong
emphasis**</samp> + <td><strong>strong emphasis</strong> + <td> + + <tr valign="top"> + <td nowrap><samp>`interpreted text`</samp> + <td>interpreted text + <td>What interpreted text <em>means</em> is domain dependent. + + <tr valign="top"> + <td nowrap><samp>``inline literal``</samp> + <td><code>inline literal</code> + <td>Spaces should be preserved, but line breaks will not be. + + <tr valign="top"> + <td nowrap><samp>reference_</samp> + <td><a href="#hyperlink-targets">reference</a> + <td>A simple, one-word hyperlink reference. See <a href="#hyperlink-targets">Hyperlinks</a>. + + <tr valign="top"> + <td nowrap><samp>`phrase reference`_</samp> + <td><a href="#hyperlink-targets">phrase reference</a> + <td>A hyperlink reference with spaces or punctuation needs to be quoted with backquotes. + See <a href="#hyperlink-targets">Hyperlinks</a>. + + <tr valign="top"> + <td nowrap><samp>anonymous__</samp> + <td><a href="#hyperlink-targets">anonymous</a> + <td>Both simple and phrase references may be anonymous (two underscores). + See <a href="#hyperlink-targets">Hyperlinks</a>. + + <tr valign="top"> + <td nowrap><samp>_`inline hyperlink target`</samp> + <td><a name="inline-hyperlink-target">inline hyperlink target</a> + <td>A cross-reference target within text. + See <a href="#hyperlink-targets">Hyperlinks</a>. + + <tr valign="top"> + <td nowrap><samp>|substitution reference|</samp> + <td>(see note) + <td>The result is substituted in from the <a href="#substitution-references-and-definitions">substitution definition</a>. + It could be text, an image, a hyperlink, or a combination of these and others. + + <tr valign="top"> + <td nowrap><samp>footnote reference [1]_</samp> + <td>footnote reference <a href="#footnotes">[1]</a> + <td>See <a href="#footnotes">Footnotes</a>. + + <tr valign="top"> + <td nowrap><samp>citation reference [CIT2002]_</samp> + <td>citation reference <a href="#citations">[CIT2002]</a> + <td>See <a href="#citations">Citations</a>.
+ + <tr valign="top"> + <td nowrap><samp>http://docutils.sf.net/</samp> + <td><a href="http://docutils.sf.net/">http://docutils.sf.net/</a> + <td>A standalone hyperlink. + + </table> + + <p>Asterisk, backquote, vertical bar, and underscore are inline + delimiter characters. Asterisk, backquote, and vertical bar act + like quote marks; matching characters surround the marked-up word + or phrase, whitespace or other quoting is required outside them, + and there can't be whitespace just inside them. If you want to use + inline delimiter characters literally, <a href="#escaping">escape + (with backslash)</a> or quote them (with double backquotes; i.e. + use inline literals). + + <p>In detail, the reStructuredText specification says that in + inline markup: + <ol> + <li>The start-string must start a text block or be + immediately preceded by whitespace, + <samp>' " ( [ {</samp> or <samp>&lt;</samp>. + <li>The start-string must be immediately followed by non-whitespace. + <li>The end-string must be immediately preceded by non-whitespace. + <li>The end-string must end a text block or be immediately + followed by whitespace, + <samp>' " . , : ; ! ? - ) ] }</samp> or <samp>&gt;</samp>. + <li>If a start-string is immediately preceded by one of + <samp>' " ( [ {</samp> or <samp>&lt;</samp>, it must not be + immediately followed by the corresponding character from + <samp>' " ) ] }</samp> or <samp>&gt;</samp>. + <li>An end-string must be separated by at least one + character from the start-string. + <li>An <a href="#escaping">unescaped</a> backslash preceding a start-string or end-string will + disable markup recognition, except for the end-string of inline + literals. + </ol> + + <p>Also remember that inline markup may not be nested (well, + except that inline literals can contain any of the other inline + markup delimiter characters, but that doesn't count because + nothing is processed).
+ + <h2><a href="#contents" name="escaping">Escaping with Backslashes</a></h2> + + <p>(<a + href="../../spec/rst/reStructuredText.html#backslashes">details?</a>) + + <p>reStructuredText uses backslashes ("\") to override the special + meaning given to markup characters and get the literal characters + themselves. To get a literal backslash, use an escaped backslash + ("\\"). For example: + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Raw reStructuredText + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"><td> + <samp>*escape* ``with`` "\"</samp> + <td><em>escape</em> <samp>with</samp> "" + <tr valign="top"><td> + <samp>\*escape* \``with`` "\\"</samp> + <td>*escape* ``with`` "\" + </table> + + <p>In Python strings it will, of course, be necessary + to escape any backslash characters so that they actually + <em>reach</em> reStructuredText. + The simplest way to do this is to use raw strings: + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Python string + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"><td> + <samp>r"""\*escape* \`with` "\\""""</samp> + <td>*escape* `with` "\" + <tr valign="top"><td> + <samp> """\\*escape* \\`with` "\\\\""""</samp> + <td>*escape* `with` "\" + <tr valign="top"><td> + <samp> """\*escape* \`with` "\\""""</samp> + <td><em>escape</em> with "" + </table> + + <h2><a href="#contents" name="section-structure">Section Structure</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#sections">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> + <samp>=====</samp> + <br><samp>Title</samp> + <br><samp>=====</samp> +
<br><samp>Subtitle</samp> + <br><samp>--------</samp> + <br><samp>Titles are underlined (or over-</samp> + <br><samp>and underlined) with a printing</samp> + <br><samp>nonalphanumeric 7-bit ASCII</samp> + <br><samp>character. Recommended choices</samp> + <br><samp>are "``= - ` : ' " ~ ^ _ * + # < >``".</samp> + <td> + <font size="+2"><strong>Title</strong></font> + <p><font size="+1"><strong>Subtitle</strong></font> + <p>Titles are underlined (or over- + and underlined) with a printing + nonalphanumeric 7-bit ASCII + character. Recommended choices + are "<samp>= - ` : ' " ~ ^ _ * + # < ></samp>". + </table> + + <h2><a href="#contents" name="paragraphs">Paragraphs</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#paragraphs">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> +<p><samp>This is a paragraph.</samp> + +<p><samp>Paragraphs line up at their left</samp> +<br><samp>edges, and are normally separated</samp> +<br><samp>by blank lines.</samp> + + <td> + <p>This is a paragraph. + + <p>Paragraphs line up at their left edges, and are normally + separated by blank lines. 
+ + </table> + + <h2><a href="#contents" name="bullet-lists">Bullet Lists</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#bullet-lists">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> +<samp>Bullet lists:</samp> + +<p><samp>- This is item 1</samp> +<br><samp>- This is item 2</samp> + +<p><samp>- Bullets are "-", "*" or "+".</samp> +<br><samp> Continuing text must be aligned</samp> +<br><samp> after the bullet and whitespace.</samp> + +<p><samp>Note that a blank line is required</samp> +<br><samp>before the first item and after the</samp> +<br><samp>last, but is optional between items.</samp> + <td>Bullet lists: + <ul> + <li>This is item 1 + <li>This is item 2 + <li>Bullets are "-", "*" or "+". + Continuing text must be aligned + after the bullet and whitespace. + </ul> + <p>Note that a blank line is required before the first + item and after the last, but is optional between items. + </table> + + <h2><a href="#contents" name="enumerated-lists">Enumerated Lists</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#enumerated-lists">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> +<samp>Enumerated lists:</samp> + +<p><samp>3. This is the first item</samp> +<br><samp>4. This is the second item</samp> +<br><samp>5. Enumerators are arabic numbers,</samp> +<br><samp> single letters, or roman numerals</samp> +<br><samp>6. 
List items should be sequentially</samp> +<br><samp> numbered, but need not start at 1</samp> +<br><samp> (although not all formatters will</samp> +<br><samp> honour the first index).</samp> + <td>Enumerated lists: + <ol type="1"> + <li value="3">This is the first item + <li>This is the second item + <li>Enumerators are arabic numbers, single letters, + or roman numerals + <li>List items should be sequentially numbered, + but need not start at 1 (although not all + formatters will honour the first index). + </ol> + </table> + + <h2><a href="#contents" name="definition-lists">Definition Lists</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#definition-lists">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> +<samp>Definition lists:</samp> +<br> +<br><samp>what</samp> +<br><samp> Definition lists associate a term with</samp> +<br><samp> a definition.</samp> +<br> +<br><samp>how</samp> +<br><samp> The term is a one-line phrase, and the</samp> +<br><samp> definition is one or more paragraphs or</samp> +<br><samp> body elements, indented relative to the</samp> +<br><samp> term. Blank lines are not allowed</samp> +<br><samp> between term and definition.</samp> + <td>Definition lists: + <dl> + <dt><strong>what</strong> + <dd>Definition lists associate a term with + a definition. + + <dt><strong>how</strong> + <dd>The term is a one-line phrase, and the + definition is one or more paragraphs or + body elements, indented relative to the + term. Blank lines are not allowed + between term and definition. 
+ </dl> + </table> + + <h2><a href="#contents" name="field-lists">Field Lists</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#field-lists">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> +<samp>:Authors:</samp> +<br><samp> Tony J. (Tibs) Ibbs,</samp> +<br><samp> David Goodger</samp> + +<p><samp> (and sundry other good-natured folks)</samp> + +<p><samp>:Version: 1.0 of 2001/08/08</samp> +<br><samp>:Dedication: To my father.</samp> + <td> + <table> + <tr valign="top"> + <td><strong>Authors:</strong> + <td>Tony J. (Tibs) Ibbs, + David Goodger + <tr><td><td>(and sundry other good-natured folks) + <tr><td><strong>Version:</strong><td>1.0 of 2001/08/08 + <tr><td><strong>Dedication:</strong><td>To my father. + </table> + </table> + + <h2><a href="#contents" name="option-lists">Option Lists</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#option-lists">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> + <p><samp> +-a command-line option "a" +<br>-b file options can have arguments +<br> and long descriptions +<br>--long options can be long also +<br>--input=file long options can also have +<br> arguments +<br>/V DOS/VMS-style options too +</samp> + + <td> + <table border="0" width="100%"> + <tbody valign="top"> + <tr> + <td width="30%"><p><samp>-a</samp> + <td><p>command-line option "a" + <tr> + <td><p><samp>-b <i>file</i></samp> + <td><p>options can have arguments and long descriptions + <tr> + <td><p><samp>--long</samp> + <td><p>options can be long also + <tr> + <td><p><samp>--input=<i>file</i></samp> + <td><p>long options can also have arguments + <tr> + 
<td><p><samp>/V</samp> + <td><p>DOS/VMS-style options too + </table> + </table> + + <p>There must be at least two spaces between the option and the + description. + + <h2><a href="#contents" name="literal-blocks">Literal Blocks</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#literal-blocks">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> +<samp>A paragraph containing only two colons</samp> +<br><samp>indicates that the following indented</samp> +<br><samp>text is a literal block.</samp> +<br> +<br><samp>::</samp> +<br> +<br><samp> Whitespace, newlines, blank lines, and</samp> +<br><samp> all kinds of markup (like *this* or</samp> +<br><samp> \this) is preserved by literal blocks.</samp> +<br> +<br><samp> The paragraph containing only '::'</samp> +<br><samp> will be omitted from the result.</samp> +<br> +<br><samp>The ``::`` may be tacked onto the very</samp> +<br><samp>end of any paragraph. The ``::`` will be</samp> +<br><samp>omitted if it is preceded by whitespace.</samp> +<br><samp>The ``::`` will be converted to a single</samp> +<br><samp>colon if preceded by text, like this::</samp> +<br> +<br><samp> It's very convenient to use this form.</samp> +<br> +<br><samp>Literal blocks end when text returns to</samp> +<br><samp>the preceding paragraph's indentation.</samp> +<br><samp>This means that something like::</samp> +<br> +<br><samp> We start here</samp> +<br><samp> and continue here</samp> +<br><samp> and end here.</samp> +<br> +<br><samp>is possible.</samp> + + <td> + <p>A paragraph containing only two colons +indicates that the following indented +text is a literal block. + + <pre> + Whitespace, newlines, blank lines, and + all kinds of markup (like *this* or + \this) is preserved by literal blocks. 
+ + The paragraph containing only '::' + will be omitted from the result.</pre> + + <p>The <samp>::</samp> may be tacked onto the very +end of any paragraph. The <samp>::</samp> will be +omitted if it is preceded by whitespace. +The <samp>::</samp> will be converted to a single +colon if preceded by text, like this: + + <pre> + It's very convenient to use this form.</pre> + + <p>Literal blocks end when text returns to +the preceding paragraph's indentation. +This means that something like: + + <pre> + We start here + and continue here + and end here.</pre> + + <p>is possible. + </table> + + <h2><a href="#contents" name="block-quotes">Block Quotes</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#block-quotes">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> +<samp>Block quotes are just:</samp> + +<p><samp> Indented paragraphs,</samp> + +<p><samp> and they may nest.</samp> + <td> + Block quotes are just: + <blockquote> + <p>Indented paragraphs, + <blockquote> + <p>and they may nest. + </blockquote> + </blockquote> + </table> + + <h2><a href="#contents" name="doctest-blocks">Doctest Blocks</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#doctest-blocks">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> + <p><samp>Doctest blocks are interactive +<br>Python sessions. They begin with +<br>"``>>>``" and end with a blank line.</samp> + + <p><samp>>>> print "This is a doctest block." +<br>This is a doctest block.</samp> + + <td> + <p>Doctest blocks are interactive + Python sessions. They begin with + "<samp>>>></samp>" and end with a blank line. 
+ + <p><samp>>>> print "This is a doctest block." +<br>This is a doctest block.</samp> + </table> + + <p>"The <a + href="http://www.python.org/doc/current/lib/module-doctest.html">doctest</a> + module searches a module's docstrings for text that looks like an + interactive Python session, then executes all such sessions to + verify they still work exactly as shown." (From the doctest docs.) + + <h2><a href="#contents" name="tables">Tables</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#tables">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> +<samp> +------------+------------+-----------+</samp> +<br><samp> | Header 1 | Header 2 | Header 3 |</samp> +<br><samp> +============+============+===========+</samp> +<br><samp> | body row 1 | column 2 | column 3 |</samp> +<br><samp> +------------+------------+-----------+</samp> +<br><samp> | body row 2 | Cells may span columns.|</samp> +<br><samp> +------------+------------+-----------+</samp> +<br><samp> | body row 3 | Cells may | - Cells |</samp> +<br><samp> +------------+ span rows. | - contain |</samp> +<br><samp> | body row 4 | | - blocks. |</samp> +<br><samp> +------------+------------+-----------+</samp> + <td> + <table align="center" border="1"> + <tr valign="top"> + <th>Header 1 + <th>Header 2 + <th>Header 3 + <tr> + <td>body row 1 + <td>column 2 + <td>column 3 + <tr> + <td>body row 2 + <td colspan="2">Cells may span columns. + <tr valign="top"> + <td>body row 3 + <td rowspan="2">Cells may<br>span rows. + <td rowspan="2"> + <ul> + <li>Cells + <li>contain + <li>blocks. 
+ </ul> + <tr valign="top"> + <td>body row 4 + </table> + </table> + + <h2><a href="#contents" name="transitions">Transitions</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#transitions">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td> + <p><samp> +A transition marker is a horizontal line +<br>of 4 or more repeated punctuation +<br>characters.</samp> + + <p><samp>------------</samp> + + <p><samp>A transition should not begin or end a +<br>section or document, nor should two +<br>transitions be immediately adjacent.</samp> + + <td> + <p>A transition marker is a horizontal line + of 4 or more repeated punctuation + characters.</p> + + <hr> + + <p>A transition should not begin or end a + section or document, nor should two + transitions be immediately adjacent. + </table> + + <p>Transitions are commonly seen in novels and short fiction, as a + gap spanning one or more lines, marking text divisions or + signaling changes in subject, time, point of view, or emphasis. + + <h2><a href="#contents" name="footnotes">Footnotes</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#footnotes">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + + <tr valign="top"> + <td> + <samp>Footnote references, like [5]_.</samp> + <br><samp>Note that footnotes may get</samp> + <br><samp>rearranged, e.g., to the bottom of</samp> + <br><samp>the "page".</samp> + + <p><samp>.. [5] A numerical footnote. Note</samp> + <br><samp> there's no colon after the ``]``.</samp> + + <td> + Footnote references, like <sup><a href="#5">[5]</a></sup>. + Note that footnotes may get rearranged, e.g., to the bottom of + the "page". 
+ + <p><table> + <tr><td colspan="2"><hr> + <!-- <tr><td colspan="2">Footnotes: --> + <tr><td><a name="5"><strong>[5]</strong></a><td> A numerical footnote. + Note there's no colon after the <samp>]</samp>. + </table> + + <tr valign="top"> + <td> + <samp>Autonumbered footnotes are</samp> + <br><samp>possible, like using [#]_ and [#]_.</samp> + <p><samp>.. [#] This is the first one.</samp> + <br><samp>.. [#] This is the second one.</samp> + + <p><samp>They may be assigned 'autonumber</samp> + <br><samp>labels' - for instance, + <br>[#fourth]_ and [#third]_.</samp> + + <p><samp>.. [#third] a.k.a. third_</samp> + <p><samp>.. [#fourth] a.k.a. fourth_</samp> + <td> + Autonumbered footnotes are possible, like using <sup><a + href="#auto1">1</a></sup> and <sup><a href="#auto2">2</a></sup>. + + <p>They may be assigned 'autonumber labels' - for instance, + <sup><a href="#fourth">4</a></sup> and <sup><a + href="#third">3</a></sup>. + + <p><table> + <tr><td colspan="2"><hr> + <!-- <tr><td colspan="2">Footnotes: --> + <tr><td><a name="auto1"><strong>[1]</strong></a><td> This is the first one. + <tr><td><a name="auto2"><strong>[2]</strong></a><td> This is the second one. + <tr><td><a name="third"><strong>[3]</strong></a><td> a.k.a. <a href="#third">third</a> + <tr><td><a name="fourth"><strong>[4]</strong></a><td> a.k.a. <a href="#fourth">fourth</a> + </table> + + <tr valign="top"> + <td> + <samp>Auto-symbol footnotes are also</samp> + <br><samp>possible, like this: [*]_ and [*]_.</samp> + <p><samp>.. [*] This is the first one.</samp> + <br><samp>.. [*] This is the second one.</samp> + + <td> + Auto-symbol footnotes are also + possible, like this: <sup><a href="#symbol1">*</a></sup> + and <sup><a href="#symbol2">†</a></sup>. + + <p><table> + <tr><td colspan="2"><hr> + <!-- <tr><td colspan="2">Footnotes: --> + <tr><td><a name="symbol1"><strong>[*]</strong></a><td> This is the first one. + <tr><td><a name="symbol2"><strong>[†]</strong></a><td> This is the second one. 
+ </table> + + </table> + + <p>The numbering of auto-numbered footnotes is determined by the + order of the footnotes, not of the references. For auto-numbered + footnote references without autonumber labels + ("<samp>[#]_</samp>"), the references and footnotes must be in the + same relative order. Similarly for auto-symbol footnotes + ("<samp>[*]_</samp>"). + + <h2><a href="#contents" name="citations">Citations</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#citations">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + + <tr valign="top"> + <td> + <samp>Citation references, like [CIT2002]_.</samp> + <br><samp>Note that citations may get</samp> + <br><samp>rearranged, e.g., to the bottom of</samp> + <br><samp>the "page".</samp> + + <p><samp>.. [CIT2002] A citation</samp> + <br><samp> (as often used in journals).</samp> + + <p><samp>Citation labels contain alphanumerics,</samp> + <br><samp>underlines, hyphens and fullstops.</samp> + <br><samp>Case is not significant.</samp> + + <p><samp>Given a citation like [this]_, one</samp> + <br><samp>can also refer to it like this_.</samp> + + <p><samp>.. [this] here.</samp> + + <td> + Citation references, like <a href="#cit2002">[CIT2002]</a>. + Note that citations may get rearranged, e.g., to the bottom of + the "page". + + <p>Citation labels contain alphanumerics, underlines, hyphens + and fullstops. Case is not significant. + + <p>Given a citation like <a href="#this">[this]</a>, one + can also refer to it like <a href="#this">this</a>. + + <p><table> + <tr><td colspan="2"><hr> + <!-- <tr><td colspan="2">Citations: --> + <tr><td><a name="cit2002"><strong>[CIT2002]</strong></a><td> A citation + (as often used in journals). + <tr><td><a name="this"><strong>[this]</strong></a><td> here. 
+ </table> + + </table> + + <h2><a href="#contents" name="hyperlink-targets">Hyperlink Targets</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#hyperlink-targets">details?</a>) + + <h3><a href="#contents" name="external-hyperlink-targets">External Hyperlink Targets</a></h3> + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + + <tr valign="top"> + <td> + <samp>External hyperlinks, like Python_.</samp> + + <p><samp>.. _Python: http://www.python.org/</samp> + <td> + <table width="100%"> + <tr bgcolor="#99CCFF"><td><em>Fold-in form</em> + <tr><td>External hyperlinks, like + <a href="http://www.python.org/">Python</a>. + <tr bgcolor="#99CCFF"><td><em>Call-out form</em> + <tr><td>External hyperlinks, like + <a href="#labPython"><i>Python</i></a>. + + <p><table> + <tr><td colspan="2"><hr> + <tr><td><a name="labPython"><i>Python:</i></a> + <td> <a href="http://www.python.org/">http://www.python.org/</a> + </table> + </table> + </table> + + <p>"<em>Fold-in</em>" is the representation typically used in HTML + documents (think of the indirect hyperlink being "folded in" like + ingredients into a cake), and "<em>call-out</em>" is more suitable for + printed documents, where the link needs to be presented explicitly, for + example as a footnote. + + <h3><a href="#contents" name="internal-hyperlink-targets">Internal Hyperlink Targets</a></h3> + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + + <tr valign="top"> + <td><samp>Internal crossreferences, like example_.</samp> + + <p><samp>.. 
_example:</samp> + + <p><samp>This is an example crossreference target.</samp> + <td> + <table width="100%"> + <tr bgcolor="#99CCFF"><td><em>Fold-in form</em> + <!-- Note that some browsers may not like an "a" tag that --> + <!-- does not have any content, so we could arbitrarily --> + <!-- use the first word as content - *or* just trust to --> + <!-- luck! --> + <tr><td>Internal crossreferences, like <a href="#example-foldin">example</a> + <p><a name="example-foldin">This</a> is an example + crossreference target. + <tr><td bgcolor="#99CCFF"><em>Call-out form</em> + <tr><td>Internal crossreferences, like <a href="#example-callout">example</a> + + <p><a name="example-callout"><i>example:</i></a> + <br>This is an example crossreference target. + </table> + + </table> + + <h3><a href="#contents" name="indirect-hyperlink-targets">Indirect Hyperlink Targets</a></h3> + + <p>(<a href="../../spec/rst/reStructuredText.html#indirect-hyperlink-targets">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + + <tr valign="top"> + <td> + <samp>Python_ is `my favourite +<br>programming language`__.</samp> + + <p><samp>.. _Python: http://www.python.org/</samp> + + <p><samp>__ Python_</samp> + + <td> + <p><a href="http://www.python.org/">Python</a> is + <a href="http://www.python.org/">my favourite + programming language</a>. + + </table> + + <p>The second hyperlink target (the line beginning with + "<samp>__</samp>") is both an indirect hyperlink target + (<i>indirectly</i> pointing at the Python website via the + "<samp>Python_</samp>" reference) and an <b>anonymous hyperlink + target</b>. In the text, a double-underscore suffix is used to + indicate an <b>anonymous hyperlink reference</b>. 
+ + <h3><a href="#contents" name="implicit-hyperlink-targets">Implicit Hyperlink Targets</a></h3> + + <p>(<a href="../../spec/rst/reStructuredText.html#implicit-hyperlink-targets">details?</a>) + + <p>Section titles, footnotes, and citations automatically generate + hyperlink targets (the title text or footnote/citation label is + used as the hyperlink name). + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + + <tr valign="top"> + <td> + <samp>Titles are targets, too</samp> + <br><samp>=======================</samp> + <br><samp>Implicit references, like `Titles are</samp> + <br><samp>targets, too`_.</samp> + <td> + <font size="+2"><strong><a name="title">Titles are targets, too</a></strong></font> + <p>Implicit references, like <a href="#title">Titles are + targets, too</a>. + </table> + + <h2><a href="#contents" name="directives">Directives</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#directives">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td><samp>For instance:</samp> + + <p><samp>.. image:: images/ball1.gif</samp> + + <td> + For instance: + <p><img src="images/ball1.gif" alt="ball1"> + </table> + + <h2><a href="#contents" name="substitution-references-and-definitions">Substitution References and Definitions</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#substitution-definitions">details?</a>) + + <p>Substitutions are like inline directives, allowing graphics and + arbitrary constructs within text. 
+ + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td><samp> +The |biohazard| symbol must be +used on containers used to +dispose of medical waste.</samp> + + <p><samp> +.. |biohazard| image:: biohazard.png</samp> + + <td> + + <p>The <img src="images/biohazard.png" align="bottom" alt="biohazard"> symbol + must be used on containers used to dispose of medical waste. + + </table> + + <h2><a href="#contents" name="comments">Comments</a></h2> + + <p>(<a href="../../spec/rst/reStructuredText.html#comments">details?</a>) + + <p><table border="1" width="100%" bgcolor="#ffffcc" cellpadding="3"> + <thead> + <tr align="left" bgcolor="#99CCFF"> + <th width="50%">Plain text + <th width="50%">Typical result + </thead> + <tbody> + <tr valign="top"> + <td><samp>.. This text will not be shown</samp> + <br><samp> (but, for instance, in HTML might be</samp> + <br><samp> rendered as an HTML comment)</samp> + + <td> + <!-- This text will not be shown --> + <!-- (but, for instance in HTML might be --> + <!-- rendered as an HTML comment) --> + + <tr valign="top"> + <td> + <samp>An empty "comment" directive does not</samp> + <br><samp>"consume" following blocks.</samp> + <p><samp>..</samp> + <p><samp> So this block is not "lost",</samp> + <br><samp> despite its indentation.</samp> + <td> + An empty "comment" directive does not + "consume" following blocks. + <blockquote> + So this block is not "lost", + despite its indentation. 
+ </blockquote> + </table> + + <p><hr> + <address> + <p>Authors: + <a href="http://www.tibsnjoan.co.uk/">Tibs</a> + (<a href="mailto:tony@lsl.co.uk"><tt>tony@lsl.co.uk</tt></a> or + <a href="mailto:tibs@tibsnjoan.co.uk"><tt>tibs@tibsnjoan.co.uk</tt></a>) + and David Goodger + (<a href="mailto:goodger@users.sourceforge.net">goodger@users.sourceforge.net</a>) + </address> + <!-- Created: Fri Aug 03 09:11:57 GMT Daylight Time 2001 --> + </body> +</html> diff --git a/docs/user/rst/quickstart.txt b/docs/user/rst/quickstart.txt new file mode 100644 index 000000000..be9139d60 --- /dev/null +++ b/docs/user/rst/quickstart.txt @@ -0,0 +1,301 @@ +A ReStructuredText Primer +========================= + +:Author: Richard Jones +:Version: $Revision$ + +The text below contains links that look like "(quickref__)". These +are relative links that point to the `Quick reStructuredText`_ user +reference. If these links don't work, please refer to the `master +quick reference`_ document. + +__ +.. _Quick reStructuredText: quickref.html +.. _master quick reference: + http://docutils.sourceforge.net/docs/rst/quickref.html + + +Structure +--------- + +From the outset, let me say that "Structured Text" is probably a bit +of a misnomer. It's more like "Relaxed Text" that uses certain +consistent patterns. These patterns are interpreted by a HTML +converter to produce "Very Structured Text" that can be used by a web +browser. + +The most basic pattern recognised is a **paragraph** (quickref__). +That's a chunk of text that is separated by blank lines (one is +enough). Paragraphs must have the same indentation -- that is, line +up at their left edge. Paragraphs that start indented will result in +indented quote paragraphs. For example:: + + This is a paragraph. It's quite + short. + + This paragraph will result in an indented block of + text, typically used for quoting other text. + + This is another one. + +Results in: + + This is a paragraph. It's quite + short. 
+ + This paragraph will result in an indented block of + text, typically used for quoting other text. + + This is another one. + +__ quickref.html#paragraphs + +Text styles +----------- + +(quickref__) + +__ quickref.html#inline-markup + +Inside paragraphs and other bodies of text, you may additionally mark +text for *italics* with "``*italics*``" or **bold** with +"``**bold**``". + +If you want something to appear as a fixed-space literal, use +"````double back-quotes````". Note that no further fiddling is done +inside the double back-quotes -- so asterisks "``*``" etc. are left +alone. + +If you find that you want to use one of the "special" characters in +text, it will generally be OK -- ReST is pretty smart. For example, +this * asterisk is handled just fine. If you actually want text +\*surrounded by asterisks* to **not** be italicised, then you need to +indicate that the asterisk is not special. You do this by placing a +backslash just before it, like so "``\*``" (quickref__). + +__ quickref.html#escaping + +Lists +----- + +Lists of items come in three main flavours: **enumerated**, +**bulleted** and **definitions**. In all list cases, you may have as +many paragraphs, sublists, etc. as you want, as long as the left-hand +side of the paragraph or whatever aligns with the first line of text +in the list item. + +Lists must always start a new paragraph -- that is, they must appear +after a blank line. + +**enumerated** lists (numbers, letters or roman numerals; quickref__) + __ quickref.html#enumerated-lists + + Start a line off with a number or letter followed by a period ".", + right bracket ")" or surrounded by brackets "( )" -- whatever you're + comfortable with. All of the following forms are recognised:: + + 1. numbers + + A. upper-case letters + and it goes over many lines + + with two paragraphs and all! + + a. lower-case letters + + 3. with a sub-list starting at a different number + 4. make sure the numbers are in the correct sequence though! + + I. 
upper-case roman numerals + + i. lower-case roman numerals + + (1) numbers again + + 1) and again + + Results in (note: the different enumerated list styles are not + always supported by every web browser, so you may not get the full + effect here): + + 1. numbers + + A. upper-case letters + and it goes over many lines + + with two paragraphs and all! + + a. lower-case letters + + 3. with a sub-list starting at a different number + 4. make sure the numbers are in the correct sequence though! + + I. upper-case roman numerals + + i. lower-case roman numerals + + (1) numbers again + + 1) and again + +**bulleted** lists (quickref__) + __ quickref.html#bullet-lists + + Just like enumerated lists, start the line off with a bullet point + character - either "-", "+" or "*":: + + * a bullet point using "*" + + - a sub-list using "-" + + + yet another sub-list + + - another item + + Results in: + + * a bullet point using "*" + + - a sub-list using "-" + + + yet another sub-list + + - another item + +**definition** lists (quickref__) + __ quickref.html#definition-lists + + Unlike the other two, the definition lists consist of a term, and + the definition of that term. The format of a definition list is:: + + what + Definition lists associate a term with a definition. + + *how* + The term is a one-line phrase, and the definition is one or more + paragraphs or body elements, indented relative to the term. + Blank lines are not allowed between term and definition. + + Results in: + + what + Definition lists associate a term with a definition. + + *how* + The term is a one-line phrase, and the definition is one or more + paragraphs or body elements, indented relative to the term. + Blank lines are not allowed between term and definition. + +Preformatting (code samples) +---------------------------- +(quickref__) + +__ quickref.html#literal-blocks + +To just include a chunk of preformatted, never-to-be-fiddled-with +text, finish the prior paragraph with "``::``". 
The preformatted +block is finished when the text falls back to the same indentation +level as a paragraph prior to the preformatted block. For example:: + + An example:: + + Whitespace, newlines, blank lines, and all kinds of markup + (like *this* or \this) is preserved by literal blocks. + Lookie here, I've dropped an indentation level + (but not far enough) + + no more example + +Results in: + + An example:: + + Whitespace, newlines, blank lines, and all kinds of markup + (like *this* or \this) is preserved by literal blocks. + Lookie here, I've dropped an indentation level + (but not far enough) + + no more example + +Note that if a paragraph consists only of "``::``", then it's removed +from the output:: + + :: + + This is preformatted text, and the + last "::" paragraph is removed + +Results in: + +:: + + This is preformatted text, and the + last "::" paragraph is removed + +Sections +-------- + +(quickref__) + +__ quickref.html#section-structure + +To break longer text up into sections, you use **section headers**. +These are a single line of text (one or more words) with an underline +(and optionally an overline) in dashes "``-----``", equals +"``======``", tildes "``~~~~~~``" or any of the non-alphanumeric +characters ``= - ` : ' " ~ ^ _ * + # < >`` that you feel comfortable +with. Be consistent though, since all sections marked with the same +underline style are deemed to be at the same level:: + + Chapter 1 Title + =============== + + Section 1.1 Title + ----------------- + + Subsection 1.1.1 Title + ~~~~~~~~~~~~~~~~~~~~~~ + + Section 1.2 Title + ----------------- + + Chapter 2 Title + =============== + +results in: + +.. 
sorry, I change the heading style here, but it's only an example :) + +Chapter 1 Title +~~~~~~~~~~~~~~~ + +Section 1.1 Title +''''''''''''''''' + +Subsection 1.1.1 Title +"""""""""""""""""""""" + +Section 1.2 Title +''''''''''''''''' + +Chapter 2 Title +~~~~~~~~~~~~~~~ + +Note that section headers are available as link targets, just using +their name. To link to the Lists_ heading, I write "``Lists_``". If +the heading has a space in it like `text styles`_, we need to quote +the heading "```text styles`_``". + +What Next? +---------- + +This primer introduces the most common features of reStructuredText, +but there are a lot more to explore. The `Quick reStructuredText`_ +user reference is a good place to go next. For complete details, the +`reStructuredText Markup Specification`_ is the place to go [#]_. + +.. _reStructuredText Markup Specification: + ../../spec/rst/reStructuredText.html + +.. [#] If that relative link doesn't work, try the master document: + http://docutils.sourceforge.net/spec/rst/reStructuredText.html. diff --git a/docutils/__init__.py b/docutils/__init__.py new file mode 100644 index 000000000..0ee88d94a --- /dev/null +++ b/docutils/__init__.py @@ -0,0 +1,51 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +This is the Docutils (Python Documentation Utilities) package. + +Package Structure +================= + +Modules: + +- __init__.py: Contains the package docstring only (this text). + +- core.py: Contains the ``Publisher`` class and ``publish()`` convenience + function. + +- nodes.py: DPS document tree (doctree) node class library. + +- roman.py: Conversion to and from Roman numerals. Courtesy of Mark + Pilgrim (http://diveintopython.org/). + +- statemachine.py: A finite state machine specialized for + regular-expression-based text filters. 
+ +- urischemes.py: Contains a complete mapping of known URI addressing + scheme names to descriptions. + +- utils.py: Contains the ``Reporter`` system warning class and miscellaneous + utilities. + +Subpackages: + +- languages: Language-specific mappings of terms. + +- parsers: Syntax-specific input parser modules or packages. + +- readers: Context-specific input handlers which understand the data + source and manage a parser. + +- transforms: Modules used by readers and writers to modify DPS + doctrees. + +- writers: Format-specific output translators. +""" + +__docformat__ = 'reStructuredText' diff --git a/docutils/core.py b/docutils/core.py new file mode 100644 index 000000000..b553b07b7 --- /dev/null +++ b/docutils/core.py @@ -0,0 +1,85 @@ +#! /usr/bin/env python + +""" +:Authors: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + + +""" + +__docformat__ = 'reStructuredText' + + +import readers, parsers, writers, utils + + +class Publisher: + + """ + Publisher encapsulates the high-level logic of a Docutils system. + """ + + reporter = None + """A `utils.Reporter` instance used for all document processing.""" + + def __init__(self, reader=None, parser=None, writer=None, reporter=None, + languagecode='en', warninglevel=2, errorlevel=4, + warningstream=None, debug=0): + """ + Initial setup. If any of `reader`, `parser`, or `writer` are + not specified, the corresponding 'set*' method should be + called. 
+ """ + self.reader = reader + self.parser = parser + self.writer = writer + if not reporter: + reporter = utils.Reporter(warninglevel, errorlevel, warningstream, + debug) + self.reporter = reporter + self.languagecode = languagecode + + def setreader(self, readername, languagecode=None): + """Set `self.reader` by name.""" + readerclass = readers.get_reader_class(readername) + self.reader = readerclass(self.reporter, + languagecode or self.languagecode) + + def setparser(self, parsername): + """Set `self.parser` by name.""" + parserclass = parsers.get_parser_class(parsername) + self.parser = parserclass() + + def setwriter(self, writername): + """Set `self.writer` by name.""" + writerclass = writers.get_writer_class(writername) + self.writer = writerclass() + + def publish(self, source, destination): + """ + Run `source` through `self.reader`, then through `self.writer` to + `destination`. + """ + document = self.reader.read(source, self.parser) + self.writer.write(document, destination) + + +def publish(source=None, destination=None, + reader=None, readername='standalone', + parser=None, parsername='restructuredtext', + writer=None, writername='pprint', + reporter=None, languagecode='en', + warninglevel=2, errorlevel=4, warningstream=None, debug=0): + """Set up & run a `Publisher`.""" + pub = Publisher(reader, parser, writer, reporter, languagecode, + warninglevel, errorlevel, warningstream, debug) + if reader is None: + pub.setreader(readername) + if parser is None: + pub.setparser(parsername) + if writer is None: + pub.setwriter(writername) + pub.publish(source, destination) diff --git a/docutils/languages/__init__.py b/docutils/languages/__init__.py new file mode 100644 index 000000000..4c10d9124 --- /dev/null +++ b/docutils/languages/__init__.py @@ -0,0 +1,22 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. 
+ +This package contains modules for language-dependent features of Docutils. +""" + +__docformat__ = 'reStructuredText' + +_languages = {} + +def getlanguage(languagecode): + if _languages.has_key(languagecode): + return _languages[languagecode] + module = __import__(languagecode, globals(), locals()) + _languages[languagecode] = module + return module diff --git a/docutils/languages/en.py b/docutils/languages/en.py new file mode 100644 index 000000000..5b97dadb7 --- /dev/null +++ b/docutils/languages/en.py @@ -0,0 +1,58 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +English-language mappings for language-dependent features of Docutils. +""" + +__docformat__ = 'reStructuredText' + + +from docutils import nodes + + +labels = { + 'author': 'Author', + 'authors': 'Authors', + 'organization': 'Organization', + 'contact': 'Contact', + 'version': 'Version', + 'revision': 'Revision', + 'status': 'Status', + 'date': 'Date', + 'copyright': 'Copyright', + 'abstract': 'Abstract', + 'attention': 'Attention!', + 'caution': 'Caution!', + 'danger': '!DANGER!', + 'error': 'Error', + 'hint': 'Hint', + 'important': 'Important', + 'note': 'Note', + 'tip': 'Tip', + 'warning': 'Warning', + 'contents': 'Contents'} +"""Mapping of node class name to label text.""" + +bibliographic_fields = { + 'author': nodes.author, + 'authors': nodes.authors, + 'organization': nodes.organization, + 'contact': nodes.contact, + 'version': nodes.version, + 'revision': nodes.revision, + 'status': nodes.status, + 'date': nodes.date, + 'copyright': nodes.copyright, + 'abstract': nodes.topic} +"""Field name (lowcased) to node class name mapping for bibliographic fields +(field_list).""" + +author_separators = [';', ','] +"""List of separator strings for the 'Authors' bibliographic field. 
Tried in +order.""" diff --git a/docutils/nodes.py b/docutils/nodes.py new file mode 100644 index 000000000..ece182c85 --- /dev/null +++ b/docutils/nodes.py @@ -0,0 +1,1112 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Docutils document tree element class library. + +Classes in CamelCase are abstract base classes or auxiliary classes. The one +exception is `Text`, for a text node; uppercase is used to differentiate from +element classes. + +Classes in lower_case_with_underscores are element classes, matching the XML +element generic identifiers in the DTD_. + +.. _DTD: http://docstring.sourceforge.net/spec/gpdi.dtd +""" + +import sys, os +import xml.dom.minidom +from types import IntType, SliceType, StringType, TupleType, ListType +from UserString import MutableString +import utils +import docutils + + +# ============================== +# Functional Node Base Classes +# ============================== + +class Node: + + """Abstract base class of nodes in a document tree.""" + + parent = None + """Back-reference to the `Node` containing this `Node`.""" + + def __nonzero__(self): + """Node instances are always true.""" + return 1 + + def asdom(self, dom=xml.dom.minidom): + """Return a DOM representation of this Node.""" + return self._dom_node(dom) + + def pformat(self, indent=' ', level=0): + """Return an indented pseudo-XML representation, for test purposes.""" + raise NotImplementedError + + def walk(self, visitor): + """ + Traverse a tree of `Node` objects, calling ``visit_...`` methods of + `visitor` when entering each node. If there is no + ``visit_particular_node`` method for a node of type + ``particular_node``, the ``unknown_visit`` method is called. + + Doesn't handle arbitrary modification in-place during the traversal. + Replacing one element with one element is OK. 
+ + Parameter `visitor`: A `NodeVisitor` object, containing a + ``visit_...`` method for each `Node` subclass encountered. + """ + name = 'visit_' + self.__class__.__name__ + method = getattr(visitor, name, visitor.unknown_visit) + visitor.doctree.reporter.debug(name, category='nodes.Node.walk') + try: + method(self) + children = self.getchildren() + try: + for i in range(len(children)): + children[i].walk(visitor) + except SkipSiblings: + pass + except (SkipChildren, SkipNode): + pass + + def walkabout(self, visitor): + """ + Perform a tree traversal similarly to `Node.walk()`, except also call + ``depart_...`` methods before exiting each node. If there is no + ``depart_particular_node`` method for a node of type + ``particular_node``, the ``unknown_departure`` method is called. + + Parameter `visitor`: A `NodeVisitor` object, containing ``visit_...`` + and ``depart_...`` methods for each `Node` subclass encountered. + """ + name = 'visit_' + self.__class__.__name__ + method = getattr(visitor, name, visitor.unknown_visit) + visitor.doctree.reporter.debug(name, category='nodes.Node.walkabout') + try: + method(self) + children = self.getchildren() + try: + for i in range(len(children)): + children[i].walkabout(visitor) + except SkipSiblings: + pass + except SkipChildren: + pass + except SkipNode: + return + name = 'depart_' + self.__class__.__name__ + method = getattr(visitor, name, visitor.unknown_departure) + visitor.doctree.reporter.debug(name, category='nodes.Node.walkabout') + method(self) + + +class Text(Node, MutableString): + + tagname = '#text' + + def __repr__(self): + data = repr(self.data) + if len(data) > 70: + data = repr(self.data[:64] + ' ...') + return '<%s: %s>' % (self.tagname, data) + + def shortrepr(self): + data = repr(self.data) + if len(data) > 20: + data = repr(self.data[:16] + ' ...') + return '<%s: %s>' % (self.tagname, data) + + def _dom_node(self, dom): + return dom.Text(self.data) + + def _rooted_dom_node(self, domroot): + return 
domroot.createTextNode(self.data) + + def astext(self): + return self.data + + def pformat(self, indent=' ', level=0): + result = [] + indent = indent * level + for line in self.data.splitlines(): + result.append(indent + line + '\n') + return ''.join(result) + + def getchildren(self): + """Text nodes have no children. Return [].""" + return [] + + +class Element(Node): + + """ + `Element` is the superclass to all specific elements. + + Elements contain attributes and child nodes. Elements emulate dictionaries + for attributes, indexing by attribute name (a string). To set the + attribute 'att' to 'value', do:: + + element['att'] = 'value' + + Elements also emulate lists for child nodes (element nodes and/or text + nodes), indexing by integer. To get the first child node, use:: + + element[0] + + Elements may be constructed using the ``+=`` operator. To add one new + child node to element, do:: + + element += node + + To add a list of multiple child nodes at once, use the same ``+=`` + operator:: + + element += [node1, node2] + """ + + tagname = None + """The element generic identifier. 
If None, it is set as an instance + attribute to the name of the class.""" + + child_text_separator = '\n\n' + """Separator for child nodes, used by `astext()` method.""" + + def __init__(self, rawsource='', *children, **attributes): + self.rawsource = rawsource + """The raw text from which this element was constructed.""" + + self.children = [] + """List of child nodes (elements and/or `Text`).""" + + self.extend(children) # extend self.children w/ attributes + + self.attributes = {} + """Dictionary of attribute {name: value}.""" + + for att, value in attributes.items(): + self.attributes[att.lower()] = value + + if self.tagname is None: + self.tagname = self.__class__.__name__ + + def _dom_node(self, dom): + element = dom.Element(self.tagname) + for attribute, value in self.attributes.items(): + element.setAttribute(attribute, str(value)) + for child in self.children: + element.appendChild(child._dom_node(dom)) + return element + + def _rooted_dom_node(self, domroot): + element = domroot.createElement(self.tagname) + for attribute, value in self.attributes.items(): + element.setAttribute(attribute, str(value)) + for child in self.children: + element.appendChild(child._rooted_dom_node(domroot)) + return element + + def __repr__(self): + data = '' + for c in self.children: + data += c.shortrepr() + if len(data) > 60: + data = data[:56] + ' ...' 
+ break + if self.hasattr('name'): + return '<%s "%s": %s>' % (self.__class__.__name__, + self.attributes['name'], data) + else: + return '<%s: %s>' % (self.__class__.__name__, data) + + def shortrepr(self): + if self.hasattr('name'): + return '<%s "%s"...>' % (self.__class__.__name__, + self.attributes['name']) + else: + return '<%s...>' % self.tagname + + def __str__(self): + if self.children: + return '%s%s%s' % (self.starttag(), + ''.join([str(c) for c in self.children]), + self.endtag()) + else: + return self.emptytag() + + def starttag(self): + parts = [self.tagname] + for name, value in self.attlist(): + if value is None: # boolean attribute + parts.append(name) + elif isinstance(value, ListType): + values = [str(v) for v in value] + parts.append('%s="%s"' % (name, ' '.join(values))) + else: + parts.append('%s="%s"' % (name, str(value))) + return '<%s>' % ' '.join(parts) + + def endtag(self): + return '</%s>' % self.tagname + + def emptytag(self): + return '<%s/>' % ' '.join([self.tagname] + + ['%s="%s"' % (n, v) + for n, v in self.attlist()]) + + def __len__(self): + return len(self.children) + + def __getitem__(self, key): + if isinstance(key, StringType): + return self.attributes[key] + elif isinstance(key, IntType): + return self.children[key] + elif isinstance(key, SliceType): + assert key.step is None, 'cannot handle slice with stride' + return self.children[key.start:key.stop] + else: + raise TypeError, ('element index must be an integer, a slice, or ' + 'an attribute name string') + + def __setitem__(self, key, item): + if isinstance(key, StringType): + self.attributes[key] = item + elif isinstance(key, IntType): + item.parent = self + self.children[key] = item + elif isinstance(key, SliceType): + assert key.step is None, 'cannot handle slice with stride' + for node in item: + node.parent = self + self.children[key.start:key.stop] = item + else: + raise TypeError, ('element index must be an integer, a slice, or ' + 'an attribute name string') + + def 
__delitem__(self, key): + if isinstance(key, StringType): + del self.attributes[key] + elif isinstance(key, IntType): + del self.children[key] + elif isinstance(key, SliceType): + assert key.step is None, 'cannot handle slice with stride' + del self.children[key.start:key.stop] + else: + raise TypeError, ('element index must be an integer, a simple ' + 'slice, or an attribute name string') + + def __add__(self, other): + return self.children + other + + def __radd__(self, other): + return other + self.children + + def __iadd__(self, other): + """Append a node or a list of nodes to `self.children`.""" + if isinstance(other, Node): + other.parent = self + self.children.append(other) + elif other is not None: + for node in other: + node.parent = self + self.children.extend(other) + return self + + def astext(self): + return self.child_text_separator.join( + [child.astext() for child in self.children]) + + def attlist(self): + attlist = self.attributes.items() + attlist.sort() + return attlist + + def get(self, key, failobj=None): + return self.attributes.get(key, failobj) + + def hasattr(self, attr): + return self.attributes.has_key(attr) + + def delattr(self, attr): + if self.attributes.has_key(attr): + del self.attributes[attr] + + def setdefault(self, key, failobj=None): + return self.attributes.setdefault(key, failobj) + + has_key = hasattr + + def append(self, item): + item.parent = self + self.children.append(item) + + def extend(self, item): + for node in item: + node.parent = self + self.children.extend(item) + + def insert(self, i, item): + assert isinstance(item, Node) + item.parent = self + self.children.insert(i, item) + + def pop(self, i=-1): + return self.children.pop(i) + + def remove(self, item): + self.children.remove(item) + + def index(self, item): + return self.children.index(item) + + def replace(self, old, new): + """Replace one child `Node` with another child or children.""" + index = self.index(old) + if isinstance(new, Node): + self[index] = 
new + elif new is not None: + self[index:index+1] = new + + def findclass(self, childclass, start=0, end=sys.maxint): + """ + Return the index of the first child whose class exactly matches. + + Parameters: + + - `childclass`: A `Node` subclass to search for, or a tuple of `Node` + classes. If a tuple, any of the classes may match. + - `start`: Initial index to check. + - `end`: Initial index to *not* check. + """ + if not isinstance(childclass, TupleType): + childclass = (childclass,) + for index in range(start, min(len(self), end)): + for c in childclass: + if isinstance(self[index], c): + return index + return None + + def findnonclass(self, childclass, start=0, end=sys.maxint): + """ + Return the index of the first child whose class does *not* match. + + Parameters: + + - `childclass`: A `Node` subclass to skip, or a tuple of `Node` + classes. If a tuple, none of the classes may match. + - `start`: Initial index to check. + - `end`: Initial index to *not* check. + """ + if not isinstance(childclass, TupleType): + childclass = (childclass,) + for index in range(start, min(len(self), end)): + match = 0 + for c in childclass: + if isinstance(self.children[index], c): + match = 1 + if not match: + return index + return None + + def pformat(self, indent=' ', level=0): + return ''.join(['%s%s\n' % (indent * level, self.starttag())] + + [child.pformat(indent, level+1) + for child in self.children]) + + def getchildren(self): + """Return this element's children.""" + return self.children + + +class TextElement(Element): + + """ + An element which directly contains text. + + Its children are all Text or TextElement nodes. 
+ """ + + child_text_separator = '' + """Separator for child nodes, used by `astext()` method.""" + + def __init__(self, rawsource='', text='', *children, **attributes): + if text != '': + textnode = Text(text) + Element.__init__(self, rawsource, textnode, *children, + **attributes) + else: + Element.__init__(self, rawsource, *children, **attributes) + + +# ======== +# Mixins +# ======== + +class Resolvable: + + resolved = 0 + + +class BackLinkable: + + def add_backref(self, refid): + self.setdefault('backrefs', []).append(refid) + + +# ==================== +# Element Categories +# ==================== + +class Root: pass + +class Titular: pass + +class Bibliographic: pass + + +class PreBibliographic: + """Category of Node which may occur before Bibliographic Nodes.""" + pass + + +class Structural: pass + +class Body: pass + +class General(Body): pass + +class Sequential(Body): pass + +class Admonition(Body): pass + + +class Special(Body): + """Special internal body elements, not true document components.""" + pass + + +class Component: pass + +class Inline: pass + +class Referential(Resolvable): pass + #refnode = None + #"""Resolved reference to a node.""" + + +class Targetable(Resolvable): + + referenced = 0 + + +# ============== +# Root Element +# ============== + +class document(Root, Structural, Element): + + def __init__(self, reporter, languagecode, *args, **kwargs): + Element.__init__(self, *args, **kwargs) + + self.reporter = reporter + """System message generator.""" + + self.languagecode = languagecode + """ISO 639 2-letter language identifier.""" + + self.explicit_targets = {} + """Mapping of target names to explicit target nodes.""" + + self.implicit_targets = {} + """Mapping of target names to implicit (internal) target + nodes.""" + + self.external_targets = [] + """List of external target nodes.""" + + self.internal_targets = [] + """List of internal target nodes.""" + + self.indirect_targets = [] + """List of indirect target nodes.""" + + 
self.substitution_defs = {} + """Mapping of substitution names to substitution_definition nodes.""" + + self.refnames = {} + """Mapping of names to lists of referencing nodes.""" + + self.refids = {} + """Mapping of ids to lists of referencing nodes.""" + + self.nameids = {} + """Mapping of names to unique id's.""" + + self.ids = {} + """Mapping of ids to nodes.""" + + self.substitution_refs = {} + """Mapping of substitution names to lists of substitution_reference + nodes.""" + + self.footnote_refs = {} + """Mapping of footnote labels to lists of footnote_reference nodes.""" + + self.citation_refs = {} + """Mapping of citation labels to lists of citation_reference nodes.""" + + self.anonymous_targets = [] + """List of anonymous target nodes.""" + + self.anonymous_refs = [] + """List of anonymous reference nodes.""" + + self.autofootnotes = [] + """List of auto-numbered footnote nodes.""" + + self.autofootnote_refs = [] + """List of auto-numbered footnote_reference nodes.""" + + self.symbol_footnotes = [] + """List of symbol footnote nodes.""" + + self.symbol_footnote_refs = [] + """List of symbol footnote_reference nodes.""" + + self.footnotes = [] + """List of manually-numbered footnote nodes.""" + + self.citations = [] + """List of citation nodes.""" + + self.pending = [] + """List of pending elements @@@.""" + + self.autofootnote_start = 1 + """Initial auto-numbered footnote number.""" + + self.symbol_footnote_start = 0 + """Initial symbol footnote symbol index.""" + + self.id_start = 1 + """Initial ID number.""" + + self.messages = Element() + """System messages generated after parsing.""" + + def asdom(self, dom=xml.dom.minidom): + domroot = dom.Document() + domroot.appendChild(Element._rooted_dom_node(self, domroot)) + return domroot + + def set_id(self, node, msgnode=None): + if msgnode == None: + msgnode = self.messages + if node.has_key('id'): + id = node['id'] + if self.ids.has_key(id) and self.ids[id] is not node: + msg = self.reporter.severe('Duplicate 
ID: "%s".' % id) + msgnode += msg + else: + if node.has_key('name'): + id = utils.id(node['name']) + else: + id = '' + while not id or self.ids.has_key(id): + id = 'id%s' % self.id_start + self.id_start += 1 + node['id'] = id + self.ids[id] = node + if node.has_key('name'): + self.nameids[node['name']] = id + return id + + def note_implicit_target(self, target, msgnode=None): + if msgnode == None: + msgnode = self.messages + id = self.set_id(target, msgnode) + name = target['name'] + if self.explicit_targets.has_key(name) \ + or self.implicit_targets.has_key(name): + msg = self.reporter.info( + 'Duplicate implicit target name: "%s".' % name, backrefs=[id]) + msgnode += msg + self.clear_target_names(name, self.implicit_targets) + del target['name'] + target['dupname'] = name + self.implicit_targets[name] = None + else: + self.implicit_targets[name] = target + + def note_explicit_target(self, target, msgnode=None): + if msgnode == None: + msgnode = self.messages + id = self.set_id(target, msgnode) + name = target['name'] + if self.explicit_targets.has_key(name): + level = 2 + if target.has_key('refuri'): # external target, dups OK + refuri = target['refuri'] + t = self.explicit_targets[name] + if t.has_key('name') and t.has_key('refuri') \ + and t['refuri'] == refuri: + level = 1 # just inform if refuri's identical + msg = self.reporter.system_message( + level, 'Duplicate explicit target name: "%s".' % name, + backrefs=[id]) + msgnode += msg + self.clear_target_names(name, self.explicit_targets, + self.implicit_targets) + if level > 1: + del target['name'] + target['dupname'] = name + elif self.implicit_targets.has_key(name): + msg = self.reporter.info( + 'Duplicate implicit target name: "%s".' 
% name, backrefs=[id]) + msgnode += msg + self.clear_target_names(name, self.implicit_targets) + self.explicit_targets[name] = target + + def clear_target_names(self, name, *targetdicts): + for targetdict in targetdicts: + if not targetdict.has_key(name): + continue + node = targetdict[name] + if node.has_key('name'): + node['dupname'] = node['name'] + del node['name'] + + def note_refname(self, node): + self.refnames.setdefault(node['refname'], []).append(node) + + def note_refid(self, node): + self.refids.setdefault(node['refid'], []).append(node) + + def note_external_target(self, target): + self.external_targets.append(target) + + def note_internal_target(self, target): + self.internal_targets.append(target) + + def note_indirect_target(self, target): + self.indirect_targets.append(target) + if target.has_key('name'): + self.note_refname(target) + + def note_anonymous_target(self, target): + self.set_id(target) + self.anonymous_targets.append(target) + + def note_anonymous_ref(self, ref): + self.anonymous_refs.append(ref) + + def note_autofootnote(self, footnote): + self.set_id(footnote) + self.autofootnotes.append(footnote) + + def note_autofootnote_ref(self, ref): + self.set_id(ref) + self.autofootnote_refs.append(ref) + + def note_symbol_footnote(self, footnote): + self.set_id(footnote) + self.symbol_footnotes.append(footnote) + + def note_symbol_footnote_ref(self, ref): + self.set_id(ref) + self.symbol_footnote_refs.append(ref) + + def note_footnote(self, footnote): + self.set_id(footnote) + self.footnotes.append(footnote) + + def note_footnote_ref(self, ref): + self.set_id(ref) + self.footnote_refs.setdefault(ref['refname'], []).append(ref) + self.note_refname(ref) + + def note_citation(self, citation): + self.set_id(citation) + self.citations.append(citation) + + def note_citation_ref(self, ref): + self.set_id(ref) + self.citation_refs.setdefault(ref['refname'], []).append(ref) + self.note_refname(ref) + + def note_substitution_def(self, subdef, 
msgnode=None): + name = subdef['name'] + if self.substitution_defs.has_key(name): + msg = self.reporter.error( + 'Duplicate substitution definition name: "%s".' % name) + if msgnode == None: + msgnode = self.messages + msgnode += msg + oldnode = self.substitution_defs[name] + oldnode['dupname'] = oldnode['name'] + del oldnode['name'] + # keep only the last definition + self.substitution_defs[name] = subdef + + def note_substitution_ref(self, subref): + self.substitution_refs.setdefault( + subref['refname'], []).append(subref) + + def note_pending(self, pending): + self.pending.append(pending) + + +# ================ +# Title Elements +# ================ + +class title(Titular, PreBibliographic, TextElement): pass +class subtitle(Titular, PreBibliographic, TextElement): pass + + +# ======================== +# Bibliographic Elements +# ======================== + +class docinfo(Bibliographic, Element): pass +class author(Bibliographic, TextElement): pass +class authors(Bibliographic, Element): pass +class organization(Bibliographic, TextElement): pass +class contact(Bibliographic, TextElement): pass +class version(Bibliographic, TextElement): pass +class revision(Bibliographic, TextElement): pass +class status(Bibliographic, TextElement): pass +class date(Bibliographic, TextElement): pass +class copyright(Bibliographic, TextElement): pass + + +# ===================== +# Structural Elements +# ===================== + +class section(Structural, Element): pass + +class topic(Structural, Element): + + """ + Topics are terminal, "leaf" mini-sections, like block quotes with titles, + or textual figures. A topic is just like a section, except that it has no + subsections, and it doesn't have to conform to section placement rules. + + Topics are allowed wherever body elements (list, table, etc.) are allowed, + but only at the top level of a section or document. 
Topics cannot nest + inside topics or body elements; you can't have a topic inside a table, + list, block quote, etc. + """ + + pass + + +class transition(Structural, Element): pass + + +# =============== +# Body Elements +# =============== + +class paragraph(General, TextElement): pass +class bullet_list(Sequential, Element): pass +class enumerated_list(Sequential, Element): pass +class list_item(Component, Element): pass +class definition_list(Sequential, Element): pass +class definition_list_item(Component, Element): pass +class term(Component, TextElement): pass +class classifier(Component, TextElement): pass +class definition(Component, Element): pass +class field_list(Sequential, Element): pass +class field(Component, Element): pass +class field_name(Component, TextElement): pass +class field_argument(Component, TextElement): pass +class field_body(Component, Element): pass + + +class option(Component, Element): + + child_text_separator = '' + + +class option_argument(Component, TextElement): + + def astext(self): + return self.get('delimiter', ' ') + TextElement.astext(self) + + +class option_group(Component, Element): + + child_text_separator = ', ' + + +class option_list(Sequential, Element): pass + + +class option_list_item(Component, Element): + + child_text_separator = ' ' + + +class option_string(Component, TextElement): pass +class description(Component, Element): pass +class literal_block(General, TextElement): pass +class block_quote(General, Element): pass +class doctest_block(General, TextElement): pass +class attention(Admonition, Element): pass +class caution(Admonition, Element): pass +class danger(Admonition, Element): pass +class error(Admonition, Element): pass +class important(Admonition, Element): pass +class note(Admonition, Element): pass +class tip(Admonition, Element): pass +class hint(Admonition, Element): pass +class warning(Admonition, Element): pass +class comment(Special, PreBibliographic, TextElement): pass +class 
substitution_definition(Special, TextElement): pass +class target(Special, Inline, TextElement, Targetable): pass +class footnote(General, Element, BackLinkable): pass +class citation(General, Element, BackLinkable): pass +class label(Component, TextElement): pass +class figure(General, Element): pass +class caption(Component, TextElement): pass +class legend(Component, Element): pass +class table(General, Element): pass +class tgroup(Component, Element): pass +class colspec(Component, Element): pass +class thead(Component, Element): pass +class tbody(Component, Element): pass +class row(Component, Element): pass +class entry(Component, Element): pass + + +class system_message(Special, PreBibliographic, Element, BackLinkable): + + def __init__(self, comment=None, *children, **attributes): + if comment: + p = paragraph('', comment) + children = (p,) + children + Element.__init__(self, '', *children, **attributes) + + def astext(self): + return '%s (%s) %s' % (self['type'], self['level'], + Element.astext(self)) + + +class pending(Special, PreBibliographic, Element): + + """ + The "pending" element is used to encapsulate a pending operation: the + operation, the point at which to apply it, and any data it requires. Only + the pending operation's location within the document is stored in the + public document tree; the operation itself and its data are stored in + internal instance attributes. + + For example, say you want a table of contents in your reStructuredText + document. The easiest way to specify where to put it is from within the + document, with a directive:: + + .. contents:: + + But the "contents" directive can't do its work until the entire document + has been parsed (and possibly transformed to some extent). 
So the + directive code leaves a placeholder behind that will trigger the second + phase of the its processing, something like this:: + + <pending ...public attributes...> + internal attributes + + The "pending" node is also appended to `document.pending`, so that a later + stage of processing can easily run all pending transforms. + """ + + def __init__(self, transform, stage, details, + rawsource='', *children, **attributes): + Element.__init__(self, rawsource, *children, **attributes) + + self.transform = transform + """The `docutils.transforms.Transform` class implementing the pending + operation.""" + + self.stage = stage + """The stage of processing when the function will be called.""" + + self.details = details + """Detail data (dictionary) required by the pending operation.""" + + def pformat(self, indent=' ', level=0): + internals = [ + '.. internal attributes:', + ' .transform: %s.%s' % (self.transform.__module__, + self.transform.__name__), + ' .stage: %r' % self.stage, + ' .details:'] + details = self.details.items() + details.sort() + for key, value in details: + if isinstance(value, Node): + internals.append('%7s%s:' % ('', key)) + internals.extend(['%9s%s' % ('', line) + for line in value.pformat().splitlines()]) + else: + internals.append('%7s%s: %r' % ('', key, value)) + return (Element.pformat(self, indent, level) + + ''.join([(' %s%s\n' % (indent * level, line)) + for line in internals])) + + +class raw(Special, Inline, PreBibliographic, TextElement): + + """ + Raw data that is to be passed untouched to the Writer. 
+ """ + + pass + + +# ================= +# Inline Elements +# ================= + +class emphasis(Inline, TextElement): pass +class strong(Inline, TextElement): pass +class interpreted(Inline, Referential, TextElement): pass +class literal(Inline, TextElement): pass +class reference(Inline, Referential, TextElement): pass +class footnote_reference(Inline, Referential, TextElement): pass +class citation_reference(Inline, Referential, TextElement): pass +class substitution_reference(Inline, TextElement): pass + + +class image(General, Inline, TextElement): + + def astext(self): + return self.get('alt', '') + + +class problematic(Inline, TextElement): pass + + +# ======================================== +# Auxiliary Classes, Functions, and Data +# ======================================== + +node_class_names = """ + Text + attention author authors + block_quote bullet_list + caption caution citation citation_reference classifier colspec + comment contact copyright + danger date definition definition_list definition_list_item + description docinfo doctest_block document + emphasis entry enumerated_list error + field field_argument field_body field_list field_name figure + footnote footnote_reference + hint + image important interpreted + label legend list_item literal literal_block + note + option option_argument option_group option_list option_list_item + option_string organization + paragraph pending problematic + raw reference revision row + section status strong substitution_definition + substitution_reference subtitle system_message + table target tbody term tgroup thead tip title topic transition + version + warning""".split() +"""A list of names of all concrete Node subclasses.""" + + +class NodeVisitor: + + """ + "Visitor" pattern [GoF95]_ abstract superclass implementation for document + tree traversals. + + Each node class has corresponding methods, doing nothing by default; + override individual methods for specific and useful behaviour. 
The + "``visit_`` + node class name" method is called by `Node.walk()` upon + entering a node. `Node.walkabout()` also calls the "``depart_`` + node + class name" method before exiting a node. + + .. [GoF95] Gamma, Helm, Johnson, Vlissides. *Design Patterns: Elements of + Reusable Object-Oriented Software*. Addison-Wesley, Reading, MA, USA, + 1995. + """ + + def __init__(self, doctree): + self.doctree = doctree + + def unknown_visit(self, node): + """ + Called when entering unknown `Node` types. + + Raise an exception unless overridden. + """ + raise NotImplementedError('visiting unknown node type: %s' + % node.__class__.__name__) + + def unknown_departure(self, node): + """ + Called before exiting unknown `Node` types. + + Raise exception unless overridden. + """ + raise NotImplementedError('departing unknown node type: %s' + % node.__class__.__name__) + + # Save typing with dynamic definitions. + for name in node_class_names: + exec """def visit_%s(self, node): pass\n""" % name + exec """def depart_%s(self, node): pass\n""" % name + del name + + +class GenericNodeVisitor(NodeVisitor): + + """ + Generic "Visitor" abstract superclass, for simple traversals. + + Unless overridden, each ``visit_...`` method calls `default_visit()`, and + each ``depart_...`` method (when using `Node.walkabout()`) calls + `default_departure()`. `default_visit()` (`default_departure()`) must be + overridden in subclasses. + + Define fully generic visitors by overriding `default_visit()` + (`default_departure()`) only. Define semi-generic visitors by overriding + individual ``visit_...()`` (``depart_...()``) methods also. + + `NodeVisitor.unknown_visit()` (`NodeVisitor.unknown_departure()`) should + be overridden for default behavior. 
+ """ + + def default_visit(self, node): + """Override for generic, uniform traversals.""" + raise NotImplementedError + + def default_departure(self, node): + """Override for generic, uniform traversals.""" + raise NotImplementedError + + # Save typing with dynamic definitions. + for name in node_class_names: + exec """def visit_%s(self, node): + self.default_visit(node)\n""" % name + exec """def depart_%s(self, node): + self.default_departure(node)\n""" % name + del name + + +class VisitorException(Exception): pass +class SkipChildren(VisitorException): pass +class SkipSiblings(VisitorException): pass +class SkipNode(VisitorException): pass diff --git a/docutils/parsers/__init__.py b/docutils/parsers/__init__.py new file mode 100644 index 000000000..72e2e4e49 --- /dev/null +++ b/docutils/parsers/__init__.py @@ -0,0 +1,37 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. 
+""" + +__docformat__ = 'reStructuredText' + + +class Parser: + + def parse(self, inputstring, docroot): + """Override to parse `inputstring` into document tree `docroot`.""" + raise NotImplementedError('subclass must override this method') + + def setup_parse(self, inputstring, docroot): + """Initial setup, used by `parse()`.""" + self.inputstring = inputstring + self.docroot = docroot + + +_parser_aliases = { + 'restructuredtext': 'rst', + 'rest': 'rst', + 'rtxt': 'rst',} + +def get_parser_class(parsername): + """Return the Parser class from the `parsername` module.""" + parsername = parsername.lower() + if _parser_aliases.has_key(parsername): + parsername = _parser_aliases[parsername] + module = __import__(parsername, globals(), locals()) + return module.Parser diff --git a/docutils/parsers/rst/__init__.py b/docutils/parsers/rst/__init__.py new file mode 100644 index 000000000..06589513b --- /dev/null +++ b/docutils/parsers/rst/__init__.py @@ -0,0 +1,68 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +This is ``the docutils.parsers.restructuredtext`` package. It exports a single +class, `Parser`. + +Usage +===== + +1. Create a parser:: + + parser = docutils.parsers.restructuredtext.Parser() + + Several optional arguments may be passed to modify the parser's behavior. + Please see `docutils.parsers.Parser` for details. + +2. Gather input (a multi-line string), by reading a file or the standard + input:: + + input = sys.stdin.read() + +3. Create a new empty `docutils.nodes.document` tree:: + + docroot = docutils.utils.newdocument() + + See `docutils.utils.newdocument()` for parameter details. + +4. 
Run the parser, populating the document tree:: + + document = parser.parse(input, docroot) + +Parser Overview +=============== + +The reStructuredText parser is implemented as a state machine, examining its +input one line at a time. To understand how the parser works, please first +become familiar with the `docutils.statemachine` module, then see the +`states` module. +""" + +__docformat__ = 'reStructuredText' + + +import docutils.parsers +import docutils.statemachine +import states + + +class Parser(docutils.parsers.Parser): + + """The reStructuredText parser.""" + + def parse(self, inputstring, docroot): + """Parse `inputstring` and populate `docroot`, a document tree.""" + self.setup_parse(inputstring, docroot) + debug = docroot.reporter[''].debug + self.statemachine = states.RSTStateMachine( + stateclasses=states.stateclasses, initialstate='Body', + debug=debug) + inputlines = docutils.statemachine.string2lines( + inputstring, convertwhitespace=1) + self.statemachine.run(inputlines, docroot) diff --git a/docutils/parsers/rst/directives/__init__.py b/docutils/parsers/rst/directives/__init__.py new file mode 100644 index 000000000..43b0c1dd3 --- /dev/null +++ b/docutils/parsers/rst/directives/__init__.py @@ -0,0 +1,88 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +This package contains directive implementation modules. + +The interface for directive functions is as follows:: + + def directivefn(match, type, data, state, statemachine, attributes) + +Where: + +- ``match`` is a regular expression match object which matched the first line + of the directive. ``match.group(1)`` gives the directive name. +- ``type`` is the directive type or name. +- ``data`` contains the remainder of the first line of the directive after the + "::". +- ``state`` is the state which called the directive function. 
+- ``statemachine`` is the state machine which controls the state which called + the directive function. +- ``attributes`` is a dictionary of extra attributes which may be added to the + element the directive produces. Currently, only an "alt" attribute is passed + by substitution definitions (value: the substitution name), which may be + used by an embedded image directive. +""" + +__docformat__ = 'reStructuredText' + + +_directive_registry = { + 'attention': ('admonitions', 'attention'), + 'caution': ('admonitions', 'caution'), + 'danger': ('admonitions', 'danger'), + 'error': ('admonitions', 'error'), + 'important': ('admonitions', 'important'), + 'note': ('admonitions', 'note'), + 'tip': ('admonitions', 'tip'), + 'hint': ('admonitions', 'hint'), + 'warning': ('admonitions', 'warning'), + 'image': ('images', 'image'), + 'figure': ('images', 'figure'), + 'contents': ('components', 'contents'), + 'footnotes': ('components', 'footnotes'), + 'citations': ('components', 'citations'), + 'topic': ('components', 'topic'), + 'meta': ('html', 'meta'), + 'imagemap': ('html', 'imagemap'), + 'raw': ('misc', 'raw'), + 'restructuredtext-test-directive': ('misc', 'directive_test_function'),} +"""Mapping of directive name to (module name, function name). The directive +'name' is canonical & must be lowercase; language-dependent names are defined +in the language package.""" + +_modules = {} +"""Cache of imported directive modules.""" + +_directives = {} +"""Cache of imported directive functions.""" + +def directive(directivename, languagemodule): + """ + Locate and return a directive function from its language-dependent name.
+ """ + normname = directivename.lower() + if _directives.has_key(normname): + return _directives[normname] + try: + canonicalname = languagemodule.directives[normname] + modulename, functionname = _directive_registry[canonicalname] + except KeyError: + return None + if _modules.has_key(modulename): + module = _modules[modulename] + else: + try: + module = __import__(modulename, globals(), locals()) + except ImportError: + return None + try: + function = getattr(module, functionname) + except AttributeError: + return None + return function diff --git a/docutils/parsers/rst/directives/admonitions.py b/docutils/parsers/rst/directives/admonitions.py new file mode 100644 index 000000000..f594cd431 --- /dev/null +++ b/docutils/parsers/rst/directives/admonitions.py @@ -0,0 +1,55 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Admonition directives. 
+""" + +__docformat__ = 'reStructuredText' + + +from docutils.parsers.rst import states +from docutils import nodes + + +def admonition(nodeclass, match, typename, data, state, statemachine, + attributes): + indented, indent, lineoffset, blankfinish \ + = statemachine.getfirstknownindented(match.end()) + text = '\n'.join(indented) + admonitionnode = nodeclass(text) + if text: + state.nestedparse(indented, lineoffset, admonitionnode) + return [admonitionnode], blankfinish + +def attention(*args, **kwargs): + return admonition(nodes.attention, *args, **kwargs) + +def caution(*args, **kwargs): + return admonition(nodes.caution, *args, **kwargs) + +def danger(*args, **kwargs): + return admonition(nodes.danger, *args, **kwargs) + +def error(*args, **kwargs): + return admonition(nodes.error, *args, **kwargs) + +def important(*args, **kwargs): + return admonition(nodes.important, *args, **kwargs) + +def note(*args, **kwargs): + return admonition(nodes.note, *args, **kwargs) + +def tip(*args, **kwargs): + return admonition(nodes.tip, *args, **kwargs) + +def hint(*args, **kwargs): + return admonition(nodes.hint, *args, **kwargs) + +def warning(*args, **kwargs): + return admonition(nodes.warning, *args, **kwargs) diff --git a/docutils/parsers/rst/directives/components.py b/docutils/parsers/rst/directives/components.py new file mode 100644 index 000000000..8463f41b0 --- /dev/null +++ b/docutils/parsers/rst/directives/components.py @@ -0,0 +1,59 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Document component directives. 
+""" + +__docformat__ = 'reStructuredText' + + +from docutils import nodes +import docutils.transforms.components + + +contents_attribute_spec = {'depth': int, + 'local': (lambda x: x)} + +def contents(match, typename, data, state, statemachine, attributes): + lineno = statemachine.abslineno() + lineoffset = statemachine.lineoffset + datablock, indent, offset, blankfinish = \ + statemachine.getfirstknownindented(match.end(), uptoblank=1) + blocktext = '\n'.join(statemachine.inputlines[ + lineoffset : lineoffset + len(datablock) + 1]) + for i in range(len(datablock)): + if datablock[i][:1] == ':': + attlines = datablock[i:] + datablock = datablock[:i] + break + else: + attlines = [] + i = 0 + titletext = ' '.join([line.strip() for line in datablock]) + if titletext: + textnodes, messages = state.inline_text(titletext, lineno) + title = nodes.title(titletext, '', *textnodes) + else: + messages = [] + title = None + pending = nodes.pending(docutils.transforms.components.Contents, + 'last_reader', {'title': title}, blocktext) + if attlines: + success, data, blankfinish = state.parse_extension_attributes( + contents_attribute_spec, attlines, blankfinish) + if success: # data is a dict of attributes + pending.details.update(data) + else: # data is an error string + error = statemachine.memo.reporter.error( + 'Error in "%s" directive attributes at line %s:\n%s.' + % (match.group(1), lineno, data), '', + nodes.literal_block(blocktext, blocktext)) + return [error] + messages, blankfinish + statemachine.memo.document.note_pending(pending) + return [pending] + messages, blankfinish diff --git a/docutils/parsers/rst/directives/html.py b/docutils/parsers/rst/directives/html.py new file mode 100644 index 000000000..d971300e0 --- /dev/null +++ b/docutils/parsers/rst/directives/html.py @@ -0,0 +1,89 @@ +#! 
/usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Directives for typically HTML-specific constructs. +""" + +__docformat__ = 'reStructuredText' + + +from docutils import nodes, utils +from docutils.parsers.rst import states + + +def meta(match, typename, data, state, statemachine, attributes): + lineoffset = statemachine.lineoffset + block, indent, offset, blankfinish = \ + statemachine.getfirstknownindented(match.end(), uptoblank=1) + node = nodes.Element() + if block: + newlineoffset, blankfinish = state.nestedlistparse( + block, offset, node, initialstate='MetaBody', + blankfinish=blankfinish, statemachinekwargs=metaSMkwargs) + if (newlineoffset - offset) != len(block): # incomplete parse of block? + blocktext = '\n'.join(statemachine.inputlines[ + lineoffset : statemachine.lineoffset+1]) + msg = statemachine.memo.reporter.error( + 'Invalid meta directive at line %s.' + % statemachine.abslineno(), '', + nodes.literal_block(blocktext, blocktext)) + node += msg + else: + msg = statemachine.memo.reporter.error( + 'Empty meta directive at line %s.' 
% statemachine.abslineno()) + node += msg + return node.getchildren(), blankfinish + +def imagemap(match, typename, data, state, statemachine, attributes): + return [], 0 + + +class MetaBody(states.SpecializedBody): + + class meta(nodes.Special, nodes.PreBibliographic, nodes.Element): + """HTML-specific "meta" element.""" + pass + + def field_marker(self, match, context, nextstate): + """Meta element.""" + node, blankfinish = self.parsemeta(match) + self.statemachine.node += node + return [], nextstate, [] + + def parsemeta(self, match): + name, args = self.parse_field_marker(match) + indented, indent, lineoffset, blankfinish = \ + self.statemachine.getfirstknownindented(match.end()) + node = self.meta() + node['content'] = ' '.join(indented) + if not indented: + line = self.statemachine.line + msg = self.statemachine.memo.reporter.info( + 'No content for meta tag "%s".' % name, '', + nodes.literal_block(line, line)) + self.statemachine.node += msg + try: + attname, val = utils.extract_name_value(name)[0] + node[attname.lower()] = val + except utils.NameValueError: + node['name'] = name + for arg in args: + try: + attname, val = utils.extract_name_value(arg)[0] + node[attname.lower()] = val + except utils.NameValueError, detail: + line = self.statemachine.line + msg = self.statemachine.memo.reporter.error( + 'Error parsing meta tag attribute "%s": %s' + % (arg, detail), '', nodes.literal_block(line, line)) + self.statemachine.node += msg + return node, blankfinish + + +metaSMkwargs = {'stateclasses': (MetaBody,)} diff --git a/docutils/parsers/rst/directives/images.py b/docutils/parsers/rst/directives/images.py new file mode 100644 index 000000000..7a719333b --- /dev/null +++ b/docutils/parsers/rst/directives/images.py @@ -0,0 +1,97 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. 
+ +Directives for figures and simple images. +""" + +__docformat__ = 'reStructuredText' + + +import sys +from docutils.parsers.rst import states +from docutils import nodes, utils + +def unchanged(arg): + return arg # unchanged! + +image_attribute_spec = {'alt': unchanged, + 'height': int, + 'width': int, + 'scale': int} + +def image(match, typename, data, state, statemachine, attributes): + lineno = statemachine.abslineno() + lineoffset = statemachine.lineoffset + datablock, indent, offset, blankfinish = \ + statemachine.getfirstknownindented(match.end(), uptoblank=1) + blocktext = '\n'.join(statemachine.inputlines[ + lineoffset : lineoffset + len(datablock) + 1]) + for i in range(len(datablock)): + if datablock[i][:1] == ':': + attlines = datablock[i:] + datablock = datablock[:i] + break + else: + attlines = [] + if not datablock: + error = statemachine.memo.reporter.error( + 'Missing image URI argument at line %s.' % lineno, '', + nodes.literal_block(blocktext, blocktext)) + return [error], blankfinish + attoffset = lineoffset + i + reference = ''.join([line.strip() for line in datablock]) + if reference.find(' ') != -1: + error = statemachine.memo.reporter.error( + 'Image URI at line %s contains whitespace.' % lineno, '', + nodes.literal_block(blocktext, blocktext)) + return [error], blankfinish + if attlines: + success, data, blankfinish = state.parse_extension_attributes( + image_attribute_spec, attlines, blankfinish) + if success: # data is a dict of attributes + attributes.update(data) + else: # data is an error string + error = statemachine.memo.reporter.error( + 'Error in "%s" directive attributes at line %s:\n%s.' 
+ % (match.group(1), lineno, data), '', + nodes.literal_block(blocktext, blocktext)) + return [error], blankfinish + attributes['uri'] = reference + imagenode = nodes.image(blocktext, **attributes) + return [imagenode], blankfinish + +def figure(match, typename, data, state, statemachine, attributes): + lineoffset = statemachine.lineoffset + (imagenode,), blankfinish = image(match, typename, data, state, + statemachine, attributes) + indented, indent, offset, blankfinish \ + = statemachine.getfirstknownindented(sys.maxint) + blocktext = '\n'.join(statemachine.inputlines[lineoffset: + statemachine.lineoffset+1]) + if isinstance(imagenode, nodes.system_message): + if indented: + imagenode[-1] = nodes.literal_block(blocktext, blocktext) + return [imagenode], blankfinish + figurenode = nodes.figure('', imagenode) + if indented: + node = nodes.Element() # anonymous container for parsing + state.nestedparse(indented, lineoffset, node) + firstnode = node[0] + if isinstance(firstnode, nodes.paragraph): + caption = nodes.caption(firstnode.rawsource, '', + *firstnode.children) + figurenode += caption + elif not (isinstance(firstnode, nodes.comment) and len(firstnode) == 0): + error = statemachine.memo.reporter.error( + 'Figure caption must be a paragraph or empty comment.', '', + nodes.literal_block(blocktext, blocktext)) + return [figurenode, error], blankfinish + if len(node) > 1: + figurenode += nodes.legend('', *node[1:]) + return [figurenode], blankfinish diff --git a/docutils/parsers/rst/directives/misc.py b/docutils/parsers/rst/directives/misc.py new file mode 100644 index 000000000..f8a9d5217 --- /dev/null +++ b/docutils/parsers/rst/directives/misc.py @@ -0,0 +1,39 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Miscellaneous directives. 
+""" + +__docformat__ = 'reStructuredText' + + +from docutils import nodes + + +def raw(match, typename, data, state, statemachine, attributes): + return [], 1 + +def directive_test_function(match, typename, data, state, statemachine, + attributes): + try: + statemachine.nextline() + indented, indent, offset, blankfinish = statemachine.getindented() + text = '\n'.join(indented) + except IndexError: + text = '' + blankfinish = 1 + if text: + info = statemachine.memo.reporter.info( + 'Directive processed. Type="%s", data="%s", directive block:' + % (typename, data), '', nodes.literal_block(text, text)) + else: + info = statemachine.memo.reporter.info( + 'Directive processed. Type="%s", data="%s", directive block: None' + % (typename, data)) + return [info], blankfinish diff --git a/docutils/parsers/rst/languages/__init__.py b/docutils/parsers/rst/languages/__init__.py new file mode 100644 index 000000000..ee36d1148 --- /dev/null +++ b/docutils/parsers/rst/languages/__init__.py @@ -0,0 +1,23 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +This package contains modules for language-dependent features of +reStructuredText. +""" + +__docformat__ = 'reStructuredText' + +_languages = {} + +def getlanguage(languagecode): + if _languages.has_key(languagecode): + return _languages[languagecode] + module = __import__(languagecode, globals(), locals()) + _languages[languagecode] = module + return module diff --git a/docutils/parsers/rst/languages/en.py b/docutils/parsers/rst/languages/en.py new file mode 100644 index 000000000..2b1c52649 --- /dev/null +++ b/docutils/parsers/rst/languages/en.py @@ -0,0 +1,38 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. 
+ +English-language mappings for language-dependent features of +reStructuredText. +""" + +__docformat__ = 'reStructuredText' + + +directives = { + 'attention': 'attention', + 'caution': 'caution', + 'danger': 'danger', + 'error': 'error', + 'hint': 'hint', + 'important': 'important', + 'note': 'note', + 'tip': 'tip', + 'warning': 'warning', + 'image': 'image', + 'figure': 'figure', + 'contents': 'contents', + 'footnotes': 'footnotes', + 'citations': 'citations', + 'topic': 'topic', + 'meta': 'meta', + 'imagemap': 'imagemap', + 'raw': 'raw', + 'restructuredtext-test-directive': 'restructuredtext-test-directive'} +"""English name to registered (in directives/__init__.py) directive name +mapping.""" diff --git a/docutils/parsers/rst/states.py b/docutils/parsers/rst/states.py new file mode 100644 index 000000000..b2dbf9b3e --- /dev/null +++ b/docutils/parsers/rst/states.py @@ -0,0 +1,2115 @@ +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +This is the ``docutils.parsers.restructuredtext.states`` module, the core of +the reStructuredText parser. It defines the following: + +:Classes: + - `RSTStateMachine`: reStructuredText parser's entry point. + - `NestedStateMachine`: recursive StateMachine. + - `RSTState`: reStructuredText State superclass. + - `Body`: Generic classifier of the first line of a block. + - `BulletList`: Second and subsequent bullet_list list_items + - `DefinitionList`: Second and subsequent definition_list_items. + - `EnumeratedList`: Second and subsequent enumerated_list list_items. + - `FieldList`: Second and subsequent fields. + - `OptionList`: Second and subsequent option_list_items. + - `Explicit`: Second and subsequent explicit markup constructs. + - `SubstitutionDef`: For embedded directives in substitution definitions. + - `Text`: Classifier of second line of a text block. 
+ - `Definition`: Second line of potential definition_list_item. + - `Line`: Second line of overlined section title or transition marker. + - `Stuff`: An auxiliary collection class. + +:Exception classes: + - `MarkupError` + - `ParserError` + - `TransformationError` + +:Functions: + - `escape2null()`: Return a string, escape-backslashes converted to nulls. + - `unescape()`: Return a string, nulls removed or restored to backslashes. + - `normname()`: Return a case- and whitespace-normalized name. + +:Attributes: + - `stateclasses`: set of State classes used with `RSTStateMachine`. + +Parser Overview +=============== + +The reStructuredText parser is implemented as a state machine, examining its +input one line at a time. To understand how the parser works, please first +become familiar with the `docutils.statemachine` module. In the description +below, references are made to classes defined in this module; please see the +individual classes for details. + +Parsing proceeds as follows: + +1. The state machine examines each line of input, checking each of the + transition patterns of the state `Body`, in order, looking for a match. The + implicit transitions (blank lines and indentation) are checked before any + others. The 'text' transition is a catch-all (matches anything). + +2. The method associated with the matched transition pattern is called. + + A. Some transition methods are self-contained, appending elements to the + document tree ('doctest' parses a doctest block). The parser's current + line index is advanced to the end of the element, and parsing continues + with step 1. + + B. Others trigger the creation of a nested state machine, whose job is to + parse a compound construct ('indent' does a block quote, 'bullet' does a + bullet list, 'overline' does a section [first checking for a valid + section header]). + + - In the case of lists and explicit markup, a new state machine is + created and run to parse the first item.
+ + - A new state machine is created and its initial state is set to the + appropriate specialized state (`BulletList` in the case of the + 'bullet' transition). This state machine is run to parse the compound + element (or series of explicit markup elements), and returns as soon + as a non-member element is encountered. For example, the `BulletList` + state machine aborts as soon as it encounters an element which is not + a list item of that bullet list. The optional omission of + inter-element blank lines is handled by the nested state machine. + + - The current line index is advanced to the end of the elements parsed, + and parsing continues with step 1. + + C. The result of the 'text' transition depends on the next line of text. + The current state is changed to `Text`, under which the second line is + examined. If the second line is: + + - Indented: The element is a definition list item, and parsing proceeds + similarly to step 2.B, using the `DefinitionList` state. + + - A line of uniform punctuation characters: The element is a section + header; again, parsing proceeds as in step 2.B, and `Body` is still + used. + + - Anything else: The element is a paragraph, which is examined for + inline markup and appended to the parent element. Processing continues + with step 1. +""" + +__docformat__ = 'reStructuredText' + + +import sys, re, string +from docutils import nodes, statemachine, utils, roman, urischemes +from docutils.statemachine import StateMachineWS, StateWS +from docutils.utils import normname +import directives, languages +from tableparser import TableParser, TableMarkupError + + +class MarkupError(Exception): pass +class ParserError(Exception): pass + + +class Stuff: + + """Stores a bunch of stuff for dotted-attribute access.""" + + def __init__(self, **keywordargs): + self.__dict__.update(keywordargs) + + +class RSTStateMachine(StateMachineWS): + + """ + reStructuredText's master StateMachine. 
+ + The entry point to reStructuredText parsing is the `run()` method. + """ + + def run(self, inputlines, docroot, inputoffset=0, matchtitles=1): + """ + Parse `inputlines` and return a `docutils.nodes.document` instance. + + Extend `StateMachineWS.run()`: set up parse-global data, run the + StateMachine, and return the resulting + document. + """ + self.language = languages.getlanguage(docroot.languagecode) + self.matchtitles = matchtitles + self.memo = Stuff(document=docroot, + reporter=docroot.reporter, + language=self.language, + titlestyles=[], + sectionlevel=0) + self.node = docroot + results = StateMachineWS.run(self, inputlines, inputoffset) + assert results == [], 'RSTStateMachine.run() results should be empty.' + self.node = self.memo = None # remove unneeded references + + +class NestedStateMachine(StateMachineWS): + + """ + StateMachine run from within other StateMachine runs, to parse nested + document structures. + """ + + def run(self, inputlines, inputoffset, memo, node, matchtitles=1): + """ + Parse `inputlines` and populate a `docutils.nodes.document` instance. + + Extend `StateMachineWS.run()`: set up document-wide data. + """ + self.matchtitles = matchtitles + self.memo = memo + self.node = node + results = StateMachineWS.run(self, inputlines, inputoffset) + assert results == [], 'NestedStateMachine.run() results should be empty' + return results + + +class RSTState(StateWS): + + """ + reStructuredText State superclass. + + Contains methods used by all State subclasses. 
+ """ + + nestedSM = NestedStateMachine + + def __init__(self, statemachine, debug=0): + self.nestedSMkwargs = {'stateclasses': stateclasses, + 'initialstate': 'Body'} + StateWS.__init__(self, statemachine, debug) + + def gotoline(self, abslineoffset): + """Jump to input line `abslineoffset`, ignoring jumps past the end.""" + try: + self.statemachine.gotoline(abslineoffset) + except IndexError: + pass + + def bof(self, context): + """Called at beginning of file.""" + return [], [] + + def nestedparse(self, block, inputoffset, node, matchtitles=0, + statemachineclass=None, statemachinekwargs=None): + """ + Create a new StateMachine rooted at `node` and run it over the input + `block`. + """ + if statemachineclass is None: + statemachineclass = self.nestedSM + if statemachinekwargs is None: + statemachinekwargs = self.nestedSMkwargs + statemachine = statemachineclass(debug=self.debug, **statemachinekwargs) + statemachine.run(block, inputoffset, memo=self.statemachine.memo, + node=node, matchtitles=matchtitles) + statemachine.unlink() + return statemachine.abslineoffset() + + def nestedlistparse(self, block, inputoffset, node, initialstate, + blankfinish, blankfinishstate=None, extrasettings={}, + matchtitles=0, statemachineclass=None, + statemachinekwargs=None): + """ + Create a new StateMachine rooted at `node` and run it over the input + `block`. Also keep track of optional intermediate blank lines and the + required final one.
+ """ + if statemachineclass is None: + statemachineclass = self.nestedSM + if statemachinekwargs is None: + statemachinekwargs = self.nestedSMkwargs.copy() + statemachinekwargs['initialstate'] = initialstate + statemachine = statemachineclass(debug=self.debug, **statemachinekwargs) + if blankfinishstate is None: + blankfinishstate = initialstate + statemachine.states[blankfinishstate].blankfinish = blankfinish + for key, value in extrasettings.items(): + setattr(statemachine.states[initialstate], key, value) + statemachine.run(block, inputoffset, memo=self.statemachine.memo, + node=node, matchtitles=matchtitles) + blankfinish = statemachine.states[blankfinishstate].blankfinish + statemachine.unlink() + return statemachine.abslineoffset(), blankfinish + + def section(self, title, source, style, lineno): + """ + When a new section is reached that isn't a subsection of the current + section, back up the line count (use previousline(-x)), then raise + EOFError. The current StateMachine will finish, then the calling + StateMachine can re-examine the title. This will work its way back up + the calling chain until the correct section level is reached. + + Alternative: Evaluate the title, store the title info & level, and + back up the chain until that level is reached. Store in memo? Or + return in results? + """ + if self.checksubsection(source, style, lineno): + self.newsubsection(title, lineno) + + def checksubsection(self, source, style, lineno): + """ + Check for a valid subsection header. Return 1 (true) or None (false). + + :Exception: `EOFError` when a sibling or supersection encountered. 
+ """ + memo = self.statemachine.memo + titlestyles = memo.titlestyles + mylevel = memo.sectionlevel + try: # check for existing title style + level = titlestyles.index(style) + 1 + except ValueError: # new title style + if len(titlestyles) == memo.sectionlevel: # new subsection + titlestyles.append(style) + return 1 + else: # not at lowest level + self.statemachine.node += self.titleinconsistent(source, lineno) + return None + if level <= mylevel: # sibling or supersection + memo.sectionlevel = level # bubble up to parent section + # back up 2 lines for underline title, 3 for overline title + self.statemachine.previousline(len(style) + 1) + raise EOFError # let parent section re-evaluate + if level == mylevel + 1: # immediate subsection + return 1 + else: # invalid subsection + self.statemachine.node += self.titleinconsistent(source, lineno) + return None + + def titleinconsistent(self, sourcetext, lineno): + literalblock = nodes.literal_block('', sourcetext) + error = self.statemachine.memo.reporter.severe( + 'Title level inconsistent at line %s:' % lineno, '', literalblock) + return error + + def newsubsection(self, title, lineno): + """Append new subsection to document tree. On return, check level.""" + memo = self.statemachine.memo + mylevel = memo.sectionlevel + memo.sectionlevel += 1 + sectionnode = nodes.section() + self.statemachine.node += sectionnode + textnodes, messages = self.inline_text(title, lineno) + titlenode = nodes.title(title, '', *textnodes) + name = normname(titlenode.astext()) + sectionnode['name'] = name + sectionnode += titlenode + sectionnode += messages + memo.document.note_implicit_target(sectionnode, sectionnode) + offset = self.statemachine.lineoffset + 1 + absoffset = self.statemachine.abslineoffset() + 1 + newabsoffset = self.nestedparse( + self.statemachine.inputlines[offset:], inputoffset=absoffset, + node=sectionnode, matchtitles=1) + self.gotoline(newabsoffset) + if memo.sectionlevel <= mylevel: # can't handle next section? 
+ raise EOFError # bubble up to supersection + # reset sectionlevel; next pass will detect it properly + memo.sectionlevel = mylevel + + def paragraph(self, lines, lineno): + """ + Return a list (paragraph & messages) and a boolean: literal_block next? + """ + data = '\n'.join(lines).rstrip() + if data[-2:] == '::': + if len(data) == 2: + return [], 1 + elif data[-3] == ' ': + text = data[:-3].rstrip() + else: + text = data[:-1] + literalnext = 1 + else: + text = data + literalnext = 0 + textnodes, messages = self.inline_text(text, lineno) + p = nodes.paragraph(data, '', *textnodes) + return [p] + messages, literalnext + + inline = Stuff() + """Patterns and constants used for inline markup recognition.""" + + inline.openers = '\'"([{<' + inline.closers = '\'")]}>' + inline.start_string_prefix = (r'(?:(?<=^)|(?<=[ \n%s]))' + % re.escape(inline.openers)) + inline.end_string_suffix = (r'(?:(?=$)|(?=[- \n.,:;!?%s]))' + % re.escape(inline.closers)) + inline.non_whitespace_before = r'(?<![ \n])' + inline.non_whitespace_escape_before = r'(?<![ \n\x00])' + inline.non_whitespace_after = r'(?![ \n])' + inline.simplename = r'[a-zA-Z0-9](?:[-_.a-zA-Z0-9]*[a-zA-Z0-9])?' + inline.uric = r"""[-_.!~*'()[\];/:@&=+$,%a-zA-Z0-9]""" + inline.urilast = r"""[_~/\]a-zA-Z0-9]""" + inline.emailc = r"""[-_!~*'{|}/#?^`&=+$%a-zA-Z0-9]""" + inline.identity = string.maketrans('', '') + inline.null2backslash = string.maketrans('\x00', '\\') + inline.patterns = Stuff( + initial=re.compile(r""" + %s # start-string prefix + ( + ( # start-strings only (group 2): + \*\* # strong + | + \* # emphasis + (?!\*) # but not strong + | + `` # literal + | + _` # inline hyperlink target + | + \| # substitution_reference start + ) + %s # no whitespace after + | # *OR* + ( # whole constructs (group 3): + (%s) # reference name (4) + (__?) # end-string (5) + | + \[ # footnote_reference or + # citation_reference start + ( # label (group 6): + [0-9]+ # manually numbered + | # *OR* + \#(?:%s)? 
# auto-numbered (w/ label?) + | # *OR* + \* # auto-symbol + | # *OR* + (%s) # citation reference (group 7) + ) + (\]_) # end-string (group 8) + ) + %s # end-string suffix + | # *OR* + ((?::%s:)?) # optional role (group 9) + ( # start-string (group 10) + ` # interpreted text + # or phrase reference + (?!`) # but not literal + ) + %s # no whitespace after + ) + """ % (inline.start_string_prefix, + inline.non_whitespace_after, + inline.simplename, + inline.simplename, + inline.simplename, + inline.end_string_suffix, + inline.simplename, + inline.non_whitespace_after,), + re.VERBOSE), + emphasis=re.compile(inline.non_whitespace_escape_before + + r'(\*)' + inline.end_string_suffix), + strong=re.compile(inline.non_whitespace_escape_before + + r'(\*\*)' + inline.end_string_suffix), + interpreted_or_phrase_ref=re.compile( + '%s(`(:%s:|__?)?)%s' % (inline.non_whitespace_escape_before, + inline.simplename, + inline.end_string_suffix)), + literal=re.compile(inline.non_whitespace_before + '(``)' + + inline.end_string_suffix), + target=re.compile(inline.non_whitespace_escape_before + + r'(`)' + inline.end_string_suffix), + substitution_ref=re.compile(inline.non_whitespace_escape_before + + r'(\|_{0,2})' + + inline.end_string_suffix), + uri=re.compile( + r""" + %s # start-string prefix + ( + ( # absolute URI (group 2) + ( # scheme (http, ftp, mailto) + [a-zA-Z][a-zA-Z0-9.+-]* # (group 3) + ) + : + (?: + (?: # either: + (?://?)? # hierarchical URI + %s* # URI characters + %s # final URI char + ) + (?: # optional query + \?%s* # URI characters + %s # final URI char + )? + (?: # optional fragment + \#%s* # URI characters + %s # final URI char + )? 
+ ) + ) + | # *OR* + ( # email address (group 4) + %s+(?:\.%s+)* # name + @ # at + %s+(?:\.%s*)* # host + %s # final URI char + ) + ) + %s # end-string suffix + """ % (inline.start_string_prefix, + inline.uric, inline.urilast, + inline.uric, inline.urilast, + inline.uric, inline.urilast, + inline.emailc, inline.emailc, + inline.emailc, inline.emailc, + inline.urilast, + inline.end_string_suffix,), + re.VERBOSE)) + inline.groups = Stuff(initial=Stuff(start=2, whole=3, refname=4, refend=5, + footnotelabel=6, citationlabel=7, + fnend=8, role=9, backquote=10), + interpreted_or_phrase_ref=Stuff(suffix=2), + uri=Stuff(whole=1, absolute=2, scheme=3, email=4)) + + def quotedstart(self, match): + """Return 1 if inline markup start-string is 'quoted', 0 if not.""" + string = match.string + start = match.start() + end = match.end() + if start == 0: # start-string at beginning of text + return 0 + prestart = string[start - 1] + try: + poststart = string[end] + if self.inline.openers.index(prestart) \ + == self.inline.closers.index(poststart): # quoted + return 1 + except IndexError: # start-string at end of text + return 1 + except ValueError: # not quoted + pass + return 0 + + def inlineobj(self, match, lineno, pattern, nodeclass, + restorebackslashes=0): + string = match.string + matchstart = match.start(self.inline.groups.initial.start) + matchend = match.end(self.inline.groups.initial.start) + if self.quotedstart(match): + return (string[:matchend], [], string[matchend:], [], '') + endmatch = pattern.search(string[matchend:]) + if endmatch and endmatch.start(1): # 1 or more chars + text = unescape(endmatch.string[:endmatch.start(1)], + restorebackslashes) + rawsource = unescape(string[matchstart:matchend+endmatch.end(1)], 1) + return (string[:matchstart], [nodeclass(rawsource, text)], + string[matchend:][endmatch.end(1):], [], endmatch.group(1)) + msg = self.statemachine.memo.reporter.warning( + 'Inline %s start-string without end-string ' + 'at line %s.' 
% (nodeclass.__name__, lineno)) + text = unescape(string[matchstart:matchend], 1) + rawsource = unescape(string[matchstart:matchend], 1) + prb = self.problematic(text, rawsource, msg) + return string[:matchstart], [prb], string[matchend:], [msg], '' + + def problematic(self, text, rawsource, message): + msgid = self.statemachine.memo.document.set_id(message, + self.statemachine.node) + problematic = nodes.problematic(rawsource, text, refid=msgid) + prbid = self.statemachine.memo.document.set_id(problematic) + message.add_backref(prbid) + return problematic + + def emphasis(self, match, lineno): + before, inlines, remaining, sysmessages, endstring = self.inlineobj( + match, lineno, self.inline.patterns.emphasis, nodes.emphasis) + return before, inlines, remaining, sysmessages + + def strong(self, match, lineno): + before, inlines, remaining, sysmessages, endstring = self.inlineobj( + match, lineno, self.inline.patterns.strong, nodes.strong) + return before, inlines, remaining, sysmessages + + def interpreted_or_phrase_ref(self, match, lineno): + pattern = self.inline.patterns.interpreted_or_phrase_ref + rolegroup = self.inline.groups.initial.role + backquote = self.inline.groups.initial.backquote + string = match.string + matchstart = match.start(backquote) + matchend = match.end(backquote) + rolestart = match.start(rolegroup) + role = match.group(rolegroup) + position = '' + if role: + role = role[1:-1] + position = 'prefix' + elif self.quotedstart(match): + return (string[:matchend], [], string[matchend:], []) + endmatch = pattern.search(string[matchend:]) + if endmatch and endmatch.start(1): # 1 or more chars + escaped = endmatch.string[:endmatch.start(1)] + text = unescape(escaped, 0) + rawsource = unescape( + string[match.start():matchend+endmatch.end()], 1) + if rawsource[-1:] == '_': + if role: + msg = self.statemachine.memo.reporter.warning( + 'Mismatch: inline interpreted text start-string and ' + 'role with phrase-reference end-string at line %s.' 
+ % lineno) + text = unescape(string[matchstart:matchend], 1) + rawsource = unescape(string[matchstart:matchend], 1) + prb = self.problematic(text, rawsource, msg) + return (string[:matchstart], [prb], string[matchend:], + [msg]) + return self.phrase_ref( + string[:matchstart], string[matchend:][endmatch.end():], + text, rawsource) + else: + return self.interpreted( + string[:rolestart], string[matchend:][endmatch.end():], + endmatch, role, position, lineno, + escaped, rawsource, text) + msg = self.statemachine.memo.reporter.warning( + 'Inline interpreted text or phrase reference start-string ' + 'without end-string at line %s.' % lineno) + text = unescape(string[matchstart:matchend], 1) + rawsource = unescape(string[matchstart:matchend], 1) + prb = self.problematic(text, rawsource, msg) + return string[:matchstart], [prb], string[matchend:], [msg] + + def phrase_ref(self, before, after, text, rawsource): + refname = normname(text) + reference = nodes.reference(rawsource, text) + if rawsource[-2:] == '__': + reference['anonymous'] = 1 + self.statemachine.memo.document.note_anonymous_ref(reference) + else: + reference['refname'] = refname + self.statemachine.memo.document.note_refname(reference) + return before, [reference], after, [] + + def interpreted(self, before, after, endmatch, role, position, lineno, + escaped, rawsource, text): + suffix = self.inline.groups.interpreted_or_phrase_ref.suffix + if endmatch.group(suffix): + if role: + msg = self.statemachine.memo.reporter.warning( + 'Multiple roles in interpreted text at line %s.' 
+ % lineno) + return (before + rawsource, [], after, [msg]) + role = endmatch.group(suffix)[1:-1] + position = 'suffix' + if role: + atts = {'role': role, 'position': position} + else: + atts = {} + return before, [nodes.interpreted(rawsource, text, **atts)], after, [] + + def literal(self, match, lineno): + before, inlines, remaining, sysmessages, endstring = self.inlineobj( + match, lineno, self.inline.patterns.literal, nodes.literal, + restorebackslashes=1) + return before, inlines, remaining, sysmessages + + def inline_target(self, match, lineno): + before, inlines, remaining, sysmessages, endstring = self.inlineobj( + match, lineno, self.inline.patterns.target, nodes.target) + if inlines and isinstance(inlines[0], nodes.target): + assert len(inlines) == 1 + target = inlines[0] + name = normname(target.astext()) + target['name'] = name + self.statemachine.memo.document.note_explicit_target( + target, self.statemachine.node) + return before, inlines, remaining, sysmessages + + def substitution_reference(self, match, lineno): + before, inlines, remaining, sysmessages, endstring = self.inlineobj( + match, lineno, self.inline.patterns.substitution_ref, + nodes.substitution_reference) + if inlines: + assert len(inlines) == 1 + subrefnode = inlines[0] + assert isinstance(subrefnode, nodes.substitution_reference) + subreftext = subrefnode.astext() + refname = normname(subreftext) + subrefnode['refname'] = refname + self.statemachine.memo.document.note_substitution_ref(subrefnode) + if endstring[-1:] == '_': + referencenode = nodes.reference('|%s%s' % (subreftext, endstring), '') + if endstring[-2:] == '__': + referencenode['anonymous'] = 1 + self.statemachine.memo.document.note_anonymous_ref(referencenode) + else: + referencenode['refname'] = refname + self.statemachine.memo.document.note_refname(referencenode) + referencenode += subrefnode + inlines = [referencenode] + return before, inlines, remaining, sysmessages + + def footnote_reference(self, match, lineno): + 
""" + Handles `nodes.footnote_reference` and `nodes.citation_reference` + elements. + """ + label = match.group(self.inline.groups.initial.footnotelabel) + refname = normname(label) + if match.group(self.inline.groups.initial.citationlabel): + refnode = nodes.citation_reference('[%s]_' % label, refname=refname) + refnode += nodes.Text(label) + self.statemachine.memo.document.note_citation_ref(refnode) + else: + refnode = nodes.footnote_reference('[%s]_' % label) + if refname[0] == '#': + refname = refname[1:] + refnode['auto'] = 1 + self.statemachine.memo.document.note_autofootnote_ref(refnode) + elif refname == '*': + refname = '' + refnode['auto'] = '*' + self.statemachine.memo.document.note_symbol_footnote_ref( + refnode) + else: + refnode += nodes.Text(label) + if refname: + refnode['refname'] = refname + self.statemachine.memo.document.note_footnote_ref(refnode) + string = match.string + matchstart = match.start(self.inline.groups.initial.whole) + matchend = match.end(self.inline.groups.initial.whole) + return (string[:matchstart], [refnode], string[matchend:], []) + + def reference(self, match, lineno, anonymous=None): + referencename = match.group(self.inline.groups.initial.refname) + refname = normname(referencename) + referencenode = nodes.reference( + referencename + match.group(self.inline.groups.initial.refend), + referencename) + if anonymous: + referencenode['anonymous'] = 1 + self.statemachine.memo.document.note_anonymous_ref(referencenode) + else: + referencenode['refname'] = refname + self.statemachine.memo.document.note_refname(referencenode) + string = match.string + matchstart = match.start(self.inline.groups.initial.whole) + matchend = match.end(self.inline.groups.initial.whole) + return (string[:matchstart], [referencenode], string[matchend:], []) + + def anonymous_reference(self, match, lineno): + return self.reference(match, lineno, anonymous=1) + + def standalone_uri(self, text, lineno): + pattern = self.inline.patterns.uri + whole = 
self.inline.groups.uri.whole + scheme = self.inline.groups.uri.scheme + email = self.inline.groups.uri.email + remainder = text + textnodes = [] + start = 0 + while 1: + match = pattern.search(remainder, start) + if match: + if not match.group(scheme) or \ + urischemes.schemes.has_key(match.group(scheme).lower()): + if match.start(whole) > 0: + textnodes.append(nodes.Text(unescape( + remainder[:match.start(whole)]))) + if match.group(email): + addscheme = 'mailto:' + else: + addscheme = '' + text = match.group(whole) + unescaped = unescape(text, 0) + textnodes.append( + nodes.reference(unescape(text, 1), unescaped, + refuri=addscheme + unescaped)) + remainder = remainder[match.end(whole):] + start = 0 + else: # not a valid scheme + start = match.end(whole) + else: + if remainder: + textnodes.append(nodes.Text(unescape(remainder))) + break + return textnodes + + inline.dispatch = {'*': emphasis, + '**': strong, + '`': interpreted_or_phrase_ref, + '``': literal, + '_`': inline_target, + ']_': footnote_reference, + '|': substitution_reference, + '_': reference, + '__': anonymous_reference} + + def inline_text(self, text, lineno): + """ + Return 2 lists: nodes (text and inline elements), and system_messages. + + Using a `pattern` matching start-strings (for emphasis, strong, + interpreted, phrase reference, literal, substitution reference, and + inline target) or complete constructs (simple reference, footnote + reference) we search for a candidate. When one is found, we check for + validity (e.g., not a quoted '*' character). If valid, search for the + corresponding end string if applicable, and check for validity. If not + found or invalid, generate a warning and ignore the start-string. + Standalone hyperlinks are found last. 
+ """ + pattern = self.inline.patterns.initial + dispatch = self.inline.dispatch + start = self.inline.groups.initial.start - 1 + backquote = self.inline.groups.initial.backquote - 1 + refend = self.inline.groups.initial.refend - 1 + fnend = self.inline.groups.initial.fnend - 1 + remaining = escape2null(text) + processed = [] + unprocessed = [] + messages = [] + while remaining: + match = pattern.search(remaining) + if match: + groups = match.groups() + before, inlines, remaining, sysmessages = \ + dispatch[groups[start] or groups[backquote] + or groups[refend] + or groups[fnend]](self, match, lineno) + unprocessed.append(before) + messages += sysmessages + if inlines: + processed += self.standalone_uri(''.join(unprocessed), + lineno) + processed += inlines + unprocessed = [] + else: + break + remaining = ''.join(unprocessed) + remaining + if remaining: + processed += self.standalone_uri(remaining, lineno) + return processed, messages + + def unindentwarning(self): + return self.statemachine.memo.reporter.warning( + ('Unindent without blank line at line %s.' + % (self.statemachine.abslineno() + 1))) + + +class Body(RSTState): + + """ + Generic classifier of the first line of a block. + """ + + enum = Stuff() + """Enumerated list parsing information.""" + + enum.formatinfo = { + 'parens': Stuff(prefix='(', suffix=')', start=1, end=-1), + 'rparen': Stuff(prefix='', suffix=')', start=0, end=-1), + 'period': Stuff(prefix='', suffix='.', start=0, end=-1)} + enum.formats = enum.formatinfo.keys() + enum.sequences = ['arabic', 'loweralpha', 'upperalpha', + 'lowerroman', 'upperroman'] # ORDERED! 
+ enum.sequencepats = {'arabic': '[0-9]+', + 'loweralpha': '[a-z]', + 'upperalpha': '[A-Z]', + 'lowerroman': '[ivxlcdm]+', + 'upperroman': '[IVXLCDM]+',} + enum.converters = {'arabic': int, + 'loweralpha': + lambda s, zero=(ord('a')-1): ord(s) - zero, + 'upperalpha': + lambda s, zero=(ord('A')-1): ord(s) - zero, + 'lowerroman': + lambda s: roman.fromRoman(s.upper()), + 'upperroman': roman.fromRoman} + + enum.sequenceregexps = {} + for sequence in enum.sequences: + enum.sequenceregexps[sequence] = re.compile(enum.sequencepats[sequence] + + '$') + + tabletoppat = re.compile(r'\+-[-+]+-\+ *$') + """Matches the top (& bottom) of a table.""" + + tableparser = TableParser() + + pats = {} + """Fragments of patterns used by transitions.""" + + pats['nonalphanum7bit'] = '[!-/:-@[-`{-~]' + pats['alpha'] = '[a-zA-Z]' + pats['alphanum'] = '[a-zA-Z0-9]' + pats['alphanumplus'] = '[a-zA-Z0-9_-]' + pats['enum'] = ('(%(arabic)s|%(loweralpha)s|%(upperalpha)s|%(lowerroman)s' + '|%(upperroman)s)' % enum.sequencepats) + pats['optname'] = '%(alphanum)s%(alphanumplus)s*' % pats + pats['optarg'] = '%(alpha)s%(alphanumplus)s*' % pats + pats['option'] = r'(--?|\+|/)%(optname)s([ =]%(optarg)s)?' 
% pats + + for format in enum.formats: + pats[format] = '(?P<%s>%s%s%s)' % ( + format, re.escape(enum.formatinfo[format].prefix), + pats['enum'], re.escape(enum.formatinfo[format].suffix)) + + patterns = {'bullet': r'[-+*]( +|$)', + 'enumerator': r'(%(parens)s|%(rparen)s|%(period)s)( +|$)' + % pats, + 'field_marker': r':[^: ]([^:]*[^: ])?:( +|$)', + 'option_marker': r'%(option)s(, %(option)s)*( +| ?$)' % pats, + 'doctest': r'>>>( +|$)', + 'tabletop': tabletoppat, + 'explicit_markup': r'\.\.( +|$)', + 'anonymous': r'__( +|$)', + 'line': r'(%(nonalphanum7bit)s)\1\1\1+ *$' % pats, + #'rfc822': r'[!-9;-~]+:( +|$)', + 'text': r''} + initialtransitions = ['bullet', + 'enumerator', + 'field_marker', + 'option_marker', + 'doctest', + 'tabletop', + 'explicit_markup', + 'anonymous', + 'line', + 'text'] + + def indent(self, match, context, nextstate): + """Block quote.""" + indented, indent, lineoffset, blankfinish = \ + self.statemachine.getindented() + blockquote = self.block_quote(indented, lineoffset) + self.statemachine.node += blockquote + if not blankfinish: + self.statemachine.node += self.unindentwarning() + return context, nextstate, [] + + def block_quote(self, indented, lineoffset): + blockquote = nodes.block_quote() + self.nestedparse(indented, lineoffset, blockquote) + return blockquote + + def bullet(self, match, context, nextstate): + """Bullet list item.""" + bulletlist = nodes.bullet_list() + self.statemachine.node += bulletlist + bulletlist['bullet'] = match.string[0] + i, blankfinish = self.list_item(match.end()) + bulletlist += i + offset = self.statemachine.lineoffset + 1 # next line + newlineoffset, blankfinish = self.nestedlistparse( + self.statemachine.inputlines[offset:], + inputoffset=self.statemachine.abslineoffset() + 1, + node=bulletlist, initialstate='BulletList', + blankfinish=blankfinish) + if not blankfinish: + self.statemachine.node += self.unindentwarning() + self.gotoline(newlineoffset) + return [], nextstate, [] + + def list_item(self, 
indent): + indented, lineoffset, blankfinish = \ + self.statemachine.getknownindented(indent) + listitem = nodes.list_item('\n'.join(indented)) + if indented: + self.nestedparse(indented, inputoffset=lineoffset, node=listitem) + return listitem, blankfinish + + def enumerator(self, match, context, nextstate): + """Enumerated List Item""" + format, sequence, text, ordinal = self.parse_enumerator(match) + if ordinal is None: + msg = self.statemachine.memo.reporter.error( + ('Enumerated list start value invalid at line %s: ' + '%r (sequence %r)' % (self.statemachine.abslineno(), + text, sequence))) + self.statemachine.node += msg + indented, lineoffset, blankfinish = \ + self.statemachine.getknownindented(match.end()) + bq = self.block_quote(indented, lineoffset) + self.statemachine.node += bq + if not blankfinish: + self.statemachine.node += self.unindentwarning() + return [], nextstate, [] + if ordinal != 1: + msg = self.statemachine.memo.reporter.info( + ('Enumerated list start value not ordinal-1 at line %s: ' + '%r (ordinal %s)' % (self.statemachine.abslineno(), + text, ordinal))) + self.statemachine.node += msg + enumlist = nodes.enumerated_list() + self.statemachine.node += enumlist + enumlist['enumtype'] = sequence + if ordinal != 1: + enumlist['start'] = ordinal + enumlist['prefix'] = self.enum.formatinfo[format].prefix + enumlist['suffix'] = self.enum.formatinfo[format].suffix + listitem, blankfinish = self.list_item(match.end()) + enumlist += listitem + offset = self.statemachine.lineoffset + 1 # next line + newlineoffset, blankfinish = self.nestedlistparse( + self.statemachine.inputlines[offset:], + inputoffset=self.statemachine.abslineoffset() + 1, + node=enumlist, initialstate='EnumeratedList', + blankfinish=blankfinish, + extrasettings={'lastordinal': ordinal, 'format': format}) + if not blankfinish: + self.statemachine.node += self.unindentwarning() + self.gotoline(newlineoffset) + return [], nextstate, [] + + def parse_enumerator(self, match, 
expectedsequence=None): + """ + Analyze an enumerator and return the results. + + :Return: + - the enumerator format ('period', 'parens', or 'rparen'), + - the sequence used ('arabic', 'loweralpha', 'upperroman', etc.), + - the text of the enumerator, stripped of formatting, and + - the ordinal value of the enumerator ('a' -> 1, 'ii' -> 2, etc.; + ``None`` is returned for invalid enumerator text). + + The enumerator format has already been determined by the regular + expression match. If `expectedsequence` is given, that sequence is + tried first. If not, we check for Roman numeral 1. This way, + single-character Roman numerals (which are also alphabetical) can be + matched. If no sequence has been matched, all sequences are checked in + order. + """ + groupdict = match.groupdict() + sequence = '' + for format in self.enum.formats: + if groupdict[format]: # was this the format matched? + break # yes; keep `format` + else: # shouldn't happen + raise ParserError, 'enumerator format not matched' + text = groupdict[format][self.enum.formatinfo[format].start + :self.enum.formatinfo[format].end] + if expectedsequence: + try: + if self.enum.sequenceregexps[expectedsequence].match(text): + sequence = expectedsequence + except KeyError: # shouldn't happen + raise ParserError, 'unknown sequence: %s' % expectedsequence + else: + if text == 'i': + sequence = 'lowerroman' + elif text == 'I': + sequence = 'upperroman' + if not sequence: + for sequence in self.enum.sequences: + if self.enum.sequenceregexps[sequence].match(text): + break + else: # shouldn't happen + raise ParserError, 'enumerator sequence not matched' + try: + ordinal = self.enum.converters[sequence](text) + except roman.InvalidRomanNumeralError: + ordinal = None + return format, sequence, text, ordinal + + def field_marker(self, match, context, nextstate): + """Field list item.""" + fieldlist = nodes.field_list() + self.statemachine.node += fieldlist + field, blankfinish = self.field(match) + fieldlist += field + offset 
= self.statemachine.lineoffset + 1 # next line + newlineoffset, blankfinish = self.nestedlistparse( + self.statemachine.inputlines[offset:], + inputoffset=self.statemachine.abslineoffset() + 1, + node=fieldlist, initialstate='FieldList', + blankfinish=blankfinish) + if not blankfinish: + self.statemachine.node += self.unindentwarning() + self.gotoline(newlineoffset) + return [], nextstate, [] + + def field(self, match): + name, args = self.parse_field_marker(match) + indented, indent, lineoffset, blankfinish = \ + self.statemachine.getfirstknownindented(match.end()) + fieldnode = nodes.field() + fieldnode += nodes.field_name(name, name) + for arg in args: + fieldnode += nodes.field_argument(arg, arg) + fieldbody = nodes.field_body('\n'.join(indented)) + fieldnode += fieldbody + if indented: + self.nestedparse(indented, inputoffset=lineoffset, node=fieldbody) + return fieldnode, blankfinish + + def parse_field_marker(self, match): + """Extract & return name & argument list from a field marker match.""" + field = match.string[1:] # strip off leading ':' + field = field[:field.find(':')] # strip off trailing ':' etc. 
+ tokens = field.split() + return tokens[0], tokens[1:] # first == name, others == args + + def option_marker(self, match, context, nextstate): + """Option list item.""" + optionlist = nodes.option_list() + try: + listitem, blankfinish = self.option_list_item(match) + except MarkupError, detail: # shouldn't happen; won't match pattern + msg = self.statemachine.memo.reporter.error( + ('Invalid option list marker at line %s: %s' + % (self.statemachine.abslineno(), detail))) + self.statemachine.node += msg + indented, indent, lineoffset, blankfinish = \ + self.statemachine.getfirstknownindented(match.end()) + blockquote = self.block_quote(indented, lineoffset) + self.statemachine.node += blockquote + if not blankfinish: + self.statemachine.node += self.unindentwarning() + return [], nextstate, [] + self.statemachine.node += optionlist + optionlist += listitem + offset = self.statemachine.lineoffset + 1 # next line + newlineoffset, blankfinish = self.nestedlistparse( + self.statemachine.inputlines[offset:], + inputoffset=self.statemachine.abslineoffset() + 1, + node=optionlist, initialstate='OptionList', + blankfinish=blankfinish) + if not blankfinish: + self.statemachine.node += self.unindentwarning() + self.gotoline(newlineoffset) + return [], nextstate, [] + + def option_list_item(self, match): + options = self.parse_option_marker(match) + indented, indent, lineoffset, blankfinish = \ + self.statemachine.getfirstknownindented(match.end()) + if not indented: # not an option list item + raise statemachine.TransitionCorrection('text') + option_group = nodes.option_group('', *options) + description = nodes.description('\n'.join(indented)) + option_list_item = nodes.option_list_item('', option_group, description) + if indented: + self.nestedparse(indented, inputoffset=lineoffset, node=description) + return option_list_item, blankfinish + + def parse_option_marker(self, match): + """ + Return a list of `node.option` and `node.option_argument` objects, + parsed from an 
option marker match. + + :Exception: `MarkupError` for invalid option markers. + """ + optlist = [] + optionstrings = match.group().rstrip().split(', ') + for optionstring in optionstrings: + tokens = optionstring.split() + delimiter = ' ' + firstopt = tokens[0].split('=') + if len(firstopt) > 1: + tokens[:1] = firstopt + delimiter = '=' + if 0 < len(tokens) <= 2: + option = nodes.option(optionstring) + option += nodes.option_string(tokens[0], tokens[0]) + if len(tokens) > 1: + option += nodes.option_argument(tokens[1], tokens[1], + delimiter=delimiter) + optlist.append(option) + else: + raise MarkupError('wrong number of option tokens (=%s), ' + 'should be 1 or 2: %r' % (len(tokens), + optionstring)) + return optlist + + def doctest(self, match, context, nextstate): + data = '\n'.join(self.statemachine.gettextblock()) + self.statemachine.node += nodes.doctest_block(data, data) + return [], nextstate, [] + + def tabletop(self, match, context, nextstate): + """Top border of a table.""" + nodelist, blankfinish = self.table() + self.statemachine.node += nodelist + if not blankfinish: + msg = self.statemachine.memo.reporter.warning( + 'Blank line required after table at line %s.' 
+ % (self.statemachine.abslineno() + 1)) + self.statemachine.node += msg + return [], nextstate, [] + + def table(self): + """Parse a table.""" + block, messages, blankfinish = self.isolatetable() + if block: + try: + tabledata = self.tableparser.parse(block) + tableline = self.statemachine.abslineno() - len(block) + 1 + table = self.buildtable(tabledata, tableline) + nodelist = [table] + messages + except TableMarkupError, detail: + nodelist = self.malformedtable(block, str(detail)) + messages + else: + nodelist = messages + return nodelist, blankfinish + + def isolatetable(self): + messages = [] + blankfinish = 1 + try: + block = self.statemachine.getunindented() + except statemachine.UnexpectedIndentationError, instance: + block, lineno = instance.args + messages.append(self.statemachine.memo.reporter.error( + 'Unexpected indentation at line %s.' % lineno)) + blankfinish = 0 + width = len(block[0].strip()) + for i in range(len(block)): + block[i] = block[i].strip() + if block[i][0] not in '+|': # check left edge + blankfinish = 0 + self.statemachine.previousline(len(block) - i) + del block[i:] + break + if not self.tabletoppat.match(block[-1]): # find bottom + blankfinish = 0 + # from second-last to third line of table: + for i in range(len(block) - 2, 1, -1): + if self.tabletoppat.match(block[i]): + self.statemachine.previousline(len(block) - i + 1) + del block[i+1:] + break + else: + messages.extend(self.malformedtable(block)) + return [], messages, blankfinish + for i in range(len(block)): # check right edge + if len(block[i]) != width or block[i][-1] not in '+|': + messages.extend(self.malformedtable(block)) + return [], messages, blankfinish + return block, messages, blankfinish + + def malformedtable(self, block, detail=''): + data = '\n'.join(block) + message = 'Malformed table at line %s; formatting as a ' \ + 'literal block.' 
% (self.statemachine.abslineno() + - len(block) + 1) + if detail: + message += '\n' + detail + nodelist = [self.statemachine.memo.reporter.error(message), + nodes.literal_block(data, data)] + return nodelist + + def buildtable(self, tabledata, tableline): + colspecs, headrows, bodyrows = tabledata + table = nodes.table() + tgroup = nodes.tgroup(cols=len(colspecs)) + table += tgroup + for colspec in colspecs: + tgroup += nodes.colspec(colwidth=colspec) + if headrows: + thead = nodes.thead() + tgroup += thead + for row in headrows: + thead += self.buildtablerow(row, tableline) + tbody = nodes.tbody() + tgroup += tbody + for row in bodyrows: + tbody += self.buildtablerow(row, tableline) + return table + + def buildtablerow(self, rowdata, tableline): + row = nodes.row() + for cell in rowdata: + if cell is None: + continue + morerows, morecols, offset, cellblock = cell + attributes = {} + if morerows: + attributes['morerows'] = morerows + if morecols: + attributes['morecols'] = morecols + entry = nodes.entry(**attributes) + row += entry + if ''.join(cellblock): + self.nestedparse(cellblock, inputoffset=tableline+offset, + node=entry) + return row + + + explicit = Stuff() + """Patterns and constants used for explicit markup recognition.""" + + explicit.patterns = Stuff( + target=re.compile(r""" + (?: + _ # anonymous target + | # *OR* + (`?) # optional open quote + (?![ `]) # first char. not space or backquote + ( # reference name + .+? + ) + %s # not whitespace or escape + \1 # close quote if open quote used + ) + %s # not whitespace or escape + : # end of reference name + (?:[ ]+|$) # followed by whitespace + """ + % (RSTState.inline.non_whitespace_escape_before, + RSTState.inline.non_whitespace_escape_before), + re.VERBOSE), + reference=re.compile(r""" + (?: + (%s)_ # simple reference name + | # *OR* + ` # open backquote + (?![ ]) # not space + (.+?) 
# hyperlink phrase + %s # not whitespace or escape + `_ # close backquote & reference mark + ) + $ # end of string + """ % + (RSTState.inline.simplename, + RSTState.inline.non_whitespace_escape_before,), + re.VERBOSE), + substitution=re.compile(r""" + (?: + (?![ ]) # first char. not space + (.+?) # substitution text + %s # not whitespace or escape + \| # close delimiter + ) + (?:[ ]+|$) # followed by whitespace + """ % + RSTState.inline.non_whitespace_escape_before, + re.VERBOSE),) + explicit.groups = Stuff( + target=Stuff(quote=1, name=2), + reference=Stuff(simple=1, phrase=2), + substitution=Stuff(name=1)) + + def footnote(self, match): + indented, indent, offset, blankfinish = \ + self.statemachine.getfirstknownindented(match.end()) + label = match.group(1) + name = normname(label) + footnote = nodes.footnote('\n'.join(indented)) + if name[0] == '#': # auto-numbered + name = name[1:] # autonumber label + footnote['auto'] = 1 + if name: + footnote['name'] = name + self.statemachine.memo.document.note_autofootnote(footnote) + elif name == '*': # auto-symbol + name = '' + footnote['auto'] = '*' + self.statemachine.memo.document.note_symbol_footnote(footnote) + else: # manually numbered + footnote += nodes.label('', label) + footnote['name'] = name + self.statemachine.memo.document.note_footnote(footnote) + if name: + self.statemachine.memo.document.note_explicit_target(footnote, + footnote) + if indented: + self.nestedparse(indented, inputoffset=offset, node=footnote) + return [footnote], blankfinish + + def citation(self, match): + indented, indent, offset, blankfinish = \ + self.statemachine.getfirstknownindented(match.end()) + label = match.group(1) + name = normname(label) + citation = nodes.citation('\n'.join(indented)) + citation += nodes.label('', label) + citation['name'] = name + self.statemachine.memo.document.note_citation(citation) + self.statemachine.memo.document.note_explicit_target(citation, citation) + if indented: + self.nestedparse(indented, 
inputoffset=offset, node=citation) + return [citation], blankfinish + + def hyperlink_target(self, match): + pattern = self.explicit.patterns.target + namegroup = self.explicit.groups.target.name + lineno = self.statemachine.abslineno() + block, indent, offset, blankfinish = \ + self.statemachine.getfirstknownindented(match.end(), uptoblank=1, + stripindent=0) + blocktext = match.string[:match.end()] + '\n'.join(block) + block = [escape2null(line) for line in block] + escaped = block[0] + blockindex = 0 + while 1: + targetmatch = pattern.match(escaped) + if targetmatch: + break + blockindex += 1 + try: + escaped += block[blockindex] + except (IndexError, MarkupError): + raise MarkupError('malformed hyperlink target at line %s.' + % lineno) + del block[:blockindex] + block[0] = (block[0] + ' ')[targetmatch.end()-len(escaped)-1:].strip() + if block and block[-1].strip()[-1:] == '_': # possible indirect target + reference = ' '.join([line.strip() for line in block]) + refname = self.isreference(reference) + if refname: + target = nodes.target(blocktext, '', refname=refname) + self.addtarget(targetmatch.group(namegroup), '', target) + self.statemachine.memo.document.note_indirect_target(target) + return [target], blankfinish + nodelist = [] + reference = ''.join([line.strip() for line in block]) + if reference.find(' ') != -1: + warning = self.statemachine.memo.reporter.warning( + 'Hyperlink target at line %s contains whitespace. ' + 'Perhaps a footnote was intended?' 
+ % (self.statemachine.abslineno() - len(block) + 1), '', + nodes.literal_block(blocktext, blocktext)) + nodelist.append(warning) + else: + unescaped = unescape(reference) + target = nodes.target(blocktext, '') + self.addtarget(targetmatch.group(namegroup), unescaped, target) + nodelist.append(target) + return nodelist, blankfinish + + def isreference(self, reference): + match = self.explicit.patterns.reference.match(normname(reference)) + if not match: + return None + return unescape(match.group(self.explicit.groups.reference.simple) + or match.group(self.explicit.groups.reference.phrase)) + + def addtarget(self, targetname, refuri, target): + if targetname: + name = normname(unescape(targetname)) + target['name'] = name + if refuri: + target['refuri'] = refuri + self.statemachine.memo.document.note_external_target(target) + else: + self.statemachine.memo.document.note_internal_target(target) + self.statemachine.memo.document.note_explicit_target( + target, self.statemachine.node) + else: # anonymous target + if refuri: + target['refuri'] = refuri + target['anonymous'] = 1 + self.statemachine.memo.document.note_anonymous_target(target) + + def substitutiondef(self, match): + pattern = self.explicit.patterns.substitution + lineno = self.statemachine.abslineno() + block, indent, offset, blankfinish = \ + self.statemachine.getfirstknownindented(match.end(), + stripindent=0) + blocktext = (match.string[:match.end()] + '\n'.join(block)) + block = [escape2null(line) for line in block] + escaped = block[0].rstrip() + blockindex = 0 + while 1: + subdefmatch = pattern.match(escaped) + if subdefmatch: + break + blockindex += 1 + try: + escaped = escaped + ' ' + block[blockindex].strip() + except (IndexError, MarkupError): + raise MarkupError('malformed substitution definition ' + 'at line %s.' 
% lineno) + del block[:blockindex] # strip out the substitution marker + block[0] = (block[0] + ' ')[subdefmatch.end()-len(escaped)-1:].strip() + if not block[0]: + del block[0] + offset += 1 + subname = subdefmatch.group(self.explicit.groups.substitution.name) + name = normname(subname) + substitutionnode = nodes.substitution_definition( + blocktext, name=name, alt=subname) + if block: + block[0] = block[0].strip() + newabsoffset, blankfinish = self.nestedlistparse( + block, inputoffset=offset, node=substitutionnode, + initialstate='SubstitutionDef', blankfinish=blankfinish) + self.statemachine.previousline( + len(block) + offset - newabsoffset - 1) + i = 0 + for node in substitutionnode[:]: + if not (isinstance(node, nodes.Inline) or + isinstance(node, nodes.Text)): + self.statemachine.node += substitutionnode[i] + del substitutionnode[i] + else: + i += 1 + if len(substitutionnode) == 0: + msg = self.statemachine.memo.reporter.warning( + 'Substitution definition "%s" empty or invalid at line ' + '%s.' % (subname, self.statemachine.abslineno()), '', + nodes.literal_block(blocktext, blocktext)) + self.statemachine.node += msg + else: + del substitutionnode['alt'] + self.statemachine.memo.document.note_substitution_def( + substitutionnode, self.statemachine.node) + return [substitutionnode], blankfinish + else: + msg = self.statemachine.memo.reporter.warning( + 'Substitution definition "%s" missing contents at line %s.' 
+ % (subname, self.statemachine.abslineno()), '', + nodes.literal_block(blocktext, blocktext)) + self.statemachine.node += msg + return [], blankfinish + + def directive(self, match, **attributes): + typename = match.group(1) + directivefunction = directives.directive( + typename, self.statemachine.memo.language) + data = match.string[match.end():].strip() + if directivefunction: + return directivefunction(match, typename, data, self, + self.statemachine, attributes) + else: + return self.unknowndirective(typename, data) + + def unknowndirective(self, typename, data): + lineno = self.statemachine.abslineno() + indented, indent, offset, blankfinish = \ + self.statemachine.getfirstknownindented(0, stripindent=0) + text = '\n'.join(indented) + error = self.statemachine.memo.reporter.error( + 'Unknown directive type "%s" at line %s.' % (typename, lineno), + '', nodes.literal_block(text, text)) + return [error], blankfinish + + def parse_extension_attributes(self, attribute_spec, datalines, blankfinish): + """ + Parse `datalines` for a field list containing extension attributes + matching `attribute_spec`. + + :Parameters: + - `attribute_spec`: a mapping of attribute name to conversion + function, which should raise an exception on bad input. + - `datalines`: a list of input strings. + - `blankfinish`: + + :Return: + - Success value, 1 or 0. + - An attribute dictionary on success, an error string on failure. + - Updated `blankfinish` flag. 
+ """ + node = nodes.field_list() + newlineoffset, blankfinish = self.nestedlistparse( + datalines, 0, node, initialstate='FieldList', + blankfinish=blankfinish) + if newlineoffset != len(datalines): # incomplete parse of block + return 0, 'invalid attribute block', blankfinish + try: + attributes = utils.extract_extension_attributes(node, attribute_spec) + except KeyError, detail: + return 0, ('unknown attribute: "%s"' % detail), blankfinish + except (ValueError, TypeError), detail: + return 0, ('invalid attribute value:\n%s' % detail), blankfinish + except utils.ExtensionAttributeError, detail: + return 0, ('invalid attribute data: %s' % detail), blankfinish + return 1, attributes, blankfinish + + def comment(self, match): + if not match.string[match.end():].strip() \ + and self.statemachine.nextlineblank(): # an empty comment? + return [nodes.comment()], 1 # "A tiny but practical wart." + indented, indent, offset, blankfinish = \ + self.statemachine.getfirstknownindented(match.end()) + text = '\n'.join(indented) + return [nodes.comment(text, text)], blankfinish + + explicit.constructs = [ + (footnote, + re.compile(r""" + \.\.[ ]+ # explicit markup start + \[ + ( # footnote label: + [0-9]+ # manually numbered footnote + | # *OR* + \# # anonymous auto-numbered footnote + | # *OR* + \#%s # auto-numbered footnote label + | # *OR* + \* # auto-symbol footnote + ) + \] + (?:[ ]+|$) # whitespace or end of line + """ % RSTState.inline.simplename, re.VERBOSE)), + (citation, + re.compile(r""" + \.\.[ ]+ # explicit markup start + \[(%s)\] # citation label + (?:[ ]+|$) # whitespace or end of line + """ % RSTState.inline.simplename, re.VERBOSE)), + (hyperlink_target, + re.compile(r""" + \.\.[ ]+ # explicit markup start + _ # target indicator + (?![ ]) # first char. not space + """, re.VERBOSE)), + (substitutiondef, + re.compile(r""" + \.\.[ ]+ # explicit markup start + \| # substitution indicator + (?![ ]) # first char. 
not space + """, re.VERBOSE)), + (directive, + re.compile(r""" + \.\.[ ]+ # explicit markup start + (%s) # directive name + :: # directive delimiter + (?:[ ]+|$) # whitespace or end of line + """ % RSTState.inline.simplename, re.VERBOSE))] + + def explicit_markup(self, match, context, nextstate): + """Footnotes, hyperlink targets, directives, comments.""" + nodelist, blankfinish = self.explicit_construct(match) + self.statemachine.node += nodelist + self.explicitlist(blankfinish) + return [], nextstate, [] + + def explicit_construct(self, match): + """Determine which explicit construct this is, parse & return it.""" + errors = [] + for method, pattern in self.explicit.constructs: + expmatch = pattern.match(match.string) + if expmatch: + try: + return method(self, expmatch) + except MarkupError, detail: # never reached? + errors.append( + self.statemachine.memo.reporter.warning('%s: %s' + % (detail.__class__.__name__, detail))) + break + nodelist, blankfinish = self.comment(match) + return nodelist + errors, blankfinish + + def explicitlist(self, blankfinish): + """ + Create a nested state machine for a series of explicit markup constructs + (including anonymous hyperlink targets). 
+ """ + offset = self.statemachine.lineoffset + 1 # next line + newlineoffset, blankfinish = self.nestedlistparse( + self.statemachine.inputlines[offset:], + inputoffset=self.statemachine.abslineoffset() + 1, + node=self.statemachine.node, initialstate='Explicit', + blankfinish=blankfinish) + self.gotoline(newlineoffset) + if not blankfinish: + self.statemachine.node += self.unindentwarning() + + def anonymous(self, match, context, nextstate): + """Anonymous hyperlink targets.""" + nodelist, blankfinish = self.anonymous_target(match) + self.statemachine.node += nodelist + self.explicitlist(blankfinish) + return [], nextstate, [] + + def anonymous_target(self, match): + block, indent, offset, blankfinish \ + = self.statemachine.getfirstknownindented(match.end(), + uptoblank=1) + blocktext = match.string[:match.end()] + '\n'.join(block) + if block and block[-1].strip()[-1:] == '_': # possible indirect target + reference = escape2null(' '.join([line.strip() for line in block])) + refname = self.isreference(reference) + if refname: + target = nodes.target(blocktext, '', refname=refname, + anonymous=1) + self.statemachine.memo.document.note_anonymous_target(target) + self.statemachine.memo.document.note_indirect_target(target) + return [target], blankfinish + nodelist = [] + reference = escape2null(''.join([line.strip() for line in block])) + if reference.find(' ') != -1: + warning = self.statemachine.memo.reporter.warning( + 'Anonymous hyperlink target at line %s contains whitespace. ' + 'Perhaps a footnote was intended?' 
+ % (self.statemachine.abslineno() - len(block) + 1), '', + nodes.literal_block(blocktext, blocktext)) + nodelist.append(warning) + else: + target = nodes.target(blocktext, '', anonymous=1) + if reference: + unescaped = unescape(reference) + target['refuri'] = unescaped + self.statemachine.memo.document.note_anonymous_target(target) + nodelist.append(target) + return nodelist, blankfinish + + def line(self, match, context, nextstate): + """Section title overline or transition marker.""" + if self.statemachine.matchtitles: + return [match.string], 'Line', [] + else: + blocktext = self.statemachine.line + msg = self.statemachine.memo.reporter.severe( + 'Unexpected section title or transition at line %s.' + % self.statemachine.abslineno(), '', + nodes.literal_block(blocktext, blocktext)) + self.statemachine.node += msg + return [], nextstate, [] + + def text(self, match, context, nextstate): + """Titles, definition lists, paragraphs.""" + return [match.string], 'Text', [] + + +class SpecializedBody(Body): + + """ + Superclass for second and subsequent compound element members. + + All transition methods are disabled. Override individual methods in + subclasses to re-enable. + """ + + def invalid_input(self, match=None, context=None, nextstate=None): + """Not a compound element member. 
Abort this state machine.""" + self.statemachine.previousline() # back up so parent SM can reassess + raise EOFError + + indent = invalid_input + bullet = invalid_input + enumerator = invalid_input + field_marker = invalid_input + option_marker = invalid_input + doctest = invalid_input + tabletop = invalid_input + explicit_markup = invalid_input + anonymous = invalid_input + line = invalid_input + text = invalid_input + + +class BulletList(SpecializedBody): + + """Second and subsequent bullet_list list_items.""" + + def bullet(self, match, context, nextstate): + """Bullet list item.""" + if match.string[0] != self.statemachine.node['bullet']: + # different bullet: new list + self.invalid_input() + listitem, blankfinish = self.list_item(match.end()) + self.statemachine.node += listitem + self.blankfinish = blankfinish + return [], 'BulletList', [] + + +class DefinitionList(SpecializedBody): + + """Second and subsequent definition_list_items.""" + + def text(self, match, context, nextstate): + """Definition lists.""" + return [match.string], 'Definition', [] + + +class EnumeratedList(SpecializedBody): + + """Second and subsequent enumerated_list list_items.""" + + def enumerator(self, match, context, nextstate): + """Enumerated list item.""" + format, sequence, text, ordinal = self.parse_enumerator( + match, self.statemachine.node['enumtype']) + if (sequence != self.statemachine.node['enumtype'] or + format != self.format or + ordinal != self.lastordinal + 1): + # different enumeration: new list + self.invalid_input() + listitem, blankfinish = self.list_item(match.end()) + self.statemachine.node += listitem + self.blankfinish = blankfinish + self.lastordinal = ordinal + return [], 'EnumeratedList', [] + + +class FieldList(SpecializedBody): + + """Second and subsequent field_list fields.""" + + def field_marker(self, match, context, nextstate): + """Field list field.""" + field, blankfinish = self.field(match) + self.statemachine.node += field + self.blankfinish = 
blankfinish + return [], 'FieldList', [] + + +class OptionList(SpecializedBody): + + """Second and subsequent option_list option_list_items.""" + + def option_marker(self, match, context, nextstate): + """Option list item.""" + try: + option_list_item, blankfinish = self.option_list_item(match) + except MarkupError, detail: + self.invalid_input() + self.statemachine.node += option_list_item + self.blankfinish = blankfinish + return [], 'OptionList', [] + + +class RFC822List(SpecializedBody): + + """Second and subsequent RFC822 field_list fields.""" + + pass + + +class Explicit(SpecializedBody): + + """Second and subsequent explicit markup construct.""" + + def explicit_markup(self, match, context, nextstate): + """Footnotes, hyperlink targets, directives, comments.""" + nodelist, blankfinish = self.explicit_construct(match) + self.statemachine.node += nodelist + self.blankfinish = blankfinish + return [], nextstate, [] + + def anonymous(self, match, context, nextstate): + """Anonymous hyperlink targets.""" + nodelist, blankfinish = self.anonymous_target(match) + self.statemachine.node += nodelist + self.blankfinish = blankfinish + return [], nextstate, [] + + +class SubstitutionDef(Body): + + """ + Parser for the contents of a substitution_definition element. 
+ """ + + patterns = { + 'embedded_directive': r'(%s)::( +|$)' % RSTState.inline.simplename, + 'text': r''} + initialtransitions = ['embedded_directive', 'text'] + + def embedded_directive(self, match, context, nextstate): + if self.statemachine.node.has_key('alt'): + attributes = {'alt': self.statemachine.node['alt']} + else: + attributes = {} + nodelist, blankfinish = self.directive(match, **attributes) + self.statemachine.node += nodelist + if not self.statemachine.ateof(): + self.blankfinish = blankfinish + raise EOFError + + def text(self, match, context, nextstate): + if not self.statemachine.ateof(): + self.blankfinish = self.statemachine.nextlineblank() + raise EOFError + + +class Text(RSTState): + + """ + Classifier of second line of a text block. + + Could be a paragraph, a definition list item, or a title. + """ + + patterns = {'underline': Body.patterns['line'], + 'text': r''} + initialtransitions = [('underline', 'Body'), ('text', 'Body')] + + def blank(self, match, context, nextstate): + """End of paragraph.""" + paragraph, literalnext = self.paragraph( + context, self.statemachine.abslineno() - 1) + self.statemachine.node += paragraph + if literalnext: + self.statemachine.node += self.literal_block() + return [], 'Body', [] + + def eof(self, context): + if context: + paragraph, literalnext = self.paragraph( + context, self.statemachine.abslineno() - 1) + self.statemachine.node += paragraph + if literalnext: + self.statemachine.node += self.literal_block() + return [] + + def indent(self, match, context, nextstate): + """Definition list item.""" + definitionlist = nodes.definition_list() + definitionlistitem, blankfinish = self.definition_list_item(context) + definitionlist += definitionlistitem + self.statemachine.node += definitionlist + offset = self.statemachine.lineoffset + 1 # next line + newlineoffset, blankfinish = self.nestedlistparse( + self.statemachine.inputlines[offset:], + inputoffset=self.statemachine.abslineoffset() + 1, + 
node=definitionlist, initialstate='DefinitionList', + blankfinish=blankfinish, blankfinishstate='Definition') + if not blankfinish: + self.statemachine.node += self.unindentwarning() + self.gotoline(newlineoffset) + return [], 'Body', [] + + def underline(self, match, context, nextstate): + """Section title.""" + lineno = self.statemachine.abslineno() + if not self.statemachine.matchtitles: + blocktext = context[0] + '\n' + self.statemachine.line + msg = self.statemachine.memo.reporter.severe( + 'Unexpected section title at line %s.' % lineno, '', + nodes.literal_block(blocktext, blocktext)) + self.statemachine.node += msg + return [], nextstate, [] + title = context[0].rstrip() + underline = match.string.rstrip() + source = title + '\n' + underline + if len(title) > len(underline): + blocktext = context[0] + '\n' + self.statemachine.line + msg = self.statemachine.memo.reporter.info( + 'Title underline too short at line %s.' % lineno, '', + nodes.literal_block(blocktext, blocktext)) + self.statemachine.node += msg + style = underline[0] + context[:] = [] + self.section(title, source, style, lineno - 1) + return [], nextstate, [] + + def text(self, match, context, nextstate): + """Paragraph.""" + startline = self.statemachine.abslineno() - 1 + msg = None + try: + block = self.statemachine.getunindented() + except statemachine.UnexpectedIndentationError, instance: + block, lineno = instance.args + msg = self.statemachine.memo.reporter.error( + 'Unexpected indentation at line %s.' 
% lineno) + lines = context + block + paragraph, literalnext = self.paragraph(lines, startline) + self.statemachine.node += paragraph + self.statemachine.node += msg + if literalnext: + try: + self.statemachine.nextline() + except IndexError: + pass + self.statemachine.node += self.literal_block() + return [], nextstate, [] + + def literal_block(self): + """Return a list of nodes.""" + indented, indent, offset, blankfinish = \ + self.statemachine.getindented() + nodelist = [] + while indented and not indented[-1].strip(): + indented.pop() + if indented: + data = '\n'.join(indented) + nodelist.append(nodes.literal_block(data, data)) + if not blankfinish: + nodelist.append(self.unindentwarning()) + else: + nodelist.append(self.statemachine.memo.reporter.warning( + 'Literal block expected at line %s; none found.' + % self.statemachine.abslineno())) + return nodelist + + def definition_list_item(self, termline): + indented, indent, lineoffset, blankfinish = \ + self.statemachine.getindented() + definitionlistitem = nodes.definition_list_item('\n'.join(termline + + indented)) + termlist, messages = self.term(termline, + self.statemachine.abslineno() - 1) + definitionlistitem += termlist + definition = nodes.definition('', *messages) + definitionlistitem += definition + if termline[0][-2:] == '::': + definition += self.statemachine.memo.reporter.info( + 'Blank line missing before literal block? Interpreted as a ' + 'definition list item. At line %s.' 
% (lineoffset + 1)) + self.nestedparse(indented, inputoffset=lineoffset, node=definition) + return definitionlistitem, blankfinish + + def term(self, lines, lineno): + """Return a definition_list's term and optional classifier.""" + assert len(lines) == 1 + nodelist = [] + parts = lines[0].split(' : ', 1) # split into 1 or 2 parts + termpart = parts[0].rstrip() + textnodes, messages = self.inline_text(termpart, lineno) + nodelist = [nodes.term(termpart, '', *textnodes)] + if len(parts) == 2: + classifierpart = parts[1].lstrip() + textnodes, cpmessages = self.inline_text(classifierpart, lineno) + nodelist.append(nodes.classifier(classifierpart, '', *textnodes)) + messages += cpmessages + return nodelist, messages + + +class SpecializedText(Text): + + """ + Superclass for second and subsequent lines of Text-variants. + + All transition methods are disabled. Override individual methods in + subclasses to re-enable. + """ + + def eof(self, context): + """Incomplete construct.""" + return [] + + def invalid_input(self, match=None, context=None, nextstate=None): + """Not a compound element member. Abort this state machine.""" + raise EOFError + + blank = invalid_input + indent = invalid_input + underline = invalid_input + text = invalid_input + + +class Definition(SpecializedText): + + """Second line of potential definition_list_item.""" + + def eof(self, context): + """Not a definition.""" + self.statemachine.previousline(2) # back up so parent SM can reassess + return [] + + def indent(self, match, context, nextstate): + """Definition list item.""" + definitionlistitem, blankfinish = self.definition_list_item(context) + self.statemachine.node += definitionlistitem + self.blankfinish = blankfinish + return [], 'DefinitionList', [] + + +class Line(SpecializedText): + + """Second line of over- & underlined section title or transition marker.""" + + eofcheck = 1 # @@@ ??? 
+ """Set to 0 while parsing sections, so that we don't catch the EOF.""" + + def eof(self, context): + """Transition marker at end of section or document.""" + if self.eofcheck: # ignore EOFError with sections + transition = nodes.transition(context[0]) + self.statemachine.node += transition + msg = self.statemachine.memo.reporter.error( + 'Document or section may not end with a transition ' + '(line %s).' % (self.statemachine.abslineno() - 1)) + self.statemachine.node += msg + self.eofcheck = 1 + return [] + + def blank(self, match, context, nextstate): + """Transition marker.""" + transition = nodes.transition(context[0]) + if len(self.statemachine.node) == 0: + msg = self.statemachine.memo.reporter.error( + 'Document or section may not begin with a transition ' + '(line %s).' % (self.statemachine.abslineno() - 1)) + self.statemachine.node += msg + elif isinstance(self.statemachine.node[-1], nodes.transition): + msg = self.statemachine.memo.reporter.error( + 'At least one body element must separate transitions; ' + 'adjacent transitions at line %s.' + % (self.statemachine.abslineno() - 1)) + self.statemachine.node += msg + self.statemachine.node += transition + return [], 'Body', [] + + def text(self, match, context, nextstate): + """Potential over- & underlined title.""" + lineno = self.statemachine.abslineno() - 1 + overline = context[0] + title = match.string + underline = '' + try: + underline = self.statemachine.nextline() + except IndexError: + blocktext = overline + '\n' + title + msg = self.statemachine.memo.reporter.severe( + 'Incomplete section title at line %s.' % lineno, '', + nodes.literal_block(blocktext, blocktext)) + self.statemachine.node += msg + return [], 'Body', [] + source = '%s\n%s\n%s' % (overline, title, underline) + overline = overline.rstrip() + underline = underline.rstrip() + if not self.transitions['underline'][0].match(underline): + msg = self.statemachine.memo.reporter.severe( + 'Missing underline for overline at line %s.' 
% lineno, '', + nodes.literal_block(source, source)) + self.statemachine.node += msg + return [], 'Body', [] + elif overline != underline: + msg = self.statemachine.memo.reporter.severe( + 'Title overline & underline mismatch at ' 'line %s.' % lineno, + '', nodes.literal_block(source, source)) + self.statemachine.node += msg + return [], 'Body', [] + title = title.rstrip() + if len(title) > len(overline): + msg = self.statemachine.memo.reporter.info( + 'Title overline too short at line %s.'% lineno, '', + nodes.literal_block(source, source)) + self.statemachine.node += msg + style = (overline[0], underline[0]) + self.eofcheck = 0 # @@@ not sure this is correct + self.section(title.lstrip(), source, style, lineno + 1) + self.eofcheck = 1 + return [], 'Body', [] + + indent = text # indented title + + def underline(self, match=None, context=None, nextstate=None): + blocktext = context[0] + '\n' + self.statemachine.line + msg = self.statemachine.memo.reporter.error( + 'Invalid section title or transition marker at line %s.' 
+ % (self.statemachine.abslineno() - 1), '', + nodes.literal_block(blocktext, blocktext)) + self.statemachine.node += msg + return [], 'Body', [] + + +stateclasses = [Body, BulletList, DefinitionList, EnumeratedList, FieldList, + OptionList, RFC822List, Explicit, Text, Definition, Line, + SubstitutionDef] +"""Standard set of State classes used to start `RSTStateMachine`.""" + + +def escape2null(text): + """Return a string with escape-backslashes converted to nulls.""" + parts = [] + start = 0 + while 1: + found = text.find('\\', start) + if found == -1: + parts.append(text[start:]) + return ''.join(parts) + parts.append(text[start:found]) + parts.append('\x00' + text[found+1:found+2]) + start = found + 2 # skip character after escape + +def unescape(text, restorebackslashes=0): + """Return a string with nulls removed or restored to backslashes.""" + if restorebackslashes: + return text.translate(RSTState.inline.null2backslash) + else: + return text.translate(RSTState.inline.identity, '\x00') diff --git a/docutils/parsers/rst/tableparser.py b/docutils/parsers/rst/tableparser.py new file mode 100644 index 000000000..7bacf99cd --- /dev/null +++ b/docutils/parsers/rst/tableparser.py @@ -0,0 +1,313 @@ +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +This module defines the `TableParser` class, which parses a plaintext-graphic +table and produces a well-formed data structure suitable for building a CALS +table. + +:Exception class: `TableMarkupError` + +:Function: + `update_dictoflists()`: Merge two dictionaries containing list values. +""" + +__docformat__ = 'reStructuredText' + + +import re + + +class TableMarkupError(Exception): pass + + +class TableParser: + + """ + Parse a plaintext graphic table using `parse()`. 
+ + Here's an example of a plaintext graphic table:: + + +------------------------+------------+----------+----------+ + | Header row, column 1 | Header 2 | Header 3 | Header 4 | + +========================+============+==========+==========+ + | body row 1, column 1 | column 2 | column 3 | column 4 | + +------------------------+------------+----------+----------+ + | body row 2 | Cells may span columns. | + +------------------------+------------+---------------------+ + | body row 3 | Cells may | - Table cells | + +------------------------+ span rows. | - contain | + | body row 4 | | - body elements. | + +------------------------+------------+---------------------+ + + Intersections use '+', row separators use '-' (except for one optional + head/body row separator, which uses '='), and column separators use '|'. + + Passing the above table to the `parse()` method will result in the + following data structure:: + + ([24, 12, 10, 10], + [[(0, 0, 1, ['Header row, column 1']), + (0, 0, 1, ['Header 2']), + (0, 0, 1, ['Header 3']), + (0, 0, 1, ['Header 4'])]], + [[(0, 0, 3, ['body row 1, column 1']), + (0, 0, 3, ['column 2']), + (0, 0, 3, ['column 3']), + (0, 0, 3, ['column 4'])], + [(0, 0, 5, ['body row 2']), + (0, 2, 5, ['Cells may span columns.']), + None, + None], + [(0, 0, 7, ['body row 3']), + (1, 0, 7, ['Cells may', 'span rows.', '']), + (1, 1, 7, ['- Table cells', '- contain', '- body elements.']), + None], + [(0, 0, 9, ['body row 4']), None, None, None]]) + + The first item is a list containing column widths (colspecs). The second + item is a list of head rows, and the third is a list of body rows. Each + row contains a list of cells. Each cell is either None (for a cell unused + because of another cell's span), or a tuple. 
A cell tuple contains four + items: the number of extra rows used by the cell in a vertical span + (morerows); the number of extra columns used by the cell in a horizontal + span (morecols); the line offset of the first line of the cell contents; + and the cell contents, a list of lines of text. + """ + + headbodyseparatorpat = re.compile(r'\+=[=+]+=\+$') + """Matches the row separator between head rows and body rows.""" + + def parse(self, block): + """ + Analyze the text `block` and return a table data structure. + + Given a plaintext-graphic table in `block` (list of lines of text; no + whitespace padding), parse the table, construct and return the data + necessary to construct a CALS table or equivalent. + + Raise `TableMarkupError` if there is any problem with the markup. + """ + self.setup(block) + self.findheadbodysep() + self.parsegrid() + structure = self.structurefromcells() + return structure + + def setup(self, block): + self.block = block[:] # make a copy; it may be modified + self.bottom = len(block) - 1 + self.right = len(block[0]) - 1 + self.headbodysep = None + self.done = [-1] * len(block[0]) + self.cells = [] + self.rowseps = {0: [0]} + self.colseps = {0: [0]} + + def findheadbodysep(self): + """Look for a head/body row separator line; store the line index.""" + for i in range(len(self.block)): + line = self.block[i] + if self.headbodyseparatorpat.match(line): + if self.headbodysep: + raise TableMarkupError, ( + 'Multiple head/body row separators in table (at line ' + 'offset %s and %s); only one allowed.' + % (self.headbodysep, i)) + else: + self.headbodysep = i + self.block[i] = line.replace('=', '-') + if self.headbodysep == 0 or self.headbodysep == len(self.block) - 1: + raise TableMarkupError, ( + 'The head/body row separator may not be the first or last ' + 'line of the table.') + + def parsegrid(self): + """ + Start with a queue of upper-left corners, containing the upper-left + corner of the table itself. 
Trace out one rectangular cell, remember + it, and add its upper-right and lower-left corners to the queue of + potential upper-left corners of further cells. Process the queue in + top-to-bottom order, keeping track of how much of each text column has + been seen. + + We'll end up knowing all the row and column boundaries, cell positions + and their dimensions. + """ + corners = [(0, 0)] + while corners: + top, left = corners.pop(0) + if top == self.bottom or left == self.right \ + or top <= self.done[left]: + continue + result = self.scancell(top, left) + if not result: + continue + bottom, right, rowseps, colseps = result + update_dictoflists(self.rowseps, rowseps) + update_dictoflists(self.colseps, colseps) + self.markdone(top, left, bottom, right) + cellblock = self.getcellblock(top, left, bottom, right) + self.cells.append((top, left, bottom, right, cellblock)) + corners.extend([(top, right), (bottom, left)]) + corners.sort() + if not self.checkparsecomplete(): + raise TableMarkupError, 'Malformed table; parse incomplete.' 
+ + def markdone(self, top, left, bottom, right): + """For keeping track of how much of each text column has been seen.""" + before = top - 1 + after = bottom - 1 + for col in range(left, right): + assert self.done[col] == before + self.done[col] = after + + def checkparsecomplete(self): + """Each text column should have been completely seen.""" + last = self.bottom - 1 + for col in range(self.right): + if self.done[col] != last: + return None + return 1 + + def getcellblock(self, top, left, bottom, right): + """Given the corners, extract the text of a cell.""" + cellblock = [] + margin = right + for lineno in range(top + 1, bottom): + line = self.block[lineno][left + 1 : right].rstrip() + cellblock.append(line) + if line: + margin = margin and min(margin, len(line) - len(line.lstrip())) + if 0 < margin < right: + cellblock = [line[margin:] for line in cellblock] + return cellblock + + def scancell(self, top, left): + """Starting at the top-left corner, start tracing out a cell.""" + assert self.block[top][left] == '+' + result = self.scanright(top, left) + return result + + def scanright(self, top, left): + """ + Look for the top-right corner of the cell, and make note of all column + boundaries ('+'). + """ + colseps = {} + line = self.block[top] + for i in range(left + 1, self.right + 1): + if line[i] == '+': + colseps[i] = [top] + result = self.scandown(top, left, i) + if result: + bottom, rowseps, newcolseps = result + update_dictoflists(colseps, newcolseps) + return bottom, i, rowseps, colseps + elif line[i] != '-': + return None + return None + + def scandown(self, top, left, right): + """ + Look for the bottom-right corner of the cell, making note of all row + boundaries. 
+ """ + rowseps = {} + for i in range(top + 1, self.bottom + 1): + if self.block[i][right] == '+': + rowseps[i] = [right] + result = self.scanleft(top, left, i, right) + if result: + newrowseps, colseps = result + update_dictoflists(rowseps, newrowseps) + return i, rowseps, colseps + elif self.block[i][right] != '|': + return None + return None + + def scanleft(self, top, left, bottom, right): + """ + Noting column boundaries, look for the bottom-left corner of the cell. + It must line up with the starting point. + """ + colseps = {} + line = self.block[bottom] + for i in range(right - 1, left, -1): + if line[i] == '+': + colseps[i] = [bottom] + elif line[i] != '-': + return None + if line[left] != '+': + return None + result = self.scanup(top, left, bottom, right) + if result is not None: + rowseps = result + return rowseps, colseps + return None + + def scanup(self, top, left, bottom, right): + """Noting row boundaries, see if we can return to the starting point.""" + rowseps = {} + for i in range(bottom - 1, top, -1): + if self.block[i][left] == '+': + rowseps[i] = [left] + elif self.block[i][left] != '|': + return None + return rowseps + + def structurefromcells(self): + """ + From the data collected by `scancell()`, convert to the final data + structure. 
+ """ + rowseps = self.rowseps.keys() # list of row boundaries + rowseps.sort() + rowindex = {} + for i in range(len(rowseps)): + rowindex[rowseps[i]] = i # row boundary -> row number mapping + colseps = self.colseps.keys() # list of column boundaries + colseps.sort() + colindex = {} + for i in range(len(colseps)): + colindex[colseps[i]] = i # column boundary -> col number mapping + colspecs = [(colseps[i] - colseps[i - 1] - 1) + for i in range(1, len(colseps))] # list of column widths + # prepare an empty table with the correct number of rows & columns + onerow = [None for i in range(len(colseps) - 1)] + rows = [onerow[:] for i in range(len(rowseps) - 1)] + # keep track of # of cells remaining; should reduce to zero + remaining = (len(rowseps) - 1) * (len(colseps) - 1) + for top, left, bottom, right, block in self.cells: + rownum = rowindex[top] + colnum = colindex[left] + assert rows[rownum][colnum] is None, ( + 'Cell (row %s, column %s) already used.' + % (rownum + 1, colnum + 1)) + morerows = rowindex[bottom] - rownum - 1 + morecols = colindex[right] - colnum - 1 + remaining -= (morerows + 1) * (morecols + 1) + # write the cell into the table + rows[rownum][colnum] = (morerows, morecols, top + 1, block) + assert remaining == 0, 'Unused cells remaining.' + if self.headbodysep: # separate head rows from body rows + numheadrows = rowindex[self.headbodysep] + headrows = rows[:numheadrows] + bodyrows = rows[numheadrows:] + else: + headrows = [] + bodyrows = rows + return (colspecs, headrows, bodyrows) + + +def update_dictoflists(master, newdata): + """ + Extend the list values of `master` with those from `newdata`. + + Both parameters must be dictionaries containing list values. + """ + for key, values in newdata.items(): + master.setdefault(key, []).extend(values) diff --git a/docutils/readers/__init__.py b/docutils/readers/__init__.py new file mode 100644 index 000000000..9b8d38654 --- /dev/null +++ b/docutils/readers/__init__.py @@ -0,0 +1,118 @@ +#! 
/usr/bin/env python + +""" +:Authors: David Goodger; Ueli Schlaepfer +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +This package contains Docutils Reader modules. +""" + +__docformat__ = 'reStructuredText' + + +import sys +from docutils import nodes, utils +from docutils.transforms import universal + + +class Reader: + + """ + Abstract base class for docutils Readers. + + Each reader module or package must export a subclass also called 'Reader'. + + The three steps of a Reader's responsibility are defined: `scan()`, + `parse()`, and `transform()`. Call `read()` to process a document. + """ + + transforms = () + """Ordered tuple of transform classes (each with a ``transform()`` method). + Populated by subclasses. `Reader.transform()` instantiates & runs them.""" + + def __init__(self, reporter, languagecode): + """ + Initialize the Reader instance. + + Several instance attributes are defined with dummy initial values. + Subclasses may use these attributes as they wish. + """ + + self.languagecode = languagecode + """Default language for new documents.""" + + self.reporter = reporter + """A `utils.Reporter` instance shared by all doctrees.""" + + self.source = None + """Path to the source of raw input.""" + + self.input = None + """Raw text input; either a single string or, for more complex cases, + a collection of strings.""" + + self.transforms = tuple(self.transforms) + """Instance copy of `Reader.transforms`; may be modified by client.""" + + def read(self, source, parser): + self.source = source + self.parser = parser + self.scan() # may modify self.parser, depending on input + self.parse() + self.transform() + return self.document + + def scan(self): + """Override to read `self.input` from `self.source`.""" + raise NotImplementedError('subclass must override this method') + + def scanfile(self, source): + """ + Scan a single file and return the raw data. 
+ + Parameter `source` may be: + + (a) a file-like object, which is read directly; + (b) a path to a file, which is opened and then read; or + (c) `None`, which implies `sys.stdin`. + """ + if hasattr(source, 'read'): + return source.read() + if source: + return open(source).read() + return sys.stdin.read() + + def parse(self): + """Parse `self.input` into a document tree.""" + self.document = self.newdocument() + self.parser.parse(self.input, self.document) + + def transform(self): + """Run all of the transforms defined for this Reader.""" + for xclass in (universal.first_reader_transforms + + tuple(self.transforms) + + universal.last_reader_transforms): + xclass(self.document).transform() + + def newdocument(self, languagecode=None): + """Create and return a new empty document tree (root node).""" + document = nodes.document( + languagecode=(languagecode or self.languagecode), + reporter=self.reporter) + document['source'] = self.source + return document + + +_reader_aliases = {'rtxt': 'standalone', + 'restructuredtext': 'standalone'} + +def get_reader_class(readername): + """Return the Reader class from the `readername` module.""" + readername = readername.lower() + if _reader_aliases.has_key(readername): + readername = _reader_aliases[readername] + module = __import__(readername, globals(), locals()) + return module.Reader diff --git a/docutils/readers/standalone.py b/docutils/readers/standalone.py new file mode 100644 index 000000000..27c0ded6b --- /dev/null +++ b/docutils/readers/standalone.py @@ -0,0 +1,34 @@ +#! /usr/bin/env python + +""" +:Authors: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Standalone file Reader for the reStructuredText markup syntax. 
+""" + +__docformat__ = 'reStructuredText' + + +import sys +from docutils import readers +from docutils.transforms import frontmatter, references +from docutils.parsers.rst import Parser + + +class Reader(readers.Reader): + + document = None + """A single document tree.""" + + transforms = (references.Substitutions, + frontmatter.DocTitle, + frontmatter.DocInfo, + references.Footnotes, + references.Hyperlinks,) + + def scan(self): + self.input = self.scanfile(self.source) diff --git a/docutils/roman.py b/docutils/roman.py new file mode 100644 index 000000000..5972c3cef --- /dev/null +++ b/docutils/roman.py @@ -0,0 +1,81 @@ +"""Convert to and from Roman numerals""" + +__author__ = "Mark Pilgrim (f8dy@diveintopython.org)" +__version__ = "1.4" +__date__ = "8 August 2001" +__copyright__ = """Copyright (c) 2001 Mark Pilgrim + +This program is part of "Dive Into Python", a free Python tutorial for +experienced programmers. Visit http://diveintopython.org/ for the +latest version. + +This program is free software; you can redistribute it and/or modify +it under the terms of the Python 2.1.1 license, available at +http://www.python.org/2.1.1/license.html +""" + +import re + +#Define exceptions +class RomanError(Exception): pass +class OutOfRangeError(RomanError): pass +class NotIntegerError(RomanError): pass +class InvalidRomanNumeralError(RomanError): pass + +#Define digit mapping +romanNumeralMap = (('M', 1000), + ('CM', 900), + ('D', 500), + ('CD', 400), + ('C', 100), + ('XC', 90), + ('L', 50), + ('XL', 40), + ('X', 10), + ('IX', 9), + ('V', 5), + ('IV', 4), + ('I', 1)) + +def toRoman(n): + """convert integer to Roman numeral""" + if not (0 < n < 5000): + raise OutOfRangeError, "number out of range (must be 1..4999)" + if int(n) <> n: + raise NotIntegerError, "decimals can not be converted" + + result = "" + for numeral, integer in romanNumeralMap: + while n >= integer: + result += numeral + n -= integer + return result + +#Define pattern to detect valid Roman numerals 
+romanNumeralPattern = re.compile(''' + ^ # beginning of string + M{0,4} # thousands - 0 to 4 M's + (CM|CD|D?C{0,3}) # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's), + # or 500-800 (D, followed by 0 to 3 C's) + (XC|XL|L?X{0,3}) # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's), + # or 50-80 (L, followed by 0 to 3 X's) + (IX|IV|V?I{0,3}) # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's), + # or 5-8 (V, followed by 0 to 3 I's) + $ # end of string + ''' ,re.VERBOSE) + +def fromRoman(s): + """convert Roman numeral to integer""" + if not s: + raise InvalidRomanNumeralError, 'Input can not be blank' + if not romanNumeralPattern.search(s): + raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s + + result = 0 + index = 0 + for numeral, integer in romanNumeralMap: + while s[index:index+len(numeral)] == numeral: + result += integer + index += len(numeral) + return result + diff --git a/docutils/statemachine.py b/docutils/statemachine.py new file mode 100644 index 000000000..9410cb956 --- /dev/null +++ b/docutils/statemachine.py @@ -0,0 +1,1076 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Version: 1.3 +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. 
+ +A finite state machine specialized for regular-expression-based text filters, +this module defines the following classes: + +- `StateMachine`, a state machine +- `State`, a state superclass +- `StateMachineWS`, a whitespace-sensitive version of `StateMachine` +- `StateWS`, a state superclass for use with `StateMachineWS` +- `SearchStateMachine`, uses `re.search()` instead of `re.match()` +- `SearchStateMachineWS`, uses `re.search()` instead of `re.match()` + +Exception classes: + +- `UnknownStateError` +- `DuplicateStateError` +- `UnknownTransitionError` +- `DuplicateTransitionError` +- `TransitionPatternNotFound` +- `TransitionMethodNotFound` +- `UnexpectedIndentationError` +- `TransitionCorrection`: Raised to switch to another transition. + +Functions: + +- `string2lines()`: split a multi-line string into a list of one-line strings +- `extractindented()`: return indented lines with minimum indentation removed + +How To Use This Module +====================== +(See the individual classes, methods, and attributes for details.) + +1. Import it: ``import statemachine`` or ``from statemachine import ...``. + You will also need to ``import re``. + +2. Derive a subclass of `State` (or `StateWS`) for each state in your state + machine:: + + class MyState(statemachine.State): + + Within the state's class definition: + + a) Include a pattern for each transition, in `State.patterns`:: + + patterns = {'atransition': r'pattern', ...} + + b) Include a list of initial transitions to be set up automatically, in + `State.initialtransitions`:: + + initialtransitions = ['atransition', ...] + + c) Define a method for each transition, with the same name as the + transition pattern:: + + def atransition(self, match, context, nextstate): + # do something + result = [...] # a list + return context, nextstate, result + # context, nextstate may be altered + + Transition methods may raise an `EOFError` to cut processing short. 
+ + d) You may wish to override the `State.bof()` and/or `State.eof()` implicit + transition methods, which handle the beginning- and end-of-file. + + e) In order to handle nested processing, you may wish to override the + attributes `State.nestedSM` and/or `State.nestedSMkwargs`. + + If you are using `StateWS` as a base class, in order to handle nested + indented blocks, you may wish to: + + - override the attributes `StateWS.indentSM`, `StateWS.indentSMkwargs`, + `StateWS.knownindentSM`, and/or `StateWS.knownindentSMkwargs`; + - override the `StateWS.blank()` method; and/or + - override or extend the `StateWS.indent()`, `StateWS.knownindent()`, + and/or `StateWS.firstknownindent()` methods. + +3. Create a state machine object:: + + sm = StateMachine(stateclasses=[MyState, ...], initialstate='MyState') + +4. Obtain the input text, which needs to be converted into a tab-free list of + one-line strings. For example, to read text from a file called + 'inputfile':: + + inputstring = open('inputfile').read() + inputlines = statemachine.string2lines(inputstring) + +5. Run the state machine on the input text and collect the results, a list:: + + results = sm.run(inputlines) + +6. Remove any lingering circular references:: + + sm.unlink() +""" + +__docformat__ = 'restructuredtext' + +import sys, re, string + + +class StateMachine: + + """ + A finite state machine for text filters using regular expressions. + + The input is provided in the form of a list of one-line strings (no + newlines). States are subclasses of the `State` class. Transitions consist + of regular expression patterns and transition methods, and are defined in + each state. + + The state machine is started with the `run()` method, which returns the + results of processing in a list. + """ + + def __init__(self, stateclasses, initialstate, debug=0): + """ + Initialize a `StateMachine` object; add state objects. + + Parameters: + + - `stateclasses`: a list of `State` (sub)classes. 
+ - `initialstate`: a string, the class name of the initial state. + - `debug`: a boolean; produce verbose output if true (nonzero). + """ + + self.inputlines = None + """List of strings (without newlines). Filled by `self.run()`.""" + + self.inputoffset = 0 + """Offset of `self.inputlines` from the beginning of the file.""" + + self.line = None + """Current input line.""" + + self.lineoffset = None + """Current input line offset from beginning of `self.inputlines`.""" + + self.debug = debug + """Debugging mode on/off.""" + + self.initialstate = initialstate + """The name of the initial state (key to `self.states`).""" + + self.currentstate = initialstate + """The name of the current state (key to `self.states`).""" + + self.states = {} + """Mapping of {state_name: State_object}.""" + + self.addstates(stateclasses) + + def unlink(self): + """Remove circular references to objects no longer required.""" + for state in self.states.values(): + state.unlink() + self.states = None + + def run(self, inputlines, inputoffset=0): + """ + Run the state machine on `inputlines`. Return results (a list). + + Reset `self.lineoffset` and `self.currentstate`. Run the + beginning-of-file transition. Input one line at a time and check for a + matching transition. If a match is found, call the transition method + and possibly change the state. Store the context returned by the + transition method to be passed on to the next transition matched. + Accumulate the results returned by the transition methods in a list. + Run the end-of-file transition. Finally, return the accumulated + results. + + Parameters: + + - `inputlines`: a list of strings without newlines. + - `inputoffset`: the line offset of `inputlines` from the beginning of + the file. 
+ """ + self.inputlines = inputlines + self.inputoffset = inputoffset + self.lineoffset = -1 + self.currentstate = self.initialstate + if self.debug: + print >>sys.stderr, ('\nStateMachine.run: inputlines:\n| %s' % + '\n| '.join(self.inputlines)) + context = None + results = [] + state = self.getstate() + try: + if self.debug: + print >>sys.stderr, ('\nStateMachine.run: bof transition') + context, result = state.bof(context) + results.extend(result) + while 1: + try: + self.nextline() + if self.debug: + print >>sys.stderr, ('\nStateMachine.run: line:\n| %s' + % self.line) + except IndexError: + break + try: + context, nextstate, result = self.checkline(context, state) + except EOFError: + break + state = self.getstate(nextstate) + results.extend(result) + if self.debug: + print >>sys.stderr, ('\nStateMachine.run: eof transition') + result = state.eof(context) + results.extend(result) + except: + self.error() + raise + return results + + def getstate(self, nextstate=None): + """ + Return current state object; set it first if `nextstate` given. + + Parameter `nextstate`: a string, the name of the next state. + + Exception: `UnknownStateError` raised if `nextstate` unknown. + """ + if nextstate: + if self.debug and nextstate != self.currentstate: + print >>sys.stderr, \ + ('\nStateMachine.getstate: Changing state from ' + '"%s" to "%s" (input line %s).' 
+ % (self.currentstate, nextstate, self.abslineno())) + self.currentstate = nextstate + try: + return self.states[self.currentstate] + except KeyError: + raise UnknownStateError(self.currentstate) + + def nextline(self, n=1): + """Load `self.line` with the `n`'th next line and return it.""" + self.lineoffset += n + self.line = self.inputlines[self.lineoffset] + return self.line + + def nextlineblank(self): + """Return 1 if the next line is blank or nonexistent.""" + try: + return not self.inputlines[self.lineoffset + 1].strip() + except IndexError: + return 1 + + def ateof(self): + """Return 1 if the input is at or past end-of-file.""" + return self.lineoffset >= len(self.inputlines) - 1 + + def atbof(self): + """Return 1 if the input is at or before beginning-of-file.""" + return self.lineoffset <= 0 + + def previousline(self, n=1): + """Load `self.line` with the `n`'th previous line and return it.""" + self.lineoffset -= n + self.line = self.inputlines[self.lineoffset] + return self.line + + def gotoline(self, lineoffset): + """Jump to absolute line offset `lineoffset`, load and return it.""" + self.lineoffset = lineoffset - self.inputoffset + self.line = self.inputlines[self.lineoffset] + return self.line + + def abslineoffset(self): + """Return line offset of current line, from beginning of file.""" + return self.lineoffset + self.inputoffset + + def abslineno(self): + """Return line number of current line (counting from 1).""" + return self.lineoffset + self.inputoffset + 1 + + def gettextblock(self): + """Return a contiguous block of text.""" + block = [] + for line in self.inputlines[self.lineoffset:]: + if not line.strip(): + break + block.append(line) + self.nextline(len(block) - 1) # advance to last line of block + return block + + def getunindented(self): + """ + Return a contiguous, flush-left block of text. + + Raise `UnexpectedIndentationError` if an indented line is encountered + before the text block ends (with a blank line). 
+ """ + block = [self.line] + for line in self.inputlines[self.lineoffset + 1:]: + if not line.strip(): + break + if line[0] == ' ': + self.nextline(len(block) - 1) # advance to last line of block + raise UnexpectedIndentationError(block, self.abslineno() + 1) + block.append(line) + self.nextline(len(block) - 1) # advance to last line of block + return block + + def checkline(self, context, state): + """ + Examine one line of input for a transition match. + + Parameters: + + - `context`: application-dependent storage. + - `state`: a `State` object, the current state. + + Return the values returned by the transition method: + + - context: possibly modified from the parameter `context`; + - next state name (`State` subclass name), or ``None`` if no match; + - the result output of the transition, a list. + """ + if self.debug: + print >>sys.stdout, ('\nStateMachine.checkline: ' + 'context "%s", state "%s"' % + (context, state.__class__.__name__)) + context, nextstate, result = self.matchtransition(context, state) + return context, nextstate, result + + def matchtransition(self, context, state): + """ + Try to match the current line to a transition & execute its method. + + Parameters: + + - `context`: application-dependent storage. + - `state`: a `State` object, the current state. + + Return the values returned by the transition method: + + - context: possibly modified from the parameter `context`, unchanged + if no match; + - next state name (`State` subclass name), or ``None`` if no match; + - the result output of the transition, a list (empty if no match). + """ + if self.debug: + print >>sys.stderr, ( + '\nStateMachine.matchtransition: state="%s", transitions=%r.' + % (state.__class__.__name__, state.transitionorder)) + for name in state.transitionorder: + while 1: + pattern, method, nextstate = state.transitions[name] + if self.debug: + print >>sys.stderr, ( + '\nStateMachine.matchtransition: Trying transition ' + '"%s" in state "%s".' 
+ % (name, state.__class__.__name__)) + match = self.match(pattern) + if match: + if self.debug: + print >>sys.stderr, ( + '\nStateMachine.matchtransition: Matched ' + 'transition "%s" in state "%s".' + % (name, state.__class__.__name__)) + try: + return method(match, context, nextstate) + except TransitionCorrection, detail: + name = str(detail) + continue # try again with new transition name + break + else: + return context, None, [] # no match + + def match(self, pattern): + """ + Return the result of a regular expression match. + + Parameter `pattern`: an `re` compiled regular expression. + """ + return pattern.match(self.line) + + def addstate(self, stateclass): + """ + Initialize & add a `stateclass` (`State` subclass) object. + + Exception: `DuplicateStateError` raised if `stateclass` already added. + """ + statename = stateclass.__name__ + if self.states.has_key(statename): + raise DuplicateStateError(statename) + self.states[statename] = stateclass(self, self.debug) + + def addstates(self, stateclasses): + """ + Add `stateclasses` (a list of `State` subclasses). + """ + for stateclass in stateclasses: + self.addstate(stateclass) + + def error(self): + """Report error details.""" + type, value, module, line, function = _exceptiondata() + print >>sys.stderr, '%s: %s' % (type, value) + print >>sys.stderr, 'input line %s' % (self.abslineno()) + print >>sys.stderr, ('module %s, line %s, function %s' + % (module, line, function)) + + +class State: + + """ + State superclass. Contains a list of transitions, and transition methods. + + Transition methods all have the same signature. They take 3 parameters: + + - An `re` match object. ``match.string`` contains the matched input line, + ``match.start()`` gives the start index of the match, and + ``match.end()`` gives the end index. + - A context object, whose meaning is application-defined (initial value + ``None``). 
It can be used to store any information required by the state + machine, and the returned context is passed on to the next transition + method unchanged. + - The name of the next state, a string, taken from the transitions list; + normally it is returned unchanged, but it may be altered by the + transition method if necessary. + + Transition methods all return a 3-tuple: + + - A context object, as (potentially) modified by the transition method. + - The next state name (a return value of ``None`` means no state change). + - The processing result, a list, which is accumulated by the state + machine. + + Transition methods may raise an `EOFError` to cut processing short. + + There are two implicit transitions, and corresponding transition methods + are defined: `bof()` handles the beginning-of-file, and `eof()` handles + the end-of-file. These methods have non-standard signatures and return + values. `bof()` returns the initial context and results, and may be used + to return a header string, or do any other processing needed. `eof()` + should handle any remaining context and wrap things up; it returns the + final processing result. + + Typical applications need only subclass `State` (or a subclass), set the + `patterns` and `initialtransitions` class attributes, and provide + corresponding transition methods. The default object initialization will + take care of constructing the list of transitions. + """ + + patterns = None + """ + {Name: pattern} mapping, used by `maketransition()`. Each pattern may + be a string or a compiled `re` pattern. Override in subclasses. + """ + + initialtransitions = None + """ + A list of transitions to initialize when a `State` is instantiated. + Each entry is either a transition name string, or a (transition name, next + state name) pair. See `maketransitions()`. Override in subclasses. + """ + + nestedSM = None + """ + The `StateMachine` class for handling nested processing. 
+ + If left as ``None``, `nestedSM` defaults to the class of the state's + controlling state machine. Override it in subclasses to avoid the default. + """ + + nestedSMkwargs = None + """ + Keyword arguments dictionary, passed to the `nestedSM` constructor. + + Two keys must have entries in the dictionary: + + - Key 'stateclasses' must be set to a list of `State` classes. + - Key 'initialstate' must be set to the name of the initial state class. + + If `nestedSMkwargs` is left as ``None``, 'stateclasses' defaults to the + class of the current state, and 'initialstate' defaults to the name of the + class of the current state. Override in subclasses to avoid the defaults. + """ + + def __init__(self, statemachine, debug=0): + """ + Initialize a `State` object; make & add initial transitions. + + Parameters: + + - `statemachine`: the controlling `StateMachine` object. + - `debug`: a boolean; produce verbose output if true (nonzero). + """ + + self.transitionorder = [] + """A list of transition names in search order.""" + + self.transitions = {} + """ + A mapping of transition names to 3-tuples containing + (compiled_pattern, transition_method, next_state_name). Initialized as + an instance attribute dynamically (instead of as a class attribute) + because it may make forward references to patterns and methods in this + or other classes. 
+ """ + + if self.initialtransitions: + names, transitions = self.maketransitions(self.initialtransitions) + self.addtransitions(names, transitions) + + self.statemachine = statemachine + """A reference to the controlling `StateMachine` object.""" + + self.debug = debug + """Debugging mode on/off.""" + + if self.nestedSM is None: + self.nestedSM = self.statemachine.__class__ + if self.nestedSMkwargs is None: + self.nestedSMkwargs = {'stateclasses': [self.__class__], + 'initialstate': self.__class__.__name__} + + def unlink(self): + """Remove circular references to objects no longer required.""" + self.statemachine = None + + def addtransitions(self, names, transitions): + """ + Add a list of transitions to the start of the transition list. + + Parameters: + + - `names`: a list of transition names. + - `transitions`: a mapping of names to transition tuples. + + Exceptions: `DuplicateTransitionError`, `UnknownTransitionError`. + """ + for name in names: + if self.transitions.has_key(name): + raise DuplicateTransitionError(name) + if not transitions.has_key(name): + raise UnknownTransitionError(name) + self.transitionorder[:0] = names + self.transitions.update(transitions) + + def addtransition(self, name, transition): + """ + Add a transition to the start of the transition list. + + Parameter `transition`: a ready-made transition 3-tuple. + + Exception: `DuplicateTransitionError`. + """ + if self.transitions.has_key(name): + raise DuplicateTransitionError(name) + self.transitionorder[:0] = [name] + self.transitions[name] = transition + + def removetransition(self, name): + """ + Remove a transition by `name`. + + Exception: `UnknownTransitionError`. + """ + try: + del self.transitions[name] + self.transitionorder.remove(name) + except: + raise UnknownTransitionError(name) + + def maketransition(self, name, nextstate=None): + """ + Make & return a transition tuple based on `name`. + + This is a convenience function to simplify transition creation. 
+ + Parameters: + + - `name`: a string, the name of the transition pattern & method. This + `State` object must have a method called '`name`', and a dictionary + `self.patterns` containing a key '`name`'. + - `nextstate`: a string, the name of the next `State` object for this + transition. A value of ``None`` (or absent) implies no state change + (i.e., continue with the same state). + + Exceptions: `TransitionPatternNotFound`, `TransitionMethodNotFound`. + """ + if nextstate is None: + nextstate = self.__class__.__name__ + try: + pattern = self.patterns[name] + if not hasattr(pattern, 'match'): + pattern = re.compile(pattern) + except KeyError: + raise TransitionPatternNotFound( + '%s.patterns[%r]' % (self.__class__.__name__, name)) + try: + method = getattr(self, name) + except AttributeError: + raise TransitionMethodNotFound( + '%s.%s' % (self.__class__.__name__, name)) + return (pattern, method, nextstate) + + def maketransitions(self, namelist): + """ + Return a list of transition names and a transition mapping. + + Parameter `namelist`: a list, where each entry is either a + transition name string, or a 1- or 2-tuple (transition name, optional + next state name). + """ + stringtype = type('') + names = [] + transitions = {} + for namestate in namelist: + if type(namestate) is stringtype: + transitions[namestate] = self.maketransition(namestate) + names.append(namestate) + else: + transitions[namestate[0]] = self.maketransition(*namestate) + names.append(namestate[0]) + return names, transitions + + def bof(self, context): + """ + Handle beginning-of-file. Return unchanged `context`, empty result. + + Override in subclasses. + + Parameter `context`: application-defined storage. + """ + return context, [] + + def eof(self, context): + """ + Handle end-of-file. Return empty result. + + Override in subclasses. + + Parameter `context`: application-defined storage. 
+ """ + return [] + + def nop(self, match, context, nextstate): + """ + A "do nothing" transition method. + + Return unchanged `context` & `nextstate`, empty result. Useful for + simple state changes (actionless transitions). + """ + return context, nextstate, [] + + +class StateMachineWS(StateMachine): + + """ + `StateMachine` subclass specialized for whitespace recognition. + + The transitions 'blank' (for blank lines) and 'indent' (for indented text + blocks) are defined implicitly, and are checked before any other + transitions. The companion `StateWS` class defines default transition + methods. There are three methods provided for extracting indented text + blocks: + + - `getindented()`: use when the indent is unknown. + - `getknownindented()`: use when the indent is known for all lines. + - `getfirstknownindented()`: use when only the first line's indent is + known. + """ + + spaces = re.compile(' *') + """Indentation recognition pattern.""" + + def checkline(self, context, state): + """ + Examine one line of input for whitespace first, then transitions. + + Extends `StateMachine.checkline()`. + """ + if self.debug: + print >>sys.stdout, ('\nStateMachineWS.checkline: ' + 'context "%s", state "%s"' % + (context, state.__class__.__name__)) + context, nextstate, result = self.checkwhitespace(context, state) + if nextstate == '': # no whitespace match + return StateMachine.checkline(self, context, state) + else: + return context, nextstate, result + + def checkwhitespace(self, context, state): + """ + Check for a blank line or increased indent. Call the state's + transition method if a match is found. + + Parameters: + + - `context`: application-dependent storage. + - `state`: a `State` object, the current state. 
+ + Return the values returned by the transition method: + + - context, possibly modified from the parameter `context`; + - next state name (`State` subclass name), or '' (empty string) if no + match; + - the result output of the transition, a list (empty if no match). + """ + if self.debug: + print >>sys.stdout, ('\nStateMachineWS.checkwhitespace: ' + 'context "%s", state "%s"' % + (context, state.__class__.__name__)) + match = self.spaces.match(self.line) + indent = match.end() + if indent == len(self.line): + if self.debug: + print >>sys.stdout, ('\nStateMachineWS.checkwhitespace: ' + 'implicit transition "blank" matched') + return state.blank(match, context, self.currentstate) + elif indent: + if self.debug: + print >>sys.stdout, ('\nStateMachineWS.checkwhitespace: ' + 'implicit transition "indent" matched') + return state.indent(match, context, self.currentstate) + else: + return context, '', [] # neither blank line nor indented + + def getindented(self, uptoblank=0, stripindent=1): + """ + Return an indented block of text and info. + + Extract an indented block where the indent is unknown for all lines. + + :Parameters: + - `uptoblank`: Stop collecting at the first blank line if true (1). + - `stripindent`: Strip common leading indent if true (1, default). + + :Return: + - the indented block (a list of lines of text), + - its indent, + - its first line offset from BOF, and + - whether or not it finished with a blank line. + """ + offset = self.abslineoffset() + indented, indent, blankfinish = extractindented( + self.inputlines[self.lineoffset:], uptoblank, stripindent) + if indented: + self.nextline(len(indented) - 1) # advance to last indented line + while indented and not indented[0].strip(): + indented.pop(0) + offset += 1 + return indented, indent, offset, blankfinish + + def getknownindented(self, indent, uptoblank=0, stripindent=1): + """ + Return an indented block and info. + + Extract an indented block where the indent is known for all lines. 
+ Starting with the current line, extract the entire text block with at + least `indent` indentation (which must be whitespace, except for the + first line). + + :Parameters: + - `indent`: The number of indent columns/characters. + - `uptoblank`: Stop collecting at the first blank line if true (1). + - `stripindent`: Strip `indent` characters of indentation if true + (1, default). + + :Return: + - the indented block, + - its first line offset from BOF, and + - whether or not it finished with a blank line. + """ + offset = self.abslineoffset() + indented = [self.line[indent:]] + for line in self.inputlines[self.lineoffset + 1:]: + if line[:indent].strip(): + blankfinish = not indented[-1].strip() and len(indented) > 1 + break + if uptoblank and not line.strip(): # stop at the first blank line + blankfinish = 1 + break + if stripindent: + indented.append(line[indent:]) + else: + indented.append(line) + else: + blankfinish = 1 + if indented: + self.nextline(len(indented) - 1) # advance to last indented line + while indented and not indented[0].strip(): + indented.pop(0) + offset += 1 + return indented, offset, blankfinish + + def getfirstknownindented(self, indent, uptoblank=0, stripindent=1): + """ + Return an indented block and info. + + Extract an indented block where the indent is known for the first line + and unknown for all other lines. + + :Parameters: + - `indent`: The first line's indent (# of columns/characters). + - `uptoblank`: Stop collecting at the first blank line if true (1). + - `stripindent`: Strip `indent` characters of indentation if true + (1, default). + + :Return: + - the indented block, + - its indent, + - its first line offset from BOF, and + - whether or not it finished with a blank line. 
+ """ + offset = self.abslineoffset() + indented = [self.line[indent:]] + indented[1:], indent, blankfinish = extractindented( + self.inputlines[self.lineoffset + 1:], uptoblank, stripindent) + self.nextline(len(indented) - 1) # advance to last indented line + while indented and not indented[0].strip(): + indented.pop(0) + offset += 1 + return indented, indent, offset, blankfinish + + +class StateWS(State): + + """ + State superclass specialized for whitespace (blank lines & indents). + + Use this class with `StateMachineWS`. The transition method `blank()` + handles blank lines and `indent()` handles nested indented blocks. + Indented blocks trigger a new state machine to be created by `indent()` + and run. The class of the state machine to be created is in `indentSM`, + and the constructor keyword arguments are in the dictionary + `indentSMkwargs`. + + The methods `knownindent()` and `firstknownindent()` are provided for + indented blocks where the indent (all lines' and first line's only, + respectively) is known to the transition method, along with the attributes + `knownindentSM` and `knownindentSMkwargs`. Neither transition method is + triggered automatically. + """ + + indentSM = None + """ + The `StateMachine` class handling indented text blocks. + + If left as ``None``, `indentSM` defaults to the value of `State.nestedSM`. + Override it in subclasses to avoid the default. + """ + + indentSMkwargs = None + """ + Keyword arguments dictionary, passed to the `indentSM` constructor. + + If left as ``None``, `indentSMkwargs` defaults to the value of + `State.nestedSMkwargs`. Override it in subclasses to avoid the default. + """ + + knownindentSM = None + """ + The `StateMachine` class handling known-indented text blocks. + + If left as ``None``, `knownindentSM` defaults to the value of `indentSM`. + Override it in subclasses to avoid the default. + """ + + knownindentSMkwargs = None + """ + Keyword arguments dictionary, passed to the `knownindentSM` constructor. 
+ + If left as ``None``, `knownindentSMkwargs` defaults to the value of + `indentSMkwargs`. Override it in subclasses to avoid the default. + """ + + def __init__(self, statemachine, debug=0): + """ + Initialize a `StateWS` object; extends `State.__init__()`. + + Check for indent state machine attributes, set defaults if not set. + """ + State.__init__(self, statemachine, debug) + if self.indentSM is None: + self.indentSM = self.nestedSM + if self.indentSMkwargs is None: + self.indentSMkwargs = self.nestedSMkwargs + if self.knownindentSM is None: + self.knownindentSM = self.indentSM + if self.knownindentSMkwargs is None: + self.knownindentSMkwargs = self.indentSMkwargs + + def blank(self, match, context, nextstate): + """Handle blank lines. Does nothing. Override in subclasses.""" + return self.nop(match, context, nextstate) + + def indent(self, match, context, nextstate): + """ + Handle an indented text block. Extend or override in subclasses. + + Recursively run the registered state machine for indented blocks + (`self.indentSM`). + """ + indented, indent, lineoffset, blankfinish = \ + self.statemachine.getindented() + sm = self.indentSM(debug=self.debug, **self.indentSMkwargs) + results = sm.run(indented, inputoffset=lineoffset) + return context, nextstate, results + + def knownindent(self, match, context, nextstate): + """ + Handle a known-indent text block. Extend or override in subclasses. + + Recursively run the registered state machine for known-indent indented + blocks (`self.knownindentSM`). The indent is the length of the match, + ``match.end()``. + """ + indented, lineoffset, blankfinish = \ + self.statemachine.getknownindented(match.end()) + sm = self.knownindentSM(debug=self.debug, **self.knownindentSMkwargs) + results = sm.run(indented, inputoffset=lineoffset) + return context, nextstate, results + + def firstknownindent(self, match, context, nextstate): + """ + Handle an indented text block (first line's indent known). 
+ + Extend or override in subclasses. + + Recursively run the registered state machine for known-indent indented + blocks (`self.knownindentSM`). The indent is the length of the match, + ``match.end()``. + """ + # getfirstknownindented() returns four values, including the indent + indented, indent, lineoffset, blankfinish = \ + self.statemachine.getfirstknownindented(match.end()) + sm = self.knownindentSM(debug=self.debug, **self.knownindentSMkwargs) + results = sm.run(indented, inputoffset=lineoffset) + return context, nextstate, results + + +class _SearchOverride: + + """ + Mix-in class to override `StateMachine` regular expression behavior. + + Changes regular expression matching, from the default `re.match()` + (succeeds only if the pattern matches at the start of `self.line`) to + `re.search()` (succeeds if the pattern matches anywhere in `self.line`). + When subclassing a `StateMachine`, list this class **first** in the + inheritance list of the class definition. + """ + + def match(self, pattern): + """ + Return the result of a regular expression search. + + Overrides `StateMachine.match()`. + + Parameter `pattern`: `re` compiled regular expression. + """ + return pattern.search(self.line) + + +class SearchStateMachine(_SearchOverride, StateMachine): + """`StateMachine` which uses `re.search()` instead of `re.match()`.""" + pass + + +class SearchStateMachineWS(_SearchOverride, StateMachineWS): + """`StateMachineWS` which uses `re.search()` instead of `re.match()`.""" + pass + + +class UnknownStateError(Exception): pass +class DuplicateStateError(Exception): pass +class UnknownTransitionError(Exception): pass +class DuplicateTransitionError(Exception): pass +class TransitionPatternNotFound(Exception): pass +class TransitionMethodNotFound(Exception): pass +class UnexpectedIndentationError(Exception): pass + + +class TransitionCorrection(Exception): + + """ + Raise from within a transition method to switch to another transition. 
+ """ + + +_whitespace_conversion_table = string.maketrans('\v\f', ' ') + +def string2lines(astring, tabwidth=8, convertwhitespace=0): + """ + Return a list of one-line strings with tabs expanded and no newlines. + + Each tab is expanded with between 1 and `tabwidth` spaces, so that the + next character's index becomes a multiple of `tabwidth` (8 by default). + + Parameters: + + - `astring`: a multi-line string. + - `tabwidth`: the number of columns between tab stops. + - `convertwhitespace`: convert form feeds and vertical tabs to spaces? + """ + if convertwhitespace: + astring = astring.translate(_whitespace_conversion_table) + return [s.expandtabs(tabwidth) for s in astring.splitlines()] + +def extractindented(lines, uptoblank=0, stripindent=1): + """ + Extract and return a list of indented lines of text. + + Collect all lines with indentation, determine the minimum indentation, + remove the minimum indentation from all indented lines (unless + `stripindent` is false), and return them. All lines up to but not + including the first unindented line will be returned. + + :Parameters: + - `lines`: a list of one-line strings without newlines. + - `uptoblank`: Stop collecting at the first blank line if true (1). + - `stripindent`: Strip common leading indent if true (1, default). + + :Return: + - a list of indented lines with minimum indent removed; + - the amount of the indent; + - whether or not the block finished with a blank line or at the end of + `lines`. 
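The behaviour described above can be sketched as a standalone Python 3 function. This is an illustration only, not part of the commit: the names `extract_indented`, `until_blank`, `strip_indent`, and `blank_finish` are hypothetical, and the sketch assumes tabs have already been expanded to spaces (as `string2lines()` does).

```python
def extract_indented(lines, until_blank=False, strip_indent=True):
    """Hypothetical Python 3 sketch of the indented-block extraction."""
    source = []
    indent = None
    blank_finish = True  # running off the end of `lines` counts as a clean finish
    for line in lines:
        if line and line[0] != ' ':  # first unindented line ends the block
            # clean finish only if the last collected line was blank
            blank_finish = bool(source) and not source[-1].strip()
            break
        stripped = line.lstrip()
        if until_blank and not stripped:  # stop at the first blank line
            break
        source.append(line)
        if stripped:  # blank lines do not affect the minimum indent
            line_indent = len(line) - len(stripped)
            indent = line_indent if indent is None else min(indent, line_indent)
    if not indent:  # nothing indented was collected
        return [], 0, blank_finish
    if strip_indent:
        source = [s[indent:] for s in source]
    return source, indent, blank_finish


block, indent, blank_finish = extract_indented(
    ['    a', '      b', '', '    c', 'd'])
```

Here the block stops at the unindented line `'d'`, the common indent of 4 is stripped, and `blank_finish` is false because the last collected line was not blank.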
+ """ + source = [] + indent = None + for line in lines: + if line and line[0] != ' ': # line not indented + # block finished properly iff the last indented line was blank + blankfinish = len(source) and not source[-1].strip() + break + stripped = line.lstrip() + if uptoblank and not stripped: # blank line + blankfinish = 1 + break + source.append(line) + if not stripped: # blank line + continue + lineindent = len(line) - len(stripped) + if indent is None: + indent = lineindent + else: + indent = min(indent, lineindent) + else: + blankfinish = 1 # block ends at end of lines + if indent: + if stripindent: + source = [s[indent:] for s in source] + return source, indent, blankfinish + else: + return [], 0, blankfinish + +def _exceptiondata(): + """ + Return exception information: + + - the exception's class name; + - the exception object; + - the name of the file containing the offending code; + - the line number of the offending code; + - the function name of the offending code. + """ + type, value, traceback = sys.exc_info() + while traceback.tb_next: + traceback = traceback.tb_next + code = traceback.tb_frame.f_code + return (type.__name__, value, code.co_filename, traceback.tb_lineno, + code.co_name) diff --git a/docutils/transforms/__init__.py b/docutils/transforms/__init__.py new file mode 100644 index 000000000..6c2ae279f --- /dev/null +++ b/docutils/transforms/__init__.py @@ -0,0 +1,62 @@ +#! /usr/bin/env python +""" +:Authors: David Goodger, Ueli Schlaepfer +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +This package contains modules for standard tree transforms available +to Docutils components. Tree transforms serve a variety of purposes: + +- To tie up certain syntax-specific "loose ends" that remain after the + initial parsing of the input plaintext. These transforms are used to + supplement a limited syntax. 
+ +- To automate the internal linking of the document tree (hyperlink + references, footnote references, etc.). + +- To extract useful information from the document tree. These + transforms may be used to construct (for example) indexes and tables + of contents. + +Each transform is an optional step that a Docutils Reader may choose to +perform on the parsed document, depending on the input context. A Docutils +Reader may also perform Reader-specific transforms before or after performing +these standard transforms. +""" + +__docformat__ = 'reStructuredText' + + +from docutils import languages + + +class TransformError(Exception): pass + + +class Transform: + + """ + Docutils transform component abstract base class. + """ + + def __init__(self, doctree, startnode=None): + """ + Initial setup for in-place document transforms. + """ + + self.doctree = doctree + """The document tree to transform.""" + + self.startnode = startnode + """Node from which to begin the transform. For many transforms which + apply to the document as a whole, `startnode` is not set (i.e. its + value is `None`).""" + + self.language = languages.getlanguage(doctree.languagecode) + """Language module local to this document.""" + + def transform(self): + """Override to transform the document tree.""" + raise NotImplementedError('subclass must override this method') diff --git a/docutils/transforms/components.py b/docutils/transforms/components.py new file mode 100644 index 000000000..2cfe4d2a8 --- /dev/null +++ b/docutils/transforms/components.py @@ -0,0 +1,85 @@ +#! /usr/bin/env python +""" +:Authors: David Goodger, Ueli Schlaepfer +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Transforms related to document components. + +- `Contents`: Used to build a table of contents. 
+""" + +__docformat__ = 'reStructuredText' + + +import re +from docutils import nodes, utils +from docutils.transforms import TransformError, Transform + + +class Contents(Transform): + + """ + This transform generates a table of contents from the entire document tree + or from a single branch. It locates "section" elements and builds them + into a nested bullet list, which is placed within a "topic". A title is + either explicitly specified, taken from the appropriate language module, + or omitted (local table of contents). The depth may be specified. + Two-way references between the table of contents and section titles are + generated (requires Writer support). + + This transform requires a startnode, which contains generation + options and provides the location for the generated table of contents (the + startnode is replaced by the table of contents "topic"). + """ + + def transform(self): + topic = nodes.topic(CLASS='contents') + title = self.startnode.details['title'] + if self.startnode.details.has_key('local'): + startnode = self.startnode.parent + # @@@ generate an error if the startnode (directive) not at + # section/document top-level? Drag it up until it is? 
+ while not isinstance(startnode, nodes.Structural): + startnode = startnode.parent + if not title: + title = [] + else: + startnode = self.doctree + if not title: + title = nodes.title('', self.language.labels['contents']) + contents = self.build_contents(startnode) + if len(contents): + topic += title + topic += contents + self.startnode.parent.replace(self.startnode, topic) + else: + self.startnode.parent.remove(self.startnode) + + def build_contents(self, node, level=0): + level += 1 + sections = [] + i = len(node) - 1 + while i >= 0 and isinstance(node[i], nodes.section): + sections.append(node[i]) + i -= 1 + sections.reverse() + entries = [] + for section in sections: + title = section[0] + reference = nodes.reference('', '', refid=section['id'], + *title.getchildren()) + entry = nodes.paragraph('', '', reference) + item = nodes.list_item('', entry) + itemid = self.doctree.set_id(item) + title['refid'] = itemid + if (not self.startnode.details.has_key('depth')) \ + or level < self.startnode.details['depth']: + subsects = self.build_contents(section, level) + item += subsects + entries.append(item) + if entries: + entries = nodes.bullet_list('', *entries) + return entries diff --git a/docutils/transforms/frontmatter.py b/docutils/transforms/frontmatter.py new file mode 100644 index 000000000..0a8068fad --- /dev/null +++ b/docutils/transforms/frontmatter.py @@ -0,0 +1,375 @@ +#! /usr/bin/env python +""" +:Authors: David Goodger, Ueli Schlaepfer +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Transforms related to the front matter of a document (information +found before the main text): + +- `DocTitle`: Used to transform a lone top level section's title to + the document title, and promote a remaining lone top-level section's + title to the document subtitle. + +- `DocInfo`: Used to transform a bibliographic field list into docinfo + elements. 
+""" + +__docformat__ = 'reStructuredText' + +import re +from docutils import nodes, utils +from docutils.transforms import TransformError, Transform + + +class DocTitle(Transform): + + """ + In reStructuredText_, there is no way to specify a document title + and subtitle explicitly. Instead, we can supply the document title + (and possibly the subtitle as well) implicitly, and use this + two-step transform to "raise" or "promote" the title(s) (and their + corresponding section contents) to the document level. + + 1. If the document contains a single top-level section as its + first non-comment element, the top-level section's title + becomes the document's title, and the top-level section's + contents become the document's immediate contents. The lone + top-level section header must be the first non-comment element + in the document. + + For example, take this input text:: + + ================= + Top-Level Title + ================= + + A paragraph. + + Once parsed, it looks like this:: + + <document> + <section name="top-level title"> + <title> + Top-Level Title + <paragraph> + A paragraph. + + After running the DocTitle transform, we have:: + + <document name="top-level title"> + <title> + Top-Level Title + <paragraph> + A paragraph. + + 2. If step 1 successfully determines the document title, we + continue by checking for a subtitle. + + If the lone top-level section itself contains a single + second-level section as its first non-comment element, that + section's title is promoted to the document's subtitle, and + that section's contents become the document's immediate + contents. Given this input text:: + + ================= + Top-Level Title + ================= + + Second-Level Title + ~~~~~~~~~~~~~~~~~~ + + A paragraph. + + After parsing and running the Section Promotion transform, the + result is:: + + <document name="top-level title"> + <title> + Top-Level Title + <subtitle name="second-level title"> + Second-Level Title + <paragraph> + A paragraph. 
+ + (Note that the implicit hyperlink target generated by the + "Second-Level Title" is preserved on the "subtitle" element + itself.) + + Any comment elements occurring before the document title or + subtitle are accumulated and inserted as the first body elements + after the title(s). + """ + + def transform(self): + if self.promote_document_title(): + self.promote_document_subtitle() + + def promote_document_title(self): + section, index = self.candidate_index() + if index is None: + return None + doctree = self.doctree + # Transfer the section's attributes to the document element (at root): + doctree.attributes.update(section.attributes) + doctree[:] = (section[:1] # section title + + doctree[:index] # everything that was in the document + # before the section + + section[1:]) # everything that was in the section + return 1 + + def promote_document_subtitle(self): + subsection, index = self.candidate_index() + if index is None: + return None + subtitle = nodes.subtitle() + # Transfer the subsection's attributes to the new subtitle: + subtitle.attributes.update(subsection.attributes) + # Transfer the contents of the subsection's title to the subtitle: + subtitle[:] = subsection[0][:] + doctree = self.doctree + doctree[:] = (doctree[:1] # document title + + [subtitle] + + doctree[1:index] # everything that was in the document + # before the section + + subsection[1:]) # everything that was in the subsection + return 1 + + def candidate_index(self): + """ + Find and return the promotion candidate and its index. + + Return (None, None) if no valid candidate was found. 
+ """ + doctree = self.doctree + index = doctree.findnonclass(nodes.PreBibliographic) + if index is None or len(doctree) > (index + 1) or \ + not isinstance(doctree[index], nodes.section): + return None, None + else: + return doctree[index], index + + +class DocInfo(Transform): + + """ + This transform is specific to the reStructuredText_ markup syntax; + see "Bibliographic Fields" in the `reStructuredText Markup + Specification`_ for a high-level description. This transform + should be run *after* the `DocTitle` transform. + + Given a field list as the first non-comment element after the + document title and subtitle (if present), registered bibliographic + field names are transformed to the corresponding DTD elements, + becoming child elements of the "docinfo" element (except for the + abstract, which becomes a "topic" element after "docinfo"). + + For example, given this document fragment after parsing:: + + <document> + <title> + Document Title + <field_list> + <field> + <field_name> + Author + <field_body> + <paragraph> + A. Name + <field> + <field_name> + Status + <field_body> + <paragraph> + $RCSfile$ + ... + + After running the bibliographic field list transform, the + resulting document tree would look like this:: + + <document> + <title> + Document Title + <docinfo> + <author> + A. Name + <status> + frontmatter.py + ... + + The "Status" field contained an expanded RCS keyword, which is + normally (but optionally) cleaned up by the transform. The sole + contents of the field body must be a paragraph containing an + expanded RCS keyword of the form "$keyword: expansion text $". Any + RCS keyword can be processed in any bibliographic field. The + dollar signs and leading RCS keyword name are removed. Extra + processing is done for the following RCS keywords: + + - "RCSfile" expands to the name of the file in the RCS or CVS + repository, which is the name of the source file with a ",v" + suffix appended. The transform will remove the ",v" suffix. 
+ + - "Date" expands to the format "YYYY/MM/DD hh:mm:ss" (in the UTC + time zone). The RCS Keywords transform will extract just the + date itself and transform it to an ISO 8601 format date, as in + "2000-12-31". + + (Since the source file for this text is itself stored under CVS, + we can't show an example of the "Date" RCS keyword because we + can't prevent any RCS keywords used in this explanation from + being expanded. Only the "RCSfile" keyword is stable; its + expansion text changes only if the file name changes.) + """ + + def transform(self): + doctree = self.doctree + index = doctree.findnonclass(nodes.PreBibliographic) + if index is None: + return + candidate = doctree[index] + if isinstance(candidate, nodes.field_list): + biblioindex = doctree.findnonclass(nodes.Titular) + nodelist, remainder = self.extract_bibliographic(candidate) + if remainder: + doctree[index] = remainder + else: + del doctree[index] + doctree[biblioindex:biblioindex] = nodelist + return + + def extract_bibliographic(self, field_list): + docinfo = nodes.docinfo() + remainder = [] + bibliofields = self.language.bibliographic_fields + abstract = None + for field in field_list: + try: + name = field[0][0].astext() + normedname = utils.normname(name) + if not (len(field) == 2 and bibliofields.has_key(normedname) + and self.check_empty_biblio_field(field, name)): + raise TransformError + biblioclass = bibliofields[normedname] + if issubclass(biblioclass, nodes.TextElement): + if not self.check_compound_biblio_field(field, name): + raise TransformError + self.filter_rcs_keywords(field[1][0]) + docinfo.append(biblioclass('', '', *field[1][0])) + else: # multiple body elements possible + if issubclass(biblioclass, nodes.authors): + self.extract_authors(field, name, docinfo) + elif issubclass(biblioclass, nodes.topic): + if abstract: + field[-1] += self.doctree.reporter.warning( + 'There can only be one abstract.') + raise TransformError + title = nodes.title( + name, 
self.language.labels['abstract']) + abstract = nodes.topic('', title, CLASS='abstract', + *field[1].children) + else: + docinfo.append(biblioclass('', *field[1].children)) + except TransformError: + remainder.append(field) + continue + nodelist = [] + if len(docinfo) != 0: + nodelist.append(docinfo) + if abstract: + nodelist.append(abstract) + if remainder: + field_list[:] = remainder + else: + field_list = None + return nodelist, field_list + + def check_empty_biblio_field(self, field, name): + if len(field[1]) < 1: + field[-1] += self.doctree.reporter.warning( + 'Cannot extract empty bibliographic field "%s".' % name) + return None + return 1 + + def check_compound_biblio_field(self, field, name): + if len(field[1]) > 1: + field[-1] += self.doctree.reporter.warning( + 'Cannot extract compound bibliographic field "%s".' % name) + return None + if not isinstance(field[1][0], nodes.paragraph): + field[-1] += self.doctree.reporter.warning( + 'Cannot extract bibliographic field "%s" containing anything ' + 'other than a single paragraph.' 
+ % name) + return None + return 1 + + rcs_keyword_substitutions = [ + (re.compile(r'\$' r'Date: (\d\d\d\d)/(\d\d)/(\d\d) [\d:]+ \$$', + re.IGNORECASE), r'\1-\2-\3'), + (re.compile(r'\$' r'RCSfile: (.+),v \$$', + re.IGNORECASE), r'\1'), + (re.compile(r'\$[a-zA-Z]+: (.+) \$$'), r'\1'),] + + def filter_rcs_keywords(self, paragraph): + if len(paragraph) == 1 and isinstance(paragraph[0], nodes.Text): + textnode = paragraph[0] + for pattern, substitution in self.rcs_keyword_substitutions: + match = pattern.match(textnode.data) + if match: + textnode.data = pattern.sub(substitution, textnode.data) + return + + def extract_authors(self, field, name, docinfo): + try: + if len(field[1]) == 1: + if isinstance(field[1][0], nodes.paragraph): + authors = self.authors_from_one_paragraph(field) + elif isinstance(field[1][0], nodes.bullet_list): + authors = self.authors_from_bullet_list(field) + else: + raise TransformError + else: + authors = self.authors_from_paragraphs(field) + authornodes = [nodes.author('', '', *author) + for author in authors if author] + docinfo.append(nodes.authors('', *authornodes)) + except TransformError: + field[-1] += self.doctree.reporter.warning( + 'Bibliographic field "%s" incompatible with extraction: ' + 'it must contain either a single paragraph (with authors ' + 'separated by one of "%s"), multiple paragraphs (one per ' + 'author), or a bullet list with one paragraph (one author) ' + 'per item.' 
+ % (name, ''.join(self.language.author_separators))) + raise + + def authors_from_one_paragraph(self, field): + text = field[1][0].astext().strip() + if not text: + raise TransformError + for authorsep in self.language.author_separators: + authornames = text.split(authorsep) + if len(authornames) > 1: + break + authornames = [author.strip() for author in authornames] + authors = [[nodes.Text(author)] for author in authornames] + return authors + + def authors_from_bullet_list(self, field): + authors = [] + for item in field[1][0]: + if len(item) != 1 or not isinstance(item[0], nodes.paragraph): + raise TransformError + authors.append(item[0].children) + if not authors: + raise TransformError + return authors + + def authors_from_paragraphs(self, field): + for item in field[1]: + if not isinstance(item, nodes.paragraph): + raise TransformError + authors = [item.children for item in field[1]] + return authors diff --git a/docutils/transforms/references.py b/docutils/transforms/references.py new file mode 100644 index 000000000..c2ff9189b --- /dev/null +++ b/docutils/transforms/references.py @@ -0,0 +1,670 @@ +#! /usr/bin/env python +""" +:Authors: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Transforms for resolving references: + +- `Hyperlinks`: Used to resolve hyperlink targets and references. +- `Footnotes`: Resolve footnote numbering and references. +- `Substitutions`: Resolve substitutions. 
+""" + +__docformat__ = 'reStructuredText' + +import re +from docutils import nodes, utils +from docutils.transforms import TransformError, Transform + + +class Hyperlinks(Transform): + + """Resolve the various types of hyperlink targets and references.""" + + def transform(self): + stages = [] + #stages.append('Beginning of references.Hyperlinks.transform()\n' + self.doctree.pformat()) + self.resolve_chained_targets() + #stages.append('After references.Hyperlinks.resolve_chained_targets()\n' + self.doctree.pformat()) + self.resolve_anonymous() + #stages.append('After references.Hyperlinks.resolve_anonymous()\n' + self.doctree.pformat()) + self.resolve_indirect() + #stages.append('After references.Hyperlinks.resolve_indirect()\n' + self.doctree.pformat()) + self.resolve_external_targets() + #stages.append('After references.Hyperlinks.resolve_external_references()\n' + self.doctree.pformat()) + self.resolve_internal_targets() + #stages.append('After references.Hyperlinks.resolve_internal_references()\n' + self.doctree.pformat()) + #import difflib + #compare = difflib.Differ().compare + #for i in range(len(stages) - 1): + # print ''.join(compare(stages[i].splitlines(1), stages[i+1].splitlines(1))) + + def resolve_chained_targets(self): + """ + Attributes "refuri" and "refname" are migrated from the final direct + target up the chain of contiguous adjacent internal targets, using + `ChainedTargetResolver`. + """ + visitor = ChainedTargetResolver(self.doctree) + self.doctree.walk(visitor) + + def resolve_anonymous(self): + """ + Link anonymous references to targets. 
Given:: + + <paragraph> + <reference anonymous="1"> + internal + <reference anonymous="1"> + external + <target anonymous="1" id="id1"> + <target anonymous="1" id="id2" refuri="http://external"> + + Corresponding references are linked via "refid" or resolved via + "refuri":: + + <paragraph> + <reference anonymous="1" refid="id1"> + internal + <reference anonymous="1" refuri="http://external"> + external + <target anonymous="1" id="id1"> + <target anonymous="1" id="id2" refuri="http://external"> + """ + if len(self.doctree.anonymous_refs) \ + != len(self.doctree.anonymous_targets): + msg = self.doctree.reporter.error( + 'Anonymous hyperlink mismatch: %s references but %s targets.' + % (len(self.doctree.anonymous_refs), + len(self.doctree.anonymous_targets))) + self.doctree.messages += msg + msgid = self.doctree.set_id(msg) + for ref in self.doctree.anonymous_refs: + prb = nodes.problematic( + ref.rawsource, ref.rawsource, refid=msgid) + prbid = self.doctree.set_id(prb) + msg.add_backref(prbid) + ref.parent.replace(ref, prb) + return + for i in range(len(self.doctree.anonymous_refs)): + ref = self.doctree.anonymous_refs[i] + target = self.doctree.anonymous_targets[i] + if target.hasattr('refuri'): + ref['refuri'] = target['refuri'] + ref.resolved = 1 + else: + ref['refid'] = target['id'] + self.doctree.note_refid(ref) + target.referenced = 1 + + def resolve_indirect(self): + """ + a) Indirect external references:: + + <paragraph> + <reference refname="indirect external"> + indirect external + <target id="id1" name="direct external" + refuri="http://indirect"> + <target id="id2" name="indirect external" + refname="direct external"> + + The "refuri" attribute is migrated back to all indirect targets from + the final direct target (i.e.
a target not referring to another + indirect target):: + + <paragraph> + <reference refname="indirect external"> + indirect external + <target id="id1" name="direct external" + refuri="http://indirect"> + <target id="id2" name="indirect external" + refuri="http://indirect"> + + Once the attribute is migrated, the preexisting "refname" attribute + is dropped. + + b) Indirect internal references:: + + <target id="id1" name="final target"> + <paragraph> + <reference refname="indirect internal"> + indirect internal + <target id="id2" name="indirect internal 2" + refname="final target"> + <target id="id3" name="indirect internal" + refname="indirect internal 2"> + + Targets which indirectly refer to an internal target become one-hop + indirect (their "refid" attributes are directly set to the internal + target's "id"). References which indirectly refer to an internal + target become direct internal references:: + + <target id="id1" name="final target"> + <paragraph> + <reference refid="id1"> + indirect internal + <target id="id2" name="indirect internal 2" refid="id1"> + <target id="id3" name="indirect internal" refid="id1"> + """ + #import mypdb as pdb + #pdb.set_trace() + for target in self.doctree.indirect_targets: + if not target.resolved: + self.resolve_indirect_target(target) + self.resolve_indirect_references(target) + + def resolve_indirect_target(self, target): + refname = target['refname'] + reftarget = None + if self.doctree.explicit_targets.has_key(refname): + reftarget = self.doctree.explicit_targets[refname] + elif self.doctree.implicit_targets.has_key(refname): + reftarget = self.doctree.implicit_targets[refname] + if not reftarget: + self.nonexistent_indirect_target(target) + return + if isinstance(reftarget, nodes.target) \ + and not reftarget.resolved and reftarget.hasattr('refname'): + self.resolve_indirect_target(reftarget) # multiply indirect + if reftarget.hasattr('refuri'): + target['refuri'] = reftarget['refuri'] + if target.hasattr('name'): +
self.doctree.note_external_target(target) + elif reftarget.hasattr('refid'): + target['refid'] = reftarget['refid'] + self.doctree.note_refid(target) + else: + try: + target['refid'] = reftarget['id'] + self.doctree.note_refid(target) + except KeyError: + self.nonexistent_indirect_target(target) + return + del target['refname'] + target.resolved = 1 + reftarget.referenced = 1 + + def nonexistent_indirect_target(self, target): + naming = '' + if target.hasattr('name'): + naming = '"%s" ' % target['name'] + reflist = self.doctree.refnames[target['name']] + else: + reflist = self.doctree.refnames[target['id']] + naming += '(id="%s")' % target['id'] + msg = self.doctree.reporter.warning( + 'Indirect hyperlink target %s refers to target "%s", ' + 'which does not exist.' % (naming, target['refname'])) + self.doctree.messages += msg + msgid = self.doctree.set_id(msg) + for ref in reflist: + prb = nodes.problematic( + ref.rawsource, ref.rawsource, refid=msgid) + prbid = self.doctree.set_id(prb) + msg.add_backref(prbid) + ref.parent.replace(ref, prb) + target.resolved = 1 + + def resolve_indirect_references(self, target): + if target.hasattr('refid'): + attname = 'refid' + call_if_named = 0 + call_method = self.doctree.note_refid + elif target.hasattr('refuri'): + attname = 'refuri' + call_if_named = 1 + call_method = self.doctree.note_external_target + else: + return + attval = target[attname] + if target.hasattr('name'): + name = target['name'] + try: + reflist = self.doctree.refnames[name] + except KeyError, instance: + if target.referenced: + return + msg = self.doctree.reporter.info( + 'Indirect hyperlink target "%s" is not referenced.' + % name) + self.doctree.messages += msg + target.referenced = 1 + return + delatt = 'refname' + else: + id = target['id'] + try: + reflist = self.doctree.refids[id] + except KeyError, instance: + if target.referenced: + return + msg = self.doctree.reporter.info( + 'Indirect hyperlink target id="%s" is not referenced.' 
+ % id) + self.doctree.messages += msg + target.referenced = 1 + return + delatt = 'refid' + for ref in reflist: + if ref.resolved: + continue + del ref[delatt] + ref[attname] = attval + if not call_if_named or ref.hasattr('name'): + call_method(ref) + ref.resolved = 1 + if isinstance(ref, nodes.target): + self.resolve_indirect_references(ref) + target.referenced = 1 + + def resolve_external_targets(self): + """ + Given:: + + <paragraph> + <reference refname="direct external"> + direct external + <target id="id1" name="direct external" refuri="http://direct"> + + The "refname" attribute is replaced by the direct "refuri" attribute:: + + <paragraph> + <reference refuri="http://direct"> + direct external + <target id="id1" name="direct external" refuri="http://direct"> + """ + for target in self.doctree.external_targets: + if target.hasattr('refuri') and target.hasattr('name'): + name = target['name'] + refuri = target['refuri'] + try: + reflist = self.doctree.refnames[name] + except KeyError, instance: + if target.referenced: + continue + msg = self.doctree.reporter.info( + 'External hyperlink target "%s" is not referenced.' 
+ % name) + self.doctree.messages += msg + target.referenced = 1 + continue + for ref in reflist: + if ref.resolved: + continue + del ref['refname'] + ref['refuri'] = refuri + ref.resolved = 1 + target.referenced = 1 + + def resolve_internal_targets(self): + """ + Given:: + + <paragraph> + <reference refname="direct internal"> + direct internal + <target id="id1" name="direct internal"> + + The "refname" attribute is replaced by "refid" linking to the target's + "id":: + + <paragraph> + <reference refid="id1"> + direct internal + <target id="id1" name="direct internal"> + """ + for target in self.doctree.internal_targets: + if target.hasattr('refuri') or target.hasattr('refid') \ + or not target.hasattr('name'): + continue + name = target['name'] + refid = target['id'] + try: + reflist = self.doctree.refnames[name] + except KeyError, instance: + if target.referenced: + continue + msg = self.doctree.reporter.info( + 'Internal hyperlink target "%s" is not referenced.' + % name) + self.doctree.messages += msg + target.referenced = 1 + continue + for ref in reflist: + if ref.resolved: + continue + del ref['refname'] + ref['refid'] = refid + ref.resolved = 1 + target.referenced = 1 + + +class ChainedTargetResolver(nodes.NodeVisitor): + + """ + Copy reference attributes up the length of a hyperlink target chain. + + "Chained targets" are multiple adjacent internal hyperlink targets which + "point to" an external or indirect target. After the transform, all + chained targets will effectively point to the same place. + + Given the following ``doctree`` as input:: + + <document> + <target id="a" name="a"> + <target id="b" name="b"> + <target id="c" name="c" refuri="http://chained.external.targets"> + <target id="d" name="d"> + <paragraph> + I'm known as "d". 
+ <target id="e" name="e"> + <target id="id1"> + <target id="f" name="f" refname="d"> + + ``ChainedTargetResolver(doctree).walk()`` will transform the above into:: + + <document> + <target id="a" name="a" refuri="http://chained.external.targets"> + <target id="b" name="b" refuri="http://chained.external.targets"> + <target id="c" name="c" refuri="http://chained.external.targets"> + <target id="d" name="d"> + <paragraph> + I'm known as "d". + <target id="e" name="e" refname="d"> + <target id="id1" refname="d"> + <target id="f" name="f" refname="d"> + """ + + def unknown_visit(self, node): + pass + + def visit_target(self, node): + if node.hasattr('refuri'): + attname = 'refuri' + call_if_named = self.doctree.note_external_target + elif node.hasattr('refname'): + attname = 'refname' + call_if_named = self.doctree.note_indirect_target + elif node.hasattr('refid'): + attname = 'refid' + call_if_named = None + else: + return + attval = node[attname] + index = node.parent.index(node) + for i in range(index - 1, -1, -1): + sibling = node.parent[i] + if not isinstance(sibling, nodes.target) \ + or sibling.hasattr('refuri') \ + or sibling.hasattr('refname') \ + or sibling.hasattr('refid'): + break + sibling[attname] = attval + if sibling.hasattr('name') and call_if_named: + call_if_named(sibling) + + +class Footnotes(Transform): + + """ + Assign numbers to autonumbered footnotes, and resolve links to footnotes, + citations, and their references. + + Given the following ``doctree`` as input:: + + <document> + <paragraph> + A labeled autonumbered footnote reference: + <footnote_reference auto="1" id="id1" refname="footnote"> + <paragraph> + An unlabeled autonumbered footnote reference: + <footnote_reference auto="1" id="id2"> + <footnote auto="1" id="id3"> + <paragraph> + Unlabeled autonumbered footnote. + <footnote auto="1" id="footnote" name="footnote"> + <paragraph> + Labeled autonumbered footnote. + + Auto-numbered footnotes have attribute ``auto="1"`` and no label.
+ Auto-numbered footnote_references have no reference text (they're + empty elements). When resolving the numbering, a ``label`` element + is added to the beginning of the ``footnote``, and reference text + to the ``footnote_reference``. + + The transformed result will be:: + + <document> + <paragraph> + A labeled autonumbered footnote reference: + <footnote_reference auto="1" id="id1" refid="footnote"> + 2 + <paragraph> + An unlabeled autonumbered footnote reference: + <footnote_reference auto="1" id="id2" refid="id3"> + 1 + <footnote auto="1" id="id3" backrefs="id2"> + <label> + 1 + <paragraph> + Unlabeled autonumbered footnote. + <footnote auto="1" id="footnote" name="footnote" backrefs="id1"> + <label> + 2 + <paragraph> + Labeled autonumbered footnote. + + Note that the footnotes are not in the same order as the references. + + The labels and reference text are added to the auto-numbered ``footnote`` + and ``footnote_reference`` elements. Footnote elements are backlinked to + their references via "backrefs" attributes. References are assigned "id" + and "refid" attributes. + + After adding labels and reference text, the "auto" attributes can be + ignored. + """ + + autofootnote_labels = None + """Keep track of unlabeled autonumbered footnotes.""" + + symbols = [ + # Entries 1-4 and 6 below are from section 12.51 of + # The Chicago Manual of Style, 14th edition. + '*', # asterisk/star + u'\u2020', # dagger † + u'\u2021', # double dagger ‡ + u'\u00A7', # section mark § + u'\u00B6', # paragraph mark (pilcrow) ¶ + # (parallels ['||'] in CMoS) + '#', # number sign + # The entries below were chosen arbitrarily.
+ u'\u2660', # spade suit ♠ + u'\u2665', # heart suit ♥ + u'\u2666', # diamond suit ♦ + u'\u2663', # club suit ♣ + ] + + def transform(self): + self.autofootnote_labels = [] + startnum = self.doctree.autofootnote_start + self.doctree.autofootnote_start = self.number_footnotes(startnum) + self.number_footnote_references(startnum) + self.symbolize_footnotes() + self.resolve_footnotes_and_citations() + + def number_footnotes(self, startnum): + """ + Assign numbers to autonumbered footnotes. + + For labeled autonumbered footnotes, copy the number over to + corresponding footnote references. + """ + for footnote in self.doctree.autofootnotes: + while 1: + label = str(startnum) + startnum += 1 + if not self.doctree.explicit_targets.has_key(label): + break + footnote.insert(0, nodes.label('', label)) + if footnote.hasattr('dupname'): + continue + if footnote.hasattr('name'): + name = footnote['name'] + for ref in self.doctree.footnote_refs.get(name, []): + ref += nodes.Text(label) + ref.delattr('refname') + ref['refid'] = footnote['id'] + footnote.add_backref(ref['id']) + self.doctree.note_refid(ref) + ref.resolved = 1 + else: + footnote['name'] = label + self.doctree.note_explicit_target(footnote, footnote) + self.autofootnote_labels.append(label) + return startnum + + def number_footnote_references(self, startnum): + """Assign numbers to autonumbered footnote references.""" + i = 0 + for ref in self.doctree.autofootnote_refs: + if ref.resolved or ref.hasattr('refid'): + continue + try: + label = self.autofootnote_labels[i] + except IndexError: + msg = self.doctree.reporter.error( + 'Too many autonumbered footnote references: only %s ' + 'corresponding footnotes available.' 
+ % len(self.autofootnote_labels)) + msgid = self.doctree.set_id(msg) + self.doctree.messages += msg + for ref in self.doctree.autofootnote_refs[i:]: + if ref.resolved or ref.hasattr('refname'): + continue + prb = nodes.problematic( + ref.rawsource, ref.rawsource, refid=msgid) + prbid = self.doctree.set_id(prb) + msg.add_backref(prbid) + ref.parent.replace(ref, prb) + break + ref += nodes.Text(label) + footnote = self.doctree.explicit_targets[label] + ref['refid'] = footnote['id'] + self.doctree.note_refid(ref) + footnote.add_backref(ref['id']) + ref.resolved = 1 + i += 1 + + def symbolize_footnotes(self): + """Add symbol indexes to "[*]"-style footnotes and references.""" + labels = [] + for footnote in self.doctree.symbol_footnotes: + reps, index = divmod(self.doctree.symbol_footnote_start, + len(self.symbols)) + labeltext = self.symbols[index] * (reps + 1) + labels.append(labeltext) + footnote.insert(0, nodes.label('', labeltext)) + self.doctree.symbol_footnote_start += 1 + self.doctree.set_id(footnote) + i = 0 + for ref in self.doctree.symbol_footnote_refs: + try: + ref += nodes.Text(labels[i]) + except IndexError: + msg = self.doctree.reporter.error( + 'Too many symbol footnote references: only %s ' + 'corresponding footnotes available.' % len(labels)) + msgid = self.doctree.set_id(msg) + self.doctree.messages += msg + for ref in self.doctree.symbol_footnote_refs[i:]: + if ref.resolved or ref.hasattr('refid'): + continue + prb = nodes.problematic( + ref.rawsource, ref.rawsource, refid=msgid) + prbid = self.doctree.set_id(prb) + msg.add_backref(prbid) + ref.parent.replace(ref, prb) + break + footnote = self.doctree.symbol_footnotes[i] + ref['refid'] = footnote['id'] + self.doctree.note_refid(ref) + footnote.add_backref(ref['id']) + i += 1 + + def resolve_footnotes_and_citations(self): + """ + Link manually-labeled footnotes and citations to/from their references.
+ """ + for footnote in self.doctree.footnotes: + label = footnote['name'] + if self.doctree.footnote_refs.has_key(label): + reflist = self.doctree.footnote_refs[label] + self.resolve_references(footnote, reflist) + for citation in self.doctree.citations: + label = citation['name'] + if self.doctree.citation_refs.has_key(label): + reflist = self.doctree.citation_refs[label] + self.resolve_references(citation, reflist) + + def resolve_references(self, note, reflist): + id = note['id'] + for ref in reflist: + if ref.resolved: + continue + ref.delattr('refname') + ref['refid'] = id + note.add_backref(ref['id']) + ref.resolved = 1 + note.resolved = 1 + + +class Substitutions(Transform): + + """ + Given the following ``doctree`` as input:: + + <document> + <paragraph> + The + <substitution_reference refname="biohazard"> + biohazard + symbol is deservedly scary-looking. + <substitution_definition name="biohazard"> + <image alt="biohazard" uri="biohazard.png"> + + The ``substitution_reference`` will simply be replaced by the + contents of the corresponding ``substitution_definition``. + + The transformed result will be:: + + <document> + <paragraph> + The + <image alt="biohazard" uri="biohazard.png"> + symbol is deservedly scary-looking. + <substitution_definition name="biohazard"> + <image alt="biohazard" uri="biohazard.png"> + """ + + def transform(self): + defs = self.doctree.substitution_defs + for refname, refs in self.doctree.substitution_refs.items(): + for ref in refs: + if defs.has_key(refname): + ref.parent.replace(ref, defs[refname].getchildren()) + else: + msg = self.doctree.reporter.error( + 'Undefined substitution referenced: "%s".' 
% refname) + msgid = self.doctree.set_id(msg) + self.doctree.messages += msg + prb = nodes.problematic( + ref.rawsource, ref.rawsource, refid=msgid) + prbid = self.doctree.set_id(prb) + msg.add_backref(prbid) + ref.parent.replace(ref, prb) + self.doctree.substitution_refs = None # release replaced references diff --git a/docutils/transforms/universal.py b/docutils/transforms/universal.py new file mode 100644 index 000000000..1cebcc9db --- /dev/null +++ b/docutils/transforms/universal.py @@ -0,0 +1,149 @@ +#! /usr/bin/env python +""" +:Authors: David Goodger, Ueli Schlaepfer +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Transforms needed by most or all documents: + +- `Messages`: Placement of system messages stored in + `nodes.document.messages`. +- `TestMessages`: Like `Messages`, used on test runs. +- `FinalReferences`: Resolve remaining references. +- `Pending`: Execute pending transforms (abstract base class; + `FirstReaderPending`, `LastReaderPending`, `FirstWriterPending`, and + `LastWriterPending` are its concrete subclasses). +""" + +__docformat__ = 'reStructuredText' + +import re +from docutils import nodes, utils +from docutils.transforms import TransformError, Transform + + +class Messages(Transform): + + """ + Place any system messages generated after parsing into a dedicated section + of the document. + """ + + def transform(self): + # @@@ filter out msgs below threshold? + if len(self.doctree.messages) > 0: + section = nodes.section(CLASS='system-messages') + # @@@ get this from the language module? + section += nodes.title('', 'Docutils System Messages') + section += self.doctree.messages.getchildren() + self.doctree.messages[:] = [] + self.doctree += section + + +class TestMessages(Transform): + + """ + Append all system messages to the end of the doctree. 
+ """ + + def transform(self): + self.doctree += self.doctree.messages.getchildren() + + +class FinalChecks(Transform): + + """ + Perform last-minute checks. + + - Check for dangling references (incl. footnote & citation). + """ + + def transform(self): + visitor = FinalCheckVisitor(self.doctree) + self.doctree.walk(visitor) + + +class FinalCheckVisitor(nodes.NodeVisitor): + + def unknown_visit(self, node): + pass + + def visit_reference(self, node): + if node.resolved or not node.hasattr('refname'): + return + refname = node['refname'] + try: + id = self.doctree.nameids[refname] + except KeyError: + msg = self.doctree.reporter.error( + 'Unknown target name: "%s".' % (node['refname'])) + self.doctree.messages += msg + msgid = self.doctree.set_id(msg) + prb = nodes.problematic( + node.rawsource, node.rawsource, refid=msgid) + prbid = self.doctree.set_id(prb) + msg.add_backref(prbid) + node.parent.replace(node, prb) + return + del node['refname'] + node['refid'] = id + self.doctree.ids[id].referenced = 1 + node.resolved = 1 + + visit_footnote_reference = visit_citation_reference = visit_reference + + +class Pending(Transform): + + """ + Execute pending transforms. + """ + + stage = None + """The stage of processing applicable to this transform; match with + `nodes.pending.stage`. Possible values include 'first_reader', + 'last_reader', 'first_writer', and 'last_writer'. 
Override in + subclasses.""" + + def transform(self): + for pending in self.doctree.pending: + if pending.stage == self.stage: + pending.transform(self.doctree, pending).transform() + + +class FirstReaderPending(Pending): + + stage = 'first_reader' + + +class LastReaderPending(Pending): + + stage = 'last_reader' + + +class FirstWriterPending(Pending): + + stage = 'first_writer' + + +class LastWriterPending(Pending): + + stage = 'last_writer' + + +test_transforms = (TestMessages,) +"""Universal transforms to apply to the raw doctree when testing.""" + +first_reader_transforms = (FirstReaderPending,) +"""Universal transforms to apply before any other Reader transforms.""" + +last_reader_transforms = (LastReaderPending,) +"""Universal transforms to apply after all other Reader transforms.""" + +first_writer_transforms = (FirstWriterPending,) +"""Universal transforms to apply before any other Writer transforms.""" + +last_writer_transforms = (LastWriterPending, FinalChecks, Messages) +"""Universal transforms to apply after all other Writer transforms.""" diff --git a/docutils/urischemes.py b/docutils/urischemes.py new file mode 100644 index 000000000..367217ad4 --- /dev/null +++ b/docutils/urischemes.py @@ -0,0 +1,94 @@ +""" +`schemes` is a dictionary with lowercase URI addressing schemes as +keys and descriptions as values. It was compiled from the index at +http://www.w3.org/Addressing/schemes.html, revised 2001-08-20. Many +values are blank and should be filled in with useful descriptions. 
+""" + +schemes = { + 'about': 'provides information on Navigator', + 'acap': 'application configuration access protocol', + 'addbook': "To add vCard entries to Communicator's Address Book", + 'afp': 'Apple Filing Protocol', + 'afs': 'Andrew File System global file names', + 'aim': 'AOL Instant Messenger', + 'callto': 'for NetMeeting links', + 'castanet': 'Castanet Tuner URLs for Netcaster', + 'chttp': 'cached HTTP supported by RealPlayer', + 'cid': 'content identifier', + 'data': 'allows inclusion of small data items as "immediate" data; RFC-2397', + 'dav': 'Distributed Authoring and Versioning Protocol; RFC 2518', + 'dns': 'Domain Name System resources', + 'eid': ('External ID; non-URL data; general escape mechanism to allow ' + 'access to information for applications that are too ' + 'specialized to justify their own schemes'), + 'fax': '', + 'file': 'Host-specific file names', + 'finger': '', + 'freenet': '', + 'ftp': 'File Transfer Protocol', + 'gopher': 'The Gopher Protocol', + 'gsm-sms': '', + 'h323': '', + 'h324': '', + 'hdl': '', + 'hnews': '', + 'http': 'Hypertext Transfer Protocol', + 'https': '', + 'iioploc': '', + 'ilu': '', + 'imap': 'internet message access protocol', + 'ior': '', + 'ipp': '', + 'irc': 'Internet Relay Chat', + 'jar': '', + 'javascript': 'JavaScript code; evaluates the expression after the colon', + 'jdbc': '', + 'ldap': 'Lightweight Directory Access Protocol', + 'lifn': '', + 'livescript': '', + 'lrq': '', + 'mailbox': 'Mail folder access', + 'mailserver': 'Access to data available from mail servers', + 'mailto': 'Electronic mail address', + 'md5': '', + 'mid': 'message identifier', + 'mocha': '', + 'modem': '', + 'news': 'USENET news', + 'nfs': 'network file system protocol', + 'nntp': 'USENET news using NNTP access', + 'opaquelocktoken': '', + 'phone': '', + 'pop': 'Post Office Protocol', + 'pop3': 'Post Office Protocol v3', + 'printer': '', + 'prospero': 'Prospero Directory Service', + 'res': '', + 'rtsp': 'real time streaming 
protocol', + 'rvp': '', + 'rwhois': '', + 'rx': 'Remote Execution', + 'sdp': '', + 'service': 'service location', + 'sip': 'session initiation protocol', + 'smb': '', + 'snews': 'For NNTP postings via SSL', + 't120': '', + 'tcp': '', + 'tel': 'telephone', + 'telephone': 'telephone', + 'telnet': 'Reference to interactive sessions', + 'tip': 'Transaction Internet Protocol', + 'tn3270': 'Interactive 3270 emulation sessions', + 'tv': '', + 'urn': 'Uniform Resource Name', + 'uuid': '', + 'vemmi': 'versatile multimedia interface', + 'videotex': '', + 'view-source': 'displays HTML code that was generated with JavaScript', + 'wais': 'Wide Area Information Servers', + 'whodp': '', + 'whois++': 'Distributed directory service.', + 'z39.50r': 'Z39.50 Retrieval', + 'z39.50s': 'Z39.50 Session',} diff --git a/docutils/utils.py b/docutils/utils.py new file mode 100644 index 000000000..a92c8fb97 --- /dev/null +++ b/docutils/utils.py @@ -0,0 +1,373 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Miscellaneous utilities for the documentation utilities. +""" + +import sys, re +import nodes + + +class SystemMessage(Exception): + + def __init__(self, system_message): + Exception.__init__(self, system_message.astext()) + + +class Reporter: + + """ + Info/warning/error reporter and ``system_message`` element generator. + + Five levels of system messages are defined, along with corresponding + methods: `debug()`, `info()`, `warning()`, `error()`, and `severe()`. + + There is typically one Reporter object per process. A Reporter object is + instantiated with thresholds for generating warnings and errors (raising + exceptions), a switch to turn debug output on or off, and an I/O stream + for warnings. These are stored in the default reporting category, '' + (zero-length string). 
+ + Multiple reporting categories [#]_ may be set, each with its own warning + and error thresholds, debugging switch, and warning stream (collectively a + `ConditionSet`). Categories are hierarchically-named strings that look + like attribute references: 'spam', 'spam.eggs', 'neeeow.wum.ping'. The + 'spam' category is the ancestor of 'spam.bacon.eggs'. Unset categories + inherit stored conditions from their closest ancestor category that has + been set. + + When a system message is generated, the stored conditions from its + category (or ancestor if unset) are retrieved. The system message level is + compared to the thresholds stored in the category, and a warning or error + is generated as appropriate. Debug messages are produced iff the stored + debug switch is on. Message output is sent to the stored warning stream. + + The default category is '' (empty string). By convention, Writers should + retrieve reporting conditions from the 'writer' category (which, unless + explicitly set, defaults to the conditions of the default category). + + .. [#] The concept of "categories" was inspired by the log4j project: + http://jakarta.apache.org/log4j/. + """ + + levels = 'DEBUG INFO WARNING ERROR SEVERE'.split() + """List of names for system message levels, indexed by level.""" + + def __init__(self, warninglevel, errorlevel, stream=None, debug=0): + """ + Initialize the `ConditionSet` for the `Reporter`'s default category. + + :Parameters: + + - `warninglevel`: The level at or above which warning output will + be sent to `stream`. + - `errorlevel`: The level at or above which `SystemMessage` + exceptions will be raised. + - `debug`: Show debug (level=0) system messages? + - `stream`: Where warning output is sent (`None` implies + `sys.stderr`). + """ + + if stream is None: + stream = sys.stderr + + self.categories = {'': ConditionSet(debug, warninglevel, errorlevel, + stream)} + """Mapping of category names to conditions.
Default category is ''.""" + + def setconditions(self, category, warninglevel, errorlevel, + stream=None, debug=0): + if stream is None: + stream = sys.stderr + self.categories[category] = ConditionSet(debug, warninglevel, + errorlevel, stream) + + def unsetconditions(self, category): + if category and self.categories.has_key(category): + del self.categories[category] + + __delitem__ = unsetconditions + + def getconditions(self, category): + while not self.categories.has_key(category): + category = category[:category.rfind('.') + 1][:-1] + return self.categories[category] + + __getitem__ = getconditions + + def system_message(self, level, comment=None, category='', + *children, **attributes): + """ + Return a system_message object. + + Raise an exception or generate a warning if appropriate. + """ + msg = nodes.system_message(comment, level=level, + type=self.levels[level], + *children, **attributes) + debug, warninglevel, errorlevel, stream = self[category].astuple() + if level >= warninglevel or debug and level == 0: + if category: + print >>stream, 'Reporter "%s":' % category, msg.astext() + else: + print >>stream, 'Reporter:', msg.astext() + if level >= errorlevel: + raise SystemMessage(msg) + return msg + + def debug(self, comment=None, category='', *children, **attributes): + """ + Level-0, "DEBUG": an internal reporting issue. Typically, there is no + effect on the processing. Level-0 system messages are handled + separately from the others. + """ + return self.system_message( + 0, comment, category, *children, **attributes) + + def info(self, comment=None, category='', *children, **attributes): + """ + Level-1, "INFO": a minor issue that can be ignored. Typically there is + no effect on processing, and level-1 system messages are not reported. 
+ """ + return self.system_message( + 1, comment, category, *children, **attributes) + + def warning(self, comment=None, category='', *children, **attributes): + """ + Level-2, "WARNING": an issue that should be addressed. If ignored, + there may be unpredictable problems with the output. + """ + return self.system_message( + 2, comment, category, *children, **attributes) + + def error(self, comment=None, category='', *children, **attributes): + """ + Level-3, "ERROR": an error that should be addressed. If ignored, the + output will contain errors. + """ + return self.system_message( + 3, comment, category, *children, **attributes) + + def severe(self, comment=None, category='', *children, **attributes): + """ + Level-4, "SEVERE": a severe error that must be addressed. If ignored, + the output will contain severe errors. Typically level-4 system + messages are turned into exceptions which halt processing. + """ + return self.system_message( + 4, comment, category, *children, **attributes) + + +class ConditionSet: + + """ + A set of thresholds, switches, and streams corresponding to one `Reporter` + category. + """ + + def __init__(self, debug, warninglevel, errorlevel, stream): + self.debug = debug + self.warninglevel = warninglevel + self.errorlevel = errorlevel + self.stream = stream + + def astuple(self): + return (self.debug, self.warninglevel, self.errorlevel, + self.stream) + + +class ExtensionAttributeError(Exception): pass +class BadAttributeError(ExtensionAttributeError): pass +class BadAttributeDataError(ExtensionAttributeError): pass +class DuplicateAttributeError(ExtensionAttributeError): pass + + +def extract_extension_attributes(field_list, attribute_spec): + """ + Return a dictionary mapping extension attribute names to converted values. + + :Parameters: + - `field_list`: A flat field list without field arguments, where each + field body consists of a single paragraph only. 
+ - `attribute_spec`: Dictionary mapping known attribute names to a + conversion function such as `int` or `float`. + + :Exceptions: + - `KeyError` for unknown attribute names. + - `ValueError` for invalid attribute values (raised by the conversion + function). + - `DuplicateAttributeError` for duplicate attributes. + - `BadAttributeError` for invalid fields. + - `BadAttributeDataError` for invalid attribute data (missing name, + missing data, bad quotes, etc.). + """ + attlist = extract_attributes(field_list) + attdict = assemble_attribute_dict(attlist, attribute_spec) + return attdict + +def extract_attributes(field_list): + """ + Return a list of attribute (name, value) pairs from field names & bodies. + + :Parameter: + `field_list`: A flat field list without field arguments, where each + field body consists of a single paragraph only. + + :Exceptions: + - `BadAttributeError` for invalid fields. + - `BadAttributeDataError` for invalid attribute data (missing name, + missing data, bad quotes, etc.). + """ + attlist = [] + for field in field_list: + if len(field) != 2: + raise BadAttributeError( + 'extension attribute field may not contain field arguments') + name = field[0].astext().lower() + body = field[1] + if len(body) == 0: + data = None + elif len(body) > 1 or not isinstance(body[0], nodes.paragraph) \ + or len(body[0]) != 1 or not isinstance(body[0][0], nodes.Text): + raise BadAttributeDataError( + 'extension attribute field body may contain\n' + 'a single paragraph only (attribute "%s")' % name) + else: + data = body[0][0].astext() + attlist.append((name, data)) + return attlist + +def assemble_attribute_dict(attlist, attspec): + """ + Return a mapping of attribute names to values. + + :Parameters: + - `attlist`: A list of (name, value) pairs (the output of + `extract_attributes()`). + - `attspec`: Dictionary mapping known attribute names to a + conversion function such as `int` or `float`. + + :Exceptions: + - `KeyError` for unknown attribute names. 
+ - `DuplicateAttributeError` for duplicate attributes. + - `ValueError` for invalid attribute values (raised by conversion + function). + """ + attributes = {} + for name, value in attlist: + convertor = attspec[name] # raises KeyError if unknown + if attributes.has_key(name): + raise DuplicateAttributeError('duplicate attribute "%s"' % name) + try: + attributes[name] = convertor(value) + except (ValueError, TypeError), detail: + raise detail.__class__('(attribute "%s", value "%r") %s' + % (name, value, detail)) + return attributes + + +class NameValueError(Exception): pass + + +def extract_name_value(line): + """ + Return a list of (name, value) from a line of the form "name=value ...". + + :Exception: + `NameValueError` for invalid input (missing name, missing data, bad + quotes, etc.). + """ + attlist = [] + while line: + equals = line.find('=') + if equals == -1: + raise NameValueError('missing "="') + attname = line[:equals].strip() + if equals == 0 or not attname: + raise NameValueError( + 'missing attribute name before "="') + line = line[equals+1:].lstrip() + if not line: + raise NameValueError( + 'missing value after "%s="' % attname) + if line[0] in '\'"': + endquote = line.find(line[0], 1) + if endquote == -1: + raise NameValueError( + 'attribute "%s" missing end quote (%s)' + % (attname, line[0])) + if len(line) > endquote + 1 and line[endquote + 1].strip(): + raise NameValueError( + 'attribute "%s" end quote (%s) not followed by ' + 'whitespace' % (attname, line[0])) + data = line[1:endquote] + line = line[endquote+1:].lstrip() + else: + space = line.find(' ') + if space == -1: + data = line + line = '' + else: + data = line[:space] + line = line[space+1:].lstrip() + attlist.append((attname.lower(), data)) + return attlist + + +def normname(name): + """Return a case- and whitespace-normalized name.""" + return ' '.join(name.lower().split()) + +def id(string): + """ + Convert `string` into an identifier and return it. 
+ + Docutils identifiers will conform to the regular expression + ``[a-z][-a-z0-9]*``. For CSS compatibility, identifiers (the "class" and + "id" attributes) should have no underscores, colons, or periods. Hyphens + may be used. + + - The `HTML 4.01 spec`_ defines identifiers based on SGML tokens: + + ID and NAME tokens must begin with a letter ([A-Za-z]) and may be + followed by any number of letters, digits ([0-9]), hyphens ("-"), + underscores ("_"), colons (":"), and periods ("."). + + - However the `CSS1 spec`_ defines identifiers based on the "name" token, + a tighter interpretation ("flex" tokenizer notation; "latin1" and + "escape" 8-bit characters have been replaced with entities):: + + unicode \\[0-9a-f]{1,4} + latin1 [&iexcl;-&yuml;] + escape {unicode}|\\[ -~&iexcl;-&yuml;] + nmchar [-a-z0-9]|{latin1}|{escape} + name {nmchar}+ + + The CSS1 "nmchar" rule does not include underscores ("_"), colons (":"), + or periods ("."), therefore "class" and "id" attributes should not contain + these characters. They should be replaced with hyphens ("-"). Combined + with HTML's requirements (the first character must be a letter; no + "unicode", "latin1", or "escape" characters), this results in the + ``[a-z][-a-z0-9]*`` pattern. + + .. _HTML 4.01 spec: http://www.w3.org/TR/html401 + .. _CSS1 spec: http://www.w3.org/TR/REC-CSS1 + """ + id = non_id_chars.sub('-', normname(string)) + id = non_id_at_ends.sub('', id) + return str(id) + +non_id_chars = re.compile('[^a-z0-9]+') +non_id_at_ends = re.compile('^[-0-9]+|-+$') + +def newdocument(languagecode='en', warninglevel=2, errorlevel=4, + stream=None, debug=0): + reporter = Reporter(warninglevel, errorlevel, stream, debug) + document = nodes.document(languagecode=languagecode, reporter=reporter) + return document diff --git a/docutils/writers/__init__.py b/docutils/writers/__init__.py new file mode 100644 index 000000000..6d9d7c226 --- /dev/null +++ b/docutils/writers/__init__.py @@ -0,0 +1,104 @@ +#!
/usr/bin/env python + +""" +:Authors: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +This package contains Docutils Writer modules. +""" + +__docformat__ = 'reStructuredText' + + +import sys +from docutils import languages +from docutils.transforms import universal + + +class Writer: + + """ + Abstract base class for docutils Writers. + + Each writer module or package must export a subclass also called 'Writer'. + + Call `write()` to process a document. + """ + + document = None + """The document to write.""" + + language = None + """Language module for the document.""" + + destination = None + """Where to write the document.""" + + transforms = () + """Ordered list of transform classes (each with a ``transform()`` method). + Populated by subclasses. `Writer.transform()` instantiates & runs them.""" + + def __init__(self): + """Initialize the Writer instance.""" + + self.transforms = list(self.transforms) + """Instance copy of `Writer.transforms`; may be modified by client.""" + + def write(self, document, destination): + self.document = document + self.language = languages.getlanguage(document.languagecode) + self.destination = destination + self.transform() + self.translate() + self.record() + + def transform(self): + """Run all of the transforms defined for this Writer.""" + for xclass in (universal.first_writer_transforms + + tuple(self.transforms) + + universal.last_writer_transforms): + xclass(self.document).transform() + + def translate(self): + """Override to do final document tree translation.""" + raise NotImplementedError('subclass must override this method') + + def record(self): + """Override to record `document` to `destination`.""" + raise NotImplementedError('subclass must override this method') + + def recordfile(self, output, destination): + """ + Write `output` to a single file. + + Parameters: + + - `output`: Data to write. 
+ - `destination`: one of: + + (a) a file-like object, which is written directly; + (b) a path to a file, which is opened and then written; or + (c) `None`, which implies `sys.stdout`. + """ + output = output.encode('raw-unicode-escape') # @@@ temporary + if hasattr(self.destination, 'write'): + destination.write(output) + elif self.destination: + open(self.destination, 'w').write(output) + else: + sys.stdout.write(output) + + +_writer_aliases = { + 'html': 'html4css1',} + +def get_writer_class(writername): + """Return the Writer class from the `writername` module.""" + writername = writername.lower() + if _writer_aliases.has_key(writername): + writername = _writer_aliases[writername] + module = __import__(writername, globals(), locals()) + return module.Writer diff --git a/docutils/writers/html4css1.py b/docutils/writers/html4css1.py new file mode 100644 index 000000000..dde26f7d0 --- /dev/null +++ b/docutils/writers/html4css1.py @@ -0,0 +1,759 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Simple HyperText Markup Language document tree Writer. + +The output uses the HTML 4.01 Transitional DTD (*almost* strict) and +contains a minimum of formatting information. A cascading style sheet +"default.css" is required for proper viewing with a browser. 
+""" + +__docformat__ = 'reStructuredText' + + +import time +from types import ListType +from docutils import writers, nodes, languages + + +class Writer(writers.Writer): + + output = None + """Final translated form of `document`.""" + + def translate(self): + visitor = HTMLTranslator(self.document) + self.document.walkabout(visitor) + self.output = visitor.astext() + + def record(self): + self.recordfile(self.output, self.destination) + + +class HTMLTranslator(nodes.NodeVisitor): + + def __init__(self, doctree): + nodes.NodeVisitor.__init__(self, doctree) + self.language = languages.getlanguage(doctree.languagecode) + self.head = ['<!DOCTYPE HTML PUBLIC' + ' "-//W3C//DTD HTML 4.01 Transitional//EN"\n' + ' "http://www.w3.org/TR/html4/loose.dtd">\n', + '<HTML LANG="%s">\n<HEAD>\n' % doctree.languagecode, + '<LINK REL="StyleSheet" HREF="default.css"' + ' TYPE="text/css">\n'] + self.body = ['</HEAD>\n<BODY>\n'] + self.foot = ['</BODY>\n</HTML>\n'] + self.sectionlevel = 0 + self.context = [] + self.topic_class = '' + + def astext(self): + return ''.join(self.head + self.body + self.foot) + + def encode(self, text): + """Encode special characters in `text` & return.""" + text = text.replace("&", "&amp;") + text = text.replace("<", "&lt;") + text = text.replace('"', "&quot;") + text = text.replace(">", "&gt;") + return text + + def starttag(self, node, tagname, suffix='\n', **attributes): + atts = {} + for (name, value) in attributes.items(): + atts[name.lower()] = value + for att in ('class',): # append to node attribute + if node.has_key(att): + if atts.has_key(att): + atts[att] = node[att] + ' ' + atts[att] + for att in ('id',): # node attribute overrides + if node.has_key(att): + atts[att] = node[att] + attlist = atts.items() + attlist.sort() + parts = [tagname.upper()] + for name, value in attlist: + if value is None: # boolean attribute + parts.append(name.upper()) + elif isinstance(value, ListType): + values = [str(v) for v in value] + parts.append('%s="%s"' % (name.upper(),
self.encode(' '.join(values)))) + else: + parts.append('%s="%s"' % (name.upper(), + self.encode(str(value)))) + return '<%s>%s' % (' '.join(parts), suffix) + + def visit_Text(self, node): + self.body.append(self.encode(node.astext())) + + def depart_Text(self, node): + pass + + def visit_admonition(self, node, name): + self.body.append(self.starttag(node, 'div', CLASS=name)) + self.body.append('<P CLASS="admonition-title">' + self.language.labels[name] + '</P>\n') + + def depart_admonition(self): + self.body.append('</DIV>\n') + + def visit_attention(self, node): + self.visit_admonition(node, 'attention') + + def depart_attention(self, node): + self.depart_admonition() + + def visit_author(self, node): + self.visit_docinfo_item(node, 'author') + + def depart_author(self, node): + self.depart_docinfo_item() + + def visit_authors(self, node): + pass + + def depart_authors(self, node): + pass + + def visit_block_quote(self, node): + self.body.append(self.starttag(node, 'blockquote')) + + def depart_block_quote(self, node): + self.body.append('</BLOCKQUOTE>\n') + + def visit_bullet_list(self, node): + if self.topic_class == 'contents': + self.body.append(self.starttag(node, 'ul', compact=None)) + else: + self.body.append(self.starttag(node, 'ul')) + + def depart_bullet_list(self, node): + self.body.append('</UL>\n') + + def visit_caption(self, node): + self.body.append(self.starttag(node, 'p', '', CLASS='caption')) + + def depart_caption(self, node): + self.body.append('</P>\n') + + def visit_caution(self, node): + self.visit_admonition(node, 'caution') + + def depart_caution(self, node): + self.depart_admonition() + + def visit_citation(self, node): + self.body.append(self.starttag(node, 'table', CLASS='citation', + frame="void", rules="none")) + self.body.append('<COL CLASS="label">\n' + '<COL>\n' + '<TBODY VALIGN="top">\n' + '<TR><TD>\n') + + def depart_citation(self, node): + self.body.append('</TD></TR>\n' + '</TBODY>\n</TABLE>\n') + + def 
visit_citation_reference(self, node): + href = '' + if node.has_key('refid'): + href = '#' + node['refid'] + elif node.has_key('refname'): + href = '#' + self.doctree.nameids[node['refname']] + self.body.append(self.starttag(node, 'a', '[', href=href, #node['refid'], + CLASS='citation-reference')) + + def depart_citation_reference(self, node): + self.body.append(']</A>') + + def visit_classifier(self, node): + self.body.append(' <SPAN CLASS="classifier-delimiter">:</SPAN> ') + self.body.append(self.starttag(node, 'span', '', CLASS='classifier')) + + def depart_classifier(self, node): + self.body.append('</SPAN>') + + def visit_colspec(self, node): + atts = {} + #if node.has_key('colwidth'): + # atts['width'] = str(node['colwidth']) + '*' + self.body.append(self.starttag(node, 'col', **atts)) + + def depart_colspec(self, node): + pass + + def visit_comment(self, node): + self.body.append('<!-- ') + + def depart_comment(self, node): + self.body.append(' -->\n') + + def visit_contact(self, node): + self.visit_docinfo_item(node, 'contact') + + def depart_contact(self, node): + self.depart_docinfo_item() + + def visit_copyright(self, node): + self.visit_docinfo_item(node, 'copyright') + + def depart_copyright(self, node): + self.depart_docinfo_item() + + def visit_danger(self, node): + self.visit_admonition(node, 'danger') + + def depart_danger(self, node): + self.depart_admonition() + + def visit_date(self, node): + self.visit_docinfo_item(node, 'date') + + def depart_date(self, node): + self.depart_docinfo_item() + + def visit_definition(self, node): + self.body.append('</DT>\n') + self.body.append(self.starttag(node, 'dd')) + + def depart_definition(self, node): + self.body.append('</DD>\n') + + def visit_definition_list(self, node): + self.body.append(self.starttag(node, 'dl')) + + def depart_definition_list(self, node): + self.body.append('</DL>\n') + + def visit_definition_list_item(self, node): + pass + + def depart_definition_list_item(self, node): + pass + + 
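Reviewer note: the `visit_*`/`depart_*` pairs in this file rely on a traversal that calls the visit method on the way into a node and the depart method on the way out, with `self.context` used as a stack for end tags chosen during the visit (as `visit_entry` does). A minimal Python 3 sketch of that pattern follows. `Node`, `TinyTranslator`, and `render` are hypothetical stand-ins written for illustration; they are not the docutils classes, and the `walkabout` shown here only assumes the behaviour implied by the diff.

```python
# Hypothetical, simplified stand-ins; NOT the docutils API.
class Node:
    def __init__(self, name, children=(), **attributes):
        self.name = name
        self.children = list(children)
        self.attributes = attributes

    def walkabout(self, visitor):
        """Call visit_<name>, recurse into the children, then call depart_<name>."""
        getattr(visitor, 'visit_' + self.name)(self)
        for child in self.children:
            child.walkabout(visitor)
        getattr(visitor, 'depart_' + self.name)(self)


class TinyTranslator:
    def __init__(self):
        self.body = []      # output fragments, joined at the end
        self.context = []   # stack of end tags decided during a visit

    def visit_row(self, node):
        self.body.append('<TR>')

    def depart_row(self, node):
        self.body.append('</TR>\n')

    def visit_entry(self, node):
        # Pick the tag on the way in and remember its end tag for the way out.
        tagname = 'TH' if node.attributes.get('header') else 'TD'
        self.body.append('<%s>' % tagname)
        self.context.append('</%s>' % tagname)

    def depart_entry(self, node):
        self.body.append(self.context.pop())

    def visit_Text(self, node):
        self.body.append(node.attributes['data'])

    def depart_Text(self, node):
        pass

    def astext(self):
        return ''.join(self.body)


def render(tree):
    """Walk `tree` with a fresh translator and return the collected output."""
    visitor = TinyTranslator()
    tree.walkabout(visitor)
    return visitor.astext()
```

Because each depart method pops exactly what its visit method pushed, nested nodes can share one stack without knowing about each other.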
def visit_description(self, node): + self.body.append('<TD>\n') + + def depart_description(self, node): + self.body.append('</TD>') + + def visit_docinfo(self, node): + self.body.append(self.starttag(node, 'table', CLASS='docinfo', + frame="void", rules="none")) + self.body.append('<COL CLASS="docinfo-name">\n' + '<COL CLASS="docinfo-content">\n' + '<TBODY VALIGN="top">\n') + + def depart_docinfo(self, node): + self.body.append('</TBODY>\n</TABLE>\n') + + def visit_docinfo_item(self, node, name): + self.head.append('<META NAME="%s" CONTENT="%s">\n' + % (name, self.encode(node.astext()))) + self.body.append(self.starttag(node, 'tr', '')) + self.body.append('<TD>\n' + '<P CLASS="docinfo-name">%s:</P>\n' + '</TD><TD>\n' + '<P>' % self.language.labels[name]) + + def depart_docinfo_item(self): + self.body.append('</P>\n</TD></TR>') + + def visit_doctest_block(self, node): + self.body.append(self.starttag(node, 'pre', CLASS='doctest-block')) + + def depart_doctest_block(self, node): + self.body.append('</PRE>\n') + + def visit_document(self, node): + self.body.append(self.starttag(node, 'div', CLASS='document')) + + def depart_document(self, node): + self.body.append('</DIV>\n') + #self.body.append( + # '<P CLASS="credits">HTML generated from <CODE>%s</CODE> on %s ' + # 'by <A HREF="http://docutils.sourceforge.net/">Docutils</A>.' 
+ # '</P>\n' % (node['source'], time.strftime('%Y-%m-%d'))) + + def visit_emphasis(self, node): + self.body.append('<EM>') + + def depart_emphasis(self, node): + self.body.append('</EM>') + + def visit_entry(self, node): + if isinstance(node.parent.parent, nodes.thead): + tagname = 'th' + else: + tagname = 'td' + atts = {} + if node.has_key('morerows'): + atts['rowspan'] = node['morerows'] + 1 + if node.has_key('morecols'): + atts['colspan'] = node['morecols'] + 1 + self.body.append(self.starttag(node, tagname, **atts)) + self.context.append('</%s>' % tagname.upper()) + if len(node) == 0: # empty cell + self.body.append('&nbsp;') + + def depart_entry(self, node): + self.body.append(self.context.pop()) + + def visit_enumerated_list(self, node): + """ + The 'start' attribute does not conform to HTML 4.01's strict.dtd, but + CSS1 doesn't help. CSS2 isn't widely enough supported yet to be + usable. + """ + atts = {} + if node.has_key('start'): + atts['start'] = node['start'] + if node.has_key('enumtype'): + atts['class'] = node['enumtype'] + # @@@ To do: prefix, suffix. How? Change prefix/suffix to a + # single "format" attribute? Use CSS2?
+ self.body.append(self.starttag(node, 'ol', **atts)) + + def depart_enumerated_list(self, node): + self.body.append('</OL>\n') + + def visit_error(self, node): + self.visit_admonition(node, 'error') + + def depart_error(self, node): + self.depart_admonition() + + def visit_field(self, node): + self.body.append(self.starttag(node, 'tr', CLASS='field')) + + def depart_field(self, node): + self.body.append('</TR>\n') + + def visit_field_argument(self, node): + self.body.append(' ') + self.body.append(self.starttag(node, 'span', '', + CLASS='field-argument')) + + def depart_field_argument(self, node): + self.body.append('</SPAN>') + + def visit_field_body(self, node): + self.body.append(':</P>\n</TD><TD>') + self.body.append(self.starttag(node, 'div', CLASS='field-body')) + + def depart_field_body(self, node): + self.body.append('</DIV></TD>\n') + + def visit_field_list(self, node): + self.body.append(self.starttag(node, 'table', frame='void', + rules='none')) + self.body.append('<COL CLASS="field-name">\n' + '<COL CLASS="field-body">\n' + '<TBODY VALIGN="top">\n') + + def depart_field_list(self, node): + self.body.append('</TBODY>\n</TABLE>\n') + + def visit_field_name(self, node): + self.body.append('<TD>\n') + self.body.append(self.starttag(node, 'p', '', CLASS='field-name')) + + def depart_field_name(self, node): + """ + Leave the end tag to `self.visit_field_body()`, in case there are any + field_arguments. 
+ """ + pass + + def visit_figure(self, node): + self.body.append(self.starttag(node, 'div', CLASS='figure')) + + def depart_figure(self, node): + self.body.append('</DIV>\n') + + def visit_footnote(self, node): + self.body.append(self.starttag(node, 'table', CLASS='footnote', + frame="void", rules="none")) + self.body.append('<COL CLASS="label">\n' + '<COL>\n' + '<TBODY VALIGN="top">\n' + '<TR><TD>\n') + + def depart_footnote(self, node): + self.body.append('</TD></TR>\n' + '</TBODY>\n</TABLE>\n') + + def visit_footnote_reference(self, node): + href = '' + if node.has_key('refid'): + href = '#' + node['refid'] + elif node.has_key('refname'): + href = '#' + self.doctree.nameids[node['refname']] + self.body.append(self.starttag(node, 'a', '', href=href, #node['refid'], + CLASS='footnote-reference')) + + def depart_footnote_reference(self, node): + self.body.append('</A>') + + def visit_hint(self, node): + self.visit_admonition(node, 'hint') + + def depart_hint(self, node): + self.depart_admonition() + + def visit_image(self, node): + atts = node.attributes.copy() + atts['src'] = atts['uri'] + del atts['uri'] + if not atts.has_key('alt'): + atts['alt'] = atts['src'] + self.body.append(self.starttag(node, 'img', '', **atts)) + + def depart_image(self, node): + pass + + def visit_important(self, node): + self.visit_admonition(node, 'important') + + def depart_important(self, node): + self.depart_admonition() + + def visit_interpreted(self, node): + self.body.append('<SPAN class="interpreted">') + + def depart_interpreted(self, node): + self.body.append('</SPAN>') + + def visit_label(self, node): + self.body.append(self.starttag(node, 'p', '[', CLASS='label')) + + def depart_label(self, node): + self.body.append(']</P>\n' + '</TD><TD>\n') + + def visit_legend(self, node): + self.body.append(self.starttag(node, 'div', CLASS='legend')) + + def depart_legend(self, node): + self.body.append('</DIV>\n') + + def visit_list_item(self, node): + 
self.body.append(self.starttag(node, 'li')) + + def depart_list_item(self, node): + self.body.append('</LI>\n') + + def visit_literal(self, node): + self.body.append('<CODE>') + + def depart_literal(self, node): + self.body.append('</CODE>') + + def visit_literal_block(self, node): + self.body.append(self.starttag(node, 'pre', CLASS='literal-block')) + + def depart_literal_block(self, node): + self.body.append('</PRE>\n') + + def visit_meta(self, node): + self.head.append(self.starttag(node, 'meta', **node.attributes)) + + def depart_meta(self, node): + pass + + def visit_note(self, node): + self.visit_admonition(node, 'note') + + def depart_note(self, node): + self.depart_admonition() + + def visit_option(self, node): + if self.context[-1]: + self.body.append(', ') + + def depart_option(self, node): + self.context[-1] += 1 + + def visit_option_argument(self, node): + self.body.append(node.get('delimiter', ' ')) + self.body.append(self.starttag(node, 'span', '', + CLASS='option-argument')) + + def depart_option_argument(self, node): + self.body.append('</SPAN>') + + def visit_option_group(self, node): + atts = {} + if len(node.astext()) > 14: + atts['colspan'] = 2 + self.context.append('</TR>\n<TR><TD>&nbsp;</TD>') + else: + self.context.append('') + self.body.append(self.starttag(node, 'td', **atts)) + self.body.append('<P><CODE>') + self.context.append(0) + + def depart_option_group(self, node): + self.context.pop() + self.body.append('</CODE></P>\n</TD>') + self.body.append(self.context.pop()) + + def visit_option_list(self, node): + self.body.append( + self.starttag(node, 'table', CLASS='option-list', + frame="void", rules="none", cellspacing=12)) + self.body.append('<COL CLASS="option">\n' + '<COL CLASS="description">\n' + '<TBODY VALIGN="top">\n') + + def depart_option_list(self, node): + self.body.append('</TBODY>\n</TABLE>\n') + + def visit_option_list_item(self, node): + self.body.append(self.starttag(node, 'tr', '')) + + def depart_option_list_item(self,
node): + self.body.append('</TR>\n') + + def visit_option_string(self, node): + self.body.append(self.starttag(node, 'span', '', CLASS='option')) + + def depart_option_string(self, node): + self.body.append('</SPAN>') + + def visit_organization(self, node): + self.visit_docinfo_item(node, 'organization') + + def depart_organization(self, node): + self.depart_docinfo_item() + + def visit_paragraph(self, node): + if not self.topic_class == 'contents': + self.body.append(self.starttag(node, 'p', '')) + + def depart_paragraph(self, node): + if self.topic_class == 'contents': + self.body.append('\n') + else: + self.body.append('</P>\n') + + def visit_problematic(self, node): + if node.hasattr('refid'): + self.body.append('<A HREF="#%s">' % node['refid']) + self.context.append('</A>') + else: + self.context.append('') + self.body.append(self.starttag(node, 'span', '', CLASS='problematic')) + + def depart_problematic(self, node): + self.body.append('</SPAN>') + self.body.append(self.context.pop()) + + def visit_raw(self, node): + if node.has_key('format') and node['format'] == 'html': + self.body.append(node.astext()) + raise nodes.SkipNode + + def visit_reference(self, node): + if node.has_key('refuri'): + href = node['refuri'] + elif node.has_key('refid'): + #else: + href = '#' + node['refid'] + elif node.has_key('refname'): + # @@@ Check for non-existent mappings. Here or in a transform? 
+ href = '#' + self.doctree.nameids[node['refname']] + self.body.append(self.starttag(node, 'a', '', href=href, + CLASS='reference')) + + def depart_reference(self, node): + self.body.append('</A>') + + def visit_revision(self, node): + self.visit_docinfo_item(node, 'revision') + + def depart_revision(self, node): + self.depart_docinfo_item() + + def visit_row(self, node): + self.body.append(self.starttag(node, 'tr', '')) + + def depart_row(self, node): + self.body.append('</TR>\n') + + def visit_section(self, node): + self.sectionlevel += 1 + self.body.append(self.starttag(node, 'div', CLASS='section')) + + def depart_section(self, node): + self.sectionlevel -= 1 + self.body.append('</DIV>\n') + + def visit_status(self, node): + self.visit_docinfo_item(node, 'status') + + def depart_status(self, node): + self.depart_docinfo_item() + + def visit_strong(self, node): + self.body.append('<STRONG>') + + def depart_strong(self, node): + self.body.append('</STRONG>') + + def visit_substitution_definition(self, node): + raise nodes.SkipChildren + + def depart_substitution_definition(self, node): + pass + + def visit_substitution_reference(self, node): + self.unimplemented_visit(node) + + def visit_subtitle(self, node): + self.body.append(self.starttag(node, 'H2', '', CLASS='subtitle')) + + def depart_subtitle(self, node): + self.body.append('</H1>\n') + + def visit_system_message(self, node): + if node['level'] < self.doctree.reporter['writer'].warninglevel: + raise nodes.SkipNode + self.body.append(self.starttag(node, 'div', CLASS='system-message')) + self.body.append('<P CLASS="system-message-title">') + if node.hasattr('backrefs'): + backrefs = node['backrefs'] + if len(backrefs) == 1: + self.body.append('<A HREF="#%s">%s</A> ' + '(level %s system message)</P>\n' + % (backrefs[0], node['type'], node['level'])) + else: + i = 1 + backlinks = [] + for backref in backrefs: + backlinks.append('<A HREF="#%s">%s</A>' % (backref, i)) + i += 1 + self.body.append('%s (%s; level 
%s system message)</P>\n' + % (node['type'], '|'.join(backlinks), + node['level'])) + else: + self.body.append('%s (level %s system message)</P>\n' + % (node['type'], node['level'])) + + def depart_system_message(self, node): + self.body.append('</DIV>\n') + + def visit_table(self, node): + self.body.append( + self.starttag(node, 'table', frame='border', rules='all')) + + def depart_table(self, node): + self.body.append('</TABLE>\n') + + def visit_target(self, node): + if not (node.has_key('refuri') or node.has_key('refid') + or node.has_key('refname')): + self.body.append(self.starttag(node, 'a', '', CLASS='target')) + self.context.append('</A>') + else: + self.context.append('') + + def depart_target(self, node): + self.body.append(self.context.pop()) + + def visit_tbody(self, node): + self.body.append(self.context.pop()) # '</COLGROUP>\n' or '' + self.body.append(self.starttag(node, 'tbody', valign='top')) + + def depart_tbody(self, node): + self.body.append('</TBODY>\n') + + def visit_term(self, node): + self.body.append(self.starttag(node, 'dt', '')) + + def depart_term(self, node): + """ + Leave the end tag to `self.visit_definition()`, in case there's a + classifier. 
+ """ + pass + + def visit_tgroup(self, node): + self.body.append(self.starttag(node, 'colgroup')) + self.context.append('</COLGROUP>\n') + + def depart_tgroup(self, node): + pass + + def visit_thead(self, node): + self.body.append(self.context.pop()) # '</COLGROUP>\n' + self.context.append('') + self.body.append(self.starttag(node, 'thead', valign='bottom')) + + def depart_thead(self, node): + self.body.append('</THEAD>\n') + + def visit_tip(self, node): + self.visit_admonition(node, 'tip') + + def depart_tip(self, node): + self.depart_admonition() + + def visit_title(self, node): + """Only 6 section levels are supported by HTML.""" + if isinstance(node.parent, nodes.topic): + self.body.append( + self.starttag(node, 'P', '', CLASS='topic-title')) + self.context.append('</P>\n') + elif self.sectionlevel == 0: + self.head.append('<TITLE>%s</TITLE>\n' + % self.encode(node.astext())) + self.body.append(self.starttag(node, 'H1', '', CLASS='title')) + self.context.append('</H1>\n') + else: + self.body.append( + self.starttag(node, 'H%s' % self.sectionlevel, '')) + context = '' + if node.hasattr('refid'): + self.body.append('<A HREF="#%s">' % node['refid']) + context = '</A>' + self.context.append('%s</H%s>\n' % (context, self.sectionlevel)) + + def depart_title(self, node): + self.body.append(self.context.pop()) + + def visit_topic(self, node): + self.body.append(self.starttag(node, 'div', CLASS='topic')) + self.topic_class = node.get('class') + + def depart_topic(self, node): + self.body.append('</DIV>\n') + self.topic_class = '' + + def visit_transition(self, node): + self.body.append(self.starttag(node, 'hr')) + + def depart_transition(self, node): + pass + + def visit_version(self, node): + self.visit_docinfo_item(node, 'version') + + def depart_version(self, node): + self.depart_docinfo_item() + + def visit_warning(self, node): + self.visit_admonition(node, 'warning') + + def depart_warning(self, node): + self.depart_admonition() + + def unimplemented_visit(self, 
node): + raise NotImplementedError('visiting unimplemented node type: %s' + % node.__class__.__name__) diff --git a/docutils/writers/pprint.py b/docutils/writers/pprint.py new file mode 100644 index 000000000..a34c2a920 --- /dev/null +++ b/docutils/writers/pprint.py @@ -0,0 +1,28 @@ +#! /usr/bin/env python + +""" +:Authors: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Simple internal document tree Writer, writes indented pseudo-XML. +""" + +__docformat__ = 'reStructuredText' + + +from docutils import writers + + +class Writer(writers.Writer): + + output = None + """Final translated form of `document`.""" + + def translate(self): + self.output = self.document.pformat() + + def record(self): + self.recordfile(self.output, self.destination) diff --git a/install.py b/install.py new file mode 100755 index 000000000..be9ed238b --- /dev/null +++ b/install.py @@ -0,0 +1,20 @@ +#!/usr/bin/env python +# $Id$ + +""" +This is a quick & dirty installation shortcut. It is equivalent to the +command:: + + python setup.py install + +However, the shortcut lacks error checking! 
+""" + +from distutils import core +from setup import do_setup + +if __name__ == '__main__' : + core._setup_stop_after = 'config' + dist = do_setup() + dist.commands = ['install'] + dist.run_commands() diff --git a/setup.py b/setup.py new file mode 100755 index 000000000..23aa0ce4a --- /dev/null +++ b/setup.py @@ -0,0 +1,24 @@ +#!/usr/bin/env python +# $Id$ + +from distutils.core import setup + +def do_setup(): + dist = setup( + name = 'Docutils', + description = 'Python Documentation Utilities', + #long_description = '', + url = 'http://docutils.sourceforge.net/', + version = 'pre-0.1', + author = 'David Goodger', + author_email = 'goodger@users.sourceforge.net', + license = 'public domain, Python (see COPYING.txt)', + packages = ['docutils', 'docutils.readers', 'docutils.writers', + 'docutils.transforms', 'docutils.languages', + 'docutils.parsers', 'docutils.parsers.restructuredtext', + 'docutils.parsers.restructuredtext.directives', + 'docutils.parsers.restructuredtext.languages']) + return dist + +if __name__ == '__main__' : + do_setup() diff --git a/test/DocutilsTestSupport.py b/test/DocutilsTestSupport.py new file mode 100644 index 000000000..766eafde6 --- /dev/null +++ b/test/DocutilsTestSupport.py @@ -0,0 +1,379 @@ +#! /usr/bin/env python + +""" +:Authors: David Goodger; Garth Kidd +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. 
+ +Exports the following: + +:Modules: + - `statemachine` is 'docutils.statemachine' + - `nodes` is 'docutils.nodes' + - `urischemes` is 'docutils.urischemes' + - `utils` is 'docutils.utils' + - `transforms` is 'docutils.transforms' + - `states` is 'docutils.parsers.rst.states' + - `tableparser` is 'docutils.parsers.rst.tableparser' + +:Classes: + - `CustomTestSuite` + - `CustomTestCase` + - `ParserTestSuite` + - `ParserTestCase` + - `TableParserTestSuite` + - `TableParserTestCase` +""" +__docformat__ = 'reStructuredText' + +import UnitTestFolder +import sys, os, unittest, difflib, inspect, os, sys +from pprint import pformat +import docutils +from docutils import statemachine, nodes, urischemes, utils, transforms +from docutils.transforms import universal +from docutils.parsers import rst +from docutils.parsers.rst import states, tableparser, directives, languages +from docutils.statemachine import string2lines + +try: + import mypdb as pdb +except: + import pdb + + +class CustomTestSuite(unittest.TestSuite): + + """ + A collection of custom TestCases. + + """ + + id = '' + """Identifier for the TestSuite. Prepended to the + TestCase identifiers to make identification easier.""" + + nextTestCaseId = 0 + """The next identifier to use for non-identified test cases.""" + + def __init__(self, tests=(), id=None): + """ + Initialize the CustomTestSuite. + + Arguments: + + id -- identifier for the suite, prepended to test cases. 
+ """ + unittest.TestSuite.__init__(self, tests) + if id is None: + outerframes = inspect.getouterframes(inspect.currentframe()) + mypath = outerframes[0][1] + for outerframe in outerframes[1:]: + if outerframe[3] != '__init__': + callerpath = outerframe[1] + break + mydir, myname = os.path.split(mypath) + if not mydir: + mydir = os.curdir + if callerpath.startswith(mydir): + self.id = callerpath[len(mydir) + 1:] # caller's module + else: + self.id = callerpath + else: + self.id = id + + def addTestCase(self, testCaseClass, methodName, input, expected, + id=None, runInDebugger=0, shortDescription=None, + **kwargs): + """ + Create a custom TestCase in the CustomTestSuite. + Also return it, just in case. + + Arguments: + + testCaseClass -- + methodName -- + input -- input to the parser. + expected -- expected output from the parser. + id -- unique test identifier, used by the test framework. + runInDebugger -- if true, run this test under the pdb debugger. + shortDescription -- override to default test description. + """ + if id is None: # generate id if required + id = self.nextTestCaseId + self.nextTestCaseId += 1 + # test identifier will become suiteid.testid + tcid = '%s: %s' % (self.id, id) + # generate and add test case + tc = testCaseClass(methodName, input, expected, tcid, + runInDebugger=runInDebugger, + shortDescription=shortDescription, + **kwargs) + self.addTest(tc) + return tc + + +class CustomTestCase(unittest.TestCase): + + compare = difflib.Differ().compare + """Comparison method shared by all subclasses.""" + + def __init__(self, methodName, input, expected, id, + runInDebugger=0, shortDescription=None): + """ + Initialise the CustomTestCase. + + Arguments: + + methodName -- name of test method to run. + input -- input to the parser. + expected -- expected output from the parser. + id -- unique test identifier, used by the test framework. + runInDebugger -- if true, run this test under the pdb debugger. 
+ shortDescription -- override to default test description. + """ + self.id = id + self.input = input + self.expected = expected + self.runInDebugger = runInDebugger + # Ring your mother. + unittest.TestCase.__init__(self, methodName) + + def __str__(self): + """ + Return string conversion. Overridden to give test id, in addition to + method name. + """ + return '%s; %s' % (self.id, unittest.TestCase.__str__(self)) + + def compareOutput(self, input, output, expected): + """`input`, `output`, and `expected` should all be strings.""" + try: + self.assertEquals('\n' + output, '\n' + expected) + except AssertionError: + print >>sys.stderr, '\n%s\ninput:' % (self,) + print >>sys.stderr, input + print >>sys.stderr, '-: expected\n+: output' + print >>sys.stderr, ''.join(self.compare(expected.splitlines(1), + output.splitlines(1))) + raise + + +class TransformTestSuite(CustomTestSuite): + + """ + A collection of TransformTestCases. + + A TransformTestSuite instance manufactures TransformTestCases, + keeps track of them, and provides a shared test fixture (a-la + setUp and tearDown). + """ + + def __init__(self, parser): + self.parser = parser + """Parser shared by all test cases.""" + + CustomTestSuite.__init__(self) + + def generateTests(self, dict, dictname='totest', + testmethod='test_transforms'): + """ + Stock the suite with test cases generated from a test data dictionary. + + Each dictionary key (test type's name) maps to a list of transform + classes and list of tests. Each test is a list: input, expected + output, optional modifier. The optional third entry, a behavior + modifier, can be 0 (temporarily disable this test) or 1 (run this test + under the pdb debugger). Tests should be self-documenting and not + require external comments. 
+ """ + for name, (transforms, cases) in dict.items(): + for casenum in range(len(cases)): + case = cases[casenum] + runInDebugger = 0 + if len(case)==3: + if case[2]: + runInDebugger = 1 + else: + continue + self.addTestCase( + TransformTestCase, testmethod, + transforms=transforms, parser=self.parser, + input=case[0], expected=case[1], + id='%s[%r][%s]' % (dictname, name, casenum), + runInDebugger=runInDebugger) + + +class TransformTestCase(CustomTestCase): + + """ + Output checker for the transform. + + Should probably be called TransformOutputChecker, but I can deal with + that later when/if someone comes up with a category of transform test + cases that have nothing to do with the input and output of the transform. + """ + + def __init__(self, *args, **kwargs): + self.transforms = kwargs['transforms'] + """List of transforms to perform for this test case.""" + + self.parser = kwargs['parser'] + """Input parser for this test case.""" + + del kwargs['transforms'], kwargs['parser'] # only wanted here + CustomTestCase.__init__(self, *args, **kwargs) + + def test_transforms(self): + if self.runInDebugger: + pdb.set_trace() + doctree = utils.newdocument(warninglevel=5, errorlevel=5, + debug=UnitTestFolder.debug) + self.parser.parse(self.input, doctree) + for transformClass in (self.transforms + universal.test_transforms): + transformClass(doctree).transform() + output = doctree.pformat() + self.compareOutput(self.input, output, self.expected) + + def test_transforms_verbosely(self): + if self.runInDebugger: + pdb.set_trace() + print '\n', self.id + print '-' * 70 + print self.input + doctree = utils.newdocument(warninglevel=5, errorlevel=5, + debug=UnitTestFolder.debug) + self.parser.parse(self.input, doctree) + print '-' * 70 + print doctree.pformat() + for transformClass in self.transforms: + transformClass(doctree).transform() + output = doctree.pformat() + print '-' * 70 + print output + self.compareOutput(self.input, output, self.expected) + + +class 
ParserTestSuite(CustomTestSuite): + + """ + A collection of ParserTestCases. + + A ParserTestSuite instance manufactures ParserTestCases, + keeps track of them, and provides a shared test fixture (a-la + setUp and tearDown). + """ + + def generateTests(self, dict, dictname='totest'): + """ + Stock the suite with test cases generated from a test data dictionary. + + Each dictionary key (test type name) maps to a list of tests. Each + test is a list: input, expected output, optional modifier. The + optional third entry, a behavior modifier, can be 0 (temporarily + disable this test) or 1 (run this test under the pdb debugger). Tests + should be self-documenting and not require external comments. + """ + for name, cases in dict.items(): + for casenum in range(len(cases)): + case = cases[casenum] + runInDebugger = 0 + if len(case)==3: + if case[2]: + runInDebugger = 1 + else: + continue + self.addTestCase( + ParserTestCase, 'test_parser', + input=case[0], expected=case[1], + id='%s[%r][%s]' % (dictname, name, casenum), + runInDebugger=runInDebugger) + + +class ParserTestCase(CustomTestCase): + + """ + Output checker for the parser. + + Should probably be called ParserOutputChecker, but I can deal with + that later when/if someone comes up with a category of parser test + cases that have nothing to do with the input and output of the parser. + """ + + parser = rst.Parser() + """Parser shared by all ParserTestCases.""" + + def test_parser(self): + if self.runInDebugger: + pdb.set_trace() + document = utils.newdocument(warninglevel=5, errorlevel=5, + debug=UnitTestFolder.debug) + self.parser.parse(self.input, document) + output = document.pformat() + self.compareOutput(self.input, output, self.expected) + + +class TableParserTestSuite(CustomTestSuite): + + """ + A collection of TableParserTestCases. + + A TableParserTestSuite instance manufactures TableParserTestCases, + keeps track of them, and provides a shared test fixture (a-la + setUp and tearDown). 
+ """ + + def generateTests(self, dict, dictname='totest'): + """ + Stock the suite with test cases generated from a test data dictionary. + + Each dictionary key (test type name) maps to a list of tests. Each + test is a list: an input table, expected output from parsegrid(), + expected output from parse(), optional modifier. The optional fourth + entry, a behavior modifier, can be 0 (temporarily disable this test) + or 1 (run this test under the pdb debugger). Tests should be + self-documenting and not require external comments. + """ + for name, cases in dict.items(): + for casenum in range(len(cases)): + case = cases[casenum] + runInDebugger = 0 + if len(case) == 4: + if case[3]: + runInDebugger = 1 + else: + continue + self.addTestCase(TableParserTestCase, 'test_parsegrid', + input=case[0], expected=case[1], + id='%s[%r][%s]' % (dictname, name, casenum), + runInDebugger=runInDebugger) + self.addTestCase(TableParserTestCase, 'test_parse', + input=case[0], expected=case[2], + id='%s[%r][%s]' % (dictname, name, casenum), + runInDebugger=runInDebugger) + + +class TableParserTestCase(CustomTestCase): + + parser = tableparser.TableParser() + + def test_parsegrid(self): + self.parser.setup(string2lines(self.input)) + try: + self.parser.findheadbodysep() + self.parser.parsegrid() + output = self.parser.cells + except Exception, details: + output = '%s: %s' % (details.__class__.__name__, details) + self.compareOutput(self.input, pformat(output) + '\n', + pformat(self.expected) + '\n') + + def test_parse(self): + try: + output = self.parser.parse(string2lines(self.input)) + except Exception, details: + output = '%s: %s' % (details.__class__.__name__, details) + self.compareOutput(self.input, pformat(output) + '\n', + pformat(self.expected) + '\n') diff --git a/test/UnitTestFolder.py b/test/UnitTestFolder.py new file mode 100644 index 000000000..529d8c7e8 --- /dev/null +++ b/test/UnitTestFolder.py @@ -0,0 +1,135 @@ +#! 
/usr/bin/env python + +""" +:Author: Garth Kidd +:Contact: garth@deadlybloodyserious.com +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. +""" + +import sys, os, getopt, types, unittest, re + + +# So that individual test modules can share a bit of state, +# `UnitTestFolder` acts as an intermediary for the following +# variables: +debug = 0 +verbosity = 1 + +USAGE = """\ +Usage: test_whatever [options] + +Options: + -h, --help Show this message + -v, --verbose Verbose output + -q, --quiet Minimal output + -d, --debug Debug mode +""" + +def usageExit(msg=None): + """Print usage and exit.""" + if msg: + print msg + print USAGE + sys.exit(2) + +def parseArgs(argv=sys.argv): + """Parse command line arguments and set TestFramework state. + + State is to be acquired by test_* modules by a grotty hack: + ``from TestFramework import *``. For this stylistic + transgression, I expect to be first up against the wall + when the revolution comes. --Garth""" + global verbosity, debug + try: + options, args = getopt.getopt(argv[1:], 'hHvqd', + ['help', 'verbose', 'quiet', 'debug']) + for opt, value in options: + if opt in ('-h', '-H', '--help'): + usageExit() + if opt in ('-q', '--quiet'): + verbosity = 0 + if opt in ('-v', '--verbose'): + verbosity = 2 + if opt in ('-d', '--debug'): + debug = 1 + if len(args) != 0: + usageExit("No command-line arguments supported yet.") + except getopt.error, msg: + usageExit(msg) + +def loadModulesFromFolder(path, name='', subfolders=None): + """ + Return a test suite composed of all the tests from modules in a folder. + + Search for modules in directory `path`, beginning with `name`. If + `subfolders` is true, search subdirectories (also beginning with `name`) + recursively. 
+ """ + testLoader = unittest.defaultTestLoader + testSuite = unittest.TestSuite() + testModules = [] + paths = [path] + while paths: + p = paths.pop(0) + if not p: + p = os.curdir + files = os.listdir(p) + for filename in files: + if filename.startswith(name): + fullpath = os.path.join(p, filename) + if filename.endswith('.py'): + testModules.append(fullpath) + elif subfolders and os.path.isdir(fullpath): + paths.append(fullpath) + sys.path.insert(0, '') + # Import modules and add their tests to the suite. + for modpath in testModules: + if debug: + print >>sys.stderr, "importing %s" % modpath + sys.path[0], filename = os.path.split(modpath) + modname = filename[:-3] # strip off the '.py' + module = __import__(modname) + # if there's a suite defined, incorporate its contents + try: + suite = getattr(module, 'suite') + except AttributeError: + # Look for individual tests + moduleTests = testLoader.loadTestsFromModule(module) + # unittest.TestSuite.addTests() doesn't work as advertised, + # as it can't load tests from another TestSuite, so we have + # to cheat: + testSuite.addTest(moduleTests) + continue + if type(suite) == types.FunctionType: + testSuite.addTest(suite()) + elif type(suite) == types.InstanceType \ + and isinstance(suite, unittest.TestSuite): + testSuite.addTest(suite) + else: + raise AssertionError, "don't understand suite (%s)" % modpath + return testSuite + + +def main(suite=None): + """ + Shared `main` for any individual test_* file. + + suite -- TestSuite to run. If not specified, look for any globally defined + tests and run them. + """ + parseArgs() + if suite is None: + # Load any globally defined tests. + suite = unittest.defaultTestLoader.loadTestsFromModule( + __import__('__main__')) + if debug: + print >>sys.stderr, "Debug: Suite=%s" % suite + testRunner = unittest.TextTestRunner(verbosity=verbosity) + # run suites (if we were called from test_all) or suite... 
+ if type(suite) == type([]): + for s in suite: + testRunner.run(s) + else: + testRunner.run(suite) diff --git a/test/alltests.py b/test/alltests.py new file mode 100755 index 000000000..930cbdc1b --- /dev/null +++ b/test/alltests.py @@ -0,0 +1,41 @@ +#!/usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. +""" + +import time +start = time.time() + +import sys, os + + +class Tee: + + """Write to a file and a stream (default: stdout) simultaneously.""" + + def __init__(self, filename, stream=sys.__stdout__): + self.file = open(filename, 'w') + self.stream = stream + + def write(self, string): + string = string.encode('raw-unicode-escape') + self.stream.write(string) + self.file.write(string) + +# must redirect stderr *before* first import of unittest +sys.stdout = sys.stderr = Tee('alltests.out') + +import UnitTestFolder + + +if __name__ == '__main__': + path, script = os.path.split(sys.argv[0]) + suite = UnitTestFolder.loadModulesFromFolder(path, 'test_', subfolders=1) + UnitTestFolder.main(suite) + finish = time.time() + print 'Elapsed time: %.3f seconds' % (finish - start) diff --git a/test/difflib.py b/test/difflib.py new file mode 100644 index 000000000..a41d4d5ba --- /dev/null +++ b/test/difflib.py @@ -0,0 +1,1089 @@ +#! /usr/bin/env python + +""" +Module difflib -- helpers for computing deltas between objects. + +Function get_close_matches(word, possibilities, n=3, cutoff=0.6): + Use SequenceMatcher to return list of the best "good enough" matches. + +Function ndiff(a, b): + Return a delta: the difference between `a` and `b` (lists of strings). + +Function restore(delta, which): + Return one of the two sequences that generated an ndiff delta. + +Class SequenceMatcher: + A flexible class for comparing pairs of sequences of any type. + +Class Differ: + For producing human-readable deltas from sequences of lines of text. 
+""" + +__all__ = ['get_close_matches', 'ndiff', 'restore', 'SequenceMatcher', + 'Differ'] + +TRACE = 0 + +class SequenceMatcher: + + """ + SequenceMatcher is a flexible class for comparing pairs of sequences of + any type, so long as the sequence elements are hashable. The basic + algorithm predates, and is a little fancier than, an algorithm + published in the late 1980's by Ratcliff and Obershelp under the + hyperbolic name "gestalt pattern matching". The basic idea is to find + the longest contiguous matching subsequence that contains no "junk" + elements (R-O doesn't address junk). The same idea is then applied + recursively to the pieces of the sequences to the left and to the right + of the matching subsequence. This does not yield minimal edit + sequences, but does tend to yield matches that "look right" to people. + + SequenceMatcher tries to compute a "human-friendly diff" between two + sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the + longest *contiguous* & junk-free matching subsequence. That's what + catches peoples' eyes. The Windows(tm) windiff has another interesting + notion, pairing up elements that appear uniquely in each sequence. + That, and the method here, appear to yield more intuitive difference + reports than does diff. This method appears to be the least vulnerable + to synching up on blocks of "junk lines", though (like blank lines in + ordinary text files, or maybe "<P>" lines in HTML files). That may be + because this is the only method of the 3 that has a *concept* of + "junk" <wink>. + + Example, comparing two strings, and considering blanks to be "junk": + + >>> s = SequenceMatcher(lambda x: x == " ", + ... "private Thread currentThread;", + ... "private volatile Thread currentThread;") + >>> + + .ratio() returns a float in [0, 1], measuring the "similarity" of the + sequences. 
As a rule of thumb, a .ratio() value over 0.6 means the + sequences are close matches: + + >>> print round(s.ratio(), 3) + 0.866 + >>> + + If you're only interested in where the sequences match, + .get_matching_blocks() is handy: + + >>> for block in s.get_matching_blocks(): + ... print "a[%d] and b[%d] match for %d elements" % block + a[0] and b[0] match for 8 elements + a[8] and b[17] match for 6 elements + a[14] and b[23] match for 15 elements + a[29] and b[38] match for 0 elements + + Note that the last tuple returned by .get_matching_blocks() is always a + dummy, (len(a), len(b), 0), and this is the only case in which the last + tuple element (number of elements matched) is 0. + + If you want to know how to change the first sequence into the second, + use .get_opcodes(): + + >>> for opcode in s.get_opcodes(): + ... print "%6s a[%d:%d] b[%d:%d]" % opcode + equal a[0:8] b[0:8] + insert a[8:8] b[8:17] + equal a[8:14] b[17:23] + equal a[14:29] b[23:38] + + See the Differ class for a fancy human-friendly file differencer, which + uses SequenceMatcher both to compare sequences of lines, and to compare + sequences of characters within similar (near-matching) lines. + + See also function get_close_matches() in this module, which shows how + simple code building on SequenceMatcher can be used to do useful work. + + Timing: Basic R-O is cubic time worst case and quadratic time expected + case. SequenceMatcher is quadratic time for the worst case and has + expected-case behavior dependent in a complicated way on how many + elements the sequences have in common; best case time is linear. + + Methods: + + __init__(isjunk=None, a='', b='') + Construct a SequenceMatcher. + + set_seqs(a, b) + Set the two sequences to be compared. + + set_seq1(a) + Set the first sequence to be compared. + + set_seq2(b) + Set the second sequence to be compared. + + find_longest_match(alo, ahi, blo, bhi) + Find longest matching block in a[alo:ahi] and b[blo:bhi]. 
+ + get_matching_blocks() + Return list of triples describing matching subsequences. + + get_opcodes() + Return list of 5-tuples describing how to turn a into b. + + ratio() + Return a measure of the sequences' similarity (float in [0,1]). + + quick_ratio() + Return an upper bound on .ratio() relatively quickly. + + real_quick_ratio() + Return an upper bound on ratio() very quickly. + """ + + def __init__(self, isjunk=None, a='', b=''): + """Construct a SequenceMatcher. + + Optional arg isjunk is None (the default), or a one-argument + function that takes a sequence element and returns true iff the + element is junk. None is equivalent to passing "lambda x: 0", i.e. + no elements are considered to be junk. For example, pass + lambda x: x in " \\t" + if you're comparing lines as sequences of characters, and don't + want to synch up on blanks or hard tabs. + + Optional arg a is the first of two sequences to be compared. By + default, an empty string. The elements of a must be hashable. See + also .set_seqs() and .set_seq1(). + + Optional arg b is the second of two sequences to be compared. By + default, an empty string. The elements of b must be hashable. See + also .set_seqs() and .set_seq2(). + """ + + # Members: + # a + # first sequence + # b + # second sequence; differences are computed as "what do + # we need to do to 'a' to change it into 'b'?" 
+ # b2j + # for x in b, b2j[x] is a list of the indices (into b) + # at which x appears; junk elements do not appear + # b2jhas + # b2j.has_key + # fullbcount + # for x in b, fullbcount[x] == the number of times x + # appears in b; only materialized if really needed (used + # only for computing quick_ratio()) + # matching_blocks + # a list of (i, j, k) triples, where a[i:i+k] == b[j:j+k]; + # ascending & non-overlapping in i and in j; terminated by + # a dummy (len(a), len(b), 0) sentinel + # opcodes + # a list of (tag, i1, i2, j1, j2) tuples, where tag is + # one of + # 'replace' a[i1:i2] should be replaced by b[j1:j2] + # 'delete' a[i1:i2] should be deleted + # 'insert' b[j1:j2] should be inserted + # 'equal' a[i1:i2] == b[j1:j2] + # isjunk + # a user-supplied function taking a sequence element and + # returning true iff the element is "junk" -- this has + # subtle but helpful effects on the algorithm, which I'll + # get around to writing up someday <0.9 wink>. + # DON'T USE! Only __chain_b uses this. Use isbjunk. + # isbjunk + # for x in b, isbjunk(x) == isjunk(x) but much faster; + # it's really the has_key method of a hidden dict. + # DOES NOT WORK for x in a! + + self.isjunk = isjunk + self.a = self.b = None + self.set_seqs(a, b) + + def set_seqs(self, a, b): + """Set the two sequences to be compared. + + >>> s = SequenceMatcher() + >>> s.set_seqs("abcd", "bcde") + >>> s.ratio() + 0.75 + """ + + self.set_seq1(a) + self.set_seq2(b) + + def set_seq1(self, a): + """Set the first sequence to be compared. + + The second sequence to be compared is not changed. + + >>> s = SequenceMatcher(None, "abcd", "bcde") + >>> s.ratio() + 0.75 + >>> s.set_seq1("bcde") + >>> s.ratio() + 1.0 + >>> + + SequenceMatcher computes and caches detailed information about the + second sequence, so if you want to compare one sequence S against + many sequences, use .set_seq2(S) once and call .set_seq1(x) + repeatedly for each of the other sequences. + + See also set_seqs() and set_seq2(). 
+ """ + + if a is self.a: + return + self.a = a + self.matching_blocks = self.opcodes = None + + def set_seq2(self, b): + """Set the second sequence to be compared. + + The first sequence to be compared is not changed. + + >>> s = SequenceMatcher(None, "abcd", "bcde") + >>> s.ratio() + 0.75 + >>> s.set_seq2("abcd") + >>> s.ratio() + 1.0 + >>> + + SequenceMatcher computes and caches detailed information about the + second sequence, so if you want to compare one sequence S against + many sequences, use .set_seq2(S) once and call .set_seq1(x) + repeatedly for each of the other sequences. + + See also set_seqs() and set_seq1(). + """ + + if b is self.b: + return + self.b = b + self.matching_blocks = self.opcodes = None + self.fullbcount = None + self.__chain_b() + + # For each element x in b, set b2j[x] to a list of the indices in + # b where x appears; the indices are in increasing order; note that + # the number of times x appears in b is len(b2j[x]) ... + # when self.isjunk is defined, junk elements don't show up in this + # map at all, which stops the central find_longest_match method + # from starting any matching block at a junk element ... + # also creates the fast isbjunk function ... + # note that this is only called when b changes; so for cross-product + # kinds of matches, it's best to call set_seq2 once, then set_seq1 + # repeatedly + + def __chain_b(self): + # Because isjunk is a user-defined (not C) function, and we test + # for junk a LOT, it's important to minimize the number of calls. + # Before the tricks described here, __chain_b was by far the most + # time-consuming routine in the whole module! If anyone sees + # Jim Roskind, thank him again for profile.py -- I never would + # have guessed that. + # The first trick is to build b2j ignoring the possibility + # of junk. I.e., we don't call isjunk at all yet. Throwing + # out the junk later is much cheaper than building b2j "right" + # from the start. 
+ b = self.b + self.b2j = b2j = {} + self.b2jhas = b2jhas = b2j.has_key + for i in xrange(len(b)): + elt = b[i] + if b2jhas(elt): + b2j[elt].append(i) + else: + b2j[elt] = [i] + + # Now b2j.keys() contains elements uniquely, and especially when + # the sequence is a string, that's usually a good deal smaller + # than len(string). The difference is the number of isjunk calls + # saved. + isjunk, junkdict = self.isjunk, {} + if isjunk: + for elt in b2j.keys(): + if isjunk(elt): + junkdict[elt] = 1 # value irrelevant; it's a set + del b2j[elt] + + # Now for x in b, isjunk(x) == junkdict.has_key(x), but the + # latter is much faster. Note too that while there may be a + # lot of junk in the sequence, the number of *unique* junk + # elements is probably small. So the memory burden of keeping + # this dict alive is likely trivial compared to the size of b2j. + self.isbjunk = junkdict.has_key + + def find_longest_match(self, alo, ahi, blo, bhi): + """Find longest matching block in a[alo:ahi] and b[blo:bhi]. + + If isjunk is not defined: + + Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where + alo <= i <= i+k <= ahi + blo <= j <= j+k <= bhi + and for all (i',j',k') meeting those conditions, + k >= k' + i <= i' + and if i == i', j <= j' + + In other words, of all maximal matching blocks, return one that + starts earliest in a, and of all those maximal matching blocks that + start earliest in a, return the one that starts earliest in b. + + >>> s = SequenceMatcher(None, " abcd", "abcd abcd") + >>> s.find_longest_match(0, 5, 0, 9) + (0, 4, 5) + + If isjunk is defined, first the longest matching block is + determined as above, but with the additional restriction that no + junk element appears in the block. Then that block is extended as + far as possible by matching (only) junk elements on both sides. So + the resulting block never matches on junk except as identical junk + happens to be adjacent to an "interesting" match. 
+ + Here's the same example as before, but considering blanks to be + junk. That prevents " abcd" from matching the " abcd" at the tail + end of the second sequence directly. Instead only the "abcd" can + match, and matches the leftmost "abcd" in the second sequence: + + >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd") + >>> s.find_longest_match(0, 5, 0, 9) + (1, 0, 4) + + If no blocks match, return (alo, blo, 0). + + >>> s = SequenceMatcher(None, "ab", "c") + >>> s.find_longest_match(0, 2, 0, 1) + (0, 0, 0) + """ + + # CAUTION: stripping common prefix or suffix would be incorrect. + # E.g., + # ab + # acab + # Longest matching block is "ab", but if common prefix is + # stripped, it's "a" (tied with "b"). UNIX(tm) diff does so + # strip, so ends up claiming that ab is changed to acab by + # inserting "ca" in the middle. That's minimal but unintuitive: + # "it's obvious" that someone inserted "ac" at the front. + # Windiff ends up at the same place as diff, but by pairing up + # the unique 'b's and then matching the first two 'a's. + + a, b, b2j, isbjunk = self.a, self.b, self.b2j, self.isbjunk + besti, bestj, bestsize = alo, blo, 0 + # find longest junk-free match + # during an iteration of the loop, j2len[j] = length of longest + # junk-free match ending with a[i-1] and b[j] + j2len = {} + nothing = [] + for i in xrange(alo, ahi): + # look at all instances of a[i] in b; note that because + # b2j has no junk keys, the loop is skipped if a[i] is junk + j2lenget = j2len.get + newj2len = {} + for j in b2j.get(a[i], nothing): + # a[i] matches b[j] + if j < blo: + continue + if j >= bhi: + break + k = newj2len[j] = j2lenget(j-1, 0) + 1 + if k > bestsize: + besti, bestj, bestsize = i-k+1, j-k+1, k + j2len = newj2len + + # Now that we have a wholly interesting match (albeit possibly + # empty!), we may as well suck up the matching junk on each + # side of it too. 
Can't think of a good reason not to, and it + # saves post-processing the (possibly considerable) expense of + # figuring out what to do with it. In the case of an empty + # interesting match, this is clearly the right thing to do, + # because no other kind of match is possible in the regions. + while besti > alo and bestj > blo and \ + isbjunk(b[bestj-1]) and \ + a[besti-1] == b[bestj-1]: + besti, bestj, bestsize = besti-1, bestj-1, bestsize+1 + while besti+bestsize < ahi and bestj+bestsize < bhi and \ + isbjunk(b[bestj+bestsize]) and \ + a[besti+bestsize] == b[bestj+bestsize]: + bestsize = bestsize + 1 + + if TRACE: + print "get_matching_blocks", alo, ahi, blo, bhi + print " returns", besti, bestj, bestsize + return besti, bestj, bestsize + + def get_matching_blocks(self): + """Return list of triples describing matching subsequences. + + Each triple is of the form (i, j, n), and means that + a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in + i and in j. + + The last triple is a dummy, (len(a), len(b), 0), and is the only + triple with n==0. 
+ + >>> s = SequenceMatcher(None, "abxcd", "abcd") + >>> s.get_matching_blocks() + [(0, 0, 2), (3, 2, 2), (5, 4, 0)] + """ + + if self.matching_blocks is not None: + return self.matching_blocks + self.matching_blocks = [] + la, lb = len(self.a), len(self.b) + self.__helper(0, la, 0, lb, self.matching_blocks) + self.matching_blocks.append( (la, lb, 0) ) + if TRACE: + print '*** matching blocks', self.matching_blocks + return self.matching_blocks + + # builds list of matching blocks covering a[alo:ahi] and + # b[blo:bhi], appending them in increasing order to answer + + def __helper(self, alo, ahi, blo, bhi, answer): + i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi) + # a[alo:i] vs b[blo:j] unknown + # a[i:i+k] same as b[j:j+k] + # a[i+k:ahi] vs b[j+k:bhi] unknown + if k: + if alo < i and blo < j: + self.__helper(alo, i, blo, j, answer) + answer.append(x) + if i+k < ahi and j+k < bhi: + self.__helper(i+k, ahi, j+k, bhi, answer) + + def get_opcodes(self): + """Return list of 5-tuples describing how to turn a into b. + + Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple + has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the + tuple preceding it, and likewise for j1 == the previous j2. + + The tags are strings, with these meanings: + + 'replace': a[i1:i2] should be replaced by b[j1:j2] + 'delete': a[i1:i2] should be deleted. + Note that j1==j2 in this case. + 'insert': b[j1:j2] should be inserted at a[i1:i1]. + Note that i1==i2 in this case. + 'equal': a[i1:i2] == b[j1:j2] + + >>> a = "qabxcd" + >>> b = "abycdf" + >>> s = SequenceMatcher(None, a, b) + >>> for tag, i1, i2, j1, j2 in s.get_opcodes(): + ... print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" % + ... 
(tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2])) + delete a[0:1] (q) b[0:0] () + equal a[1:3] (ab) b[0:2] (ab) + replace a[3:4] (x) b[2:3] (y) + equal a[4:6] (cd) b[3:5] (cd) + insert a[6:6] () b[5:6] (f) + """ + + if self.opcodes is not None: + return self.opcodes + i = j = 0 + self.opcodes = answer = [] + for ai, bj, size in self.get_matching_blocks(): + # invariant: we've pumped out correct diffs to change + # a[:i] into b[:j], and the next matching block is + # a[ai:ai+size] == b[bj:bj+size]. So we need to pump + # out a diff to change a[i:ai] into b[j:bj], pump out + # the matching block, and move (i,j) beyond the match + tag = '' + if i < ai and j < bj: + tag = 'replace' + elif i < ai: + tag = 'delete' + elif j < bj: + tag = 'insert' + if tag: + answer.append( (tag, i, ai, j, bj) ) + i, j = ai+size, bj+size + # the list of matching blocks is terminated by a + # sentinel with size 0 + if size: + answer.append( ('equal', ai, i, bj, j) ) + return answer + + def ratio(self): + """Return a measure of the sequences' similarity (float in [0,1]). + + Where T is the total number of elements in both sequences, and + M is the number of matches, this is 2.0*M / T. + Note that this is 1 if the sequences are identical, and 0 if + they have nothing in common. + + .ratio() is expensive to compute if you haven't already computed + .get_matching_blocks() or .get_opcodes(), in which case you may + want to try .quick_ratio() or .real_quick_ratio() first to get an + upper bound. + + >>> s = SequenceMatcher(None, "abcd", "bcde") + >>> s.ratio() + 0.75 + >>> s.quick_ratio() + 0.75 + >>> s.real_quick_ratio() + 1.0 + """ + + matches = reduce(lambda sum, triple: sum + triple[-1], + self.get_matching_blocks(), 0) + return 2.0 * matches / (len(self.a) + len(self.b)) + + def quick_ratio(self): + """Return an upper bound on ratio() relatively quickly. + + This isn't defined beyond that it is an upper bound on .ratio(), and + is faster to compute. 
+ """ + + # viewing a and b as multisets, set matches to the cardinality + # of their intersection; this counts the number of matches + # without regard to order, so is clearly an upper bound + if self.fullbcount is None: + self.fullbcount = fullbcount = {} + for elt in self.b: + fullbcount[elt] = fullbcount.get(elt, 0) + 1 + fullbcount = self.fullbcount + # avail[x] is the number of times x appears in 'b' less the + # number of times we've seen it in 'a' so far ... kinda + avail = {} + availhas, matches = avail.has_key, 0 + for elt in self.a: + if availhas(elt): + numb = avail[elt] + else: + numb = fullbcount.get(elt, 0) + avail[elt] = numb - 1 + if numb > 0: + matches = matches + 1 + return 2.0 * matches / (len(self.a) + len(self.b)) + + def real_quick_ratio(self): + """Return an upper bound on ratio() very quickly. + + This isn't defined beyond that it is an upper bound on .ratio(), and + is faster to compute than either .ratio() or .quick_ratio(). + """ + + la, lb = len(self.a), len(self.b) + # can't have more matches than the number of elements in the + # shorter sequence + return 2.0 * min(la, lb) / (la + lb) + +def get_close_matches(word, possibilities, n=3, cutoff=0.6): + """Use SequenceMatcher to return list of the best "good enough" matches. + + word is a sequence for which close matches are desired (typically a + string). + + possibilities is a list of sequences against which to match word + (typically a list of strings). + + Optional arg n (default 3) is the maximum number of close matches to + return. n must be > 0. + + Optional arg cutoff (default 0.6) is a float in [0, 1]. Possibilities + that don't score at least that similar to word are ignored. + + The best (no more than n) matches among the possibilities are returned + in a list, sorted by similarity score, most similar first. 
+ + >>> get_close_matches("appel", ["ape", "apple", "peach", "puppy"]) + ['apple', 'ape'] + >>> import keyword as _keyword + >>> get_close_matches("wheel", _keyword.kwlist) + ['while'] + >>> get_close_matches("apple", _keyword.kwlist) + [] + >>> get_close_matches("accept", _keyword.kwlist) + ['except'] + """ + + if not n > 0: + raise ValueError("n must be > 0: " + `n`) + if not 0.0 <= cutoff <= 1.0: + raise ValueError("cutoff must be in [0.0, 1.0]: " + `cutoff`) + result = [] + s = SequenceMatcher() + s.set_seq2(word) + for x in possibilities: + s.set_seq1(x) + if s.real_quick_ratio() >= cutoff and \ + s.quick_ratio() >= cutoff and \ + s.ratio() >= cutoff: + result.append((s.ratio(), x)) + # Sort by score. + result.sort() + # Retain only the best n. + result = result[-n:] + # Move best-scorer to head of list. + result.reverse() + # Strip scores. + return [x for score, x in result] + + +def _count_leading(line, ch): + """ + Return number of `ch` characters at the start of `line`. + + Example: + + >>> _count_leading(' abc', ' ') + 3 + """ + + i, n = 0, len(line) + while i < n and line[i] == ch: + i += 1 + return i + +class Differ: + r""" + Differ is a class for comparing sequences of lines of text, and + producing human-readable differences or deltas. Differ uses + SequenceMatcher both to compare sequences of lines, and to compare + sequences of characters within similar (near-matching) lines. + + Each line of a Differ delta begins with a two-letter code: + + '- ' line unique to sequence 1 + '+ ' line unique to sequence 2 + ' ' line common to both sequences + '? ' line not present in either input sequence + + Lines beginning with '? ' attempt to guide the eye to intraline + differences, and were not present in either input sequence. These lines + can be confusing if the sequences contain tab characters. + + Note that Differ makes no claim to produce a *minimal* diff. 
To the + contrary, minimal diffs are often counter-intuitive, because they synch + up anywhere possible, sometimes accidental matches 100 pages apart. + Restricting synch points to contiguous matches preserves some notion of + locality, at the occasional cost of producing a longer diff. + + Example: Comparing two texts. + + First we set up the texts, sequences of individual single-line strings + ending with newlines (such sequences can also be obtained from the + `readlines()` method of file-like objects): + + >>> text1 = ''' 1. Beautiful is better than ugly. + ... 2. Explicit is better than implicit. + ... 3. Simple is better than complex. + ... 4. Complex is better than complicated. + ... '''.splitlines(1) + >>> len(text1) + 4 + >>> text1[0][-1] + '\n' + >>> text2 = ''' 1. Beautiful is better than ugly. + ... 3. Simple is better than complex. + ... 4. Complicated is better than complex. + ... 5. Flat is better than nested. + ... '''.splitlines(1) + + Next we instantiate a Differ object: + + >>> d = Differ() + + Note that when instantiating a Differ object we may pass functions to + filter out line and character 'junk'. See Differ.__init__ for details. + + Finally, we compare the two: + + >>> result = d.compare(text1, text2) + + 'result' is a list of strings, so let's pretty-print it: + + >>> from pprint import pprint as _pprint + >>> _pprint(result) + [' 1. Beautiful is better than ugly.\n', + '- 2. Explicit is better than implicit.\n', + '- 3. Simple is better than complex.\n', + '+ 3. Simple is better than complex.\n', + '? ++\n', + '- 4. Complex is better than complicated.\n', + '? ^ ---- ^\n', + '+ 4. Complicated is better than complex.\n', + '? ++++ ^ ^\n', + '+ 5. Flat is better than nested.\n'] + + As a single multi-line string it looks like this: + + >>> print ''.join(result), + 1. Beautiful is better than ugly. + - 2. Explicit is better than implicit. + - 3. Simple is better than complex. + + 3. Simple is better than complex. + ? ++ + - 4. 
Complex is better than complicated. + ? ^ ---- ^ + + 4. Complicated is better than complex. + ? ++++ ^ ^ + + 5. Flat is better than nested. + + Methods: + + __init__(linejunk=None, charjunk=None) + Construct a text differencer, with optional filters. + + compare(a, b) + Compare two sequences of lines; return the resulting delta (list). + """ + + def __init__(self, linejunk=None, charjunk=None): + """ + Construct a text differencer, with optional filters. + + The two optional keyword parameters are for filter functions: + + - `linejunk`: A function that should accept a single string argument, + and return true iff the string is junk. The module-level function + `IS_LINE_JUNK` may be used to filter out lines without visible + characters, except for at most one splat ('#'). + + - `charjunk`: A function that should accept a string of length 1. The + module-level function `IS_CHARACTER_JUNK` may be used to filter out + whitespace characters (a blank or tab; **note**: bad idea to include + newline in this!). + """ + + self.linejunk = linejunk + self.charjunk = charjunk + self.results = [] + + def compare(self, a, b): + r""" + Compare two sequences of lines; return the resulting delta (list). + + Each sequence must contain individual single-line strings ending with + newlines. Such sequences can be obtained from the `readlines()` method + of file-like objects. The list returned is also made up of + newline-terminated strings, ready to be used with the `writelines()` + method of a file-like object. + + Example: + + >>> print ''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(1), + ... 'ore\ntree\nemu\n'.splitlines(1))), + - one + ? ^ + + ore + ? ^ + - two + - three + ? 
- + + tree + + emu + """ + + cruncher = SequenceMatcher(self.linejunk, a, b) + for tag, alo, ahi, blo, bhi in cruncher.get_opcodes(): + if tag == 'replace': + self._fancy_replace(a, alo, ahi, b, blo, bhi) + elif tag == 'delete': + self._dump('-', a, alo, ahi) + elif tag == 'insert': + self._dump('+', b, blo, bhi) + elif tag == 'equal': + self._dump(' ', a, alo, ahi) + else: + raise ValueError, 'unknown tag ' + `tag` + results = self.results + self.results = [] + return results + + def _dump(self, tag, x, lo, hi): + """Store comparison results for a same-tagged range.""" + for i in xrange(lo, hi): + self.results.append('%s %s' % (tag, x[i])) + + def _plain_replace(self, a, alo, ahi, b, blo, bhi): + assert alo < ahi and blo < bhi + # dump the shorter block first -- reduces the burden on short-term + # memory if the blocks are of very different sizes + if bhi - blo < ahi - alo: + self._dump('+', b, blo, bhi) + self._dump('-', a, alo, ahi) + else: + self._dump('-', a, alo, ahi) + self._dump('+', b, blo, bhi) + + def _fancy_replace(self, a, alo, ahi, b, blo, bhi): + r""" + When replacing one block of lines with another, search the blocks + for *similar* lines; the best-matching pair (if any) is used as a + synch point, and intraline difference marking is done on the + similar pair. Lots of work, but often worth it. + + Example: + + >>> d = Differ() + >>> d._fancy_replace(['abcDefghiJkl\n'], 0, 1, ['abcdefGhijkl\n'], 0, 1) + >>> print ''.join(d.results), + - abcDefghiJkl + ? ^ ^ ^ + + abcdefGhijkl + ? 
^ ^ ^ + """ + + if TRACE: + self.results.append('*** _fancy_replace %s %s %s %s\n' + % (alo, ahi, blo, bhi)) + self._dump('>', a, alo, ahi) + self._dump('<', b, blo, bhi) + + # don't synch up unless the lines have a similarity score of at + # least cutoff; best_ratio tracks the best score seen so far + best_ratio, cutoff = 0.74, 0.75 + cruncher = SequenceMatcher(self.charjunk) + eqi, eqj = None, None # 1st indices of equal lines (if any) + + # search for the pair that matches best without being identical + # (identical lines must be junk lines, & we don't want to synch up + # on junk -- unless we have to) + for j in xrange(blo, bhi): + bj = b[j] + cruncher.set_seq2(bj) + for i in xrange(alo, ahi): + ai = a[i] + if ai == bj: + if eqi is None: + eqi, eqj = i, j + continue + cruncher.set_seq1(ai) + # computing similarity is expensive, so use the quick + # upper bounds first -- have seen this speed up messy + # compares by a factor of 3. + # note that ratio() is only expensive to compute the first + # time it's called on a sequence pair; the expensive part + # of the computation is cached by cruncher + if cruncher.real_quick_ratio() > best_ratio and \ + cruncher.quick_ratio() > best_ratio and \ + cruncher.ratio() > best_ratio: + best_ratio, best_i, best_j = cruncher.ratio(), i, j + if best_ratio < cutoff: + # no non-identical "pretty close" pair + if eqi is None: + # no identical pair either -- treat it as a straight replace + self._plain_replace(a, alo, ahi, b, blo, bhi) + return + # no close pair, but an identical pair -- synch up on that + best_i, best_j, best_ratio = eqi, eqj, 1.0 + else: + # there's a close pair, so forget the identical pair (if any) + eqi = None + + # a[best_i] very similar to b[best_j]; eqi is None iff they're not + # identical + if TRACE: + self.results.append('*** best_ratio %s %s %s\n' + % (best_ratio, best_i, best_j)) + self._dump('>', a, best_i, best_i+1) + self._dump('<', b, best_j, best_j+1) + + # pump out diffs from before the synch 
point + self._fancy_helper(a, alo, best_i, b, blo, best_j) + + # do intraline marking on the synch pair + aelt, belt = a[best_i], b[best_j] + if eqi is None: + # pump out a '-', '?', '+', '?' quad for the synched lines + atags = btags = "" + cruncher.set_seqs(aelt, belt) + for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes(): + la, lb = ai2 - ai1, bj2 - bj1 + if tag == 'replace': + atags += '^' * la + btags += '^' * lb + elif tag == 'delete': + atags += '-' * la + elif tag == 'insert': + btags += '+' * lb + elif tag == 'equal': + atags += ' ' * la + btags += ' ' * lb + else: + raise ValueError, 'unknown tag ' + `tag` + self._qformat(aelt, belt, atags, btags) + else: + # the synch pair is identical + self.results.append(' ' + aelt) + + # pump out diffs from after the synch point + self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi) + + def _fancy_helper(self, a, alo, ahi, b, blo, bhi): + if alo < ahi: + if blo < bhi: + self._fancy_replace(a, alo, ahi, b, blo, bhi) + else: + self._dump('-', a, alo, ahi) + elif blo < bhi: + self._dump('+', b, blo, bhi) + + def _qformat(self, aline, bline, atags, btags): + r""" + Format "?" output and deal with leading tabs. + + Example: + + >>> d = Differ() + >>> d._qformat('\tabcDefghiJkl\n', '\t\tabcdefGhijkl\n', + ... ' ^ ^ ^ ', '+ ^ ^ ^ ') + >>> for line in d.results: print repr(line) + ... + '- \tabcDefghiJkl\n' + '? \t ^ ^ ^\n' + '+ \t\tabcdefGhijkl\n' + '? \t ^ ^ ^\n' + """ + + # Can hurt, but will probably help most of the time. + common = min(_count_leading(aline, "\t"), + _count_leading(bline, "\t")) + common = min(common, _count_leading(atags[:common], " ")) + atags = atags[common:].rstrip() + btags = btags[common:].rstrip() + + self.results.append("- " + aline) + if atags: + self.results.append("? %s%s\n" % ("\t" * common, atags)) + + self.results.append("+ " + bline) + if btags: + self.results.append("? 
%s%s\n" % ("\t" * common, btags)) + +# With respect to junk, an earlier version of ndiff simply refused to +# *start* a match with a junk element. The result was cases like this: +# before: private Thread currentThread; +# after: private volatile Thread currentThread; +# If you consider whitespace to be junk, the longest contiguous match +# not starting with junk is "e Thread currentThread". So ndiff reported +# that "e volatil" was inserted between the 't' and the 'e' in "private". +# While an accurate view, to people that's absurd. The current version +# looks for matching blocks that are entirely junk-free, then extends the +# longest one of those as far as possible but only with matching junk. +# So now "currentThread" is matched, then extended to suck up the +# preceding blank; then "private" is matched, and extended to suck up the +# following blank; then "Thread" is matched; and finally ndiff reports +# that "volatile " was inserted before "Thread". The only quibble +# remaining is that perhaps it was really the case that " volatile" +# was inserted after "private". I can live with that <wink>. + +import re + +def IS_LINE_JUNK(line, pat=re.compile(r"\s*#?\s*$").match): + r""" + Return 1 for ignorable line: iff `line` is blank or contains a single '#'. + + Examples: + + >>> IS_LINE_JUNK('\n') + 1 + >>> IS_LINE_JUNK(' # \n') + 1 + >>> IS_LINE_JUNK('hello\n') + 0 + """ + + return pat(line) is not None + +def IS_CHARACTER_JUNK(ch, ws=" \t"): + r""" + Return 1 for ignorable character: iff `ch` is a space or tab. + + Examples: + + >>> IS_CHARACTER_JUNK(' ') + 1 + >>> IS_CHARACTER_JUNK('\t') + 1 + >>> IS_CHARACTER_JUNK('\n') + 0 + >>> IS_CHARACTER_JUNK('x') + 0 + """ + + return ch in ws + +del re + +def ndiff(a, b, linejunk=IS_LINE_JUNK, charjunk=IS_CHARACTER_JUNK): + r""" + Compare `a` and `b` (lists of strings); return a `Differ`-style delta. 
+ + Optional keyword parameters `linejunk` and `charjunk` are for filter + functions (or None): + + - linejunk: A function that should accept a single string argument, and + return true iff the string is junk. The default is module-level function + IS_LINE_JUNK, which filters out lines without visible characters, except + for at most one splat ('#'). + + - charjunk: A function that should accept a string of length 1. The + default is module-level function IS_CHARACTER_JUNK, which filters out + whitespace characters (a blank or tab; note: bad idea to include newline + in this!). + + Tools/scripts/ndiff.py is a command-line front-end to this function. + + Example: + + >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1), + ... 'ore\ntree\nemu\n'.splitlines(1)) + >>> print ''.join(diff), + - one + ? ^ + + ore + ? ^ + - two + - three + ? - + + tree + + emu + """ + return Differ(linejunk, charjunk).compare(a, b) + +def restore(delta, which): + r""" + Return one of the two sequences that generated a delta. + + Given a `delta` produced by `Differ.compare()` or `ndiff()`, extract + lines originating from file 1 or 2 (parameter `which`), stripping off line + prefixes. + + Examples: + + >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1), + ... 'ore\ntree\nemu\n'.splitlines(1)) + >>> print ''.join(restore(diff, 1)), + one + two + three + >>> print ''.join(restore(diff, 2)), + ore + tree + emu + """ + try: + tag = {1: "- ", 2: "+ "}[int(which)] + except KeyError: + raise ValueError, ('unknown delta choice (must be 1 or 2): %r' + % which) + prefixes = (" ", tag) + results = [] + for line in delta: + if line[:2] in prefixes: + results.append(line[2:]) + return results + +def _test(): + import doctest, difflib + return doctest.testmod(difflib) + +if __name__ == "__main__": + _test() diff --git a/test/test_nodes.py b/test/test_nodes.py new file mode 100755 index 000000000..15e633357 --- /dev/null +++ b/test/test_nodes.py @@ -0,0 +1,83 @@ +#! 
/usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Test module for nodes.py. +""" + +import unittest +from DocutilsTestSupport import nodes + +debug = 0 + + +class TextTests(unittest.TestCase): + + def setUp(self): + self.text = nodes.Text('Line 1.\nLine 2.') + + def test_repr(self): + self.assertEquals(repr(self.text), r"<#text: 'Line 1.\nLine 2.'>") + + def test_str(self): + self.assertEquals(str(self.text), 'Line 1.\nLine 2.') + + def test_asdom(self): + dom = self.text.asdom() + self.assertEquals(dom.toxml(), 'Line 1.\nLine 2.') + dom.unlink() + + def test_astext(self): + self.assertEquals(self.text.astext(), 'Line 1.\nLine 2.') + + def test_pformat(self): + self.assertEquals(self.text.pformat(), 'Line 1.\nLine 2.\n') + + +class ElementTests(unittest.TestCase): + + def test_empty(self): + element = nodes.Element() + self.assertEquals(repr(element), '<Element: >') + self.assertEquals(str(element), '<Element/>') + dom = element.asdom() + self.assertEquals(dom.toxml(), '<Element/>') + dom.unlink() + element['attr'] = '1' + self.assertEquals(repr(element), '<Element: >') + self.assertEquals(str(element), '<Element attr="1"/>') + dom = element.asdom() + self.assertEquals(dom.toxml(), '<Element attr="1"/>') + dom.unlink() + self.assertEquals(element.pformat(), '<Element attr="1">\n') + + def test_withtext(self): + element = nodes.Element('text\nmore', nodes.Text('text\nmore')) + self.assertEquals(repr(element), r"<Element: <#text: 'text\nmore'>>") + self.assertEquals(str(element), '<Element>text\nmore</Element>') + dom = element.asdom() + self.assertEquals(dom.toxml(), '<Element>text\nmore</Element>') + dom.unlink() + element['attr'] = '1' + self.assertEquals(repr(element), r"<Element: <#text: 'text\nmore'>>") + self.assertEquals(str(element), + '<Element attr="1">text\nmore</Element>') + dom = element.asdom() + 
self.assertEquals(dom.toxml(), + '<Element attr="1">text\nmore</Element>') + dom.unlink() + self.assertEquals(element.pformat(), +"""\ +<Element attr="1"> + text + more +""") + + +if __name__ == '__main__': + unittest.main() diff --git a/test/test_parsers/test_rst/test_TableParser.py b/test/test_parsers/test_rst/test_TableParser.py new file mode 100755 index 000000000..ed6083d50 --- /dev/null +++ b/test/test_parsers/test_rst/test_TableParser.py @@ -0,0 +1,197 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.TableParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['tables'] = [ +["""\ ++-------------------------------------+ +| A table with one cell and one line. | ++-------------------------------------+ +""", +[(0, 0, 2, 38, ['A table with one cell and one line.'])], +([37], + [], + [[(0, 0, 1, ['A table with one cell and one line.'])]])], +["""\ ++--------------+--------------+ +| A table with | two columns. | ++--------------+--------------+ +""", +[(0, 0, 2, 15, ['A table with']), + (0, 15, 2, 30, ['two columns.'])], +([14, 14], + [], + [[(0, 0, 1, ['A table with']), + (0, 0, 1, ['two columns.'])]])], +["""\ ++--------------+-------------+ +| A table with | two columns | ++--------------+-------------+ +| and | two rows. | ++--------------+-------------+ +""", +[(0, 0, 2, 15, ['A table with']), + (0, 15, 2, 29, ['two columns']), + (2, 0, 4, 15, ['and']), + (2, 15, 4, 29, ['two rows.'])], +([14, 13], + [], + [[(0, 0, 1, ['A table with']), + (0, 0, 1, ['two columns'])], + [(0, 0, 3, ['and']), + (0, 0, 3, ['two rows.'])]])], +["""\ ++--------------------------+ +| A table with three rows, | ++------------+-------------+ +| and two | columns. 
| ++------------+-------------+ +| First and last rows | +| contain column spans. | ++--------------------------+ +""", +[(0, 0, 2, 27, ['A table with three rows,']), + (2, 0, 4, 13, ['and two']), + (2, 13, 4, 27, ['columns.']), + (4, 0, 7, 27, ['First and last rows', 'contain column spans.'])], +([12, 13], + [], + [[(0, 1, 1, ['A table with three rows,']), + None], + [(0, 0, 3, ['and two']), + (0, 0, 3, ['columns.'])], + [(0, 1, 5, ['First and last rows', 'contain column spans.']), + None]])], +["""\ ++------------+-------------+---------------+ +| A table | two rows in | and row spans | +| with three +-------------+ to left and | +| columns, | the middle, | right. | ++------------+-------------+---------------+ +""", +[(0, 0, 4, 13, ['A table', 'with three', 'columns,']), + (0, 13, 2, 27, ['two rows in']), + (0, 27, 4, 43, ['and row spans', 'to left and', 'right.']), + (2, 13, 4, 27, ['the middle,'])], +([12, 13, 15], + [], + [[(1, 0, 1, ['A table', 'with three', 'columns,']), + (0, 0, 1, ['two rows in']), + (1, 0, 1, ['and row spans', 'to left and', 'right.'])], + [None, + (0, 0, 3, ['the middle,']), + None]])], +["""\ ++------------+-------------+---------------+ +| A table | | two rows in | and funny | +| with 3 +--+-------------+-+ stuff. 
| +| columns, | the middle, | | | ++------------+-------------+---------------+ +""", +[(0, 0, 4, 13, ['A table |', 'with 3 +--', 'columns,']), + (0, 13, 2, 27, ['two rows in']), + (0, 27, 4, 43, [' and funny', '-+ stuff.', ' |']), + (2, 13, 4, 27, ['the middle,'])], +([12, 13, 15], + [], + [[(1, 0, 1, ['A table |', 'with 3 +--', 'columns,']), + (0, 0, 1, ['two rows in']), + (1, 0, 1, [' and funny', '-+ stuff.', ' |'])], + [None, + (0, 0, 3, ['the middle,']), + None]])], +["""\ ++-----------+-------------------------+ +| W/NW cell | N/NE cell | +| +-------------+-----------+ +| | Middle cell | E/SE cell | ++-----------+-------------+ | +| S/SE cell | | ++-------------------------+-----------+ +""", +[(0, 0, 4, 12, ['W/NW cell', '', '']), + (0, 12, 2, 38, ['N/NE cell']), + (2, 12, 4, 26, ['Middle cell']), + (2, 26, 6, 38, ['E/SE cell', '', '']), + (4, 0, 6, 26, ['S/SE cell'])], +([11, 13, 11], + [], + [[(1, 0, 1, ['W/NW cell', '', '']), + (0, 1, 1, ['N/NE cell']), + None], + [None, + (0, 0, 3, ['Middle cell']), + (1, 0, 3, ['E/SE cell', '', ''])], + [(0, 1, 5, ['S/SE cell']), + None, + None]])], +["""\ ++--------------+-------------+ +| A bad table. | | ++--------------+ | +| Cells must be rectangles. | ++----------------------------+ +""", +'TableMarkupError: Malformed table; parse incomplete.', +'TableMarkupError: Malformed table; parse incomplete.'], +["""\ ++-------------------------------+ +| A table with two header rows, | ++------------+------------------+ +| the first | with a span. | ++============+==================+ +| Two body | rows, | ++------------+------------------+ +| the second with a span. 
| ++-------------------------------+ +""", +[(0, 0, 2, 32, ['A table with two header rows,']), + (2, 0, 4, 13, ['the first']), + (2, 13, 4, 32, ['with a span.']), + (4, 0, 6, 13, ['Two body']), + (4, 13, 6, 32, ['rows,']), + (6, 0, 8, 32, ['the second with a span.'])], +([12, 18], + [[(0, 1, 1, ['A table with two header rows,']), + None], + [(0, 0, 3, ['the first']), + (0, 0, 3, ['with a span.'])]], + [[(0, 0, 5, ['Two body']), + (0, 0, 5, ['rows,'])], + [(0, 1, 7, ['the second with a span.']), + None]])], +["""\ ++-------------------------------+ +| A table with two head/body | ++=============+=================+ +| row | separators. | ++=============+=================+ +| That's bad. | | ++-------------+-----------------+ +""", +'TableMarkupError: Multiple head/body row separators in table ' +'(at line offset 2 and 4); only one allowed.', +'TableMarkupError: Multiple head/body row separators in table ' +'(at line offset 2 and 4); only one allowed.'], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_block_quotes.py b/test/test_parsers/test_rst/test_block_quotes.py new file mode 100755 index 000000000..e047d0e92 --- /dev/null +++ b/test/test_parsers/test_rst/test_block_quotes.py @@ -0,0 +1,124 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['block_quotes'] = [ +["""\ +Line 1. +Line 2. + + Indented. +""", +"""\ +<document> + <paragraph> + Line 1. + Line 2. + <block_quote> + <paragraph> + Indented. +"""], +["""\ +Line 1. +Line 2. + + Indented 1. + + Indented 2. +""", +"""\ +<document> + <paragraph> + Line 1. + Line 2. 
+ <block_quote> + <paragraph> + Indented 1. + <block_quote> + <paragraph> + Indented 2. +"""], +["""\ +Line 1. +Line 2. + Unexpectedly indented. +""", +"""\ +<document> + <paragraph> + Line 1. + Line 2. + <system_message level="3" type="ERROR"> + <paragraph> + Unexpected indentation at line 3. + <block_quote> + <paragraph> + Unexpectedly indented. +"""], +["""\ +Line 1. +Line 2. + + Indented. +no blank line +""", +"""\ +<document> + <paragraph> + Line 1. + Line 2. + <block_quote> + <paragraph> + Indented. + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 5. + <paragraph> + no blank line +"""], +["""\ +Here is a paragraph. + + Indent 8 spaces. + + Indent 4 spaces. + +Is this correct? Should it generate a warning? +Yes, it is correct, no warning necessary. +""", +"""\ +<document> + <paragraph> + Here is a paragraph. + <block_quote> + <block_quote> + <paragraph> + Indent 8 spaces. + <paragraph> + Indent 4 spaces. + <paragraph> + Is this correct? Should it generate a warning? + Yes, it is correct, no warning necessary. +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_bullet_lists.py b/test/test_parsers/test_rst/test_bullet_lists.py new file mode 100755 index 000000000..b9552042e --- /dev/null +++ b/test/test_parsers/test_rst/test_bullet_lists.py @@ -0,0 +1,181 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. 
+""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['bullet_lists'] = [ +["""\ +- item +""", +"""\ +<document> + <bullet_list bullet="-"> + <list_item> + <paragraph> + item +"""], +["""\ +* item 1 + +* item 2 +""", +"""\ +<document> + <bullet_list bullet="*"> + <list_item> + <paragraph> + item 1 + <list_item> + <paragraph> + item 2 +"""], +["""\ +No blank line between: + ++ item 1 ++ item 2 +""", +"""\ +<document> + <paragraph> + No blank line between: + <bullet_list bullet="+"> + <list_item> + <paragraph> + item 1 + <list_item> + <paragraph> + item 2 +"""], +["""\ +- item 1, para 1. + + item 1, para 2. + +- item 2 +""", +"""\ +<document> + <bullet_list bullet="-"> + <list_item> + <paragraph> + item 1, para 1. + <paragraph> + item 1, para 2. + <list_item> + <paragraph> + item 2 +"""], +["""\ +- item 1, line 1 + item 1, line 2 +- item 2 +""", +"""\ +<document> + <bullet_list bullet="-"> + <list_item> + <paragraph> + item 1, line 1 + item 1, line 2 + <list_item> + <paragraph> + item 2 +"""], +["""\ +Different bullets: + +- item 1 + ++ item 2 + +* item 3 +- item 4 +""", +"""\ +<document> + <paragraph> + Different bullets: + <bullet_list bullet="-"> + <list_item> + <paragraph> + item 1 + <bullet_list bullet="+"> + <list_item> + <paragraph> + item 2 + <bullet_list bullet="*"> + <list_item> + <paragraph> + item 3 + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 8. + <bullet_list bullet="-"> + <list_item> + <paragraph> + item 4 +"""], +["""\ +- item +no blank line +""", +"""\ +<document> + <bullet_list bullet="-"> + <list_item> + <paragraph> + item + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 2. 
+ <paragraph> + no blank line +"""], +["""\ +- + +empty item above +""", +"""\ +<document> + <bullet_list bullet="-"> + <list_item> + <paragraph> + empty item above +"""], +["""\ +- +empty item above, no blank line +""", +"""\ +<document> + <bullet_list bullet="-"> + <list_item> + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 2. + <paragraph> + empty item above, no blank line +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_citations.py b/test/test_parsers/test_rst/test_citations.py new file mode 100755 index 000000000..15568c1fd --- /dev/null +++ b/test/test_parsers/test_rst/test_citations.py @@ -0,0 +1,139 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['citations'] = [ +["""\ +.. [citation] This is a citation. +""", +"""\ +<document> + <citation id="citation" name="citation"> + <label> + citation + <paragraph> + This is a citation. +"""], +["""\ +.. [citation1234] This is a citation with year. +""", +"""\ +<document> + <citation id="citation1234" name="citation1234"> + <label> + citation1234 + <paragraph> + This is a citation with year. +"""], +["""\ +.. [citation] This is a citation + on multiple lines. +""", +"""\ +<document> + <citation id="citation" name="citation"> + <label> + citation + <paragraph> + This is a citation + on multiple lines. +"""], +["""\ +.. [citation1] This is a citation + on multiple lines with more space. + +.. [citation2] This is a citation + on multiple lines with less space. 
+""", +"""\ +<document> + <citation id="citation1" name="citation1"> + <label> + citation1 + <paragraph> + This is a citation + on multiple lines with more space. + <citation id="citation2" name="citation2"> + <label> + citation2 + <paragraph> + This is a citation + on multiple lines with less space. +"""], +["""\ +.. [citation] + This is a citation on multiple lines + whose block starts on line 2. +""", +"""\ +<document> + <citation id="citation" name="citation"> + <label> + citation + <paragraph> + This is a citation on multiple lines + whose block starts on line 2. +"""], +["""\ +.. [citation] + +That was an empty citation. +""", +"""\ +<document> + <citation id="citation" name="citation"> + <label> + citation + <paragraph> + That was an empty citation. +"""], +["""\ +.. [citation] +No blank line. +""", +"""\ +<document> + <citation id="citation" name="citation"> + <label> + citation + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 2. + <paragraph> + No blank line. +"""], +["""\ +.. [citation label with spaces] this isn't a citation + +.. [*citationlabelwithmarkup*] this isn't a citation +""", +"""\ +<document> + <comment> + [citation label with spaces] this isn't a citation + <comment> + [*citationlabelwithmarkup*] this isn't a citation +"""], +] + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_comments.py b/test/test_parsers/test_rst/test_comments.py new file mode 100755 index 000000000..4e2e9db0d --- /dev/null +++ b/test/test_parsers/test_rst/test_comments.py @@ -0,0 +1,238 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. 
+""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['comments'] = [ +["""\ +.. A comment + +Paragraph. +""", +"""\ +<document> + <comment> + A comment + <paragraph> + Paragraph. +"""], +["""\ +.. A comment + block. + +Paragraph. +""", +"""\ +<document> + <comment> + A comment + block. + <paragraph> + Paragraph. +"""], +["""\ +.. + A comment consisting of multiple lines + starting on the line after the + explicit markup start. +""", +"""\ +<document> + <comment> + A comment consisting of multiple lines + starting on the line after the + explicit markup start. +"""], +["""\ +.. A comment. +.. Another. + +Paragraph. +""", +"""\ +<document> + <comment> + A comment. + <comment> + Another. + <paragraph> + Paragraph. +"""], +["""\ +.. A comment +no blank line + +Paragraph. +""", +"""\ +<document> + <comment> + A comment + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 2. + <paragraph> + no blank line + <paragraph> + Paragraph. +"""], +["""\ +.. A comment:: + +Paragraph. +""", +"""\ +<document> + <comment> + A comment:: + <paragraph> + Paragraph. +"""], +["""\ +.. Next is an empty comment, which serves to end this comment and + prevents the following block quote being swallowed up. + +.. + + A block quote. +""", +"""\ +<document> + <comment> + Next is an empty comment, which serves to end this comment and + prevents the following block quote being swallowed up. + <comment> + <block_quote> + <paragraph> + A block quote. +"""], +["""\ +term 1 + definition 1 + + .. a comment + +term 2 + definition 2 +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + term 1 + <definition> + <paragraph> + definition 1 + <comment> + a comment + <definition_list_item> + <term> + term 2 + <definition> + <paragraph> + definition 2 +"""], +["""\ +term 1 + definition 1 + +.. 
a comment + +term 2 + definition 2 +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + term 1 + <definition> + <paragraph> + definition 1 + <comment> + a comment + <definition_list> + <definition_list_item> + <term> + term 2 + <definition> + <paragraph> + definition 2 +"""], +["""\ ++ bullet paragraph 1 + + bullet paragraph 2 + + .. comment between bullet paragraphs 2 and 3 + + bullet paragraph 3 +""", +"""\ +<document> + <bullet_list bullet="+"> + <list_item> + <paragraph> + bullet paragraph 1 + <paragraph> + bullet paragraph 2 + <comment> + comment between bullet paragraphs 2 and 3 + <paragraph> + bullet paragraph 3 +"""], +["""\ ++ bullet paragraph 1 + + .. comment between bullet paragraphs 1 (leader) and 2 + + bullet paragraph 2 +""", +"""\ +<document> + <bullet_list bullet="+"> + <list_item> + <paragraph> + bullet paragraph 1 + <comment> + comment between bullet paragraphs 1 (leader) and 2 + <paragraph> + bullet paragraph 2 +"""], +["""\ ++ bullet + + .. trailing comment +""", +"""\ +<document> + <bullet_list bullet="+"> + <list_item> + <paragraph> + bullet + <comment> + trailing comment +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_definition_lists.py b/test/test_parsers/test_rst/test_definition_lists.py new file mode 100755 index 000000000..daafd0f92 --- /dev/null +++ b/test/test_parsers/test_rst/test_definition_lists.py @@ -0,0 +1,317 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. 
+""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['definition_lists'] = [ +["""\ +term + definition +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + term + <definition> + <paragraph> + definition +"""], +["""\ +term + definition + +paragraph +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + term + <definition> + <paragraph> + definition + <paragraph> + paragraph +"""], +["""\ +term + definition +no blank line +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + term + <definition> + <paragraph> + definition + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 3. + <paragraph> + no blank line +"""], +["""\ +A paragraph:: + A literal block without a blank line first? +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + A paragraph:: + <definition> + <system_message level="1" type="INFO"> + <paragraph> + Blank line missing before literal block? Interpreted as a definition list item. At line 2. + <paragraph> + A literal block without a blank line first? 
+"""], +["""\ +term 1 + definition 1 + +term 2 + definition 2 +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + term 1 + <definition> + <paragraph> + definition 1 + <definition_list_item> + <term> + term 2 + <definition> + <paragraph> + definition 2 +"""], +["""\ +term 1 + definition 1 (no blank line below) +term 2 + definition 2 +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + term 1 + <definition> + <paragraph> + definition 1 (no blank line below) + <definition_list_item> + <term> + term 2 + <definition> + <paragraph> + definition 2 +"""], +["""\ +term 1 + definition 1 + + term 1a + definition 1a + + term 1b + definition 1b + +term 2 + definition 2 + +paragraph +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + term 1 + <definition> + <paragraph> + definition 1 + <definition_list> + <definition_list_item> + <term> + term 1a + <definition> + <paragraph> + definition 1a + <definition_list_item> + <term> + term 1b + <definition> + <paragraph> + definition 1b + <definition_list_item> + <term> + term 2 + <definition> + <paragraph> + definition 2 + <paragraph> + paragraph +"""], +["""\ +Term : classifier + The ' : ' indicates a classifier in + definition list item terms only. +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + Term + <classifier> + classifier + <definition> + <paragraph> + The ' : ' indicates a classifier in + definition list item terms only. +"""], +["""\ +Term: not a classifier + Because there's no space before the colon. +Term :not a classifier + Because there's no space after the colon. +Term \: not a classifier + Because the colon is escaped. +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + Term: not a classifier + <definition> + <paragraph> + Because there's no space before the colon. 
+ <definition_list_item> + <term> + Term :not a classifier + <definition> + <paragraph> + Because there's no space after the colon. + <definition_list_item> + <term> + Term : not a classifier + <definition> + <paragraph> + Because the colon is escaped. +"""], +["""\ +Term `with *inline ``text **errors : classifier `with *errors ``too + Definition `with *inline ``text **markup errors. +""", +"""\ +<document> + <definition_list> + <definition_list_item> + <term> + Term \n\ + <problematic id="id2" refid="id1"> + ` + with \n\ + <problematic id="id4" refid="id3"> + * + inline \n\ + <problematic id="id6" refid="id5"> + `` + text \n\ + <problematic id="id8" refid="id7"> + ** + errors + <classifier> + classifier \n\ + <problematic id="id10" refid="id9"> + ` + with \n\ + <problematic id="id12" refid="id11"> + * + errors \n\ + <problematic id="id14" refid="id13"> + `` + too + <definition> + <system_message backrefs="id2" id="id1" level="2" type="WARNING"> + <paragraph> + Inline interpreted text or phrase reference start-string without end-string at line 1. + <system_message backrefs="id4" id="id3" level="2" type="WARNING"> + <paragraph> + Inline emphasis start-string without end-string at line 1. + <system_message backrefs="id6" id="id5" level="2" type="WARNING"> + <paragraph> + Inline literal start-string without end-string at line 1. + <system_message backrefs="id8" id="id7" level="2" type="WARNING"> + <paragraph> + Inline strong start-string without end-string at line 1. + <system_message backrefs="id10" id="id9" level="2" type="WARNING"> + <paragraph> + Inline interpreted text or phrase reference start-string without end-string at line 1. + <system_message backrefs="id12" id="id11" level="2" type="WARNING"> + <paragraph> + Inline emphasis start-string without end-string at line 1. + <system_message backrefs="id14" id="id13" level="2" type="WARNING"> + <paragraph> + Inline literal start-string without end-string at line 1. 
+ <paragraph> + Definition \n\ + <problematic id="id16" refid="id15"> + ` + with \n\ + <problematic id="id18" refid="id17"> + * + inline \n\ + <problematic id="id20" refid="id19"> + `` + text \n\ + <problematic id="id22" refid="id21"> + ** + markup errors. + <system_message backrefs="id16" id="id15" level="2" type="WARNING"> + <paragraph> + Inline interpreted text or phrase reference start-string without end-string at line 2. + <system_message backrefs="id18" id="id17" level="2" type="WARNING"> + <paragraph> + Inline emphasis start-string without end-string at line 2. + <system_message backrefs="id20" id="id19" level="2" type="WARNING"> + <paragraph> + Inline literal start-string without end-string at line 2. + <system_message backrefs="id22" id="id21" level="2" type="WARNING"> + <paragraph> + Inline strong start-string without end-string at line 2. +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_directives/test_admonitions.py b/test/test_parsers/test_rst/test_directives/test_admonitions.py new file mode 100755 index 000000000..f231967e0 --- /dev/null +++ b/test/test_parsers/test_rst/test_directives/test_admonitions.py @@ -0,0 +1,117 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for admonitions.py directives. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['admonitions'] = [ +["""\ +.. Attention:: Directives at large. + +.. Note:: This is a note. + +.. Tip:: 15% if the + service is good. + +.. Hint:: It's bigger than a bread box. + +- .. WARNING:: Strong prose may provoke extreme mental exertion. + Reader discretion is strongly advised. +- .. Error:: Does not compute. + +.. 
Caution:: + + Don't take any wooden nickels. + +.. DANGER:: Mad scientist at work! + +.. Important:: + - Wash behind your ears. + - Clean up your room. + - Call your mother. + - Back up your data. +""", +"""\ +<document> + <attention> + <paragraph> + Directives at large. + <note> + <paragraph> + This is a note. + <tip> + <paragraph> + 15% if the + service is good. + <hint> + <paragraph> + It's bigger than a bread box. + <bullet_list bullet="-"> + <list_item> + <warning> + <paragraph> + Strong prose may provoke extreme mental exertion. + Reader discretion is strongly advised. + <list_item> + <error> + <paragraph> + Does not compute. + <caution> + <paragraph> + Don't take any wooden nickels. + <danger> + <paragraph> + Mad scientist at work! + <important> + <bullet_list bullet="-"> + <list_item> + <paragraph> + Wash behind your ears. + <list_item> + <paragraph> + Clean up your room. + <list_item> + <paragraph> + Call your mother. + <list_item> + <paragraph> + Back up your data. +"""], +["""\ +.. note:: One-line notes. +.. note:: One after the other. +.. note:: No blank lines in-between. +""", +"""\ +<document> + <note> + <paragraph> + One-line notes. + <note> + <paragraph> + One after the other. + <note> + <paragraph> + No blank lines in-between. +"""], +] + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_directives/test_contents.py b/test/test_parsers/test_rst/test_directives/test_contents.py new file mode 100755 index 000000000..0f394103d --- /dev/null +++ b/test/test_parsers/test_rst/test_directives/test_contents.py @@ -0,0 +1,166 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for components.py contents directives. 
+""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['contents'] = [ +["""\ +.. contents:: +""", +"""\ +<document> + <pending> + .. internal attributes: + .transform: docutils.transforms.components.Contents + .stage: 'last_reader' + .details: + title: None +"""], +["""\ +.. contents:: Table of Contents +""", +"""\ +<document> + <pending> + .. internal attributes: + .transform: docutils.transforms.components.Contents + .stage: 'last_reader' + .details: + title: + <title> + Table of Contents +"""], +["""\ +.. contents:: + Table of Contents +""", +"""\ +<document> + <pending> + .. internal attributes: + .transform: docutils.transforms.components.Contents + .stage: 'last_reader' + .details: + title: + <title> + Table of Contents +"""], +["""\ +.. contents:: Table + of + Contents +""", +"""\ +<document> + <pending> + .. internal attributes: + .transform: docutils.transforms.components.Contents + .stage: 'last_reader' + .details: + title: + <title> + Table of Contents +"""], +["""\ +.. contents:: *Table* of ``Contents`` +""", +"""\ +<document> + <pending> + .. internal attributes: + .transform: docutils.transforms.components.Contents + .stage: 'last_reader' + .details: + title: + <title> + <emphasis> + Table + of + <literal> + Contents +"""], +["""\ +.. contents:: + :depth: 2 + :local: +""", +"""\ +<document> + <pending> + .. internal attributes: + .transform: docutils.transforms.components.Contents + .stage: 'last_reader' + .details: + depth: 2 + local: None + title: None +"""], +["""\ +.. contents:: Table of Contents + :local: + :depth: 2 +""", +"""\ +<document> + <pending> + .. internal attributes: + .transform: docutils.transforms.components.Contents + .stage: 'last_reader' + .details: + depth: 2 + local: None + title: + <title> + Table of Contents +"""], +["""\ +.. 
contents:: + :depth: two +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Error in "contents" directive attributes at line 1: + invalid attribute value: + (attribute "depth", value "'two'") invalid literal for int(): two. + <literal_block> + .. contents:: + :depth: two +"""], +["""\ +.. contents:: + :width: 2 +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Error in "contents" directive attributes at line 1: + unknown attribute: "width". + <literal_block> + .. contents:: + :width: 2 +"""], +] + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_directives/test_figures.py b/test/test_parsers/test_rst/test_directives/test_figures.py new file mode 100755 index 000000000..290584a21 --- /dev/null +++ b/test/test_parsers/test_rst/test_directives/test_figures.py @@ -0,0 +1,286 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for images.py figure directives. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['figures'] = [ +["""\ +.. figure:: picture.png +""", +"""\ +<document> + <figure> + <image uri="picture.png"> +"""], +["""\ +.. figure:: not an image URI +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Image URI at line 1 contains whitespace. + <literal_block> + .. figure:: not an image URI +"""], +["""\ +.. figure:: picture.png + + A picture with a caption. +""", +"""\ +<document> + <figure> + <image uri="picture.png"> + <caption> + A picture with a caption. +"""], +["""\ +.. figure:: picture.png + + - A picture with an invalid caption. 
+""", +"""\ +<document> + <figure> + <image uri="picture.png"> + <system_message level="3" type="ERROR"> + <paragraph> + Figure caption must be a paragraph or empty comment. + <literal_block> + .. figure:: picture.png + \n\ + - A picture with an invalid caption. +"""], +["""\ +.. figure:: not an image URI + + And a caption. +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Image URI at line 1 contains whitespace. + <literal_block> + .. figure:: not an image URI + + And a caption. +"""], +["""\ +.. figure:: picture.png + + .. + + A picture with a legend but no caption. +""", +"""\ +<document> + <figure> + <image uri="picture.png"> + <legend> + <paragraph> + A picture with a legend but no caption. +"""], +["""\ +.. Figure:: picture.png + :height: 100 + :width: 200 + :scale: 50 + + A picture with image attributes and a caption. +""", +"""\ +<document> + <figure> + <image height="100" scale="50" uri="picture.png" width="200"> + <caption> + A picture with image attributes and a caption. +"""], +["""\ +.. Figure:: picture.png + :height: 100 + :alt: alternate text + :width: 200 + :scale: 50 + + A picture with image attributes on individual lines, and this caption. +""", +"""\ +<document> + <figure> + <image alt="alternate text" height="100" scale="50" uri="picture.png" width="200"> + <caption> + A picture with image attributes on individual lines, and this caption. +"""], +["""\ +This figure lacks a caption. It may still have a +"Figure 1."-style caption appended in the output. + +.. figure:: picture.png +""", +"""\ +<document> + <paragraph> + This figure lacks a caption. It may still have a + "Figure 1."-style caption appended in the output. + <figure> + <image uri="picture.png"> +"""], +["""\ +.. figure:: picture.png + + A picture with a caption and a legend. + + +-----------------------+-----------------------+ + | Symbol | Meaning | + +=======================+=======================+ + | .. 
image:: tent.png | Campground | + +-----------------------+-----------------------+ + | .. image:: waves.png | Lake | + +-----------------------+-----------------------+ + | .. image:: peak.png | Mountain | + +-----------------------+-----------------------+ +""", +"""\ +<document> + <figure> + <image uri="picture.png"> + <caption> + A picture with a caption and a legend. + <legend> + <table> + <tgroup cols="2"> + <colspec colwidth="23"> + <colspec colwidth="23"> + <thead> + <row> + <entry> + <paragraph> + Symbol + <entry> + <paragraph> + Meaning + <tbody> + <row> + <entry> + <image uri="tent.png"> + <entry> + <paragraph> + Campground + <row> + <entry> + <image uri="waves.png"> + <entry> + <paragraph> + Lake + <row> + <entry> + <image uri="peak.png"> + <entry> + <paragraph> + Mountain +"""], +["""\ +.. figure:: picture.png + + .. + + A picture with a legend but no caption. + (The empty comment replaces the caption, which must + be a single paragraph.) +""", +"""\ +<document> + <figure> + <image uri="picture.png"> + <legend> + <paragraph> + A picture with a legend but no caption. + (The empty comment replaces the caption, which must + be a single paragraph.) +"""], +["""\ +Testing for line-leaks: + +.. figure:: picture.png + + A picture with a caption. +.. figure:: picture.png + + A picture with a caption. +.. figure:: picture.png + + A picture with a caption. +.. figure:: picture.png +.. figure:: picture.png +.. figure:: picture.png +.. figure:: picture.png + + A picture with a caption. + +.. figure:: picture.png + +.. figure:: picture.png + + A picture with a caption. + +.. figure:: picture.png +""", +"""\ +<document> + <paragraph> + Testing for line-leaks: + <figure> + <image uri="picture.png"> + <caption> + A picture with a caption. + <figure> + <image uri="picture.png"> + <caption> + A picture with a caption. + <figure> + <image uri="picture.png"> + <caption> + A picture with a caption. 
+ <figure> + <image uri="picture.png"> + <figure> + <image uri="picture.png"> + <figure> + <image uri="picture.png"> + <figure> + <image uri="picture.png"> + <caption> + A picture with a caption. + <figure> + <image uri="picture.png"> + <figure> + <image uri="picture.png"> + <caption> + A picture with a caption. + <figure> + <image uri="picture.png"> +"""], +] + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_directives/test_images.py b/test/test_parsers/test_rst/test_directives/test_images.py new file mode 100755 index 000000000..c97dec1ea --- /dev/null +++ b/test/test_parsers/test_rst/test_directives/test_images.py @@ -0,0 +1,233 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for images.py image directives. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['images'] = [ +["""\ +.. image:: picture.png +""", +"""\ +<document> + <image uri="picture.png"> +"""], +["""\ +.. image:: +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Missing image URI argument at line 1. + <literal_block> + .. image:: +"""], +["""\ +.. image:: one two three +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Image URI at line 1 contains whitespace. + <literal_block> + .. image:: one two three +"""], +["""\ +.. image:: picture.png + :height: 100 + :width: 200 + :scale: 50 +""", +"""\ +<document> + <image height="100" scale="50" uri="picture.png" width="200"> +"""], +["""\ +.. image:: + picture.png + :height: 100 + :width: 200 + :scale: 50 +""", +"""\ +<document> + <image height="100" scale="50" uri="picture.png" width="200"> +"""], +["""\ +.. 
image:: + :height: 100 + :width: 200 + :scale: 50 +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Missing image URI argument at line 1. + <literal_block> + .. image:: + :height: 100 + :width: 200 + :scale: 50 +"""], +["""\ +.. image:: a/very/long/path/to/ + picture.png + :height: 100 + :width: 200 + :scale: 50 +""", +"""\ +<document> + <image height="100" scale="50" uri="a/very/long/path/to/picture.png" width="200"> +"""], +["""\ +.. image:: picture.png + :height: 100 + :width: 200 + :scale: 50 + :alt: Alternate text for the picture +""", +"""\ +<document> + <image alt="Alternate text for the picture" height="100" scale="50" uri="picture.png" width="200"> +"""], +["""\ +.. image:: picture.png + :scale: - 50 +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Error in "image" directive attributes at line 1: + invalid attribute data: extension attribute field body may contain + a single paragraph only (attribute "scale"). + <literal_block> + .. image:: picture.png + :scale: - 50 +"""], +["""\ +.. image:: picture.png + :scale: +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Error in "image" directive attributes at line 1: + invalid attribute value: + (attribute "scale", value "None") object can't be converted to int. + <literal_block> + .. image:: picture.png + :scale: +"""], +["""\ +.. image:: picture.png + :scale 50 +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Error in "image" directive attributes at line 1: + invalid attribute block. + <literal_block> + .. image:: picture.png + :scale 50 +"""], +["""\ +.. image:: picture.png + scale: 50 +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Image URI at line 1 contains whitespace. + <literal_block> + .. image:: picture.png + scale: 50 +"""], +["""\ +.. 
image:: picture.png + :: 50 +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Error in "image" directive attributes at line 1: + invalid attribute block. + <literal_block> + .. image:: picture.png + :: 50 +"""], +["""\ +.. image:: picture.png + :sale: 50 +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Error in "image" directive attributes at line 1: + unknown attribute: "sale". + <literal_block> + .. image:: picture.png + :sale: 50 +"""], +["""\ +.. image:: picture.png + :scale: fifty +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Error in "image" directive attributes at line 1: + invalid attribute value: + (attribute "scale", value "'fifty'") invalid literal for int(): fifty. + <literal_block> + .. image:: picture.png + :scale: fifty +"""], +["""\ +.. image:: picture.png + :scale: 50 + :scale: 50 +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Error in "image" directive attributes at line 1: + invalid attribute data: duplicate attribute "scale". + <literal_block> + .. image:: picture.png + :scale: 50 + :scale: 50 +"""], +] + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_directives/test_meta.py b/test/test_parsers/test_rst/test_directives/test_meta.py new file mode 100755 index 000000000..20b5a10f1 --- /dev/null +++ b/test/test_parsers/test_rst/test_directives/test_meta.py @@ -0,0 +1,141 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for html.py meta directives. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['meta'] = [ +["""\ +.. 
meta:: + :description: The reStructuredText plaintext markup language + :keywords: plaintext,markup language +""", +"""\ +<document> + <meta content="The reStructuredText plaintext markup language" name="description"> + <meta content="plaintext,markup language" name="keywords"> +"""], +["""\ +.. meta:: + :description lang=en: An amusing story + :description lang=fr: Un histoire amusant +""", +"""\ +<document> + <meta content="An amusing story" lang="en" name="description"> + <meta content="Un histoire amusant" lang="fr" name="description"> +"""], +["""\ +.. meta:: + :http-equiv=Content-Type: text/html; charset=ISO-8859-1 +""", +"""\ +<document> + <meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"> +"""], +["""\ +.. meta:: + :name: content + over multiple lines +""", +"""\ +<document> + <meta content="content over multiple lines" name="name"> +"""], +["""\ +Paragraph + +.. meta:: + :name: content +""", +"""\ +<document> + <paragraph> + Paragraph + <meta content="content" name="name"> +"""], +["""\ +.. meta:: +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Empty meta directive at line 1. +"""], +["""\ +.. meta:: + :empty: +""", +"""\ +<document> + <system_message level="1" type="INFO"> + <paragraph> + No content for meta tag "empty". + <literal_block> + :empty: + <meta content="" name="empty"> +"""], +["""\ +.. meta:: + not a field list +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Invalid meta directive at line 2. + <literal_block> + .. meta:: + not a field list +"""], +["""\ +.. meta:: + :name: content + not a field +""", +"""\ +<document> + <meta content="content" name="name"> + <system_message level="3" type="ERROR"> + <paragraph> + Invalid meta directive at line 3. + <literal_block> + .. meta:: + :name: content + not a field +"""], +["""\ +.. 
meta:: + :name notattval: content +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Error parsing meta tag attribute "notattval": missing "=" + <literal_block> + :name notattval: content + <meta content="content" name="name"> +"""], +] + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_directives/test_test_directives.py b/test/test_parsers/test_rst/test_directives/test_test_directives.py new file mode 100755 index 000000000..6a6434b1b --- /dev/null +++ b/test/test_parsers/test_rst/test_directives/test_test_directives.py @@ -0,0 +1,109 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for misc.py test directives. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['test_directives'] = [ +["""\ +.. reStructuredText-test-directive:: + +Paragraph. +""", +"""\ +<document> + <system_message level="1" type="INFO"> + <paragraph> + Directive processed. Type="reStructuredText-test-directive", data="", directive block: None + <paragraph> + Paragraph. +"""], +["""\ +.. reStructuredText-test-directive:: argument + +Paragraph. +""", +"""\ +<document> + <system_message level="1" type="INFO"> + <paragraph> + Directive processed. Type="reStructuredText-test-directive", data="argument", directive block: None + <paragraph> + Paragraph. +"""], +["""\ +.. reStructuredText-test-directive:: + + Directive block contains one paragraph, with a blank line before. + +Paragraph. +""", +"""\ +<document> + <system_message level="1" type="INFO"> + <paragraph> + Directive processed. 
Type="reStructuredText-test-directive", data="", directive block: + <literal_block> + Directive block contains one paragraph, with a blank line before. + <paragraph> + Paragraph. +"""], +["""\ +.. reStructuredText-test-directive:: + Directive block contains one paragraph, no blank line before. + +Paragraph. +""", +"""\ +<document> + <system_message level="1" type="INFO"> + <paragraph> + Directive processed. Type="reStructuredText-test-directive", data="", directive block: + <literal_block> + Directive block contains one paragraph, no blank line before. + <paragraph> + Paragraph. +"""], +["""\ +.. reStructuredText-test-directive:: + block +no blank line. + +Paragraph. +""", +"""\ +<document> + <system_message level="1" type="INFO"> + <paragraph> + Directive processed. Type="reStructuredText-test-directive", data="", directive block: + <literal_block> + block + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 3. + <paragraph> + no blank line. + <paragraph> + Paragraph. +"""], +] + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_directives/test_unknown.py b/test/test_parsers/test_rst/test_directives/test_unknown.py new file mode 100755 index 000000000..67935d652 --- /dev/null +++ b/test/test_parsers/test_rst/test_directives/test_unknown.py @@ -0,0 +1,55 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for unknown directives. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['unknown'] = [ +["""\ +.. reStructuredText-unknown-directive:: + +.. reStructuredText-unknown-directive:: argument + +.. 
reStructuredText-unknown-directive:: + block +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Unknown directive type "reStructuredText-unknown-directive" at line 1. + <literal_block> + .. reStructuredText-unknown-directive:: + <system_message level="3" type="ERROR"> + <paragraph> + Unknown directive type "reStructuredText-unknown-directive" at line 3. + <literal_block> + .. reStructuredText-unknown-directive:: argument + <system_message level="3" type="ERROR"> + <paragraph> + Unknown directive type "reStructuredText-unknown-directive" at line 5. + <literal_block> + .. reStructuredText-unknown-directive:: + block +"""], +] + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_doctest_blocks.py b/test/test_parsers/test_rst/test_doctest_blocks.py new file mode 100755 index 000000000..869b6f187 --- /dev/null +++ b/test/test_parsers/test_rst/test_doctest_blocks.py @@ -0,0 +1,74 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['doctest_blocks'] = [ +["""\ +Paragraph. + +>>> print "Doctest block." +Doctest block. + +Paragraph. +""", +"""\ +<document> + <paragraph> + Paragraph. + <doctest_block> + >>> print "Doctest block." + Doctest block. + <paragraph> + Paragraph. +"""], +["""\ +Paragraph. + +>>> print " Indented output." + Indented output. +""", +"""\ +<document> + <paragraph> + Paragraph. + <doctest_block> + >>> print " Indented output." + Indented output. +"""], +["""\ +Paragraph. + + >>> print " Indented block & output." + Indented block & output. +""", +"""\ +<document> + <paragraph> + Paragraph. 
+ <block_quote> + <doctest_block> + >>> print " Indented block & output." + Indented block & output. +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_enumerated_lists.py b/test/test_parsers/test_rst/test_enumerated_lists.py new file mode 100755 index 000000000..ff2c0cda9 --- /dev/null +++ b/test/test_parsers/test_rst/test_enumerated_lists.py @@ -0,0 +1,662 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['enumerated_lists'] = [ +["""\ +1. Item one. + +2. Item two. + +3. Item three. +""", +"""\ +<document> + <enumerated_list enumtype="arabic" prefix="" suffix="."> + <list_item> + <paragraph> + Item one. + <list_item> + <paragraph> + Item two. + <list_item> + <paragraph> + Item three. +"""], +["""\ +No blank lines betwen items: + +1. Item one. +2. Item two. +3. Item three. +""", +"""\ +<document> + <paragraph> + No blank lines betwen items: + <enumerated_list enumtype="arabic" prefix="" suffix="."> + <list_item> + <paragraph> + Item one. + <list_item> + <paragraph> + Item two. + <list_item> + <paragraph> + Item three. +"""], +["""\ +1. +empty item above, no blank line +""", +"""\ +<document> + <enumerated_list enumtype="arabic" prefix="" suffix="."> + <list_item> + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 2. + <paragraph> + empty item above, no blank line +"""], +["""\ +Scrambled: + +3. Item three. +2. Item two. +1. Item one. 
+""", +"""\ +<document> + <paragraph> + Scrambled: + <system_message level="1" type="INFO"> + <paragraph> + Enumerated list start value not ordinal-1 at line 3: '3' (ordinal 3) + <enumerated_list enumtype="arabic" prefix="" start="3" suffix="."> + <list_item> + <paragraph> + Item three. + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 4. + <system_message level="1" type="INFO"> + <paragraph> + Enumerated list start value not ordinal-1 at line 4: '2' (ordinal 2) + <enumerated_list enumtype="arabic" prefix="" start="2" suffix="."> + <list_item> + <paragraph> + Item two. + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 5. + <enumerated_list enumtype="arabic" prefix="" suffix="."> + <list_item> + <paragraph> + Item one. +"""], +["""\ +Skipping item 3: + +1. Item 1. +2. Item 2. +4. Item 4. +""", +"""\ +<document> + <paragraph> + Skipping item 3: + <enumerated_list enumtype="arabic" prefix="" suffix="."> + <list_item> + <paragraph> + Item 1. + <list_item> + <paragraph> + Item 2. + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 4. + <system_message level="1" type="INFO"> + <paragraph> + Enumerated list start value not ordinal-1 at line 5: '4' (ordinal 4) + <enumerated_list enumtype="arabic" prefix="" start="4" suffix="."> + <list_item> + <paragraph> + Item 4. +"""], +["""\ +Start with non-ordinal-1: + +0. Item zero. +1. Item one. +2. Item two. +3. Item three. + +And again: + +2. Item two. +3. Item three. +""", +"""\ +<document> + <paragraph> + Start with non-ordinal-1: + <system_message level="1" type="INFO"> + <paragraph> + Enumerated list start value not ordinal-1 at line 3: '0' (ordinal 0) + <enumerated_list enumtype="arabic" prefix="" start="0" suffix="."> + <list_item> + <paragraph> + Item zero. + <list_item> + <paragraph> + Item one. + <list_item> + <paragraph> + Item two. + <list_item> + <paragraph> + Item three. 
+ <paragraph> + And again: + <system_message level="1" type="INFO"> + <paragraph> + Enumerated list start value not ordinal-1 at line 10: '2' (ordinal 2) + <enumerated_list enumtype="arabic" prefix="" start="2" suffix="."> + <list_item> + <paragraph> + Item two. + <list_item> + <paragraph> + Item three. +"""], +["""\ +1. Item one: line 1, + line 2. +2. Item two: line 1, + line 2. +3. Item three: paragraph 1, line 1, + line 2. + + Paragraph 2. +""", +"""\ +<document> + <enumerated_list enumtype="arabic" prefix="" suffix="."> + <list_item> + <paragraph> + Item one: line 1, + line 2. + <list_item> + <paragraph> + Item two: line 1, + line 2. + <list_item> + <paragraph> + Item three: paragraph 1, line 1, + line 2. + <paragraph> + Paragraph 2. +"""], +["""\ +Different enumeration sequences: + +1. Item 1. +2. Item 2. +3. Item 3. + +A. Item A. +B. Item B. +C. Item C. + +a. Item a. +b. Item b. +c. Item c. + +I. Item I. +II. Item II. +III. Item III. + +i. Item i. +ii. Item ii. +iii. Item iii. +""", +"""\ +<document> + <paragraph> + Different enumeration sequences: + <enumerated_list enumtype="arabic" prefix="" suffix="."> + <list_item> + <paragraph> + Item 1. + <list_item> + <paragraph> + Item 2. + <list_item> + <paragraph> + Item 3. + <enumerated_list enumtype="upperalpha" prefix="" suffix="."> + <list_item> + <paragraph> + Item A. + <list_item> + <paragraph> + Item B. + <list_item> + <paragraph> + Item C. + <enumerated_list enumtype="loweralpha" prefix="" suffix="."> + <list_item> + <paragraph> + Item a. + <list_item> + <paragraph> + Item b. + <list_item> + <paragraph> + Item c. + <enumerated_list enumtype="upperroman" prefix="" suffix="."> + <list_item> + <paragraph> + Item I. + <list_item> + <paragraph> + Item II. + <list_item> + <paragraph> + Item III. + <enumerated_list enumtype="lowerroman" prefix="" suffix="."> + <list_item> + <paragraph> + Item i. + <list_item> + <paragraph> + Item ii. + <list_item> + <paragraph> + Item iii. +"""], +["""\ +Bad Roman numerals: + +i. 
i +ii. ii +iii. iii +iiii. iiii + +(I) I +(IVXLCDM) IVXLCDM +""", +"""\ +<document> + <paragraph> + Bad Roman numerals: + <enumerated_list enumtype="lowerroman" prefix="" suffix="."> + <list_item> + <paragraph> + i + <list_item> + <paragraph> + ii + <list_item> + <paragraph> + iii + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 4. + <system_message level="3" type="ERROR"> + <paragraph> + Enumerated list start value invalid at line 6: 'iiii' (sequence 'lowerroman') + <block_quote> + <paragraph> + iiii + <enumerated_list enumtype="upperroman" prefix="(" suffix=")"> + <list_item> + <paragraph> + I + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 9. + <system_message level="3" type="ERROR"> + <paragraph> + Enumerated list start value invalid at line 9: 'IVXLCDM' (sequence 'upperroman') + <block_quote> + <paragraph> + IVXLCDM +"""], +["""\ +Potentially ambiguous cases: + +A. Item A. +B. Item B. +C. Item C. + +I. Item I. +II. Item II. +III. Item III. + +a. Item a. +b. Item b. +c. Item c. + +i. Item i. +ii. Item ii. +iii. Item iii. + +Phew! Safe! +""", +"""\ +<document> + <paragraph> + Potentially ambiguous cases: + <enumerated_list enumtype="upperalpha" prefix="" suffix="."> + <list_item> + <paragraph> + Item A. + <list_item> + <paragraph> + Item B. + <list_item> + <paragraph> + Item C. + <enumerated_list enumtype="upperroman" prefix="" suffix="."> + <list_item> + <paragraph> + Item I. + <list_item> + <paragraph> + Item II. + <list_item> + <paragraph> + Item III. + <enumerated_list enumtype="loweralpha" prefix="" suffix="."> + <list_item> + <paragraph> + Item a. + <list_item> + <paragraph> + Item b. + <list_item> + <paragraph> + Item c. + <enumerated_list enumtype="lowerroman" prefix="" suffix="."> + <list_item> + <paragraph> + Item i. + <list_item> + <paragraph> + Item ii. + <list_item> + <paragraph> + Item iii. + <paragraph> + Phew! Safe! 
+"""], +["""\ +Definitely ambiguous: + +A. Item A. +B. Item B. +C. Item C. +D. Item D. +E. Item E. +F. Item F. +G. Item G. +H. Item H. +I. Item I. +II. Item II. +III. Item III. + +a. Item a. +b. Item b. +c. Item c. +d. Item d. +e. Item e. +f. Item f. +g. Item g. +h. Item h. +i. Item i. +ii. Item ii. +iii. Item iii. +""", +"""\ +<document> + <paragraph> + Definitely ambiguous: + <enumerated_list enumtype="upperalpha" prefix="" suffix="."> + <list_item> + <paragraph> + Item A. + <list_item> + <paragraph> + Item B. + <list_item> + <paragraph> + Item C. + <list_item> + <paragraph> + Item D. + <list_item> + <paragraph> + Item E. + <list_item> + <paragraph> + Item F. + <list_item> + <paragraph> + Item G. + <list_item> + <paragraph> + Item H. + <list_item> + <paragraph> + Item I. + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 4. + <system_message level="1" type="INFO"> + <paragraph> + Enumerated list start value not ordinal-1 at line 12: 'II' (ordinal 2) + <enumerated_list enumtype="upperroman" prefix="" start="2" suffix="."> + <list_item> + <paragraph> + Item II. + <list_item> + <paragraph> + Item III. + <enumerated_list enumtype="loweralpha" prefix="" suffix="."> + <list_item> + <paragraph> + Item a. + <list_item> + <paragraph> + Item b. + <list_item> + <paragraph> + Item c. + <list_item> + <paragraph> + Item d. + <list_item> + <paragraph> + Item e. + <list_item> + <paragraph> + Item f. + <list_item> + <paragraph> + Item g. + <list_item> + <paragraph> + Item h. + <list_item> + <paragraph> + Item i. + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 16. + <system_message level="1" type="INFO"> + <paragraph> + Enumerated list start value not ordinal-1 at line 24: 'ii' (ordinal 2) + <enumerated_list enumtype="lowerroman" prefix="" start="2" suffix="."> + <list_item> + <paragraph> + Item ii. + <list_item> + <paragraph> + Item iii. +"""], +["""\ +Different enumeration formats: + +1. 
Item 1. +2. Item 2. +3. Item 3. + +1) Item 1). +2) Item 2). +3) Item 3). + +(1) Item (1). +(2) Item (2). +(3) Item (3). +""", +"""\ +<document> + <paragraph> + Different enumeration formats: + <enumerated_list enumtype="arabic" prefix="" suffix="."> + <list_item> + <paragraph> + Item 1. + <list_item> + <paragraph> + Item 2. + <list_item> + <paragraph> + Item 3. + <enumerated_list enumtype="arabic" prefix="" suffix=")"> + <list_item> + <paragraph> + Item 1). + <list_item> + <paragraph> + Item 2). + <list_item> + <paragraph> + Item 3). + <enumerated_list enumtype="arabic" prefix="(" suffix=")"> + <list_item> + <paragraph> + Item (1). + <list_item> + <paragraph> + Item (2). + <list_item> + <paragraph> + Item (3). +"""], +["""\ +Nested enumerated lists: + +1. Item 1. + + A) Item A). + B) Item B). + C) Item C). + +2. Item 2. + + (a) Item (a). + + I) Item I). + II) Item II). + III) Item III). + + (b) Item (b). + + (c) Item (c). + + (i) Item (i). + (ii) Item (ii). + (iii) Item (iii). + +3. Item 3. +""", +"""\ +<document> + <paragraph> + Nested enumerated lists: + <enumerated_list enumtype="arabic" prefix="" suffix="."> + <list_item> + <paragraph> + Item 1. + <enumerated_list enumtype="upperalpha" prefix="" suffix=")"> + <list_item> + <paragraph> + Item A). + <list_item> + <paragraph> + Item B). + <list_item> + <paragraph> + Item C). + <list_item> + <paragraph> + Item 2. + <enumerated_list enumtype="loweralpha" prefix="(" suffix=")"> + <list_item> + <paragraph> + Item (a). + <enumerated_list enumtype="upperroman" prefix="" suffix=")"> + <list_item> + <paragraph> + Item I). + <list_item> + <paragraph> + Item II). + <list_item> + <paragraph> + Item III). + <list_item> + <paragraph> + Item (b). + <list_item> + <paragraph> + Item (c). + <enumerated_list enumtype="lowerroman" prefix="(" suffix=")"> + <list_item> + <paragraph> + Item (i). + <list_item> + <paragraph> + Item (ii). + <list_item> + <paragraph> + Item (iii). + <list_item> + <paragraph> + Item 3. 
+"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_field_lists.py b/test/test_parsers/test_rst/test_field_lists.py new file mode 100755 index 000000000..5ffe99123 --- /dev/null +++ b/test/test_parsers/test_rst/test_field_lists.py @@ -0,0 +1,491 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['field_lists'] = [ +["""\ +One-liners: + +:Author: Me + +:Version: 1 + +:Date: 2001-08-11 + +:Parameter i: integer +""", +"""\ +<document> + <paragraph> + One-liners: + <field_list> + <field> + <field_name> + Author + <field_body> + <paragraph> + Me + <field> + <field_name> + Version + <field_body> + <paragraph> + 1 + <field> + <field_name> + Date + <field_body> + <paragraph> + 2001-08-11 + <field> + <field_name> + Parameter + <field_argument> + i + <field_body> + <paragraph> + integer +"""], +["""\ +One-liners, no blank lines: + +:Author: Me +:Version: 1 +:Date: 2001-08-11 +:Parameter i: integer +""", +"""\ +<document> + <paragraph> + One-liners, no blank lines: + <field_list> + <field> + <field_name> + Author + <field_body> + <paragraph> + Me + <field> + <field_name> + Version + <field_body> + <paragraph> + 1 + <field> + <field_name> + Date + <field_body> + <paragraph> + 2001-08-11 + <field> + <field_name> + Parameter + <field_argument> + i + <field_body> + <paragraph> + integer +"""], +["""\ +:field: +empty item above, no blank line +""", +"""\ +<document> + <field_list> + <field> + <field_name> + field + <field_body> + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 2. 
+ <paragraph> + empty item above, no blank line +"""], +["""\ +Field bodies starting on the next line: + +:Author: + Me +:Version: + 1 +:Date: + 2001-08-11 +:Parameter i: + integer +""", +"""\ +<document> + <paragraph> + Field bodies starting on the next line: + <field_list> + <field> + <field_name> + Author + <field_body> + <paragraph> + Me + <field> + <field_name> + Version + <field_body> + <paragraph> + 1 + <field> + <field_name> + Date + <field_body> + <paragraph> + 2001-08-11 + <field> + <field_name> + Parameter + <field_argument> + i + <field_body> + <paragraph> + integer +"""], +["""\ +One-paragraph, multi-liners: + +:Authors: Me, + Myself, + and I +:Version: 1 + or so +:Date: 2001-08-11 + (Saturday) +:Parameter i: counter + (integer) +""", +"""\ +<document> + <paragraph> + One-paragraph, multi-liners: + <field_list> + <field> + <field_name> + Authors + <field_body> + <paragraph> + Me, + Myself, + and I + <field> + <field_name> + Version + <field_body> + <paragraph> + 1 + or so + <field> + <field_name> + Date + <field_body> + <paragraph> + 2001-08-11 + (Saturday) + <field> + <field_name> + Parameter + <field_argument> + i + <field_body> + <paragraph> + counter + (integer) +"""], +["""\ +One-paragraph, multi-liners, not lined up: + +:Authors: Me, + Myself, + and I +:Version: 1 + or so +:Date: 2001-08-11 + (Saturday) +:Parameter i: counter + (integer) +""", +"""\ +<document> + <paragraph> + One-paragraph, multi-liners, not lined up: + <field_list> + <field> + <field_name> + Authors + <field_body> + <paragraph> + Me, + Myself, + and I + <field> + <field_name> + Version + <field_body> + <paragraph> + 1 + or so + <field> + <field_name> + Date + <field_body> + <paragraph> + 2001-08-11 + (Saturday) + <field> + <field_name> + Parameter + <field_argument> + i + <field_body> + <paragraph> + counter + (integer) +"""], +["""\ +Multiple body elements: + +:Authors: - Me + - Myself + - I + +:Abstract: + This is a field list item's body, + containing multiple elements. 
+ + Here's a literal block:: + + def f(x): + return x**2 + x + + Even nested field lists are possible: + + :Date: 2001-08-11 + :Day: Saturday + :Time: 15:07 +""", +"""\ +<document> + <paragraph> + Multiple body elements: + <field_list> + <field> + <field_name> + Authors + <field_body> + <bullet_list bullet="-"> + <list_item> + <paragraph> + Me + <list_item> + <paragraph> + Myself + <list_item> + <paragraph> + I + <field> + <field_name> + Abstract + <field_body> + <paragraph> + This is a field list item's body, + containing multiple elements. + <paragraph> + Here's a literal block: + <literal_block> + def f(x): + return x**2 + x + <paragraph> + Even nested field lists are possible: + <field_list> + <field> + <field_name> + Date + <field_body> + <paragraph> + 2001-08-11 + <field> + <field_name> + Day + <field_body> + <paragraph> + Saturday + <field> + <field_name> + Time + <field_body> + <paragraph> + 15:07 +"""], +["""\ +Nested field lists on one line: + +:field1: :field2: :field3: body +:field4: :field5: :field6: body + :field7: body + :field8: body + :field9: body line 1 + body line 2 +""", +"""\ +<document> + <paragraph> + Nested field lists on one line: + <field_list> + <field> + <field_name> + field1 + <field_body> + <field_list> + <field> + <field_name> + field2 + <field_body> + <field_list> + <field> + <field_name> + field3 + <field_body> + <paragraph> + body + <field> + <field_name> + field4 + <field_body> + <field_list> + <field> + <field_name> + field5 + <field_body> + <field_list> + <field> + <field_name> + field6 + <field_body> + <paragraph> + body + <field> + <field_name> + field7 + <field_body> + <paragraph> + body + <field> + <field_name> + field8 + <field_body> + <paragraph> + body + <field> + <field_name> + field9 + <field_body> + <paragraph> + body line 1 + body line 2 +"""], +["""\ +:Parameter i j k: multiple arguments +""", +"""\ +<document> + <field_list> + <field> + <field_name> + Parameter + <field_argument> + i + <field_argument> + j + 
<field_argument> + k + <field_body> + <paragraph> + multiple arguments +"""], +["""\ +Some edge cases: + +:Empty: +:Author: Me +No blank line before this paragraph. + +:*Field* `with` **inline** ``markup``: inline markup shouldn't be recognized. + +: Field: marker must not begin with whitespace. + +:Field : marker must not end with whitespace. + +Field: marker is missing its open-colon. + +:Field marker is missing its close-colon. +""", +"""\ +<document> + <paragraph> + Some edge cases: + <field_list> + <field> + <field_name> + Empty + <field_body> + <field> + <field_name> + Author + <field_body> + <paragraph> + Me + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 4. + <paragraph> + No blank line before this paragraph. + <field_list> + <field> + <field_name> + *Field* + <field_argument> + `with` + <field_argument> + **inline** + <field_argument> + ``markup`` + <field_body> + <paragraph> + inline markup shouldn't be recognized. + <paragraph> + : Field: marker must not begin with whitespace. + <paragraph> + :Field : marker must not end with whitespace. + <paragraph> + Field: marker is missing its open-colon. + <paragraph> + :Field marker is missing its close-colon. +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_footnotes.py b/test/test_parsers/test_rst/test_footnotes.py new file mode 100755 index 000000000..af42acc0f --- /dev/null +++ b/test/test_parsers/test_rst/test_footnotes.py @@ -0,0 +1,332 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['footnotes'] = [ +["""\ +.. [1] This is a footnote. 
+""", +"""\ +<document> + <footnote id="id1" name="1"> + <label> + 1 + <paragraph> + This is a footnote. +"""], +["""\ +.. [1] This is a footnote + on multiple lines. +""", +"""\ +<document> + <footnote id="id1" name="1"> + <label> + 1 + <paragraph> + This is a footnote + on multiple lines. +"""], +["""\ +.. [1] This is a footnote + on multiple lines with more space. + +.. [2] This is a footnote + on multiple lines with less space. +""", +"""\ +<document> + <footnote id="id1" name="1"> + <label> + 1 + <paragraph> + This is a footnote + on multiple lines with more space. + <footnote id="id2" name="2"> + <label> + 2 + <paragraph> + This is a footnote + on multiple lines with less space. +"""], +["""\ +.. [1] + This is a footnote on multiple lines + whose block starts on line 2. +""", +"""\ +<document> + <footnote id="id1" name="1"> + <label> + 1 + <paragraph> + This is a footnote on multiple lines + whose block starts on line 2. +"""], +["""\ +.. [1] + +That was an empty footnote. +""", +"""\ +<document> + <footnote id="id1" name="1"> + <label> + 1 + <paragraph> + That was an empty footnote. +"""], +["""\ +.. [1] +No blank line. +""", +"""\ +<document> + <footnote id="id1" name="1"> + <label> + 1 + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 2. + <paragraph> + No blank line. +"""], +] + +totest['auto_numbered_footnotes'] = [ +["""\ +[#]_ is the first auto-numbered footnote reference. +[#]_ is the second auto-numbered footnote reference. + +.. [#] Auto-numbered footnote 1. +.. [#] Auto-numbered footnote 2. +.. [#] Auto-numbered footnote 3. + +[#]_ is the third auto-numbered footnote reference. +""", +"""\ +<document> + <paragraph> + <footnote_reference auto="1" id="id1"> + is the first auto-numbered footnote reference. + <footnote_reference auto="1" id="id2"> + is the second auto-numbered footnote reference. + <footnote auto="1" id="id3"> + <paragraph> + Auto-numbered footnote 1. 
+ <footnote auto="1" id="id4"> + <paragraph> + Auto-numbered footnote 2. + <footnote auto="1" id="id5"> + <paragraph> + Auto-numbered footnote 3. + <paragraph> + <footnote_reference auto="1" id="id6"> + is the third auto-numbered footnote reference. +"""], +["""\ +[#third]_ is a reference to the third auto-numbered footnote. + +.. [#first] First auto-numbered footnote. +.. [#second] Second auto-numbered footnote. +.. [#third] Third auto-numbered footnote. + +[#second]_ is a reference to the second auto-numbered footnote. +[#first]_ is a reference to the first auto-numbered footnote. +[#third]_ is another reference to the third auto-numbered footnote. + +Here are some internal cross-references to the targets generated by +the footnotes: first_, second_, third_. +""", +"""\ +<document> + <paragraph> + <footnote_reference auto="1" id="id1" refname="third"> + is a reference to the third auto-numbered footnote. + <footnote auto="1" id="first" name="first"> + <paragraph> + First auto-numbered footnote. + <footnote auto="1" id="second" name="second"> + <paragraph> + Second auto-numbered footnote. + <footnote auto="1" id="third" name="third"> + <paragraph> + Third auto-numbered footnote. + <paragraph> + <footnote_reference auto="1" id="id2" refname="second"> + is a reference to the second auto-numbered footnote. + <footnote_reference auto="1" id="id3" refname="first"> + is a reference to the first auto-numbered footnote. + <footnote_reference auto="1" id="id4" refname="third"> + is another reference to the third auto-numbered footnote. + <paragraph> + Here are some internal cross-references to the targets generated by + the footnotes: \n\ + <reference refname="first"> + first + , \n\ + <reference refname="second"> + second + , \n\ + <reference refname="third"> + third + . +"""], +["""\ +Mixed anonymous and labelled auto-numbered footnotes: + +[#four]_ should be 4, [#]_ should be 1, +[#]_ should be 3, [#]_ is one too many, +[#two]_ should be 2, and [#six]_ doesn't exist. 
+ +.. [#] Auto-numbered footnote 1. +.. [#two] Auto-numbered footnote 2. +.. [#] Auto-numbered footnote 3. +.. [#four] Auto-numbered footnote 4. +.. [#five] Auto-numbered footnote 5. +.. [#five] Auto-numbered footnote 5 again (duplicate). +""", +"""\ +<document> + <paragraph> + Mixed anonymous and labelled auto-numbered footnotes: + <paragraph> + <footnote_reference auto="1" id="id1" refname="four"> + should be 4, \n\ + <footnote_reference auto="1" id="id2"> + should be 1, + <footnote_reference auto="1" id="id3"> + should be 3, \n\ + <footnote_reference auto="1" id="id4"> + is one too many, + <footnote_reference auto="1" id="id5" refname="two"> + should be 2, and \n\ + <footnote_reference auto="1" id="id6" refname="six"> + doesn't exist. + <footnote auto="1" id="id7"> + <paragraph> + Auto-numbered footnote 1. + <footnote auto="1" id="two" name="two"> + <paragraph> + Auto-numbered footnote 2. + <footnote auto="1" id="id8"> + <paragraph> + Auto-numbered footnote 3. + <footnote auto="1" id="four" name="four"> + <paragraph> + Auto-numbered footnote 4. + <footnote auto="1" dupname="five" id="five"> + <paragraph> + Auto-numbered footnote 5. + <footnote auto="1" dupname="five" id="id9"> + <system_message backrefs="id9" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "five". + <paragraph> + Auto-numbered footnote 5 again (duplicate). +"""], +["""\ +Mixed manually-numbered, anonymous auto-numbered, +and labelled auto-numbered footnotes: + +[#four]_ should be 4, [#]_ should be 2, +[1]_ is 1, [3]_ is 3, +[#]_ should be 6, [#]_ is one too many, +[#five]_ should be 5, and [#six]_ doesn't exist. + +.. [1] Manually-numbered footnote 1. +.. [#] Auto-numbered footnote 2. +.. [#four] Auto-numbered footnote 4. +.. [3] Manually-numbered footnote 3 +.. [#five] Auto-numbered footnote 5. +.. [#five] Auto-numbered footnote 5 again (duplicate). +.. [#] Auto-numbered footnote 6. 
+""", +"""\ +<document> + <paragraph> + Mixed manually-numbered, anonymous auto-numbered, + and labelled auto-numbered footnotes: + <paragraph> + <footnote_reference auto="1" id="id1" refname="four"> + should be 4, \n\ + <footnote_reference auto="1" id="id2"> + should be 2, + <footnote_reference id="id3" refname="1"> + 1 + is 1, \n\ + <footnote_reference id="id4" refname="3"> + 3 + is 3, + <footnote_reference auto="1" id="id5"> + should be 6, \n\ + <footnote_reference auto="1" id="id6"> + is one too many, + <footnote_reference auto="1" id="id7" refname="five"> + should be 5, and \n\ + <footnote_reference auto="1" id="id8" refname="six"> + doesn't exist. + <footnote id="id9" name="1"> + <label> + 1 + <paragraph> + Manually-numbered footnote 1. + <footnote auto="1" id="id10"> + <paragraph> + Auto-numbered footnote 2. + <footnote auto="1" id="four" name="four"> + <paragraph> + Auto-numbered footnote 4. + <footnote id="id11" name="3"> + <label> + 3 + <paragraph> + Manually-numbered footnote 3 + <footnote auto="1" dupname="five" id="five"> + <paragraph> + Auto-numbered footnote 5. + <footnote auto="1" dupname="five" id="id12"> + <system_message backrefs="id12" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "five". + <paragraph> + Auto-numbered footnote 5 again (duplicate). + <footnote auto="1" id="id13"> + <paragraph> + Auto-numbered footnote 6. +"""], +] + +totest['auto_symbol_footnotes'] = [ +["""\ +.. [*] This is an auto-symbol footnote. +""", +"""\ +<document> + <footnote auto="*" id="id1"> + <paragraph> + This is an auto-symbol footnote. +"""], +] + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_functions.py b/test/test_parsers/test_rst/test_functions.py new file mode 100755 index 000000000..d1d9786e3 --- /dev/null +++ b/test/test_parsers/test_rst/test_functions.py @@ -0,0 +1,37 @@ +#! 
/usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import unittest +from DocutilsTestSupport import states + + +class FuctionTests(unittest.TestCase): + + escaped = r'escapes: \*one, \\*two, \\\*three' + nulled = 'escapes: \x00*one, \x00\\*two, \x00\\\x00*three' + unescaped = r'escapes: *one, \*two, \*three' + + def test_escape2null(self): + nulled = states.escape2null(self.escaped) + self.assertEquals(nulled, self.nulled) + nulled = states.escape2null(self.escaped + '\\') + self.assertEquals(nulled, self.nulled + '\x00') + + def test_unescape(self): + unescaped = states.unescape(self.nulled) + self.assertEquals(unescaped, self.unescaped) + restored = states.unescape(self.nulled, 1) + self.assertEquals(restored, self.escaped) + + +if __name__ == '__main__': + unittest.main() diff --git a/test/test_parsers/test_rst/test_inline_markup.py b/test/test_parsers/test_rst/test_inline_markup.py new file mode 100755 index 000000000..b1485ccf6 --- /dev/null +++ b/test/test_parsers/test_rst/test_inline_markup.py @@ -0,0 +1,659 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. 
+""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['emphasis'] = [ +["""\ +*emphasis* +""", +"""\ +<document> + <paragraph> + <emphasis> + emphasis +"""], +["""\ +*emphasized sentence +across lines* +""", +"""\ +<document> + <paragraph> + <emphasis> + emphasized sentence + across lines +"""], +["""\ +*emphasis without closing asterisk +""", +"""\ +<document> + <paragraph> + <problematic id="id2" refid="id1"> + * + emphasis without closing asterisk + <system_message backrefs="id2" id="id1" level="2" type="WARNING"> + <paragraph> + Inline emphasis start-string without end-string at line 1. +"""], +["""\ +'*emphasis*' but not '*' or '"*"' or x*2* or 2*x* or \\*args or * +or *the\\* *stars\\\\\\* *inside* + +(however, '*args' will trigger a warning and may be problematic) + +what about *this**? +""", +"""\ +<document> + <paragraph> + ' + <emphasis> + emphasis + ' but not '*' or '"*"' or x*2* or 2*x* or *args or * + or \n\ + <emphasis> + the* *stars\* *inside + <paragraph> + (however, ' + <problematic id="id2" refid="id1"> + * + args' will trigger a warning and may be problematic) + <system_message backrefs="id2" id="id1" level="2" type="WARNING"> + <paragraph> + Inline emphasis start-string without end-string at line 4. + <paragraph> + what about \n\ + <emphasis> + this* + ? 
+"""], +["""\ +Emphasized asterisk: *\\** + +Emphasized double asterisk: *\\*** +""", +"""\ +<document> + <paragraph> + Emphasized asterisk: \n\ + <emphasis> + * + <paragraph> + Emphasized double asterisk: \n\ + <emphasis> + ** +"""], +] + +totest['strong'] = [ +["""\ +**strong** +""", +"""\ +<document> + <paragraph> + <strong> + strong +"""], +["""\ +(**strong**) but not (**) or '(** ' or x**2 or \\**kwargs or ** + +(however, '**kwargs' will trigger a warning and may be problematic) +""", +"""\ +<document> + <paragraph> + ( + <strong> + strong + ) but not (**) or '(** ' or x**2 or **kwargs or ** + <paragraph> + (however, ' + <problematic id="id2" refid="id1"> + ** + kwargs' will trigger a warning and may be problematic) + <system_message backrefs="id2" id="id1" level="2" type="WARNING"> + <paragraph> + Inline strong start-string without end-string at line 3. +"""], +["""\ +Strong asterisk: ***** + +Strong double asterisk: ****** +""", +"""\ +<document> + <paragraph> + Strong asterisk: \n\ + <strong> + * + <paragraph> + Strong double asterisk: \n\ + <strong> + ** +"""], +] + +totest['literal'] = [ +["""\ +``literal`` +""", +"""\ +<document> + <paragraph> + <literal> + literal +"""], +["""\ +``\\literal`` +""", +"""\ +<document> + <paragraph> + <literal> + \\literal +"""], +["""\ +``lite\\ral`` +""", +"""\ +<document> + <paragraph> + <literal> + lite\\ral +"""], +["""\ +``literal\\`` +""", +"""\ +<document> + <paragraph> + <literal> + literal\\ +"""], +["""\ +``literal ``TeX quotes'' & \\backslash`` but not "``" or `` + +(however, ``standalone TeX quotes'' will trigger a warning +and may be problematic) +""", +"""\ +<document> + <paragraph> + <literal> + literal ``TeX quotes'' & \\backslash + but not "``" or `` + <paragraph> + (however, \n\ + <problematic id="id2" refid="id1"> + `` + standalone TeX quotes'' will trigger a warning + and may be problematic) + <system_message backrefs="id2" id="id1" level="2" type="WARNING"> + <paragraph> + Inline literal start-string 
without end-string at line 3. +"""], +["""\ +Find the ```interpreted text``` in this paragraph! +""", +"""\ +<document> + <paragraph> + Find the \n\ + <literal> + `interpreted text` + in this paragraph! +"""], +] + +totest['interpreted'] = [ +["""\ +`interpreted` +""", +"""\ +<document> + <paragraph> + <interpreted> + interpreted +"""], +["""\ +:role:`interpreted` +""", +"""\ +<document> + <paragraph> + <interpreted position="prefix" role="role"> + interpreted +"""], +["""\ +`interpreted`:role: +""", +"""\ +<document> + <paragraph> + <interpreted position="suffix" role="role"> + interpreted +"""], +["""\ +:role:`:not-role: interpreted` +""", +"""\ +<document> + <paragraph> + <interpreted position="prefix" role="role"> + :not-role: interpreted +"""], +["""\ +:very.long-role_name:`interpreted` +""", +"""\ +<document> + <paragraph> + <interpreted position="prefix" role="very.long-role_name"> + interpreted +"""], +["""\ +`interpreted` but not \\`interpreted` [`] or ({[`] or [`]}) or ` +""", +"""\ +<document> + <paragraph> + <interpreted> + interpreted + but not `interpreted` [`] or ({[`] or [`]}) or ` +"""], +["""\ +`interpreted`-text `interpreted`: text `interpreted`:text `text`'s interpreted +""", +"""\ +<document> + <paragraph> + <interpreted> + interpreted + -text \n\ + <interpreted> + interpreted + : text \n\ + <interpreted> + interpreted + :text \n\ + <interpreted> + text + 's interpreted +"""], +] + +totest['references'] = [ +["""\ +ref_ +""", +"""\ +<document> + <paragraph> + <reference refname="ref"> + ref +"""], +["""\ +ref__ +""", +"""\ +<document> + <paragraph> + <reference anonymous="1"> + ref +"""], +["""\ +ref_, r_, r_e-f_, and anonymousref__, but not _ref_ or -ref_ +""", +"""\ +<document> + <paragraph> + <reference refname="ref"> + ref + , \n\ + <reference refname="r"> + r + , \n\ + <reference refname="r_e-f"> + r_e-f + , and \n\ + <reference anonymous="1"> + anonymousref + , but not _ref_ or -ref_ +"""], +] + +totest['phrase_references'] = [ +["""\ 
+`phrase reference`_ +""", +"""\ +<document> + <paragraph> + <reference refname="phrase reference"> + phrase reference +"""], +["""\ +`anonymous reference`__ +""", +"""\ +<document> + <paragraph> + <reference anonymous="1"> + anonymous reference +"""], +["""\ +`phrase reference +across lines`_ +""", +"""\ +<document> + <paragraph> + <reference refname="phrase reference across lines"> + phrase reference + across lines +"""], +["""\ +`phrase\`_ reference`_ +""", +"""\ +<document> + <paragraph> + <reference refname="phrase`_ reference"> + phrase`_ reference +"""], +["""\ +Invalid phrase reference: + +:role:`phrase reference`_ +""", +"""\ +<document> + <paragraph> + Invalid phrase reference: + <paragraph> + :role: + <problematic id="id2" refid="id1"> + ` + phrase reference`_ + <system_message backrefs="id2" id="id1" level="2" type="WARNING"> + <paragraph> + Mismatch: inline interpreted text start-string and role with phrase-reference end-string at line 3. +"""], +["""\ +Invalid phrase reference: + +`phrase reference`:role:_ +""", +"""\ +<document> + <paragraph> + Invalid phrase reference: + <paragraph> + <interpreted> + phrase reference + :role:_ +"""], +] + +totest['inline_targets'] = [ +["""\ +_`target` + +Here is _`another target` in some text. And _`yet +another target`, spanning lines. + +_`Here is a TaRgeT` with case and spacial difficulties. +""", +"""\ +<document> + <paragraph> + <target id="target" name="target"> + target + <paragraph> + Here is \n\ + <target id="another-target" name="another target"> + another target + in some text. And \n\ + <target id="yet-another-target" name="yet another target"> + yet + another target + , spanning lines. + <paragraph> + <target id="here-is-a-target" name="here is a target"> + Here is a TaRgeT + with case and spacial difficulties. +"""], +["""\ +But this isn't a _target; targets require backquotes. + +And _`this`_ is just plain confusing. 
+""", +"""\ +<document> + <paragraph> + But this isn't a _target; targets require backquotes. + <paragraph> + And \n\ + <problematic id="id2" refid="id1"> + _` + this`_ is just plain confusing. + <system_message backrefs="id2" id="id1" level="2" type="WARNING"> + <paragraph> + Inline target start-string without end-string at line 3. +"""], +] + +totest['footnote_reference'] = [ +["""\ +[1]_ +""", +"""\ +<document> + <paragraph> + <footnote_reference id="id1" refname="1"> + 1 +"""], +["""\ +[#]_ +""", +"""\ +<document> + <paragraph> + <footnote_reference auto="1" id="id1"> +"""], +["""\ +[#label]_ +""", +"""\ +<document> + <paragraph> + <footnote_reference auto="1" id="id1" refname="label"> +"""], +["""\ +[*]_ +""", +"""\ +<document> + <paragraph> + <footnote_reference auto="*" id="id1"> +"""], +] + +totest['citation_reference'] = [ +["""\ +[citation]_ +""", +"""\ +<document> + <paragraph> + <citation_reference id="id1" refname="citation"> + citation +"""], +["""\ +[citation]_ and [cit-ation]_ and [cit.ation]_ and [CIT1]_ but not [CIT 1]_ +""", +"""\ +<document> + <paragraph> + <citation_reference id="id1" refname="citation"> + citation + and \n\ + <citation_reference id="id2" refname="cit-ation"> + cit-ation + and \n\ + <citation_reference id="id3" refname="cit.ation"> + cit.ation + and \n\ + <citation_reference id="id4" refname="cit1"> + CIT1 + but not [CIT 1]_ +"""], +] + +totest['substitution_references'] = [ +["""\ +|subref| +""", +"""\ +<document> + <paragraph> + <substitution_reference refname="subref"> + subref +"""], +["""\ +|subref|_ and |subref|__ +""", +"""\ +<document> + <paragraph> + <reference refname="subref"> + <substitution_reference refname="subref"> + subref + and \n\ + <reference anonymous="1"> + <substitution_reference refname="subref"> + subref +"""], +["""\ +|substitution reference| +""", +"""\ +<document> + <paragraph> + <substitution_reference refname="substitution reference"> + substitution reference +"""], +["""\ +|substitution 
+reference| +""", +"""\ +<document> + <paragraph> + <substitution_reference refname="substitution reference"> + substitution + reference +"""], +] + +totest['standalone_hyperlink'] = [ +["""\ +http://www.standalone.hyperlink.com + +http:/one-slash-only.absolute.path + +http://[1080:0:0:0:8:800:200C:417A]/IPv6address.html + +http://[3ffe:2a00:100:7031::1] + +mailto:someone@somewhere.com + +news:comp.lang.python + +An email address in a sentence: someone@somewhere.com. + +ftp://ends.with.a.period. + +(a.question.mark@end?) +""", +"""\ +<document> + <paragraph> + <reference refuri="http://www.standalone.hyperlink.com"> + http://www.standalone.hyperlink.com + <paragraph> + <reference refuri="http:/one-slash-only.absolute.path"> + http:/one-slash-only.absolute.path + <paragraph> + <reference refuri="http://[1080:0:0:0:8:800:200C:417A]/IPv6address.html"> + http://[1080:0:0:0:8:800:200C:417A]/IPv6address.html + <paragraph> + <reference refuri="http://[3ffe:2a00:100:7031::1]"> + http://[3ffe:2a00:100:7031::1] + <paragraph> + <reference refuri="mailto:someone@somewhere.com"> + mailto:someone@somewhere.com + <paragraph> + <reference refuri="news:comp.lang.python"> + news:comp.lang.python + <paragraph> + An email address in a sentence: \n\ + <reference refuri="mailto:someone@somewhere.com"> + someone@somewhere.com + . + <paragraph> + <reference refuri="ftp://ends.with.a.period"> + ftp://ends.with.a.period + . + <paragraph> + ( + <reference refuri="mailto:a.question.mark@end"> + a.question.mark@end + ?) +"""], +["""\ +None of these are standalone hyperlinks (their "schemes" +are not recognized): signal:noise, a:b. +""", +"""\ +<document> + <paragraph> + None of these are standalone hyperlinks (their "schemes" + are not recognized): signal:noise, a:b. +"""], +] + +totest['miscellaneous'] = [ +["""\ +__This__ should be left alone. +""", +"""\ +<document> + <paragraph> + __This__ should be left alone. 
+"""], +] + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_literal_blocks.py b/test/test_parsers/test_rst/test_literal_blocks.py new file mode 100755 index 000000000..b9449d064 --- /dev/null +++ b/test/test_parsers/test_rst/test_literal_blocks.py @@ -0,0 +1,190 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['literal_blocks'] = [ +["""\ +A paragraph:: + + A literal block. +""", +"""\ +<document> + <paragraph> + A paragraph: + <literal_block> + A literal block. +"""], +["""\ +A paragraph:: + + A literal block. + +Another paragraph:: + + Another literal block. + With two blank lines following. + + +A final paragraph. +""", +"""\ +<document> + <paragraph> + A paragraph: + <literal_block> + A literal block. + <paragraph> + Another paragraph: + <literal_block> + Another literal block. + With two blank lines following. + <paragraph> + A final paragraph. +"""], +["""\ +A paragraph +on more than +one line:: + + A literal block. +""", +"""\ +<document> + <paragraph> + A paragraph + on more than + one line: + <literal_block> + A literal block. +"""], +["""\ +A paragraph +on more than +one line:: + A literal block + with no blank line above. +""", +"""\ +<document> + <paragraph> + A paragraph + on more than + one line: + <system_message level="3" type="ERROR"> + <paragraph> + Unexpected indentation at line 4. + <literal_block> + A literal block + with no blank line above. +"""], +["""\ +A paragraph:: + + A literal block. +no blank line +""", +"""\ +<document> + <paragraph> + A paragraph: + <literal_block> + A literal block. 
+ <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 4. + <paragraph> + no blank line +"""], +["""\ +A paragraph: :: + + A literal block. +""", +"""\ +<document> + <paragraph> + A paragraph: + <literal_block> + A literal block. +"""], +["""\ +A paragraph: + +:: + + A literal block. +""", +"""\ +<document> + <paragraph> + A paragraph: + <literal_block> + A literal block. +"""], +["""\ +A paragraph:: + +Not a literal block. +""", +"""\ +<document> + <paragraph> + A paragraph: + <system_message level="2" type="WARNING"> + <paragraph> + Literal block expected at line 2; none found. + <paragraph> + Not a literal block. +"""], +["""\ +A paragraph:: + + A wonky literal block. + Literal line 2. + + Literal line 3. +""", +"""\ +<document> + <paragraph> + A paragraph: + <literal_block> + A wonky literal block. + Literal line 2. + \n\ + Literal line 3. +"""], +["""\ +EOF, even though a literal block is indicated:: +""", +"""\ +<document> + <paragraph> + EOF, even though a literal block is indicated: + <system_message level="2" type="WARNING"> + <paragraph> + Literal block expected at line 2; none found. +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_option_lists.py b/test/test_parsers/test_rst/test_option_lists.py new file mode 100755 index 000000000..125a49bcd --- /dev/null +++ b/test/test_parsers/test_rst/test_option_lists.py @@ -0,0 +1,684 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. 
+""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['option_lists'] = [ +["""\ +Short options: + +-a option -a + +-b file option -b + +-c name option -c +""", +"""\ +<document> + <paragraph> + Short options: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + -a + <description> + <paragraph> + option -a + <option_list_item> + <option_group> + <option> + <option_string> + -b + <option_argument delimiter=" "> + file + <description> + <paragraph> + option -b + <option_list_item> + <option_group> + <option> + <option_string> + -c + <option_argument delimiter=" "> + name + <description> + <paragraph> + option -c +"""], +["""\ +Long options: + +--aaaa option --aaaa +--bbbb=file option --bbbb +--cccc name option --cccc +--d-e-f-g option --d-e-f-g +--h_i_j_k option --h_i_j_k +""", +"""\ +<document> + <paragraph> + Long options: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + --aaaa + <description> + <paragraph> + option --aaaa + <option_list_item> + <option_group> + <option> + <option_string> + --bbbb + <option_argument delimiter="="> + file + <description> + <paragraph> + option --bbbb + <option_list_item> + <option_group> + <option> + <option_string> + --cccc + <option_argument delimiter=" "> + name + <description> + <paragraph> + option --cccc + <option_list_item> + <option_group> + <option> + <option_string> + --d-e-f-g + <description> + <paragraph> + option --d-e-f-g + <option_list_item> + <option_group> + <option> + <option_string> + --h_i_j_k + <description> + <paragraph> + option --h_i_j_k +"""], +["""\ +Old GNU-style options: + ++a option +a + ++b file option +b + ++c name option +c +""", +"""\ +<document> + <paragraph> + Old GNU-style options: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + +a + <description> + <paragraph> + option +a + <option_list_item> 
+ <option_group> + <option> + <option_string> + +b + <option_argument delimiter=" "> + file + <description> + <paragraph> + option +b + <option_list_item> + <option_group> + <option> + <option_string> + +c + <option_argument delimiter=" "> + name + <description> + <paragraph> + option +c +"""], +["""\ +VMS/DOS-style options: + +/A option /A +/B file option /B +/CCC option /CCC +/DDD string option /DDD +/EEE=int option /EEE +""", +"""\ +<document> + <paragraph> + VMS/DOS-style options: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + /A + <description> + <paragraph> + option /A + <option_list_item> + <option_group> + <option> + <option_string> + /B + <option_argument delimiter=" "> + file + <description> + <paragraph> + option /B + <option_list_item> + <option_group> + <option> + <option_string> + /CCC + <description> + <paragraph> + option /CCC + <option_list_item> + <option_group> + <option> + <option_string> + /DDD + <option_argument delimiter=" "> + string + <description> + <paragraph> + option /DDD + <option_list_item> + <option_group> + <option> + <option_string> + /EEE + <option_argument delimiter="="> + int + <description> + <paragraph> + option /EEE +"""], +["""\ +Mixed short, long, and VMS/DOS options: + +-a option -a +--bbbb=file option -bbbb +/C option /C +--dddd name option --dddd +-e string option -e +/F file option /F +""", +"""\ +<document> + <paragraph> + Mixed short, long, and VMS/DOS options: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + -a + <description> + <paragraph> + option -a + <option_list_item> + <option_group> + <option> + <option_string> + --bbbb + <option_argument delimiter="="> + file + <description> + <paragraph> + option -bbbb + <option_list_item> + <option_group> + <option> + <option_string> + /C + <description> + <paragraph> + option /C + <option_list_item> + <option_group> + <option> + <option_string> + --dddd + <option_argument delimiter=" "> + name + 
<description> + <paragraph> + option --dddd + <option_list_item> + <option_group> + <option> + <option_string> + -e + <option_argument delimiter=" "> + string + <description> + <paragraph> + option -e + <option_list_item> + <option_group> + <option> + <option_string> + /F + <option_argument delimiter=" "> + file + <description> + <paragraph> + option /F +"""], +["""\ +Aliased options: + +-a, --aaaa, /A option -a, --aaaa, /A +-b file, --bbbb=file, /B file option -b, --bbbb, /B +""", +"""\ +<document> + <paragraph> + Aliased options: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + -a + <option> + <option_string> + --aaaa + <option> + <option_string> + /A + <description> + <paragraph> + option -a, --aaaa, /A + <option_list_item> + <option_group> + <option> + <option_string> + -b + <option_argument delimiter=" "> + file + <option> + <option_string> + --bbbb + <option_argument delimiter="="> + file + <option> + <option_string> + /B + <option_argument delimiter=" "> + file + <description> + <paragraph> + option -b, --bbbb, /B +"""], +["""\ +Multiple lines in descriptions, aligned: + +-a option -a, line 1 + line 2 +-b file option -b, line 1 + line 2 +""", +"""\ +<document> + <paragraph> + Multiple lines in descriptions, aligned: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + -a + <description> + <paragraph> + option -a, line 1 + line 2 + <option_list_item> + <option_group> + <option> + <option_string> + -b + <option_argument delimiter=" "> + file + <description> + <paragraph> + option -b, line 1 + line 2 +"""], +["""\ +Multiple lines in descriptions, not aligned: + +-a option -a, line 1 + line 2 +-b file option -b, line 1 + line 2 +""", +"""\ +<document> + <paragraph> + Multiple lines in descriptions, not aligned: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + -a + <description> + <paragraph> + option -a, line 1 + line 2 + <option_list_item> + <option_group> + 
<option> + <option_string> + -b + <option_argument delimiter=" "> + file + <description> + <paragraph> + option -b, line 1 + line 2 +"""], +["""\ +Descriptions begin on next line: + +-a + option -a, line 1 + line 2 +-b file + option -b, line 1 + line 2 +""", +"""\ +<document> + <paragraph> + Descriptions begin on next line: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + -a + <description> + <paragraph> + option -a, line 1 + line 2 + <option_list_item> + <option_group> + <option> + <option_string> + -b + <option_argument delimiter=" "> + file + <description> + <paragraph> + option -b, line 1 + line 2 +"""], +["""\ +Multiple body elements in descriptions: + +-a option -a, para 1 + + para 2 +-b file + option -b, para 1 + + para 2 +""", +"""\ +<document> + <paragraph> + Multiple body elements in descriptions: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + -a + <description> + <paragraph> + option -a, para 1 + <paragraph> + para 2 + <option_list_item> + <option_group> + <option> + <option_string> + -b + <option_argument delimiter=" "> + file + <description> + <paragraph> + option -b, para 1 + <paragraph> + para 2 +"""], +["""\ +--option +empty item above, no blank line +""", +"""\ +<document> + <paragraph> + --option + empty item above, no blank line +"""], +["""\ +An option list using equals: + +--long1=arg1 Description 1 +--long2=arg2 Description 2 + +An option list using spaces: + +--long1 arg1 Description 1 +--long2 arg2 Description 2 + +An option list using mixed delimiters: + +--long1=arg1 Description 1 +--long2 arg2 Description 2 + +An option list using mixed delimiters in one line: + +--long1=arg1, --long2 arg2 Description +""", +"""\ +<document> + <paragraph> + An option list using equals: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + --long1 + <option_argument delimiter="="> + arg1 + <description> + <paragraph> + Description 1 + <option_list_item> 
+ <option_group> + <option> + <option_string> + --long2 + <option_argument delimiter="="> + arg2 + <description> + <paragraph> + Description 2 + <paragraph> + An option list using spaces: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + --long1 + <option_argument delimiter=" "> + arg1 + <description> + <paragraph> + Description 1 + <option_list_item> + <option_group> + <option> + <option_string> + --long2 + <option_argument delimiter=" "> + arg2 + <description> + <paragraph> + Description 2 + <paragraph> + An option list using mixed delimiters: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + --long1 + <option_argument delimiter="="> + arg1 + <description> + <paragraph> + Description 1 + <option_list_item> + <option_group> + <option> + <option_string> + --long2 + <option_argument delimiter=" "> + arg2 + <description> + <paragraph> + Description 2 + <paragraph> + An option list using mixed delimiters in one line: + <option_list> + <option_list_item> + <option_group> + <option> + <option_string> + --long1 + <option_argument delimiter="="> + arg1 + <option> + <option_string> + --long2 + <option_argument delimiter=" "> + arg2 + <description> + <paragraph> + Description +"""], +["""\ +Some edge cases: + +--option=arg arg too many arguments + +--option=arg,arg not supported (yet?) + +--option=arg=arg too many arguments + +--option arg arg too many arguments + +-a letter arg2 too many arguments + +/A letter arg2 too many arguments + +--option= argument missing + +--=argument option missing + +-- everything missing + +- this should be a bullet list item + +These next ones should be simple paragraphs: + +-1 + +--option + +--1 + +-1 and this one too. +""", +"""\ +<document> + <paragraph> + Some edge cases: + <paragraph> + --option=arg arg too many arguments + <paragraph> + --option=arg,arg not supported (yet?) 
+ <paragraph> + --option=arg=arg too many arguments + <paragraph> + --option arg arg too many arguments + <paragraph> + -a letter arg2 too many arguments + <paragraph> + /A letter arg2 too many arguments + <paragraph> + --option= argument missing + <paragraph> + --=argument option missing + <paragraph> + -- everything missing + <bullet_list bullet="-"> + <list_item> + <paragraph> + this should be a bullet list item + <paragraph> + These next ones should be simple paragraphs: + <paragraph> + -1 + <paragraph> + --option + <paragraph> + --1 + <paragraph> + -1 and this one too. +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_outdenting.py b/test/test_parsers/test_rst/test_outdenting.py new file mode 100755 index 000000000..8e9afc8c6 --- /dev/null +++ b/test/test_parsers/test_rst/test_outdenting.py @@ -0,0 +1,90 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['outdenting'] = [ +["""\ +Anywhere a paragraph would have an effect on the current +indentation level, a comment or list item should also. + ++ bullet + +This paragraph ends the bullet list item before a block quote. + + Block quote. +""", +"""\ +<document> + <paragraph> + Anywhere a paragraph would have an effect on the current + indentation level, a comment or list item should also. + <bullet_list bullet="+"> + <list_item> + <paragraph> + bullet + <paragraph> + This paragraph ends the bullet list item before a block quote. + <block_quote> + <paragraph> + Block quote. +"""], +["""\ ++ bullet + +.. Comments swallow up all indented text following. + + (Therefore this is not a) block quote. 
+ +- bullet + + If we want a block quote after this bullet list item, + we need to use an empty comment: + +.. + + Block quote. +""", +"""\ +<document> + <bullet_list bullet="+"> + <list_item> + <paragraph> + bullet + <comment> + Comments swallow up all indented text following. + \n\ + (Therefore this is not a) block quote. + <bullet_list bullet="-"> + <list_item> + <paragraph> + bullet + <paragraph> + If we want a block quote after this bullet list item, + we need to use an empty comment: + <comment> + <block_quote> + <paragraph> + Block quote. +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_paragraphs.py b/test/test_parsers/test_rst/test_paragraphs.py new file mode 100755 index 000000000..83dc61845 --- /dev/null +++ b/test/test_parsers/test_rst/test_paragraphs.py @@ -0,0 +1,79 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['paragraphs'] = [ +["""\ +A paragraph. +""", +"""\ +<document> + <paragraph> + A paragraph. +"""], +["""\ +Paragraph 1. + +Paragraph 2. +""", +"""\ +<document> + <paragraph> + Paragraph 1. + <paragraph> + Paragraph 2. +"""], +["""\ +Line 1. +Line 2. +Line 3. +""", +"""\ +<document> + <paragraph> + Line 1. + Line 2. + Line 3. +"""], +["""\ +Paragraph 1, Line 1. +Line 2. +Line 3. + +Paragraph 2, Line 1. +Line 2. +Line 3. +""", +"""\ +<document> + <paragraph> + Paragraph 1, Line 1. + Line 2. + Line 3. + <paragraph> + Paragraph 2, Line 1. + Line 2. + Line 3. 
+"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_section_headers.py b/test/test_parsers/test_rst/test_section_headers.py new file mode 100755 index 000000000..53a15d10c --- /dev/null +++ b/test/test_parsers/test_rst/test_section_headers.py @@ -0,0 +1,555 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['section_headers'] = [ +["""\ +Title +===== + +Paragraph. +""", +"""\ +<document> + <section id="title" name="title"> + <title> + Title + <paragraph> + Paragraph. +"""], +["""\ +Title +===== +Paragraph (no blank line). +""", +"""\ +<document> + <section id="title" name="title"> + <title> + Title + <paragraph> + Paragraph (no blank line). +"""], +["""\ +Paragraph. + +Title +===== + +Paragraph. +""", +"""\ +<document> + <paragraph> + Paragraph. + <section id="title" name="title"> + <title> + Title + <paragraph> + Paragraph. +"""], +["""\ +Test unexpected section titles. + + Title + ===== + Paragraph. + + ----- + Title + ----- + Paragraph. +""", +"""\ +<document> + <paragraph> + Test unexpected section titles. + <block_quote> + <system_message level="4" type="SEVERE"> + <paragraph> + Unexpected section title at line 4. + <literal_block> + Title + ===== + <paragraph> + Paragraph. + <system_message level="4" type="SEVERE"> + <paragraph> + Unexpected section title or transition at line 7. + <literal_block> + ----- + <system_message level="4" type="SEVERE"> + <paragraph> + Unexpected section title at line 9. + <literal_block> + Title + ----- + <paragraph> + Paragraph. +"""], +["""\ +Title +==== + +Test short underline. 
+""", +"""\ +<document> + <system_message level="1" type="INFO"> + <paragraph> + Title underline too short at line 2. + <literal_block> + Title + ==== + <section id="title" name="title"> + <title> + Title + <paragraph> + Test short underline. +"""], +["""\ +===== +Title +===== + +Test overline title. +""", +"""\ +<document> + <section id="title" name="title"> + <title> + Title + <paragraph> + Test overline title. +"""], +["""\ +======= + Title +======= + +Test overline title with inset. +""", +"""\ +<document> + <section id="title" name="title"> + <title> + Title + <paragraph> + Test overline title with inset. +"""], +["""\ +======================== + Test Missing Underline +""", +"""\ +<document> + <system_message level="4" type="SEVERE"> + <paragraph> + Incomplete section title at line 1. + <literal_block> + ======================== + Test Missing Underline +"""], +["""\ +======================== + Test Missing Underline + +""", +"""\ +<document> + <system_message level="4" type="SEVERE"> + <paragraph> + Missing underline for overline at line 1. + <literal_block> + ======================== + Test Missing Underline +"""], +["""\ +======= + Title + +Test missing underline, with paragraph. +""", +"""\ +<document> + <system_message level="4" type="SEVERE"> + <paragraph> + Missing underline for overline at line 1. + <literal_block> + ======= + Title + <paragraph> + Test missing underline, with paragraph. +"""], +["""\ +======= + Long Title +======= + +Test long title and space normalization. +""", +"""\ +<document> + <system_message level="1" type="INFO"> + <paragraph> + Title overline too short at line 1. + <literal_block> + ======= + Long Title + ======= + <section id="long-title" name="long title"> + <title> + Long Title + <paragraph> + Test long title and space normalization. +"""], +["""\ +======= + Title +------- + +Paragraph. +""", +"""\ +<document> + <system_message level="4" type="SEVERE"> + <paragraph> + Title overline & underline mismatch at line 1. 
+ <literal_block> + ======= + Title + ------- + <paragraph> + Paragraph. +"""], +["""\ +======================== + +======================== + +Test missing titles; blank line in-between. + +======================== + +======================== +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Document or section may not begin with a transition (line 1). + <transition> + <system_message level="3" type="ERROR"> + <paragraph> + At least one body element must separate transitions; adjacent transitions at line 3. + <transition> + <paragraph> + Test missing titles; blank line in-between. + <transition> + <transition> + <system_message level="3" type="ERROR"> + <paragraph> + Document or section may not end with a transition (line 9). +"""], +["""\ +======================== +======================== + +Test missing titles; nothing in-between. + +======================== +======================== +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Invalid section title or transition marker at line 1. + <literal_block> + ======================== + ======================== + <paragraph> + Test missing titles; nothing in-between. + <system_message level="3" type="ERROR"> + <paragraph> + Invalid section title or transition marker at line 6. + <literal_block> + ======================== + ======================== +"""], +["""\ +.. Test return to existing, highest-level section (Title 3). + +Title 1 +======= +Paragraph 1. + +Title 2 +------- +Paragraph 2. + +Title 3 +======= +Paragraph 3. + +Title 4 +------- +Paragraph 4. +""", +"""\ +<document> + <comment> + Test return to existing, highest-level section (Title 3). + <section id="title-1" name="title 1"> + <title> + Title 1 + <paragraph> + Paragraph 1. + <section id="title-2" name="title 2"> + <title> + Title 2 + <paragraph> + Paragraph 2. + <section id="title-3" name="title 3"> + <title> + Title 3 + <paragraph> + Paragraph 3. 
+ <section id="title-4" name="title 4"> + <title> + Title 4 + <paragraph> + Paragraph 4. +"""], +["""\ +Test return to existing, highest-level section (Title 3, with overlines). + +======= +Title 1 +======= +Paragraph 1. + +------- +Title 2 +------- +Paragraph 2. + +======= +Title 3 +======= +Paragraph 3. + +------- +Title 4 +------- +Paragraph 4. +""", +"""\ +<document> + <paragraph> + Test return to existing, highest-level section (Title 3, with overlines). + <section id="title-1" name="title 1"> + <title> + Title 1 + <paragraph> + Paragraph 1. + <section id="title-2" name="title 2"> + <title> + Title 2 + <paragraph> + Paragraph 2. + <section id="title-3" name="title 3"> + <title> + Title 3 + <paragraph> + Paragraph 3. + <section id="title-4" name="title 4"> + <title> + Title 4 + <paragraph> + Paragraph 4. +"""], +["""\ +Test return to existing, higher-level section (Title 4). + +Title 1 +======= +Paragraph 1. + +Title 2 +------- +Paragraph 2. + +Title 3 +``````` +Paragraph 3. + +Title 4 +------- +Paragraph 4. +""", +"""\ +<document> + <paragraph> + Test return to existing, higher-level section (Title 4). + <section id="title-1" name="title 1"> + <title> + Title 1 + <paragraph> + Paragraph 1. + <section id="title-2" name="title 2"> + <title> + Title 2 + <paragraph> + Paragraph 2. + <section id="title-3" name="title 3"> + <title> + Title 3 + <paragraph> + Paragraph 3. + <section id="title-4" name="title 4"> + <title> + Title 4 + <paragraph> + Paragraph 4. +"""], +["""\ +Test bad subsection order (Title 4). + +Title 1 +======= +Paragraph 1. + +Title 2 +------- +Paragraph 2. + +Title 3 +======= +Paragraph 3. + +Title 4 +``````` +Paragraph 4. +""", +"""\ +<document> + <paragraph> + Test bad subsection order (Title 4). + <section id="title-1" name="title 1"> + <title> + Title 1 + <paragraph> + Paragraph 1. + <section id="title-2" name="title 2"> + <title> + Title 2 + <paragraph> + Paragraph 2. 
+ <section id="title-3" name="title 3"> + <title> + Title 3 + <paragraph> + Paragraph 3. + <system_message level="4" type="SEVERE"> + <paragraph> + Title level inconsistent at line 15: + <literal_block> + Title 4 + ``````` + <paragraph> + Paragraph 4. +"""], +["""\ +Test bad subsection order (Title 4, with overlines). + +======= +Title 1 +======= +Paragraph 1. + +------- +Title 2 +------- +Paragraph 2. + +======= +Title 3 +======= +Paragraph 3. + +``````` +Title 4 +``````` +Paragraph 4. +""", +"""\ +<document> + <paragraph> + Test bad subsection order (Title 4, with overlines). + <section id="title-1" name="title 1"> + <title> + Title 1 + <paragraph> + Paragraph 1. + <section id="title-2" name="title 2"> + <title> + Title 2 + <paragraph> + Paragraph 2. + <section id="title-3" name="title 3"> + <title> + Title 3 + <paragraph> + Paragraph 3. + <system_message level="4" type="SEVERE"> + <paragraph> + Title level inconsistent at line 19: + <literal_block> + ``````` + Title 4 + ``````` + <paragraph> + Paragraph 4. +"""], +["""\ +Title containing *inline* ``markup`` +==================================== + +Paragraph. +""", +"""\ +<document> + <section id="title-containing-inline-markup" name="title containing inline markup"> + <title> + Title containing \n\ + <emphasis> + inline + \n\ + <literal> + markup + <paragraph> + Paragraph. +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_substitutions.py b/test/test_parsers/test_rst/test_substitutions.py new file mode 100755 index 000000000..198e40596 --- /dev/null +++ b/test/test_parsers/test_rst/test_substitutions.py @@ -0,0 +1,192 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. 
+""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['substitution_definitions'] = [ +["""\ +Here's an image substitution definition: + +.. |symbol| image:: symbol.png +""", +"""\ +<document> + <paragraph> + Here's an image substitution definition: + <substitution_definition name="symbol"> + <image alt="symbol" uri="symbol.png"> +"""], +["""\ +Embedded directive starts on the next line: + +.. |symbol| + image:: symbol.png +""", +"""\ +<document> + <paragraph> + Embedded directive starts on the next line: + <substitution_definition name="symbol"> + <image alt="symbol" uri="symbol.png"> +"""], +["""\ +Here's a series of substitution definitions: + +.. |symbol 1| image:: symbol1.png +.. |SYMBOL 2| image:: symbol2.png + :height: 50 + :width: 100 +.. |symbol 3| image:: symbol3.png +""", +"""\ +<document> + <paragraph> + Here's a series of substitution definitions: + <substitution_definition name="symbol 1"> + <image alt="symbol 1" uri="symbol1.png"> + <substitution_definition name="symbol 2"> + <image alt="SYMBOL 2" height="50" uri="symbol2.png" width="100"> + <substitution_definition name="symbol 3"> + <image alt="symbol 3" uri="symbol3.png"> +"""], +["""\ +.. |very long substitution text, + split across lines| image:: symbol.png +""", +"""\ +<document> + <substitution_definition name="very long substitution text, split across lines"> + <image alt="very long substitution text, split across lines" uri="symbol.png"> +"""], +["""\ +.. |symbol 1| image:: symbol.png + +Followed by a paragraph. + +.. |symbol 2| image:: symbol.png + + Followed by a block quote. +""", +"""\ +<document> + <substitution_definition name="symbol 1"> + <image alt="symbol 1" uri="symbol.png"> + <paragraph> + Followed by a paragraph. + <substitution_definition name="symbol 2"> + <image alt="symbol 2" uri="symbol.png"> + <block_quote> + <paragraph> + Followed by a block quote. 
+"""], +["""\ +Here are some duplicate substitution definitions: + +.. |symbol| image:: symbol.png +.. |symbol| image:: symbol.png +""", +"""\ +<document> + <paragraph> + Here are some duplicate substitution definitions: + <substitution_definition dupname="symbol"> + <image alt="symbol" uri="symbol.png"> + <system_message level="3" type="ERROR"> + <paragraph> + Duplicate substitution definition name: "symbol". + <substitution_definition name="symbol"> + <image alt="symbol" uri="symbol.png"> +"""], +["""\ +Here are some bad cases: + +.. |symbol| image:: symbol.png +No blank line after. + +.. |empty| + +.. |unknown| directive:: symbol.png + +.. |invalid 1| there's no directive here +.. |invalid 2| there's no directive here + With some block quote text, line 1. + And some more, line 2. + +.. |invalid 3| there's no directive here + +.. | bad name | bad data +""", +"""\ +<document> + <paragraph> + Here are some bad cases: + <substitution_definition name="symbol"> + <image alt="symbol" uri="symbol.png"> + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 4. + <paragraph> + No blank line after. + <system_message level="2" type="WARNING"> + <paragraph> + Substitution definition "empty" missing contents at line 6. + <literal_block> + .. |empty| + <system_message level="3" type="ERROR"> + <paragraph> + Unknown directive type "directive" at line 8. + <literal_block> + directive:: symbol.png + <system_message level="2" type="WARNING"> + <paragraph> + Substitution definition "unknown" empty or invalid at line 8. + <literal_block> + .. |unknown| directive:: symbol.png + <system_message level="2" type="WARNING"> + <paragraph> + Substitution definition "invalid 1" empty or invalid at line 10. + <literal_block> + .. |invalid 1| there's no directive here + <system_message level="2" type="WARNING"> + <paragraph> + Substitution definition "invalid 2" empty or invalid at line 11. + <literal_block> + .. 
|invalid 2| there's no directive here + With some block quote text, line 1. + And some more, line 2. + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 12. + <block_quote> + <paragraph> + With some block quote text, line 1. + And some more, line 2. + <system_message level="2" type="WARNING"> + <paragraph> + Substitution definition "invalid 3" empty or invalid at line 15. + <literal_block> + .. |invalid 3| there's no directive here + <comment> + | bad name | bad data +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_tables.py b/test/test_parsers/test_rst/test_tables.py new file mode 100755 index 000000000..9def8df40 --- /dev/null +++ b/test/test_parsers/test_rst/test_tables.py @@ -0,0 +1,564 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['tables'] = [ +["""\ ++-------------------------------------+ +| A table with one cell and one line. | ++-------------------------------------+ +""", +"""\ +<document> + <table> + <tgroup cols="1"> + <colspec colwidth="37"> + <tbody> + <row> + <entry> + <paragraph> + A table with one cell and one line. +"""], +["""\ ++-----------------------+ +| A table with one cell | +| and two lines. | ++-----------------------+ +""", +"""\ +<document> + <table> + <tgroup cols="1"> + <colspec colwidth="23"> + <tbody> + <row> + <entry> + <paragraph> + A table with one cell + and two lines. +"""], +["""\ ++-----------------------+ +| A malformed table. 
| ++-----------------------+ +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Malformed table at line 1; formatting as a literal block. + <literal_block> + +-----------------------+ + | A malformed table. | + +-----------------------+ +"""], +["""\ ++------------------------+ +| A well-formed | table. | ++------------------------+ + ++------------------------+ +| This +----------+ too! | ++------------------------+ +""", +"""\ +<document> + <table> + <tgroup cols="1"> + <colspec colwidth="24"> + <tbody> + <row> + <entry> + <paragraph> + A well-formed | table. + <table> + <tgroup cols="1"> + <colspec colwidth="24"> + <tbody> + <row> + <entry> + <paragraph> + This +----------+ too! +"""], +["""\ ++--------------+--------------+ +| A table with | two columns. | ++--------------+--------------+ +""", +"""\ +<document> + <table> + <tgroup cols="2"> + <colspec colwidth="14"> + <colspec colwidth="14"> + <tbody> + <row> + <entry> + <paragraph> + A table with + <entry> + <paragraph> + two columns. +"""], +["""\ ++--------------+ +| A table with | ++--------------+ +| two rows. | ++--------------+ +""", +"""\ +<document> + <table> + <tgroup cols="1"> + <colspec colwidth="14"> + <tbody> + <row> + <entry> + <paragraph> + A table with + <row> + <entry> + <paragraph> + two rows. +"""], +["""\ ++--------------+-------------+ +| A table with | two columns | ++--------------+-------------+ +| and | two rows. | ++--------------+-------------+ +""", +"""\ +<document> + <table> + <tgroup cols="2"> + <colspec colwidth="14"> + <colspec colwidth="13"> + <tbody> + <row> + <entry> + <paragraph> + A table with + <entry> + <paragraph> + two columns + <row> + <entry> + <paragraph> + and + <entry> + <paragraph> + two rows. +"""], +["""\ ++--------------+---------------+ +| A table with | two columns, | ++--------------+---------------+ +| two rows, and a column span. 
| ++------------------------------+ +""", +"""\ +<document> + <table> + <tgroup cols="2"> + <colspec colwidth="14"> + <colspec colwidth="15"> + <tbody> + <row> + <entry> + <paragraph> + A table with + <entry> + <paragraph> + two columns, + <row> + <entry morecols="1"> + <paragraph> + two rows, and a column span. +"""], +["""\ ++--------------------------+ +| A table with three rows, | ++------------+-------------+ +| and two | columns. | ++------------+-------------+ +| First and last rows | +| contains column spans. | ++--------------------------+ +""", +"""\ +<document> + <table> + <tgroup cols="2"> + <colspec colwidth="12"> + <colspec colwidth="13"> + <tbody> + <row> + <entry morecols="1"> + <paragraph> + A table with three rows, + <row> + <entry> + <paragraph> + and two + <entry> + <paragraph> + columns. + <row> + <entry morecols="1"> + <paragraph> + First and last rows + contains column spans. +"""], +["""\ ++--------------+--------------+ +| A table with | two columns, | ++--------------+ and a row | +| two rows, | span. | ++--------------+--------------+ +""", +"""\ +<document> + <table> + <tgroup cols="2"> + <colspec colwidth="14"> + <colspec colwidth="14"> + <tbody> + <row> + <entry> + <paragraph> + A table with + <entry morerows="1"> + <paragraph> + two columns, + and a row + span. + <row> + <entry> + <paragraph> + two rows, +"""], +["""\ ++------------+-------------+---------------+ +| A table | two rows in | and row spans | +| with three +-------------+ to left and | +| columns, | the middle, | right. | ++------------+-------------+---------------+ +""", +"""\ +<document> + <table> + <tgroup cols="3"> + <colspec colwidth="12"> + <colspec colwidth="13"> + <colspec colwidth="15"> + <tbody> + <row> + <entry morerows="1"> + <paragraph> + A table + with three + columns, + <entry> + <paragraph> + two rows in + <entry morerows="1"> + <paragraph> + and row spans + to left and + right. 
+ <row> + <entry> + <paragraph> + the middle, +"""], +["""\ +Complex spanning pattern (no edge knows all rows/cols): + ++-----------+-------------------------+ +| W/NW cell | N/NE cell | +| +-------------+-----------+ +| | Middle cell | E/SE cell | ++-----------+-------------+ | +| S/SE cell | | ++-------------------------+-----------+ +""", +"""\ +<document> + <paragraph> + Complex spanning pattern (no edge knows all rows/cols): + <table> + <tgroup cols="3"> + <colspec colwidth="11"> + <colspec colwidth="13"> + <colspec colwidth="11"> + <tbody> + <row> + <entry morerows="1"> + <paragraph> + W/NW cell + <entry morecols="1"> + <paragraph> + N/NE cell + <row> + <entry> + <paragraph> + Middle cell + <entry morerows="1"> + <paragraph> + E/SE cell + <row> + <entry morecols="1"> + <paragraph> + S/SE cell +"""], +["""\ ++------------------------+------------+----------+----------+ +| Header row, column 1 | Header 2 | Header 3 | Header 4 | ++========================+============+==========+==========+ +| body row 1, column 1 | column 2 | column 3 | column 4 | ++------------------------+------------+----------+----------+ +| body row 2 | Cells may span columns. | ++------------------------+------------+---------------------+ +| body row 3 | Cells may | - Table cells | ++------------------------+ span rows. | - contain | +| body row 4 | | - body elements. 
| ++------------------------+------------+---------------------+ +""", +"""\ +<document> + <table> + <tgroup cols="4"> + <colspec colwidth="24"> + <colspec colwidth="12"> + <colspec colwidth="10"> + <colspec colwidth="10"> + <thead> + <row> + <entry> + <paragraph> + Header row, column 1 + <entry> + <paragraph> + Header 2 + <entry> + <paragraph> + Header 3 + <entry> + <paragraph> + Header 4 + <tbody> + <row> + <entry> + <paragraph> + body row 1, column 1 + <entry> + <paragraph> + column 2 + <entry> + <paragraph> + column 3 + <entry> + <paragraph> + column 4 + <row> + <entry> + <paragraph> + body row 2 + <entry morecols="2"> + <paragraph> + Cells may span columns. + <row> + <entry> + <paragraph> + body row 3 + <entry morerows="1"> + <paragraph> + Cells may + span rows. + <entry morecols="1" morerows="1"> + <bullet_list bullet="-"> + <list_item> + <paragraph> + Table cells + <list_item> + <paragraph> + contain + <list_item> + <paragraph> + body elements. + <row> + <entry> + <paragraph> + body row 4 +"""], +["""\ ++-----------------+--------+ +| A simple table | cell 2 | ++-----------------+--------+ +| cell 3 | cell 4 | ++-----------------+--------+ +No blank line after table. +""", +"""\ +<document> + <table> + <tgroup cols="2"> + <colspec colwidth="17"> + <colspec colwidth="8"> + <tbody> + <row> + <entry> + <paragraph> + A simple table + <entry> + <paragraph> + cell 2 + <row> + <entry> + <paragraph> + cell 3 + <entry> + <paragraph> + cell 4 + <system_message level="2" type="WARNING"> + <paragraph> + Blank line required after table at line 6. + <paragraph> + No blank line after table. +"""], +["""\ ++-----------------+--------+ +| A simple table | cell 2 | ++-----------------+--------+ +| cell 3 | cell 4 | ++-----------------+--------+ + Unexpected indent and no blank line after table. 
+""", +"""\ +<document> + <table> + <tgroup cols="2"> + <colspec colwidth="17"> + <colspec colwidth="8"> + <tbody> + <row> + <entry> + <paragraph> + A simple table + <entry> + <paragraph> + cell 2 + <row> + <entry> + <paragraph> + cell 3 + <entry> + <paragraph> + cell 4 + <system_message level="3" type="ERROR"> + <paragraph> + Unexpected indentation at line 6. + <system_message level="2" type="WARNING"> + <paragraph> + Blank line required after table at line 6. + <block_quote> + <paragraph> + Unexpected indent and no blank line after table. +"""], +["""\ ++--------------+-------------+ +| A bad table. | | ++--------------+ | +| Cells must be rectangles. | ++----------------------------+ +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Malformed table at line 1; formatting as a literal block. + Malformed table; parse incomplete. + <literal_block> + +--------------+-------------+ + | A bad table. | | + +--------------+ | + | Cells must be rectangles. | + +----------------------------+ +"""], +["""\ ++------------------------------+ +| This table contains another. | +| | +| +-------------------------+ | +| | A table within a table. | | +| +-------------------------+ | ++------------------------------+ +""", +"""\ +<document> + <table> + <tgroup cols="1"> + <colspec colwidth="30"> + <tbody> + <row> + <entry> + <paragraph> + This table contains another. + <table> + <tgroup cols="1"> + <colspec colwidth="25"> + <tbody> + <row> + <entry> + <paragraph> + A table within a table. 
+"""], +["""\ ++------------------+--------+ +| A simple table | | ++------------------+--------+ +| with empty cells | | ++------------------+--------+ +""", +"""\ +<document> + <table> + <tgroup cols="2"> + <colspec colwidth="18"> + <colspec colwidth="8"> + <tbody> + <row> + <entry> + <paragraph> + A simple table + <entry> + <row> + <entry> + <paragraph> + with empty cells + <entry> +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_targets.py b/test/test_parsers/test_rst/test_targets.py new file mode 100755 index 000000000..b0d2f57f5 --- /dev/null +++ b/test/test_parsers/test_rst/test_targets.py @@ -0,0 +1,429 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for states.py. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +totest['targets'] = [ +["""\ +.. _target: + +(Internal hyperlink target.) +""", +"""\ +<document> + <target id="target" name="target"> + <paragraph> + (Internal hyperlink target.) +"""], +["""\ +External hyperlink targets: + +.. _one-liner: http://structuredtext.sourceforge.net + +.. _starts-on-this-line: http:// + structuredtext. + sourceforge.net + +.. _entirely-below: + http://structuredtext. + sourceforge.net + +.. 
_not-indirect: uri\_ +""", +"""\ +<document> + <paragraph> + External hyperlink targets: + <target id="one-liner" name="one-liner" refuri="http://structuredtext.sourceforge.net"> + <target id="starts-on-this-line" name="starts-on-this-line" refuri="http://structuredtext.sourceforge.net"> + <target id="entirely-below" name="entirely-below" refuri="http://structuredtext.sourceforge.net"> + <target id="not-indirect" name="not-indirect" refuri="uri_"> +"""], +["""\ +Indirect hyperlink targets: + +.. _target1: reference_ + +.. _target2: `phrase-link reference`_ +""", +"""\ +<document> + <paragraph> + Indirect hyperlink targets: + <target id="target1" name="target1" refname="reference"> + <target id="target2" name="target2" refname="phrase-link reference"> +"""], +["""\ +.. _target1: Not a proper hyperlink target + +.. _target2: Although it ends with an underscore, this is not a phrase-link_ + +.. _target3: A multi-line version of something + ending with an underscore, but not a phrase-link_ +""", +"""\ +<document> + <system_message level="2" type="WARNING"> + <paragraph> + Hyperlink target at line 1 contains whitespace. Perhaps a footnote was intended? + <literal_block> + .. _target1: Not a proper hyperlink target + <system_message level="2" type="WARNING"> + <paragraph> + Hyperlink target at line 3 contains whitespace. Perhaps a footnote was intended? + <literal_block> + .. _target2: Although it ends with an underscore, this is not a phrase-link_ + <system_message level="2" type="WARNING"> + <paragraph> + Hyperlink target at line 5 contains whitespace. Perhaps a footnote was intended? + <literal_block> + .. _target3: A multi-line version of something + ending with an underscore, but not a phrase-link_ +"""], +["""\ +..
__: Not a proper hyperlink target + +__ Although it ends with an underscore, this is not a phrase-link_ + +__ A multi-line version of something + ending with an underscore, but not a phrase-link_ +""", +"""\ +<document> + <system_message level="2" type="WARNING"> + <paragraph> + Hyperlink target at line 1 contains whitespace. Perhaps a footnote was intended? + <literal_block> + .. __: Not a proper hyperlink target + <system_message level="2" type="WARNING"> + <paragraph> + Anonymous hyperlink target at line 3 contains whitespace. Perhaps a footnote was intended? + <literal_block> + __ Although it ends with an underscore, this is not a phrase-link_ + <system_message level="2" type="WARNING"> + <paragraph> + Anonymous hyperlink target at line 5 contains whitespace. Perhaps a footnote was intended? + <literal_block> + __ A multi-line version of something + ending with an underscore, but not a phrase-link_ +"""], +["""\ +.. _a long target name: + +.. _`a target name: including a colon (quoted)`: + +.. _a target name\: including a colon (escaped): +""", +"""\ +<document> + <target id="a-long-target-name" name="a long target name"> + <target id="a-target-name-including-a-colon-quoted" name="a target name: including a colon (quoted)"> + <target id="a-target-name-including-a-colon-escaped" name="a target name: including a colon (escaped)"> +"""], +["""\ +.. _a very long target name, + split across lines: +.. _`and another, + with backquotes`: +""", +"""\ +<document> + <target id="a-very-long-target-name-split-across-lines" name="a very long target name, split across lines"> + <target id="and-another-with-backquotes" name="and another, with backquotes"> +"""], +["""\ +External hyperlink: + +.. _target: http://www.python.org/ +""", +"""\ +<document> + <paragraph> + External hyperlink: + <target id="target" name="target" refuri="http://www.python.org/"> +"""], +["""\ +Duplicate external targets (different URIs): + +.. _target: first + +..
_target: second +""", +"""\ +<document> + <paragraph> + Duplicate external targets (different URIs): + <target dupname="target" id="target" refuri="first"> + <system_message backrefs="id1" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "target". + <target dupname="target" id="id1" refuri="second"> +"""], +["""\ +Duplicate external targets (same URIs): + +.. _target: first + +.. _target: first +""", +"""\ +<document> + <paragraph> + Duplicate external targets (same URIs): + <target dupname="target" id="target" refuri="first"> + <system_message backrefs="id1" level="1" type="INFO"> + <paragraph> + Duplicate explicit target name: "target". + <target id="id1" name="target" refuri="first"> +"""], +["""\ +Duplicate implicit targets. + +Title +===== + +Paragraph. + +Title +===== + +Paragraph. +""", +"""\ +<document> + <paragraph> + Duplicate implicit targets. + <section dupname="title" id="title"> + <title> + Title + <paragraph> + Paragraph. + <section dupname="title" id="id1"> + <title> + Title + <system_message backrefs="id1" level="1" type="INFO"> + <paragraph> + Duplicate implicit target name: "title". + <paragraph> + Paragraph. +"""], +["""\ +Duplicate implicit/explicit targets. + +Title +===== + +.. _title: + +Paragraph. +""", +"""\ +<document> + <paragraph> + Duplicate implicit/explicit targets. + <section dupname="title" id="title"> + <title> + Title + <system_message backrefs="id1" level="1" type="INFO"> + <paragraph> + Duplicate implicit target name: "title". + <target id="id1" name="title"> + <paragraph> + Paragraph. +"""], +["""\ +Duplicate explicit targets. + +.. _title: + +First. + +.. _title: + +Second. + +.. _title: + +Third. +""", +"""\ +<document> + <paragraph> + Duplicate explicit targets. + <target dupname="title" id="title"> + <paragraph> + First. + <system_message backrefs="id1" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "title". + <target dupname="title" id="id1"> + <paragraph> + Second. 
+ <system_message backrefs="id2" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "title". + <target dupname="title" id="id2"> + <paragraph> + Third. +"""], +["""\ +Duplicate targets: + +Target +====== + +Implicit section header target. + +.. [target] Citation target. + +.. [#target] Autonumber-labeled footnote target. + +.. _target: + +Explicit internal target. + +.. _target: Explicit_external_target +""", +"""\ +<document> + <paragraph> + Duplicate targets: + <section dupname="target" id="target"> + <title> + Target + <paragraph> + Implicit section header target. + <citation dupname="target" id="id1"> + <label> + target + <system_message backrefs="id1" level="1" type="INFO"> + <paragraph> + Duplicate implicit target name: "target". + <paragraph> + Citation target. + <footnote auto="1" dupname="target" id="id2"> + <system_message backrefs="id2" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "target". + <paragraph> + Autonumber-labeled footnote target. + <system_message backrefs="id3" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "target". + <target dupname="target" id="id3"> + <paragraph> + Explicit internal target. + <system_message backrefs="id4" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "target". + <target dupname="target" id="id4" refuri="Explicit_external_target"> +"""], +] + +totest['anonymous_targets'] = [ +["""\ +Anonymous external hyperlink target: + +.. __: http://w3c.org/ +""", +"""\ +<document> + <paragraph> + Anonymous external hyperlink target: + <target anonymous="1" id="id1" refuri="http://w3c.org/"> +"""], +["""\ +Anonymous external hyperlink target: + +__ http://w3c.org/ +""", +"""\ +<document> + <paragraph> + Anonymous external hyperlink target: + <target anonymous="1" id="id1" refuri="http://w3c.org/"> +"""], +["""\ +Anonymous indirect hyperlink target: + +.. 
__: reference_ +""", +"""\ +<document> + <paragraph> + Anonymous indirect hyperlink target: + <target anonymous="1" id="id1" refname="reference"> +"""], +["""\ +Anonymous indirect hyperlink targets: + +__ reference_ +__ `a very long + reference`_ +""", +"""\ +<document> + <paragraph> + Anonymous indirect hyperlink targets: + <target anonymous="1" id="id1" refname="reference"> + <target anonymous="1" id="id2" refname="a very long reference"> +"""], +["""\ +Mixed anonymous & named indirect hyperlink targets: + +__ reference_ +.. __: reference_ +__ reference_ +.. _target1: reference_ +no blank line + +.. _target2: reference_ +__ reference_ +.. __: reference_ +__ reference_ +no blank line +""", +"""\ +<document> + <paragraph> + Mixed anonymous & named indirect hyperlink targets: + <target anonymous="1" id="id1" refname="reference"> + <target anonymous="1" id="id2" refname="reference"> + <target anonymous="1" id="id3" refname="reference"> + <target id="target1" name="target1" refname="reference"> + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 7. + <paragraph> + no blank line + <target id="target2" name="target2" refname="reference"> + <target anonymous="1" id="id4" refname="reference"> + <target anonymous="1" id="id5" refname="reference"> + <target anonymous="1" id="id6" refname="reference"> + <system_message level="2" type="WARNING"> + <paragraph> + Unindent without blank line at line 13. + <paragraph> + no blank line +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_parsers/test_rst/test_transitions.py b/test/test_parsers/test_rst/test_transitions.py new file mode 100755 index 000000000..b78f794ee --- /dev/null +++ b/test/test_parsers/test_rst/test_transitions.py @@ -0,0 +1,144 @@ +#! 
/usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for transition markers. +""" + +import DocutilsTestSupport + +def suite(): + s = DocutilsTestSupport.ParserTestSuite() + s.generateTests(totest) + return s + +totest = {} + +# See DocutilsTestSupport.ParserTestSuite.generateTests for a +# description of the 'totest' data structure. +totest['transitions'] = [ +["""\ +Test transition markers. + +-------- + +Paragraph +""", +"""\ +<document> + <paragraph> + Test transition markers. + <transition> + <paragraph> + Paragraph +"""], +["""\ +Section 1 +========= +First text division of section 1. + +-------- + +Second text division of section 1. + +Section 2 +--------- +Paragraph 2 in section 2. +""", +"""\ +<document> + <section id="section-1" name="section 1"> + <title> + Section 1 + <paragraph> + First text division of section 1. + <transition> + <paragraph> + Second text division of section 1. + <section id="section-2" name="section 2"> + <title> + Section 2 + <paragraph> + Paragraph 2 in section 2. +"""], +["""\ +-------- + +A section or document may not begin with a transition. + +The DTD specifies that two transitions may not +be adjacent: + +-------- + +-------- + +-------- + +The DTD also specifies that a section or document +may not end with a transition. + +-------- +""", +"""\ +<document> + <system_message level="3" type="ERROR"> + <paragraph> + Document or section may not begin with a transition (line 1). + <transition> + <paragraph> + A section or document may not begin with a transition. + <paragraph> + The DTD specifies that two transitions may not + be adjacent: + <transition> + <system_message level="3" type="ERROR"> + <paragraph> + At least one body element must separate transitions; adjacent transitions at line 10. 
+ <transition> + <system_message level="3" type="ERROR"> + <paragraph> + At least one body element must separate transitions; adjacent transitions at line 12. + <transition> + <paragraph> + The DTD also specifies that a section or document + may not end with a transition. + <transition> + <system_message level="3" type="ERROR"> + <paragraph> + Document or section may not end with a transition (line 17). +"""], +["""\ +Test unexpected transition markers. + + Block quote. + + -------- + + Paragraph. +""", +"""\ +<document> + <paragraph> + Test unexpected transition markers. + <block_quote> + <paragraph> + Block quote. + <system_message level="4" type="SEVERE"> + <paragraph> + Unexpected section title or transition at line 5. + <literal_block> + -------- + <paragraph> + Paragraph. +"""], +] + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_statemachine.py b/test/test_statemachine.py new file mode 100755 index 000000000..a73a17197 --- /dev/null +++ b/test/test_statemachine.py @@ -0,0 +1,296 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Test module for statemachine.py. +""" + +import unittest, sys, re +from DocutilsTestSupport import statemachine +try: + import mypdb as pdb +except: + import pdb +pdb.tracenow = 0 + +debug = 0 +testtext = statemachine.string2lines("""\ +First paragraph. + +- This is a bullet list. First list item. + Second line of first para. + + Second para. + + block quote + +- Second list item. 
Example:: + + a + literal + block + +Last paragraph.""") +expected = ('StateMachine1 text1 blank1 bullet1 knownindent1 ' + 'StateMachine2 text2 text2 blank2 text2 blank2 indent2 ' + 'StateMachine3 text3 blank3 finished3 finished2 ' + 'bullet1 knownindent1 ' + 'StateMachine2 text2 blank2 literalblock2(4) finished2 ' + 'text1 finished1').split() +para1 = testtext[:2] +item1 = [line[2:] for line in testtext[2:9]] +item2 = [line[2:] for line in testtext[9:-1]] +lbindent = 6 +literalblock = [line[lbindent:] for line in testtext[11:-1]] +para2 = testtext[-1] + + +class MockState(statemachine.StateWS): + + patterns = {'bullet': re.compile(r'- '), + 'text': ''} + initialtransitions = ['bullet', ['text']] + levelholder = [0] + + def bof(self, context): + self.levelholder[0] += 1 + self.level = self.levelholder[0] + if self.debug: print >>sys.stderr, 'StateMachine%s' % self.level + return [], ['StateMachine%s' % self.level] + + def blank(self, match, context, nextstate): + result = ['blank%s' % self.level] + if self.debug: print >>sys.stderr, 'blank%s' % self.level + if context and context[-1] and context[-1][-2:] == '::': + result.extend(self.literalblock()) + return [], None, result + + def indent(self, match, context, nextstate): + if self.debug: print >>sys.stderr, 'indent%s' % self.level + context, nextstate, result = statemachine.StateWS.indent( + self, match, context, nextstate) + return context, nextstate, ['indent%s' % self.level] + result + + def knownindent(self, match, context, nextstate): + if self.debug: print >>sys.stderr, 'knownindent%s' % self.level + context, nextstate, result = statemachine.StateWS.knownindent( + self, match, context, nextstate) + return context, nextstate, ['knownindent%s' % self.level] + result + + def bullet(self, match, context, nextstate): + if self.debug: print >>sys.stderr, 'bullet%s' % self.level + context, nextstate, result \ + = self.knownindent(match, context, nextstate) + return [], nextstate, ['bullet%s' % self.level] + result 
+ + def text(self, match, context, nextstate): + if self.debug: print >>sys.stderr, 'text%s' % self.level + return [match.string], nextstate, ['text%s' % self.level] + + def literalblock(self): + indented, indent, offset, good = self.statemachine.getindented() + if self.debug: print >>sys.stderr, 'literalblock%s(%s)' % (self.level, + indent) + return ['literalblock%s(%s)' % (self.level, indent)] + + def eof(self, context): + self.levelholder[0] -= 1 + if self.debug: print >>sys.stderr, 'finished%s' % self.level + return ['finished%s' % self.level] + + +class EmptySMTests(unittest.TestCase): + + def setUp(self): + self.sm = statemachine.StateMachine( + stateclasses=[], initialstate='State') + self.sm.debug = debug + + def test_addstate(self): + self.sm.addstate(statemachine.State) + self.assert_(len(self.sm.states) == 1) + self.assertRaises(statemachine.DuplicateStateError, self.sm.addstate, + statemachine.State) + self.sm.addstate(statemachine.StateWS) + self.assert_(len(self.sm.states) == 2) + + def test_addstates(self): + self.sm.addstates((statemachine.State, statemachine.StateWS)) + self.assertEqual(len(self.sm.states), 2) + + def test_getstate(self): + self.assertRaises(statemachine.UnknownStateError, self.sm.getstate) + self.sm.addstates((statemachine.State, statemachine.StateWS)) + self.assertRaises(statemachine.UnknownStateError, self.sm.getstate, + 'unknownState') + self.assert_(isinstance(self.sm.getstate('State'), + statemachine.State)) + self.assert_(isinstance(self.sm.getstate('StateWS'), + statemachine.State)) + self.assertEqual(self.sm.currentstate, 'StateWS') + + +class EmptySMWSTests(EmptySMTests): + + def setUp(self): + self.sm = statemachine.StateMachineWS( + stateclasses=[], initialstate='State') + self.sm.debug = debug + + +class SMWSTests(unittest.TestCase): + + def setUp(self): + self.sm = statemachine.StateMachineWS([MockState], 'MockState', + debug=debug) + self.sm.debug = debug + self.sm.states['MockState'].levelholder[0] = 0 + + def 
tearDown(self): + self.sm.unlink() + + def test___init__(self): + self.assertEquals(self.sm.states.keys(), ['MockState']) + self.assertEquals(len(self.sm.states['MockState'].transitions), 2) + + def test_getindented(self): + self.sm.inputlines = testtext + self.sm.lineoffset = -1 + self.sm.nextline(3) + indented, offset, good = self.sm.getknownindented(2) + self.assertEquals(indented, item1) + self.assertEquals(offset, len(para1)) + self.failUnless(good) + self.sm.nextline() + indented, offset, good = self.sm.getknownindented(2) + self.assertEquals(indented, item2) + self.assertEquals(offset, len(para1) + len(item1)) + self.failUnless(good) + self.sm.previousline(3) + if self.sm.debug: + print '\ntest_getindented: self.sm.line:\n', self.sm.line + indented, indent, offset, good = self.sm.getindented() + if self.sm.debug: + print '\ntest_getindented: indented:\n', indented + self.assertEquals(indent, lbindent) + self.assertEquals(indented, literalblock) + self.assertEquals(offset, (len(para1) + len(item1) + len(item2) + - len(literalblock))) + self.failUnless(good) + + def test_gettextblock(self): + self.sm.inputlines = testtext + self.sm.lineoffset = -1 + self.sm.nextline() + textblock = self.sm.gettextblock() + self.assertEquals(textblock, testtext[:1]) + self.sm.nextline(2) + textblock = self.sm.gettextblock() + self.assertEquals(textblock, testtext[2:4]) + + def test_getunindented(self): + self.sm.inputlines = testtext + self.sm.lineoffset = -1 + self.sm.nextline() + textblock = self.sm.getunindented() + self.assertEquals(textblock, testtext[:1]) + self.sm.nextline() + self.assertRaises(statemachine.UnexpectedIndentationError, + self.sm.getunindented) + + def test_run(self): + self.assertEquals(self.sm.run(testtext), expected) + + +class EmptyClass: + pass + + +class EmptyStateTests(unittest.TestCase): + + def setUp(self): + self.state = statemachine.State(EmptyClass(), debug=debug) + self.state.patterns = {'nop': 'dummy', + 'nop2': 'dummy', + 'nop3': 'dummy', + 
'bogus': 'dummy'} + self.state.nop2 = self.state.nop3 = self.state.nop + + def test_addtransitions(self): + self.assertEquals(len(self.state.transitions), 0) + self.state.addtransitions(['None'], {'None': None}) + self.assertEquals(len(self.state.transitions), 1) + self.assertRaises(statemachine.UnknownTransitionError, + self.state.addtransitions, ['bogus'], {}) + self.assertRaises(statemachine.DuplicateTransitionError, + self.state.addtransitions, ['None'], {'None': None}) + + def test_addtransition(self): + self.assertEquals(len(self.state.transitions), 0) + self.state.addtransition('None', None) + self.assertEquals(len(self.state.transitions), 1) + self.assertRaises(statemachine.DuplicateTransitionError, + self.state.addtransition, 'None', None) + + def test_removetransition(self): + self.assertEquals(len(self.state.transitions), 0) + self.state.addtransition('None', None) + self.assertEquals(len(self.state.transitions), 1) + self.state.removetransition('None') + self.assertEquals(len(self.state.transitions), 0) + self.assertRaises(statemachine.UnknownTransitionError, + self.state.removetransition, 'None') + + def test_maketransition(self): + dummy = re.compile('dummy') + self.assertEquals(self.state.maketransition('nop', 'bogus'), + (dummy, self.state.nop, 'bogus')) + self.assertEquals(self.state.maketransition('nop'), + (dummy, self.state.nop, + self.state.__class__.__name__)) + self.assertRaises(statemachine.TransitionPatternNotFound, + self.state.maketransition, 'None') + self.assertRaises(statemachine.TransitionMethodNotFound, + self.state.maketransition, 'bogus') + + def test_maketransitions(self): + dummy = re.compile('dummy') + self.assertEquals(self.state.maketransitions(('nop', ['nop2'], + ('nop3', 'bogus'))), + (['nop', 'nop2', 'nop3'], + {'nop': (dummy, self.state.nop, + self.state.__class__.__name__), + 'nop2': (dummy, self.state.nop2, + self.state.__class__.__name__), + 'nop3': (dummy, self.state.nop3, 'bogus')})) + + +class 
MiscTests(unittest.TestCase): + + s2l_string = "hello\tthere\thow are\tyou?\n\tI'm fine\tthanks.\n" + s2l_expected = ['hello there how are you?', + " I'm fine thanks."] + indented_string = """\ + a + literal + block""" + + def test_string2lines(self): + self.assertEquals(statemachine.string2lines(self.s2l_string), + self.s2l_expected) + + def test_extractindented(self): + block = statemachine.string2lines(self.indented_string) + self.assertEquals(statemachine.extractindented(block), + ([s[6:] for s in block], 6, 1)) + self.assertEquals(statemachine.extractindented(self.s2l_expected), + ([], 0, 0)) + + +if __name__ == '__main__': + unittest.main() diff --git a/test/test_transforms/test_contents2.py b/test/test_transforms/test_contents2.py new file mode 100755 index 000000000..f3dccae99 --- /dev/null +++ b/test/test_transforms/test_contents2.py @@ -0,0 +1,262 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for docutils.transforms.components.Contents. +""" + +import DocutilsTestSupport +from docutils.transforms.universal import LastReaderPending +from docutils.parsers.rst import Parser + + +def suite(): + parser = Parser() + s = DocutilsTestSupport.TransformTestSuite(parser) + s.generateTests(totest) + return s + +totest = {} + +totest['tables_of_contents'] = ((LastReaderPending,), [ +["""\ +.. contents:: + +Title 1 +======= +Paragraph 1. + +Title 2 +------- +Paragraph 2. + +Title 3 +``````` +Paragraph 3. + +Title 4 +------- +Paragraph 4. 
+""", +"""\ +<document> + <topic class="contents"> + <title> + Contents + <bullet_list> + <list_item id="id1"> + <paragraph> + <reference refid="title-1"> + Title 1 + <bullet_list> + <list_item id="id2"> + <paragraph> + <reference refid="title-2"> + Title 2 + <bullet_list> + <list_item id="id3"> + <paragraph> + <reference refid="title-3"> + Title 3 + <list_item id="id4"> + <paragraph> + <reference refid="title-4"> + Title 4 + <section id="title-1" name="title 1"> + <title refid="id1"> + Title 1 + <paragraph> + Paragraph 1. + <section id="title-2" name="title 2"> + <title refid="id2"> + Title 2 + <paragraph> + Paragraph 2. + <section id="title-3" name="title 3"> + <title refid="id3"> + Title 3 + <paragraph> + Paragraph 3. + <section id="title-4" name="title 4"> + <title refid="id4"> + Title 4 + <paragraph> + Paragraph 4. +"""], +["""\ +.. contents:: Table of Contents + +Title 1 +======= +Paragraph 1. + +Title 2 +------- +Paragraph 2. +""", +"""\ +<document> + <topic class="contents"> + <title> + Table of Contents + <bullet_list> + <list_item id="id1"> + <paragraph> + <reference refid="title-1"> + Title 1 + <bullet_list> + <list_item id="id2"> + <paragraph> + <reference refid="title-2"> + Title 2 + <section id="title-1" name="title 1"> + <title refid="id1"> + Title 1 + <paragraph> + Paragraph 1. + <section id="title-2" name="title 2"> + <title refid="id2"> + Title 2 + <paragraph> + Paragraph 2. +"""], +["""\ +.. contents:: + :depth: 2 + +Title 1 +======= +Paragraph 1. + +Title 2 +------- +Paragraph 2. + +Title 3 +``````` +Paragraph 3. + +Title 4 +------- +Paragraph 4. 
+""", +"""\ +<document> + <topic class="contents"> + <title> + Contents + <bullet_list> + <list_item id="id1"> + <paragraph> + <reference refid="title-1"> + Title 1 + <bullet_list> + <list_item id="id2"> + <paragraph> + <reference refid="title-2"> + Title 2 + <list_item id="id3"> + <paragraph> + <reference refid="title-4"> + Title 4 + <section id="title-1" name="title 1"> + <title refid="id1"> + Title 1 + <paragraph> + Paragraph 1. + <section id="title-2" name="title 2"> + <title refid="id2"> + Title 2 + <paragraph> + Paragraph 2. + <section id="title-3" name="title 3"> + <title> + Title 3 + <paragraph> + Paragraph 3. + <section id="title-4" name="title 4"> + <title refid="id3"> + Title 4 + <paragraph> + Paragraph 4. +"""], +["""\ +Title 1 +======= + +.. contents:: + :local: + +Paragraph 1. + +Title 2 +------- +Paragraph 2. + +Title 3 +``````` +Paragraph 3. + +Title 4 +------- +Paragraph 4. +""", +"""\ +<document> + <section id="title-1" name="title 1"> + <title> + Title 1 + <topic class="contents"> + <bullet_list> + <list_item id="id1"> + <paragraph> + <reference refid="title-2"> + Title 2 + <bullet_list> + <list_item id="id2"> + <paragraph> + <reference refid="title-3"> + Title 3 + <list_item id="id3"> + <paragraph> + <reference refid="title-4"> + Title 4 + <paragraph> + Paragraph 1. + <section id="title-2" name="title 2"> + <title refid="id1"> + Title 2 + <paragraph> + Paragraph 2. + <section id="title-3" name="title 3"> + <title refid="id2"> + Title 3 + <paragraph> + Paragraph 3. + <section id="title-4" name="title 4"> + <title refid="id3"> + Title 4 + <paragraph> + Paragraph 4. +"""], +["""\ +.. contents:: + +Degenerate case, no table of contents generated. +""", +"""\ +<document> + <paragraph> + Degenerate case, no table of contents generated. 
+"""], +]) + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_transforms/test_docinfo.py b/test/test_transforms/test_docinfo.py new file mode 100755 index 000000000..404a0b9a7 --- /dev/null +++ b/test/test_transforms/test_docinfo.py @@ -0,0 +1,327 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for docutils.transforms.frontmatter.DocInfo. +""" + +import DocutilsTestSupport +import UnitTestFolder +from docutils.transforms.frontmatter import DocInfo +from docutils.parsers.rst import Parser + + +def suite(): + parser = Parser() + s = DocutilsTestSupport.TransformTestSuite(parser) + s.generateTests(totest) + return s + +totest = {} + +totest['bibliographic_field_lists'] = ((DocInfo,), [ +["""\ +.. Bibliographic element extraction. + +:Abstract: + There can only be one abstract. + + It is automatically moved to the end of the other bibliographic elements. + +:Author: Me +:Version: 1 +:Date: 2001-08-11 +:Parameter i: integer +""", +"""\ +<document> + <docinfo> + <author> + Me + <version> + 1 + <date> + 2001-08-11 + <topic class="abstract"> + <title> + Abstract + <paragraph> + There can only be one abstract. + <paragraph> + It is automatically moved to the end of the other bibliographic elements. + <comment> + Bibliographic element extraction. + <field_list> + <field> + <field_name> + Parameter + <field_argument> + i + <field_body> + <paragraph> + integer +"""], +["""\ +.. Bibliographic element extraction. + +:Abstract: Abstract 1. +:Author: Me +:Contact: me@my.org +:Version: 1 +:Abstract: Abstract 2 (should generate a warning). 
+:Date: 2001-08-11 +:Parameter i: integer +""", +"""\ +<document> + <docinfo> + <author> + Me + <contact> + <reference refuri="mailto:me@my.org"> + me@my.org + <version> + 1 + <date> + 2001-08-11 + <topic class="abstract"> + <title> + Abstract + <paragraph> + Abstract 1. + <comment> + Bibliographic element extraction. + <field_list> + <field> + <field_name> + Abstract + <field_body> + <paragraph> + Abstract 2 (should generate a warning). + <system_message level="2" type="WARNING"> + <paragraph> + There can only be one abstract. + <field> + <field_name> + Parameter + <field_argument> + i + <field_body> + <paragraph> + integer +"""], +["""\ +:Author: - must be a paragraph +:Status: a *simple* paragraph +:Date: But only one + + paragraph. +:Version: + +.. and not empty either +""", +"""\ +<document> + <docinfo> + <status> + a \n\ + <emphasis> + simple + paragraph + <field_list> + <field> + <field_name> + Author + <field_body> + <bullet_list bullet="-"> + <list_item> + <paragraph> + must be a paragraph + <system_message level="2" type="WARNING"> + <paragraph> + Cannot extract bibliographic field "Author" containing anything other than a single paragraph. + <field> + <field_name> + Date + <field_body> + <paragraph> + But only one + <paragraph> + paragraph. + <system_message level="2" type="WARNING"> + <paragraph> + Cannot extract compound bibliographic field "Date". + <field> + <field_name> + Version + <field_body> + <system_message level="2" type="WARNING"> + <paragraph> + Cannot extract empty bibliographic field "Version". + <comment> + and not empty either +"""], +["""\ +:Authors: Me, Myself, **I** +:Authors: PacMan; Ms. PacMan; PacMan, Jr. +:Authors: + Here + + There + + *Everywhere* +:Authors: - First + - Second + - Third +""", +"""\ +<document> + <docinfo> + <authors> + <author> + Me + <author> + Myself + <author> + I + <authors> + <author> + PacMan + <author> + Ms. PacMan + <author> + PacMan, Jr. 
+ <authors> + <author> + Here + <author> + There + <author> + <emphasis> + Everywhere + <authors> + <author> + First + <author> + Second + <author> + Third +"""], +["""\ +:Authors: + +:Authors: 1. One + 2. Two + +:Authors: + - + - + +:Authors: + - One + + Two + +:Authors: + - One + + Two +""", +"""\ +<document> + <field_list> + <field> + <field_name> + Authors + <field_body> + <system_message level="2" type="WARNING"> + <paragraph> + Cannot extract empty bibliographic field "Authors". + <field> + <field_name> + Authors + <field_body> + <enumerated_list enumtype="arabic" prefix="" suffix="."> + <list_item> + <paragraph> + One + <list_item> + <paragraph> + Two + <system_message level="2" type="WARNING"> + <paragraph> + Bibliographic field "Authors" incompatible with extraction: it must contain either a single paragraph (with authors separated by one of ";,"), multiple paragraphs (one per author), or a bullet list with one paragraph (one author) per item. + <field> + <field_name> + Authors + <field_body> + <bullet_list bullet="-"> + <list_item> + <list_item> + <system_message level="2" type="WARNING"> + <paragraph> + Bibliographic field "Authors" incompatible with extraction: it must contain either a single paragraph (with authors separated by one of ";,"), multiple paragraphs (one per author), or a bullet list with one paragraph (one author) per item. + <field> + <field_name> + Authors + <field_body> + <bullet_list bullet="-"> + <list_item> + <paragraph> + One + <paragraph> + Two + <system_message level="2" type="WARNING"> + <paragraph> + Bibliographic field "Authors" incompatible with extraction: it must contain either a single paragraph (with authors separated by one of ";,"), multiple paragraphs (one per author), or a bullet list with one paragraph (one author) per item. 
+ <field> + <field_name> + Authors + <field_body> + <bullet_list bullet="-"> + <list_item> + <paragraph> + One + <paragraph> + Two + <system_message level="2" type="WARNING"> + <paragraph> + Bibliographic field "Authors" incompatible with extraction: it must contain either a single paragraph (with authors separated by one of ";,"), multiple paragraphs (one per author), or a bullet list with one paragraph (one author) per item. +"""], +["""\ +.. RCS keyword extraction. + +:Status: $RCSfile$ +:Date: $Date$ + +RCS keyword 'RCSfile' doesn't change unless the file name changes, +so it's safe. The 'Date' keyword changes every time the file is +checked in to CVS, so the test's expected output text has to be +derived (hacked) in parallel in order to stay in sync. +""", +"""\ +<document> + <docinfo> + <status> + test_docinfo.py + <date> + %s + <comment> + RCS keyword extraction. + <paragraph> + RCS keyword 'RCSfile' doesn't change unless the file name changes, + so it's safe. The 'Date' keyword changes every time the file is + checked in to CVS, so the test's expected output text has to be + derived (hacked) in parallel in order to stay in sync. +""" % ('$Date$'[7:17].replace('/', '-'),)], +]) + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_transforms/test_doctitle.py b/test/test_transforms/test_doctitle.py new file mode 100755 index 000000000..4ba0d69c5 --- /dev/null +++ b/test/test_transforms/test_doctitle.py @@ -0,0 +1,175 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for docutils.transforms.frontmatter.DocTitle. 
+""" + +import DocutilsTestSupport +import UnitTestFolder +from docutils.transforms.frontmatter import DocTitle +from docutils.parsers.rst import Parser + + +def suite(): + parser = Parser() + s = DocutilsTestSupport.TransformTestSuite(parser) + s.generateTests(totest) + return s + +totest = {} + +totest['section_headers'] = ((DocTitle,), [ +["""\ +.. test title promotion + +Title +===== + +Paragraph. +""", +"""\ +<document id="title" name="title"> + <title> + Title + <comment> + test title promotion + <paragraph> + Paragraph. +"""], +["""\ +Title +===== +Paragraph (no blank line). +""", +"""\ +<document id="title" name="title"> + <title> + Title + <paragraph> + Paragraph (no blank line). +"""], +["""\ +Paragraph. + +Title +===== + +Paragraph. +""", +"""\ +<document> + <paragraph> + Paragraph. + <section id="title" name="title"> + <title> + Title + <paragraph> + Paragraph. +"""], +["""\ +Title +===== + +Subtitle +-------- + +Test title & subtitle. +""", +"""\ +<document id="title" name="title"> + <title> + Title + <subtitle id="subtitle" name="subtitle"> + Subtitle + <paragraph> + Test title & subtitle. +"""], +["""\ +Title +==== + +Test short underline. +""", +"""\ +<document id="title" name="title"> + <title> + Title + <system_message level="1" type="INFO"> + <paragraph> + Title underline too short at line 2. + <literal_block> + Title + ==== + <paragraph> + Test short underline. +"""], +["""\ +======= + Long Title +======= + +Test long title and space normalization. +The system_message should move after the document title +(it was before the beginning of the section). +""", +"""\ +<document id="long-title" name="long title"> + <title> + Long Title + <system_message level="1" type="INFO"> + <paragraph> + Title overline too short at line 1. + <literal_block> + ======= + Long Title + ======= + <paragraph> + Test long title and space normalization. + The system_message should move after the document title + (it was before the beginning of the section). 
+"""], +["""\ +.. Test multiple second-level titles. + +Title 1 +======= +Paragraph 1. + +Title 2 +------- +Paragraph 2. + +Title 3 +------- +Paragraph 3. +""", +"""\ +<document id="title-1" name="title 1"> + <title> + Title 1 + <comment> + Test multiple second-level titles. + <paragraph> + Paragraph 1. + <section id="title-2" name="title 2"> + <title> + Title 2 + <paragraph> + Paragraph 2. + <section id="title-3" name="title 3"> + <title> + Title 3 + <paragraph> + Paragraph 3. +"""], +]) + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_transforms/test_final_checks.py b/test/test_transforms/test_final_checks.py new file mode 100755 index 000000000..e65f16475 --- /dev/null +++ b/test/test_transforms/test_final_checks.py @@ -0,0 +1,47 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for docutils.transforms.universal.FinalChecks. +""" + +import DocutilsTestSupport +import UnitTestFolder +from docutils.transforms.universal import FinalChecks +from docutils.parsers.rst import Parser + + +def suite(): + parser = Parser() + s = DocutilsTestSupport.TransformTestSuite(parser) + s.generateTests(totest) + return s + +totest = {} + +totest['final_checks'] = ((FinalChecks,), [ +["""\ +Unknown reference_. +""", +"""\ +<document> + <paragraph> + Unknown + <problematic id="id2" refid="id1"> + reference_ + . + <system_message backrefs="id2" id="id1" level="3" type="ERROR"> + <paragraph> + Unknown target name: "reference". +"""], +]) + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_transforms/test_footnotes2.py b/test/test_transforms/test_footnotes2.py new file mode 100755 index 000000000..2fa0b16db --- /dev/null +++ b/test/test_transforms/test_footnotes2.py @@ -0,0 +1,521 @@ +#! 
/usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for docutils.transforms.references.Footnotes. +""" + +import DocutilsTestSupport +import UnitTestFolder +from docutils.transforms.references import Footnotes +from docutils.parsers.rst import Parser + + +def suite(): + parser = Parser() + s = DocutilsTestSupport.TransformTestSuite(parser) + s.generateTests(totest) + return s + +totest = {} + +totest['footnotes'] = ((Footnotes,), [ +["""\ +[#autolabel]_ + +.. [#autolabel] text +""", +"""\ +<document> + <paragraph> + <footnote_reference auto="1" id="id1" refid="autolabel"> + 1 + <footnote auto="1" backrefs="id1" id="autolabel" name="autolabel"> + <label> + 1 + <paragraph> + text +"""], +["""\ +autonumber: [#]_ + +.. [#] text +""", +"""\ +<document> + <paragraph> + autonumber: \n\ + <footnote_reference auto="1" id="id1" refid="id2"> + 1 + <footnote auto="1" backrefs="id1" id="id2" name="1"> + <label> + 1 + <paragraph> + text +"""], +["""\ +[#]_ is the first auto-numbered footnote reference. +[#]_ is the second auto-numbered footnote reference. + +.. [#] Auto-numbered footnote 1. +.. [#] Auto-numbered footnote 2. +.. [#] Auto-numbered footnote 3. + +[#]_ is the third auto-numbered footnote reference. +""", +"""\ +<document> + <paragraph> + <footnote_reference auto="1" id="id1" refid="id3"> + 1 + is the first auto-numbered footnote reference. + <footnote_reference auto="1" id="id2" refid="id4"> + 2 + is the second auto-numbered footnote reference. + <footnote auto="1" backrefs="id1" id="id3" name="1"> + <label> + 1 + <paragraph> + Auto-numbered footnote 1. + <footnote auto="1" backrefs="id2" id="id4" name="2"> + <label> + 2 + <paragraph> + Auto-numbered footnote 2. + <footnote auto="1" backrefs="id6" id="id5" name="3"> + <label> + 3 + <paragraph> + Auto-numbered footnote 3. 
+ <paragraph> + <footnote_reference auto="1" id="id6" refid="id5"> + 3 + is the third auto-numbered footnote reference. +"""], +["""\ +[#third]_ is a reference to the third auto-numbered footnote. + +.. [#first] First auto-numbered footnote. +.. [#second] Second auto-numbered footnote. +.. [#third] Third auto-numbered footnote. + +[#second]_ is a reference to the second auto-numbered footnote. +[#first]_ is a reference to the first auto-numbered footnote. +[#third]_ is another reference to the third auto-numbered footnote. + +Here are some internal cross-references to the implicit targets +generated by the footnotes: first_, second_, third_. +""", +"""\ +<document> + <paragraph> + <footnote_reference auto="1" id="id1" refid="third"> + 3 + is a reference to the third auto-numbered footnote. + <footnote auto="1" backrefs="id3" id="first" name="first"> + <label> + 1 + <paragraph> + First auto-numbered footnote. + <footnote auto="1" backrefs="id2" id="second" name="second"> + <label> + 2 + <paragraph> + Second auto-numbered footnote. + <footnote auto="1" backrefs="id1 id4" id="third" name="third"> + <label> + 3 + <paragraph> + Third auto-numbered footnote. + <paragraph> + <footnote_reference auto="1" id="id2" refid="second"> + 2 + is a reference to the second auto-numbered footnote. + <footnote_reference auto="1" id="id3" refid="first"> + 1 + is a reference to the first auto-numbered footnote. + <footnote_reference auto="1" id="id4" refid="third"> + 3 + is another reference to the third auto-numbered footnote. + <paragraph> + Here are some internal cross-references to the implicit targets + generated by the footnotes: \n\ + <reference refname="first"> + first + , \n\ + <reference refname="second"> + second + , \n\ + <reference refname="third"> + third + . +"""], +["""\ +Mixed anonymous and labelled auto-numbered footnotes: + +[#four]_ should be 4, [#]_ should be 1, +[#]_ should be 3, [#]_ is one too many, +[#two]_ should be 2, and [#six]_ doesn't exist. + +.. 
[#] Auto-numbered footnote 1. +.. [#two] Auto-numbered footnote 2. +.. [#] Auto-numbered footnote 3. +.. [#four] Auto-numbered footnote 4. +.. [#five] Auto-numbered footnote 5. +.. [#five] Auto-numbered footnote 5 again (duplicate). +""", +"""\ +<document> + <paragraph> + Mixed anonymous and labelled auto-numbered footnotes: + <paragraph> + <footnote_reference auto="1" id="id1" refid="four"> + 4 + should be 4, \n\ + <footnote_reference auto="1" id="id2" refid="id7"> + 1 + should be 1, + <footnote_reference auto="1" id="id3" refid="id8"> + 3 + should be 3, \n\ + <problematic id="id11" refid="id10"> + [#]_ + is one too many, + <footnote_reference auto="1" id="id5" refid="two"> + 2 + should be 2, and \n\ + <footnote_reference auto="1" id="id6" refname="six"> + doesn't exist. + <footnote auto="1" backrefs="id2" id="id7" name="1"> + <label> + 1 + <paragraph> + Auto-numbered footnote 1. + <footnote auto="1" backrefs="id5" id="two" name="two"> + <label> + 2 + <paragraph> + Auto-numbered footnote 2. + <footnote auto="1" backrefs="id3" id="id8" name="3"> + <label> + 3 + <paragraph> + Auto-numbered footnote 3. + <footnote auto="1" backrefs="id1" id="four" name="four"> + <label> + 4 + <paragraph> + Auto-numbered footnote 4. + <footnote auto="1" dupname="five" id="five"> + <label> + 5 + <paragraph> + Auto-numbered footnote 5. + <footnote auto="1" dupname="five" id="id9"> + <label> + 6 + <system_message backrefs="id9" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "five". + <paragraph> + Auto-numbered footnote 5 again (duplicate). + <system_message backrefs="id11" id="id10" level="3" type="ERROR"> + <paragraph> + Too many autonumbered footnote references: only 2 corresponding footnotes available. +"""], +["""\ +Mixed auto-numbered and manual footnotes: + +.. [1] manually numbered +.. [#] auto-numbered +.. 
[#label] autonumber-labeled +""", +"""\ +<document> + <paragraph> + Mixed auto-numbered and manual footnotes: + <footnote id="id1" name="1"> + <label> + 1 + <paragraph> + manually numbered + <footnote auto="1" id="id2" name="2"> + <label> + 2 + <paragraph> + auto-numbered + <footnote auto="1" id="label" name="label"> + <label> + 3 + <paragraph> + autonumber-labeled +"""], +["""\ +A labeled autonumbered footnote reference: [#footnote]_. + +An unlabeled autonumbered footnote reference: [#]_. + +.. [#] Unlabeled autonumbered footnote. +.. [#footnote] Labeled autonumbered footnote. + Note that the footnotes are not in the same + order as the references. +""", +"""\ +<document> + <paragraph> + A labeled autonumbered footnote reference: \n\ + <footnote_reference auto="1" id="id1" refid="footnote"> + 2 + . + <paragraph> + An unlabeled autonumbered footnote reference: \n\ + <footnote_reference auto="1" id="id2" refid="id3"> + 1 + . + <footnote auto="1" backrefs="id2" id="id3" name="1"> + <label> + 1 + <paragraph> + Unlabeled autonumbered footnote. + <footnote auto="1" backrefs="id1" id="footnote" name="footnote"> + <label> + 2 + <paragraph> + Labeled autonumbered footnote. + Note that the footnotes are not in the same + order as the references. +"""], +["""\ +Mixed manually-numbered, anonymous auto-numbered, +and labelled auto-numbered footnotes: + +[#four]_ should be 4, [#]_ should be 2, +[1]_ is 1, [3]_ is 3, +[#]_ should be 6, [#]_ is one too many, +[#five]_ should be 5, and [#eight]_ doesn't exist. + +.. [1] Manually-numbered footnote 1. +.. [#] Auto-numbered footnote 2. +.. [#four] Auto-numbered footnote 4. +.. [3] Manually-numbered footnote 3 +.. [#five] Auto-numbered footnote 5. +.. [#] Auto-numbered footnote 6. +.. [#five] Auto-numbered footnote 5 again (duplicate). 
+""", +"""\ +<document> + <paragraph> + Mixed manually-numbered, anonymous auto-numbered, + and labelled auto-numbered footnotes: + <paragraph> + <footnote_reference auto="1" id="id1" refid="four"> + 4 + should be 4, \n\ + <footnote_reference auto="1" id="id2" refid="id10"> + 2 + should be 2, + <footnote_reference id="id3" refid="id9"> + 1 + is 1, \n\ + <footnote_reference id="id4" refid="id11"> + 3 + is 3, + <footnote_reference auto="1" id="id5" refid="id12"> + 6 + should be 6, \n\ + <problematic id="id15" refid="id14"> + [#]_ + is one too many, + <footnote_reference auto="1" id="id7" refname="five"> + should be 5, and \n\ + <footnote_reference auto="1" id="id8" refname="eight"> + doesn't exist. + <footnote backrefs="id3" id="id9" name="1"> + <label> + 1 + <paragraph> + Manually-numbered footnote 1. + <footnote auto="1" backrefs="id2" id="id10" name="2"> + <label> + 2 + <paragraph> + Auto-numbered footnote 2. + <footnote auto="1" backrefs="id1" id="four" name="four"> + <label> + 4 + <paragraph> + Auto-numbered footnote 4. + <footnote backrefs="id4" id="id11" name="3"> + <label> + 3 + <paragraph> + Manually-numbered footnote 3 + <footnote auto="1" dupname="five" id="five"> + <label> + 5 + <paragraph> + Auto-numbered footnote 5. + <footnote auto="1" backrefs="id5" id="id12" name="6"> + <label> + 6 + <paragraph> + Auto-numbered footnote 6. + <footnote auto="1" dupname="five" id="id13"> + <label> + 7 + <system_message backrefs="id13" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "five". + <paragraph> + Auto-numbered footnote 5 again (duplicate). + <system_message backrefs="id15" id="id14" level="3" type="ERROR"> + <paragraph> + Too many autonumbered footnote references: only 2 corresponding footnotes available. +"""], +["""\ +Referencing a footnote by symbol [*]_. + +.. [*] This is an auto-symbol footnote. 
+""", +"""\ +<document> + <paragraph> + Referencing a footnote by symbol \n\ + <footnote_reference auto="*" id="id1" refid="id2"> + * + . + <footnote auto="*" backrefs="id1" id="id2"> + <label> + * + <paragraph> + This is an auto-symbol footnote. +"""], +["""\ +A sequence of symbol footnote references: +[*]_ [*]_ [*]_ [*]_ [*]_ [*]_ [*]_ [*]_ [*]_ [*]_ [*]_ [*]_. + +.. [*] Auto-symbol footnote 1. +.. [*] Auto-symbol footnote 2. +.. [*] Auto-symbol footnote 3. +.. [*] Auto-symbol footnote 4. +.. [*] Auto-symbol footnote 5. +.. [*] Auto-symbol footnote 6. +.. [*] Auto-symbol footnote 7. +.. [*] Auto-symbol footnote 8. +.. [*] Auto-symbol footnote 9. +.. [*] Auto-symbol footnote 10. +.. [*] Auto-symbol footnote 11. +.. [*] Auto-symbol footnote 12. +""", +u"""\ +<document> + <paragraph> + A sequence of symbol footnote references: + <footnote_reference auto="*" id="id1" refid="id13"> + * + \n\ + <footnote_reference auto="*" id="id2" refid="id14"> + \u2020 + \n\ + <footnote_reference auto="*" id="id3" refid="id15"> + \u2021 + \n\ + <footnote_reference auto="*" id="id4" refid="id16"> + \u00A7 + \n\ + <footnote_reference auto="*" id="id5" refid="id17"> + \u00B6 + \n\ + <footnote_reference auto="*" id="id6" refid="id18"> + # + \n\ + <footnote_reference auto="*" id="id7" refid="id19"> + \u2660 + \n\ + <footnote_reference auto="*" id="id8" refid="id20"> + \u2665 + \n\ + <footnote_reference auto="*" id="id9" refid="id21"> + \u2666 + \n\ + <footnote_reference auto="*" id="id10" refid="id22"> + \u2663 + \n\ + <footnote_reference auto="*" id="id11" refid="id23"> + ** + \n\ + <footnote_reference auto="*" id="id12" refid="id24"> + \u2020\u2020 + . + <footnote auto="*" backrefs="id1" id="id13"> + <label> + * + <paragraph> + Auto-symbol footnote 1. + <footnote auto="*" backrefs="id2" id="id14"> + <label> + \u2020 + <paragraph> + Auto-symbol footnote 2. + <footnote auto="*" backrefs="id3" id="id15"> + <label> + \u2021 + <paragraph> + Auto-symbol footnote 3. 
+ <footnote auto="*" backrefs="id4" id="id16"> + <label> + \u00A7 + <paragraph> + Auto-symbol footnote 4. + <footnote auto="*" backrefs="id5" id="id17"> + <label> + \u00B6 + <paragraph> + Auto-symbol footnote 5. + <footnote auto="*" backrefs="id6" id="id18"> + <label> + # + <paragraph> + Auto-symbol footnote 6. + <footnote auto="*" backrefs="id7" id="id19"> + <label> + \u2660 + <paragraph> + Auto-symbol footnote 7. + <footnote auto="*" backrefs="id8" id="id20"> + <label> + \u2665 + <paragraph> + Auto-symbol footnote 8. + <footnote auto="*" backrefs="id9" id="id21"> + <label> + \u2666 + <paragraph> + Auto-symbol footnote 9. + <footnote auto="*" backrefs="id10" id="id22"> + <label> + \u2663 + <paragraph> + Auto-symbol footnote 10. + <footnote auto="*" backrefs="id11" id="id23"> + <label> + ** + <paragraph> + Auto-symbol footnote 11. + <footnote auto="*" backrefs="id12" id="id24"> + <label> + \u2020\u2020 + <paragraph> + Auto-symbol footnote 12. +"""], +]) + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_transforms/test_hyperlinks.py b/test/test_transforms/test_hyperlinks.py new file mode 100755 index 000000000..4b015492a --- /dev/null +++ b/test/test_transforms/test_hyperlinks.py @@ -0,0 +1,441 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for docutils.transforms.references.Hyperlinks. +""" + +import DocutilsTestSupport +import UnitTestFolder +from docutils.transforms.references import Hyperlinks +from docutils.parsers.rst import Parser + + +def suite(): + parser = Parser() + s = DocutilsTestSupport.TransformTestSuite(parser) + s.generateTests(totest) + return s + +totest = {} + +# Exhaustive listing of hyperlink variations: every combination of +# target/reference, direct/indirect, internal/external, and named/anonymous. 
+totest['exhaustive_hyperlinks'] = ((Hyperlinks,), [ +["""\ +direct_ external + +.. _direct: http://direct +""", +"""\ +<document> + <paragraph> + <reference refuri="http://direct"> + direct + external + <target id="direct" name="direct" refuri="http://direct"> +"""], +["""\ +indirect_ external + +.. _indirect: xtarget_ +.. _xtarget: http://indirect +""", +"""\ +<document> + <paragraph> + <reference refuri="http://indirect"> + indirect + external + <target id="indirect" name="indirect" refuri="http://indirect"> + <target id="xtarget" name="xtarget" refuri="http://indirect"> +"""], +["""\ +.. _direct: + +direct_ internal +""", +"""\ +<document> + <target id="direct" name="direct"> + <paragraph> + <reference refid="direct"> + direct + internal +"""], +["""\ +.. _ztarget: + +indirect_ internal + +.. _indirect2: ztarget_ +.. _indirect: indirect2_ +""", +"""\ +<document> + <target id="ztarget" name="ztarget"> + <paragraph> + <reference refid="ztarget"> + indirect + internal + <target id="indirect2" name="indirect2" refid="ztarget"> + <target id="indirect" name="indirect" refid="ztarget"> +"""], +["""\ +Implicit +-------- + +indirect_ internal + +.. _indirect: implicit_ +""", +"""\ +<document> + <section id="implicit" name="implicit"> + <title> + Implicit + <paragraph> + <reference refid="implicit"> + indirect + internal + <target id="indirect" name="indirect" refid="implicit"> +"""], +["""\ +Implicit +-------- + +Duplicate implicit targets. + +Implicit +-------- + +indirect_ internal + +.. _indirect: implicit_ +""", +"""\ +<document> + <section dupname="implicit" id="implicit"> + <title> + Implicit + <paragraph> + Duplicate implicit targets. + <section dupname="implicit" id="id1"> + <title> + Implicit + <system_message backrefs="id1" level="1" type="INFO"> + <paragraph> + Duplicate implicit target name: "implicit". 
+ <paragraph> + <problematic id="id3" refid="id2"> + indirect_ + internal + <target id="indirect" name="indirect" refname="implicit"> + <system_message backrefs="id3" id="id2" level="2" type="WARNING"> + <paragraph> + Indirect hyperlink target "indirect" (id="indirect") refers to target "implicit", which does not exist. +"""], +["""\ +`direct external`__ + +__ http://direct +""", +"""\ +<document> + <paragraph> + <reference anonymous="1" refuri="http://direct"> + direct external + <target anonymous="1" id="id1" refuri="http://direct"> +"""], +["""\ +`indirect external`__ + +__ xtarget_ +.. _xtarget: http://indirect +""", +"""\ +<document> + <paragraph> + <reference anonymous="1" refuri="http://indirect"> + indirect external + <target anonymous="1" id="id1" refuri="http://indirect"> + <target id="xtarget" name="xtarget" refuri="http://indirect"> +"""], +["""\ +__ + +`direct internal`__ +""", +"""\ +<document> + <target anonymous="1" id="id1"> + <paragraph> + <reference anonymous="1" refid="id1"> + direct internal +"""], +["""\ +.. _ztarget: + +`indirect internal`__ + +__ ztarget_ +""", +"""\ +<document> + <target id="ztarget" name="ztarget"> + <paragraph> + <reference anonymous="1" refid="ztarget"> + indirect internal + <target anonymous="1" id="id1" refid="ztarget"> +"""], +["""\ +.. _ztarget: + +First + +.. _ztarget: + +Second + +`indirect internal`__ + +__ ztarget_ +""", +"""\ +<document> + <target dupname="ztarget" id="ztarget"> + <paragraph> + First + <system_message backrefs="id1" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "ztarget". + <target dupname="ztarget" id="id1"> + <paragraph> + Second + <paragraph> + <reference anonymous="1" refid="id1"> + indirect internal + <target anonymous="1" id="id2" refid="id1"> +"""], +]) + +totest['hyperlinks'] = ((Hyperlinks,), [ +["""\ +.. _internal hyperlink: + +This paragraph referenced. + +By this `internal hyperlink`_ referemce. 
+""", +"""\ +<document> + <target id="internal-hyperlink" name="internal hyperlink"> + <paragraph> + This paragraph referenced. + <paragraph> + By this \n\ + <reference refid="internal-hyperlink"> + internal hyperlink + referemce. +"""], +["""\ +.. _chained: +.. _internal hyperlink: + +This paragraph referenced. + +By this `internal hyperlink`_ referemce +as well as by this chained_ reference. + +The results of the transform are not visible at the XML level. +""", +"""\ +<document> + <target id="chained" name="chained"> + <target id="internal-hyperlink" name="internal hyperlink"> + <paragraph> + This paragraph referenced. + <paragraph> + By this \n\ + <reference refid="internal-hyperlink"> + internal hyperlink + referemce + as well as by this \n\ + <reference refid="chained"> + chained + reference. + <paragraph> + The results of the transform are not visible at the XML level. +"""], +["""\ +.. _external hyperlink: http://uri + +`External hyperlink`_ reference. +""", +"""\ +<document> + <target id="external-hyperlink" name="external hyperlink" refuri="http://uri"> + <paragraph> + <reference refuri="http://uri"> + External hyperlink + reference. +"""], +["""\ +.. _external hyperlink: http://uri +.. _indirect target: `external hyperlink`_ +""", +"""\ +<document> + <target id="external-hyperlink" name="external hyperlink" refuri="http://uri"> + <target id="indirect-target" name="indirect target" refuri="http://uri"> + <system_message level="1" type="INFO"> + <paragraph> + Indirect hyperlink target "indirect target" is not referenced. +"""], +["""\ +.. _chained: +.. _external hyperlink: http://uri + +`External hyperlink`_ reference +and a chained_ reference too. 
+""", +"""\ +<document> + <target id="chained" name="chained" refuri="http://uri"> + <target id="external-hyperlink" name="external hyperlink" refuri="http://uri"> + <paragraph> + <reference refuri="http://uri"> + External hyperlink + reference + and a \n\ + <reference refuri="http://uri"> + chained + reference too. +"""], +["""\ +.. _external hyperlink: http://uri +.. _indirect hyperlink: `external hyperlink`_ + +`Indirect hyperlink`_ reference. +""", +"""\ +<document> + <target id="external-hyperlink" name="external hyperlink" refuri="http://uri"> + <target id="indirect-hyperlink" name="indirect hyperlink" refuri="http://uri"> + <paragraph> + <reference refuri="http://uri"> + Indirect hyperlink + reference. +"""], +["""\ +.. _external hyperlink: http://uri +.. _chained: +.. _indirect hyperlink: `external hyperlink`_ + +Chained_ `indirect hyperlink`_ reference. +""", +"""\ +<document> + <target id="external-hyperlink" name="external hyperlink" refuri="http://uri"> + <target id="chained" name="chained" refuri="http://uri"> + <target id="indirect-hyperlink" name="indirect hyperlink" refuri="http://uri"> + <paragraph> + <reference refuri="http://uri"> + Chained + \n\ + <reference refuri="http://uri"> + indirect hyperlink + reference. +"""], +["""\ +.. __: http://full +__ +__ http://simplified +.. _external: http://indirect.external +__ external_ +__ + +`Full syntax anonymous external hyperlink reference`__, +`chained anonymous external reference`__, +`simplified syntax anonymous external hyperlink reference`__, +`indirect anonymous hyperlink reference`__, +`internal anonymous hyperlink reference`__. 
+""", +"""\ +<document> + <target anonymous="1" id="id1" refuri="http://full"> + <target anonymous="1" id="id2" refuri="http://simplified"> + <target anonymous="1" id="id3" refuri="http://simplified"> + <target id="external" name="external" refuri="http://indirect.external"> + <target anonymous="1" id="id4" refuri="http://indirect.external"> + <target anonymous="1" id="id5"> + <paragraph> + <reference anonymous="1" refuri="http://full"> + Full syntax anonymous external hyperlink reference + , + <reference anonymous="1" refuri="http://simplified"> + chained anonymous external reference + , + <reference anonymous="1" refuri="http://simplified"> + simplified syntax anonymous external hyperlink reference + , + <reference anonymous="1" refuri="http://indirect.external"> + indirect anonymous hyperlink reference + , + <reference anonymous="1" refid="id5"> + internal anonymous hyperlink reference + . +"""], +["""\ +Duplicate external target_'s (different URIs): + +.. _target: first + +.. _target: second +""", +"""\ +<document> + <paragraph> + Duplicate external \n\ + <reference refname="target"> + target + 's (different URIs): + <target dupname="target" id="target" refuri="first"> + <system_message backrefs="id1" level="2" type="WARNING"> + <paragraph> + Duplicate explicit target name: "target". + <target dupname="target" id="id1" refuri="second"> +"""], +["""\ +Several__ anonymous__ hyperlinks__, but not enough targets. + +__ http://example.org +""", +"""\ +<document> + <paragraph> + <problematic id="id3" refid="id2"> + Several__ + \n\ + <problematic id="id4" refid="id2"> + anonymous__ + \n\ + <problematic id="id5" refid="id2"> + hyperlinks__ + , but not enough targets. + <target anonymous="1" id="id1" refuri="http://example.org"> + <system_message backrefs="id3 id4 id5" id="id2" level="3" type="ERROR"> + <paragraph> + Anonymous hyperlink mismatch: 3 references but 1 targets. 
+"""], +]) + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_transforms/test_messages.py b/test/test_transforms/test_messages.py new file mode 100755 index 000000000..d2134d661 --- /dev/null +++ b/test/test_transforms/test_messages.py @@ -0,0 +1,67 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for docutils.transforms.universal.Messages. +""" + +import DocutilsTestSupport +import UnitTestFolder +from docutils.transforms.universal import Messages +from docutils.transforms.references import Substitutions +from docutils.parsers.rst import Parser + + +def suite(): + parser = Parser() + s = DocutilsTestSupport.TransformTestSuite(parser) + s.generateTests(totest) + return s + +totest = {} + +totest['system_message_sections'] = ((Substitutions, Messages,), [ +["""\ +This |unknown substitution| will generate a system message, thanks to +the ``Substitutions`` transform. The ``Messages`` transform will +generate a "System Messages" section. + +(A second copy of the system message is tacked on to the end of the +doctree by the test framework.) +""", +"""\ +<document> + <paragraph> + This \n\ + <problematic id="id2" refid="id1"> + |unknown substitution| + will generate a system message, thanks to + the \n\ + <literal> + Substitutions + transform. The \n\ + <literal> + Messages + transform will + generate a "System Messages" section. + <paragraph> + (A second copy of the system message is tacked on to the end of the + doctree by the test framework.) + <section class="system-messages"> + <title> + Docutils System Messages + <system_message backrefs="id2" id="id1" level="3" type="ERROR"> + <paragraph> + Undefined substitution referenced: "unknown substitution". 
+"""], +]) + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_transforms/test_substitutions.py b/test/test_transforms/test_substitutions.py new file mode 100755 index 000000000..693c734a2 --- /dev/null +++ b/test/test_transforms/test_substitutions.py @@ -0,0 +1,61 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +Tests for docutils.transforms.references.Substitutions. +""" + +import DocutilsTestSupport +import UnitTestFolder +from docutils.transforms.references import Substitutions +from docutils.parsers.rst import Parser + + +def suite(): + parser = Parser() + s = DocutilsTestSupport.TransformTestSuite(parser) + s.generateTests(totest) + return s + +totest = {} + +totest['substitutions'] = ((Substitutions,), [ +["""\ +The |biohazard| symbol is deservedly scary-looking. + +.. |biohazard| image:: biohazard.png +""", +"""\ +<document> + <paragraph> + The + <image alt="biohazard" uri="biohazard.png"> + symbol is deservedly scary-looking. + <substitution_definition name="biohazard"> + <image alt="biohazard" uri="biohazard.png"> +"""], +["""\ +Here's an |unknown| substitution. +""", +"""\ +<document> + <paragraph> + Here's an + <problematic id="id2" refid="id1"> + |unknown| + substitution. + <system_message backrefs="id2" id="id1" level="3" type="ERROR"> + <paragraph> + Undefined substitution referenced: "unknown". +"""], +]) + + +if __name__ == '__main__': + import unittest + unittest.main(defaultTest='suite') diff --git a/test/test_utils.py b/test/test_utils.py new file mode 100755 index 000000000..29c926f56 --- /dev/null +++ b/test/test_utils.py @@ -0,0 +1,324 @@ +#! /usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. 
+ +Test module for utils.py. +""" + +import unittest, StringIO, sys +from DocutilsTestSupport import utils, nodes +try: + import mypdb as pdb +except: + import pdb +pdb.tracenow = 0 + + +class ReporterTests(unittest.TestCase): + + stream = StringIO.StringIO() + reporter = utils.Reporter(2, 4, stream, 1) + + def setUp(self): + self.stream.seek(0) + self.stream.truncate() + + def test_level0(self): + sw = self.reporter.system_message(0, 'debug output') + self.assertEquals(sw.pformat(), """\ +<system_message level="0" type="DEBUG"> + <paragraph> + debug output +""") + self.assertEquals(self.stream.getvalue(), + 'Reporter: DEBUG (0) debug output\n') + + def test_level1(self): + sw = self.reporter.system_message(1, 'a little reminder') + self.assertEquals(sw.pformat(), """\ +<system_message level="1" type="INFO"> + <paragraph> + a little reminder +""") + self.assertEquals(self.stream.getvalue(), '') + + def test_level2(self): + sw = self.reporter.system_message(2, 'a warning') + self.assertEquals(sw.pformat(), """\ +<system_message level="2" type="WARNING"> + <paragraph> + a warning +""") + self.assertEquals(self.stream.getvalue(), + 'Reporter: WARNING (2) a warning\n') + + def test_level3(self): + sw = self.reporter.system_message(3, 'an error') + self.assertEquals(sw.pformat(), """\ +<system_message level="3" type="ERROR"> + <paragraph> + an error +""") + self.assertEquals(self.stream.getvalue(), + 'Reporter: ERROR (3) an error\n') + + def test_level4(self): + self.assertRaises(utils.SystemMessage, self.reporter.system_message, 4, + 'a severe error, raises an exception') + self.assertEquals(self.stream.getvalue(), 'Reporter: SEVERE (4) ' + 'a severe error, raises an exception\n') + + +class QuietReporterTests(unittest.TestCase): + + stream = StringIO.StringIO() + reporter = utils.Reporter(5, 5, stream, 0) + + def setUp(self): + self.stream.seek(0) + self.stream.truncate() + + def test_debug(self): + sw = self.reporter.debug('a debug message') + 
self.assertEquals(sw.pformat(), """\ +<system_message level="0" type="DEBUG"> + <paragraph> + a debug message +""") + self.assertEquals(self.stream.getvalue(), '') + + def test_info(self): + sw = self.reporter.info('an informational message') + self.assertEquals(sw.pformat(), """\ +<system_message level="1" type="INFO"> + <paragraph> + an informational message +""") + self.assertEquals(self.stream.getvalue(), '') + + def test_warning(self): + sw = self.reporter.warning('a warning') + self.assertEquals(sw.pformat(), """\ +<system_message level="2" type="WARNING"> + <paragraph> + a warning +""") + self.assertEquals(self.stream.getvalue(), '') + + def test_error(self): + sw = self.reporter.error('an error') + self.assertEquals(sw.pformat(), """\ +<system_message level="3" type="ERROR"> + <paragraph> + an error +""") + self.assertEquals(self.stream.getvalue(), '') + + def test_severe(self): + sw = self.reporter.severe('a severe error') + self.assertEquals(sw.pformat(), """\ +<system_message level="4" type="SEVERE"> + <paragraph> + a severe error +""") + self.assertEquals(self.stream.getvalue(), '') + + +class ReporterCategoryTests(unittest.TestCase): + + stream = StringIO.StringIO() + + def setUp(self): + self.stream.seek(0) + self.stream.truncate() + self.reporter = utils.Reporter(2, 4, self.stream, 1) + self.reporter.setconditions('lemon', 1, 3, self.stream, 0) + + def test_getset(self): + self.reporter.setconditions('test', 5, 5, None, 0) + self.assertEquals(self.reporter.getconditions('other').astuple(), + (1, 2, 4, self.stream)) + self.assertEquals(self.reporter.getconditions('test').astuple(), + (0, 5, 5, sys.stderr)) + self.assertEquals(self.reporter.getconditions('test.dummy').astuple(), + (0, 5, 5, sys.stderr)) + self.reporter.setconditions('test.dummy.spam', 1, 2, self.stream, 1) + self.assertEquals( + self.reporter.getconditions('test.dummy.spam').astuple(), + (1, 1, 2, self.stream)) + self.assertEquals(self.reporter.getconditions('test.dummy').astuple(), + 
(0, 5, 5, sys.stderr)) + self.assertEquals( + self.reporter.getconditions('test.dummy.spam.eggs').astuple(), + (1, 1, 2, self.stream)) + self.reporter.unsetconditions('test.dummy.spam') + self.assertEquals( + self.reporter.getconditions('test.dummy.spam.eggs').astuple(), + (0, 5, 5, sys.stderr)) + + def test_debug(self): + sw = self.reporter.debug('debug output', category='lemon.curry') + self.assertEquals(self.stream.getvalue(), '') + sw = self.reporter.debug('debug output') + self.assertEquals(self.stream.getvalue(), + 'Reporter: DEBUG (0) debug output\n') + + def test_info(self): + sw = self.reporter.info('some info') + self.assertEquals(self.stream.getvalue(), '') + sw = self.reporter.info('some info', category='lemon.curry') + self.assertEquals( + self.stream.getvalue(), + 'Reporter "lemon.curry": INFO (1) some info\n') + + def test_warning(self): + sw = self.reporter.warning('a warning') + self.assertEquals(self.stream.getvalue(), + 'Reporter: WARNING (2) a warning\n') + sw = self.reporter.warning('a warning', category='lemon.curry') + self.assertEquals(self.stream.getvalue(), """\ +Reporter: WARNING (2) a warning +Reporter "lemon.curry": WARNING (2) a warning +""") + + def test_error(self): + sw = self.reporter.error('an error') + self.assertEquals(self.stream.getvalue(), + 'Reporter: ERROR (3) an error\n') + self.assertRaises(utils.SystemMessage, self.reporter.error, + 'an error', category='lemon.curry') + self.assertEquals(self.stream.getvalue(), """\ +Reporter: ERROR (3) an error +Reporter "lemon.curry": ERROR (3) an error +""") + + def test_severe(self): + self.assertRaises(utils.SystemMessage, self.reporter.severe, + 'a severe error') + self.assertEquals(self.stream.getvalue(), + 'Reporter: SEVERE (4) a severe error\n') + self.assertRaises(utils.SystemMessage, self.reporter.severe, + 'a severe error', category='lemon.curry') + self.assertEquals(self.stream.getvalue(), """\ +Reporter: SEVERE (4) a severe error +Reporter "lemon.curry": SEVERE (4) a severe 
error +""") + + +class NameValueTests(unittest.TestCase): + + def test_extract_name_value(self): + self.assertRaises(utils.NameValueError, utils.extract_name_value, + 'hello') + self.assertRaises(utils.NameValueError, utils.extract_name_value, + 'hello') + self.assertRaises(utils.NameValueError, utils.extract_name_value, + '=hello') + self.assertRaises(utils.NameValueError, utils.extract_name_value, + 'hello=') + self.assertRaises(utils.NameValueError, utils.extract_name_value, + 'hello="') + self.assertRaises(utils.NameValueError, utils.extract_name_value, + 'hello="something') + self.assertRaises(utils.NameValueError, utils.extract_name_value, + 'hello="something"else') + output = utils.extract_name_value( + """att1=val1 att2=val2 att3="value number '3'" att4=val4""") + self.assertEquals(output, [('att1', 'val1'), ('att2', 'val2'), + ('att3', "value number '3'"), + ('att4', 'val4')]) + + +class ExtensionAttributeTests(unittest.TestCase): + + attributespec = {'a': int, 'bbb': float, 'cdef': (lambda x: x), + 'empty': (lambda x: x)} + + def test_assemble_attribute_dict(self): + input = utils.extract_name_value('a=1 bbb=2.0 cdef=hol%s' % chr(224)) + self.assertEquals( + utils.assemble_attribute_dict(input, self.attributespec), + {'a': 1, 'bbb': 2.0, 'cdef': ('hol%s' % chr(224))}) + input = utils.extract_name_value('a=1 b=2.0 c=hol%s' % chr(224)) + self.assertRaises(KeyError, utils.assemble_attribute_dict, + input, self.attributespec) + input = utils.extract_name_value('a=1 bbb=two cdef=hol%s' % chr(224)) + self.assertRaises(ValueError, utils.assemble_attribute_dict, + input, self.attributespec) + + def test_extract_extension_attributes(self): + field_list = nodes.field_list() + field_list += nodes.field( + '', nodes.field_name('', 'a'), + nodes.field_body('', nodes.paragraph('', '1'))) + field_list += nodes.field( + '', nodes.field_name('', 'bbb'), + nodes.field_body('', nodes.paragraph('', '2.0'))) + field_list += nodes.field( + '', nodes.field_name('', 'cdef'), + 
nodes.field_body('', nodes.paragraph('', 'hol%s' % chr(224)))) + field_list += nodes.field( + '', nodes.field_name('', 'empty'), nodes.field_body()) + self.assertEquals( + utils.extract_extension_attributes(field_list, + self.attributespec), + {'a': 1, 'bbb': 2.0, 'cdef': ('hol%s' % chr(224)), + 'empty': None}) + self.assertRaises(KeyError, utils.extract_extension_attributes, + field_list, {}) + field_list += nodes.field( + '', nodes.field_name('', 'cdef'), + nodes.field_body('', nodes.paragraph('', 'one'), + nodes.paragraph('', 'two'))) + self.assertRaises(utils.BadAttributeDataError, + utils.extract_extension_attributes, + field_list, self.attributespec) + field_list[-1] = nodes.field( + '', nodes.field_name('', 'cdef'), + nodes.field_argument('', 'bad'), + nodes.field_body('', nodes.paragraph('', 'no arguments'))) + self.assertRaises(utils.BadAttributeError, + utils.extract_extension_attributes, + field_list, self.attributespec) + field_list[-1] = nodes.field( + '', nodes.field_name('', 'cdef'), + nodes.field_body('', nodes.paragraph('', 'duplicate'))) + self.assertRaises(utils.DuplicateAttributeError, + utils.extract_extension_attributes, + field_list, self.attributespec) + field_list[-2] = nodes.field( + '', nodes.field_name('', 'unkown'), + nodes.field_body('', nodes.paragraph('', 'unknown'))) + self.assertRaises(KeyError, utils.extract_extension_attributes, + field_list, self.attributespec) + + +class MiscFunctionTests(unittest.TestCase): + + names = [('a', 'a'), ('A', 'a'), ('A a A', 'a a a'), + ('A a A a', 'a a a a'), + (' AaA\n\r\naAa\tAaA\t\t', 'aaa aaa aaa')] + + def test_normname(self): + for input, output in self.names: + normed = utils.normname(input) + self.assertEquals(normed, output) + + ids = [('a', 'a'), ('A', 'a'), ('', ''), ('a b \n c', 'a-b-c'), + ('a.b.c', 'a-b-c'), (' - a - b - c - ', 'a-b-c'), (' - ', ''), + (u'\u2020\u2066', ''), (u'a \xa7 b \u2020 c', 'a-b-c'), + ('1', ''), ('1abc', 'abc')] + + def test_id(self): + for input, output in 
self.ids: + normed = utils.id(input) + self.assertEquals(normed, output) + + +if __name__ == '__main__': + unittest.main() diff --git a/tools/html.py b/tools/html.py new file mode 100755 index 000000000..66f029364 --- /dev/null +++ b/tools/html.py @@ -0,0 +1,31 @@ +#!/usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +A minimal front-end to the Docutils Publisher. + +This module takes advantage of the default values defined in `publish()`. +""" + +import sys +from docutils.core import publish +from docutils import utils + +reporter = utils.Reporter(2, 4) +#reporter.setconditions('nodes.Node.walkabout', 2, 4, debug=1) + +if len(sys.argv) == 2: + publish(writername='html', source=sys.argv[1], reporter=reporter) +elif len(sys.argv) == 3: + publish(writername='html', source=sys.argv[1], destination=sys.argv[2], + reporter=reporter) +elif len(sys.argv) > 3: + print >>sys.stderr, 'Maximum 2 arguments allowed.' + sys.exit(1) +else: + publish() diff --git a/tools/publish.py b/tools/publish.py new file mode 100755 index 000000000..539548911 --- /dev/null +++ b/tools/publish.py @@ -0,0 +1,27 @@ +#!/usr/bin/env python + +""" +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. + +A minimal front-end to the Docutils Publisher. + +This module takes advantage of the default values defined in `publish()`. +""" + +import sys +from docutils.core import publish + + +if len(sys.argv) == 2: + publish(source=sys.argv[1]) +elif len(sys.argv) == 3: + publish(source=sys.argv[1], destination=sys.argv[2]) +elif len(sys.argv) > 3: + print >>sys.stderr, 'Maximum 2 arguments allowed.' 
+ sys.exit(1) +else: + publish() diff --git a/tools/quicktest.py b/tools/quicktest.py new file mode 100755 index 000000000..df295f66a --- /dev/null +++ b/tools/quicktest.py @@ -0,0 +1,185 @@ +#!/usr/bin/env python + +""" +:Author: Garth Kidd +:Contact: garth@deadlybloodyserious.com +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This module has been placed in the public domain. +""" + +import sys, os, getopt +import docutils.utils +from docutils.parsers.rst import Parser + + +usage_header = """\ +quicktest.py: quickly test the restructuredtext parser. + +Usage:: + + quicktest.py [options] [filename] + +``filename`` is the name of the file to use as input (default is stdin). + +Options: +""" + +options = [('pretty', 'p', + 'output pretty pseudo-xml: no "&abc;" entities (default)'), + ('test', 't', 'output test-ready data (input & expected output, ' + 'ready to be copied to a parser test module)'), + ('rawxml', 'r', 'output raw XML'), + ('styledxml=', 's', 'output raw XML with XSL style sheet reference ' + '(filename supplied in the option argument)'), + ('xml', 'x', 'output pretty XML (indented)'), + ('debug', 'd', 'debug mode (lots of output)'), + ('help', 'h', 'show help text')] +"""See distutils.fancy_getopt.FancyGetopt.__init__ for a description of the +data structure: (long option, short option, description).""" + +def usage(): + print usage_header + for longopt, shortopt, description in options: + if longopt[-1:] == '=': + opts = '-%s arg, --%sarg' % (shortopt, longopt) + else: + opts = '-%s, --%s' % (shortopt, longopt) + print '%-15s' % opts, + if len(opts) > 14: + print '%-16s' % '\n', + while len(description) > 60: + limit = description.rindex(' ', 0, 60) + print description[:limit].strip() + description = description[limit + 1:] + print '%-15s' % ' ', + print description + +def _pretty(input, document, optargs): + return document.pformat() + +def _rawxml(input, document, optargs): + 
return document.asdom().toxml() + +def _styledxml(input, document, optargs): + docnode = document.asdom().childNodes[0] + return '%s\n%s\n%s' % ( + '<?xml version="1.0" encoding="ISO-8859-1"?>', + '<?xml-stylesheet type="text/xsl" href="%s"?>' % optargs['styledxml'], + docnode.toxml()) + +def _prettyxml(input, document, optargs): + return document.asdom().toprettyxml(' ', '\n') + +def _test(input, document, optargs): + tq = '"""' + output = document.pformat() # same as _pretty() + return """\ + totest['change_this_test_name'] = [ +[%s\\ +%s +%s, +%s\\ +%s +%s], +] +""" % ( tq, escape(input.rstrip()), tq, tq, escape(output.rstrip()), tq ) + +def escape(text): + """ + Return `text` in a form compatible with triple-double-quoted Python strings. + """ + text = text.replace('\\', '\\\\') # escape backslashes + text = text.replace('"""', '""\\"') # break up triple-double-quotes + text = text.replace(' \n', ' \\n\\\n') # protect trailing whitespace + return text + +_outputFormatters = { + 'rawxml': _rawxml, + 'styledxml': _styledxml, + 'xml': _prettyxml, + 'pretty' : _pretty, + 'test': _test + } + +def format(outputFormat, input, document, optargs): + formatter = _outputFormatters[outputFormat] + return formatter(input, document, optargs) + +def getArgs(): + if os.name == 'mac' and len(sys.argv) <= 1: + return macGetArgs() + else: + return posixGetArgs(sys.argv[1:]) + +def posixGetArgs(argv): + outputFormat = 'pretty' + # convert fancy_getopt style option list to getopt.getopt() arguments + shortopts = ''.join([option[1] + ':' * (option[0][-1:] == '=') + for option in options if option[1]]) + longopts = [option[0] for option in options if option[0]] + try: + opts, args = getopt.getopt(argv, shortopts, longopts) + except getopt.GetoptError: + usage() + sys.exit(2) + optargs = {'debug': 0} + for o, a in opts: + if o in ['-h', '--help']: + usage() + sys.exit() + elif o in ['-r', '--rawxml']: + outputFormat = 'rawxml' + elif o in ['-s', '--styledxml']: + outputFormat = 
'styledxml' + optargs['styledxml'] = a + elif o in ['-x', '--xml']: + outputFormat = 'xml' + elif o in ['-p', '--pretty']: + outputFormat = 'pretty' + elif o in ['-t', '--test']: + outputFormat = 'test' + elif o in ['-d', '--debug']: + optargs['debug'] = 1 + else: + raise getopt.GetoptError, "getopt should have saved us!" + if len(args) > 1: + print "Only one file at a time, thanks." + usage() + sys.exit(1) + if len(args) == 1: + inputFile = open(args[0]) + else: + inputFile = sys.stdin + return inputFile, outputFormat, optargs + +def macGetArgs(): + import EasyDialogs + EasyDialogs.Message("""\ +In the following window, please: + +1. Choose an output format from the "Option" list. +2. Click "Add" (if you don't, the default format will + be "pretty"). +3. Click "Add existing file..." and choose an input file. +4. Click "OK".""") + optionlist = [(longopt, description) + for (longopt, shortopt, description) in options] + argv = EasyDialogs.GetArgv(optionlist=optionlist, addnewfile=0, addfolder=0) + return posixGetArgs(argv) + +def main(): + inputFile, outputFormat, optargs = getArgs() # process cmdline arguments + parser = Parser() + input = inputFile.read() + document = docutils.utils.newdocument(debug=optargs['debug']) + parser.parse(input, document) + output = format(outputFormat, input, document, optargs) + print output, + + +if __name__ == '__main__': + sys.stderr = sys.stdout + main() diff --git a/tools/stylesheets/default.css b/tools/stylesheets/default.css new file mode 100644 index 000000000..2db991782 --- /dev/null +++ b/tools/stylesheets/default.css @@ -0,0 +1,157 @@ +/* +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:date: $Date$ +:version: $Revision$ +:copyright: This stylesheet has been placed in the public domain. + +Default cascading style sheet for the HTML output of Docutils. 
+*/ + +a.footnote-reference { + font-size: smaller ; + vertical-align: super } + +a.target { + color: blue } + +code { + background-color: #eeeeee } + +div.abstract { + margin: 2em 5em } + +div.abstract p.topic-title { + font-weight: bold ; + text-align: center } + +div.attention, div.caution, div.danger, div.error, div.hint, +div.important, div.note, div.tip, div.warning { + margin: 2em ; + border: medium outset ; + padding: 1em } + +div.attention p.admonition-title, div.caution p.admonition-title, +div.danger p.admonition-title, div.error p.admonition-title, +div.warning p.admonition-title { + color: red ; + font-weight: bold ; + font-family: sans-serif } + +div.hint p.admonition-title, div.important p.admonition-title, +div.note p.admonition-title, div.tip p.admonition-title { + font-weight: bold ; + font-family: sans-serif } + +div.field-body { + margin-bottom: 1em } + +div.field-list { + margin-bottom: -1em } + +div.figure { + margin-left: 2em } + +div.system-messages { + margin: 5em } + +div.system-messages h1 { + color: red } + +div.system-message { + border: medium outset ; + padding: 1em } + +div.system-message p.system-message-title { + color: red ; + font-weight: bold } + +div.topic { + margin: 2em } + +dt { + margin-bottom: -1em } + +h1.title { + text-align: center } + +h2.subtitle { + text-align: center } + +hr { + width: 75% } + +ol.arabic { + list-style: decimal } + +ol.loweralpha { + list-style: lower-alpha } + +ol.upperalpha { + list-style: upper-alpha } + +ol.lowerroman { + list-style: lower-roman } + +ol.upperroman { + list-style: upper-roman } + +p.caption { + font-style: italic } + +p.credits { + font-style: italic ; + font-size: smaller } + +p.docinfo-name { + font-weight: bold ; + text-align: right } + +p.field-name { + font-weight: bold ; + margin-bottom: 1em } + +p.label { + white-space: nowrap } + +p.topic-title { + font-weight: bold } + +pre.literal-block, pre.doctest-block { + margin-left: 2em ; + margin-right: 2em ; + background-color: 
#eeeeee } + +span.classifier { + font-family: sans-serif ; + font-style: oblique } + +span.classifier-delimiter { + font-family: sans-serif ; + font-weight: bold } + +span.field-argument { + font-style: italic } + +span.interpreted { + font-family: sans-serif } + +span.option-argument { + font-style: italic } + +span.problematic { + color: red } + +table { + margin-top: 1em } + +table.citation { + border-left: solid thin gray ; + padding-left: 0.5ex } + +table.docinfo { + margin: 2em 4em } + +table.footnote { + border-left: solid thin black ; + padding-left: 0.5ex } |