Drop the latex2e-special output_encoding default ("latin-1").

git-svn-id: http://svn.code.sf.net/p/docutils/code/trunk@6319 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
author: milde <milde@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> 2010-05-05 12:08:10 +0000
committer: milde <milde@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> 2010-05-05 12:08:10 +0000
commit: 87cfa9d99f38c846ed02308f8c33d607f71450ff (patch)
tree: 5d4ef766985a033e1e77d0a294344d19404f2df2
parent: 31da62a25c31e64de43216e4a52fc08e6f4a132e (diff)
download: docutils-87cfa9d99f38c846ed02308f8c33d607f71450ff.tar.gz
9 files changed, 602 insertions, 115 deletions
diff --git a/docutils/HISTORY.txt b/docutils/HISTORY.txt
index e484fb233..30b05feba 100644
--- a/docutils/HISTORY.txt
+++ b/docutils/HISTORY.txt
@@ -55,9 +55,12 @@ Changes Since 0.6
   - Fix hyperlink targets (labels) for images, figures, and tables.
   - Apply [ 2961988 ] Load babel after inputenc and fontenc.
   - Apply [ 2961991 ] Call hyperref with unicode option.
+  - Drop the special `output_encoding`__ default ("latin-1").
+    The Docutils-wide default (usually "UTF-8") is used instead.
 
 __ docs/ref/restructuredtext.html#inline-literals
 __ docs/user/config.html#docutils-footnotes
+__ docs/user/config.html#output_encoding
 
 * docutils/writers/manpage.py
 
diff --git a/docutils/RELEASE-NOTES.txt b/docutils/RELEASE-NOTES.txt
index 4df71db05..5b79b4fb8 100644
--- a/docutils/RELEASE-NOTES.txt
+++ b/docutils/RELEASE-NOTES.txt
@@ -31,12 +31,15 @@ Components:
   - Deprecate ``figure_footnotes`` setting.
   - Rename ``use_latex_footnotes`` setting to `docutils_footnotes`__.
   - New ``latex_preamble`` setting.
-  - PDF standard fonts (Times/Helvetica/Courier) as default.
+  - Use PDF standard fonts (Times/Helvetica/Courier) as default.
   - `hyperref` package called with ``unicode`` option (see the
     `hyperref config tips`__ for how to override).
+  - Drop the special `output_encoding`__ default ("latin-1").
+    The Docutils-wide default (usually "UTF-8") is used instead.
 
 __ docs/user/config.html#docutils-footnotes
 __ docs/user/latex.html#hyperlinks
+__ docs/user/latex.html#output-encoding
 
 General:
 
diff --git a/docutils/docs/user/latex.txt b/docutils/docs/user/latex.txt
index 6c2ee0971..8a47dd6c7 100644
--- a/docutils/docs/user/latex.txt
+++ b/docutils/docs/user/latex.txt
@@ -943,7 +943,7 @@ Example 1:
 
   This will improve the look on screen with the default Computer Modern
   fonts at the expense of problems with `search and text extraction`_
-  The recommended workaround is to select a T1-encoded "Type 1" (vector)
+  The recommended way is to select a T1-encoded "Type 1" (vector)
   font, for example `Latin Modern`_
 
 Example 2:
@@ -1580,30 +1580,33 @@ Example:
 text encoding
 -------------
 
-The encoding of the LaTeX source file, i.e.
-Docutils' *output* encoding becomes the LaTeX *input* encoding.
+The encoding of the LaTeX source file is Docutils' *output* encoding
+and, at the same time, LaTeX's *input* encoding.
 
 Option: output-encoding_
     ``--output-encoding=OUTPUT-ENCODING``
 
 Default:
-  latin1
+  "utf8"
 
-  .. TODO: use the Docutils-wide default: output-encoding = input-encoding
+Example:
+  Encode the LaTeX source file with the ISO `latin-1` (Western European)
+  8-bit encoding (the default in Docutils versions up to 0.6)::
 
+    --output-encoding=latin-1
 
-LaTeX comes with two packages for UTF-8 support,
+Note:
+  LaTeX comes with two packages for UTF-8 support:
 
-:utf8:  by the standard `inputenc`_ package with only limited coverage
-        (mainly accented chars, only few non-alphabetic symbols, no Greek or
-        Cyrillic).
+  :utf8:  by the standard `inputenc`_ package with only limited coverage
+          (mainly accented chars, no Greek).
 
-:utf8x: supported by the `ucs`_ package covers a wider range of Unicode
-        characters than does "utf8".  It is, however, a non-standard
-        extension and no longer developed.
+  :utf8x: supported by the `ucs`_ package covers a wider range of Unicode
+          characters than does "utf8".  It is, however, a non-standard
+          extension and no longer developed.
 
-Currently (in version 0.6), "utf8" is used if the output-encoding is
-any of "utf_8", "U8", "UTF", or "utf8".
+  Currently (in version 0.6), "utf8" is used if the output-encoding is
+  any of "utf_8", "U8", "UTF", or "utf8".
 
 .. with utf8x:
    If LaTeX issues a Warning about unloaded/unknown characters adding ::
@@ -1795,6 +1798,23 @@ If updating LaTeX is not an option, just remove the "px" from the length
 specification. HTML/CSS will default to "px" while the `latex2e` writer
 will add the fallback unit "bp".
 
+Error ``Symbol \textcurrency not provided`` ...
+```````````````````````````````````````````````
+
+The currency sign (\\u00a4) is not supported by all fonts (some have
+a Euro sign in its place). You might see an error like::
+
+    ! Package textcomp Error: Symbol \textcurrency not provided by
+    (textcomp)                font family ptm in TS1 encoding.
+    (textcomp)                Default family used instead.
+
+(which, in the case of font family "ptm", is a false positive). Add either
+
+:warn: turn the error into a warning, use the default symbol (bitmap), or
+:force,almostfull: use the symbol provided by the font at the user's
+                   risk,
+
+to the document options or use a different font package.
 
 Search and text extraction
 ``````````````````````````
@@ -1806,21 +1826,34 @@ umlauts) might fail.  See font_ and `font encoding`_ (as well as
 .. _Searching PDF files:
    http://www.tex.ac.uk/cgi-bin/texfaq2html?label=srchpdf
 
-Unicode box drawing characters
-```````````````````````````````
+Unicode box drawing and block characters
+````````````````````````````````````````
+
+- Generate LaTeX code with `output-encoding`_ "utf-8".
+
+- Add the pmboxdraw_ package to the `style sheets`_.
+  (For shaded boxes also add the `color` package.)
 
-  - generate LaTeX code with ``--output-encoding=utf-8:strict``.
+Unfortunately, this defines only a subset of the characters
+(see pmboxdraw.pdf_ for a list).
 
-  - In the latex file, edit the preamble to load "ucs" with "postscript"
-    option and also load the pstricks package::
+Alternatively:
+
+- In the latex file, edit the preamble to load ucs_ with "postscript"
+  option and also load the pstricks package::
 
       - \usepackage[utf8]{inputenc}
       + \usepackage[postscript]{ucs}
       + \usepackage{pstricks}
       + \usepackage[utf8x]{inputenc}
 
-  - Convert to PDF with ``latex``, ``dvips``, and ``ps2pdf``.
+- Convert to PDF with ``latex``, ``dvips``, and ``ps2pdf``.
+
+.. _pmboxdraw:
+   http://www.ctan.org/tex-archive/help/Catalogue/entries/pmboxdraw.html
 
+.. _pmboxdraw.pdf:
+   http://www.ctan.org/tex-archive/macros/latex/contrib/oberdiek/pmboxdraw.pdf
 
 Bugs and open issues
 --------------------
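The text-encoding notes in the ``latex.txt`` hunk above can be illustrated with a small Python 3 sketch. It is not the writer's code: ``inputenc_option`` and ``encoding_preamble`` are hypothetical names, the three-entry mapping table is only an excerpt for illustration, and resolving aliases such as "U8" or "UTF" through the standard library's ``codecs.lookup`` is an assumption about how such names could be normalized.

```python
import codecs

# Hypothetical excerpt: Python's canonical codec names -> inputenc options.
_INPUTENC = {"utf-8": "utf8", "iso8859-1": "latin1", "ascii": "ascii"}


def inputenc_option(output_encoding):
    """Return an inputenc option for a Docutils output-encoding string."""
    # Drop an optional error handler such as ":strict".
    name = output_encoding.split(":")[0]
    # codecs.lookup() resolves aliases like "U8", "UTF" or "utf_8".
    canonical = codecs.lookup(name).name
    if canonical in _INPUTENC:
        return _INPUTENC[canonical]
    # Fallback (assumption): remove "-" and "_" from the given name.
    return name.lower().replace("-", "").replace("_", "")


def encoding_preamble(font_encoding, output_encoding):
    """Build the fontenc/inputenc preamble lines for the given settings."""
    if font_encoding:
        lines = [r"\usepackage[%s]{fontenc}" % font_encoding]
    else:
        lines = [r"%\usepackage[OT1]{fontenc}"]  # just a comment
    option = inputenc_option(output_encoding)
    if option != "ascii":  # pure ASCII needs no inputenc call
        lines.append(r"\usepackage[%s]{inputenc}" % option)
    return lines
```

Under these assumptions, ``encoding_preamble("T1", "utf-8")`` yields the two preamble lines that the updated functional test expectations contain, while ``--output-encoding=latin-1`` would bring back the ``latin1`` option used up to Docutils 0.6.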
diff --git a/docutils/docutils/writers/latex2e/__init__.py b/docutils/docutils/writers/latex2e/__init__.py
index e6a0e233a..ab9569715 100644
--- a/docutils/docutils/writers/latex2e/__init__.py
+++ b/docutils/docutils/writers/latex2e/__init__.py
@@ -17,8 +17,8 @@ import os
 import time
 import re
 import string
-from docutils import frontend, nodes, languages, writers, utils, transforms, io
-from docutils.writers.newlatex2e import unicode_map
+from docutils import frontend, nodes, languages, writers, utils, io
+from docutils.transforms import writer_aux
 
 # compatibility module for Python <= 2.4
 if not hasattr(string, 'Template'):
@@ -39,7 +39,7 @@ class Writer(writers.Writer):
                                   r'\usepackage{courier}'])
     settings_spec = (
         'LaTeX-Specific Options',
-        'The LaTeX "--output-encoding" default is "latin-1:strict".',
+        None,
         (('Specify documentclass.  Default is "article".',
           ['--documentclass'],
           {'default': 'article', }),
@@ -198,10 +198,8 @@ class Writer(writers.Writer):
           {'default': None, }),
           ),)
 
-    settings_defaults = {'output_encoding': 'latin-1',
-                         'sectnum_depth': 0 # updated by SectNum transform
+    settings_defaults = {'sectnum_depth': 0 # updated by SectNum transform
                         }
-
     relative_path_settings = ('stylesheet_path',)
 
     config_section = 'latex2e writer'
@@ -225,7 +223,7 @@ class Writer(writers.Writer):
        transform_list = writers.Writer.get_transforms(self)
        # print transform_list
        # Convert specific admonitions to generic one
-       transform_list.append(transforms.writer_aux.Admonitions)
+       transform_list.append(writer_aux.Admonitions)
        # TODO: footnote collection transform
        # transform_list.append(footnotes.collect)
        return transform_list
@@ -355,7 +353,7 @@ class SortableDict(dict):
     """Dictionary with additional sorting methods
 
     Tip: use key starting with '_' for sorting before small letters
-    	 and with '~' for sorting after small letters.
+         and with '~' for sorting after small letters.
     """
     def sortedkeys(self):
         """Return sorted list of keys"""
@@ -559,6 +557,11 @@ PreambleCmds.table = r"""\usepackage{longtable}
 \setlength{\extrarowheight}{2pt}
 \newlength{\DUtablewidth} % internal use in tables"""
 
+# Options [force,almostfull] prevent spurious error messages, see
+# de.comp.text.tex/2005-12/msg01855
+PreambleCmds.textcomp = """\
+\\usepackage{textcomp} % text symbol macros"""
+
 PreambleCmds.documenttitle = r"""
 %% Document title
 \title{%s}
@@ -635,9 +638,10 @@ class Table(object):
 
     Table style might be
 
-    :standard: horizontal and vertical lines
-    :booktabs: only horizontal lines (requires "booktabs" LaTeX package)
-    :nolines: (or borderless) no lines
+    :standard:   horizontal and vertical lines
+    :booktabs:   only horizontal lines (requires "booktabs" LaTeX package)
+    :borderless: no borders around table cells
+    :nolines:    alias for borderless
     """
     def __init__(self,translator,latex_type,table_style):
         self._translator = translator
@@ -936,7 +940,7 @@ class LaTeXTranslator(nodes.NodeVisitor):
             self.docutils_footnotes = True
             self.warn('`use_latex_footnotes` is deprecated. '
                       'The setting has been renamed to `docutils_footnotes` '
-		      'and the alias will be removed in a future version.')
+                      'and the alias will be removed in a future version.')
         self.figure_footnotes = settings.figure_footnotes
         if self.figure_footnotes:
             self.docutils_footnotes = True
@@ -1008,19 +1012,23 @@ class LaTeXTranslator(nodes.NodeVisitor):
         # Process settings
         # ~~~~~~~~~~~~~~~~
 
-        # persistent requirements
-        if self.font_encoding == '':
-            fontenc_header = r'%\usepackage[OT1]{fontenc}'
+        # Static requirements
+        # TeX font encoding
+        if self.font_encoding:
+            encodings = [r'\usepackage[%s]{fontenc}' % self.font_encoding]
         else:
-            fontenc_header = r'\usepackage[%s]{fontenc}' % self.font_encoding
-        self.requirements['_persistent'] = '\n'.join([
-              fontenc_header,
-              r'\usepackage[%s]{inputenc}' % self.latex_encoding,
+            encodings = [r'%\usepackage[OT1]{fontenc}'] # just a comment
+        # Docutils' output-encoding => TeX input encoding:
+        if self.latex_encoding != 'ascii':
+            encodings.append(r'\usepackage[%s]{inputenc}'
+                             % self.latex_encoding)
+        self.requirements['_static'] = '\n'.join(
+              encodings + [
               r'\usepackage{ifthen}',
-              # multi-language support (language is in document settings)
+              # multi-language support (language is in document options)
               '\\usepackage{babel}%s' % self.babel.setup,
               ])
-        # page layout with typearea (if there are relevant document options).
+        # page layout with typearea (if there are relevant document options)
         if (settings.documentclass.find('scr') == -1 and
             (self.d_options.find('DIV') != -1 or
              self.d_options.find('BCOR') != -1)):
@@ -1096,7 +1104,6 @@ class LaTeXTranslator(nodes.NodeVisitor):
         """Translate docutils encoding name into LaTeX's.
 
         Default method is remove "-" and "_" chars from docutils_encoding.
-
         """
         tr = {  'iso-8859-1': 'latin1',     # west european
                 'iso-8859-2': 'latin2',     # east european
@@ -1189,8 +1196,9 @@ class LaTeXTranslator(nodes.NodeVisitor):
         # Unicode chars that are not recognized by LaTeX's utf8 encoding
         unsupported_unicode_chars = {
             0x00A0: ur'~', # NO-BREAK SPACE
-	    0x00AD: ur'\-', # SOFT HYPHEN
-	    0x2011: ur'\hbox{-}', # NON-BREAKING HYPHEN
+            0x00AD: ur'\-', # SOFT HYPHEN
+            #
+            0x2011: ur'\hbox{-}', # NON-BREAKING HYPHEN
             0x21d4: ur'$\Leftrightarrow$',
             # Docutils footnote symbols:
             0x2660: ur'$\spadesuit$',
@@ -1216,10 +1224,87 @@ class LaTeXTranslator(nodes.NodeVisitor):
             0x2665: ur'\ding{170}',     # black heartsuit
             0x2666: ur'\ding{169}',     # black diamondsuit
         }
-        # TODO: replacements using textcomp
-        ## textcomp_chars = {
-	##     0x00B5: ur'\textmu{}', # MICRO SIGN
-	## }
+        # recognized with 'utf8', if textcomp is loaded
+        textcomp_chars = {
+            # Latin-1 Supplement
+            0x00a2: ur'\textcent{}',          # ¢ CENT SIGN
+            0x00a4: ur'\textcurrency{}',      # ¤ CURRENCY SIGN
+            0x00a5: ur'\textyen{}',           # ¥ YEN SIGN
+            0x00a6: ur'\textbrokenbar{}',     # ¦ BROKEN BAR
+            0x00a7: ur'\textsection{}',       # § SECTION SIGN
+            0x00a8: ur'\textasciidieresis{}', # ¨ DIAERESIS
+            0x00a9: ur'\textcopyright{}',     # © COPYRIGHT SIGN
+            0x00aa: ur'\textordfeminine{}',   # ª FEMININE ORDINAL INDICATOR
+            0x00ac: ur'\textlnot{}',          # ¬ NOT SIGN
+            0x00ae: ur'\textregistered{}',    # ® REGISTERED SIGN
+            0x00af: ur'\textasciimacron{}',   # ¯ MACRON
+            0x00b0: ur'\textdegree{}',        # ° DEGREE SIGN
+            0x00b1: ur'\textpm{}',            # ± PLUS-MINUS SIGN
+            0x00b2: ur'\texttwosuperior{}',   # ² SUPERSCRIPT TWO
+            0x00b3: ur'\textthreesuperior{}', # ³ SUPERSCRIPT THREE
+            0x00b4: ur'\textasciiacute{}',    # ´ ACUTE ACCENT
+            0x00b5: ur'\textmu{}',            # µ MICRO SIGN
+            0x00b6: ur'\textparagraph{}',     # ¶ PILCROW SIGN # not equal to \textpilcrow
+            0x00b9: ur'\textonesuperior{}',   # ¹ SUPERSCRIPT ONE
+            0x00ba: ur'\textordmasculine{}',  # º MASCULINE ORDINAL INDICATOR
+            0x00bc: ur'\textonequarter{}',    # 1/4 FRACTION
+            0x00bd: ur'\textonehalf{}',       # 1/2 FRACTION
+            0x00be: ur'\textthreequarters{}', # 3/4 FRACTION
+            0x00d7: ur'\texttimes{}',         # × MULTIPLICATION SIGN
+            0x00f7: ur'\textdiv{}',           # ÷ DIVISION SIGN
+            #
+            0x0192: ur'\textflorin{}',        # LATIN SMALL LETTER F WITH HOOK
+            0x02b9: ur'\textasciiacute{}',    # MODIFIER LETTER PRIME
+            0x02ba: ur'\textacutedbl{}',      # MODIFIER LETTER DOUBLE PRIME
+            0x2016: ur'\textbardbl{}',        # DOUBLE VERTICAL LINE
+            0x2022: ur'\textbullet{}',        # BULLET
+            0x2030: ur'\textperthousand{}',   # PER MILLE SIGN
+            0x2031: ur'\textpertenthousand{}', # PER TEN THOUSAND SIGN
+            0x2032: ur'\textasciiacute{}',    # PRIME
+            0x2033: ur'\textacutedbl{}',      # DOUBLE PRIME
+            0x2035: ur'\textasciigrave{}',    # REVERSED PRIME
+            0x2036: ur'\textgravedbl{}',      # REVERSED DOUBLE PRIME
+            0x203b: ur'\textreferencemark{}', # REFERENCE MARK
+            0x203d: ur'\textinterrobang{}',   # INTERROBANG
+            0x2044: ur'\textfractionsolidus{}', # FRACTION SLASH
+            0x2045: ur'\textlquill{}',        # LEFT SQUARE BRACKET WITH QUILL
+            0x2046: ur'\textrquill{}',        # RIGHT SQUARE BRACKET WITH QUILL
+            0x2052: ur'\textdiscount{}',      # COMMERCIAL MINUS SIGN
+            0x20a1: ur'\textcolonmonetary{}', # COLON SIGN
+            0x20a3: ur'\textfrenchfranc{}',   # FRENCH FRANC SIGN
+            0x20a4: ur'\textlira{}',          # LIRA SIGN
+            0x20a6: ur'\textnaira{}',         # NAIRA SIGN
+            0x20a9: ur'\textwon{}',           # WON SIGN
+            0x20ab: ur'\textdong{}',          # DONG SIGN
+            0x20ac: ur'\texteuro{}',          # EURO SIGN
+            0x20b1: ur'\textpeso{}',          # PESO SIGN
+            0x20b2: ur'\textguarani{}',       # GUARANI SIGN
+            0x2103: ur'\textcelsius{}',       # DEGREE CELSIUS
+            0x2116: ur'\textnumero{}',        # NUMERO SIGN
+            0x2117: ur'\textcircledP{}',      # SOUND RECORDING COPYRIGHT
+            0x211e: ur'\textrecipe{}',        # PRESCRIPTION TAKE
+            0x2120: ur'\textservicemark{}',   # SERVICE MARK
+            0x2122: ur'\texttrademark{}',     # TRADE MARK SIGN
+            0x2126: ur'\textohm{}',           # OHM SIGN
+            0x2127: ur'\textmho{}',           # INVERTED OHM SIGN
+            0x212e: ur'\textestimated{}',     # ESTIMATED SYMBOL
+            0x2190: ur'\textleftarrow{}',     # LEFTWARDS ARROW
+            0x2191: ur'\textuparrow{}',       # UPWARDS ARROW
+            0x2192: ur'\textrightarrow{}',    # RIGHTWARDS ARROW
+            0x2193: ur'\textdownarrow{}',     # DOWNWARDS ARROW
+            0x2212: ur'\textminus{}',         # MINUS SIGN
+            0x2217: ur'\textasteriskcentered{}', # ASTERISK OPERATOR
+            0x221a: ur'\textsurd{}',          # SQUARE ROOT
+            0x2422: ur'\textblank{}',         # BLANK SYMBOL
+            0x2423: ur'\textvisiblespace{}',  # OPEN BOX
+            0x25e6: ur'\textopenbullet{}',    # WHITE BULLET
+            0x25ef: ur'\textbigcircle{}',     # LARGE CIRCLE
+            0x266a: ur'\textmusicalnote{}',   # EIGHTH NOTE
+            0x26ad: ur'\textmarried{}',       # MARRIAGE SYMBOL
+            0x26ae: ur'\textdivorced{}',      # DIVORCE SYMBOL
+            0x27e8: ur'\textlangle{}',        # MATHEMATICAL LEFT ANGLE BRACKET
+            0x27e9: ur'\textrangle{}',        # MATHEMATICAL RIGHT ANGLE BRACKET
+        }
         # TODO: greek alphabet ... ?
         # see also LaTeX codec
         # http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/252124
@@ -1255,12 +1340,16 @@ class LaTeXTranslator(nodes.NodeVisitor):
             text = self.babel.quote_quotes(text)
         # Unicode chars:
         table.update(unsupported_unicode_chars)
+        table.update(pifont_chars)
         if not self.latex_encoding.startswith('utf8'):
             table.update(unicode_chars)
-        # Unicode chars that require a feature/package to render
-        if [ch for ch in pifont_chars.keys() if unichr(ch) in text]:
-            self.requirements['pifont'] = '\\usepackage{pifont}'
-            table.update(pifont_chars)
+            table.update(textcomp_chars)
+        # Characters that require a feature/package to render
+        for ch in text:
+            if ord(ch) in pifont_chars:
+                self.requirements['pifont'] = '\\usepackage{pifont}'
+            if ord(ch) in textcomp_chars:
+                self.requirements['textcomp'] = PreambleCmds.textcomp
 
         text = text.translate(table)
 
@@ -2280,6 +2369,7 @@ class LaTeXTranslator(nodes.NodeVisitor):
             self.depart_inline(node)
 
     def has_unbalanced_braces(self, string):
+        """Test whether there are unmatched '{' or '}' characters."""
         level = 0
         for ch in string:
             if ch == '{':
@@ -2303,7 +2393,7 @@ class LaTeXTranslator(nodes.NodeVisitor):
             if href.find('^^') != -1 or self.has_unbalanced_braces(href):
                 self.error(
                     'External link "%s" not supported by LaTeX.\n'
-		    ' (Must not contain "^^" or unbalanced braces.)' % href)
+                    ' (Must not contain "^^" or unbalanced braces.)' % href)
             if node['refuri'] == node.astext():
                 self.out.append(r'\url{%s}' % href)
                 raise nodes.SkipNode
@@ -2587,7 +2677,7 @@ class LaTeXTranslator(nodes.NodeVisitor):
         if isinstance(node.parent, nodes.table):
             self.pop_output_collector()
 
-    def minitoc(self, title, depth):
+    def minitoc(self, node, title, depth):
         """Generate a local table of contents with LaTeX package minitoc"""
         section_name = self.d_class.section(self.section_level)
         # name-prefix for current section level
@@ -2598,7 +2688,8 @@ class LaTeXTranslator(nodes.NodeVisitor):
             minitoc_name = minitoc_names[section_name]
         except KeyError: # minitoc only supports part- and toplevel
             self.warn('Skipping local ToC at %s level.\n' % section_name +
-                      '  Feature not supported with option "use-latex-toc"')
+                      '  Feature not supported with option "use-latex-toc"',
+                      base_node=node)
             return
         # Requirements/Setup
         self.requirements['minitoc'] = PreambleCmds.minitoc
@@ -2640,7 +2731,7 @@ class LaTeXTranslator(nodes.NodeVisitor):
                     title = self.encode(node.pop(0).astext())
                 depth = node.get('depth', 0)
                 if 'local' in node['classes']:
-                    self.minitoc(title, depth)
+                    self.minitoc(node, title, depth)
                     self.context.append('')
                     return
                 if depth:
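The character handling changed in the ``encode`` hunk above (always translate pifont characters, translate textcomp characters only for non-UTF-8 output, and record the required packages per character) can be sketched in Python 3 as follows. This is a simplified illustration, not the writer's method: ``encode_text`` is a hypothetical function, and the two tables are small excerpts of the ones in the diff.

```python
# Small excerpts of the tables from the diff (Python 3 strings).
PIFONT_CHARS = {0x2665: r"\ding{170}", 0x2666: r"\ding{169}"}
TEXTCOMP_CHARS = {0x00B5: r"\textmu{}", 0x20AC: r"\texteuro{}"}


def encode_text(text, latex_encoding):
    """Return (translated_text, requirements) for a chunk of text."""
    # pifont characters are always replaced by \ding macros.
    table = dict(PIFONT_CHARS)
    # textcomp macros are substituted only if the output is not UTF-8;
    # with "utf8" the literal characters stay in the LaTeX source.
    if not latex_encoding.startswith("utf8"):
        table.update(TEXTCOMP_CHARS)
    # Record the packages needed to render the characters that occur.
    requirements = {}
    for ch in text:
        if ord(ch) in PIFONT_CHARS:
            requirements["pifont"] = r"\usepackage{pifont}"
        if ord(ch) in TEXTCOMP_CHARS:
            requirements["textcomp"] = (
                r"\usepackage{textcomp} % text symbol macros")
    return text.translate(table), requirements
```

With UTF-8 output the micro sign is kept as-is but ``textcomp`` is still requested, which matches the new ``totest['textcomp']`` case in ``test_latex2e.py``; with ``latin1`` output it would become ``\textmu{}``.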
diff --git a/docutils/test/functional/expected/latex_docinfo.tex b/docutils/test/functional/expected/latex_docinfo.tex
index 880589b0b..716eb5065 100644
--- a/docutils/test/functional/expected/latex_docinfo.tex
+++ b/docutils/test/functional/expected/latex_docinfo.tex
@@ -3,7 +3,7 @@
 \usepackage{fixltx2e} % LaTeX patches, \textsubscript
 \usepackage{cmap} % fix search and cut-and-paste in PDF
 \usepackage[T1]{fontenc}
-\usepackage[latin1]{inputenc}
+\usepackage[utf8]{inputenc}
 \usepackage{ifthen}
 \usepackage{babel}
 
diff --git a/docutils/test/functional/expected/standalone_rst_latex.tex b/docutils/test/functional/expected/standalone_rst_latex.tex
index d91e4eade..50c425628 100644
--- a/docutils/test/functional/expected/standalone_rst_latex.tex
+++ b/docutils/test/functional/expected/standalone_rst_latex.tex
@@ -3,7 +3,7 @@
 \usepackage{fixltx2e} % LaTeX patches, \textsubscript
 \usepackage{cmap} % fix search and cut-and-paste in PDF
 \usepackage[T1]{fontenc}
-\usepackage[latin1]{inputenc}
+\usepackage[utf8]{inputenc}
 \usepackage{ifthen}
 \usepackage{babel}
 \usepackage{color}
@@ -11,11 +11,13 @@
 \floatplacement{figure}{H} % place figures here definitely
 \usepackage{graphicx}
 \usepackage{multirow}
+\usepackage{pifont}
 \usepackage{longtable}
 \usepackage{array}
 \setlength{\extrarowheight}{2pt}
 \newlength{\DUtablewidth} % internal use in tables
 \usepackage{tabularx}
+\usepackage{textcomp} % text symbol macros
 
 %%% Custom LaTeX preamble
 % PDF Standard Fonts
@@ -808,10 +810,10 @@ And this is the third paragraph.
 %
 \DUfootnotetext{id13}{id4}{*}{%
 Footnotes may also use symbols, specified with a ``*'' label.
-Here's a reference to the next footnote:\DUfootnotemark{id14}{id15}{\dag{}}.
+Here's a reference to the next footnote:\DUfootnotemark{id14}{id15}{†}.
 }
 %
-\DUfootnotetext{id15}{id14}{\dag{}}{%
+\DUfootnotetext{id15}{id14}{†}{%
 This footnote shows the next symbol in the sequence.
 }
 %
@@ -1346,7 +1348,7 @@ Here's one:
 % 
 % Double-dashes -- "--" -- must be escaped somehow in HTML output.
 % 
-% Comments may contain non-ASCII characters: ä ö ü æ ø å
+% Comments may contain non-ASCII characters: ä ö ü æ ø å
 
 (View the HTML source to see the comment.)
 
@@ -1679,108 +1681,110 @@ width as the third line.
 
 %___________________________________________________________________________
 
-\subsection*{3.4~~~Various non-ASCII characters%
+\subsection*{3.4~~~Non-ASCII characters%
   \phantomsection%
-  \addcontentsline{toc}{subsection}{3.4~~~Various non-ASCII characters}%
-  \label{various-non-ascii-characters}%
+  \addcontentsline{toc}{subsection}{3.4~~~Non-ASCII characters}%
+  \label{non-ascii-characters}%
 }
 
+Punctuation and footnote symbols
+
 \leavevmode
 \setlength{\DUtablewidth}{\linewidth}
 \begin{longtable}[c]{|p{0.028\DUtablewidth}|p{0.424\DUtablewidth}|}
 \hline
 
-©
+–
  & 
-copyright sign
+en-dash
  \\
 \hline
 
-®
+—
  & 
-registered sign
+em-dash
  \\
 \hline
 
-«
+‘
  & 
-left pointing guillemet
+single turned comma quotation mark
  \\
 \hline
 
-»
+’
  & 
-right pointing guillemet
+single comma quotation mark
  \\
 \hline
 
-\textendash{}
+‚
  & 
-en-dash
+low single comma quotation mark
  \\
 \hline
 
-\textemdash{}
+“
  & 
-em-dash
+double turned comma quotation mark
  \\
 \hline
 
-`
+”
  & 
-single turned comma quotation mark
+double comma quotation mark
  \\
 \hline
 
-'
+„
  & 
-single comma quotation mark
+low double comma quotation mark
  \\
 \hline
 
-\quotesinglbase{}
+†
  & 
-low single comma quotation mark
+dagger
  \\
 \hline
 
-\textquotedblleft{}
+‡
  & 
-double turned comma quotation mark
+double dagger
  \\
 \hline
 
-\textquotedblright{}
+\ding{169}
  & 
-double comma quotation mark
+black diamond suit
  \\
 \hline
 
-\quotedblbase
+\ding{170}
  & 
-low double comma quotation mark
+black heart suit
  \\
 \hline
 
-\dag{}
+$\spadesuit$
  & 
-dagger
+black spade suit
  \\
 \hline
 
-\ddag{}
+$\clubsuit$
  & 
-double dagger
+black club suit
  \\
 \hline
 
-\dots{}
+…
  & 
 ellipsis
  \\
 \hline
 
-\texttrademark{}
+™
  & 
 trade mark sign
  \\
@@ -1793,11 +1797,307 @@ left-right double arrow
 \hline
 \end{longtable}
 
-The following line should not be wrapped, because it uses
-non-breakable spaces:
+The \DUroletitlereference{Latin-1 extended} Unicode block
+
+\leavevmode
+\setlength{\DUtablewidth}{\linewidth}
+\begin{longtable}[c]{|p{0.051\DUtablewidth}|p{0.028\DUtablewidth}|p{0.028\DUtablewidth}|p{0.028\DUtablewidth}|p{0.028\DUtablewidth}|p{0.028\DUtablewidth}|p{0.028\DUtablewidth}|p{0.028\DUtablewidth}|p{0.028\DUtablewidth}|p{0.028\DUtablewidth}|p{0.028\DUtablewidth}|}
+\hline
+
+% 
+ & 
+0
+ & 
+1
+ & 
+2
+ & 
+3
+ & 
+4
+ & 
+5
+ & 
+6
+ & 
+7
+ & 
+8
+ & 
+9
+ \\
+\hline
+
+160
+ &  & 
+¡
+ & 
+¢
+ & 
+£
+ &  & 
+¥
+ & 
+¦
+ & 
+§
+ & 
+¨
+ & 
+©
+ \\
+\hline
+
+170
+ & 
+ª
+ & 
+«
+ & 
+¬
+ & 
+\-
+ & 
+®
+ & 
+¯
+ & 
+°
+ & 
+±
+ & 
+²
+ & 
+³
+ \\
+\hline
+
+180
+ & 
+´
+ & 
+µ
+ & 
+¶
+ & 
+·
+ & 
+¸
+ & 
+¹
+ & 
+º
+ & 
+»
+ & 
+¼
+ & 
+½
+ \\
+\hline
+
+190
+ & 
+¾
+ & 
+¿
+ & 
+À
+ & 
+Á
+ & 
+Â
+ & 
+Ã
+ & 
+Ä
+ & 
+Å
+ & 
+Æ
+ & 
+Ç
+ \\
+\hline
+
+200
+ & 
+È
+ & 
+É
+ & 
+Ê
+ & 
+Ë
+ & 
+Ì
+ & 
+Í
+ & 
+Î
+ & 
+Ï
+ & 
+Ð
+ & 
+Ñ
+ \\
+\hline
+
+210
+ & 
+Ò
+ & 
+Ó
+ & 
+Ô
+ & 
+Õ
+ & 
+Ö
+ & 
+×
+ & 
+Ø
+ & 
+Ù
+ & 
+Ú
+ & 
+Û
+ \\
+\hline
+
+220
+ & 
+Ü
+ & 
+Ý
+ & 
+Þ
+ & 
+ß
+ & 
+à
+ & 
+á
+ & 
+â
+ & 
+ã
+ & 
+ä
+ & 
+å
+ \\
+\hline
+
+230
+ & 
+æ
+ & 
+ç
+ & 
+è
+ & 
+é
+ & 
+ê
+ & 
+ë
+ & 
+ì
+ & 
+í
+ & 
+î
+ & 
+ï
+ \\
+\hline
+
+240
+ & 
+ð
+ & 
+ñ
+ & 
+ò
+ & 
+ó
+ & 
+ô
+ & 
+õ
+ & 
+ö
+ & 
+÷
+ & 
+ø
+ & 
+ù
+ \\
+\hline
+
+250
+ & 
+ú
+ & 
+û
+ & 
+ü
+ & 
+ý
+ & 
+þ
+ & 
+ÿ
+ &  &  &  &  \\
+\hline
+\end{longtable}
+%
+\begin{itemize}
+
+\item The following line should not be wrapped, because it uses
+no-break spaces (\textbackslash{}u00a0):
 
 X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X~X
 
+\item Line wrapping with/without breakpoints marked by soft hyphens
+(\textbackslash{}u00ad):
+
+pdn\-derd\-mdtd\-ri\-schpdn\-derd\-mdtd\-ri\-schpdn\-derd\-mdtd\-ri\-schpdn\-derd\-mdtd\-ri\-schpdn\-derd\-mdtd\-ri\-sch
+
+pdnderdmdtdrischpdnderdmdtdrischpdnderdmdtdrischpdnderdmdtdrischpdnderdmdtdrisch
+
+\item The currency sign (\textbackslash{}u00a4) is not supported by all fonts
+(some have an Euro sign at its place). You might see an error
+like:
+%
+\begin{quote}{\ttfamily \raggedright \noindent
+!~Package~textcomp~Error:~Symbol~\textbackslash{}textcurrency~not~provided~by\\
+(textcomp)~~~~~~~~~~~~~~~~font~family~ptm~in~TS1~encoding.\\
+(textcomp)~~~~~~~~~~~~~~~~Default~family~used~instead.
+}
+\end{quote}
+
+(which in case of font family ptm is a false positive). Add either
+%
+\begin{DUfieldlist}
+\item[{warn:}]
+turn the error in a warning, use the default symbol (bitmap), or
+
+\item[{force,almostfull:}]
+use the symbol provided by the font at the users
+risk,
+
+\end{DUfieldlist}
+
+to the document options or use a different font package.
+
+\end{itemize}
+
 
 %___________________________________________________________________________
 
@@ -1816,7 +2116,7 @@ The following characters play a special role in LaTeX and are called
 %
 \begin{quote}
 
-\# \$ \% \& \textasciitilde{} \_ \textasciicircum{} \{ \}
+\# \$ \% \& \textasciitilde{} \_ \textasciicircum{} \textbackslash{} \{ \}
 
 \end{quote}
 
diff --git a/docutils/test/functional/input/data/latex_encoding.txt b/docutils/test/functional/input/data/latex_encoding.txt
index 6d5cc0b9e..1405a5ca7 100644
--- a/docutils/test/functional/input/data/latex_encoding.txt
+++ b/docutils/test/functional/input/data/latex_encoding.txt
@@ -6,7 +6,7 @@ The LaTeX Info pages lists under "2.18 Special Characters"
   The following characters play a special role in LaTeX and are called
   "special printing characters", or simply "special characters".
 
-                            # $ % & ~ _ ^ \ { }
+                            # $ % & ~ _ ^ \\ { }
 
 The special chars verbatim::
 
diff --git a/docutils/test/functional/input/data/unicode.txt b/docutils/test/functional/input/data/unicode.txt
index 4bdd57653..bed6a8c08 100644
--- a/docutils/test/functional/input/data/unicode.txt
+++ b/docutils/test/functional/input/data/unicode.txt
@@ -1,27 +1,70 @@
-Various non-ASCII characters
-----------------------------
+Non-ASCII characters
+--------------------
+
+Punctuation and footnote symbols
 
 = ===================================
-© copyright sign
-® registered sign
-« left pointing guillemet
-» right pointing guillemet
 – en-dash
 — em-dash
-‘ single turned comma quotation mark 
-’ single comma quotation mark 
-‚ low single comma quotation mark 
-“ double turned comma quotation mark 
-” double comma quotation mark 
-„ low double comma quotation mark 
+‘ single turned comma quotation mark
+’ single comma quotation mark
+‚ low single comma quotation mark
+“ double turned comma quotation mark
+” double comma quotation mark
+„ low double comma quotation mark
 † dagger
 ‡ double dagger
+♦ black diamond suit
+♥ black heart suit
+♠ black spade suit
+♣ black club suit
 … ellipsis
 ™ trade mark sign
 ⇔ left-right double arrow
 = ===================================
 
-The following line should not be wrapped, because it uses
-non-breakable spaces:
 
-X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
+The `Latin-1 extended` Unicode block
+
+===  =  =  =  =  =  =  =  =  =  =
+ ..  0  1  2  3  4  5  6  7  8  9
+---  -  -  -  -  -  -  -  -  -  -
+160     ¡  ¢  £     ¥  ¦  §  ¨  ©
+170  ª  «  ¬    ®  ¯  °  ±  ²  ³
+180  ´  µ  ¶  ·  ¸  ¹  º  »  ¼  ½
+190  ¾  ¿  À  Á  Â  Ã  Ä  Å  Æ  Ç
+200  È  É  Ê  Ë  Ì  Í  Î  Ï  Ð  Ñ
+210  Ò  Ó  Ô  Õ  Ö  ×  Ø  Ù  Ú  Û
+220  Ü  Ý  Þ  ß  à  á  â  ã  ä  å
+230  æ  ç  è  é  ê  ë  ì  í  î  ï
+240  ð  ñ  ò  ó  ô  õ  ö  ÷  ø  ù
+250  ú  û  ü  ý  þ  ÿ
+===  =  =  =  =  =  =  =  =  =  =
+
+* The following line should not be wrapped, because it uses
+  no-break spaces (\\u00a0):
+
+  X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
+
+* Line wrapping with/without breakpoints marked by soft hyphens
+  (\\u00ad):
+
+  pdnderdmdtdrischpdnderdmdtdrischpdnderdmdtdrischpdnderdmdtdrischpdnderdmdtdrisch
+
+  pdnderdmdtdrischpdnderdmdtdrischpdnderdmdtdrischpdnderdmdtdrischpdnderdmdtdrisch
+
+* The currency sign (\\u00a4) is not supported by all fonts
+  (some have an Euro sign at its place). You might see an error
+  like::
+
+    ! Package textcomp Error: Symbol \textcurrency not provided by
+    (textcomp)                font family ptm in TS1 encoding.
+    (textcomp)                Default family used instead.
+
+  (which in case of font family ptm is a false positive). Add either
+
+  :warn: turn the error in a warning, use the default symbol (bitmap), or
+  :force,almostfull: use the symbol provided by the font at the users
+  		     risk,
+
+  to the document options or use a different font package.
diff --git a/docutils/test/test_writers/test_latex2e.py b/docutils/test/test_writers/test_latex2e.py
index 60e939194..cc2302bd3 100755
--- a/docutils/test/test_writers/test_latex2e.py
+++ b/docutils/test/test_writers/test_latex2e.py
@@ -1,3 +1,4 @@
 #! /usr/bin/env python
+# -*- coding: utf8 -*-
 
 # $Id$
@@ -50,7 +51,7 @@ parts = dict(
 head_prefix = r"""\documentclass[a4paper,english]{article}
 """,
 requirements = r"""\usepackage[T1]{fontenc}
-\usepackage[latin1]{inputenc}
+\usepackage[utf8]{inputenc}
 \usepackage{ifthen}
 \usepackage{babel}
 """,
@@ -79,6 +80,10 @@ r"""\usepackage{longtable}
 \newlength{\DUtablewidth} % internal use in tables
 """))
 
+head_textcomp = head_template.substitute(
+    dict(parts, requirements = parts['requirements'] +
+r"""\usepackage{textcomp} % text symbol macros
+"""))
 
 totest = {}
 totest_latex_toc = {}
@@ -96,6 +101,15 @@ head + r"""
 """],
 ]
 
+totest['textcomp'] = [
+["2 µm is just 2/1000000 m",
+head_textcomp + r"""
+2 µm is just 2/1000000 m
+
+\end{document}
+"""],
+]
+
 totest['table_of_contents'] = [
 # input
 ["""\