diff options
Diffstat (limited to 'Doc/library/codecs.rst')
-rw-r--r-- | Doc/library/codecs.rst | 49 |
1 files changed, 37 insertions, 12 deletions
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst index f3017da2c8..7438aec3d8 100644 --- a/Doc/library/codecs.rst +++ b/Doc/library/codecs.rst @@ -7,6 +7,7 @@ .. sectionauthor:: Marc-André Lemburg <mal@lemburg.com> .. sectionauthor:: Martin v. Löwis <martin@v.loewis.de> +**Source code:** :source:`Lib/codecs.py` .. index:: single: Unicode @@ -29,10 +30,9 @@ module features are restricted to use specifically with The module defines the following functions for encoding and decoding with any codec: -.. function:: encode(obj, [encoding[, errors]]) +.. function:: encode(obj, encoding='utf-8', errors='strict') - Encodes *obj* using the codec registered for *encoding*. The default - encoding is ``utf-8``. + Encodes *obj* using the codec registered for *encoding*. *Errors* may be given to set the desired error handling scheme. The default error handler is ``'strict'`` meaning that encoding errors raise @@ -40,10 +40,9 @@ any codec: :exc:`UnicodeEncodeError`). Refer to :ref:`codec-base-classes` for more information on codec error handling. -.. function:: decode(obj, [encoding[, errors]]) +.. function:: decode(obj, encoding='utf-8', errors='strict') - Decodes *obj* using the codec registered for *encoding*. The default - encoding is ``utf-8``. + Decodes *obj* using the codec registered for *encoding*. *Errors* may be given to set the desired error handling scheme. The default error handler is ``'strict'`` meaning that decoding errors raise @@ -104,7 +103,6 @@ The full details for each codec can also be looked up directly: To simplify access to the various codec components, the module provides these additional functions which use :func:`lookup` for the codec lookup: - .. function:: getencoder(encoding) Look up the codec for the given encoding and return its encoder function. @@ -274,6 +272,7 @@ implement the file protocols. Codec authors also need to define how the codec will handle encoding and decoding errors. +.. _surrogateescape: .. _error-handlers: Error Handlers @@ -315,10 +314,14 @@ The following error handlers are only applicable to | | reference (only for encoding). Implemented | | | in :func:`xmlcharrefreplace_errors`. | +-------------------------+-----------------------------------------------+ -| ``'backslashreplace'`` | Replace with backslashed escape sequences | -| | (only for encoding). Implemented in | +| ``'backslashreplace'`` | Replace with backslashed escape sequences. | +| | Implemented in | | | :func:`backslashreplace_errors`. | +-------------------------+-----------------------------------------------+ +| ``'namereplace'`` | Replace with ``\N{...}`` escape sequences | +| | (only for encoding). Implemented in | +| | :func:`namereplace_errors`. | ++-------------------------+-----------------------------------------------+ | ``'surrogateescape'`` | On decoding, replace byte with individual | | | surrogate code ranging from ``U+DC80`` to | | | ``U+DCFF``. This code will then be turned | @@ -344,6 +347,13 @@ In addition, the following error handler is specific to the given codecs: .. versionchanged:: 3.4 The ``'surrogatepass'`` error handlers now works with utf-16\* and utf-32\* codecs. +.. versionadded:: 3.5 + The ``'namereplace'`` error handler. + +.. versionchanged:: 3.5 + The ``'backslashreplace'`` error handlers now works with decoding and + translating. + The set of allowed values can be extended by registering a new named error handler: @@ -411,9 +421,17 @@ functions: .. function:: backslashreplace_errors(exception) - Implements the ``'backslashreplace'`` error handling (for encoding with + Implements the ``'backslashreplace'`` error handling (for + :term:`text encodings <text encoding>` only): malformed data is + replaced by a backslashed escape sequence. + +.. function:: namereplace_errors(exception) + + Implements the ``'namereplace'`` error handling (for encoding with :term:`text encodings <text encoding>` only): the - unencodable character is replaced by a backslashed escape sequence. + unencodable character is replaced by a ``\N{...}`` escape sequence. + + .. versionadded:: 3.5 .. _codec-objects: @@ -1142,8 +1160,16 @@ particular, the following variants typically exist: +-----------------+--------------------------------+--------------------------------+ | koi8_r | | Russian | +-----------------+--------------------------------+--------------------------------+ +| koi8_t | | Tajik | +| | | | +| | | .. versionadded:: 3.5 | ++-----------------+--------------------------------+--------------------------------+ | koi8_u | | Ukrainian | +-----------------+--------------------------------+--------------------------------+ +| kz1048 | kz_1048, strk1048_2002, rk1048 | Kazakh | +| | | | +| | | .. versionadded:: 3.5 | ++-----------------+--------------------------------+--------------------------------+ | mac_cyrillic | maccyrillic | Bulgarian, Byelorussian, | | | | Macedonian, Russian, Serbian | +-----------------+--------------------------------+--------------------------------+ @@ -1444,4 +1470,3 @@ This module implements a variant of the UTF-8 codec: On encoding a UTF-8 encoded BOM will be prepended to the UTF-8 encoded bytes. For the stateful encoder this is only done once (on the first write to the byte stream). For decoding an optional UTF-8 encoded BOM at the start of the data will be skipped. - |