summaryrefslogtreecommitdiff
path: root/Doc/library/codecs.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Doc/library/codecs.rst')
-rw-r--r--Doc/library/codecs.rst49
1 files changed, 37 insertions, 12 deletions
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index f3017da2c8..7438aec3d8 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -7,6 +7,7 @@
.. sectionauthor:: Marc-André Lemburg <mal@lemburg.com>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>
+**Source code:** :source:`Lib/codecs.py`
.. index::
single: Unicode
@@ -29,10 +30,9 @@ module features are restricted to use specifically with
The module defines the following functions for encoding and decoding with
any codec:
-.. function:: encode(obj, [encoding[, errors]])
+.. function:: encode(obj, encoding='utf-8', errors='strict')
- Encodes *obj* using the codec registered for *encoding*. The default
- encoding is ``utf-8``.
+ Encodes *obj* using the codec registered for *encoding*.
*Errors* may be given to set the desired error handling scheme. The
default error handler is ``'strict'`` meaning that encoding errors raise
@@ -40,10 +40,9 @@ any codec:
:exc:`UnicodeEncodeError`). Refer to :ref:`codec-base-classes` for more
information on codec error handling.
-.. function:: decode(obj, [encoding[, errors]])
+.. function:: decode(obj, encoding='utf-8', errors='strict')
- Decodes *obj* using the codec registered for *encoding*. The default
- encoding is ``utf-8``.
+ Decodes *obj* using the codec registered for *encoding*.
*Errors* may be given to set the desired error handling scheme. The
default error handler is ``'strict'`` meaning that decoding errors raise
@@ -104,7 +103,6 @@ The full details for each codec can also be looked up directly:
To simplify access to the various codec components, the module provides
these additional functions which use :func:`lookup` for the codec lookup:
-
.. function:: getencoder(encoding)
Look up the codec for the given encoding and return its encoder function.
@@ -274,6 +272,7 @@ implement the file protocols. Codec authors also need to define how the
codec will handle encoding and decoding errors.
+.. _surrogateescape:
.. _error-handlers:
Error Handlers
@@ -315,10 +314,14 @@ The following error handlers are only applicable to
| | reference (only for encoding). Implemented |
| | in :func:`xmlcharrefreplace_errors`. |
+-------------------------+-----------------------------------------------+
-| ``'backslashreplace'`` | Replace with backslashed escape sequences |
-| | (only for encoding). Implemented in |
+| ``'backslashreplace'`` | Replace with backslashed escape sequences. |
+| | Implemented in |
| | :func:`backslashreplace_errors`. |
+-------------------------+-----------------------------------------------+
+| ``'namereplace'`` | Replace with ``\N{...}`` escape sequences |
+| | (only for encoding). Implemented in |
+| | :func:`namereplace_errors`. |
++-------------------------+-----------------------------------------------+
| ``'surrogateescape'`` | On decoding, replace byte with individual |
| | surrogate code ranging from ``U+DC80`` to |
| | ``U+DCFF``. This code will then be turned |
@@ -344,6 +347,13 @@ In addition, the following error handler is specific to the given codecs:
.. versionchanged:: 3.4
The ``'surrogatepass'`` error handlers now works with utf-16\* and utf-32\* codecs.
+.. versionadded:: 3.5
+ The ``'namereplace'`` error handler.
+
+.. versionchanged:: 3.5
+ The ``'backslashreplace'`` error handlers now works with decoding and
+ translating.
+
The set of allowed values can be extended by registering a new named error
handler:
@@ -411,9 +421,17 @@ functions:
.. function:: backslashreplace_errors(exception)
- Implements the ``'backslashreplace'`` error handling (for encoding with
+ Implements the ``'backslashreplace'`` error handling (for
+ :term:`text encodings <text encoding>` only): malformed data is
+ replaced by a backslashed escape sequence.
+
+.. function:: namereplace_errors(exception)
+
+ Implements the ``'namereplace'`` error handling (for encoding with
:term:`text encodings <text encoding>` only): the
- unencodable character is replaced by a backslashed escape sequence.
+ unencodable character is replaced by a ``\N{...}`` escape sequence.
+
+ .. versionadded:: 3.5
.. _codec-objects:
@@ -1142,8 +1160,16 @@ particular, the following variants typically exist:
+-----------------+--------------------------------+--------------------------------+
| koi8_r | | Russian |
+-----------------+--------------------------------+--------------------------------+
+| koi8_t | | Tajik |
+| | | |
+| | | .. versionadded:: 3.5 |
++-----------------+--------------------------------+--------------------------------+
| koi8_u | | Ukrainian |
+-----------------+--------------------------------+--------------------------------+
+| kz1048 | kz_1048, strk1048_2002, rk1048 | Kazakh |
+| | | |
+| | | .. versionadded:: 3.5 |
++-----------------+--------------------------------+--------------------------------+
| mac_cyrillic | maccyrillic | Bulgarian, Byelorussian, |
| | | Macedonian, Russian, Serbian |
+-----------------+--------------------------------+--------------------------------+
@@ -1444,4 +1470,3 @@ This module implements a variant of the UTF-8 codec: On encoding a UTF-8 encoded
BOM will be prepended to the UTF-8 encoded bytes. For the stateful encoder this
is only done once (on the first write to the byte stream). For decoding an
optional UTF-8 encoded BOM at the start of the data will be skipped.
-