author Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> 2018-06-09 00:13:53 -0700
committer GitHub <noreply@github.com> 2018-06-09 00:13:53 -0700
commit dd613cf335d7556e20ef3dd01c3abc081b958449
tree 1d6761f65ce638c273ede3ecab270c2137d31608 /Doc/using
parent 8016c2bd695a750860f2c5d0e406ab28572ac4f1
bpo-33409: Clarify PEP 538/540 relationship (GH-7534)
While locale coercion and UTF-8 mode turned out to be complementary ideas
rather than competing ones, it isn't immediately obvious why it's useful to
have both, or how they interact at runtime.

This updates both the Python 3.7 What's New doc and the PYTHONCOERCECLOCALE
and PYTHONUTF8 documentation in an attempt to clarify that relationship:

- in the respective What's New sections, add a closing paragraph explaining
  which problem each one solves, and pointing to the other PEP's section for
  the specific aspects it relies on the other PEP to solve
- use "locale-aware mode" as a more descriptive term for the default
  non-UTF-8 mode
- improve wording consistency between the PYTHONCOERCECLOCALE and PYTHONUTF8
  docs when they cover the same thing (mostly related to legacy locale
  detection and setting the standard stream error handler)
- improve the description of the locale coercion trigger conditions
  (including pointing out that setting LC_ALL turns off locale coercion)
- port the full description of the UTF-8 mode behaviour changes from PEP 540
  into the PYTHONUTF8 documentation
- be explicit that PYTHONIOENCODING still overrides the settings for the
  standard streams
- mention concrete examples of things that do and don't get their text
  encoding assumptions adjusted by the two text encoding assumption override
  techniques

(cherry picked from commit 1bcb8a636857e3383d65aaf196f93edb949f2e79)

Co-authored-by: Nick Coghlan <ncoghlan@gmail.com>
Diffstat (limited to 'Doc/using')
-rw-r--r-- Doc/using/cmdline.rst 97
1 files changed, 79 insertions, 18 deletions
diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst
index e72dea9075..c6bb0be6bc 100644
--- a/Doc/using/cmdline.rst
+++ b/Doc/using/cmdline.rst
@@ -438,8 +438,10 @@ Miscellaneous options
* Set the :attr:`~sys.flags.dev_mode` attribute of :attr:`sys.flags` to
``True``
- * ``-X utf8`` enables the UTF-8 mode, whereas ``-X utf8=0`` disables the
- UTF-8 mode.
+ * ``-X utf8`` enables UTF-8 mode for operating system interfaces, overriding
+ the default locale-aware mode. ``-X utf8=0`` explicitly disables UTF-8
+ mode (even when it would otherwise activate automatically).
+ See :envvar:`PYTHONUTF8` for more details.
It also allows passing arbitrary values and retrieving them through the
:data:`sys._xoptions` dictionary.
@@ -789,14 +791,16 @@ conflict.
.. envvar:: PYTHONCOERCECLOCALE
If set to the value ``0``, causes the main Python command line application
- to skip coercing the legacy ASCII-based C locale to a more capable UTF-8
- based alternative.
+ to skip coercing the legacy ASCII-based C and POSIX locales to a more
+ capable UTF-8 based alternative.
- If this variable is *not* set, or is set to a value other than ``0``, and
- the current locale reported for the ``LC_CTYPE`` category is the default
- ``C`` locale, then the Python CLI will attempt to configure the following
- locales for the ``LC_CTYPE`` category in the order listed before loading the
- interpreter runtime:
+ If this variable is *not* set (or is set to a value other than ``0``), the
+ ``LC_ALL`` locale override environment variable is also not set, and the
+ current locale reported for the ``LC_CTYPE`` category is either the default
+ ``C`` locale, or else the explicitly ASCII-based ``POSIX`` locale, then the
+ Python CLI will attempt to configure the following locales for the
+ ``LC_CTYPE`` category in the order listed before loading the interpreter
+ runtime:
* ``C.UTF-8``
* ``C.utf8``
@@ -804,21 +808,32 @@ conflict.
If setting one of these locale categories succeeds, then the ``LC_CTYPE``
environment variable will also be set accordingly in the current process
- environment before the Python runtime is initialized. This ensures the
- updated setting is seen in subprocesses, as well as in operations that
- query the environment rather than the current C locale (such as Python's
- own :func:`locale.getdefaultlocale`).
+ environment before the Python runtime is initialized. This ensures that in
+ addition to being seen by both the interpreter itself and other locale-aware
+ components running in the same process (such as the GNU ``readline``
+ library), the updated setting is also seen in subprocesses (regardless of
+ whether or not those processes are running a Python interpreter), as well as
+ in operations that query the environment rather than the current C locale
+ (such as Python's own :func:`locale.getdefaultlocale`).
Configuring one of these locales (either explicitly or via the above
- implicit locale coercion) will automatically set the error handler for
- :data:`sys.stdin` and :data:`sys.stdout` to ``surrogateescape``. This
- behavior can be overridden using :envvar:`PYTHONIOENCODING` as usual.
+ implicit locale coercion) automatically enables the ``surrogateescape``
+ :ref:`error handler <error-handlers>` for :data:`sys.stdin` and
+ :data:`sys.stdout` (:data:`sys.stderr` continues to use ``backslashreplace``
+ as it does in any other locale). This stream handling behavior can be
+ overridden using :envvar:`PYTHONIOENCODING` as usual.
For debugging purposes, setting ``PYTHONCOERCECLOCALE=warn`` will cause
Python to emit warning messages on ``stderr`` if either the locale coercion
activates, or else if a locale that *would* have triggered coercion is
still active when the Python runtime is initialized.
+ Also note that even when locale coercion is disabled, or when it fails to
+ find a suitable target locale, :envvar:`PYTHONUTF8` will still activate by
+ default in legacy ASCII-based locales. Both features must be disabled in
+ order to force the interpreter to use ``ASCII`` instead of ``UTF-8`` for
+ system interfaces.
+
Availability: \*nix
.. versionadded:: 3.7
@@ -834,10 +849,56 @@ conflict.
.. envvar:: PYTHONUTF8
- If set to ``1``, enable the UTF-8 mode. If set to ``0``, disable the UTF-8
- mode. Any other non-empty string cause an error.
+ If set to ``1``, enables the interpreter's UTF-8 mode, where ``UTF-8`` is
+ used as the text encoding for system interfaces, regardless of the
+ current locale setting.
+
+ This means that:
+
+ * :func:`sys.getfilesystemencoding()` returns ``'UTF-8'`` (the locale
+ encoding is ignored).
+ * :func:`locale.getpreferredencoding()` returns ``'UTF-8'`` (the locale
+ encoding is ignored, and the function's ``do_setlocale`` parameter has no
+ effect).
+ * :data:`sys.stdin`, :data:`sys.stdout`, and :data:`sys.stderr` all use
+ UTF-8 as their text encoding, with the ``surrogateescape``
+ :ref:`error handler <error-handlers>` being enabled for :data:`sys.stdin`
+ and :data:`sys.stdout` (:data:`sys.stderr` continues to use
+ ``backslashreplace`` as it does in the default locale-aware mode)
+
+ As a consequence of the changes in those lower level APIs, other higher
+ level APIs also exhibit different default behaviours:
+
+ * Command line arguments, environment variables and filenames are decoded
+ to text using the UTF-8 encoding.
+ * :func:`os.fsdecode()` and :func:`os.fsencode()` use the UTF-8 encoding.
+ * :func:`open()`, :func:`io.open()`, and :func:`codecs.open()` use the UTF-8
+ encoding by default. However, they still use the strict error handler by
+ default so that attempting to open a binary file in text mode is likely
+ to raise an exception rather than producing nonsense data.
+
+ Note that the standard stream settings in UTF-8 mode can be overridden by
+ :envvar:`PYTHONIOENCODING` (just as they can be in the default locale-aware
+ mode).
+
+ If set to ``0``, the interpreter runs in its default locale-aware mode.
+
+ Setting any other non-empty string causes an error during interpreter
+ initialisation.
+
+ If this environment variable is not set at all, then the interpreter defaults
+ to using the current locale settings, *unless* the current locale is
+ identified as a legacy ASCII-based locale
+ (as described for :envvar:`PYTHONCOERCECLOCALE`), and locale coercion is
+ either disabled or fails. In such legacy locales, the interpreter will
+ default to enabling UTF-8 mode unless explicitly instructed not to do so.
+
+ Also available as the :option:`-X` ``utf8`` option.
+
+ Availability: \*nix
.. versionadded:: 3.7
+ See :pep:`540` for more details.
Debug-mode variables
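
The ``PYTHONUTF8`` behaviour described in the final hunk above can be observed by launching a child interpreter and asking it to report its own text settings. The following is a minimal sketch and not part of the commit: ``PROBE`` and ``probe_text_settings`` are hypothetical names introduced here for illustration, and the sketch assumes Python 3.7 or later on a platform where ``PYTHONUTF8`` is honoured. ``PYTHONIOENCODING`` is removed from the child environment because, as the documentation notes, it would otherwise override the standard stream settings.

```python
import json
import os
import subprocess
import sys

# Script run inside the child interpreter; it reports the settings that the
# PYTHONUTF8 documentation says are affected by UTF-8 mode.
PROBE = r"""
import json, locale, sys
print(json.dumps({
    "utf8_mode": sys.flags.utf8_mode,
    "fs_encoding": sys.getfilesystemencoding(),
    "preferred_encoding": locale.getpreferredencoding(),
    "stdin_errors": sys.stdin.errors,
    "stdout_errors": sys.stdout.errors,
    "stderr_errors": sys.stderr.errors,
}))
"""


def probe_text_settings(utf8_setting):
    """Run a child interpreter with PYTHONUTF8 set to ``utf8_setting``.

    Pass ``None`` to leave PYTHONUTF8 unset. (Hypothetical helper, not a
    CPython API.)
    """
    env = dict(os.environ)
    # Start from a clean slate so inherited settings do not mask the result.
    for name in ("PYTHONUTF8", "PYTHONIOENCODING"):
        env.pop(name, None)
    if utf8_setting is not None:
        env["PYTHONUTF8"] = utf8_setting
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        check=True,
    )
    # json.dumps emits ASCII-only output by default.
    return json.loads(result.stdout.decode("ascii"))


if __name__ == "__main__":
    for setting in ("1", "0", None):
        print("PYTHONUTF8 =", setting, "->", probe_text_settings(setting))
```

With ``PYTHONUTF8=1`` the child should report UTF-8 for both encodings, ``surrogateescape`` for ``sys.stdin`` and ``sys.stdout``, and ``backslashreplace`` for ``sys.stderr``. The exact spelling of the encoding name (``'UTF-8'`` versus ``'utf-8'``) varies between Python versions, so compare it case-insensitively. The results for ``0`` and for the unset case depend on the current locale and on the interpreter version.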