author | Victor Stinner <vstinner@python.org> | 2020-11-01 23:07:23 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2020-11-01 23:07:23 +0100 |
commit | e662c398d87f136497f8ec672e83657ae3a599e0 (patch) | |
tree | cc9383c30557769a096be580b7f8f1b936565ea9 /Include/cpython | |
parent | 82458b6cdbae3b849dc11d0d7dc2ab06ef0451c4 (diff) | |
download | cpython-git-e662c398d87f136497f8ec672e83657ae3a599e0.tar.gz |
bpo-42236: Use UTF-8 encoding if nl_langinfo(CODESET) fails (GH-23086)
If the nl_langinfo(CODESET) function returns an empty string, Python
now uses UTF-8 as the filesystem encoding.
In May 2010 (commit b744ba1d14c5487576c95d0311e357b707600b47), I
modified Python to log a warning and use UTF-8 as the filesystem
encoding (instead of None) if nl_langinfo(CODESET) returns an empty
string.
In August 2020 (commit 94908bbc1503df830d1d615e7b57744ae1b41079), I
modified Python startup to fail with a fatal error and a specific
error message if nl_langinfo(CODESET) returns an empty string. The
intent was to prevent guessing the encoding and also to investigate
user configurations where this case happens.
In 10 years (2010 to 2020), I saw zero user reports about the error
message related to nl_langinfo(CODESET) returning an empty string.
Today, UTF-8 has become the de facto standard and it's safe to make
the assumption that the user expects UTF-8. For example,
nl_langinfo(CODESET) can return an empty string on macOS if the
LC_CTYPE locale is not supported, and UTF-8 is the default encoding
on macOS.
While this change is unlikely to affect anyone in practice, it
should make UTF-8 lovers happy ;-)
Also rewrite the documentation explaining how Python selects the
filesystem encoding and error handler.
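The fallback described in the commit message can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code CPython uses: the helper names `choose_fs_encoding()` and `current_fs_encoding()` are hypothetical, and only `setlocale()` and `nl_langinfo(CODESET)` are real POSIX calls.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not CPython's implementation): map the string
   returned by nl_langinfo(CODESET) to the encoding name to use.
   An empty (or missing) codeset falls back to UTF-8, which is the
   behavior this commit describes. */
static const char *
choose_fs_encoding(const char *codeset)
{
    if (codeset == NULL || codeset[0] == '\0') {
        return "UTF-8";
    }
    return codeset;
}

/* Hypothetical wrapper: initialize LC_CTYPE from the environment,
   query the locale's codeset and apply the fallback above. */
static const char *
current_fs_encoding(void)
{
    setlocale(LC_CTYPE, "");
    return choose_fs_encoding(nl_langinfo(CODESET));
}
```

Keeping the decision in a pure function such as `choose_fs_encoding()` makes the empty-string case easy to exercise without having to configure an unsupported LC_CTYPE locale.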
Diffstat (limited to 'Include/cpython')
-rw-r--r-- | Include/cpython/initconfig.h | 37 |
1 files changed, 7 insertions, 30 deletions
diff --git a/Include/cpython/initconfig.h b/Include/cpython/initconfig.h
index bbe8387677..dd5ca6121c 100644
--- a/Include/cpython/initconfig.h
+++ b/Include/cpython/initconfig.h
@@ -156,36 +156,13 @@ typedef struct {
     /* Python filesystem encoding and error handler:
        sys.getfilesystemencoding() and sys.getfilesystemencodeerrors().
 
-       Default encoding and error handler:
-
-       * if Py_SetStandardStreamEncoding() has been called: they have the
-         highest priority;
-       * PYTHONIOENCODING environment variable;
-       * The UTF-8 Mode uses UTF-8/surrogateescape;
-       * If Python forces the usage of the ASCII encoding (ex: C locale
-         or POSIX locale on FreeBSD or HP-UX), use ASCII/surrogateescape;
-       * locale encoding: ANSI code page on Windows, UTF-8 on Android and
-         VxWorks, LC_CTYPE locale encoding on other platforms;
-       * On Windows, "surrogateescape" error handler;
-       * "surrogateescape" error handler if the LC_CTYPE locale is "C" or "POSIX";
-       * "surrogateescape" error handler if the LC_CTYPE locale has been coerced
-         (PEP 538);
-       * "strict" error handler.
-
-       Supported error handlers: "strict", "surrogateescape" and
-       "surrogatepass". The surrogatepass error handler is only supported
-       if Py_DecodeLocale() and Py_EncodeLocale() use directly the UTF-8 codec;
-       it's only used on Windows.
-
-       initfsencoding() updates the encoding to the Python codec name.
-       For example, "ANSI_X3.4-1968" is replaced with "ascii".
-
-       On Windows, sys._enablelegacywindowsfsencoding() sets the
-       encoding/errors to mbcs/replace at runtime.
-
-
-       See Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors.
-       */
+       The Doc/c-api/init_config.rst documentation explains how Python selects
+       the filesystem encoding and error handler.
+
+       _PyUnicode_InitEncodings() updates the encoding name to the Python codec
+       name. For example, "ANSI_X3.4-1968" is replaced with "ascii". It also
+       sets Py_FileSystemDefaultEncoding to filesystem_encoding and
+       sets Py_FileSystemDefaultEncodeErrors to filesystem_errors.
        */
     wchar_t *filesystem_encoding;
     wchar_t *filesystem_errors;