summaryrefslogtreecommitdiff
path: root/Doc
diff options
context:
space:
mode:
authorMiss Islington (bot) <31488909+miss-islington@users.noreply.github.com>2023-05-17 16:06:06 -0700
committerGitHub <noreply@github.com>2023-05-17 16:06:06 -0700
commitf48a96a28012d28ae37a2f4587a780a5eb779946 (patch)
treec9d5e9271a27e75b4f394ba441da49a8df4bd176 /Doc
parent425065bb002b9cbf9c12f61a6f3102f2ce2b8d14 (diff)
downloadcpython-git-3.10.tar.gz
[3.10] [3.11] gh-102153: Start stripping C0 control and space chars in `urlsplit` (GH-102508) (GH-104575) (#104592)3.10
gh-102153: Start stripping C0 control and space chars in `urlsplit` (GH-102508) `urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit GH-25595. This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/GH-url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329). I simplified the docs by eliding the state of the world explanatory paragraph in this security release only backport. (people will see that in the mainline /3/ docs) --------- (cherry picked from commit 2f630e1ce18ad2e07428296532a68b11dc66ad10) (cherry picked from commit 610cc0ab1b760b2abaac92bd256b96191c46b941) Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> Co-authored-by: Illia Volochii <illia.volochii@gmail.com> Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
Diffstat (limited to 'Doc')
-rw-r--r--Doc/library/urllib.parse.rst38
1 files changed, 36 insertions, 2 deletions
diff --git a/Doc/library/urllib.parse.rst b/Doc/library/urllib.parse.rst
index 96b3965107..1e85602951 100644
--- a/Doc/library/urllib.parse.rst
+++ b/Doc/library/urllib.parse.rst
@@ -159,6 +159,10 @@ or on combining URL components into a URL string.
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
params='', query='', fragment='')
+ .. warning::
+
+ :func:`urlparse` does not perform validation. See :ref:`URL parsing
+ security <url-parsing-security>` for details.
.. versionchanged:: 3.2
Added IPv6 URL parsing capabilities.
@@ -324,8 +328,14 @@ or on combining URL components into a URL string.
``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
decomposed before parsing, no error will be raised.
- Following the `WHATWG spec`_ that updates RFC 3986, ASCII newline
- ``\n``, ``\r`` and tab ``\t`` characters are stripped from the URL.
+ Following some of the `WHATWG spec`_ that updates RFC 3986, leading C0
+ control and space characters are stripped from the URL. ``\n``,
+ ``\r`` and tab ``\t`` characters are removed from the URL at any position.
+
+ .. warning::
+
+ :func:`urlsplit` does not perform validation. See :ref:`URL parsing
+ security <url-parsing-security>` for details.
.. versionchanged:: 3.6
Out-of-range port numbers now raise :exc:`ValueError`, instead of
@@ -338,6 +348,9 @@ or on combining URL components into a URL string.
.. versionchanged:: 3.10
ASCII newline and tab characters are stripped from the URL.
+ .. versionchanged:: 3.10.12
+ Leading WHATWG C0 control and space characters are stripped from the URL.
+
.. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser
.. function:: urlunsplit(parts)
@@ -414,6 +427,27 @@ or on combining URL components into a URL string.
or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
without changes.
+.. _url-parsing-security:
+
+URL parsing security
+--------------------
+
+The :func:`urlsplit` and :func:`urlparse` APIs do not perform **validation** of
+inputs. They may not raise errors on inputs that other applications consider
+invalid. They may also succeed on some inputs that might not be considered
+URLs elsewhere. Their purpose is for practical functionality rather than
+purity.
+
+Instead of raising an exception on unusual input, they may instead return some
+component parts as empty strings. Or components may contain more than perhaps
+they should.
+
+We recommend that users of these APIs where the values may be used anywhere
+with security implications code defensively. Do some verification within your
+code before trusting a returned component part. Does that ``scheme`` make
+sense? Is that a sensible ``path``? Is there anything strange about that
+``hostname``? etc.
+
.. _parsing-ascii-encoded-bytes:
Parsing ASCII Encoded Bytes