diff options
Diffstat (limited to 'Doc/library/html.parser.rst')
-rw-r--r-- | Doc/library/html.parser.rst | 47 |
1 files changed, 6 insertions, 41 deletions
diff --git a/Doc/library/html.parser.rst b/Doc/library/html.parser.rst index fef9c38411..824995eddc 100644 --- a/Doc/library/html.parser.rst +++ b/Doc/library/html.parser.rst @@ -16,21 +16,13 @@ This module defines a class :class:`HTMLParser` which serves as the basis for parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML. -.. class:: HTMLParser(strict=False, *, convert_charrefs=False) +.. class:: HTMLParser(*, convert_charrefs=True) - Create a parser instance. + Create a parser instance able to parse invalid markup. - If *convert_charrefs* is ``True`` (default: ``False``), all character + If *convert_charrefs* is ``True`` (the default), all character references (except the ones in ``script``/``style`` elements) are automatically converted to the corresponding Unicode characters. - The use of ``convert_charrefs=True`` is encouraged and will become - the default in Python 3.5. - - If *strict* is ``False`` (the default), the parser will accept and parse - invalid markup. If *strict* is ``True`` the parser will raise an - :exc:`~html.parser.HTMLParseError` exception instead [#]_ when it's not - able to parse the markup. The use of ``strict=True`` is discouraged and - the *strict* argument is deprecated. An :class:`.HTMLParser` instance is fed HTML data and calls handler methods when start tags, end tags, text, comments, and other markup elements are @@ -40,31 +32,11 @@ parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML. This parser does not check that end tags match start tags or call the end-tag handler for elements which are closed implicitly by closing an outer element. - .. versionchanged:: 3.2 - *strict* argument added. - - .. deprecated-removed:: 3.3 3.5 - The *strict* argument and the strict mode have been deprecated. - The parser is now able to accept and parse invalid markup too. - .. versionchanged:: 3.4 *convert_charrefs* keyword argument added. -An exception is defined as well: - - -.. exception:: HTMLParseError - - Exception raised by the :class:`HTMLParser` class when it encounters an error - while parsing and *strict* is ``True``. This exception provides three - attributes: :attr:`msg` is a brief message explaining the error, - :attr:`lineno` is the number of the line on which the broken construct was - detected, and :attr:`offset` is the number of characters into the line at - which the construct starts. - - .. deprecated-removed:: 3.3 3.5 - This exception has been deprecated because it's never raised by the parser - (when the default non-strict mode is used). + .. versionchanged:: 3.5 + The default value for argument *convert_charrefs* is now ``True``. Example HTML Parser Application @@ -246,8 +218,7 @@ implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`): The *data* parameter will be the entire contents of the declaration inside the ``<![...]>`` markup. It is sometimes useful to be overridden by a - derived class. The base class implementation raises an :exc:`HTMLParseError` - when *strict* is ``True``. + derived class. The base class implementation does nothing. .. _htmlparser-examples: @@ -358,9 +329,3 @@ Parsing invalid HTML (e.g. unquoted attributes) also works:: Data : tag soup End tag : p End tag : a - -.. rubric:: Footnotes - -.. [#] For backward compatibility reasons *strict* mode does not raise - exceptions for all non-compliant HTML. That is, some invalid HTML - is tolerated even in *strict* mode. |