summaryrefslogtreecommitdiff
path: root/Doc/library/xml.etree.elementtree.rst
diff options
context:
space:
mode:
authorSerhiy Storchaka <storchaka@gmail.com>2013-08-29 10:29:30 +0300
committerSerhiy Storchaka <storchaka@gmail.com>2013-08-29 10:29:30 +0300
commitbb1ebd6ed7d5f6a0447c1e018e5b656acee7f571 (patch)
tree15e29cc021680bc451ff4662ad8e947075400756 /Doc/library/xml.etree.elementtree.rst
parentdb82193d1b5a74308f6673ef1dcb16456a0ae8d7 (diff)
parent1c5cb63b938b85ea512cf993f3867d8ef0f2ac3f (diff)
downloadcpython-bb1ebd6ed7d5f6a0447c1e018e5b656acee7f571.tar.gz
Issue #18760: Improved cross-references in the xml package.
Diffstat (limited to 'Doc/library/xml.etree.elementtree.rst')
-rw-r--r--Doc/library/xml.etree.elementtree.rst107
1 files changed, 102 insertions, 5 deletions
diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst
index 7270067bf6..e1d9ff04ce 100644
--- a/Doc/library/xml.etree.elementtree.rst
+++ b/Doc/library/xml.etree.elementtree.rst
@@ -105,6 +105,38 @@ Children are nested, and we can access specific child nodes by index::
>>> root[0][1].text
'2008'
+Incremental parsing
+^^^^^^^^^^^^^^^^^^^
+
+It's possible to parse XML incrementally (i.e. not the whole document at once).
+The most powerful tool for doing this is :class:`IncrementalParser`. It does
+not require a blocking read to obtain the XML data, and is instead fed with
+data incrementally with :meth:`IncrementalParser.data_received` calls. To get
+the parsed XML elements, call :meth:`IncrementalParser.events`. Here's an
+example::
+
+ >>> incparser = ET.IncrementalParser(['start', 'end'])
+ >>> incparser.data_received('<mytag>sometext')
+ >>> list(incparser.events())
+ [('start', <Element 'mytag' at 0x7fba3f2a8688>)]
+ >>> incparser.data_received(' more text</mytag>')
+ >>> for event, elem in incparser.events():
+ ... print(event)
+ ... print(elem.tag, 'text=', elem.text)
+ ...
+ end
+ mytag text= sometext more text
+
+The obvious use case is applications that operate in an asynchronous fashion
+where the XML data is being received from a socket or read incrementally from
+some storage device. In such cases, blocking reads are unacceptable.
+
+Because it's so flexible, :class:`IncrementalParser` can be inconvenient
+to use for simpler use-cases. If you don't mind your application blocking on
+reading XML data but would still like to have incremental parsing capabilities,
+take a look at :func:`iterparse`. It can be useful when you're reading a large
+XML document and don't want to hold it wholly in memory.
+
Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -379,7 +411,7 @@ Functions
Parses an XML section into an element tree incrementally, and reports what's
going on to the user. *source* is a filename or :term:`file object`
- containing XML data. *events* is a list of events to report back. The
+ containing XML data. *events* is a sequence of events to report back. The
supported events are the strings ``"start"``, ``"end"``, ``"start-ns"``
and ``"end-ns"`` (the "ns" events are used to get detailed namespace
information). If *events* is omitted, only ``"end"`` events are reported.
@@ -388,6 +420,11 @@ Functions
:class:`TreeBuilder` as a target. Returns an :term:`iterator` providing
``(event, elem)`` pairs.
+ Note that while :func:`iterparse` builds the tree incrementally, it issues
+ blocking reads on *source* (or the file it names). As such, it's unsuitable
+ for asynchronous applications where blocking reads can't be made. For fully
+ asynchronous parsing, see :class:`IncrementalParser`.
+
.. note::
:func:`iterparse` only guarantees that it has seen the ">"
@@ -398,7 +435,6 @@ Functions
If you need a fully populated element, look for "end" events instead.
-
.. function:: parse(source, parser=None)
Parses an XML section into an element tree. *source* is a filename or file
@@ -438,29 +474,39 @@ Functions
arguments. Returns an element instance.
-.. function:: tostring(element, encoding="us-ascii", method="xml")
+.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
+ short_empty_elements=True)
Generates a string representation of an XML element, including all
subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
generate a Unicode string (otherwise, a bytestring is generated). *method*
is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
+ *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
Returns an (optionally) encoded string containing the XML data.
+ .. versionadded:: 3.4
+ The *short_empty_elements* parameter.
-.. function:: tostringlist(element, encoding="us-ascii", method="xml")
+
+.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
+ short_empty_elements=True)
Generates a string representation of an XML element, including all
subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
generate a Unicode string (otherwise, a bytestring is generated). *method*
is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
+ *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
Returns a list of (optionally) encoded strings containing the XML data.
It does not guarantee any specific sequence, except that
``"".join(tostringlist(element)) == tostring(element)``.
.. versionadded:: 3.2
+ .. versionadded:: 3.4
+ The *short_empty_elements* parameter.
+
.. function:: XML(text, parser=None)
@@ -753,7 +799,8 @@ ElementTree Objects
.. method:: write(file, encoding="us-ascii", xml_declaration=None, \
- default_namespace=None, method="xml")
+ default_namespace=None, method="xml", *, \
+ short_empty_elements=True)
Writes the element tree to a file, as XML. *file* is a file name, or a
:term:`file object` opened for writing. *encoding* [1]_ is the output
@@ -764,6 +811,10 @@ ElementTree Objects
*default_namespace* sets the default XML namespace (for "xmlns").
*method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
``"xml"``).
+ The keyword-only *short_empty_elements* parameter controls the formatting
+ of elements that contain no content. If *True* (the default), they are
+ emitted as a single self-closed tag, otherwise they are emitted as a pair
+ of start/end tags.
The output is either a string (:class:`str`) or binary (:class:`bytes`).
This is controlled by the *encoding* argument. If *encoding* is
@@ -772,6 +823,9 @@ ElementTree Objects
:term:`file object`; make sure you do not try to write a string to a
binary stream and vice versa.
+ .. versionadded:: 3.4
+ The *short_empty_elements* parameter.
+
This is the XML file that is going to be manipulated::
@@ -817,6 +871,49 @@ QName Objects
:class:`QName` instances are opaque.
+IncrementalParser Objects
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. class:: IncrementalParser(events=None, parser=None)
+
+ An incremental, event-driven parser suitable for non-blocking applications.
+ *events* is a sequence of events to report back. The supported events are
+ the strings ``"start"``, ``"end"``, ``"start-ns"`` and ``"end-ns"`` (the "ns"
+ events are used to get detailed namespace information). If *events* is
+ omitted, only ``"end"`` events are reported. *parser* is an optional
+ parser instance. If not given, the standard :class:`XMLParser` parser is
+ used. *parser* can only use the default :class:`TreeBuilder` as a target.
+
+ .. method:: data_received(data)
+
+ Feed the given bytes data to the incremental parser.
+
+ .. method:: eof_received()
+
+ Signal the incremental parser that the data stream is terminated.
+
+ .. method:: events()
+
+ Iterate over the events which have been encountered in the data fed
+ to the parser. This method yields ``(event, elem)`` pairs, where
+ *event* is a string representing the type of event (e.g. ``"end"``)
+ and *elem* is the encountered :class:`Element` object. Events
+ provided in a previous call to :meth:`events` will not be yielded
+ again.
+
+ .. note::
+
+ :class:`IncrementalParser` only guarantees that it has seen the ">"
+ character of a starting tag when it emits a "start" event, so the
+ attributes are defined, but the contents of the text and tail attributes
+ are undefined at that point. The same applies to the element children;
+ they may or may not be present.
+
+ If you need a fully populated element, look for "end" events instead.
+
+ .. versionadded:: 3.4
+
+
.. _elementtree-treebuilder-objects:
TreeBuilder Objects