author: scoder <none@none>, 2008-03-04 19:31:15 +0100
committer: scoder <none@none>, 2008-03-04 19:31:15 +0100
commit: 0dccb72a2a6364a216eed1c5e50b95aa82b26c15
tree: 9642b26113ff9ed1723f4f5de35589f80d9708b7 /doc/tutorial.txt
parent: 5a81c2a92ae4e0e073084d6fa64b1c3174b54c87
[svn r3398] r3725@delle: sbehnel | 2008-03-04 17:52:27 +0100
tutorial update --HG-- branch : trunk
Diffstat (limited to 'doc/tutorial.txt')
-rw-r--r-- doc/tutorial.txt | 31
1 files changed, 25 insertions, 6 deletions
diff --git a/doc/tutorial.txt b/doc/tutorial.txt
index e739ee55..fb7a818f 100644
--- a/doc/tutorial.txt
+++ b/doc/tutorial.txt
@@ -520,7 +520,15 @@ specific output encoding other than plain ASCII:
</root>
Note the newline that is appended at the end when pretty printing the
-output.
+output:
+
+.. sourcecode:: pycon
+
+ >>> etree.tostring(root, pretty_print=True)
+ '<root>\n <a>\n <b/>\n </a>\n</root>\n'
+
+ >>> etree.tostring(root)
+ '<root><a><b/></a></root>'
Since lxml 2.0 (and ElementTree 1.3), the serialisation functions can
do more than XML serialisation. You can serialise to HTML or extract
@@ -528,7 +536,8 @@ the text content by passing the ``method`` keyword:
.. sourcecode:: pycon
- >>> root = etree.XML('<html><head/><body><p>Hello<br/>World</p></body></html>')
+ >>> root = etree.XML(
+ ... '<html><head/><body><p>Hello<br/>World</p></body></html>')
>>> print etree.tostring(root) # default: method = 'xml'
<html><head/><body><p>Hello<br/>World</p></body></html>
@@ -548,13 +557,23 @@ the text content by passing the ``method`` keyword:
>>> print etree.tostring(root, method='text')
HelloWorld
-For the plain text output, serialising to a Python unicode string
+Note that the default encoding for plain text serialisation is UTF-8:
+
+.. sourcecode:: pycon
+
+ >>> br = root.find('.//br')
+ >>> br.tail = u'W\xf6rld'
+
+ >>> etree.tostring(root, method='text')
+ 'HelloW\xc3\xb6rld'
+
+Here, serialising to a Python unicode string instead of a byte string
might become handy. Just pass the ``unicode`` type as encoding:
.. sourcecode:: pycon
>>> etree.tostring(root, encoding=unicode, method='text')
- u'HelloWorld'
+ u'HelloW\xf6rld'
The ElementTree class
@@ -605,8 +624,8 @@ comments, as well as a DOCTYPE and other DTD content in the document:
<a>eggs</a>
</root>
-Note that this has changed in lxml 1.3.4 to match the behaviour of the
-upcoming lxml 2.0. Before, both would serialise without DTD content, which
+Note that this has changed in lxml 1.3.4 to match the behaviour of
+lxml 2.0. Before, both would serialise without DTD content, which
made lxml loose DTD information in an input-output cycle.
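
The plain text serialisation behaviour documented by this patch can be checked on Python 3 with a minimal sketch like the one below. It is not part of the commit. As an assumption for portability, it uses the standard library's ``xml.etree.ElementTree`` in place of ``lxml.etree``, and it passes the string ``'unicode'`` as the encoding where the Python 2 tutorial passes the ``unicode`` type. The standard library defaults to US-ASCII output, so UTF-8 is requested explicitly here to mirror the lxml default described in the patch.

```python
# Sketch only: the stdlib ElementTree stands in for lxml.etree (an assumption).
import xml.etree.ElementTree as ET

root = ET.XML('<html><head/><body><p>Hello<br/>World</p></body></html>')

# Replace the text that follows <br/> with a non-ASCII string.
br = root.find('.//br')
br.tail = 'W\xf6rld'

# Byte string output: the o-umlaut becomes the two UTF-8 bytes C3 B6.
as_bytes = ET.tostring(root, encoding='utf-8', method='text')

# Text string output: no encoding step, the character stays as it is.
as_text = ET.tostring(root, encoding='unicode', method='text')

print(repr(as_bytes))
print(repr(as_text))
```

The first call returns a byte string and the second a text string, which is the distinction the updated tutorial paragraph draws between the UTF-8 default and ``encoding=unicode``.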