| author | scoder <none@none> | 2008-03-04 19:31:15 +0100 |
|---|---|---|
| committer | scoder <none@none> | 2008-03-04 19:31:15 +0100 |
| commit | 0dccb72a2a6364a216eed1c5e50b95aa82b26c15 (patch) | |
| tree | 9642b26113ff9ed1723f4f5de35589f80d9708b7 /doc/tutorial.txt | |
| parent | 5a81c2a92ae4e0e073084d6fa64b1c3174b54c87 (diff) | |
[svn r3398] r3725@delle: sbehnel | 2008-03-04 17:52:27 +0100
tutorial update
--HG--
branch : trunk
Diffstat (limited to 'doc/tutorial.txt')
| -rw-r--r-- | doc/tutorial.txt | 31 |
1 files changed, 25 insertions, 6 deletions
diff --git a/doc/tutorial.txt b/doc/tutorial.txt
index e739ee55..fb7a818f 100644
--- a/doc/tutorial.txt
+++ b/doc/tutorial.txt
@@ -520,7 +520,15 @@ specific output encoding other than plain ASCII:
     </root>
 
 Note the newline that is appended at the end when pretty printing the
-output.
+output:
+
+.. sourcecode:: pycon
+
+    >>> etree.tostring(root, pretty_print=True)
+    '<root>\n  <a>\n    <b/>\n  </a>\n</root>\n'
+
+    >>> etree.tostring(root)
+    '<root><a><b/></a></root>'
 
 Since lxml 2.0 (and ElementTree 1.3), the serialisation functions can
 do more than XML serialisation. You can serialise to HTML or extract
@@ -528,7 +536,8 @@ the text content by passing the ``method`` keyword:
 
 .. sourcecode:: pycon
 
-    >>> root = etree.XML('<html><head/><body><p>Hello<br/>World</p></body></html>')
+    >>> root = etree.XML(
+    ...     '<html><head/><body><p>Hello<br/>World</p></body></html>')
 
     >>> print etree.tostring(root) # default: method = 'xml'
     <html><head/><body><p>Hello<br/>World</p></body></html>
@@ -548,13 +557,23 @@ the text content by passing the ``method`` keyword:
     >>> print etree.tostring(root, method='text')
     HelloWorld
 
-For the plain text output, serialising to a Python unicode string
+Note that the default encoding for plain text serialisation is UTF-8:
+
+.. sourcecode:: pycon
+
+    >>> br = root.find('.//br')
+    >>> br.tail = u'W\xf6rld'
+
+    >>> etree.tostring(root, method='text')
+    'HelloW\xc3\xb6rld'
+
+Here, serialising to a Python unicode string
 instead of a byte string might become handy. Just pass the ``unicode``
 type as encoding:
 
 .. sourcecode:: pycon
 
     >>> etree.tostring(root, encoding=unicode, method='text')
-    u'HelloWorld'
+    u'HelloW\xf6rld'
 
 
 The ElementTree class
@@ -605,8 +624,8 @@ comments, as well as a DOCTYPE and other DTD content in the document:
     <a>eggs</a>
     </root>
 
-Note that this has changed in lxml 1.3.4 to match the behaviour of the
-upcoming lxml 2.0. Before, both would serialise without DTD content, which
+Note that this has changed in lxml 1.3.4 to match the behaviour of
+lxml 2.0. Before, both would serialise without DTD content, which
 made lxml loose DTD information in an input-output cycle.
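
The text-serialisation behaviour added to the tutorial in this commit can be tried out with a small sketch. This sketch is not part of the commit. It assumes Python 3 and uses the standard library's ``xml.etree.ElementTree`` in place of lxml, since the tutorial notes that ElementTree 1.3 supports the same ``method`` keyword. Two differences from the tutorial's Python 2 lxml session are assumptions of the sketch: UTF-8 is requested explicitly instead of relying on a default, and ``encoding='unicode'`` stands in for passing the ``unicode`` type.

```python
import xml.etree.ElementTree as ET

# Same document as in the tutorial example.
root = ET.XML('<html><head/><body><p>Hello<br/>World</p></body></html>')

# Replace the text that follows <br/> with a non-ASCII string.
br = root.find('.//br')
br.tail = 'W\xf6rld'

# Serialising to bytes: the non-ASCII character is encoded as UTF-8.
# The encoding is passed explicitly here (an assumption of this sketch).
text_bytes = ET.tostring(root, method='text', encoding='utf-8')

# Serialising to a text string: no byte encoding is applied.
text_str = ET.tostring(root, method='text', encoding='unicode')

print(text_bytes)
print(text_str)
```

The byte-string result carries the two-byte UTF-8 sequence for ``ö``, while the text-string result keeps it as a single character, which is the distinction the updated tutorial paragraph draws.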
