author | crayzeewulf <crayzeewulf@gmail.com> | 2013-03-21 14:15:52 -0700 |
---|---|---|
committer | crayzeewulf <crayzeewulf@gmail.com> | 2013-03-21 14:15:52 -0700 |
commit | caa2748cb1c774d18a1664c3434cbc7c862bb46f | |
tree | 1239d0bd62e8eedbf8812cb3b380b33736e79367 | |
parent | ec692af97eea48421c12525bdafd2f20f922bd86 | |

Corrected the sample output of clean_html()

The output of clean_html() does not include html and body tags.
The example output in the documentation was corrected.

-rw-r--r-- | doc/lxmlhtml.txt | 29 |
1 files changed, 12 insertions, 17 deletions

diff --git a/doc/lxmlhtml.txt b/doc/lxmlhtml.txt
index 776a4ae3..940e65bb 100644
--- a/doc/lxmlhtml.txt
+++ b/doc/lxmlhtml.txt
@@ -515,24 +515,19 @@ To remove the all suspicious content from this unparsed document, use the
 .. sourcecode:: pycon
 
     >>> from lxml.html.clean import clean_html
-    >>> print clean_html(html)
-    <html>
-      <body>
-        <div>
-          <style>/* deleted */</style>
-          <a href="">a link</a>
-          <a href="#">another link</a>
-          <p>a paragraph</p>
-          <div>secret EVIL!</div>
-          of EVIL!
-          Password:
-          annoying EVIL!
-          <a href="evil-site">spam spam SPAM!</a>
-          <img src="evil!">
-        </div>
-      </body>
-    </html>
+    <div><style>/* deleted */</style><body>
+
+      <a href="">a link</a>
+      <a href="#">another link</a>
+      <p>a paragraph</p>
+      <div>secret EVIL!</div>
+      of EVIL!
+
+
+      Password:
+      annoying EVIL!<a href="evil-site">spam spam SPAM!</a>
+      <img src="evil!"></body></div>
 
 The ``Cleaner`` class supports several keyword arguments to control
 exactly which content is removed:
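
The corrected sample output can be checked locally with a short script. The following is a minimal sketch, not part of this commit: the `SAMPLE` document and the `clean_sample` helper are made up for illustration, and the script assumes `lxml.html.clean` is importable (newer lxml releases ship the cleaner as the separate `lxml_html_clean` package, so the import may fail without it). It is written for Python 3, unlike the `print` statement in the documentation being patched.

```python
# Illustrative sketch: SAMPLE and clean_sample() are hypothetical names,
# not taken from doc/lxmlhtml.txt.
try:
    # Assumption: the cleaner is installed. Newer lxml releases need the
    # separate lxml_html_clean package for this import to succeed.
    from lxml.html.clean import clean_html
except ImportError:
    clean_html = None

SAMPLE = """<html>
 <head><script>alert('evil')</script></head>
 <body onload="evil()">
  <a href="javascript:evil()">a link</a>
  <p onclick="evil()">a paragraph</p>
 </body>
</html>"""


def clean_sample(document=SAMPLE):
    """Return the cleaned markup, or None when the cleaner is unavailable."""
    if clean_html is None:
        return None
    # For string input, clean_html() returns a string.
    return clean_html(document)


if __name__ == "__main__":
    cleaned = clean_sample()
    if cleaned is None:
        print("lxml.html.clean is not installed")
    else:
        # Inspect the result to see which wrapper tags survive cleaning.
        print(cleaned)
```

Printing the result rather than asserting on its exact shape keeps the sketch valid across lxml versions, whose whitespace and wrapper elements may differ from the sample in the documentation.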