path: root/src/lxml/html/html5parser.py
Commit message (Author, Age, Files, Lines)
* Update docstrings to reflect the new behaviour of the "guess_charset" option in html5parser. (Stefan Behnel, 2017-08-13, 1 file, -9/+20)
|
* Adapt defaults for "guess_charset" option in the remaining parse functions. (Stefan Behnel, 2017-08-13, 1 file, -2/+2)
|
* Adapt defaults for "guess_charset" option when parsing from files and URLs to avoid passing "useChardet" when it's likely to fail. (Stefan Behnel, 2017-08-13, 1 file, -2/+22)
|
* Back out GH issue #232 again as a retry should not be triggered internally by lxml. Users should do it explicitly if they think they need it. (Stefan Behnel, 2017-08-13, 1 file, -36/+6)
|
* Only pass "useChardet" option into html5parser by default if the input is a byte string. Should help with LP#1654544. (Stefan Behnel, 2017-08-12, 1 file, -6/+18)
|
* Make class Py2 compatible. (Stefan Behnel, 2017-08-12, 1 file, -1/+1)
|
* Merge pull request #232 from ondergetekende/1654544 (scoder, 2017-08-12, 1 file, -5/+38)
|\    Fix LP1654544
| * Make sure the html5lib tests are included in CI (Koert van der Veer, 2017-03-20, 1 file, -20/+22)
| |
| * Build a retry mechanism around html5lib's unpredictable useChardet support (Koert van der Veer, 2017-03-16, 1 file, -2/+33)
| |   Closes LP1654544
* | improve type check and comment (Stefan Behnel, 2017-03-18, 1 file, -4/+4)
| |
* | Perform full-document detection on decoded bytes. (Koert van der Veer, 2017-03-16, 1 file, -1/+8)
|/    Closes #1673355
* prevent 'abc' from being considered a drive letter (Stefan Behnel, 2014-01-23, 1 file, -1/+3)
|
* fix URL detection heuristic in html5parser under win32 (Stefan Behnel, 2014-01-20, 1 file, -1/+12)
|
* Fixes so that unit tests run under python 3.1 (Jeff Dairiki, 2012-04-01, 1 file, -8/+18)
|     Note however that while there is a python3 version of html5lib, it appears to be unmaintained, so the worth of all this is questionable.
|     References:
|     http://code.google.com/p/html5lib/issues/detail?id=144
|     http://code.google.com/p/html5lib/source/browse/#hg%2Fpython3
|     --HG-- extra : rebase_source : a4ce702ad841c25d63f4a6a56ea106bcd986bd47
* fix undefined names in html5parser.py (Stefan Behnel, 2012-03-24, 1 file, -2/+2)
|
* Added support for passing kwargs into html5lib parser. I.e lxml.html.html5parser.HTMLParser(namespaceHTMLElements=False) which is needed in order to avoid the <html:div> namespacing. (hankthetank, 2011-11-04, 1 file, -4/+4)
|
* replace html5lib integration with an import of the official lxml support in html5lib itself (Stefan Behnel, 2011-08-11, 1 file, -2/+4)
|
* [svn r4339] r5455@lenny: sbehnel | 2010-01-30 22:37:46 +0100 (scoder, 2010-01-30, 1 file, -13/+20)
|     bug #511252: fix fragment parsing in lxml.html --HG-- branch : trunk
* [svn r4338] r5454@lenny: sbehnel | 2010-01-30 22:10:29 +0100 (scoder, 2010-01-30, 1 file, -5/+10)
|     bug #511252: fix fragment parsing in html5parser.py --HG-- branch : trunk
* [svn r4312] r5409@lenny: sbehnel | 2010-01-21 13:22:23 +0100 (scoder, 2010-01-21, 1 file, -6/+12)
|     do not require XHTMLParser in html5lib --HG-- branch : trunk
* [svn r4005] r4861@delle: sbehnel | 2008-11-14 10:52:38 +0100 (scoder, 2008-11-16, 1 file, -2/+2)
|     fixed missing imports and name errors --HG-- branch : trunk
* [svn r3900] r4637@delle: sbehnel | 2008-07-16 08:55:48 +0200 (scoder, 2008-07-16, 1 file, -0/+164)
      html5lib parser module provided by Armin Ronacher --HG-- branch : trunk