Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Update docstrings to reflect the new behaviour of the "guess_charset" option in html5parser. | Stefan Behnel | 2017-08-13 | 1 | -9/+20 |
| | |||||
* | Adapt defaults for "guess_charset" option in the remaining parse functions. | Stefan Behnel | 2017-08-13 | 1 | -2/+2 |
| | |||||
* | Adapt defaults for "guess_charset" option when parsing from files and URLs to avoid passing "useChardet" when it's likely to fail. | Stefan Behnel | 2017-08-13 | 1 | -2/+22 |
| | |||||
* | Back out GH issue #232 again as a retry should not be triggered internally by lxml. Users should do it explicitly if they think they need it. | Stefan Behnel | 2017-08-13 | 1 | -36/+6 |
| | |||||
* | Only pass "useChardet" option into html5parser by default if the input is a byte string. Should help with LP#1654544. | Stefan Behnel | 2017-08-12 | 1 | -6/+18 |
| | |||||
* | Make class Py2 compatible. | Stefan Behnel | 2017-08-12 | 1 | -1/+1 |
| | |||||
* | Merge pull request #232 from ondergetekende/1654544 | scoder | 2017-08-12 | 1 | -5/+38 |
|\ | | | | | Fix LP1654544 | ||||
| * | Make sure the html5lib tests are included in CI | Koert van der Veer | 2017-03-20 | 1 | -20/+22 |
| | | |||||
| * | Build a retry mechanism around html5lib's unpredictable useChardet support | Koert van der Veer | 2017-03-16 | 1 | -2/+33 |
| | | | | | | | | Closes LP1654544 | ||||
* | | improve type check and comment | Stefan Behnel | 2017-03-18 | 1 | -4/+4 |
| | | |||||
* | | Perform full-document detection on decoded bytes. | Koert van der Veer | 2017-03-16 | 1 | -1/+8 |
|/ | | | | Closes #1673355 | ||||
* | prevent 'abc' from being considered a drive letter | Stefan Behnel | 2014-01-23 | 1 | -1/+3 |
| | |||||
* | fix URL detection heuristic in html5parser under win32 | Stefan Behnel | 2014-01-20 | 1 | -1/+12 |
| | |||||
* | Fixes so that unit tests run under python 3.1 | Jeff Dairiki | 2012-04-01 | 1 | -8/+18 |
| | | | | Note however that while there is a python3 version of html5lib, it appears to be unmaintained, so the worth of all this is questionable. References: http://code.google.com/p/html5lib/issues/detail?id=144 http://code.google.com/p/html5lib/source/browse/#hg%2Fpython3 --HG-- extra : rebase_source : a4ce702ad841c25d63f4a6a56ea106bcd986bd47 | ||||
* | fix undefined names in html5parser.py | Stefan Behnel | 2012-03-24 | 1 | -2/+2 |
| | |||||
* | Added support for passing kwargs into html5lib parser. I.e lxml.html.html5parser.HTMLParser(namespaceHTMLElements=False) which is needed in order to avoid the <html:div> namespacing. | hankthetank | 2011-11-04 | 1 | -4/+4 |
| | |||||
* | replace html5lib integration with an import of the official lxml support in html5lib itself | Stefan Behnel | 2011-08-11 | 1 | -2/+4 |
| | |||||
* | [svn r4339] r5455@lenny: sbehnel \| 2010-01-30 22:37:46 +0100 | scoder | 2010-01-30 | 1 | -13/+20 |
| | | | | | | | bug #511252: fix fragment parsing in lxml.html --HG-- branch : trunk | ||||
* | [svn r4338] r5454@lenny: sbehnel \| 2010-01-30 22:10:29 +0100 | scoder | 2010-01-30 | 1 | -5/+10 |
| | | | | | | | bug #511252: fix fragment parsing in html5parser.py --HG-- branch : trunk | ||||
* | [svn r4312] r5409@lenny: sbehnel \| 2010-01-21 13:22:23 +0100 | scoder | 2010-01-21 | 1 | -6/+12 |
| | | | | | | | do not require XHTMLParser in html5lib --HG-- branch : trunk | ||||
* | [svn r4005] r4861@delle: sbehnel \| 2008-11-14 10:52:38 +0100 | scoder | 2008-11-16 | 1 | -2/+2 |
| | | | | | | | fixed missing imports and name errors --HG-- branch : trunk | ||||
* | [svn r3900] r4637@delle: sbehnel \| 2008-07-16 08:55:48 +0200 | scoder | 2008-07-16 | 1 | -0/+164 |
| | | | | | | | html5lib parser module provided by Armin Ronacher --HG-- branch : trunk | ||||