| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
| * | Rewrite Unicode chunk parsing by directly encoding to UTF-8. Previously, we required Py_UNICODE strings, which is inefficient since most strings in Py3 use the PEP-393 memory layout. | Stefan Behnel | 2021-07-18 | 1 | -43/+59 |
| * | Avoid globally overriding the libxml2 external entity resolver and instead set it for each parser run. This improves the interoperability with other users of libxml2 in the system, such as libxmlsec. | Stefan Behnel | 2020-05-23 | 1 | -11/+31 |
| * | Tighten an assertion (string length must never be < 0). | Stefan Behnel | 2019-03-15 | 1 | -1/+1 |
| * | Minor code cleanup. | Stefan Behnel | 2019-03-15 | 1 | -2/+1 |
| * | Replace old Pyrex property syntax with @property decorators for read-only properties, and resolve some Cython warnings. | Stefan Behnel | 2019-02-23 | 1 | -19/+19 |
| * | Remove redundant parentheses | Hugo | 2018-08-25 | 1 | -3/+3 |
| * | Fix crash during GC when _ParserContext and _ParserSchemaValidationContext instances participate in a reference cycle and the _ParserContext gets cleared by the GC before it can disconnect the _ParserSchemaValidationContext. Closes #266. | Stefan Behnel | 2018-06-03 | 1 | -1/+2 |
| * | Add missing huge_tree parameter to XMLParser. The `huge_tree` parameter is missing from the docstring definition of the `XMLParser` class. This causes PyCharm to report a false positive when this parameter is used: https://youtrack.jetbrains.com/issue/PY-21959 This PR adds the parameter to the docstring. | johnthagen | 2018-04-13 | 1 | -1/+1 |
| * | Fix HTMLParser docstring. | stranac | 2018-03-08 | 1 | -1/+1 |
| * | Add huge_tree support to HTMLParser. | stranac | 2018-03-08 | 1 | -1/+5 |
| * | Avoid dependency on char signedness in C comparison. | Stefan Behnel | 2018-02-16 | 1 | -1/+2 |
| * | Use f-strings for all string formatting for which it makes sense (i.e. does not look unreadable). | Stefan Behnel | 2018-01-25 | 1 | -7/+6 |
| * | LP#1737825: Fix a crash during garbage collection when an iterparse run with XMLSchema validation gets interrupted and not finished. | Stefan Behnel | 2018-01-06 | 1 | -0/+8 |
| * | Use xmlMalloc() instead of plain malloc() for allocating an xmlSAXHandler to match the corresponding xmlFree() call in libxml2. | Stefan Behnel | 2018-01-06 | 1 | -1/+1 |
| * | Remove unnecessary pre-declarations. | Stefan Behnel | 2018-01-06 | 1 | -7/+0 |
| * | Clean up exception classes and turn them into extension types. | Stefan Behnel | 2017-08-13 | 1 | -5/+4 |
| * | LP#1703810: Implement explicit support for UTF-32 encodings in fromstring() as libxml2 seems to fail parsing the BOM. | Stefan Behnel | 2017-08-12 | 1 | -5/+13 |
| * | LP#1703810: Implement explicit support for UTF-32 encodings in fromstring() as libxml2 seems to fail parsing the BOM. | Stefan Behnel | 2017-08-12 | 1 | -2/+18 |
| * | whitespace | Stefan Behnel | 2016-12-10 | 1 | -1/+2 |
| * | enable the collect_ids option also for HTML and add a simple test for it | Stefan Behnel | 2016-12-10 | 1 | -3/+3 |
| * | Merge pull request #216 from plq/arskom | scoder | 2016-12-10 | 1 | -4/+6 |
| |\ | | | | | expose collect_ids for HTMLParser as well | ||||
| | * | expose collect_ids for HTMLParser | Burak Arslan | 2016-12-08 | 1 | -4/+6 |
| | | | |||||
| * | | Make XMLSyntaxError have normal SyntaxError metadata | Philipp A | 2016-12-04 | 1 | -5/+17 |
| |/ | |||||
| * | Add option to prevent default doctypes. Reason why this was added: https://github.com/mitmproxy/mitmproxy/issues/845 See post on mailing list: https://mailman-mail5.webfaction.com/pipermail/lxml/2016-August/007738.html | Shadab Zafar | 2016-08-19 | 1 | -1/+4 |
| * | GH#198: fix file path encoding and error handling in resolver code | Stefan Behnel | 2016-07-18 | 1 | -2/+11 |
| * | Fix setting the base url for etree.Resolver.resolve_string. See https://github.com/GNOME/libxml2/blob/master/parserInternals.c#L1549 We seem to get away with only setting the _filename so that relative url's are resolved based on the value. Fixes https://bugs.launchpad.net/lxml/+bug/1568167 | Michael van Tellingen | 2016-07-18 | 1 | -0/+2 |
| * | change documented signature of parsers to use "type hints" in Py3 annotation style (might work better with some external tools) | Stefan Behnel | 2016-03-17 | 1 | -2/+2 |
| * | propagate SAX exceptions immediately in HTML parser (used to continue parsing) | Stefan Behnel | 2015-09-23 | 1 | -1/+12 |
| * | code cleanup | Stefan Behnel | 2015-09-04 | 1 | -4/+3 |
| * | Simplify encoding detection. Use libxml2 properly, that is, give xmlCtxtResetPush() at least 4 bytes of xml data. Then it properly processes byte order mark, and it need not be processed in lxml. | Olli Pottonen | 2015-06-21 | 1 | -25/+13 |
| * | remove useless declarations | Stefan Behnel | 2015-03-02 | 1 | -2/+0 |
| * | clean up code, use faster instantiation for thread dict context | Stefan Behnel | 2015-03-02 | 1 | -7/+2 |
| * | use per-document hash tables for XML IDs and allow disabling them completely with collect_ids=False | Stefan Behnel | 2014-05-28 | 1 | -19/+62 |
| * | minor doc fixes | Stefan Behnel | 2014-05-25 | 1 | -3/+3 |
| * | only apply decoding error change to XML parsing (not HTML for now) | Stefan Behnel | 2014-05-24 | 1 | -1/+1 |
| * | raise a parser error even in recovery mode when encountering undecodable input to avoid having to deal with mixed-encoding trees | Stefan Behnel | 2014-05-24 | 1 | -3/+12 |
| * | minor code cleanup | Stefan Behnel | 2014-05-24 | 1 | -2/+3 |
| * | remove legacy code for now unsupported libxml2/libxslt versions | Stefan Behnel | 2014-03-22 | 1 | -7/+1 |
| * | use XML_PARSE_BIG_LINES parser option if available (libxml2 2.9.0+) | Stefan Behnel | 2014-03-18 | 1 | -1/+2 |
| * | remove some legacy code | Stefan Behnel | 2014-03-10 | 1 | -6/+1 |
| * | improve docstring description of "remove_blank_text" parser option | Stefan Behnel | 2014-02-28 | 1 | -2/+2 |
| * | undo doc freeing change: crashes when doc has already been used elsewhere | Stefan Behnel | 2014-02-25 | 1 | -2/+0 |
| * | safety fix: free parsed document if it's left in the parser context for some reason | Stefan Behnel | 2014-02-25 | 1 | -0/+2 |
| * | fix corner case where name of HTML root node was not put into parser dict | Stefan Behnel | 2014-01-31 | 1 | -2/+2 |
| * | fix up tag dict usage also for the feed parser | Stefan Behnel | 2014-01-31 | 1 | -2/+37 |
| * | implement iterparse() parsing of BOM prefixed files | Stefan Behnel | 2014-01-29 | 1 | -0/+21 |
| * | fix several error/exception handling cases throughout the code base | Stefan Behnel | 2014-01-17 | 1 | -15/+23 |
| * | _ParserContext.__dealloc__() doesn't need to disconnect its XMLSchema validator since the validators __dealloc__() does it anyway | Stefan Behnel | 2014-01-17 | 1 | -3/+0 |
| * | fix GC crashes | Stefan Behnel | 2014-01-17 | 1 | -0/+1 |
| * | provide Py_UNICODE parsing fallback even in Py3.3+ (might be useful for Windows systems) | Stefan Behnel | 2014-01-16 | 1 | -23/+23 |
