path: root/src/lxml/parser.pxi
Commit message [Author, Age, Files, Lines]
* Rewrite Unicode chunk parsing by directly encoding to UTF-8. [Stefan Behnel, 2021-07-18, 1 file, -43/+59]
|     Previously, we required Py_UNICODE strings, which is inefficient since most strings in Py3 use the PEP-393 memory layout.
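
The idea in this commit (encode text chunks to UTF-8 before handing them to a byte-oriented parser) can be sketched in plain Python. This is only an illustration, not the code from parser.pxi: it uses the standard library's expat binding as a stand-in for libxml2, and the helper name `parse_text_chunks` is hypothetical.

```python
import xml.parsers.expat


def parse_text_chunks(chunks):
    """Feed str chunks to an incremental parser as UTF-8 bytes.

    Returns the element names in the order their start tags were seen.
    """
    names = []
    # Tell the parser up front that every chunk will arrive as UTF-8.
    parser = xml.parsers.expat.ParserCreate("utf-8")
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    for chunk in chunks:
        # Encoding a whole str never splits a character, so each chunk
        # is a complete UTF-8 byte sequence on its own.
        parser.Parse(chunk.encode("utf-8"), False)
    parser.Parse(b"", True)  # signal end of input
    return names


names = parse_text_chunks(["<root><a>\u00e9", "</a><b/>", "</root>"])
```

Encoding once per chunk avoids depending on the interpreter's internal string layout, which is the inefficiency the commit message describes.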
* Avoid globally overriding the libxml2 external entity resolver and instead set it for each parser run. [Stefan Behnel, 2020-05-23, 1 file, -11/+31]
|     This improves the interoperability with other users of libxml2 in the system, such as libxmlsec.
* Tighten an assertion (string length must never be < 0). [Stefan Behnel, 2019-03-15, 1 file, -1/+1]
|
* Minor code cleanup. [Stefan Behnel, 2019-03-15, 1 file, -2/+1]
|
* Replace old Pyrex property syntax with @property decorators for read-only properties, and resolve some Cython warnings. [Stefan Behnel, 2019-02-23, 1 file, -19/+19]
|
* Remove redundant parentheses [Hugo, 2018-08-25, 1 file, -3/+3]
|
* Fix crash during GC when _ParserContext and _ParserSchemaValidationContext instances participate in a reference cycle and the _ParserContext gets cleared by the GC before it can disconnect the _ParserSchemaValidationContext. [Stefan Behnel, 2018-06-03, 1 file, -1/+2]
|     Closes #266.
* Add missing huge_tree parameter to XMLParser [johnthagen, 2018-04-13, 1 file, -1/+1]
|     The `huge_tree` parameter is missing from the docstring definition of the `XMLParser` class. This causes PyCharm to report a false positive when this parameter is used: https://youtrack.jetbrains.com/issue/PY-21959
|     This PR adds the parameter to the docstring.
* Fix HTMLParser docstring. [stranac, 2018-03-08, 1 file, -1/+1]
|
* Add huge_tree support to HTMLParser. [stranac, 2018-03-08, 1 file, -1/+5]
|
* Avoid dependency on char signedness in C comparison. [Stefan Behnel, 2018-02-16, 1 file, -1/+2]
|
* Use f-strings for all string formatting for which it makes sense (i.e. does not look unreadable). [Stefan Behnel, 2018-01-25, 1 file, -7/+6]
|
* LP#1737825: Fix a crash during garbage collection when an iterparse run with XMLSchema validation gets interrupted and not finished. [Stefan Behnel, 2018-01-06, 1 file, -0/+8]
|
* Use xmlMalloc() instead of plain malloc() for allocating an xmlSAXHandler to match the corresponding xmlFree() call in libxml2. [Stefan Behnel, 2018-01-06, 1 file, -1/+1]
|
* Remove unnecessary pre-declarations. [Stefan Behnel, 2018-01-06, 1 file, -7/+0]
|
* Clean up exception classes and turn them into extension types. [Stefan Behnel, 2017-08-13, 1 file, -5/+4]
|
* LP#1703810: Implement explicit support for UTF-32 encodings in fromstring() as libxml2 seems to fail parsing the BOM. [Stefan Behnel, 2017-08-12, 1 file, -5/+13]
|
* LP#1703810: Implement explicit support for UTF-32 encodings in fromstring() as libxml2 seems to fail parsing the BOM. [Stefan Behnel, 2017-08-12, 1 file, -2/+18]
|
* whitespace [Stefan Behnel, 2016-12-10, 1 file, -1/+2]
|
* enable the collect_ids option also for HTML and add a simple test for it [Stefan Behnel, 2016-12-10, 1 file, -3/+3]
|
* Merge pull request #216 from plq/arskom [scoder, 2016-12-10, 1 file, -4/+6]
|\    expose collect_ids for HTMLParser as well
| * expose collect_ids for HTMLParser [Burak Arslan, 2016-12-08, 1 file, -4/+6]
| |
* | Make XMLSyntaxError have normal SyntaxError metadata [Philipp A, 2016-12-04, 1 file, -5/+17]
|/
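
For readers unfamiliar with what "normal SyntaxError metadata" means, the sketch below shows how a SyntaxError subclass can populate the standard `msg`, `filename`, `lineno` and `offset` attributes in plain Python. The class `MarkupSyntaxError` and its constructor signature are hypothetical and are not lxml's actual exception class; the error text and file name are made up for illustration.

```python
class MarkupSyntaxError(SyntaxError):
    """Hypothetical parse error that fills in the standard SyntaxError fields."""

    def __init__(self, message, filename, lineno, column):
        # SyntaxError accepts (msg, (filename, lineno, offset, text)) and
        # copies the tuple items onto the attributes of the same names.
        # No source line text is available here, so 'text' is None.
        super().__init__(message, (filename, lineno, column, None))


err = MarkupSyntaxError("Opening and ending tag mismatch", "doc.xml", 3, 7)

# Tools that already understand SyntaxError can read the location
# without knowing anything about the subclass.
location = (err.filename, err.lineno, err.offset)
```

Filling the standard fields means tracebacks and editors that inspect `lineno` and `offset` can treat a markup error like any other syntax error.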
* Add option to prevent default doctypes [Shadab Zafar, 2016-08-19, 1 file, -1/+4]
|     Reason why this was added: https://github.com/mitmproxy/mitmproxy/issues/845
|     See post on mailing list: https://mailman-mail5.webfaction.com/pipermail/lxml/2016-August/007738.html
* GH#198: fix file path encoding and error handling in resolver code [Stefan Behnel, 2016-07-18, 1 file, -2/+11]
|
* Fix setting the base url for etree.Resolver.resolve_string [Michael van Tellingen, 2016-07-18, 1 file, -0/+2]
|     See https://github.com/GNOME/libxml2/blob/master/parserInternals.c#L1549
|     We seem to get away with only setting the _filename so that relative URLs are resolved based on the value.
|     Fixes https://bugs.launchpad.net/lxml/+bug/1568167
* change documented signature of parsers to use "type hints" in Py3 annotation style (might work better with some external tools) [Stefan Behnel, 2016-03-17, 1 file, -2/+2]
|
* propagate SAX exceptions immediately in HTML parser (used to continue parsing) [Stefan Behnel, 2015-09-23, 1 file, -1/+12]
|
* code cleanup [Stefan Behnel, 2015-09-04, 1 file, -4/+3]
|
* Simplify encoding detection. [Olli Pottonen, 2015-06-21, 1 file, -25/+13]
|     Use libxml2 properly, that is, give xmlCtxtResetPush() at least 4 bytes of XML data. Then it properly processes the byte order mark, and it need not be processed in lxml.
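
Why at least 4 bytes matter can be shown with a small BOM sniffer. This is a standalone sketch using Python's `codecs` constants, not the detection logic of libxml2 or lxml; the function name `sniff_bom` is an assumption made for the example.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so order decides which one matches.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(prefix):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if prefix.startswith(bom):
            return name, len(bom)
    return None, 0


data = codecs.BOM_UTF32_LE + "<root/>".encode("utf-32-le")

with_four_bytes = sniff_bom(data[:4])  # enough to see the whole UTF-32 BOM
with_two_bytes = sniff_bom(data[:2])   # looks like UTF-16 LE, which is wrong
```

With only two bytes available, the UTF-32 LE input is indistinguishable from UTF-16 LE, which is why a detector needs to see a 4-byte prefix before deciding.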
* remove useless declarations [Stefan Behnel, 2015-03-02, 1 file, -2/+0]
|
* clean up code, use faster instantiation for thread dict context [Stefan Behnel, 2015-03-02, 1 file, -7/+2]
|
* use per-document hash tables for XML IDs and allow disabling them completely with collect_ids=False [Stefan Behnel, 2014-05-28, 1 file, -19/+62]
|
* minor doc fixes [Stefan Behnel, 2014-05-25, 1 file, -3/+3]
|
* only apply decoding error change to XML parsing (not HTML for now) [Stefan Behnel, 2014-05-24, 1 file, -1/+1]
|
* raise a parser error even in recovery mode when encountering undecodable input to avoid having to deal with mixed-encoding trees [Stefan Behnel, 2014-05-24, 1 file, -3/+12]
|
* minor code cleanup [Stefan Behnel, 2014-05-24, 1 file, -2/+3]
|
* remove legacy code for now unsupported libxml2/libxslt versions [Stefan Behnel, 2014-03-22, 1 file, -7/+1]
|     --HG--
|     extra : amend_source : 5f766bb41c74b8ea7bba7f71905fb18cb90a19f2
* use XML_PARSE_BIG_LINES parser option if available (libxml2 2.9.0+) [Stefan Behnel, 2014-03-18, 1 file, -1/+2]
|
* remove some legacy code [Stefan Behnel, 2014-03-10, 1 file, -6/+1]
|
* improve docstring description of "remove_blank_text" parser option [Stefan Behnel, 2014-02-28, 1 file, -2/+2]
|
* undo doc freeing change: crashes when doc has already been used elsewhere [Stefan Behnel, 2014-02-25, 1 file, -2/+0]
|
* safety fix: free parsed document if it's left in the parser context for some reason [Stefan Behnel, 2014-02-25, 1 file, -0/+2]
|
* fix corner case where name of HTML root node was not put into parser dict [Stefan Behnel, 2014-01-31, 1 file, -2/+2]
|
* fix up tag dict usage also for the feed parser [Stefan Behnel, 2014-01-31, 1 file, -2/+37]
|
* implement iterparse() parsing of BOM prefixed files [Stefan Behnel, 2014-01-29, 1 file, -0/+21]
|
* fix several error/exception handling cases throughout the code base [Stefan Behnel, 2014-01-17, 1 file, -15/+23]
|
* _ParserContext.__dealloc__() doesn't need to disconnect its XMLSchema validator since the validator's __dealloc__() does it anyway [Stefan Behnel, 2014-01-17, 1 file, -3/+0]
|
* fix GC crashes [Stefan Behnel, 2014-01-17, 1 file, -0/+1]
|
* provide Py_UNICODE parsing fallback even in Py3.3+ (might be useful for Windows systems) [Stefan Behnel, 2014-01-16, 1 file, -23/+23]
|     --HG--
|     extra : amend_source : 2278c552b8be65e7eaf1075ec72f359cab4bd2ac