path: root/entities.c
Commit message | Author | Age | Files | Lines
* entities: Stop counting entities | Nick Wellnhofer | 2022-12-21 | 1 | -6/+5
    This was only used in the old version of xmlParserEntityCheck.
* entities: Rework entity amplification checks | Nick Wellnhofer | 2022-12-21 | 1 | -5/+5
    This commit implements robust detection of entity amplification attacks, better known as the "billion laughs" attack.

    We now limit the size of the document after substitution of entities to 10 times the size before expansion. This guarantees linear behavior by definition. There already was a similar check before, but the accounting of "sizeentities" (size of external entities) and "sizeentcopy" (size of all copies created by entity references) wasn't accurate. We also need saturation arithmetic since we're historically limited to "unsigned long" which is 32-bit on many platforms.

    A maximum of 10 MB of substitutions is always allowed. This should make use cases like DITA work which have caused problems in the past.

    The old checks based on the number of entities were removed. This is accounted for by adding a fixed cost to each entity reference.

    Entity amplification checks are now enabled even if XML_PARSE_HUGE is set. This option is mainly used to allow larger text nodes. Most users were unaware that it also disabled entity expansion checks.

    Some of the limits might be adjusted later. If this change turns out to affect legitimate use cases, we can add a separate parser option to disable the checks.

    Fixes #294.
    Fixes #345.
* entities: Add "flags" member to struct xmlEntity | Nick Wellnhofer | 2022-12-19 | 1 | -5/+5
    This will hold various flags and eventually replace the "checked" member.
* buf: Deprecate static/immutable buffers | Nick Wellnhofer | 2022-11-20 | 1 | -1/+0
* [CVE-2022-40304] Fix dict corruption caused by entity reference cycles | Nick Wellnhofer | 2022-10-14 | 1 | -39/+16
    When an entity reference cycle is detected, the entity content is cleared by setting its first byte to zero. But the entity content might be allocated from a dict. In this case, the dict entry becomes corrupted leading to all kinds of logic errors, including memory errors like double-frees.

    Stop storing entity content, orig, ExternalID and SystemID in a dict. These values are unlikely to occur multiple times in a document, so they shouldn't have been stored in a dict in the first place.

    Thanks to Ned Williamson and Nathan Wachholz working with Google Project Zero for the report!
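    To illustrate why clearing a dict-allocated string is dangerous, here is a toy string-interning table. It is a sketch under stated assumptions, not libxml2's xmlDict: toy_dict, toy_dict_lookup, toy_private_copy and toy_dict_free are hypothetical names. Every lookup of the same text returns the same pointer, so only a privately owned copy can be cleared safely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_DICT_MAX 8

/* Toy interning table (hypothetical, for illustration only). */
typedef struct {
    char *entries[TOY_DICT_MAX];
    int count;
} toy_dict;

/* Return a heap copy that the caller owns exclusively, or NULL. */
static char *toy_private_copy(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Return the shared, interned copy of s, adding it on first use.
 * Returns NULL if the table is full or allocation fails. */
static const char *toy_dict_lookup(toy_dict *d, const char *s) {
    assert(d != NULL && s != NULL);
    for (int i = 0; i < d->count; i++)
        if (strcmp(d->entries[i], s) == 0)
            return d->entries[i];   /* same pointer for every user */
    if (d->count == TOY_DICT_MAX)
        return NULL;
    char *copy = toy_private_copy(s);
    if (copy != NULL)
        d->entries[d->count++] = copy;
    return copy;
}

/* Release every interned string. */
static void toy_dict_free(toy_dict *d) {
    assert(d != NULL);
    for (int i = 0; i < d->count; i++)
        free(d->entries[i]);
    d->count = 0;
}
```

    Writing a zero byte through a pointer obtained from toy_dict_lookup() would silently change the string for every other holder of that pointer and break later lookups. Clearing the result of toy_private_copy() affects nobody else, which is the direction the fix takes for entity content.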
* Don't use sizeof(xmlChar) or sizeof(char) | Nick Wellnhofer | 2022-09-01 | 1 | -2/+2
* Consolidate private header files | Nick Wellnhofer | 2022-08-26 | 1 | -1/+2
    Private functions were previously declared
    - in header files in the root directory
    - in public headers guarded with IN_LIBXML
    - in libxml.h
    - redundantly in source files that used them.

    Consolidate all private header files in include/private.
* Don't check for standard C89 headers | Nick Wellnhofer | 2022-03-02 | 1 | -2/+1
    Don't check for
    - ctype.h
    - errno.h
    - float.h
    - limits.h
    - math.h
    - signal.h
    - stdarg.h
    - stdlib.h
    - string.h
    - time.h

    Stop including non-standard headers
    - malloc.h
    - strings.h
* Fix documentation in entities.c | Nick Wellnhofer | 2022-02-20 | 1 | -2/+2
* Remove elfgcchack.h | Nick Wellnhofer | 2022-02-20 | 1 | -2/+0
    The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.
* Only warn on invalid redeclarations of predefined entities | Nick Wellnhofer | 2022-02-20 | 1 | -2/+19
    Downgrade the error message to a warning since the error was ignored, anyway. Also print the name of redeclared entity.

    For a proper fix that also shows filename and line number of the invalid redeclaration, we'd have to
    - pass the parser context to the entity functions somehow, or
    - make these functions return distinct error codes.

    Partial fix for #308.
* Validate UTF8 in xmlEncodeEntities | Joel Hockey | 2021-04-22 | 1 | -1/+15
    Code is currently assuming UTF-8 without validating. Truncated UTF-8 input can cause out-of-bounds array access.

    Adds further checks to partial fix in 50f06b3e.

    Fixes #178
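    The class of bug described here comes from trusting the lead byte of a UTF-8 sequence and reading the continuation bytes without checking that they exist. A minimal sketch of a bounds-aware check follows; utf8_seq_len is a hypothetical helper, not the code added by this commit, and it deliberately skips overlong-encoding and surrogate checks.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length in bytes (1 to 4) of the UTF-8 sequence starting at s,
 * or 0 if the lead byte is invalid, a continuation byte is malformed, or
 * the sequence is truncated. len is the number of readable bytes at s.
 * Hypothetical helper; does not reject overlong forms or surrogates. */
static size_t utf8_seq_len(const unsigned char *s, size_t len) {
    size_t need;

    assert(s != NULL);
    if (len == 0)
        return 0;
    if (s[0] < 0x80)
        need = 1;
    else if ((s[0] & 0xE0) == 0xC0)
        need = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        need = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        need = 4;
    else
        return 0;               /* stray continuation or invalid lead byte */
    if (need > len)
        return 0;               /* truncated: never read past the buffer */
    for (size_t i = 1; i < need; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
    return need;
}
```

    An encoder loop would call this before decoding each character and fall back to an error path on a 0 return instead of indexing s[1] to s[3] unconditionally.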
* Fix null deref introduced with previous commit | Nick Wellnhofer | 2021-02-09 | 1 | -1/+2
    Found by OSS-Fuzz.
* Check for invalid redeclarations of predefined entities | Nick Wellnhofer | 2021-02-08 | 1 | -1/+38
    Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions.

    Note that some test cases declared

        <!ENTITY lt "<">

    But the XML spec clearly states that this is illegal:

    > If the entities lt or amp are declared, they MUST be declared as
    > internal entities whose replacement text is a character reference to
    > the respective character (less-than sign or ampersand) being escaped;
    > the double escaping is REQUIRED for these entities so that references
    > to them produce a well-formed result.

    Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference.

    As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.
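    A check along the lines of the quoted spec text might look like the sketch below. It is an illustration, not the code from this commit: sketch_predef_redecl_ok is a hypothetical name, and it assumes that "content" is the entity value after the declaration's literal has been processed, so a correct declaration of lt arrives as "&#60;" (or the hexadecimal form). The sketch allows gt, apos and quot to be declared as the literal character, which the quoted rule does not forbid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if redeclaring entity `name` with value `content` is acceptable,
 * 0 otherwise. Names that are not predefined are always accepted.
 * Hypothetical helper for illustration. */
static int sketch_predef_redecl_ok(const char *name, const char *content) {
    static const struct { const char *name; char ch; } predef[] = {
        {"lt", '<'}, {"gt", '>'}, {"amp", '&'}, {"apos", '\''}, {"quot", '"'}
    };

    assert(name != NULL && content != NULL);
    for (size_t i = 0; i < sizeof predef / sizeof predef[0]; i++) {
        if (strcmp(name, predef[i].name) != 0)
            continue;
        char c = predef[i].ch;
        /* The literal character is only acceptable for gt, apos and quot. */
        if (content[0] == c && content[1] == '\0')
            return c != '<' && c != '&';
        /* Otherwise require a character reference: &#NN; or &#xHH; */
        if (content[0] != '&' || content[1] != '#')
            return 0;
        const char *p = content + 2;
        long base = 10, value = 0;
        int ndigits = 0;
        if (*p == 'x') {
            base = 16;
            p++;
        }
        for (; *p != ';' && *p != '\0'; p++, ndigits++) {
            long d;
            if (*p >= '0' && *p <= '9')
                d = *p - '0';
            else if (base == 16 && *p >= 'a' && *p <= 'f')
                d = *p - 'a' + 10;
            else if (base == 16 && *p >= 'A' && *p <= 'F')
                d = *p - 'A' + 10;
            else
                return 0;
            value = value * base + d;
            if (value > 0x10FFFF)
                return 0;       /* beyond the Unicode range */
        }
        return ndigits > 0 && p[0] == ';' && p[1] == '\0' && value == (long)c;
    }
    return 1;
}
```

    With this sketch, the declaration from the test cases mentioned above (lt with the value "<") is rejected, while the value "&#60;" is accepted.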
* Fix typos | Nick Wellnhofer | 2020-03-08 | 1 | -1/+1
    Resolves #133.
* Large batch of typo fixes | Jared Yanovich | 2019-09-30 | 1 | -3/+3
    Closes #109.
* Fix hash callback signatures | Nick Wellnhofer | 2017-11-09 | 1 | -10/+11
    Make sure that all parameters and return values of hash callback functions exactly match the callback function type. This is required to pass clang's Control Flow Integrity checks and to allow compilation to asm.js with Emscripten.

    Fixes bug 784861.
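    The pattern this commit describes is to give every callback the exact signature of the callback type and do the casts inside the function, rather than casting a differently typed function pointer at the call site. The sketch below uses a toy scanner type modelled loosely on a hash scan callback; toy_scanner, toy_scan, toy_entity and the two sum functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy callback type (hypothetical): payload, user data, entry name. */
typedef void (*toy_scanner)(void *payload, void *data, const char *name);

typedef struct {
    const char *name;
    int value;
} toy_entity;

/* Call f once per item, passing the item as the untyped payload. */
static void toy_scan(toy_entity *items, size_t n, toy_scanner f, void *data) {
    assert(f != NULL);
    for (size_t i = 0; i < n; i++)
        f(&items[i], data, items[i].name);
}

/* Typed worker: convenient to write, but its signature differs from
 * toy_scanner, so calling it through a cast function pointer would be
 * undefined behavior and is rejected by indirect-call checks. */
static void toy_sum_entity(const toy_entity *ent, int *total) {
    *total += ent->value;
}

/* Wrapper whose parameters and return type match toy_scanner exactly;
 * the conversions happen on the data pointers, not the function pointer. */
static void toy_sum_entity_cb(void *payload, void *data, const char *name) {
    (void)name; /* unused in this sketch */
    toy_sum_entity((const toy_entity *)payload, (int *)data);
}
```

    Passing toy_sum_entity_cb to toy_scan() needs no function-pointer cast at all, which is what lets an indirect call check verify the target's type.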
* Porting libxml2 on zOS encoding of code | Stéphane Michaut | 2017-08-28 | 1 | -0/+5
    First set of patches for zOS
    - entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c: ask conversion of code to ISO Latin 1 to avoid having the compiler assume EBCDIC codepoint for characters.
    - xmlmodule.c: make sure we have support for modules
    - xmlIO.c: zOS path names are special; avoid some of the expectations from Unix/Windows
* Fix some format string warnings with possible format string vulnerability | David Kilzer | 2016-05-23 | 1 | -1/+1
    For https://bugzilla.gnome.org/show_bug.cgi?id=761029

    Decorate every method in libxml2 with the appropriate LIBXML_ATTR_FORMAT(fmt,args) macro and add some cleanups following the reports.
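    As a rough illustration of what such a decoration does, the sketch below defines a stand-in macro, SKETCH_ATTR_FORMAT, on top of the GCC/clang printf format attribute. The macro name, its definition here and the sketch_report function are assumptions for this example; the commit itself only names LIBXML_ATTR_FORMAT(fmt,args).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for a format-checking macro (hypothetical definition). */
#if defined(__GNUC__)
#define SKETCH_ATTR_FORMAT(fmt, args) \
    __attribute__((__format__(__printf__, fmt, args)))
#else
#define SKETCH_ATTR_FORMAT(fmt, args)
#endif

/* Parameter 3 is the format string, variadic arguments start at 4. */
static int sketch_report(char *buf, size_t size, const char *fmt, ...)
    SKETCH_ATTR_FORMAT(3, 4);

/* Format a message into buf; returns what vsnprintf returns. */
static int sketch_report(char *buf, size_t size, const char *fmt, ...) {
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

    With the attribute in place, the compiler can warn when untrusted text is passed as the format itself. Writing sketch_report(buf, sizeof buf, "%s", text) keeps any "%" characters in text from being interpreted as conversions.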
* Fix and add const qualifiers | Kurt Roeckx | 2014-10-13 | 1 | -2/+2
    For https://bugzilla.gnome.org/show_bug.cgi?id=689483

    It seems there are functions that do use the const qualifier for some of the arguments, but it seems that there are a lot of functions that don't use it and probably should.

    So I created a patch against 2.9.0 that makes as much as possible const in tree.h, and changed other files as needed.

    There were a lot of cases like "const xmlNodePtr node". This doesn't actually do anything, there the *pointer* is constant not the object it points to. So I changed those to "const xmlNode *node".

    I also removed some consts, mostly in the Copy functions, because those functions can actually modify the doc or node they copy from
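    The difference between the two spellings can be shown with a toy node type. toy_node, toy_node_ptr and the two functions below are hypothetical stand-ins for the xmlNode and xmlNodePtr cases the message mentions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_node {
    int type;
} toy_node;
typedef toy_node *toy_node_ptr;

/* "const toy_node_ptr" makes the pointer itself const, not the node:
 * the function cannot reassign `node`, but it can still modify *node. */
static void toy_touch(const toy_node_ptr node) {
    assert(node != NULL);
    node->type = 42;            /* compiles: the pointee is not const */
}

/* "const toy_node *" makes the pointee const: reads only. */
static int toy_get_type(const toy_node *node) {
    assert(node != NULL);
    /* node->type = 42;  would be rejected by the compiler here */
    return node->type;
}
```

    Only the second form documents, and lets the compiler enforce, that the function leaves the node unchanged.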
* Switched comment in file to UTF-8 encoding | Daniel Veillard | 2013-03-30 | 1 | -1/+1
* Various cleanups to avoid compiler warnings | Daniel Veillard | 2012-09-11 | 1 | -1/+4
* Big space and tab cleanup | Daniel Veillard | 2012-09-11 | 1 | -12/+12
    Remove all space before tabs and space and tabs at end of lines.
* Improve HTML escaping of attribute on output | Daniel Veillard | 2012-09-05 | 1 | -8/+78
    Handle special cases of &{...} constructs as hinted in the spec http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1 and special values as comment <!-- ... --> used for server side includes.
    This is limited to attribute values in HTML content.
* Fix an error in previous commit | Aron Xu | 2012-07-20 | 1 | -1/+1
* Fix entities local buffers size problems | Daniel Veillard | 2012-07-18 | 1 | -13/+23
* Fix a bunch of scan 'dead increments' and cleanup | Daniel Veillard | 2009-09-05 | 1 | -2/+2
    * HTMLparser.c c14n.c debugXML.c entities.c nanohttp.c parser.c testC14N.c uri.c xmlcatalog.c xmllint.c xmlregexp.c xpath.c: fix unused variables, or unneeded increments as well as a couple of space issues
    * runtest.c: check for NULL before calling unlink()
* applied patch from Aswin to fix tree skipping fixed a comment and added a | Daniel Veillard | 2008-08-25 | 1 | -33/+89
    * xmlreader.c: applied patch from Aswin to fix tree skipping
    * include/libxml/entities.h entities.c: fixed a comment and added a new xmlNewEntity() entry point
    * runtest.c: be less verbose
    * tree.c: space and tabs cleanups
    daniel

    svn path=/trunk/; revision=3774
* rework the patch to avoid some ABI issue with people allocating entities | Daniel Veillard | 2008-08-25 | 1 | -5/+5
    * include/libxml/entities.h entities.c SAX2.c parser.c: rework the patch to avoid some ABI issue with people allocating entities structure directly
    Daniel

    svn path=/trunk/; revision=3773
* fix for CVE-2008-3281 Daniel | Daniel Veillard | 2008-08-20 | 1 | -5/+5
    * include/libxml/parser.h include/libxml/entities.h entities.c parserInternals.c parser.c: fix for CVE-2008-3281
    Daniel

    svn path=/trunk/; revision=3772
* trying to fix entities behaviour when using SAX, had to extend entities | Daniel Veillard | 2006-10-10 | 1 | -5/+6
    * include/libxml/entities.h entities.c SAX2.c parser.c: trying to fix entities behaviour when using SAX, had to extend entities content and hack on the entities processing code, but that should fix the long standing bug #159219
    Daniel
* more cleanups based on coverity reports. Daniel | Daniel Veillard | 2006-03-09 | 1 | -5/+2
    * SAX2.c catalog.c encoding.c entities.c example/gjobread.c python/libxml.c: more cleanups based on coverity reports.
    Daniel
* revamped the elfgcchack.h format to cope with gcc4 change of aliasing | Daniel Veillard | 2005-04-01 | 1 | -0/+2
    * doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h format to cope with gcc4 change of aliasing allowed scopes, had to add extra information to doc/libxml2-api.xml to separate the header from the c module source.
    * *.c: updated all c library files to add a #define bottom_xxx and reimport elfgcchack.h thereafter, and a bit of cleanups.
    * doc//* testapi.c: regenerated when rebuilding the API
    Daniel
* added xmlHashCreateDict where the hash reuses the dictionary for internal | Daniel Veillard | 2005-01-23 | 1 | -2/+3
    * hash.c include/libxml/hash.h: added xmlHashCreateDict where the hash reuses the dictionary for internal strings
    * entities.c valid.c parser.c: reuse that new API, leads to a decent speedup when parsing for example DocBook documents.
    Daniel
* small speedup in skipping blanks characters interning the entities strings | Daniel Veillard | 2005-01-23 | 1 | -22/+69
    * parser.c: small speedup in skipping blanks characters
    * entities.c: interning the entities strings
    Daniel
* autogenerate a minimal NULL value sequence for unknown pointer types This | Daniel Veillard | 2004-11-05 | 1 | -0/+1
    * gentest.py testapi.c: autogenerate a minimal NULL value sequence for unknown pointer types
    * HTMLparser.c SAX2.c chvalid.c encoding.c entities.c parser.c parserInternals.c relaxng.c valid.c xmlIO.c xmlreader.c xmlsave.c xmlschemas.c xmlschemastypes.c xmlstring.c xpath.c xpointer.c: This uncovered an impressive amount of entry points not checking for NULL pointers when they ought to, closing all the open gaps.
    Daniel
* fixed a compilation problem on a recent change Daniel | Daniel Veillard | 2004-11-05 | 1 | -2/+0
    * entities.c: fixed a compilation problem on a recent change
    Daniel
* fixed the way the generator works, extended the testing, especially with | Daniel Veillard | 2004-11-05 | 1 | -21/+45
    * gentest.py testapi.c: fixed the way the generator works, extended the testing, especially with more real trees and nodes.
    * HTMLtree.c tree.c valid.c xinclude.c xmlIO.c xmlsave.c: a bunch of real problems found and fixed.
    * entities.c: fix error reporting to go through the new handlers
    Daniel
* avoid returning default namespace when searching from an attribute reverse | Daniel Veillard | 2004-05-17 | 1 | -2/+0
    * tree.c: avoid returning default namespace when searching from an attribute
    * entities.c xmlwriter.c: reverse xmlEncodeSpecialChars() behaviour back to escaping " since the normal serialization routines do not use it anymore, should close bug #134477. Tried to make the writer avoid it too but it didn't work.
    Daniel
* fixed an XML entities content serialization potentially triggered by | Daniel Veillard | 2003-12-09 | 1 | -2/+44
    * entities.c: fixed an XML entities content serialization potentially triggered by XInclude, see #126817
    Daniel
* fixed #127877, never output &quot; in element content this changes the | Daniel Veillard | 2003-11-25 | 1 | -0/+2
    * entities.c: fixed #127877, never output &quot; in element content
    * result/isolat3 result/slashdot16.xml result/noent/isolat3 result/noent/slashdot16.xml result/valid/REC-xml-19980210.xml result/valid/index.xml result/valid/xlink.xml: this changes the output of a few tests
    Daniel
* fixed problem reported on the mailing list by Melvyn Sopacua - wrong | William M. Brack | 2003-10-20 | 1 | -1/+13
    * entities.c, valid.c: fixed problem reported on the mailing list by Melvyn Sopacua - wrong argument order on functions called through xmlHashScan.
* Changed all (?) occurrences where validation macros (IS_xxx) had | William M. Brack | 2003-10-18 | 1 | -10/+2
    * include/libxml/parserInternals.h HTMLparser.c HTMLtree.c SAX2.c catalog.c debugXML.c entities.c parser.c relaxng.c testSAX.c tree.c valid.c xmlschemas.c xmlschemastypes.c xpath.c: Changed all (?) occurrences where validation macros (IS_xxx) had single-byte arguments to use IS_xxx_CH instead (e.g. IS_BLANK changed to IS_BLANK_CH). This gets rid of many warning messages on certain platforms, and also highlights places in the library which may need to be enhanced for proper UTF8 handling.
* Fix error on output of high codepoint charref like &#x10FFFF; , reported | Daniel Veillard | 2003-10-01 | 1 | -2/+2
    * entities.c: Fix error on output of high codepoint charref like &#x10FFFF; , reported by Eric Hanchrow
    Daniel
* made the predefined entities static predefined structures to avoid the | Daniel Veillard | 2003-09-30 | 1 | -67/+59
    * entities.c legacy.c parser.c: made the predefined entities static predefined structures to avoid the work, memory and hazards associated to initialization/cleanup.
    Daniel
* Adding a configure option to remove tree manipulation code which is not | Daniel Veillard | 2003-09-29 | 1 | -0/+2
    * configure.in entities.c tree.c valid.c xmllint.c include/libxml/tree.h include/libxml/xmlversion.h.in: Adding a configure option to remove tree manipulation code which is not strictly needed by the parser.
    Daniel
* Okay this is scary but it is just adding a configure option to disable | Daniel Veillard | 2003-09-29 | 1 | -0/+2
    * HTMLtree.c SAX2.c c14n.c catalog.c configure.in debugXML.c encoding.c entities.c nanoftp.c nanohttp.c parser.c relaxng.c testAutomata.c testC14N.c testHTML.c testRegexp.c testRelax.c testSchemas.c testXPath.c threads.c tree.c valid.c xmlIO.c xmlcatalog.c xmllint.c xmlmemory.c xmlreader.c xmlschemas.c example/gjobread.c include/libxml/HTMLtree.h include/libxml/c14n.h include/libxml/catalog.h include/libxml/debugXML.h include/libxml/entities.h include/libxml/nanohttp.h include/libxml/relaxng.h include/libxml/tree.h include/libxml/valid.h include/libxml/xmlIO.h include/libxml/xmlschemas.h include/libxml/xmlversion.h.in include/libxml/xpathInternals.h python/libxml.c: Okay this is scary but it is just adding a configure option to disable output, this touches most of the files.
    Daniel
* cleanup, creating a new legacy.c module, made sure make tests ran in | Daniel Veillard | 2003-09-28 | 1 | -176/+0
    * Makefile.am: cleanup, creating a new legacy.c module, made sure make tests ran in reduced conditions
    * SAX.c SAX2.c configure.in entities.c globals.c parser.c parserInternals.c tree.c valid.c xlink.c xmlIO.c xmlcatalog.c xmlmemory.c xpath.c xmlmemory.c include/libxml/xmlversion.h.in: increased the modularization, allow to configure out validation code and legacy code, added a configuration option --with-minimum compiling only the mandatory code which then shrink to 200KB.
    Daniel
* fix a bug raised by the Mips compiler. move the SAXv1 block definitions to | Daniel Veillard | 2003-09-28 | 1 | -1/+1
    * parser.c: fix a bug raised by the Mips compiler.
    * include/libxml/SAX.h include/libxml/parser.h: move the SAXv1 block definitions to parser.h fixes bug #123380
    * xmlreader.c include/libxml/xmlreader.h: reinstantiate the attribute and element pool broken 2 commits ago. Start playing with an entry point to preserve a subtree.
    * entities.c: remove a warning.
    Daniel
* minor change to avoid compilation warnings on some (e.g. AIX) systems | William M. Brack | 2003-09-26 | 1 | -1/+3
    * HTMLparser.c, entities.c, xmlreader.c: minor change to avoid compilation warnings on some (e.g. AIX) systems