delta/libxml2.git - gitlab.gnome.org: GNOME/libxml2.git

	Commit message	Author	Age	Files	Lines
*	error: Limit number of parser errors	Nick Wellnhofer	2022-12-27	1	-0/+5
\| \| \| \| \| \| \|	Reporting errors is expensive and some abusive test cases can generate an error for each invalid input byte. This causes the parser to spend most of the time with error handling. Limit the number of errors and warnings to 100.
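The limit described above can be sketched as follows. This is a minimal illustration, not the patch itself: `DemoCtxt`, `demoReportError` and `MAX_REPORTED_ERRORS` are hypothetical names, and only the value 100 comes from the commit message.

```c
#include <stdio.h>

#define MAX_REPORTED_ERRORS 100 /* limit named in the commit message */

/* Hypothetical context; a real parser context has many more fields. */
typedef struct {
    int nbErrors;   /* errors encountered so far */
    int nbReported; /* errors actually passed on to the handler */
} DemoCtxt;

/* Returns 1 if the error was reported, 0 if it was suppressed. */
static int
demoReportError(DemoCtxt *ctxt, const char *msg) {
    ctxt->nbErrors++;
    /* Past the limit, skip the expensive reporting path entirely. */
    if (ctxt->nbReported >= MAX_REPORTED_ERRORS)
        return 0;
    ctxt->nbReported++;
    fprintf(stderr, "error: %s\n", msg);
    return 1;
}
```

With this shape, an input that triggers one error per byte still costs only a counter increment per error once the limit is reached.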
*	Remove hacky heuristic from b2dc5675e94aa6b5557ba63f7d66b0f08dd17e4d	Alex Richardson	2022-12-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Checking whether the context is close to the parent context by hardcoding 250 is not portable (I noticed tests were failing on Morello since the value is 288 there due to pointers being 128 bits). Instead we should ensure that the XML_VCTXT_USE_PCTXT flag is not set in cases where the user data is not actually a parser context (or ideally add a separate field, but that would be an ABI break). From what I can see in the source, XML_VCTXT_USE_PCTXT is only set if the userData field points to a valid context, and if this is not the case the flag should be cleared when changing userData rather than relying on the offset between the two. Looking at the history, I think d7cb33cf44aa688f24215c9cd398c1a26f0d25ff fixed most of the need for this workaround, but it looks like there are a few more locations that need updating. This commit changes two more places to set/clear/copy the XML_VCTXT_USE_PCTXT flag, so this heuristic should not be needed anymore. I've also dropped two = NULL assignments in xmllint since they are not needed after a call to memset(). There was also an uninitialized vctxt.flags (and other fields) in `xmlShellValidate()`, which I've fixed by adding a memset() call.
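The idea of tracking "userData is a parser context" with an explicit flag, and zero-initializing the whole structure, might look like the sketch below. The struct, the helper functions and the `DEMO_` macro are hypothetical stand-ins; only the flag's purpose and the use of memset() come from the message above.

```c
#include <string.h>

/* Hypothetical stand-in for XML_VCTXT_USE_PCTXT. */
#define DEMO_VCTXT_USE_PCTXT (1u << 1)

/* Hypothetical, heavily reduced validation context. */
typedef struct {
    void *userData;
    unsigned flags;
} DemoValidCtxt;

/* Zero every field at once, so no separate "= NULL" assignments
 * are needed and no field (such as flags) is left uninitialized. */
static void
demoInitValidCtxt(DemoValidCtxt *vctxt) {
    memset(vctxt, 0, sizeof(*vctxt));
}

/* Keep the flag in sync whenever userData changes, instead of
 * guessing from the distance between two pointers. */
static void
demoSetUserData(DemoValidCtxt *vctxt, void *data, int isParserCtxt) {
    vctxt->userData = data;
    if (isParserCtxt)
        vctxt->flags |= DEMO_VCTXT_USE_PCTXT;
    else
        vctxt->flags &= ~DEMO_VCTXT_USE_PCTXT;
}

static int
demoUsesParserCtxt(const DemoValidCtxt *vctxt) {
    return (vctxt->flags & DEMO_VCTXT_USE_PCTXT) != 0;
}
```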
*	Avoid creating an out-of-bounds pointer by rewriting a check	Alex Richardson	2022-12-01	1	-1/+1
\| \| \| \| \| \| \|	Creating more than one-past-the-end pointers is undefined behaviour in C and while this code is unlikely to be miscompiled, I discovered that an out-of-bounds pointer is being created using UBSan on a CHERI-enabled system.
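The commit message does not show the check that was rewritten, so the following is only a generic sketch of the technique, with the hypothetical name `demoHasBytes`: compare lengths instead of forming a pointer that may lie beyond one past the end of the buffer.

```c
#include <stddef.h>

/*
 * Returns 1 if at least `len` bytes are available starting at `cur`.
 * Precondition: `cur` and `end` point into (or one past the end of)
 * the same buffer, with cur <= end.
 *
 * A check written as `cur + len > end` would first compute `cur + len`,
 * which is undefined behaviour when it lies more than one past the end.
 * Comparing against the pointer difference avoids creating that pointer.
 */
static int
demoHasBytes(const char *cur, const char *end, size_t len) {
    return len <= (size_t) (end - cur);
}
```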
*	html: Improve parsing of nested lists	Nick Wellnhofer	2022-11-30	1	-2/+0
\| \| \| \| \| \| \|	Allow ul/ol as immediate children of ul/ol. This is more in line with the HTML5 spec. Fixes #447.
*	html: Fix htmlInitAutoClose documentation	Nick Wellnhofer	2022-11-27	1	-4/+1
\|
*	html: Fix check for end of comment in push parser	Nick Wellnhofer	2022-11-20	1	-6/+14
\| \| \| \| \|	Make sure to reset checkIndex. Handle case where "--" or "--!" is at the end of the buffer. Fix "avail" check in htmlParseOrTryFinish.
*	parser: Rewrite push parser boundary checks	Nick Wellnhofer	2022-11-20	1	-51/+16
\| \| \| \| \| \| \| \| \| \| \|	Remove inaccurate xmlParseCheckTransition check. Remove non-incremental xmlParseGetLasts check. Add functions that check for several boundary constructs more accurately, keeping track of progress in ctxt->checkIndex. Fixes #439.
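Incremental lookup with a saved index can be sketched roughly as below. `DemoPush` and `demoLookup` are hypothetical; only the idea of remembering progress in a `checkIndex` field is taken from the message above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced view of a push parser's buffered input. */
typedef struct {
    const char *buf;   /* data received so far */
    size_t len;        /* number of bytes in buf */
    size_t checkIndex; /* where the previous unsuccessful scan stopped */
} DemoPush;

/*
 * Search for `term` (termLen >= 1) in the buffered data. Returns the
 * index of the match, or -1 if more data is needed. On failure the scan
 * position is saved, so the next call does not rescan old bytes, which
 * would otherwise make repeated small pushes quadratic.
 */
static long
demoLookup(DemoPush *p, const char *term, size_t termLen) {
    size_t i = p->checkIndex;

    for (; i + termLen <= p->len; i++) {
        if (memcmp(p->buf + i, term, termLen) == 0) {
            p->checkIndex = 0; /* reset for the next construct */
            return (long) i;
        }
    }
    /* Bytes from i onward may hold a partial terminator: rescan them. */
    p->checkIndex = i;
    return -1;
}
```

Note that the index has to be reset once the construct is found; forgetting this reset is the kind of bug the "Make sure to reset checkIndex" fix in this log refers to.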
*	Remove or annotate char casts	Nick Wellnhofer	2022-09-01	1	-2/+2
\|
*	Don't use sizeof(xmlChar) or sizeof(char)	Nick Wellnhofer	2022-09-01	1	-7/+7
\|
*	Remove explicit integer casts	Nick Wellnhofer	2022-09-01	1	-10/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove explicit integer casts as the final operation: in assignments, when passing arguments, and when returning values. Remove casts to the same type and casts from certain range-bound values. The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign changes occur unexpectedly. Document some explicit casts as truncating and add a few missing ones.
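The difference between the two forms can be illustrated with a pair of hypothetical functions. Both compile to the same conversion; the assumption here is a Clang build with -fsanitize=implicit-conversion, which instruments only the conversion that has no explicit cast.

```c
#include <stddef.h>

/* Explicit cast: the conversion is treated as intentional, so an
 * unexpected truncation of a large `len` goes unreported. */
static int
demoLenCast(size_t len) {
    return (int) len;
}

/* No cast: the result is identical, but the implicit conversion can be
 * checked at run time, so a value that does not fit in an int is
 * reported instead of being silently truncated. */
static int
demoLenImplicit(size_t len) {
    return len;
}
```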
*	Make xmlNewSAXParserCtx take a const sax handler	Nick Wellnhofer	2022-09-01	1	-3/+5
\| \| \| \|	Also improve documentation.
*	Consolidate private header files	Nick Wellnhofer	2022-08-26	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \|	Private functions were previously declared in header files in the root directory, in public headers guarded with IN_LIBXML, in libxml.h, and redundantly in source files that used them. Consolidate all private header files in include/private.
*	Deprecate internal parser functions	Nick Wellnhofer	2022-08-25	1	-0/+6
\|
*	Deprecate old HTML SAX API	Nick Wellnhofer	2022-08-25	1	-0/+4
\|
*	Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt	Nick Wellnhofer	2022-08-24	1	-27/+33
\| \| \| \| \|	Add API functions to create a parser context with a custom SAX handler without having to mess with ctxt->sax manually.
*	Don't mess with parser options in htmlParseDocument	Nick Wellnhofer	2022-08-24	1	-2/+1
\| \| \| \| \| \|	Don't set ctxt->html. This member should already be initialized. Set ctxt->linenumbers in htmlCtxtUseOptions like the XML parser does.
*	Remove useless call to htmlDefaultSAXHandlerInit	Nick Wellnhofer	2022-08-24	1	-2/+0
\| \| \| \|	This function is already called from xmlInitParser.
*	Remove htmlDefaultSAXHandler from non-SAX1 build	Nick Wellnhofer	2022-08-22	1	-0/+2
\| \| \| \|	This matches long-standing behavior of the XML counterpart.
*	Don't initialize SAX handler in htmlReadMemory	Nick Wellnhofer	2022-08-22	1	-3/+0
\| \| \| \| \|	The SAX handler is already initialized when creating the parser context.
*	Fix htmlReadMemory mixing up XML and HTML functions	Nick Wellnhofer	2022-08-22	1	-1/+1
\| \| \| \|	Also see fe6890e2.
*	Don't use default SAX handler to report unrelated errors	Nick Wellnhofer	2022-08-22	1	-5/+0
\|
*	Fix HTML parser with threads and --without-legacy	Nick Wellnhofer	2022-08-22	1	-7/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the legacy functions are disabled, the default "V1" HTML SAX handler isn't initialized in threads other than the main thread. htmlInitParserCtxt would later use the empty V1 SAX handler, resulting in NULL documents. Change htmlInitParserCtxt to initialize the HTML SAX handler by calling xmlSAX2InitHtmlDefaultSAXHandler. This removes the ability to change the default handler but is more in line with the XML parser which initializes the SAX handler by calling xmlSAXVersion, ignoring the V1 default handler. Fixes #399.
*	Use xmlStrlen in *CtxtReadDoc	Nick Wellnhofer	2022-08-20	1	-5/+2
\| \| \| \|	xmlStrlen handles buffers larger than INT_MAX more gracefully.
*	Fix xmlCtxtReadDoc with encoding	Nick Wellnhofer	2022-08-20	1	-13/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	xmlCtxtReadDoc used to create an input stream involving xmlNewStringInputStream. This would create a stream without an input buffer, causing problems with encodings (see #34). After commit aab584dc3, an error was returned even with UTF-8 encodings which happened to work before. Make xmlCtxtReadDoc call xmlCtxtReadMemory which doesn't suffer from these issues. Also fix htmlCtxtReadDoc. Fixes #397.
*	Skip incorrectly opened HTML comments	Nick Wellnhofer	2022-08-02	1	-60/+85
\| \| \| \| \| \| \| \|	Commit 4fd69f3e fixed handling of '<' characters not followed by an ASCII letter. But a '<!' sequence followed by invalid characters should be treated as bogus comment and skipped. Fixes #380.
*	Reduce indentation in HTMLparser.c	Nick Wellnhofer	2022-08-02	1	-199/+197
\| \| \| \|	No functional change.
*	Also reset nsNr in htmlCtxtReset	Nick Wellnhofer	2022-07-28	1	-0/+2
\|
*	Prevent integer-overflow in htmlSkipBlankChars() and xmlSkipBlankChars()	David Kilzer	2022-04-11	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	HTMLparser.c (htmlSkipBlankChars), parser.c (xmlSkipBlankChars): Cap the return value at INT_MAX. The commit range that OSS-Fuzz listed for the fix didn't make any changes to xmlSkipBlankChars(), so it seems like this issue may still exist. Found by OSS-Fuzz Issue 44803.
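Capping a counter at INT_MAX can be sketched as follows. `demoSkipBlanks` is a hypothetical, simplified function working on a NUL-terminated string, not the parser's actual input handling.

```c
#include <limits.h>

/*
 * Advance *cur past blank characters and return how many were skipped.
 * The count saturates at INT_MAX instead of overflowing, which would be
 * undefined behaviour for a signed int.
 */
static int
demoSkipBlanks(const char **cur) {
    int res = 0;

    while (**cur == ' ' || **cur == '\t' ||
           **cur == '\n' || **cur == '\r') {
        (*cur)++;
        if (res < INT_MAX)
            res++;
    }
    return res;
}
```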
*	Deprecate module init and cleanup functions	Nick Wellnhofer	2022-03-06	1	-0/+3
\| \| \| \| \| \|	These functions shouldn't be part of the public API. Most init functions are only thread-safe when called from xmlInitParser. Global variables should only be cleaned up by calling xmlCleanupParser.
*	Remove unneeded #includes	Nick Wellnhofer	2022-03-04	1	-13/+0
\|
*	htmlParseComment: handle abruptly-closed comments	Mike Dalessio	2022-03-02	1	-0/+11
\| \| \| \| \| \|	See the guidance on abruptly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment
*	Don't check for standard C89 headers	Nick Wellnhofer	2022-03-02	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't check for ctype.h, errno.h, float.h, limits.h, math.h, signal.h, stdarg.h, stdlib.h, string.h, or time.h. Stop including the non-standard headers malloc.h and strings.h.
*	Fix recovery from invalid HTML start tags	Nick Wellnhofer	2022-02-22	1	-23/+21
\| \| \| \| \| \| \| \| \| \|	Only try to parse a start tag if there's a '<' followed by an ASCII letter. This is more in line with HTML5 and the old behavior in recovery mode. Emit a literal '<' if the following character is invalid. Fixes #101. Fixes #339.
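The start-tag condition described above amounts to a two-character lookahead. A minimal sketch, with hypothetical helper names:

```c
/* ASCII-only test; deliberately not locale-dependent like isalpha(). */
static int
demoIsAsciiLetter(int c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/*
 * Returns 1 if `cur` points at something that should be parsed as a
 * start tag: a '<' immediately followed by an ASCII letter. Otherwise
 * the caller would emit the '<' as literal text. `cur` must be
 * NUL-terminated, so reading cur[1] after a '<' is always safe.
 */
static int
demoLooksLikeStartTag(const char *cur) {
    return cur[0] == '<' && demoIsAsciiLetter((unsigned char) cur[1]);
}
```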
*	Remove elfgcchack.h	Nick Wellnhofer	2022-02-20	1	-2/+0
\| \| \| \| \|	The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.
*	Rework validation context flags	Nick Wellnhofer	2022-02-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Use a bitmask instead of magic values to keep track of whether the validation context is part of a parser context and whether xmlValidateDtdFinal was called. This allows additional flags to be added later. Note that this deliberately changes the name of a public struct member, assuming that this was always private data never to be used by client code.
*	Also register HTML document nodes	Nick Wellnhofer	2022-02-01	1	-0/+2
\| \| \| \|	Fixes #196.
*	Fix htmlReadFd, which was using a mix of xml and html context functions	Finn Barber	2022-01-16	1	-5/+7
\|
*	Fix memory leak in xmlFreeParserInputBuffer	David King	2022-01-16	1	-0/+1
\| \| \| \| \| \|	Found by Coverity. https://bugzilla.redhat.com/show_bug.cgi?id=1938806
*	Different approach to fix quadratic behavior in HTML push parser	Nick Wellnhofer	2022-01-10	1	-1/+13
\| \| \| \| \| \| \| \|	The old approach introduced a regression, see issue #312 and the previous commit. Disable code that tries to recover from invalid start tags. This only affects "recovery" mode. Add a comment outlining a better fix in accordance with the HTML5 spec.
*	Fix regression when parsing invalid HTML tags in push mode	Nick Wellnhofer	2022-01-10	1	-24/+4
\| \| \| \| \| \| \| \| \|	Revert part of commit 173a0830 that changed behavior when parsing malformed start tags with the push parser. This reintroduces quadratic behavior in recovery mode which will be worked around in the next commit. Fixes #312.
*	Fix regression parsing public ID literals in HTML	Nick Wellnhofer	2022-01-10	1	-1/+1
\| \| \| \| \| \| \|	Fix regression introduced when reworking htmlParsePubidLiteral in commit 93ce33c2. Fixes #318.
*	Fix htmlTagLookup	Nick Wellnhofer	2021-05-06	1	-2/+2
\| \| \| \| \| \| \| \|	Fix regression introduced with b25acce8. Some users like libxslt may call the HTML output functions on documents with uppercase tag names, so we must keep case-insensitive string comparison. Fixes #248.
*	Fix duplicate xmlStrEqual calls in htmlParseEndTag	Nick Wellnhofer	2021-03-04	1	-6/+4
\|
*	Speed up htmlCheckAutoClose	Nick Wellnhofer	2021-03-04	1	-136/+280
\| \| \| \|	Switch to binary search.
*	Speed up htmlTagLookup	Nick Wellnhofer	2021-03-04	1	-7/+13
\| \| \| \| \| \|	Switch to binary search. This is the first time bsearch is used in the libxml2 code base. But it's a standard library function since C89 and should be portable.
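A lookup with bsearch over a sorted table might look like the sketch below. The table contents, `DemoTag` and the helper names are hypothetical. The comparison is case-insensitive, since the later htmlTagLookup fix in this log notes that callers may pass uppercase tag names.

```c
#include <ctype.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    int isInline;
} DemoTag;

/* Must stay sorted by name for the binary search to work. */
static const DemoTag demoTags[] = {
    { "a", 1 }, { "body", 0 }, { "div", 0 }, { "li", 0 },
    { "ol", 0 }, { "p", 0 }, { "span", 1 }, { "ul", 0 }
};

/* Case-insensitive comparison written in portable C89. */
static int
demoCaseCmp(const char *a, const char *b) {
    while (*a != '\0' &&
           tolower((unsigned char) *a) == tolower((unsigned char) *b)) {
        a++;
        b++;
    }
    return tolower((unsigned char) *a) - tolower((unsigned char) *b);
}

/* bsearch callback: first argument is the key, second a table entry. */
static int
demoCompareTag(const void *key, const void *member) {
    return demoCaseCmp((const char *) key,
                       ((const DemoTag *) member)->name);
}

static const DemoTag *
demoTagLookup(const char *name) {
    return bsearch(name, demoTags,
                   sizeof(demoTags) / sizeof(demoTags[0]),
                   sizeof(demoTags[0]), demoCompareTag);
}
```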
*	Revert "Improve HTML fuzzer stability"	Nick Wellnhofer	2021-02-22	1	-4/+0
\| \| \| \|	This reverts commit de1b51eddcc17fd7ed1bbcc6d5d7d529407dfbe2.
*	Improve HTML fuzzer stability	Nick Wellnhofer	2021-02-22	1	-0/+4
\| \| \| \| \|	Call htmlInitAutoClose during fuzzer initialization to fix stability issue. Leave a note concerning problems with this function.
*	Fix slow parsing of HTML with encoding errors	Nick Wellnhofer	2021-02-20	1	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under certain circumstances, the HTML parser would try to guess and switch input encodings multiple times, leading to slow processing of documents with encoding errors. The repeated scanning of the input buffer when guessing encodings could even lead to quadratic behavior. The code htmlCurrentChar probably assumed that if there's an encoding handler, it is guaranteed to produce valid UTF-8. This holds true in general, but if the detected encoding was "UTF-8", the UTF8ToUTF8 encoding handler simply invoked memcpy without checking for invalid UTF-8. This still must be fixed, preferably by not using this handler at all. Also leave a note that switching encodings twice seems impossible to implement correctly. Add a check when handling UTF-8 encoding errors in htmlCurrentChar to avoid this situation, even if encoders produce invalid UTF-8. Found by OSS-Fuzz.
*	Fix infinite loop in HTML parser introduced with recent commits	Nick Wellnhofer	2021-02-07	1	-1/+2
\| \| \| \| \| \| \|	Check for XML_PARSER_EOF to avoid an infinite loop introduced with recent changes to the HTML push parser. Found by OSS-Fuzz.
*	use new htmlParseLookupCommentEnd to find comment ends	Mike Dalessio	2020-12-16	1	-9/+37
\| \| \| \| \| \| \| \| \|	Note that the caret in error messages generated during comment parsing may have moved by one byte. See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
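Per the linked section of the HTML specification, a comment closed by "--!>" is a parse error but still ends the comment. A lookup that accepts both terminators can be sketched as follows; `demoFindCommentEnd` is a hypothetical helper operating on a NUL-terminated string, not the library function named in the commit title.

```c
#include <string.h>

/*
 * Returns a pointer to the first "-->" or "--!>" in `s`, or NULL if the
 * comment is not terminated yet.
 */
static const char *
demoFindCommentEnd(const char *s) {
    const char *p = s;

    while ((p = strstr(p, "--")) != NULL) {
        /* p[2] exists: it is at worst the terminating NUL. */
        if (p[2] == '>')
            return p;
        /* p[3] is only read when p[2] is '!', so it exists as well. */
        if (p[2] == '!' && p[3] == '>')
            return p;
        p++; /* handles runs such as "--->" */
    }
    return NULL;
}
```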