path: root/HTMLtree.c
Commit message | Author | Date | Files | Lines
* html: Don't escape ASCII chars in href attributes | Nick Wellnhofer | 2022-11-20 | 1 | -3/+8
    In several cases, href attributes can contain ASCII characters which are illegal in URIs. Escaping them often does more harm than good.

    Fixes #321.
* Remove explicit integer casts | Nick Wellnhofer | 2022-09-01 | 1 | -1/+1
    Remove explicit integer casts as final operation
    - in assignments
    - when passing arguments
    - when returning values

    Remove casts
    - to the same type
    - from certain range-bound values

    The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign-changes occur unexpectedly.

    Document some explicit casts as truncating and add a few missing ones.
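The motivation given in this commit can be illustrated with a small sketch. The two helper names below are hypothetical and are not taken from libxml2. Both functions return the same value, but only the second leaves an implicit conversion in place for a sanitizer to inspect (for example Clang's `-fsanitize=implicit-conversion`, assumed here as the check the message refers to).

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not libxml2 code. */

/* Explicit cast as the final operation: the narrowing looks intentional,
 * so an implicit-conversion check has nothing to report. */
unsigned char narrow_explicit(int value) {
    return (unsigned char) value;
}

/* No cast: the result is identical, but the conversion is implicit,
 * so a sanitizer that checks implicit conversions can flag values
 * that do not fit into unsigned char. */
unsigned char narrow_implicit(int value) {
    /* Precondition chosen for this sketch: callers pass non-negative values. */
    assert(value >= 0);
    return value;
}
```

Both helpers yield 44 for the input 300 on platforms with 8-bit `unsigned char`; the difference is only in whether a run-time check gets a chance to report the data loss.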
* Consolidate private header files | Nick Wellnhofer | 2022-08-26 | 1 | -8/+4
    Private functions were previously declared
    - in header files in the root directory
    - in public headers guarded with IN_LIBXML
    - in libxml.h
    - redundantly in source files that used them.

    Consolidate all private header files in include/private.
* Restore behavior of htmlDocContentDumpFormatOutput() | David Kilzer | 2022-05-14 | 1 | -0/+7
    Patch by J Pascoe of Apple.

    * HTMLtree.c: (htmlDocContentDumpFormatOutput):
    - Prior to commit b79ab6e6d92, xmlDoc.type was set to XML_HTML_DOCUMENT_NODE before dumping the HTML output, then restored before returning.
* Mark more static data as `const` | David Kilzer | 2022-04-07 | 1 | -1/+1
    Similar to 8f5710379, mark more static data structures with `const` keyword.

    Also fix placement of `const` in encoding.c.

    Original patch by Sarah Wilkin.
* Don't check for standard C89 headers | Nick Wellnhofer | 2022-03-02 | 1 | -5/+0
    Don't check for
    - ctype.h
    - errno.h
    - float.h
    - limits.h
    - math.h
    - signal.h
    - stdarg.h
    - stdlib.h
    - string.h
    - time.h

    Stop including non-standard headers
    - malloc.h
    - strings.h
* Remove elfgcchack.h | Nick Wellnhofer | 2022-02-20 | 1 | -2/+0
    The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.
* Fix whitespace when serializing empty HTML documents | Nick Wellnhofer | 2021-06-07 | 1 | -5/+9
    The old, non-recursive HTML serialization code would always terminate the output with a newline. The new implementation omitted the newline if the document node had no children.

    Re-add the newline when serializing empty documents.

    Fixes #266.
* Work around lxml API abuse | Nick Wellnhofer | 2021-05-21 | 1 | -18/+28
    Make xmlNodeDumpOutput and htmlNodeDumpFormatOutput work with corrupted parent pointers. This used to work with the old recursive code but the non-recursive rewrite required parent pointers to be set correctly.

    Unfortunately, lxml relies on the old behavior and passes subtrees with a corrupted structure. Fall back to a recursive function call if an invalid parent pointer is detected.

    Fixes #255.
* Remove unused encoding parameter of HTML output functions | Nick Wellnhofer | 2021-02-07 | 1 | -17/+17
    The encoding string is unused. Encodings are set by way of the output buffer.
* Handle dumps of corrupted documents more gracefully | Nick Wellnhofer | 2020-09-29 | 1 | -0/+6
    Check parent pointers for NULL after the non-recursive rewrite of the serialization code. This avoids segfaults with corrupted documents which can apparently be seen with lxml, see issue #187.
* Revert "Do not URI escape in server side includes" | Nick Wellnhofer | 2020-08-15 | 1 | -38/+11
    This reverts commit 960f0e275616cadc29671a218d7fb9b69eb35588.

    This commit introduced
    - an infinite loop, found by OSS-Fuzz, which could be easily fixed.
    - an algorithm with quadratic runtime
    - a security issue, see https://bugzilla.gnome.org/show_bug.cgi?id=769760

    A better approach is to add an option not to escape URLs at all which libxml2 should have possibly done in the first place.
* Make htmlNodeDumpFormatOutput non-recursive | Nick Wellnhofer | 2020-07-28 | 1 | -225/+185
    Fixes stack overflow with deeply nested HTML documents.

    Found by OSS-Fuzz.
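For readers unfamiliar with the technique, a non-recursive tree dump can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from HTMLtree.c: the `Node` type, `append` and `dump_tree` are hypothetical names, and real libxml2 nodes carry many more fields. The walk uses only the `children`, `next` and `parent` links, so its stack usage does not grow with nesting depth. The NULL check on the parent pointer mirrors the kind of guard the later "corrupted documents" commits describe.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal node type; libxml2's xmlNode has many more fields. */
typedef struct Node {
    const char *name;
    struct Node *parent;
    struct Node *children;  /* first child */
    struct Node *next;      /* next sibling */
} Node;

/* Append s to the NUL-terminated buffer out, truncating when it is full. */
static void append(char *out, size_t cap, const char *s) {
    size_t used = strlen(out);
    if (used + 1 < cap)
        strncat(out, s, cap - used - 1);
}

/* Serialize the subtree rooted at root as <name>...</name> without recursion. */
void dump_tree(const Node *root, char *out, size_t cap) {
    const Node *cur = root;

    if (cap == 0)
        return;
    out[0] = '\0';

    while (cur != NULL) {
        /* Entering a node: emit its start tag. */
        append(out, cap, "<");
        append(out, cap, cur->name);
        append(out, cap, ">");

        if (cur->children != NULL) {
            cur = cur->children;
            continue;
        }

        /* Leaving nodes: emit end tags while climbing until a sibling exists. */
        for (;;) {
            append(out, cap, "</");
            append(out, cap, cur->name);
            append(out, cap, ">");

            if (cur == root)
                return;
            if (cur->next != NULL) {
                cur = cur->next;
                break;
            }
            cur = cur->parent;
            if (cur == NULL)
                return;  /* corrupted parent pointer: stop instead of crashing */
        }
    }
}
```

Because the climb relies on `parent` being correct, a subtree with inconsistent parent pointers produces truncated output here; the sketch simply stops in that case.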
* Fix typos | Nick Wellnhofer | 2020-03-08 | 1 | -3/+3
    Resolves #133.
* Large batch of typo fixes | Jared Yanovich | 2019-09-30 | 1 | -1/+1
    Closes #109.
* Fix HTML serialization with UTF-8 encoding | Nick Wellnhofer | 2018-10-13 | 1 | -44/+40
    If the encoding is specified as UTF-8, make sure to use a NULL encoding handler.
* Stop using doc->charset outside parser code | Nick Wellnhofer | 2018-10-13 | 1 | -34/+4
    doc->charset does not specify the in-memory encoding which is always UTF-8.
* Allow HTML serializer to output HTML5 DOCTYPE | Shaun McCance | 2015-04-03 | 1 | -1/+2
    For https://bugzilla.gnome.org/show_bug.cgi?id=747301

    Use simple HTML5 DOCTYPE for about:legacy-compat

    HTML5 uses a DOCTYPE without a PUBLIC or SYSTEM identifier. It looks like this:

        <!DOCTYPE html>

    I can't use XSLT to output this, because to get a DOCTYPE I have to provide a PUBLIC or SYSTEM identifier. Luckily, the standards folks recognized this and provided this semantically equivalent form for the HTML DOCTYPE:

        <!DOCTYPE html SYSTEM "about:legacy-compat">

    But people don't like seeing the "legacy" identifier in their output. They'd rather see the shiny new DOCTYPE. Since we know that about:legacy-compat is defined by the W3C to be semantically equivalent to the sans-SYSTEM DOCTYPE, we could just special-case it in the HTML serializer in libxml2.

    So if you set the SYSTEM identifier to "about:legacy-compat", you get an HTML5 short-form DOCTYPE.
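The special case described in this message can be sketched as a small standalone function. This is an illustration only, assuming a simplified serializer: `format_doctype` and its parameters are hypothetical names and do not reproduce the actual HTMLtree.c code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not libxml2's implementation: write an HTML DOCTYPE
 * for the given PUBLIC and SYSTEM identifiers (either may be NULL) into out.
 * Returns the value of snprintf(), i.e. the length the full string needs. */
int format_doctype(char *out, size_t cap,
                   const char *public_id, const char *system_id) {
    if (public_id != NULL && system_id != NULL)
        return snprintf(out, cap, "<!DOCTYPE html PUBLIC \"%s\" \"%s\">",
                        public_id, system_id);
    if (public_id != NULL)
        return snprintf(out, cap, "<!DOCTYPE html PUBLIC \"%s\">", public_id);

    /* Special case: a lone about:legacy-compat SYSTEM identifier is
     * treated like no identifier at all, giving the HTML5 short form. */
    if (system_id != NULL && strcmp(system_id, "about:legacy-compat") != 0)
        return snprintf(out, cap, "<!DOCTYPE html SYSTEM \"%s\">", system_id);

    return snprintf(out, cap, "<!DOCTYPE html>");
}
```

In this sketch, any other SYSTEM identifier is still written out in full, so only documents that opt in through `about:legacy-compat` get the short form.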
* Do not URI escape in server side includes | Romain Bondue | 2013-04-23 | 1 | -11/+38
* Big space and tab cleanup | Daniel Veillard | 2012-09-11 | 1 | -18/+18
    Remove all space before tabs and space and tabs at end of lines.
* Improve HTML escaping of attribute on output | Daniel Veillard | 2012-09-05 | 1 | -4/+9
    Handle special cases of &{...} constructs as hinted in the spec http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1 and special values as comment <!-- ... --> used for server side includes.

    This is limited to attribute values in HTML content.
* Convert the HTML tree module to the new buffers | Daniel Veillard | 2012-07-23 | 1 | -22/+35
    The new input buffers induced a couple of changes, the others are related to the switch to xmlBuf in saving routines.
* Fix html serialization error and htmlSetMetaEncoding() | Daniel Veillard | 2012-05-11 | 1 | -3/+9
    For https://bugzilla.gnome.org/show_bug.cgi?id=630682

    The python tests were reporting errors. Some of them were due to a small change in case encoding, but the main one was about htmlSetMetaEncoding(doc, NULL) being broken by not removing the associated meta tag anymore.
* Add options to ignore the internal encoding | Daniel Veillard | 2011-05-26 | 1 | -10/+8
    For both XML and HTML, the document can provide an encoding either in XMLDecl in XML, or as a meta element in HTML head. This adds options to ignore those encodings if the encoding is known in advance, for example if the content had been converted before being passed to the parser.

    * parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option for XML parsing
    * include/libxml/HTMLparser.h HTMLparser.c: adds the HTML_PARSE_IGNORE_ENC for HTML parsing
    * HTMLtree.c: fix the handling of saving when an unknown encoding is defined in meta document header
    * xmllint.c: add a --noenc option to activate the new parser options
* 582913 Fix htmlSetMetaEncoding() to be nicer | Daniel Veillard | 2009-08-12 | 1 | -33/+38
    * HTMLtree.c: htmlSetMetaEncoding should not destroy existing meta encoding elements, plus it should not change things at all if the encoding is the same. Also fixed htmlSaveFileFormat() to ask for change if outputting to UTF-8.
* 575875 don't output charset=html | Daniel Veillard | 2009-08-12 | 1 | -0/+4
    * HTMLtree.c: don't output charset=html in htmlSetMetaEncoding() as this is clearly a libxml2 only thing used for import only
* Borland C fix from Moritz Both regenerate, workaround a problem for buffer | Daniel Veillard | 2008-09-01 | 1 | -1/+6
    * trionan.c: Borland C fix from Moritz Both
    * testapi.c: regenerate, workaround a problem for buffer testing
    * xmlIO.c HTMLtree.c: new internal entry point to hide even better xmlAllocOutputBufferInternal
    * tree.c: harden the code around buffer allocation schemes
    * parser.c: restore the warning when namespace names are not absolute URIs
    * runxmlconf.c: continue regression tests if we get the expected number of errors
    * Makefile.am: run the python tests on make check
    * xmlsave.c: handle the HTML documents and trees
    * python/libxml.c: convert python serialization to the xmlSave APIs and avoid some horrible hacks
    Daniel

    svn path=/trunk/; revision=3790
* htmlNodeDumpFormatOutput didn't handle XML_ATTRIBUTE_NODE fixes bug | Daniel Veillard | 2007-06-12 | 1 | -0/+4
    * HTMLtree.c: htmlNodeDumpFormatOutput didn't handle XML_ATTRIBUTE_NODE, fixes bug #438390
    Daniel

    svn path=/trunk/; revision=3631
* Add linefeeds to error messages allowing for consistent handling. | Rob Richards | 2006-08-15 | 1 | -5/+5
    * HTMLtree.c xmlsave.c: Add linefeeds to error messages allowing for consistent handling.
* fix bug #322136 in xmlNodeBufGetContent when entity ref is a child of an | Rob Richards | 2005-12-20 | 1 | -3/+19
    * tree.c: fix bug #322136 in xmlNodeBufGetContent when entity ref is a child of an element (fix by Oleksandr Kononenko).
    * HTMLtree.c include/libxml/HTMLtree.h: Add htmlDocDumpMemoryFormat.
* fixed bug #310333 with a patch close to the provided patch for HTML UTF-8 | Daniel Veillard | 2005-08-08 | 1 | -0/+4
    * HTMLtree.c: fixed bug #310333 with a patch close to the provided patch for HTML UTF-8 serialization
    * result/HTML/script2.html: this changed the output of that test
    Daniel
* revamped the elfgcchack.h format to cope with gcc4 change of aliasing | Daniel Veillard | 2005-04-01 | 1 | -0/+2
    * doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h format to cope with gcc4 change of aliasing allowed scopes, had to add extra information to doc/libxml2-api.xml to separate the header from the c module source.
    * *.c: updated all c library files to add a #define bottom_xxx and reimport elfgcchack.h thereafter, and a bit of cleanups.
    * doc//* testapi.c: regenerated when rebuilding the API
    Daniel
* fixing bug 168196, <a name=""> must be URI escaped too Daniel | Daniel Veillard | 2005-03-29 | 1 | -1/+3
    * HTMLtree.c: fixing bug 168196, <a name=""> must be URI escaped too
    Daniel
* augmented types supported a number of new bug fixes and documentation | Daniel Veillard | 2004-11-06 | 1 | -0/+2
    * gentest.py testapi.c: augmented types supported
    * HTMLtree.c tree.c xmlreader.c xmlwriter.c: a number of new bug fixes and documentation updates.
    Daniel
* fixed the way the generator works, extended the testing, especially with | Daniel Veillard | 2004-11-05 | 1 | -2/+3
    * gentest.py testapi.c: fixed the way the generator works, extended the testing, especially with more real trees and nodes.
    * HTMLtree.c tree.c valid.c xinclude.c xmlIO.c xmlsave.c: a bunch of real problems found and fixed.
    * entities.c: fix error reporting to go through the new handlers
    Daniel
* extending the tests coverage more fixes and cleanups Daniel | Daniel Veillard | 2004-11-04 | 1 | -1/+4
    * gentest.py testapi.c: extending the tests coverage
    * HTMLtree.c tree.c xmlsave.c xpointer.c: more fixes and cleanups
    Daniel
* adding xmlMemBlocks() work on generator of an automatic API regression | Daniel Veillard | 2004-11-02 | 1 | -0/+6
    * xmlmemory.c include/libxml/xmlmemory.h: adding xmlMemBlocks()
    * Makefile.am gentest.py testapi.c: work on generator of an automatic API regression test tool.
    * SAX2.c nanoftp.c parser.c parserInternals.c tree.c xmlIO.c xmlstring.c: various API hardening changes as a result of running the first set of automatic API regression tests.
    * test/slashdot16.xml: apparently missing from CVS, committed it
    Daniel
* added the routine xmlNanoHTTPContentLength to the external API | William M. Brack | 2004-09-18 | 1 | -1/+1
    * nanohttp.c, include/libxml/nanohttp.h: added the routine xmlNanoHTTPContentLength to the external API (bug151968).
    * parser.c: fixed unnecessary internal error message (bug152060); also changed call to strncmp over to xmlStrncmp.
    * encoding.c: fixed compilation warning (bug152307).
    * tree.c: fixed segfault in xmlCopyPropList (bug152368); fixed a couple of compilation warnings.
    * HTMLtree.c, debugXML.c, xmlmemory.c: fixed a few compilation warnings; no change to logic.
* change --html to make sure we use the HTML serialization rule by default | Daniel Veillard | 2003-11-04 | 1 | -5/+10
    * xmllint.c: change --html to make sure we use the HTML serialization rule by default when HTML parser is used, add --xmlout to allow to force the XML serializer on HTML.
    * HTMLtree.c: ugly tweak to fix the output on <p> element and solve #125093
    * result/HTML/*: this changes the output of some tests
    Daniel
* Changed all (?) occurrences where validation macros (IS_xxx) had | William M. Brack | 2003-10-18 | 1 | -1/+1
    * include/libxml/parserInternals.h HTMLparser.c HTMLtree.c SAX2.c catalog.c debugXML.c entities.c parser.c relaxng.c testSAX.c tree.c valid.c xmlschemas.c xmlschemastypes.c xpath.c: Changed all (?) occurrences where validation macros (IS_xxx) had single-byte arguments to use IS_xxx_CH instead (e.g. IS_BLANK changed to IS_BLANK_CH). This gets rid of many warning messages on certain platforms, and also highlights places in the library which may need to be enhanced for proper UTF8 handling.
* converted too small cleanup Daniel | Daniel Veillard | 2003-10-09 | 1 | -20/+51
    * HTMLtree.c include/libxml/xmlerror.h: converted too
    * tree.c: small cleanup
    Daniel
* Okay this is scary but it is just adding a configure option to disable | Daniel Veillard | 2003-09-29 | 1 | -1/+2
    * HTMLtree.c SAX2.c c14n.c catalog.c configure.in debugXML.c encoding.c entities.c nanoftp.c nanohttp.c parser.c relaxng.c testAutomata.c testC14N.c testHTML.c testRegexp.c testRelax.c testSchemas.c testXPath.c threads.c tree.c valid.c xmlIO.c xmlcatalog.c xmllint.c xmlmemory.c xmlreader.c xmlschemas.c example/gjobread.c include/libxml/HTMLtree.h include/libxml/c14n.h include/libxml/catalog.h include/libxml/debugXML.h include/libxml/entities.h include/libxml/nanohttp.h include/libxml/relaxng.h include/libxml/tree.h include/libxml/valid.h include/libxml/xmlIO.h include/libxml/xmlschemas.h include/libxml/xmlversion.h.in include/libxml/xpathInternals.h python/libxml.c: Okay this is scary but it is just adding a configure option to disable output, this touches most of the files.
    Daniel
* Fixed bug 121394 - missing ns on attributes | William M. Brack | 2003-09-15 | 1 | -0/+4
    * HTMLtree.c: Fixed bug 121394 - missing ns on attributes
* hum try to avoid some troubles when the library is not initialized and one | Daniel Veillard | 2003-08-08 | 1 | -0/+16
    * HTMLtree.c tree.c threads.c: hum try to avoid some troubles when the library is not initialized and one tries to save, the locks in threaded env might not have been initialized, playing safe
    * xmlschemastypes.c: apply patch for hexBinary from Charles Bozeman
    * test/schemas/hexbinary_* result/schemas/hexbinary_*: also added his tests to the regression suite.
    Daniel
* fixing bug #112904: html output method escaped plus sign character in URI | Daniel Veillard | 2003-05-16 | 1 | -1/+1
    * HTMLtree.c: fixing bug #112904: html output method escaped plus sign character in URI attribute.
    Daniel
* patch from Vasily Tchekalkin to fix #109865 Daniel | Daniel Veillard | 2003-04-10 | 1 | -0/+4
    * HTMLtree.c: patch from Vasily Tchekalkin to fix #109865
    Daniel
* Fixed reopening of #78662 <form action="..."> is an URI reference Daniel | Daniel Veillard | 2003-03-27 | 1 | -2/+5
    * HTMLtree.c: Fixed reopening of #78662 <form action="..."> is an URI reference
    Daniel
* avoid escaping ',' in URIs Daniel | Daniel Veillard | 2003-03-23 | 1 | -1/+1
    * HTMLtree.c: avoid escaping ',' in URIs
    Daniel
* fixes #102920 about namespace handling in HTML output and section 16.2 | Daniel Veillard | 2003-01-09 | 1 | -1/+15
    * HTMLtree.c tree.c: fixes #102920 about namespace handling in HTML output and section 16.2 "HTML Output Method" of XSLT-1.0
    * README: fixed a link
    Daniel
* patch from Mark Vadok about htmlNodeDumpOutput location. removed an | Daniel Veillard | 2002-12-12 | 1 | -2/+0
    * HTMLtree.c include/libxml/HTMLtree.h: patch from Mark Vadok about htmlNodeDumpOutput location.
    * xpath.c: removed an undefined function signature
    * doc/apibuild.py doc/libxml2-api.xml: the script was exporting too many symbols in the API breaking the python bindings. Updated with the libxslt/libexslt changes.
    Daniel