Commit message
If libz or liblzma are detected with pkg-config, AC_CHECK_HEADERS must
not be run because the correct CPPFLAGS aren't set. It is actually not
required to have separate checks for LIBXML_ZLIB_ENABLED and HAVE_ZLIB_H.
Only check for LIBXML_ZLIB_ENABLED and remove the HAVE_ZLIB_H macro.
Fixes bug 764657, bug 787041.
Add "falls through" comments to quench implicit-fallthrough warnings
which are enabled by -Wextra under GCC 7.
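For illustration, a minimal sketch of the kind of annotation meant here (the function is hypothetical, not libxml2 code). At its default level, GCC 7's `-Wimplicit-fallthrough` accepts a comment such as `/* Falls through. */` placed right before the next `case` label as a marker that the fall-through is intentional.

```c
#include <assert.h>

/* Hypothetical example: count how many "lower" levels a value implies.
 * Each case deliberately falls into the next one; the comment tells
 * GCC's -Wimplicit-fallthrough that this is intended. */
static int levels_implied(int level)
{
    int n = 0;

    assert(level >= 0);
    switch (level) {
        case 3:
            n++;
            /* Falls through. */
        case 2:
            n++;
            /* Falls through. */
        case 1:
            n++;
            break;
        default:
            break;
    }
    return n;
}
```

Without the comments, compiling this with `-Wextra` under GCC 7 or later would warn at each fall-through point.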
On 64-bit Windows, `long` is 32 bits wide and can't hold a pointer.
Switch to ptrdiff_t instead, which should be the same size as a pointer
on every somewhat sane platform without requiring C99 types like
intptr_t.
Fixes bug 788312.
Thanks to J. Peter Mugaas for the report and initial patch.
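A minimal sketch of the round trip this relies on, assuming a hypothetical pair of helpers that stash an integer in a `void *` field. On 64-bit Windows `sizeof(long)` is 4 while `sizeof(void *)` is 8, so going through `ptrdiff_t` keeps the integer and pointer widths matched; the `static_assert` makes that assumption explicit at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption made explicit: ptrdiff_t is as wide as a pointer. This holds
 * on common ILP32, LP64 and LLP64 targets, where long may be narrower. */
static_assert(sizeof(ptrdiff_t) == sizeof(void *),
              "ptrdiff_t must be pointer-sized");

/* Hypothetical helpers: stash an integer in a void * field and read it
 * back. Integer/pointer conversions are implementation-defined in C, but
 * they round-trip on mainstream compilers when the widths match. */
static void *int_to_slot(ptrdiff_t value)
{
    return (void *) value;
}

static ptrdiff_t slot_to_int(const void *slot)
{
    return (ptrdiff_t) slot;
}
```

Using `long` for the same casts would mix a 32-bit integer with a 64-bit pointer on 64-bit Windows.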
Raised by gcc as a potential error; no semantic change needed, but
fixed the indentation
First set of patches for zOS
- entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c:
request conversion of code to ISO Latin 1 to avoid having the compiler assume
EBCDIC codepoints for characters.
- xmlmodule.c: make sure we have support for modules
- xmlIO.c: zOS path names are special; avoid some of the expectations from
Unix/Windows
Fixes bug 347465, bug 599433, bug 624550, bug 698253.
Avoid expanding the entity recursively. Use the same prevention
mechanism as in xmlStringGetNodeList.
xmlStringGetNodeList on the other hand wasn't fixing up the 'last'
pointer.
I think the memory leak can only be triggered in recovery mode.
Found with libFuzzer and ASan.
For https://bugzilla.gnome.org/show_bug.cgi?id=762100
When we detect a recursive entity we should really not
build the associated data; moreover, if someone bypasses
libxml2 fatal errors and still tries to serialize a broken
entity, make sure we don't risk getting into a recursion
* parser.c: xmlParserEntityCheck() don't build if an entity loop
was found and remove the associated text content
* tree.c: xmlStringGetNodeList() avoid a potential recursion
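The recursion guard can be sketched as follows. This is a simplified, hypothetical model (each entity's text references at most one other entity), not libxml2's actual data structures: a per-entity flag is set while that entity is being expanded, so re-entering the same entity is reported as a loop instead of recursing forever.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified entity: its replacement text may reference at
 * most one other entity (by index), -1 meaning none. */
struct entity {
    const char *text;
    int ref;          /* index of the referenced entity, or -1 */
    int expanding;    /* set while this entity is being expanded */
};

/* Appends the expansion of entity i to `out` (capacity `cap`, already
 * NUL-terminated). Returns 0 on success, -1 if an entity loop was
 * detected or the output would overflow. */
static int expand(struct entity *ents, int i, char *out, size_t cap)
{
    struct entity *ent;
    int ret = 0;

    assert(ents != NULL && out != NULL);
    ent = &ents[i];
    if (ent->expanding)
        return -1;                /* loop detected: refuse to recurse */
    ent->expanding = 1;

    if (strlen(out) + strlen(ent->text) + 1 > cap)
        ret = -1;
    else
        strcat(out, ent->text);

    if (ret == 0 && ent->ref >= 0)
        ret = expand(ents, ent->ref, out, cap);

    ent->expanding = 0;           /* clear the guard on every path */
    return ret;
}
```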
Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
This partially reverts my previous commit fixing bug #741919.
As detected by Coverity (CIDs 60467–60472).
https://bugzilla.gnome.org/show_bug.cgi?id=739220
For https://bugzilla.gnome.org/show_bug.cgi?id=689483
It seems that some functions do use the const qualifier for some of their
arguments, but a lot of functions don't use it and probably should.
So I created a patch against 2.9.0 that makes as much as possible const in
tree.h, and changed other files as needed.
There were a lot of cases like "const xmlNodePtr node". This doesn't actually
do anything: there the *pointer* is constant, not the object it points to. So I
changed those to "const xmlNode *node".
I also removed some consts, mostly in the Copy functions, because those
functions can actually modify the doc or node they copy from.
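A small sketch of the difference, using simplified stand-ins for libxml2's `xmlNode`/`xmlNodePtr` typedef pattern (the names `node`, `nodePtr`, `touch` and `get_line` are hypothetical). `const` applied to a pointer typedef binds to the pointer, not to the object it points to.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the xmlNode / xmlNodePtr typedef pattern. */
typedef struct node {
    int line;
} node;
typedef node *nodePtr;

/* "const nodePtr n" means "node *const n": the pointer is const,
 * but the function may still modify the node it points to. */
static void touch(const nodePtr n)
{
    n->line = 42;        /* compiles: the pointee is not const */
    /* n = NULL; */      /* would not compile: the pointer is const */
}

/* "const node *n" makes the pointee const, which is what a
 * read-only accessor actually wants. */
static int get_line(const node *n)
{
    assert(n != NULL);
    /* n->line = 0; */   /* would not compile: the node is const */
    return n->line;
}
```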
For https://bugzilla.gnome.org/show_bug.cgi?id=705392
Cut out an unused block
https://bugzilla.gnome.org/show_bug.cgi?id=733900
For https://bugzilla.gnome.org/show_bug.cgi?id=733710
Reported by Gaurav but with slightly different fixes
Avoid freeing the currently set name until after having assigned the new name,
this allows one to call xmlNodeSetName (node, node->name + 1) to set the new
name of the node to a substring of the current name without introducing any
crash and without requiring an extra strdup().
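The ordering can be sketched like this, with a hypothetical node type that owns a heap-allocated name (libxml2 itself may also intern names in a dictionary, which this sketch ignores). The new value is copied first and the old buffer is freed last, so passing `node->name + 1` is safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node owning a heap-allocated name. */
struct named_node {
    char *name;
};

/* Sets node->name to a copy of `name`. The old name is freed only after
 * the copy exists, so `name` may point into node->name itself
 * (e.g. node->name + 1). Returns 0 on success, -1 on allocation failure. */
static int set_name(struct named_node *node, const char *name)
{
    char *old;
    char *copy;

    assert(node != NULL && name != NULL);
    copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);

    old = node->name;
    node->name = copy;
    free(old);           /* safe now: `name` is no longer read */
    return 0;
}
```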
Raised by Blasius Bieselbert on IRC
xinclude needs xmlAddNextSibling().
Compile out use of xmlLocationSetPtr when xptr is disabled.
Include xpath header.
Fix compilation with minimum and legacy.
Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
For https://bugzilla.gnome.org/show_bug.cgi?id=708681
https://bugzilla.gnome.org/show_bug.cgi?id=707750
Also reported by Gaurav, simple fix to check the pointer before
dereferencing it
An improvement of the documentation, and an extra safety check
for xmlSetNs()
Handle special cases of &{...} constructs as hinted in the spec
http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and special values such as comments <!-- ... --> used for server-side includes.
This is limited to attribute values in HTML content.
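As an illustration of the two cases, a hypothetical helper that recognizes an attribute value shaped like an HTML 4.01 script macro (`&{ ... };`) or a server-side include comment (`<!-- ... -->`). This is only a sketch under those assumptions; libxml2's actual check may match these forms differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if an attribute value looks like a
 * script macro ("&{ ... };") or an SSI comment ("<!-- ... -->"),
 * i.e. a value a serializer might pass through without escaping;
 * returns 0 otherwise. */
static int is_special_attr_value(const char *value)
{
    size_t len;

    assert(value != NULL);
    len = strlen(value);
    if (len >= 4 && strncmp(value, "&{", 2) == 0 &&
        strcmp(value + len - 2, "};") == 0)
        return 1;
    if (len >= 7 && strncmp(value, "<!--", 4) == 0 &&
        strcmp(value + len - 3, "-->") == 0)
        return 1;
    return 0;
}
```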
Fix the lack of line number as reported by Johan Corveleyn <jcorvel@gmail.com>
* parser.c include/libxml/parser.h: add an XML_PARSE_BIG_LINES parser
option not switched on by default, it's an opt-in
* SAX2.c: if XML_PARSE_BIG_LINES is set store the long line numbers
in the psvi field of text nodes
* tree.c: expand xmlGetLineNo to extract that information, also
make sure we can't fail on recursive behaviour
* error.c: in __xmlRaiseError, if a node is provided, call
xmlGetLineNo() if we can't get a valid line number.
* xmllint.c: switch on XML_PARSE_BIG_LINES in xmllint
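A simplified sketch of the storage scheme, assuming (as in libxml2's `xmlNode`) a 16-bit `line` field and a spare pointer-sized field standing in for `psvi`; the struct, the `LINE_SATURATED` constant and the helper names are hypothetical. Lines that don't fit are saturated at 65535 and, when the opt-in is on, the full value is kept in the spare field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified text node: a 16-bit line field plus a spare
 * pointer-sized field standing in for psvi. */
struct text_node {
    unsigned short line;
    void *psvi;
};

#define LINE_SATURATED 65535u

/* Records a line number; values that don't fit in 16 bits are
 * saturated, and kept in the spare field if big_lines is enabled. */
static void set_line(struct text_node *n, long line, int big_lines)
{
    assert(n != NULL && line >= 0);
    if (line < (long) LINE_SATURATED) {
        n->line = (unsigned short) line;
    } else {
        n->line = (unsigned short) LINE_SATURATED;
        if (big_lines)
            n->psvi = (void *) (ptrdiff_t) line;
    }
}

/* Reads the line back, preferring the full value when it was stored. */
static long get_line(const struct text_node *n)
{
    if (n->line == LINE_SATURATED && n->psvi != NULL)
        return (long) (ptrdiff_t) n->psvi;
    return n->line;
}
```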
Various cleanups
* configure.in: force regeneration of APIs in my environment
* buf.c buf.h enc.h encoding.c include/libxml/tree.h
include/libxml/xmlerror.h save.h tree.c: various comment cleanups
pointed by apibuild
* doc/apibuild.py: added the 3 new internal headers in the excludes
* doc/libxml2-api.xml doc/libxml2-refs.xml: regenerated the API
* doc/symbols.xml: listing new entry points for 2.9.0
* doc/devhelp/*: regenerated
Specifically checking against namespace nodes before accessing node
pointers
Mostly an optimization to avoid xmlBuffer->xmlBuf conversions
and use the new code.
* include/libxml/tree.h: adds xmlBufGetNodeContent and xmlBufNodeDump
as xmlBuf based equivalents of xmlNodeGetContent and xmlNodeDump
* tree.c: implements one new routine and converts xmlNodeBufGetContent
to use the xmlBuf equivalent. It should behave better as a result
in case of data larger than 2GB.
* testapi.c: regenerated and covering new APIs
* tree.c: xmlBufferDetach can't work on immutable buffers
* xzlib.c: fix a deallocation error
* tree.c: missing documentation for xmlBufferDetach
* doc/symbols.xml: add two new symbols xmlTextReaderRelaxNGValidateCtxt
and xmlBufferDetach
* doc/apibuild.py: ignore internal header xzlib.h
On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veillard@redhat.com> wrote:
> Hi Conrad,
>
> that's interesting ! I was initially afraid of a sudden explosion of
> memory allocations for building a tree since by default buffers tend to
> "waste" memory by using doubling allocations, but that's not the case.
> xmllint --noout doc/libxml2-api.xml
> when compiled with memory debug produce
>
> paphio:~/XML -> cat .memdump
> MEMORY ALLOCATED : 0, MAX was 12756699
>
> and without your patch 12755657, i.e. the increase is minimal.
Heh, I thought that too. Actually you're looking at the result with XML_ALLOC_EXACT! This
is because EXACT adds 10 bytes "spare" on each alloc, and that interestingly wastes about the
same amount of space as XML_ALLOC_DOUBLEIT on this example (see below).
So it turns out that the default realloc() on my system actually handles this case really
well — and I guess that all the time in xmlRealloc() was actually in xmlStrlen, not the
underlying realloc() after all (sorry for misleading you). If you replace the realloc()
with a bad one (like valgrind's), then the performance degrades severely.
This patch implements a HYBRID allocator which has the behaviour you describe (it's
like EXACT to start with, though without the spare 10 bytes; and switches to DOUBLEIT
after 4kb) — that gets the memory back down to 12755657, with no noticeable impact on the
performance of the synthetic pathological example under valgrind.
In summary:
max_memory on ./xmllint --noout doc/libxml2-api.xml,
valgrind time on https://gist.github.com/2656940
strategy | max_memory | valgrind time
before   | 12755657   | 29:18.2
EXACT    | 12756699   | 2:58.6  <-- this is the state after the first patch.
DOUBLEIT | 12756727   | 0:02.7
HYBRID   | 12755754   | 0:02.7  <-- this is the state with both patches.
>
> There is also the cost of creating the buffers all the time.
> I need to read the code and check but I may be interested in an hybrid
> approach where we switch to buffer only when the text node starts to
> become too big (4k would remove nearly all usual types of "document"
> usage, i.e. not blocks of data)
I tried to avoid too much buffer creation by introducing the xmlBufferDetach function,
which allows re-using one buffer to construct many strings. It's maybe a bit of a "hack"
in API terms though I thought the gains would be worth it.
Conrad
------8<------
To keep memory usage tight in normal conditions it's desirable to only
allocate as much space as is needed. Unfortunately this can lead to
problems when constructing a long string out of small chunks, because
every chunk you add will need to resize the buffer.
To fix this XML_ALLOC_HYBRID will switch (when the buffer is 4kb big)
from using exact allocations to doubling buffer size every time it is
full. This limits the number of buffer resizes to O(log n) (down from
O(n)), and thus greatly increases the performance of constructing very
large strings in this manner.
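A minimal sketch of the HYBRID policy on a hypothetical string buffer (the struct, `strbuf_append` and the `resizes` counter are illustrative, not libxml2's `xmlBuffer` API). Allocations are exact until the capacity reaches 4096 bytes, after which the capacity doubles whenever it is exceeded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HYBRID_THRESHOLD 4096   /* switch point, per the description */

/* Hypothetical growable string buffer. */
struct strbuf {
    char *data;
    size_t len;       /* bytes used, excluding the terminating NUL */
    size_t cap;       /* bytes allocated */
    size_t resizes;   /* number of realloc() calls, for illustration */
};

/* Appends n bytes of s. Below the threshold, allocate exactly what is
 * needed; at or above it, double the capacity until the data fits.
 * Returns 0 on success, -1 on allocation failure. */
static int strbuf_append(struct strbuf *b, const char *s, size_t n)
{
    size_t need;

    assert(b != NULL && (s != NULL || n == 0));
    need = b->len + n + 1;
    if (need > b->cap) {
        size_t newcap;
        char *p;

        if (b->cap < HYBRID_THRESHOLD) {
            newcap = need;                   /* exact phase */
        } else {
            newcap = b->cap;
            while (newcap < need)
                newcap *= 2;                 /* doubling phase */
        }
        p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
        b->resizes++;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

Appending 100,000 chunks of 80 bytes this way needs fewer than a hundred `realloc()` calls, whereas an always-exact policy would resize on every append.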
Hi Veillard and all,
Firstly, thanks for libxml: it's awesome!
I noticed recently that libxml was taking a surprisingly long time to perform some
operations (many minutes instead of milliseconds), and so I did some digging. It turns out
that the problem was caused by the realloc()ing done in xmlNodeAddContentLen() which can
be called many (many) times when assigning some content into a node.
For background, I'm dealing with XML that contains emails; these can have large
attachments (~6MB) which are base-64 encoded, line-wrapped at 78 chars, and each line ends
with . This means that xmlNodeAddContentLen() is being called about 200,000 times,
and so there are 200,000 reallocs of a 6MB string, which takes a while... (I put a synthetic
example of this at https://gist.github.com/2656940)
The attached patch works around that problem by using the existing buffer API to merge the
strings together before even creating the text node; this keeps the number of realloc()s
at a manageable level.
I'd love feedback on the patch, and am happy to fix problems with it, or explore other
solutions if you think that this is barking up the wrong tree :).
Thanks,
Conrad
P.S. Should I create a bug for this too?
------8<------
Before this change xmlStringGetNodeList would perform a realloc() of the
entire new content for every XML entity in the assigned text in order to
merge together adjacent text nodes. This had the effect of making
xmlSetNodeContent O(n^2), which led to unexpectedly bad performance on
inputs that contained a large number of XML entities.
After this change the memory management is done by the buffer API,
avoiding the need to continually re-measure and realloc() the string.
For my test data (6MB of 80 character lines, each ending with )
this reduces the time taken by xmlSetNodeContent from about 500 seconds to
around 50ms. I have not profiled smaller cases, though I tried to
minimize the performance impact of my change by avoiding unnecessary
string copying.
Signed-off-by: Conrad Irwin <conrad.irwin@gmail.com>
For https://bugzilla.gnome.org/show_bug.cgi?id=630682
The python tests were reporting errors, some of it was due to
a small change in case encoding, but the main one was about
htmlSetMetaEncoding(doc, NULL) being broken: it no longer removed
the associated meta tag.
Just add one small sentence to the xmlUnlinkNode function comments
Usually the 'xml' namespace for the XML-1.0 declaration does not need
to be carried, but Mike Hommey raised the problem that the SVG
XSD file fails to parse due to mishandling of it.
- SAX2.c: failure to create a namespace should not be interpreted
as a memory allocation error
- tree.c: document better xmlNewNs behaviour, and fix it in the
case the 'xml' prefix is being used.
* tree.c: xmlPreviousElementSibling should look for preceding siblings,
never for following ones...
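A sketch of the intended direction of the walk, on a hypothetical, simplified node type (libxml2's real function works on `xmlNode` and may treat some node types differently). Only `prev` links are followed, skipping non-element siblings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node types and sibling links. */
enum node_type { ELEMENT_NODE = 1, TEXT_NODE = 3, COMMENT_NODE = 8 };

struct snode {
    enum node_type type;
    struct snode *prev;
    struct snode *next;
};

/* Returns the closest preceding sibling that is an element, or NULL.
 * The walk follows `prev` only; following siblings are never checked. */
static struct snode *previous_element_sibling(struct snode *node)
{
    assert(node != NULL);
    for (node = node->prev; node != NULL; node = node->prev) {
        if (node->type == ELEMENT_NODE)
            return node;
    }
    return NULL;
}
```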
* tree.c: reconcile namespace if not found
* tree.c: xmlNodeSetName: Remove const from declaration since it is
used non-const anyway. Remove unnecessary cast on xmlFree later on.
* SAX2.c dict.c error.c hash.c nanohttp.c parser.c python/libxml.c
relaxng.c runtest.c tree.c valid.c xinclude.c xmlregexp.c xmlsave.c
xmlschemas.c xpath.c xpointer.c: mostly removing unneeded assignments,
but this led to a few real bugs and some part not yet understood
(relaxng/interleave)
* encoding.c parser.c relaxng.c runsuite.c tree.c xmlreader.c
xmlschemas.c: nothing really serious but better safe than sorry
* tree.c: even on BSD the penalty hit is too high, so use
the doubling buffer size strategy on all arches, not just Windows.
* libxml2.syms: the symbols with history, going back to 2.4.30
* Makefile.am configure.in: linking flags detection and use
* parser.c tree.c valid.c xpointer.c: various cleanup of functions
which could be made static or simply discarded, not that many
* tree.c: copy attributes and namespaces for that kind of node
* tree.c: avoid calling xmlAddID with NULL values
* parser.c: add a few xmlInitParser in some entry points
* tree.c: add a missing check in xmlAddSibling, patch by Kris Breuker
* xmlIO.c: avoid xmlAllocOutputBuffer using XML_BUFFER_EXACT which
leads to performance problems, especially on Windows.
daniel
svn path=/trunk/; revision=3820