Things to try out when life permits
===================================

* zlib-based parsing/serialising of compressed in-memory data

  * requires a libxml2 I/O OutputBuffer with appropriate I/O functions
    that call into the zlib compression routines

* lzma-based parsing/serialising of compressed in-memory data

  * requires a libxml2 I/O OutputBuffer with appropriate I/O functions
    that call into the lzma compression routines

  * advantage over zlib: probably faster and better compression

  * maybe embed the lzma C sources in the distro
    http://www.7-zip.org/sdk.html

* parse-time validation against a user-provided DTD

  * currently only works for XML Schema

* support subclassing XSLTAccessControl to provide custom per-URL
  access check methods

  * maybe custom resolvers are enough, or can be combined with this?

* reimplement iterparse() using the libxml2 xmlReader API

  * Advantage: the implementation can be made safer than the current
    SAX implementation, as the parser would not interact with the
    Python-level tree.

  * Disadvantage: the tree has to be built manually. In the current
    SAX-based implementation, libxml2 does it for us.

* provide an HTMLParser wrapper that handles broken encodings in
  broken HTML better, e.g. using BeautifulSoup's "unicode dammit"
  analyser

* expose namespace prefixes through the QName class
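
For comparison with the first two items, here is a rough Python-level sketch of what compressed in-memory parsing/serialising looks like today. It is not the proposed design: it compresses and decompresses the whole buffer in one step, rather than streaming through a libxml2 I/O OutputBuffer. The helper names ``serialise_compressed`` and ``parse_compressed`` are hypothetical, and the stdlib ElementTree fallback is only there so the sketch runs without lxml installed.

```python
import lzma
import zlib

try:
    from lxml import etree  # preferred when lxml is installed
except ImportError:
    # Assumption: the stdlib module is an acceptable stand-in here, since
    # only fromstring() and tostring() are used.
    import xml.etree.ElementTree as etree


def serialise_compressed(root, codec=zlib):
    """Serialise an element to XML bytes, then compress the whole buffer."""
    return codec.compress(etree.tostring(root))


def parse_compressed(data, codec=zlib):
    """Decompress a complete buffer, then parse the resulting XML bytes."""
    return etree.fromstring(codec.decompress(data))


root = etree.fromstring(b"<root><a>text</a><b/></root>")

# Both stdlib modules expose compress()/decompress(), so they can be
# swapped in as the codec without changing the helpers.
for codec in (zlib, lzma):
    packed = serialise_compressed(root, codec)
    restored = parse_compressed(packed, codec)
    print(codec.__name__, len(packed), restored.tag, restored.find("a").text)
```

The intermediate uncompressed byte string is exactly the copy that an OutputBuffer hooked into the compression routines would avoid.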