diff options
Diffstat (limited to 'Doc/library/xml.sax.handler.rst')
-rw-r--r-- | Doc/library/xml.sax.handler.rst | 402 |
1 files changed, 402 insertions, 0 deletions
diff --git a/Doc/library/xml.sax.handler.rst b/Doc/library/xml.sax.handler.rst new file mode 100644 index 0000000000..bc287d1337 --- /dev/null +++ b/Doc/library/xml.sax.handler.rst @@ -0,0 +1,402 @@ + +:mod:`xml.sax.handler` --- Base classes for SAX handlers +======================================================== + +.. module:: xml.sax.handler + :synopsis: Base classes for SAX event handlers. +.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no> +.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de> + + +.. versionadded:: 2.0 + +The SAX API defines four kinds of handlers: content handlers, DTD handlers, +error handlers, and entity resolvers. Applications normally only need to +implement those interfaces whose events they are interested in; they can +implement the interfaces in a single object or in multiple objects. Handler +implementations should inherit from the base classes provided in the module +:mod:`xml.sax.handler`, so that all methods get default implementations. + + +.. class:: ContentHandler + + This is the main callback interface in SAX, and the one most important to + applications. The order of events in this interface mirrors the order of the + information in the document. + + +.. class:: DTDHandler + + Handle DTD events. + + This interface specifies only those DTD events required for basic parsing + (unparsed entities and attributes). + + +.. class:: EntityResolver + + Basic interface for resolving entities. If you create an object implementing + this interface, then register the object with your Parser, the parser will call + the method in your object to resolve all external entities. + + +.. class:: ErrorHandler + + Interface used by the parser to present error and warning messages to the + application. The methods of this object control whether errors are immediately + converted to exceptions or are handled in some other way. + +In addition to these classes, :mod:`xml.sax.handler` provides symbolic constants +for the feature and property names. + + +.. data:: feature_namespaces + + Value: ``"http://xml.org/sax/features/namespaces"`` --- true: Perform Namespace + processing. --- false: Optionally do not perform Namespace processing (implies + namespace-prefixes; default). --- access: (parsing) read-only; (not parsing) + read/write + + +.. data:: feature_namespace_prefixes + + Value: ``"http://xml.org/sax/features/namespace-prefixes"`` --- true: Report + the original prefixed names and attributes used for Namespace + declarations. --- false: Do not report attributes used for Namespace + declarations, and optionally do not report original prefixed names + (default). --- access: (parsing) read-only; (not parsing) read/write + + +.. data:: feature_string_interning + + Value: ``"http://xml.org/sax/features/string-interning"`` --- true: All element + names, prefixes, attribute names, Namespace URIs, and local names are interned + using the built-in intern function. --- false: Names are not necessarily + interned, although they may be (default). --- access: (parsing) read-only; (not + parsing) read/write + + +.. data:: feature_validation + + Value: ``"http://xml.org/sax/features/validation"`` --- true: Report all + validation errors (implies external-general-entities and + external-parameter-entities). --- false: Do not report validation errors. --- + access: (parsing) read-only; (not parsing) read/write + + +.. data:: feature_external_ges + + Value: ``"http://xml.org/sax/features/external-general-entities"`` --- true: + Include all external general (text) entities. --- false: Do not include + external general entities. --- access: (parsing) read-only; (not parsing) + read/write + + +.. data:: feature_external_pes + + Value: ``"http://xml.org/sax/features/external-parameter-entities"`` --- true: + Include all external parameter entities, including the external DTD subset. --- + false: Do not include any external parameter entities, even the external DTD + subset. --- access: (parsing) read-only; (not parsing) read/write + + +.. data:: all_features + + List of all features. + + +.. data:: property_lexical_handler + + Value: ``"http://xml.org/sax/properties/lexical-handler"`` --- data type: + xml.sax.sax2lib.LexicalHandler (not supported in Python 2) --- description: An + optional extension handler for lexical events like comments. --- access: + read/write + + +.. data:: property_declaration_handler + + Value: ``"http://xml.org/sax/properties/declaration-handler"`` --- data type: + xml.sax.sax2lib.DeclHandler (not supported in Python 2) --- description: An + optional extension handler for DTD-related events other than notations and + unparsed entities. --- access: read/write + + +.. data:: property_dom_node + + Value: ``"http://xml.org/sax/properties/dom-node"`` --- data type: + org.w3c.dom.Node (not supported in Python 2) --- description: When parsing, + the current DOM node being visited if this is a DOM iterator; when not parsing, + the root DOM node for iteration. --- access: (parsing) read-only; (not parsing) + read/write + + +.. data:: property_xml_string + + Value: ``"http://xml.org/sax/properties/xml-string"`` --- data type: String --- + description: The literal string of characters that was the source for the + current event. --- access: read-only + + +.. data:: all_properties + + List of all known property names. + + +.. _content-handler-objects: + +ContentHandler Objects +---------------------- + +Users are expected to subclass :class:`ContentHandler` to support their +application. The following methods are called by the parser on the appropriate +events in the input document: + + +.. method:: ContentHandler.setDocumentLocator(locator) + + Called by the parser to give the application a locator for locating the origin + of document events. + + SAX parsers are strongly encouraged (though not absolutely required) to supply a + locator: if it does so, it must supply the locator to the application by + invoking this method before invoking any of the other methods in the + DocumentHandler interface. + + The locator allows the application to determine the end position of any + document-related event, even if the parser is not reporting an error. Typically, + the application will use this information for reporting its own errors (such as + character content that does not match an application's business rules). The + information returned by the locator is probably not sufficient for use with a + search engine. + + Note that the locator will return correct information only during the invocation + of the events in this interface. The application should not attempt to use it at + any other time. + + +.. method:: ContentHandler.startDocument() + + Receive notification of the beginning of a document. + + The SAX parser will invoke this method only once, before any other methods in + this interface or in DTDHandler (except for :meth:`setDocumentLocator`). + + +.. method:: ContentHandler.endDocument() + + Receive notification of the end of a document. + + The SAX parser will invoke this method only once, and it will be the last method + invoked during the parse. The parser shall not invoke this method until it has + either abandoned parsing (because of an unrecoverable error) or reached the end + of input. + + +.. method:: ContentHandler.startPrefixMapping(prefix, uri) + + Begin the scope of a prefix-URI Namespace mapping. + + The information from this event is not necessary for normal Namespace + processing: the SAX XML reader will automatically replace prefixes for element + and attribute names when the ``feature_namespaces`` feature is enabled (the + default). + + There are cases, however, when applications need to use prefixes in character + data or in attribute values, where they cannot safely be expanded automatically; + the :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events supply the + information to the application to expand prefixes in those contexts itself, if + necessary. + + .. % XXX This is not really the default, is it? MvL + + Note that :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events are not + guaranteed to be properly nested relative to each-other: all + :meth:`startPrefixMapping` events will occur before the corresponding + :meth:`startElement` event, and all :meth:`endPrefixMapping` events will occur + after the corresponding :meth:`endElement` event, but their order is not + guaranteed. + + +.. method:: ContentHandler.endPrefixMapping(prefix) + + End the scope of a prefix-URI mapping. + + See :meth:`startPrefixMapping` for details. This event will always occur after + the corresponding :meth:`endElement` event, but the order of + :meth:`endPrefixMapping` events is not otherwise guaranteed. + + +.. method:: ContentHandler.startElement(name, attrs) + + Signals the start of an element in non-namespace mode. + + The *name* parameter contains the raw XML 1.0 name of the element type as a + string and the *attrs* parameter holds an object of the :class:`Attributes` + interface (see :ref:`attributes-objects`) containing the attributes of + the element. The object passed as *attrs* may be re-used by the parser; holding + on to a reference to it is not a reliable way to keep a copy of the attributes. + To keep a copy of the attributes, use the :meth:`copy` method of the *attrs* + object. + + +.. method:: ContentHandler.endElement(name) + + Signals the end of an element in non-namespace mode. + + The *name* parameter contains the name of the element type, just as with the + :meth:`startElement` event. + + +.. method:: ContentHandler.startElementNS(name, qname, attrs) + + Signals the start of an element in namespace mode. + + The *name* parameter contains the name of the element type as a ``(uri, + localname)`` tuple, the *qname* parameter contains the raw XML 1.0 name used in + the source document, and the *attrs* parameter holds an instance of the + :class:`AttributesNS` interface (see :ref:`attributes-ns-objects`) + containing the attributes of the element. If no namespace is associated with + the element, the *uri* component of *name* will be ``None``. The object passed + as *attrs* may be re-used by the parser; holding on to a reference to it is not + a reliable way to keep a copy of the attributes. To keep a copy of the + attributes, use the :meth:`copy` method of the *attrs* object. + + Parsers may set the *qname* parameter to ``None``, unless the + ``feature_namespace_prefixes`` feature is activated. + + +.. method:: ContentHandler.endElementNS(name, qname) + + Signals the end of an element in namespace mode. + + The *name* parameter contains the name of the element type, just as with the + :meth:`startElementNS` method, likewise the *qname* parameter. + + +.. method:: ContentHandler.characters(content) + + Receive notification of character data. + + The Parser will call this method to report each chunk of character data. SAX + parsers may return all contiguous character data in a single chunk, or they may + split it into several chunks; however, all of the characters in any single event + must come from the same external entity so that the Locator provides useful + information. + + *content* may be a Unicode string or a byte string; the ``expat`` reader module + produces always Unicode strings. + + .. note:: + + The earlier SAX 1 interface provided by the Python XML Special Interest Group + used a more Java-like interface for this method. Since most parsers used from + Python did not take advantage of the older interface, the simpler signature was + chosen to replace it. To convert old code to the new interface, use *content* + instead of slicing content with the old *offset* and *length* parameters. + + +.. method:: ContentHandler.ignorableWhitespace(whitespace) + + Receive notification of ignorable whitespace in element content. + + Validating Parsers must use this method to report each chunk of ignorable + whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating + parsers may also use this method if they are capable of parsing and using + content models. + + SAX parsers may return all contiguous whitespace in a single chunk, or they may + split it into several chunks; however, all of the characters in any single event + must come from the same external entity, so that the Locator provides useful + information. + + +.. method:: ContentHandler.processingInstruction(target, data) + + Receive notification of a processing instruction. + + The Parser will invoke this method once for each processing instruction found: + note that processing instructions may occur before or after the main document + element. + + A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a + text declaration (XML 1.0, section 4.3.1) using this method. + + +.. method:: ContentHandler.skippedEntity(name) + + Receive notification of a skipped entity. + + The Parser will invoke this method once for each entity skipped. Non-validating + processors may skip entities if they have not seen the declarations (because, + for example, the entity was declared in an external DTD subset). All processors + may skip external entities, depending on the values of the + ``feature_external_ges`` and the ``feature_external_pes`` properties. + + +.. _dtd-handler-objects: + +DTDHandler Objects +------------------ + +:class:`DTDHandler` instances provide the following methods: + + +.. method:: DTDHandler.notationDecl(name, publicId, systemId) + + Handle a notation declaration event. + + +.. method:: DTDHandler.unparsedEntityDecl(name, publicId, systemId, ndata) + + Handle an unparsed entity declaration event. + + +.. _entity-resolver-objects: + +EntityResolver Objects +---------------------- + + +.. method:: EntityResolver.resolveEntity(publicId, systemId) + + Resolve the system identifier of an entity and return either the system + identifier to read from as a string, or an InputSource to read from. The default + implementation returns *systemId*. + + +.. _sax-error-handler: + +ErrorHandler Objects +-------------------- + +Objects with this interface are used to receive error and warning information +from the :class:`XMLReader`. If you create an object that implements this +interface, then register the object with your :class:`XMLReader`, the parser +will call the methods in your object to report all warnings and errors. There +are three levels of errors available: warnings, (possibly) recoverable errors, +and unrecoverable errors. All methods take a :exc:`SAXParseException` as the +only parameter. Errors and warnings may be converted to an exception by raising +the passed-in exception object. + + +.. method:: ErrorHandler.error(exception) + + Called when the parser encounters a recoverable error. If this method does not + raise an exception, parsing may continue, but further document information + should not be expected by the application. Allowing the parser to continue may + allow additional errors to be discovered in the input document. + + +.. method:: ErrorHandler.fatalError(exception) + + Called when the parser encounters an error it cannot recover from; parsing is + expected to terminate when this method returns. + + +.. method:: ErrorHandler.warning(exception) + + Called when the parser presents minor warning information to the application. + Parsing is expected to continue when this method returns, and document + information will continue to be passed to the application. Raising an exception + in this method will cause parsing to end. + |