1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
|
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head><!--
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
This file is generated from xml source: DO NOT EDIT
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-->
<title>mod_xml2enc - Apache HTTP Server</title>
<link href="../style/css/manual.css" rel="stylesheet" media="all" type="text/css" title="Main stylesheet" />
<link href="../style/css/manual-loose-100pc.css" rel="alternate stylesheet" media="all" type="text/css" title="No Sidebar - Default font size" />
<link href="../style/css/manual-print.css" rel="stylesheet" media="print" type="text/css" /><link rel="stylesheet" type="text/css" href="../style/css/prettify.css" />
<script src="../style/scripts/prettify.js" type="text/javascript">
</script>
<link href="../images/favicon.ico" rel="shortcut icon" /></head>
<body>
<div id="page-header">
<p class="menu"><a href="../mod/">Modules</a> | <a href="../mod/directives.html">Directives</a> | <a href="http://wiki.apache.org/httpd/FAQ">FAQ</a> | <a href="../glossary.html">Glossary</a> | <a href="../sitemap.html">Sitemap</a></p>
<p class="apache">Apache HTTP Server Version 2.5</p>
<img alt="" src="../images/feather.gif" /></div>
<div class="up"><a href="./"><img title="<-" alt="<-" src="../images/left.gif" /></a></div>
<div id="path">
<a href="http://www.apache.org/">Apache</a> > <a href="http://httpd.apache.org/">HTTP Server</a> > <a href="http://httpd.apache.org/docs/">Documentation</a> > <a href="../">Version 2.5</a> > <a href="./">Modules</a></div>
<div id="page-content">
<div id="preamble"><h1>Apache Module mod_xml2enc</h1>
<div class="toplang">
<p><span>Available Languages: </span><a href="../en/mod/mod_xml2enc.html" title="English"> en </a></p>
</div>
<table class="module"><tr><th><a href="module-dict.html#Description">Description:</a></th><td>Enhanced charset/internationalisation support for libxml2-based
filter modules</td></tr>
<tr><th><a href="module-dict.html#Status">Status:</a></th><td>Base</td></tr>
<tr><th><a href="module-dict.html#ModuleIdentifier">Module Identifier:</a></th><td>xml2enc_module</td></tr>
<tr><th><a href="module-dict.html#SourceFile">Source File:</a></th><td>mod_xml2enc.c</td></tr>
<tr><th><a href="module-dict.html#Compatibility">Compatibility:</a></th><td>Version 2.4 and later. Available as a third-party module
for 2.2.x versions</td></tr></table>
<h3>Summary</h3>
<p>This module provides enhanced internationalisation support for
markup-aware filter modules such as <code class="module"><a href="../mod/mod_proxy_html.html">mod_proxy_html</a></code>.
It can automatically detect the encoding of input data and ensure
they are correctly processed by the <a href="http://xmlsoft.org/">libxml2</a> parser, including converting to Unicode (UTF-8) where
necessary. It can also convert data to an encoding of choice
after markup processing, and will ensure the correct <var>charset</var>
value is set in the HTTP <var>Content-Type</var> header.</p>
</div>
<div id="quickview"><h3 class="directives">Directives</h3>
<ul id="toc">
<li><img alt="" src="../images/down.gif" /> <a href="#xml2encalias">xml2EncAlias</a></li>
<li><img alt="" src="../images/down.gif" /> <a href="#xml2encdefault">xml2EncDefault</a></li>
<li><img alt="" src="../images/down.gif" /> <a href="#xml2startparse">xml2StartParse</a></li>
</ul>
<h3>Topics</h3>
<ul id="topics">
<li><img alt="" src="../images/down.gif" /> <a href="#usage">Usage</a></li>
<li><img alt="" src="../images/down.gif" /> <a href="#api">Programming API</a></li>
<li><img alt="" src="../images/down.gif" /> <a href="#sniffing">Detecting an Encoding</a></li>
<li><img alt="" src="../images/down.gif" /> <a href="#output">Output Encoding</a></li>
<li><img alt="" src="../images/down.gif" /> <a href="#alias">Unsupported Encodings</a></li>
</ul><ul class="seealso"><li><a href="#comments_section">Comments</a></li></ul></div>
<div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
<div class="section">
<h2><a name="usage" id="usage">Usage</a></h2>
<p>There are two usage scenarios: with modules programmed to work
with mod_xml2enc, and with those that are not aware of it:</p>
<dl>
<dt>Filter modules enabled for mod_xml2enc</dt><dd>
<p>Modules such as <code class="module"><a href="../mod/mod_proxy_html.html">mod_proxy_html</a></code> version 3.1
and up use the <code>xml2enc_charset</code> optional function to retrieve
the charset argument to pass to the libxml2 parser, and may use the
<code>xml2enc_filter</code> optional function to postprocess to another
encoding. Using mod_xml2enc with an enabled module, no configuration
is necessary: the other module will configure mod_xml2enc for you
(though you may still want to customise it using the configuration
directives below).</p>
</dd>
<dt>Non-enabled modules</dt><dd>
<p>To use it with a libxml2-based module that isn't explicitly enabled for
mod_xml2enc, you will have to configure the filter chain yourself.
So to use it with a filter foo provided by a module mod_foo to
improve the latter's i18n support with HTML and XML, you could use</p>
<pre><code>
FilterProvider iconv xml2enc Content-Type $text/html
FilterProvider iconv xml2enc Content-Type $xml
FilterProvider markup foo Content-Type $text/html
FilterProvider markup foo Content-Type $xml
FilterChain iconv markup
</code></pre>
<p>mod_foo will now support any character set supported by either
(or both) of libxml2 or apr_xlate/iconv.</p>
</dd></dl>
</div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
<div class="section">
<h2><a name="api" id="api">Programming API</a></h2>
<p>Programmers writing libxml2-based filter modules are encouraged to
enable them for mod_xml2enc, to provide strong i18n support for your
users without reinventing the wheel. The programming API is exposed in
<var>mod_xml2enc.h</var>, and a usage example is
<code class="module"><a href="../mod/mod_proxy_html.html">mod_proxy_html</a></code>.</p>
</div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
<div class="section">
<h2><a name="sniffing" id="sniffing">Detecting an Encoding</a></h2>
<p>Unlike <code class="module"><a href="../mod/mod_charset_lite.html">mod_charset_lite</a></code>, mod_xml2enc is designed
to work with data whose encoding cannot be known in advance and thus
configured. It therefore uses 'sniffing' techniques to detect the
encoding of HTTP data as follows:</p>
<ol>
<li>If the HTTP <var>Content-Type</var> header includes a
<var>charset</var> parameter, that is used.</li>
<li>If the data start with an XML Byte Order Mark (BOM) or an
XML encoding declaration, that is used.</li>
<li>If an encoding is declared in an HTML <code><META></code>
element, that is used.</li>
<li>If none of the above match, the default value set by
<code class="directive">xml2EncDefault</code> is used.</li>
</ol>
<p>The rules are applied in order. As soon as a match is found,
it is used and detection is stopped.</p>
</div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
<div class="section">
<h2><a name="output" id="output">Output Encoding</a></h2>
<p><a href="http://xmlsoft.org/">libxml2</a> always uses UTF-8 (Unicode)
internally, and libxml2-based filter modules will output that by default.
mod_xml2enc can change the output encoding through the API, but there
is currently no way to configure that directly.</p>
<p>Changing the output encoding should (in theory, at least) never be
necessary, and is not recommended due to the extra processing load on
the server of an unnecessary conversion.</p>
</div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
<div class="section">
<h2><a name="alias" id="alias">Unsupported Encodings</a></h2>
<p>If you are working with encodings that are not supported by any of
the conversion methods available on your platform, you can still alias
them to a supported encoding using <code class="directive">xml2EncAlias</code>.</p>
</div>
<div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
<div class="directive-section"><h2><a name="xml2EncAlias" id="xml2EncAlias">xml2EncAlias</a> <a name="xml2encalias" id="xml2encalias">Directive</a></h2>
<table class="directive">
<tr><th><a href="directive-dict.html#Description">Description:</a></th><td>Recognise Aliases for encoding values</td></tr>
<tr><th><a href="directive-dict.html#Syntax">Syntax:</a></th><td><code>xml2EncAlias <var>charset alias [alias ...]</var></code></td></tr>
<tr><th><a href="directive-dict.html#Context">Context:</a></th><td>server config</td></tr>
<tr><th><a href="directive-dict.html#Status">Status:</a></th><td>Base</td></tr>
<tr><th><a href="directive-dict.html#Module">Module:</a></th><td>mod_xml2enc</td></tr>
</table>
<p>This server-wide directive aliases one or more encoding to another
encoding. This enables encodings not recognised by libxml2 to be handled
internally by libxml2's encoding support using the translation table for
a recognised encoding. This serves two purposes: to support character sets
(or names) not recognised either by libxml2 or iconv, and to skip
conversion for an encoding where it is known to be unnecessary.</p>
</div>
<div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
<div class="directive-section"><h2><a name="xml2EncDefault" id="xml2EncDefault">xml2EncDefault</a> <a name="xml2encdefault" id="xml2encdefault">Directive</a></h2>
<table class="directive">
<tr><th><a href="directive-dict.html#Description">Description:</a></th><td>Sets a default encoding to assume when absolutely no information
can be <a href="#sniffing">automatically detected</a></td></tr>
<tr><th><a href="directive-dict.html#Syntax">Syntax:</a></th><td><code>xml2EncDefault <var>name</var></code></td></tr>
<tr><th><a href="directive-dict.html#Context">Context:</a></th><td>server config, virtual host, directory, .htaccess</td></tr>
<tr><th><a href="directive-dict.html#Status">Status:</a></th><td>Base</td></tr>
<tr><th><a href="directive-dict.html#Module">Module:</a></th><td>mod_xml2enc</td></tr>
<tr><th><a href="directive-dict.html#Compatibility">Compatibility:</a></th><td>Version 2.4.0 and later; available as a third-party
module for earlier versions.</td></tr>
</table>
<p>If you are processing data with known encoding but no encoding
information, you can set this default to help mod_xml2enc process
the data correctly. For example, to work with the default value
of Latin1 (<var>iso-8859-1</var> specified in HTTP/1.0, use</p>
<pre class="prettyprint lang-config">xml2EncDefault iso-8859-1</pre>
</div>
<div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
<div class="directive-section"><h2><a name="xml2StartParse" id="xml2StartParse">xml2StartParse</a> <a name="xml2startparse" id="xml2startparse">Directive</a></h2>
<table class="directive">
<tr><th><a href="directive-dict.html#Description">Description:</a></th><td>Advise the parser to skip leading junk.</td></tr>
<tr><th><a href="directive-dict.html#Syntax">Syntax:</a></th><td><code>xml2StartParse <var>element [element ...]</var></code></td></tr>
<tr><th><a href="directive-dict.html#Context">Context:</a></th><td>server config, virtual host, directory, .htaccess</td></tr>
<tr><th><a href="directive-dict.html#Status">Status:</a></th><td>Base</td></tr>
<tr><th><a href="directive-dict.html#Module">Module:</a></th><td>mod_xml2enc</td></tr>
</table>
<p>Specify that the markup parser should start at the first instance
of any of the elements specified. This can be used as a workaround
where a broken backend inserts leading junk that messes up the parser (<a href="http://bahumbug.wordpress.com/2006/10/12/mod_proxy_html-revisited/">example here</a>).</p>
<p>It should never be used for XML, nor well-formed HTML.</p>
</div>
</div>
<div class="bottomlang">
<p><span>Available Languages: </span><a href="../en/mod/mod_xml2enc.html" title="English"> en </a></p>
</div><div class="top"><a href="#page-header"><img src="../images/up.gif" alt="top" /></a></div><div class="section"><h2><a id="comments_section" name="comments_section">Comments</a></h2><div class="warning"><strong>This section is experimental!</strong><br />Comments placed here should not be expected
to last beyond the testing phase of this system, nor do we in any way guarantee that we'll read them.</div>
<script type="text/javascript"><!--//--><![CDATA[//><!--
var disqus_shortname = 'httpd';
var disqus_identifier = 'http://httpd.apache.org/docs/2.4/mod/mod_xml2enc.html.en';
(function(w, d) {
if (w.location.hostname.toLowerCase() == "httpd.apache.org") {
d.write('<div id="disqus_thread"><\/div>');
var s = d.createElement('script');
s.type = 'text/javascript';
s.async = true;
s.src = 'http' + '://' + disqus_shortname + '.disqus.com/embed.js';
(d.getElementsByTagName('head')[0] || d.getElementsByTagName('body')[0]).appendChild(s);
}
else {
d.write('<div id="disqus_thread">Comments have been disabled for offline viewing.<\/div>');
}
})(window, document);
//--><!]]></script></div><div id="footer">
<p class="apache">Copyright 2012 The Apache Software Foundation.<br />Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.</p>
<p class="menu"><a href="../mod/">Modules</a> | <a href="../mod/directives.html">Directives</a> | <a href="http://wiki.apache.org/httpd/FAQ">FAQ</a> | <a href="../glossary.html">Glossary</a> | <a href="../sitemap.html">Sitemap</a></p></div><script type="text/javascript"><!--//--><![CDATA[//><!--
if (typeof(prettyPrint) !== 'undefined') {
prettyPrint();
}
//--><!]]></script>
</body></html>
|