1 files changed, 364 insertions, 0 deletions
diff --git a/APACHE_1_3_42/htdocs/manual/ebcdic.html b/APACHE_1_3_42/htdocs/manual/ebcdic.html
new file mode 100644
index 0000000000..7318f9f76a
--- /dev/null
+++ b/APACHE_1_3_42/htdocs/manual/ebcdic.html
@@ -0,0 +1,364 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta name="generator" content="HTML Tidy, see www.w3.org" />
+
+    <title>The Apache EBCDIC Port</title>
+  </head>
+  <!-- background white, links blue (unvisited), navy (visited), red (active) -->
+
+  <body bgcolor="#ffffff" text="#000000" link="#0000ff"
+  vlink="#000080" alink="#ff0000">
+    <!--#include virtual="header.html" -->
+
+    <h1 align="center">Overview of the Apache EBCDIC Port</h1>
+
+    <p>As of Version 1.3, the Apache HTTP Server includes a port to
+    (non-ASCII) mainframe machines which use the EBCDIC character
+    set as their native codeset.<br />
+     (Initially, that support covered only the Fujitsu-Siemens
+    family of mainframes running the <a
+    href="http://www.fujitsu-siemens.com/rl/products/software/bs2000bc.html">
+    BS2000/OSD operating system</a>, a mainframe OS which features
+    a SVR4-derived POSIX subsystem. Later, the two IBM mainframe
+    operating systems TPF and OS/390 were added).</p>
+    <hr />
+
+    <h2 align="center"><a id="ebcdic" name="ebcdic">EBCDIC-related
+    conversion functions</a></h2>
+    The EBCDIC related directives <a
+    href="mod/core.html#ebcdicconvert">EBCDICConvert</a>, <a
+    href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a>,
+    and <a href="mod/core.html#ebcdickludge">EBCDICKludge</a> are
+    available <b>only if the platform's character set is EBCDIC</b>
+    (This is currently only the case on Fujitsu-Siemens' BS2000/OSD
+    and IBM's OS/390 and TPF operating systems). EBCDIC stands for
+    <em>Extended Binary-Coded-Decimal Interchange Code</em> and is
+    the codeset used on mainframe machines, in contrast to ASCII
+    which is ubiquitous on almost all micro computers today. ASCII
+    (or its extension <em>latin1</em>) is the basis for the HTTP
+    transfer protocol, therefore all EBCDIC-based platforms need a
+    way to configure the code set conversion rules required between
+    the EBCDIC based mainframe host and the HTTP socket
+    protocol.<br />
+     
+
+    <p>On an EBCDIC based system, HTML files and other text files
+    are usually saved encoded in the native EBCDIC code set, while
+    image files and other binary data are stored with identical
+    encoding as on ASCII based machines. When the Apache server
+    accesses documents, it must therefore make a distinction
+    between text files (to be converted to/from ASCII, depending on
+    the transfer direction) and binary files (to be delivered
+    unconverted). Such a distinction can be made based on the
+    assigned MIME type, or based on the file extension
+    (<em>i.e.</em>, files sharing a common file suffix).</p>
+
+    <p>By default, the configuration is symmetric for input and
+    output (<em>i.e.</em>, when a PUT request is executed for a
+    document which was returned by a previous GET request, then the
+    resulting uploaded copy should be identical to the original
+    file). However, the conversion directives allow for specifying
+    different conversions for input and output.</p>
+
+    <p>The directives <a
+    href="mod/core.html#ebcdicconvert">EBCDICConvert</a> and <a
+    href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a>
+    are used to assign the conversion setting (On or Off) based on
+    file extensions or MIME types. Each configuration setting can
+    be defined for input only (<em>e.g.</em>, PUT method), output
+    only (<em>e.g.</em>, GET method), or both input and output. By
+    default, the conversion setting is applied for input and
+    output.</p>
+
+    <p>Note that after modifying the conversion settings for a
+    group of files, it is not sufficient to restart the server. The
+    reason for this is the fact that a cached copy of a document
+    (in a browser or proxy cache) will not get revalidated by
+    contents, but only by date. Since the modification time of the
+    document did not change, browsers will assume they can reuse
+    the cached copy.<br />
+     To recover from this situation, you must either clear all
+    cached copies (browser and proxy cache!), or update the
+    modification time of the documents (using the
+    <code>touch</code> command on the server).</p>
+
+    <p>Note also that server-parsed documents (CGI scripts, .shtml
+    files, and other interpreted files like PHP scripts etc.) are
+    not subject to any input conversion and must therefore be
+    stored in EBCDIC form on the server side.</p>
+
+    <p>In absense of any <a
+    href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a>
+    directive, and if no matching <a
+    href="mod/core.html#ebcdicconvert">EBCDICConvert</a> was found,
+    Apache falls back to an internal heuristic which assumes that
+    all documents with MIME types starting with
+    <samp>"text/"</samp>, <samp>"message/"</samp> or
+    <samp>"multipart/"</samp> as well as the MIME type
+    <samp>"application/x-www-form-urlencoded"</samp> are text
+    documents stored in EBCDIC, whereas all other documents are
+    binary files.</p>
+
+    <p>In order to provide backward compatibility with older
+    versions of apache, the <a
+    href="mod/core.html#ebcdickludge">EBCDICKludge</a> directive
+    allows for a less powerful mechanism to control the conversion
+    of documents to and from EBCDIC.</p>
+
+    <p><strong>Note</strong>:</p>
+
+    <blockquote>
+      The EBCDICKludge directive is deprecated, since its
+      functionality is superseded by the more powerful <a
+      href="mod/core.html#ebcdicconvert">EBCDICConvert</a> and <a
+      href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a>
+      directives.
+    </blockquote>
+    <br />
+     <br />
+     
+
+    <p>The directives are applied in the following order:</p>
+
+    <ol>
+      <li>First, the configured <a
+      href="mod/core.html#ebcdicconvert">EBCDICConvert</a>
+      directives in the current context are evaluated in
+      configuration file order. As soon as a matching file
+      extension is found, the search stops and the configured
+      conversion is applied.<br />
+       EBCDICConvert settings inherited from parent directories are
+      tested after the more specific (deeper) directory
+      levels.</li>
+
+      <li>If the <a
+      href="mod/core.html#ebcdickludge">EBCDICKludge</a> is in
+      effect, the next step tests for a MIME type of the format
+      <samp><i>type/</i><b>x-ascii-</b><i>subtype</i></samp>. If
+      the document has such a type, then the
+      <samp>"<b>x-ascii-</b>"</samp> substring is removed and the
+      conversion set to <samp>Off</samp>.</li>
+
+      <li>In the next step, the configured <a
+      href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a>
+      directives are evaluated in configuration file order. If the
+      document has a matching MIME type, the search stops and the
+      configured conversion is applied.<br />
+       EBCDICConvertByType settings inherited from parent
+      directories are tested after the more specific (deeper)
+      directory levels.<br />
+       If no <a
+      href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a>
+      directive at all exists in the current context, the server
+      falls back to the simple heuristics which assume that MIME
+      types starting with "text/", "message/" or "multipart/" (plus
+      the special type "application/x-www-form-urlencoded" used in
+      simple POST requests) imply a conversion, while all the rest
+      is delivered unconverted (<em>i.e.</em>, binary).</li>
+    </ol>
+    <br />
+     <br />
+     
+    <hr />
+
+    <h2 align="center"><a id="tech" name="tech">Technical
+    Details</a></h2>
+
+    <p>Since all Apache input and output is based upon the BUFF
+    data type and its methods, the easiest solution was to add the
+    actual conversion to the BUFF handling routines. The conversion
+    must be settable at any time, so BUFF flags were added which
+    define whether a BUFF object has currently enabled conversion
+    or not. Two such flags exist: one for data read from the client
+    (ASCII to EBCDIC conversion) and one for data returned to the
+    client (EBCDIC to ASCII conversion).</p>
+
+    <p>During sending of the header, Apache determines (based on
+    the returned MIME type for the request) whether conversion
+    should be used or the document returned unconverted. It uses
+    this decision to initialize the BUFF flag when the response
+    output begins. Modules should therefore determine the MIME type
+    for the current request before initiating the response by
+    calling ap_send_http_headers().</p>
+
+    <p>The BUFF flag is modified at several points in the HTTP
+    protocol:</p>
+
+    <ul>
+      <li><strong>set</strong> (In and Out) before a request is
+      received (because the request and the request header lines
+      are always in ASCII format)</li>
+
+      <li><strong>set/unset</strong> (for Input data) when the
+      request body is received - depending on the content type of
+      the request body (because the request body may contain ASCII
+      text or a binary file)</li>
+
+      <li><strong>set</strong> (for returned Output) before a
+      response header is sent (because the response header lines
+      are always in ASCII format)</li>
+
+      <li><strong>set/unset</strong> (for returned Output) when the
+      response body is sent - depending on the content type of the
+      response body (because the response body may contain text or
+      a binary file)</li>
+    </ul>
+    Additional transparent transitions may occur for
+    extracting/inserting the HTTP/1.1 chunking information
+    from/into the input/output body data stream, and for generating
+    <em>multipart</em> headers for <em>range</em> requests. (See
+    RFC2616 and src/main/http_protocol.c for details.) 
+    <hr />
+
+    <h2 align="center"><a id="port" name="port">Porting
+    Notes</a></h2>
+
+    <ol>
+      <li>
+        The relevant changes in the source are #ifdef'ed into two
+        categories: 
+
+        <dl>
+          <dt><code><strong>#ifdef
+          CHARSET_EBCDIC</strong></code></dt>
+
+          <dd>Code which is needed for any EBCDIC based machine.
+          This includes character translations, differences in
+          contiguity of the two character sets, flags which
+          indicate which part of the HTTP protocol has to be
+          converted and which part doesn't <em>etc.</em></dd>
+
+          <dt><code><strong>#ifdef _OSD_POSIX | TPF |
+          OS390</strong></code></dt>
+
+          <dd>Code which is needed for the Fujitsu-Siemens
+          BS2000/OSD | IBM TPF | IBM OS390 mainframe platforms
+          only. This deals with include file differences and socket
+          and fork implementation topics which are only required on
+          the respective platform.<br />
+          </dd>
+        </dl>
+      </li>
+
+      <li>The possibility to translate between ASCII and EBCDIC at
+      the socket level (on BS2000 POSIX, there is a socket option
+      which supports this) was intentionally <em>not</em> chosen,
+      because the byte stream at the HTTP protocol level consists
+      of a mixture of protocol related strings and non-protocol
+      related raw file data. HTTP protocol strings are always
+      encoded in ASCII (the GET request, any Header: lines, the
+      chunking information <em>etc.</em>) whereas the file transfer
+      parts (<em>i.e.</em>, GIF images, CGI output <em>etc.</em>)
+      should usually be just "passed through" by the server. This
+      separation between "protocol string" and "raw data" is
+      reflected in the server code by functions like bgets() or
+      rvputs() for strings, and functions like bwrite() for binary
+      data. A global translation of everything would therefore be
+      inadequate.<br />
+       (In the case of text files of course, provisions must be
+      made so that EBCDIC documents are always served in
+      ASCII)<br />
+       This port therefore features a built-in protocol level
+      conversion for the server-internal strings (which the
+      compiler translated to EBCDIC strings) and thus for all
+      server-generated documents.<br />
+      </li>
+
+      <li>By examining the call hierarchy for the BUFF management
+      routines, I added an "ebcdic/ascii conversion layer" which
+      would be crossed on every puts/write/get/gets, and conversion
+      flags which allowed enabling/disabling the conversions
+      on-the-fly. Usually, a document crosses this layer twice from
+      its origin source (a file or CGI output) to its destination
+      (the requesting client): <samp>file -&gt; Apache</samp>, and
+      <samp>Apache -&gt; client</samp>.<br />
+       The server can now read the header lines of a CGI-script
+      output in EBCDIC format, and then find out that the remainder
+      of the script's output is in ASCII (like in the case of the
+      output of a WWW Counter program: the document body contains a
+      GIF image). All header processing is done in the native
+      EBCDIC format; the server then determines, based on the type
+      of document being served, whether the document body (except
+      for the chunking information, of course) is in ASCII already
+      or must be converted from EBCDIC.<br />
+      </li>
+
+      <li>
+        By default, Apache assumes that documents with the MIME
+        types "text/*", "message/*", "multipart/*" and
+        "application/x-www-form-urlencoded" are text documents and
+        are stored as EBCDIC files, whereas all other files are
+        binary files (and stored in a byte-identical encoding as on
+        an ASCII machine).<br />
+         These defaults can be overridden on a <a
+        href="mod/core.html#ebcdicconvertbytype">by-MIME-type</a>
+        and/or <a
+        href="mod/core.html#ebcdicconvert">by-file-extension</a>
+        basis, using the directives 
+<pre>
+     <a
+href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a> {On|Off}[={In|Out|InOut}] <em>mimetype</em> [...]
+     <a
+href="mod/core.html#ebcdicconvert">EBCDICConvert</a>       {On|Off}[={In|Out|InOut}] <em>fileext</em> [...]
+   
+</pre>
+        where the <em>mimetype</em> argument may contain
+        wildcards.<br />
+      </li>
+
+      <li>Before adding the flexible conversion, non-text documents
+      were always served "binary" without conversion. This seemed
+      to be the most sensible choice for, .<em>e.g.</em>,
+      GIF/ZIP/AU file types (It of course requires the user to copy
+      them to the mainframe host using the "rcp -b" binary switch),
+      but proved to be inadequate for MIME types like
+      <samp>model/vrml</samp>, <samp>application/postscript</samp>
+      and <samp>application/x-javascript</samp>.<br />
+      </li>
+
+      <li>Server parsed files are always assumed to be in native
+      (<em>i.e.</em>, EBCDIC) format as used on the machine
+      (because they do not cross the conversion layer when being
+      read), and are converted after processing.<br />
+      </li>
+
+      <li>For CGI output, the CGI script determines whether a
+      conversion is needed or not: by setting the appropriate
+      Content-Type, text files can be converted, or GIF output can
+      be passed through unmodified (depending on the conversion
+      configured in the script's context).<br />
+      </li>
+    </ol>
+    <hr />
+
+    <h2 align="center"><a id="store" name="store">Document Storage
+    Notes</a></h2>
+
+    <h3 align="center">Binary Files</h3>
+
+    <p>When exchanging binary files between the mainframe host and
+    a Unix machine or Windows PC, be sure to use the ftp "binary"
+    (<samp>TYPE I</samp>) command, or use the
+    <samp>rcp&nbsp;-b</samp> command from the mainframe host (the
+    -b switch is not supported in unix rcp's).</p>
+
+    <h3 align="center">Text Documents</h3>
+
+    <p>The default assumption of the server is that Text Files
+    (<em>i.e.</em>, all files whose <samp>Content-Type:</samp>
+    starts with <samp>text/</samp>) are stored in the native
+    character set of the host, EBCDIC.</p>
+
+    <h3 align="center">Server Side Included Documents</h3>
+
+    <p>SSI documents must currently be stored in EBCDIC only. No
+    provision is made to convert them from ASCII before processing.
+    The same holds for other interpreted languages, like mod_perl
+    or mod_php.</p>
+    <!--#include virtual="footer.html" -->
+  </body>
+</html>
+