Diffstat (limited to 'APACHE_1_3_42/htdocs/manual/vhosts/details_1_2.html')
-rw-r--r-- | APACHE_1_3_42/htdocs/manual/vhosts/details_1_2.html | 386 |
1 files changed, 386 insertions, 0 deletions
diff --git a/APACHE_1_3_42/htdocs/manual/vhosts/details_1_2.html b/APACHE_1_3_42/htdocs/manual/vhosts/details_1_2.html new file mode 100644 index 0000000000..30cb9561be --- /dev/null +++ b/APACHE_1_3_42/htdocs/manual/vhosts/details_1_2.html @@ -0,0 +1,386 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> + +<html xmlns="http://www.w3.org/1999/xhtml"> + <head> + <meta name="generator" content="HTML Tidy, see www.w3.org" /> + + <title>An In-Depth Discussion of VirtualHost Matching</title> + </head> + <!-- Background white, links blue (unvisited), navy (visited), red (active) --> + + <body bgcolor="#FFFFFF" text="#000000" link="#0000FF" + vlink="#000080" alink="#FF0000"> + <!--#include virtual="header.html" --> + + <h1 align="CENTER">An In-Depth Discussion of VirtualHost + Matching</h1> + + <p>This is a very rough document that was probably out of date + the moment it was written. It attempts to explain exactly what + the code does when deciding what virtual host to serve a hit + from. It's provided on the assumption that something is better + than nothing. The server version under discussion is Apache + 1.2.</p> + + <p>If you just want to "make it work" without understanding + how, there's a <a href="#whatworks">What Works</a> section at + the bottom.</p> + + <h3>Config File Parsing</h3> + + <p>There is a main_server which consists of all the definitions + appearing outside of <code>VirtualHost</code> sections. 
There + are virtual servers, called <em>vhosts</em>, which are defined + by <a + href="../mod/core.html#virtualhost"><samp>VirtualHost</samp></a> + sections.</p> + + <p>The directives <a + href="../mod/core.html#port"><samp>Port</samp></a>, <a + href="../mod/core.html#servername"><samp>ServerName</samp></a>, + <a + href="../mod/core.html#serverpath"><samp>ServerPath</samp></a>, + and <a + href="../mod/core.html#serveralias"><samp>ServerAlias</samp></a> + can appear anywhere within the definition of a server. However, + each appearance overrides the previous appearance (within that + server).</p> + + <p>The default value of the <code>Port</code> field for + main_server is 80. The main_server has no default + <code>ServerName</code>, <code>ServerPath</code>, or + <code>ServerAlias</code>.</p> + + <p>In the absence of any <a + href="../mod/core.html#listen"><samp>Listen</samp></a> + directives, the (final if there are multiple) <code>Port</code> + directive in the main_server indicates which port httpd will + listen on.</p> + + <p>The <code>Port</code> and <code>ServerName</code> directives + for any server, main or virtual, are used when generating URLs, + such as during redirects.</p> + + <p>Each address appearing in the <code>VirtualHost</code> + directive can have an optional port. If the port is unspecified, + it defaults to the value of the main_server's most recent + <code>Port</code> statement. The special port <samp>*</samp> + indicates a wildcard that matches any port. Collectively, the + entire set of addresses (including multiple <samp>A</samp> + record results from DNS lookups) is called the vhost's + <em>address set</em>.</p> + + <p>The magic <code>_default_</code> address has significance + during the matching algorithm. 
It essentially matches any + unspecified address.</p> + + <p>After parsing the <code>VirtualHost</code> directive, the + vhost server is given a default <code>Port</code> equal to the + port assigned to the first name in its <code>VirtualHost</code> + directive. The complete list of names in the + <code>VirtualHost</code> directive is treated just like a + <code>ServerAlias</code> (but is not overridden by any + <code>ServerAlias</code> statement). Note that subsequent + <code>Port</code> statements for this vhost will not affect the + ports assigned in the address set.</p> + + <p>All vhosts are stored in a list, in the reverse of the + order in which they appeared in the config file. For example, if + the config file is:</p> + + <blockquote> +<pre> + <VirtualHost A> + ... + </VirtualHost> + + <VirtualHost B> + ... + </VirtualHost> + + <VirtualHost C> + ... + </VirtualHost> +</pre> + </blockquote> + Then the list will be ordered: main_server, C, B, A. Keep this + in mind. + + <p>After parsing has completed, the list of servers is scanned, + various merges are performed, and default values are set. In + particular:</p> + + <ol> + <li>If a vhost has no <a + href="../mod/core.html#serveradmin"><code>ServerAdmin</code></a>, + <a + href="../mod/core.html#resourceconfig"><code>ResourceConfig</code></a>, + <a + href="../mod/core.html#accessconfig"><code>AccessConfig</code></a>, + <a href="../mod/core.html#timeout"><code>Timeout</code></a>, + <a + href="../mod/core.html#keepalivetimeout"><code>KeepAliveTimeout</code></a>, + <a + href="../mod/core.html#keepalive"><code>KeepAlive</code></a>, + <a + href="../mod/core.html#maxkeepaliverequests"><code>MaxKeepAliveRequests</code></a>, + or <a + href="../mod/core.html#sendbuffersize"><code>SendBufferSize</code></a> + directive then the respective value is inherited from the + main_server. 
(That is, inherited from whatever the final + setting of that value is in the main_server.)</li> + + <li>The "lookup defaults" that define the default directory + permissions for a vhost are merged with those of the main + server. This includes any per-directory configuration + information for any module.</li> + + <li>The per-server configs for each module from the + main_server are merged into the vhost server.</li> + </ol> + Essentially, the main_server is treated as "defaults" or a + "base" on which to build each vhost. But the positioning of + these main_server definitions in the config file is largely + irrelevant -- the entire config of the main_server has been + parsed when this final merging occurs. So even if a main_server + definition appears after a vhost definition it might affect the + vhost definition. + + <p>If the main_server has no <code>ServerName</code> at this + point, then the hostname of the machine that httpd is running + on is used instead. We will call the <em>main_server address + set</em> those IP addresses returned by a DNS lookup on the + <code>ServerName</code> of the main_server.</p> + + <p>Now a pass is made through the vhosts to fill in any missing + <code>ServerName</code> fields and to classify the vhost as + either an <em>IP-based</em> vhost or a <em>name-based</em> + vhost. A vhost is considered a name-based vhost if any of its + address set overlaps the main_server (the port associated with + each address must match the main_server's <code>Port</code>). + Otherwise it is considered an IP-based vhost.</p> + + <p>For any undefined <code>ServerName</code> fields, a + name-based vhost defaults to the address given first in the + <code>VirtualHost</code> statement defining the vhost. Any + vhost that includes the magic <samp>_default_</samp> wildcard + is given the same <code>ServerName</code> as the main_server. 
+ Otherwise the vhost (which is necessarily an IP-based vhost) is + given a <code>ServerName</code> based on the result of a + reverse DNS lookup on the first address given in the + <code>VirtualHost</code> statement.</p> + + <h3>Vhost Matching</h3> + + <p><strong>Apache 1.3 differs from what is documented here, and + documentation still has to be written.</strong></p> + + <p>The server determines which vhost to use for a request as + follows:</p> + + <p><code>find_virtual_server</code>: When the connection is + first made by the client, the local IP address (the IP address + to which the client connected) is looked up in the server list. + A vhost is matched if it is an IP-based vhost, the IP address + matches and the port matches (taking into account + wildcards).</p> + + <p>If no vhosts are matched then the last occurrence, if it + appears, of a <samp>_default_</samp> address (which if you + recall the ordering of the server list mentioned above means + that this would be the first occurrence of + <samp>_default_</samp> in the config file) is matched.</p> + + <p>In any event, if nothing above has matched, then the + main_server is matched.</p> + + <p>The vhost resulting from the above search is stored with + data about the connection. We'll call this the <em>connection + vhost</em>. The connection vhost is constant over all requests + in a particular TCP/IP session -- that is, over all requests in + a KeepAlive/persistent session.</p> + + <p>For each request made on the connection the following + sequence of events further determines the actual vhost that + will be used to serve the request.</p> + + <p><code>check_fulluri</code>: If the requestURI is an + absoluteURI, that is it includes <code>http://hostname/</code>, + then an attempt is made to determine if the hostname's address + (and optional port) match that of the connection vhost. If it + does then the hostname portion of the URI is saved as the + <em>request_hostname</em>. 
If it does not match, then the URI + remains untouched. <strong>Note</strong>: to achieve this + address comparison, the hostname supplied goes through a DNS + lookup unless it matches the <code>ServerName</code> or the + local IP address of the client's socket.</p> + + <p><code>parse_uri</code>: If the URI begins with a protocol + (<em>i.e.</em>, <code>http:</code>, <code>ftp:</code>) then the + request is considered a proxy request. Note that even though we + may have stripped an <code>http://hostname/</code> in the + previous step, this could still be a proxy request.</p> + + <p><code>read_request</code>: If the request does not have a + hostname from the earlier step, then any <code>Host:</code> + header sent by the client is used as the request hostname.</p> + + <p><code>check_hostalias</code>: If the request now has a + hostname, then an attempt is made to match for this hostname. + The first step of this match is to compare any port, if one was + given in the request, against the <code>Port</code> field of + the connection vhost. If there's a mismatch then the vhost used + for the request is the connection vhost. (This is a bug, see + observations.)</p> + + <p>If the port matches, then httpd scans the list of vhosts + starting with the next server <strong>after</strong> the + connection vhost. This scan does not stop if there are any + matches, it goes through all possible vhosts, and in the end + uses the last match it found. 
The comparisons performed are as + follows:</p> + + <ul> + <li>Compare the request hostname:port with the vhost + <code>ServerName</code> and <code>Port</code>.</li> + + <li>Compare the request hostname against any and all + addresses given in the <code>VirtualHost</code> directive for + this vhost.</li> + + <li>Compare the request hostname against the + <code>ServerAlias</code> given for the vhost.</li> + </ul> + + <p><code>check_serverpath</code>: If the request has no + hostname (back up a few paragraphs) then a scan similar to the + one in <code>check_hostalias</code> is performed to match any + <code>ServerPath</code> directives given in the vhosts. Note + that the <strong>last match</strong> is used regardless (again + consider the ordering of the virtual hosts).</p> + + <h3>Observations</h3> + + <ul> + <li>It is difficult to define an IP-based vhost for the + machine's "main IP address". You essentially have to create a + bogus <code>ServerName</code> for the main_server that does + not match the machine's IPs.</li> + + <li> + During the scans in both <code>check_hostalias</code> and + <code>check_serverpath</code> no check is made that the + vhost being scanned is actually a name-based vhost. This + means, for example, that it's possible to match an IP-based + vhost through another address. But because the scan starts + in the vhost list at the first vhost that matched the local + IP address of the connection, not all IP-based vhosts can + be matched. + + <p>Consider the config file above with three vhosts A, B, + C. Suppose that B is a name-based vhost, and A and C are + IP-based vhosts. If a request comes in on B or C's address + containing a header "<samp>Host: A</samp>" then it will be + served from A's config. 
If a request comes in on A's + address then it will always be served from A's config + regardless of any Host: header.</p> + </li> + + <li> + Unless you have a <samp>_default_</samp> vhost, it doesn't + matter if you mix name-based vhosts in amongst IP-based + vhosts. During the <code>find_virtual_server</code> phase + above no name-based vhost will be matched, so the + main_server will remain the connection vhost. Then scans + will cover all vhosts in the vhost list. + + <p>If you do have a <samp>_default_</samp> vhost, then you + cannot place name-based vhosts after it in the config. + This is because on any connection to the main server IPs + the connection vhost will always be the + <samp>_default_</samp> vhost since none of the name-based + vhosts are considered during <code>find_virtual_server</code>.</p> + </li> + + <li>You should never specify DNS names in + <code>VirtualHost</code> directives because it will force + your server to rely on DNS to boot. Furthermore it poses a + security threat if you do not control the DNS for all the + domains listed. <a href="dns-caveats.html">There's more + information available on this and the next two + topics</a>.</li> + + <li><code>ServerName</code> should always be set for each + vhost. Otherwise a DNS lookup is required for each + vhost.</li> + + <li>A DNS lookup is always required for the main_server's + <code>ServerName</code> (or to generate that if it isn't + specified in the config).</li> + + <li>If a <code>ServerPath</code> directive exists which is a + prefix of another <code>ServerPath</code> directive that + appears later in the configuration file, then the former will + always be matched and the latter will never be matched. (That + is, assuming that no Host header was available to disambiguate + the two.)</li> + + <li>If a vhost that would otherwise be a name-based vhost includes + a <code>Port</code> statement that doesn't match the + main_server <code>Port</code> then it will be considered an + IP-based vhost. 
Then <code>find_virtual_server</code> will + match it (because the ports associated with each address in + the address set default to the port of the main_server) as + the connection vhost. Then <code>check_hostalias</code> will + refuse to check any other name-based vhost because of the + port mismatch. The result is that the vhost will steal all + hits going to the main_server address.</li> + + <li>If two IP-based vhosts have an address in common, the + vhost appearing later in the file is always matched. Such a + thing might happen inadvertently. If the config has + name-based vhosts and for some reason the main_server + <code>ServerName</code> resolves to the wrong address, then + all the name-based vhosts will be parsed as IP-based vhosts. + Then the last of them will steal all the hits.</li> + + <li>The last name-based vhost in the config is always matched + for any hit which doesn't match one of the other name-based + vhosts.</li> + </ul> + + <h3><a id="whatworks" name="whatworks">What Works</a></h3> + + <p>In addition to the tips on the <a + href="../dns-caveats.html#tips">DNS Issues</a> page, here are some + further tips:</p> + + <ul> + <li>Place all main_server definitions before any VirtualHost + definitions. (This is to aid the readability of the + configuration -- the post-config merging process makes it + non-obvious that definitions mixed in around virtualhosts + might affect all virtualhosts.)</li> + + <li>Arrange your VirtualHosts such that all name-based + virtual hosts come first, followed by IP-based virtual hosts, + followed by any <samp>_default_</samp> virtual host.</li> + + <li>Avoid <code>ServerPaths</code> which are prefixes of + other <code>ServerPaths</code>. 
If you cannot avoid this then + you have to ensure that the longer (more specific) prefix + vhost appears earlier in the configuration file than the + shorter (less specific) prefix (<em>e.g.</em>, "ServerPath + /abc" should appear after "ServerPath /abcdef").</li> + + <li>Do not use <em>port-based</em> vhosts in the same server + as name-based vhosts. A loose definition for port-based is a + vhost which is determined by the port on the server + (<em>e.g.</em>, one server with ports 8000, 8080, and 80 - + all of which have different configurations).</li> + </ul> + <!--#include virtual="footer.html" --> + </body> +</html> +
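For readers of this diff, the ordering recommended in the "What Works" section of the added page can be sketched as a configuration fragment. This is only an illustration under the Apache 1.2 behaviour the page describes, not part of the commit. The addresses (192.0.2.x), hostnames, and document roots are hypothetical. It also assumes that the main_server's ServerName resolves to 192.0.2.10, since that overlap is what makes the first two vhosts name-based.

```apache
# main_server definitions first, before any VirtualHost section.
# Assumption: www.example.com resolves to 192.0.2.10.
Port 80
ServerName www.example.com
DocumentRoot /www/main

# Name-based vhosts next (same address as the main_server).
# IP addresses are used instead of DNS names in <VirtualHost>,
# and ServerName is always set, to avoid DNS lookups at startup.
# The longer ServerPath (/abcdef) comes before its prefix (/abc).
<VirtualHost 192.0.2.10>
ServerName www.example.org
ServerPath /abcdef
DocumentRoot /www/example-org
</VirtualHost>

<VirtualHost 192.0.2.10>
ServerName www.example.net
ServerPath /abc
DocumentRoot /www/example-net
</VirtualHost>

# IP-based vhosts after the name-based ones (a different address).
<VirtualHost 192.0.2.20>
ServerName ip.example.com
DocumentRoot /www/ip-based
</VirtualHost>

# Any _default_ vhost last.
<VirtualHost _default_>
DocumentRoot /www/default
</VirtualHost>
```

Every vhost here uses the main_server's port, which follows the page's advice not to mix port-based vhosts with name-based ones.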