author Martin Kraemer <martin@apache.org> 1997-11-12 13:37:54 +0000
committer Martin Kraemer <martin@apache.org> 1997-11-12 13:37:54 +0000
commit 7bd9637c8647f2d497b251ddd1af0735d55d9fc3
tree eefae3d974082c9ac3688f57e0248597924339be /docs/manual/vhosts/vhosts-in-depth.html
parent eff734582b272a5c5be7a148a19a705b7d4face9
Citing Lars:
Hi, the attachment includes a reworked Apache manual with the new virtual host documentation. As Dean suggested I created a new directory named 'vhosts' and moved the updated vhosts-in-depth etc. documents into the new directory, renamed them and updated all other documents which refered to the old docs (at least I tried to find all documents...).

Submitted by: Lars Eilebrecht <sfx@unix-ag.org>
Reviewed by: Martin Kraemer

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@79576 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'docs/manual/vhosts/vhosts-in-depth.html')
-rw-r--r-- docs/manual/vhosts/vhosts-in-depth.html 396
1 files changed, 396 insertions, 0 deletions
diff --git a/docs/manual/vhosts/vhosts-in-depth.html b/docs/manual/vhosts/vhosts-in-depth.html
new file mode 100644
index 0000000000..d2339bff81
--- /dev/null
+++ b/docs/manual/vhosts/vhosts-in-depth.html
@@ -0,0 +1,396 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
+<html><head>
+<title>An In-Depth Discussion of VirtualHost Matching</title>
+</head>
+
+<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
+<BODY
+ BGCOLOR="#FFFFFF"
+ TEXT="#000000"
+ LINK="#0000FF"
+ VLINK="#000080"
+ ALINK="#FF0000"
+>
+<!--#include virtual="header.html" -->
+<h1 ALIGN="CENTER">An In-Depth Discussion of VirtualHost Matching</h1>
+
+<p>This is a very rough document that was probably out of date the moment
+it was written. It attempts to explain exactly what the code does when
+deciding what virtual host to serve a hit from. It's provided on the
+assumption that something is better than nothing. The server version
+under discussion is Apache 1.2.
+
+<p>If you just want to &quot;make it work&quot; without understanding
+how, there's a <a href="#whatworks">What Works</a> section at the bottom.
+
+<h3>Config File Parsing</h3>
+
+<p>There is a main_server which consists of all the definitions appearing
+outside of <CODE>VirtualHost</CODE> sections. There are virtual servers,
+called <EM>vhosts</EM>, which are defined by
+<A
+ HREF="mod/core.html#virtualhost"
+><SAMP>VirtualHost</SAMP></A>
+sections.
+
+<p>The directives
+<A
+ HREF="mod/core.html#port"
+><SAMP>Port</SAMP></A>,
+<A
+ HREF="mod/core.html#servername"
+><SAMP>ServerName</SAMP></A>,
+<A
+ HREF="mod/core.html#serverpath"
+><SAMP>ServerPath</SAMP></A>,
+and
+<A
+ HREF="mod/core.html#serveralias"
+><SAMP>ServerAlias</SAMP></A>
+can appear anywhere within the definition of
+a server. However, each appearance overrides the previous appearance
+(within that server).
+
+<p>The default value of the <code>Port</code> field for main_server
+is 80. The main_server has no default <code>ServerName</code>,
+<code>ServerPath</code>, or <code>ServerAlias</code>.
+
+<p>In the absence of any
+<A
+ HREF="mod/core.html#listen"
+><SAMP>Listen</SAMP></A>
+directives, the (final if there
+are multiple) <code>Port</code> directive in the main_server indicates
+which port httpd will listen on.
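+
+<p>As a rough sketch of the two rules above (the port numbers and host name
+are illustrative assumptions, not taken from a real configuration), consider
+a main_server that sets <code>Port</code> twice and has no
+<code>Listen</code> directive:
+
+<blockquote><pre>
+    Port 8000
+    ServerName www.example.com
+    Port 8080
+</pre></blockquote>
+
+Here the second <code>Port</code> overrides the first, so httpd would be
+expected to listen on port 8080.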
+
+<p> The <code>Port</code> and <code>ServerName</code> directives for
+any server, main or virtual, are used when generating URLs, such as during
+redirects.
+
+<p> Each address appearing in the <code>VirtualHost</code> directive
+can have an optional port. If the port is unspecified it defaults to
+the value of the main_server's most recent <code>Port</code> statement.
+The special port <SAMP>*</SAMP> indicates a wildcard that matches any port.
+Collectively, the entire set of addresses (including multiple
+<SAMP>A</SAMP> record
+results from DNS lookups) is called the vhost's <EM>address set</EM>.
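+
+<p>To make the address set concrete, here is a hypothetical
+<code>VirtualHost</code> section (the addresses are placeholders from a
+documentation range, and the main_server is assumed to use
+<code>Port 80</code>):
+
+<blockquote><pre>
+    Port 80
+
+    &lt;VirtualHost 192.0.2.1 192.0.2.2:8080 192.0.2.3:*&gt;
+    ...
+    &lt;/VirtualHost&gt;
+</pre></blockquote>
+
+Under the rules described above, the address set of this vhost would be
+192.0.2.1 on port 80 (defaulted from the main_server), 192.0.2.2 on port
+8080, and 192.0.2.3 on any port.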
+
+<p> The magic <code>_default_</code> address has significance during
+the matching algorithm. It essentially matches any unspecified address.
+
+<p> After parsing the <code>VirtualHost</code> directive, the vhost server
+is given a default <code>Port</code> equal to the port assigned to the
+first name in its <code>VirtualHost</code> directive. The complete
+list of names in the <code>VirtualHost</code> directive is treated
+just like a <code>ServerAlias</code> (but is not overridden by any
+<code>ServerAlias</code> statement). Note that subsequent <code>Port</code>
+statements for this vhost will not affect the ports assigned in the
+address set.
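+
+<p>For example, in the following sketch (addresses and ports are
+illustrative assumptions) the vhost starts with a default
+<code>Port</code> of 8080, taken from the first address, and the later
+<code>Port 8000</code> statement changes only that field:
+
+<blockquote><pre>
+    Port 80
+
+    &lt;VirtualHost 192.0.2.2:8080 192.0.2.3&gt;
+    Port 8000
+    &lt;/VirtualHost&gt;
+</pre></blockquote>
+
+The address set would still be 192.0.2.2 on port 8080 and 192.0.2.3 on
+port 80.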
+
+<p>
+All vhosts are stored in a list in the reverse of the order in which
+they appeared in the config file. For example, if the config file is:
+
+<blockquote><pre>
+ &lt;VirtualHost A&gt;
+ ...
+ &lt;/VirtualHost&gt;
+
+ &lt;VirtualHost B&gt;
+ ...
+ &lt;/VirtualHost&gt;
+
+ &lt;VirtualHost C&gt;
+ ...
+ &lt;/VirtualHost&gt;
+</pre></blockquote>
+
+Then the list will be ordered: main_server, C, B, A. Keep this in mind.
+
+<p>
+After parsing has completed, the list of servers is scanned, various
+merges are performed, and default values are set. In particular:
+
+<ol>
+<li>If a vhost has no
+ <A
+ HREF="mod/core.html#serveradmin"
+ ><code>ServerAdmin</code></A>,
+ <A
+ HREF="mod/core.html#resourceconfig"
+ ><code>ResourceConfig</code></A>,
+ <A
+ HREF="mod/core.html#accessconfig"
+ ><code>AccessConfig</code></A>,
+ <A
+ HREF="mod/core.html#timeout"
+ ><code>Timeout</code></A>,
+ <A
+ HREF="mod/core.html#keepalivetimeout"
+ ><code>KeepAliveTimeout</code></A>,
+ <A
+ HREF="mod/core.html#keepalive"
+ ><code>KeepAlive</code></A>,
+ <A
+ HREF="mod/core.html#maxkeepaliverequests"
+ ><code>MaxKeepAliveRequests</code></A>,
+ or
+ <A
+ HREF="mod/core.html#sendbuffersize"
+ ><code>SendBufferSize</code></A>
+ directive then the respective value is
+ inherited from the main_server. (That is, inherited from whatever
+ the final setting of that value is in the main_server.)
+
+<li>The &quot;lookup defaults&quot; that define the default directory
+ permissions
+ for a vhost are merged with those of the main server. This includes
+ any per-directory configuration information for any module.
+
+<li>The per-server configs for each module from the main_server are
+ merged into the vhost server.
+</ol>
+
+Essentially, the main_server is treated as &quot;defaults&quot; or a
+&quot;base&quot; on
+which to build each vhost. But the positioning of these main_server
+definitions in the config file is largely irrelevant -- the entire
+config of the main_server has been parsed when this final merging occurs.
+So even if a main_server definition appears after a vhost definition
+it might affect the vhost definition.
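+
+<p>A small illustration of this, with placeholder names and values: the
+<code>ServerAdmin</code> and <code>Timeout</code> lines below follow the
+vhost in the file, yet the vhost sets neither directive itself and so
+would inherit both from the main_server.
+
+<blockquote><pre>
+    &lt;VirtualHost 192.0.2.1&gt;
+    ServerName www.example.com
+    &lt;/VirtualHost&gt;
+
+    ServerAdmin webmaster@example.com
+    Timeout 300
+</pre></blockquote>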
+
+<p> If the main_server has no <code>ServerName</code> at this point,
+then the hostname of the machine that httpd is running on is used
+instead. The IP addresses returned by a DNS lookup on the
+<code>ServerName</code> of the main_server are called the
+<EM>main_server address set</EM>.
+
+<p> Now a pass is made through the vhosts to fill in any missing
+<code>ServerName</code> fields and to classify the vhost as either
+an <EM>IP-based</EM> vhost or a <EM>name-based</EM> vhost. A vhost is
+considered a name-based vhost if any address in its address set is also in
+the main_server address set (the port associated with that address must
+also match the main_server's <code>Port</code>). Otherwise it is considered
+an IP-based vhost.
+
+<p> For any undefined <code>ServerName</code> fields, a name-based vhost
+defaults to the address given first in the <code>VirtualHost</code>
+statement defining the vhost. Any vhost that includes the magic
+<SAMP>_default_</SAMP> wildcard is given the same <code>ServerName</code> as
+the main_server. Otherwise the vhost (which is necessarily an IP-based
+vhost) is given a <code>ServerName</code> based on the result of a reverse
+DNS lookup on the first address given in the <code>VirtualHost</code>
+statement.
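+
+<p>The classification can be sketched with a hypothetical configuration.
+The names and addresses are placeholders, and
+<code>www.example.com</code> is assumed to resolve to 192.0.2.1 only:
+
+<blockquote><pre>
+    Port 80
+    ServerName www.example.com
+
+    # 192.0.2.1 port 80 is in the main_server address set: name-based
+    &lt;VirtualHost 192.0.2.1&gt;
+    ServerName name.example.com
+    &lt;/VirtualHost&gt;
+
+    # 192.0.2.2 is not in the main_server address set: IP-based
+    &lt;VirtualHost 192.0.2.2&gt;
+    ...
+    &lt;/VirtualHost&gt;
+</pre></blockquote>
+
+Because the second vhost sets no <code>ServerName</code>, its name would
+come from a reverse DNS lookup on 192.0.2.2.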
+
+<p>
+
+<h3>Vhost Matching</h3>
+
+
+<p><strong>Apache 1.3 differs from what is documented
+here, and documentation still has to be written.</strong>
+
+<p>
+The server determines which vhost to use for a request as follows:
+
+<p> <code>find_virtual_server</code>: When the connection is first made
+by the client, the local IP address (the IP address to which the client
+connected) is looked up in the server list. A vhost is matched if it
+is an IP-based vhost, the IP address matches and the port matches
+(taking into account wildcards).
+
+<p> If no vhosts are matched then the last occurrence, if it appears,
+of a <SAMP>_default_</SAMP> address (which if you recall the ordering of the
+server list mentioned above means that this would be the first occurrence
+of <SAMP>_default_</SAMP> in the config file) is matched.
+
+<p> In any event, if nothing above has matched, then the main_server is
+matched.
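+
+<p>As an illustration of the connection-level lookup, assume the
+following hypothetical configuration (placeholder names and addresses;
+<code>www.example.com</code> is assumed to resolve to 192.0.2.1):
+
+<blockquote><pre>
+    Port 80
+    ServerName www.example.com
+
+    &lt;VirtualHost 192.0.2.2&gt;
+    ServerName ip.example.com
+    &lt;/VirtualHost&gt;
+
+    &lt;VirtualHost _default_&gt;
+    ServerName catchall.example.com
+    &lt;/VirtualHost&gt;
+</pre></blockquote>
+
+A connection to 192.0.2.2 on port 80 would match the IP-based vhost. A
+connection to any other local address on port 80 would fall through to the
+<SAMP>_default_</SAMP> vhost. Without the <SAMP>_default_</SAMP> section,
+such a connection would be matched to the main_server.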
+
+<p> The vhost resulting from the above search is stored with data
+about the connection. We'll call this the <EM>connection vhost</EM>.
+The connection vhost is constant over all requests in a particular TCP/IP
+session -- that is, over all requests in a KeepAlive/persistent session.
+
+<p> For each request made on the connection the following sequence of
+events further determines the actual vhost that will be used to serve
+the request.
+
+<p> <code>check_fulluri</code>: If the requestURI is an absoluteURI, that
+is, it includes <code>http://hostname/</code>, then an attempt is made to
+determine whether the hostname's address (and optional port) matches that of
+the connection vhost. If it does, then the hostname portion of the URI
+is saved as the <EM>request_hostname</EM>. If it does not match, then the
+URI remains untouched. <STRONG>Note</STRONG>: to achieve this address
+comparison,
+the hostname supplied goes through a DNS lookup unless it matches the
+<code>ServerName</code> or the local IP address of the client's socket.
+
+<p> <code>parse_uri</code>: If the URI begins with a protocol
+(<EM>e.g.</EM>, <code>http:</code>, <code>ftp:</code>) then the request is
+considered a proxy request. Note that even though we may have stripped
+an <code>http://hostname/</code> in the previous step, this could still
+be a proxy request.
+
+<p> <code>read_request</code>: If the request does not have a hostname
+from the earlier step, then any <code>Host:</code> header sent by the
+client is used as the request hostname.
+
+<p> <code>check_hostalias</code>: If the request now has a hostname,
+then an attempt is made to match for this hostname. The first step
+of this match is to compare any port, if one was given in the request,
+against the <code>Port</code> field of the connection vhost. If there's
+a mismatch then the vhost used for the request is the connection vhost.
+(This is a bug, see observations.)
+
+<p>
+If the port matches, then httpd scans the list of vhosts starting with
+the next server <STRONG>after</STRONG> the connection vhost. This scan does not
+stop at the first match; it goes through all possible vhosts
+and in the end uses the last match it found. The comparisons performed
+are as follows:
+
+<ul>
+<li>Compare the request hostname:port with the vhost
+ <code>ServerName</code> and <code>Port</code>.
+
+<li>Compare the request hostname against any and all addresses given in
+ the <code>VirtualHost</code> directive for this vhost.
+
+<li>Compare the request hostname against the <code>ServerAlias</code>
+ given for the vhost.
+</ul>
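+
+<p>The comparisons above can be sketched with two hypothetical name-based
+vhosts (names and addresses are placeholders, and
+<code>www.example.com</code> is assumed to resolve to 192.0.2.1):
+
+<blockquote><pre>
+    Port 80
+    ServerName www.example.com
+
+    &lt;VirtualHost 192.0.2.1&gt;
+    ServerName one.example.com
+    ServerAlias uno.example.com
+    &lt;/VirtualHost&gt;
+
+    &lt;VirtualHost 192.0.2.1&gt;
+    ServerName two.example.com
+    &lt;/VirtualHost&gt;
+</pre></blockquote>
+
+Neither vhost is IP-based, so the connection vhost for 192.0.2.1 would be
+the main_server and the scan would cover both vhosts. A request carrying
+<SAMP>Host: uno.example.com</SAMP> would be expected to match the first
+vhost through its <code>ServerAlias</code>, and a request carrying
+<SAMP>Host: two.example.com</SAMP> the second through its
+<code>ServerName</code>.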
+
+<p>
+<code>check_serverpath</code>: If the request has no hostname
+(back up a few paragraphs) then a scan similar to the one
+in <code>check_hostalias</code> is performed to match any
+<code>ServerPath</code> directives given in the vhosts. Note that the
+<STRONG>last match</STRONG> is used regardless (again consider the ordering of
+the virtual hosts).
+
+<h3>Observations</h3>
+
+<ul>
+
+<li>It is difficult to define an IP-based vhost for the machine's
+ &quot;main IP address&quot;. You essentially have to create a bogus
+ <code>ServerName</code> for the main_server that does not match the
+ machine's IPs.
+ <P>
+
+<li>During the scans in both <code>check_hostalias</code> and
+ <code>check_serverpath</code> no check is made that the vhost being
+ scanned is actually a name-based vhost. This means, for example, that
+ it's possible to match an IP-based vhost through another address. But
+ because the scan starts in the vhost list at the first vhost that
+ matched the local IP address of the connection, not all IP-based vhosts
+ can be matched.
+ <p>
+ Consider the config file above with three vhosts A, B, C. Suppose
+ that B is a name-based vhost, and A and C are IP-based vhosts. If
+ a request comes in on B or C's address containing a header
+ &quot;<SAMP>Host: A</SAMP>&quot; then
+ it will be served from A's config. If a request comes in on A's
+ address then it will always be served from A's config regardless of
+ any Host: header.
+ </p>
+
+<li>Unless you have a <SAMP>_default_</SAMP> vhost,
+ it doesn't matter if you mix name-based vhosts in amongst IP-based
+ vhosts. During the <code>find_virtual_server</code> phase above no
+ name-based vhost will be matched, so the main_server will remain the
+ connection vhost. Then scans will cover all vhosts in the vhost list.
+ <p>
+ If you do have a <SAMP>_default_</SAMP> vhost, then you cannot place
+ name-based vhosts after it in the config. This is because on any
+ connection to the main server IPs the connection vhost will always be
+ the <SAMP>_default_</SAMP> vhost, since none of the name-based vhosts is
+ considered during <code>find_virtual_server</code>.
+ </p>
+
+<li>You should never specify DNS names in <code>VirtualHost</code>
+ directives because it will force your server to rely on DNS to boot.
+ Furthermore it poses a security threat if you do not control the
+ DNS for all the domains listed.
+ <a href="dns-caveats.html">There's more information
+ available on this and the next two topics</a>.
+ <p>
+
+<li><code>ServerName</code> should always be set for each vhost. Otherwise
+ a DNS lookup is required for each vhost.
+ <p>
+
+<li>A DNS lookup is always required for the main_server's
+ <code>ServerName</code> (or to generate that if it isn't specified
+ in the config).
+ <p>
+
+<li>If a <code>ServerPath</code> directive exists which is a prefix of
+ another <code>ServerPath</code> directive that appears later in
+ the configuration file, then the former will always be matched
+ and the latter will never be matched. (That is assuming that no
+ Host header was available to disambiguate the two.)
+ <p>
+
+<li>If a vhost that would otherwise be a name-based vhost includes a
+ <code>Port</code> statement that doesn't match the main_server
+ <code>Port</code> then it will be considered an IP-based vhost.
+ Then <code>find_virtual_server</code> will match it (because
+ the ports associated with each address in the address set default
+ to the port of the main_server) as the connection vhost. Then
+ <code>check_hostalias</code> will refuse to check any other name-based
+ vhost because of the port mismatch. The result is that the vhost
+ will steal all hits going to the main_server address.
+ <p>
+
+<li>If two IP-based vhosts have an address in common, the vhost appearing
+ later in the file is always matched. Such a thing might happen
+ inadvertently. If the config has name-based vhosts and for some reason
+ the main_server <code>ServerName</code> resolves to the wrong address
+ then all the name-based vhosts will be parsed as IP-based vhosts.
+ Then the last of them will steal all the hits.
+ <P>
+
+<li>The last name-based vhost in the config is always matched for any hit
+ which doesn't match one of the other name-based vhosts.
+
+</ul>
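+
+<p>The <code>Port</code> mismatch observation can be sketched with a
+hypothetical configuration (names and addresses are placeholders, and
+<code>www.example.com</code> is assumed to resolve to 192.0.2.1):
+
+<blockquote><pre>
+    Port 80
+    ServerName www.example.com
+
+    &lt;VirtualHost 192.0.2.1&gt;
+    ServerName a.example.com
+    Port 8080
+    &lt;/VirtualHost&gt;
+
+    &lt;VirtualHost 192.0.2.1&gt;
+    ServerName b.example.com
+    &lt;/VirtualHost&gt;
+</pre></blockquote>
+
+Going by that observation, the first vhost would be treated as IP-based
+and would become the connection vhost for connections to 192.0.2.1 on
+port 80, so requests intended for <code>b.example.com</code> would likely
+be served from it as well.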
+
+<h3><a name="whatworks">What Works</a></h3>
+
+<p>In addition to the tips on the <a href="dns-caveats.html#tips">DNS
+Issues</a> page, here are some further tips:
+
+<ul>
+
+<li>Place all main_server definitions before any VirtualHost definitions.
+(This is to aid the readability of the configuration -- the post-config
+merging process makes it non-obvious that definitions mixed in around
+virtualhosts might affect all virtualhosts.)
+<p>
+
+<li>Arrange your VirtualHosts such
+that all name-based virtual hosts come first, followed by IP-based
+virtual hosts, followed by any <SAMP>_default_</SAMP> virtual host.
+<p>
+
+<li>Avoid <code>ServerPaths</code> which are prefixes of other
+<code>ServerPaths</code>. If you cannot avoid this then you have to
+ensure that the longer (more specific) prefix vhost appears earlier in
+the configuration file than the shorter (less specific) prefix
+(<EM>e.g.</EM>, &quot;ServerPath /abc&quot; should appear after
+&quot;ServerPath /abcdef&quot;).
+<p>
+
+<li>Do not use <EM>port-based</EM> vhosts in the same server as
+name-based vhosts. A loose definition for port-based is a vhost which
+is determined by the port on the server (<em>e.g.</em>, one server with
+ports 8000, 8080, and 80, each of which has a different configuration).
+<p>
+
+</ul>
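+
+<p>Putting these tips together, a configuration might be laid out roughly
+as follows. This is only a sketch: all names, addresses, and paths are
+placeholders, and <code>www.example.com</code> is assumed to resolve to
+192.0.2.1.
+
+<blockquote><pre>
+    # main_server definitions first
+    Port 80
+    ServerName www.example.com
+    ServerAdmin webmaster@example.com
+
+    # name-based vhosts, longer ServerPath before its shorter prefix
+    &lt;VirtualHost 192.0.2.1&gt;
+    ServerName archive.example.com
+    ServerPath /docs/archive
+    &lt;/VirtualHost&gt;
+
+    &lt;VirtualHost 192.0.2.1&gt;
+    ServerName docs.example.com
+    ServerPath /docs
+    &lt;/VirtualHost&gt;
+
+    # IP-based vhosts next
+    &lt;VirtualHost 192.0.2.2&gt;
+    ServerName ip.example.com
+    &lt;/VirtualHost&gt;
+
+    # any _default_ vhost last
+    &lt;VirtualHost _default_&gt;
+    ServerName catchall.example.com
+    &lt;/VirtualHost&gt;
+</pre></blockquote>
+
+IP addresses are used in every <code>VirtualHost</code> directive and each
+vhost sets its own <code>ServerName</code>, in line with the DNS advice
+above.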
+
+<!--#include virtual="footer.html" -->
+</BODY>
+</HTML>