Diffstat (limited to 'docs')
-rw-r--r-- | docs/manual/developer/API.html | 988 | ||||
-rw-r--r-- | docs/manual/misc/API.html | 988 | ||||
-rw-r--r-- | docs/manual/misc/FAQ.html | 162 | ||||
-rw-r--r-- | docs/manual/misc/client_block_api.html | 70 | ||||
-rw-r--r-- | docs/manual/misc/compat_notes.html | 108 | ||||
-rw-r--r-- | docs/manual/misc/security_tips.html | 92 | ||||
-rw-r--r-- | docs/manual/platform/perf-bsd44.html | 215 | ||||
-rw-r--r-- | docs/manual/platform/perf-dec.html | 267 | ||||
-rw-r--r-- | docs/manual/platform/perf.html | 134 |
9 files changed, 3024 insertions, 0 deletions
diff --git a/docs/manual/developer/API.html b/docs/manual/developer/API.html new file mode 100644 index 0000000000..f860996e47 --- /dev/null +++ b/docs/manual/developer/API.html @@ -0,0 +1,988 @@ +<!--%hypertext --> +<html><head> +<title>Apache API notes</title> +</head> +<body> +<!--/%hypertext --> +<h1>Apache API notes</h1> + +These are some notes on the Apache API and the data structures you +have to deal with, etc. They are not yet nearly complete, but +hopefully, they will help you get your bearings. Keep in mind that +the API is still subject to change as we gain experience with it. +(See the TODO file for what <em>might</em> be coming). However, +it will be easy to adapt modules to any changes that are made. +(We have more modules to adapt than you do). +<p> + +A few notes on general pedagogical style here. In the interest of +conciseness, all structure declarations here are incomplete --- the +real ones have more slots that I'm not telling you about. For the +most part, these are reserved to one component of the server core or +another, and should be altered by modules with caution. However, in +some cases, they really are things I just haven't gotten around to +yet. 
Welcome to the bleeding edge.<p> + +Finally, here's an outline, to give you some bare idea of what's +coming up, and in what order: + +<ul> +<li> <a href="#basics">Basic concepts.</a> +<menu> + <li> <a href="#HMR">Handlers, Modules, and Requests</a> + <li> <a href="#moduletour">A brief tour of a module</a> +</menu> +<li> <a href="#handlers">How handlers work</a> +<menu> + <li> <a href="#req_tour">A brief tour of the <code>request_rec</code></a> + <li> <a href="#req_orig">Where request_rec structures come from</a> + <li> <a href="#req_return">Handling requests, declining, and returning error codes</a> + <li> <a href="#resp_handlers">Special considerations for response handlers</a> + <li> <a href="#auth_handlers">Special considerations for authentication handlers</a> + <li> <a href="#log_handlers">Special considerations for logging handlers</a> +</menu> +<li> <a href="#pools">Resource allocation and resource pools</a> +<li> <a href="#config">Configuration, commands and the like</a> +<menu> + <li> <a href="#per-dir">Per-directory configuration structures</a> + <li> <a href="#commands">Command handling</a> + <li> <a href="#servconf">Side notes --- per-server configuration, virtual servers, etc.</a> +</menu> +</ul> + +<h2><a name="basics">Basic concepts.</a></h2> + +We begin with an overview of the basic concepts behind the +API, and how they are manifested in the code. + +<h3><a name="HMR">Handlers, Modules, and Requests</a></h3> + +Apache breaks down request handling into a series of steps, more or +less the same way the Netscape server API does (although this API has +a few more stages than NetSite does, as hooks for stuff I thought +might be useful in the future). These are: + +<ul> + <li> URI -> Filename translation + <li> Auth ID checking [is the user who they say they are?] + <li> Auth access checking [is the user authorized <em>here</em>?] 
+ <li> Access checking other than auth + <li> Determining MIME type of the object requested + <li> `Fixups' --- there aren't any of these yet, but the phase is + intended as a hook for possible extensions like + <code>SetEnv</code>, which don't really fit well elsewhere. + <li> Actually sending a response back to the client. + <li> Logging the request +</ul> + +These phases are handled by looking at each of a succession of +<em>modules</em>, looking to see if each of them has a handler for the +phase, and attempting to invoke it if so. The handler can typically do +one of three things: + +<ul> + <li> <em>Handle</em> the request, and indicate that it has done so + by returning the magic constant <code>OK</code>. + <li> <em>Decline</em> to handle the request, by returning the magic + integer constant <code>DECLINED</code>. In this case, the + server behaves in all respects as if the handler simply hadn't + been there. + <li> Signal an error, by returning one of the HTTP error codes. + This terminates normal handling of the request, although an + ErrorDocument may be invoked to try to mop up, and it will be + logged in any case. +</ul> + +Most phases are terminated by the first module that handles them; +however, for logging, `fixups', and non-access authentication +checking, all handlers always run (barring an error). Also, the +response phase is unique in that modules may declare multiple handlers +for it, via a dispatch table keyed on the MIME type of the requested +object. Modules may declare a response-phase handler which can handle +<em>any</em> request, by giving it the key <code>*/*</code> (i.e., a +wildcard MIME type specification). However, wildcard handlers are +only invoked if the server has already tried and failed to find a more +specific response handler for the MIME type of the requested object +(either none existed, or they all declined).<p> + +The handlers themselves are functions of one argument (a +<code>request_rec</code> structure, 
vide infra), which return an +integer, as above.<p> + +<h3><a name="moduletour">A brief tour of a module</a></h3> + +At this point, we need to explain the structure of a module. Our +candidate will be one of the messier ones, the CGI module --- this +handles both CGI scripts and the <code>ScriptAlias</code> config file +command. It's actually a great deal more complicated than most +modules, but if we're going to have only one example, it might as well +be the one with its fingers in every place.<p> + +Let's begin with handlers. In order to handle the CGI scripts, the +module declares a response handler for them. Because of +<code>ScriptAlias</code>, it also has handlers for the name +translation phase (to recognise <code>ScriptAlias</code>ed URIs), and the +type-checking phase (any <code>ScriptAlias</code>ed request is typed +as a CGI script).<p> + +The module needs to maintain some per (virtual) +server information, namely, the <code>ScriptAlias</code>es in effect; +the module structure therefore contains pointers to a function which +builds these structures, and to another which combines two of them (in +case the main server and a virtual server both have +<code>ScriptAlias</code>es declared).<p> + +Finally, this module contains code to handle the +<code>ScriptAlias</code> command itself. This particular module only +declares one command, but there could be more, so modules have +<em>command tables</em> which declare their commands, and describe +where they are permitted, and how they are to be invoked. <p> + +A final note on the declared types of the arguments of some of these +commands: a <code>pool</code> is a pointer to a <em>resource pool</em> +structure; these are used by the server to keep track of the memory +which has been allocated, files opened, etc., either to service a +particular request, or to handle the process of configuring itself. 
+That way, when the request is over (or, for the configuration pool, +when the server is restarting), the memory can be freed, and the files +closed, <i>en masse</i>, without anyone having to write explicit code to +track them all down and dispose of them. Also, a +<code>cmd_parms</code> structure contains various information about +the config file being read, and other status information, which is +sometimes of use to the function which processes a config-file command +(such as <code>ScriptAlias</code>). + +With no further ado, the module itself: + +<pre> +/* Declarations of handlers. */ + +int translate_scriptalias (request_rec *); +int type_scriptalias (request_rec *); +int cgi_handler (request_rec *); + +/* Subsidiary dispatch table for response-phase handlers, by MIME type */ + +handler_rec cgi_handlers[] = { +{ "application/x-httpd-cgi", cgi_handler }, +{ NULL } +}; + +/* Declarations of routines to manipulate the module's configuration + * info. Note that these are returned, and passed in, as void *'s; + * the server core keeps track of them, but it doesn't, and can't, + * know their internal structure. 
+ */ + +void *make_cgi_server_config (pool *); +void *merge_cgi_server_config (pool *, void *, void *); + +/* Declarations of routines to handle config-file commands */ + +extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake, + char *real); + +command_rec cgi_cmds[] = { +{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2, + "a fakename and a realname"}, +{ NULL } +}; + +module cgi_module = { + STANDARD_MODULE_STUFF, + NULL, /* initializer */ + NULL, /* dir config creator */ + NULL, /* dir merger --- default is to override */ + make_cgi_server_config, /* server config */ + merge_cgi_server_config, /* merge server config */ + cgi_cmds, /* command table */ + cgi_handlers, /* handlers */ + translate_scriptalias, /* filename translation */ + NULL, /* check_user_id */ + NULL, /* check auth */ + NULL, /* check access */ + type_scriptalias, /* type_checker */ + NULL, /* fixups */ + NULL /* logger */ +}; +</pre> + +<h2><a name="handlers">How handlers work</a></h2> + +The sole argument to handlers is a <code>request_rec</code> structure. +This structure describes a particular request which has been made to +the server, on behalf of a client. In most cases, each connection to +the client generates only one <code>request_rec</code> structure.<p> + +<h3><a name="req_tour">A brief tour of the <code>request_rec</code></a></h3> + +The <code>request_rec</code> contains pointers to a resource pool +which will be cleared when the server is finished handling the +request; to structures containing per-server and per-connection +information, and most importantly, information on the request itself.<p> + +The most important such information is a small set of character +strings describing attributes of the object being requested, including +its URI, filename, content-type and content-encoding (these being filled +in by the translation and type-check handlers which handle the +request, respectively). 
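To make the role of these string fields concrete, here is a minimal, self-contained sketch of a type-check handler that fills in a content type from the filename suffix. Everything in it is a hypothetical stand-in for illustration: `mini_request` is a cut-down record (not the real `request_rec`), `MINI_OK` and `MINI_DECLINED` are made-up values (not the server's constants), and the suffix table is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the server's OK and DECLINED constants. */
#define MINI_OK        0
#define MINI_DECLINED (-1)

/* A cut-down, illustrative request record: only the fields used below. */
typedef struct {
    const char *uri;
    const char *filename;      /* would be set by a translation handler */
    const char *content_type;  /* set by the type-check handler below */
} mini_request;

/* Illustrative suffix table; a real module would build this from config. */
static const struct { const char *suffix; const char *type; } suffix_types[] = {
    { ".html", "text/html" },
    { ".txt",  "text/plain" },
    { ".gif",  "image/gif" },
};

/* Type-check handler: set content_type if the suffix is known,
 * otherwise decline so that another module can try. */
int mini_type_checker(mini_request *r)
{
    const char *dot;
    size_t i;

    assert(r != NULL);
    if (r->filename == NULL) return MINI_DECLINED;
    dot = strrchr(r->filename, '.');
    if (dot == NULL) return MINI_DECLINED;

    for (i = 0; i < sizeof(suffix_types) / sizeof(suffix_types[0]); i++) {
        if (strcmp(dot, suffix_types[i].suffix) == 0) {
            r->content_type = suffix_types[i].type;
            return MINI_OK;
        }
    }
    return MINI_DECLINED;
}
```

The handler only touches fields of the record and reports its outcome through the return value, which is the pattern the text describes for the non-response phases.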
<p> + +Other commonly used data items are tables giving the MIME headers on +the client's original request, MIME headers to be sent back with the +response (which modules can add to at will), and environment variables +for any subprocesses which are spawned off in the course of servicing +the request. These tables are manipulated using the +<code>table_get</code> and <code>table_set</code> routines. <p> + +Finally, there are pointers to two data structures which, in turn, +point to per-module configuration structures. Specifically, these +hold pointers to the data structures which the module has built to +describe the way it has been configured to operate in a given +directory (via <code>.htaccess</code> files or +<code><Directory></code> sections), for private data it has +built in the course of servicing the request (so modules' handlers for +one phase can pass `notes' to their handlers for other phases). There +is another such configuration vector in the <code>server_rec</code> +data structure pointed to by the <code>request_rec</code>, which +contains per (virtual) server configuration data.<p> + +Here is an abridged declaration, giving the fields most commonly used:<p> + +<pre> +struct request_rec { + + pool *pool; + conn_rec *connection; + server_rec *server; + + /* What object is being requested */ + + char *uri; + char *filename; + char *path_info; + char *args; /* QUERY_ARGS, if any */ + struct stat finfo; /* Set by server core; + * st_mode set to zero if no such file */ + + char *content_type; + char *content_encoding; + + /* MIME header environments, in and out. Also, an array containing + * environment variables to be passed to subprocesses, so people can + * write modules to add to that environment. + * + * The difference between headers_out and err_headers_out is that + * the latter are printed even on error, and persist across internal + * redirects (so the headers printed for ErrorDocument handlers will + * have them). 
+ */ + + table *headers_in; + table *headers_out; + table *err_headers_out; + table *subprocess_env; + + /* Info about the request itself... */ + + int header_only; /* HEAD request, as opposed to GET */ + char *protocol; /* Protocol, as given to us, or HTTP/0.9 */ + char *method; /* GET, HEAD, POST, etc. */ + int method_number; /* M_GET, M_POST, etc. */ + + /* Info for logging */ + + char *the_request; + int bytes_sent; + + /* A flag which modules can set, to indicate that the data being + * returned is volatile, and clients should be told not to cache it. + */ + + int no_cache; + + /* Various other config info which may change with .htaccess files + * These are config vectors, with one void* pointer for each module + * (the thing pointed to being the module's business). + */ + + void *per_dir_config; /* Options set in config files, etc. */ + void *request_config; /* Notes on *this* request */ + +}; + +</pre> + +<h3><a name="req_orig">Where request_rec structures come from</a></h3> + +Most <code>request_rec</code> structures are built by reading an HTTP +request from a client, and filling in the fields. However, there are +a few exceptions: + +<ul> + <li> If the request is to an imagemap, a type map (i.e., a + <code>*.var</code> file), or a CGI script which returned a + local `Location:', then the resource which the user requested + is going to be ultimately located by some URI other than what + the client originally supplied. In this case, the server does + an <em>internal redirect</em>, constructing a new + <code>request_rec</code> for the new URI, and processing it + almost exactly as if the client had requested the new URI + directly. <p> + + <li> If some handler signaled an error, and an + <code>ErrorDocument</code> is in scope, the same internal + redirect machinery comes into play.<p> + + <li> Finally, a handler occasionally needs to investigate `what + would happen if' some other request were run. 
For instance, + the directory indexing module needs to know what MIME type + would be assigned to a request for each directory entry, in + order to figure out what icon to use.<p> + + Such handlers can construct a <em>sub-request</em>, using the + functions <code>sub_req_lookup_file</code> and + <code>sub_req_lookup_uri</code>; each of these constructs a new + <code>request_rec</code> structure and processes it as you + would expect, up to but not including the point of actually + sending a response. (These functions skip over the access + checks if the sub-request is for a file in the same directory + as the original request).<p> + + (Server-side includes work by building sub-requests and then + actually invoking the response handler for them, via the + function <code>run_sub_request</code>). +</ul> + +<h3><a name="req_return">Handling requests, declining, and returning error codes</a></h3> + +As discussed above, each handler, when invoked to handle a particular +<code>request_rec</code>, has to return an <code>int</code> to +indicate what happened. That can be one of the following: + +<ul> + <li> OK --- the request was handled successfully. This may or may + not terminate the phase. + <li> DECLINED --- no erroneous condition exists, but the module + declines to handle the phase; the server tries to find another. + <li> an HTTP error code, which aborts handling of the request. +</ul> + +Note that if the error code returned is <code>REDIRECT</code>, then +the module should put a <code>Location</code> in the request's +<code>headers_out</code>, to indicate where the client should be +redirected <em>to</em>. <p> + +<h3><a name="resp_handlers">Special considerations for response handlers</a></h3> + +Handlers for most phases do their work by simply setting a few fields +in the <code>request_rec</code> structure (or, in the case of access +checkers, simply by returning the correct error code). However, +response handlers have to actually send a response back to the client. 
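The return-code contract from the previous section (OK, DECLINED, or an HTTP error code) can be pictured as a loop over the modules' handlers for a phase in which the first handler to take the request wins. This is only a sketch under stated assumptions: `sk_request`, `sk_run_phase`, and the `SK_*` values are hypothetical names invented here, and the real server core may well be organised differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values; the real constants live in the server headers. */
#define SK_OK          0
#define SK_DECLINED  (-1)
#define SK_NOT_FOUND 404

typedef struct { const char *uri; } sk_request;
typedef int (*sk_handler)(sk_request *);

/* Run one "first handler wins" phase: skip absent handlers and those that
 * decline; stop at the first handler that returns OK or an error code. */
int sk_run_phase(sk_handler const *handlers, size_t n, sk_request *r)
{
    size_t i;

    assert(r != NULL);
    for (i = 0; i < n; i++) {
        int rc;
        if (handlers[i] == NULL) continue;   /* module has no handler here */
        rc = handlers[i](r);
        if (rc != SK_DECLINED) return rc;    /* handled, or an error */
    }
    return SK_DECLINED;                      /* nobody took the request */
}

/* Three toy handlers used to exercise the loop. */
int sk_decline(sk_request *r) { (void)r; return SK_DECLINED; }
int sk_accept(sk_request *r)  { (void)r; return SK_OK; }
int sk_missing(sk_request *r) { (void)r; return SK_NOT_FOUND; }
```

For phases where every handler always runs (logging, for example), the loop would presumably keep going after an OK and stop only on an error code.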
<p> + +They should begin by sending an HTTP response header, using the +function <code>send_http_header</code>. (You don't have to do +anything special to skip sending the header for HTTP/0.9 requests; the +function figures out on its own that it shouldn't do anything). If +the request is marked <code>header_only</code>, that's all they should +do; they should return after that, without attempting any further +output. <p> + +Otherwise, they should produce a response body which responds to the +client as appropriate. The primitives for this are <code>rputc</code> +and <code>rprintf</code>, for internally generated output, and +<code>send_fd</code>, to copy the contents of some <code>FILE *</code> +straight to the client. <p> + +At this point, you should more or less understand the following piece +of code, which is the handler for <code>GET</code> requests +that have no more specific handler; it also shows how conditional +<code>GET</code>s can be handled, if it's desirable to do so in a +particular response handler --- <code>set_last_modified</code> checks +against the <code>If-modified-since</code> value supplied by the +client, if any, and returns an appropriate code (which will, if +nonzero, be USE_LOCAL_COPY). 
No similar considerations apply for +<code>set_content_length</code>, but it returns an error code for +symmetry.<p> + +<pre> +int default_handler (request_rec *r) +{ + int errstatus; + FILE *f; + + if (r->method_number != M_GET) return DECLINED; + if (r->finfo.st_mode == 0) return NOT_FOUND; + + if ((errstatus = set_content_length (r, r->finfo.st_size)) + || (errstatus = set_last_modified (r, r->finfo.st_mtime))) + return errstatus; + + f = fopen (r->filename, "r"); + + if (f == NULL) { + log_reason("file permissions deny server access", + r->filename, r); + return FORBIDDEN; + } + + register_timeout ("send", r); + send_http_header (r); + + if (!r->header_only) send_fd (f, r); + pfclose (r->pool, f); + return OK; +} +</pre> + +Finally, if all of this is too much of a challenge, there are a few +ways out of it. First off, as shown above, a response handler which +has not yet produced any output can simply return an error code, in +which case the server will automatically produce an error response. +Secondly, it can punt to some other handler by invoking +<code>internal_redirect</code>, which is how the internal redirection +machinery discussed above is invoked. A response handler which has +internally redirected should always return <code>OK</code>. <p> + +(Invoking <code>internal_redirect</code> from handlers which are +<em>not</em> response handlers will lead to serious confusion). + +<h3><a name="auth_handlers">Special considerations for authentication handlers</a></h3> + +Stuff that should be discussed here in detail: + +<ul> + <li> Authentication-phase handlers not invoked unless auth is + configured for the directory. + <li> Common auth configuration stored in the core per-dir + configuration; it has accessors <code>auth_type</code>, + <code>auth_name</code>, and <code>requires</code>. 
+ <li> Common routines, to handle the protocol end of things, at least + for HTTP basic authentication (<code>get_basic_auth_pw</code>, + which sets the <code>connection->user</code> structure field + automatically, and <code>note_basic_auth_failure</code>, which + arranges for the proper <code>WWW-Authenticate:</code> header + to be sent back). +</ul> + +<h3><a name="log_handlers">Special considerations for logging handlers</a></h3> + +When a request has been internally redirected, there is the question of +what to log. Apache handles this by bundling the entire chain of +redirects into a list of <code>request_rec</code> structures which are +threaded through the <code>r->prev</code> and <code>r->next</code> +pointers. The <code>request_rec</code> which is passed to the logging +handlers in such cases is the one which was originally built for the +initial request from the client; note that the bytes_sent field will +only be correct in the last request in the chain (the one for which a +response was actually sent). + +<h2><a name="pools">Resource allocation and resource pools</a></h2> + +One of the problems of writing and designing a server-pool server is +that of preventing leakage, that is, allocating resources (memory, +open files, etc.), without subsequently releasing them. The resource +pool machinery is designed to make it easy to prevent this from +happening, by allowing resources to be allocated in such a way that +they are <em>automatically</em> released when the server is done with +them. <p> + +The way this works is as follows: the memory which is allocated, files +opened, etc., to deal with a particular request are tied to a +<em>resource pool</em> which is allocated for the request. The pool +is a data structure which itself tracks the resources in question. <p> + +When the request has been processed, the pool is <em>cleared</em>. 
At +that point, all the memory associated with it is released for reuse, +all files associated with it are closed, and any other clean-up +functions which are associated with the pool are run. When this is +over, we can be confident that all the resources tied to the pool have +been released, and that none of them have leaked. <p> + +Server restarts, and allocation of memory and resources for per-server +configuration, are handled in a similar way. There is a +<em>configuration pool</em>, which keeps track of resources which were +allocated while reading the server configuration files, and handling +the commands therein (for instance, the memory that was allocated for +per-server module configuration, log files and other files that were +opened, and so forth). When the server restarts, and has to reread +the configuration files, the configuration pool is cleared, and so the +memory and file descriptors which were taken up by reading them the +last time are made available for reuse. <p> + +It should be noted that use of the pool machinery isn't generally +obligatory, except for situations like logging handlers, where you +really need to register cleanups to make sure that the log file gets +closed when the server restarts (this is most easily done by using the +function <code><a href="#pool-files">pfopen</a></code>, which also +arranges for the underlying file descriptor to be closed before any +child processes, such as for CGI scripts, are <code>exec</code>ed), or +in case you are using the timeout machinery (which isn't yet even +documented here). However, there are two benefits to using it: +resources allocated to a pool never leak (even if you allocate a +scratch string, and just forget about it); also, for memory +allocation, <code>palloc</code> is generally faster than +<code>malloc</code>.<p> + +We begin here by describing how memory is allocated to pools, and then +discuss how other resources are tracked by the resource pool +machinery. 
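As a rough picture of what "clearing" a pool means, the sketch below models a pool as nothing more than a list of cleanup actions that are run, in one sweep, when the pool is cleared. It is a toy under stated assumptions: all `toy_*` names are invented here, and memory is obtained with one `malloc` per allocation purely for simplicity, which is a substitute technique and not how `palloc` itself is described as working.

```c
#include <assert.h>
#include <stdlib.h>

/* One registered cleanup: a function and the datum to pass to it. */
typedef struct toy_cleanup {
    void (*fn)(void *);
    void *data;
    struct toy_cleanup *next;
} toy_cleanup;

/* A toy pool is just the list of things to undo when it is cleared. */
typedef struct { toy_cleanup *cleanups; } toy_pool;

/* Register fn(data) to run when the pool is cleared. Returns 0 on success. */
int toy_register_cleanup(toy_pool *p, void (*fn)(void *), void *data)
{
    toy_cleanup *c = malloc(sizeof *c);
    if (c == NULL) return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;   /* most recently registered runs first */
    p->cleanups = c;
    return 0;
}

/* Allocate memory tied to the pool: it is freed when the pool is cleared. */
void *toy_alloc(toy_pool *p, size_t n)
{
    void *mem = malloc(n);
    if (mem != NULL && toy_register_cleanup(p, free, mem) != 0) {
        free(mem);
        return NULL;
    }
    return mem;
}

/* Clear the pool: run every cleanup, leaving the pool empty and reusable. */
void toy_clear_pool(toy_pool *p)
{
    assert(p != NULL);
    while (p->cleanups != NULL) {
        toy_cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Example cleanup used for demonstration: counts how often it ran. */
void toy_count_cleanup(void *counter) { ++*(int *)counter; }
```

The caller never frees anything individually; forgetting about a scratch allocation is harmless because the next clear sweeps it up.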
+ +<h3>Allocation of memory in pools</h3> + +Memory is allocated to pools by calling the function +<code>palloc</code>, which takes two arguments, one being a pointer to +a resource pool structure, and the other being the amount of memory to +allocate (in <code>char</code>s). Within request handlers, +the most common way of getting a resource pool structure is +by looking at the <code>pool</code> slot of the relevant +<code>request_rec</code>; hence the repeated appearance of the +following idiom in module code: + +<pre> +int my_handler(request_rec *r) +{ + struct my_structure *foo; + ... + + foo = (struct my_structure *)palloc (r->pool, sizeof(struct my_structure)); +} +</pre> + +Note that <em>there is no <code>pfree</code></em> --- +<code>palloc</code>ed memory is freed only when the associated +resource pool is cleared. This means that <code>palloc</code> does not +have to do as much accounting as <code>malloc()</code>; all it does in +the typical case is to round up the size, bump a pointer, and do a +range check.<p> + +(It also raises the possibility that heavy use of <code>palloc</code> +could cause a server process to grow excessively large. There are +two ways to deal with this, which are covered below; briefly, you +can use <code>malloc</code>, and try to be sure that all of the memory +gets explicitly <code>free</code>d, or you can allocate a sub-pool of +the main pool, allocate your memory in the sub-pool, and clear it out +periodically. The latter technique is discussed in the section on +sub-pools below, and is used in the directory-indexing code, in order +to avoid excessive storage allocation when listing directories with +thousands of files). + +<h3>Allocating initialized memory</h3> + +There are functions which allocate initialized memory, and are +frequently useful. The function <code>pcalloc</code> has the same +interface as <code>palloc</code>, but clears out the memory it +allocates before it returns it. 
The function <code>pstrdup</code> +takes a resource pool and a <code>char *</code> as arguments, and +allocates memory for a copy of the string the pointer points to, +returning a pointer to the copy. Finally <code>pstrcat</code> is a +varargs-style function, which takes a pointer to a resource pool, and +at least two <code>char *</code> arguments, the last of which must be +<code>NULL</code>. It allocates enough memory to fit copies of each +of the strings, as a unit; for instance: + +<pre> + pstrcat (r->pool, "foo", "/", "bar", NULL); +</pre> + +returns a pointer to 8 bytes worth of memory, initialized to +<code>"foo/bar"</code>. + +<h3>Tracking open files, etc.</h3> + +As indicated above, resource pools are also used to track other sorts +of resources besides memory. The most common are open files. The +routine which is typically used for this is <code>pfopen</code>, which +takes a resource pool and two strings as arguments; the strings are +the same as the typical arguments to <code>fopen</code>, e.g., + +<pre> + ... + FILE *f = pfopen (r->pool, r->filename, "r"); + + if (f == NULL) { ... } else { ... } +</pre> + +There is also a <code>popenf</code> routine, which parallels the +lower-level <code>open</code> system call. Both of these routines +arrange for the file to be closed when the resource pool in question +is cleared. <p> + +Unlike the case for memory, there <em>are</em> functions to close +files allocated with <code>pfopen</code>, and <code>popenf</code>, +namely <code>pfclose</code> and <code>pclosef</code>. (This is +because, on many systems, the number of files which a single process +can have open is quite limited). It is important to use these +functions to close files allocated with <code>pfopen</code> and +<code>popenf</code>, since to do otherwise could cause fatal errors on +systems such as Linux, which react badly if the same +<code>FILE*</code> is closed more than once. 
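The double-close hazard mentioned above is easier to see in a small model. The sketch below is a toy file tracker, assuming nothing about the real implementation: the `ft_*` names and the fixed-size slot array are invented for illustration. The point it shows is that an explicit close must also make the pool forget the file, so that clearing the pool later does not close the same `FILE *` again.

```c
#include <assert.h>
#include <stdio.h>

#define FT_MAX 8   /* arbitrary capacity for this sketch */

/* Toy pool that only tracks open FILE pointers. */
typedef struct { FILE *files[FT_MAX]; } ft_pool;

/* Remember f so that clearing the pool closes it. Returns f, or NULL
 * (after closing f) if the pool has no free slot. */
FILE *ft_track(ft_pool *p, FILE *f)
{
    int i;
    if (f == NULL) return NULL;
    for (i = 0; i < FT_MAX; i++) {
        if (p->files[i] == NULL) { p->files[i] = f; return f; }
    }
    fclose(f);
    return NULL;
}

/* Close f now and forget it, so the pool will not close it a second time.
 * Returns 0 on success, -1 if f is not tracked here or fclose failed. */
int ft_close(ft_pool *p, FILE *f)
{
    int i;
    if (f == NULL) return -1;
    for (i = 0; i < FT_MAX; i++) {
        if (p->files[i] == f) {
            p->files[i] = NULL;
            return fclose(f) == 0 ? 0 : -1;
        }
    }
    return -1;
}

/* Close whatever is still open; returns how many files were closed. */
int ft_clear(ft_pool *p)
{
    int i, closed = 0;
    assert(p != NULL);
    for (i = 0; i < FT_MAX; i++) {
        if (p->files[i] != NULL) {
            fclose(p->files[i]);
            p->files[i] = NULL;
            closed++;
        }
    }
    return closed;
}
```

Calling plain `fclose` on a tracked file would leave a stale entry behind, and the later clear would close it twice; routing the close through the tracker avoids that.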
<p> + +(Using the <code>close</code> functions is not mandatory, since the +file will eventually be closed regardless, but you should consider it +in cases where your module is opening, or could open, a lot of files). + +<h3>Other sorts of resources --- cleanup functions</h3> + +More text goes here. Describe the cleanup primitives in terms of +which the file stuff is implemented; also, <code>spawn_process</code>. + +<h3>Fine control --- creating and dealing with sub-pools, with a note +on sub-requests</h3> + +On rare occasions, too-free use of <code>palloc()</code> and the +associated primitives may result in undesirably profligate resource +allocation. You can deal with such a case by creating a +<em>sub-pool</em>, allocating within the sub-pool rather than the main +pool, and clearing or destroying the sub-pool, which releases the +resources which were associated with it. (This really <em>is</em> a +rare situation; the only case in which it comes up in the standard +module set is when listing directories, and then only with +<em>very</em> large directories. Unnecessary use of the primitives +discussed here can hair up your code quite a bit, with very little +gain). <p> + +The primitive for creating a sub-pool is <code>make_sub_pool</code>, +which takes another pool (the parent pool) as an argument. When the +parent pool is cleared, the sub-pool will be destroyed. The sub-pool +may also be cleared or destroyed at any time, by calling the functions +<code>clear_pool</code> and <code>destroy_pool</code>, respectively. +(The difference is that <code>clear_pool</code> frees resources +associated with the pool, while <code>destroy_pool</code> also +deallocates the pool itself. In the former case, you can allocate new +resources within the pool, and clear it again, and so forth; in the +latter case, it is simply gone). <p> + +One final note --- sub-requests have their own resource pools, which +are sub-pools of the resource pool for the main request. 
The polite +way to reclaim the resources associated with a sub request which you +have allocated (using the <code>sub_req_lookup_...</code> functions) +is <code>destroy_sub_request</code>, which frees the resource pool. +Before calling this function, be sure to copy anything that you care +about which might be allocated in the sub-request's resource pool into +someplace a little less volatile (for instance, the filename in its +<code>request_rec</code> structure). <p> + +(Again, under most circumstances, you shouldn't feel obliged to call +this function; only 2K of memory or so are allocated for a typical sub +request, and it will be freed anyway when the main request pool is +cleared. It is only when you are allocating many, many sub-requests +for a single main request that you should seriously consider the +<code>destroy...</code> functions). + +<h2><a name="config">Configuration, commands and the like</a></h2> + +One of the design goals for this server was to maintain external +compatibility with the NCSA 1.3 server --- that is, to read the same +configuration files, to process all the directives therein correctly, +and in general to be a drop-in replacement for NCSA. On the other +hand, another design goal was to move as much of the server's +functionality into modules which have as little as possible to do with +the monolithic server core. The only way to reconcile these goals is +to move the handling of most commands from the central server into the +modules. <p> + +However, just giving the modules command tables is not enough to +divorce them completely from the server core. The server has to +remember the commands in order to act on them later. That involves +maintaining data which is private to the modules, and which can be +either per-server, or per-directory. 
Most things are per-directory, +including in particular access control and authorization information, +but also information on how to determine file types from suffixes, +which can be modified by <code>AddType</code> and +<code>DefaultType</code> directives, and so forth. In general, the +governing philosophy is that anything which <em>can</em> be made +configurable by directory should be; per-server information is +generally used in the standard set of modules for information like +<code>Alias</code>es and <code>Redirect</code>s which come into play +before the request is tied to a particular place in the underlying +file system. <p> + +Another requirement for emulating the NCSA server is being able to +handle the per-directory configuration files, generally called +<code>.htaccess</code> files, though even in the NCSA server they can +contain directives which have nothing at all to do with access +control. Accordingly, after URI -> filename translation, but before +performing any other phase, the server walks down the directory +hierarchy of the underlying filesystem, following the translated +pathname, to read any <code>.htaccess</code> files which might be +present. The information which is read in then has to be +<em>merged</em> with the applicable information from the server's own +config files (either from the <code><Directory></code> sections +in <code>access.conf</code>, or from defaults in +<code>srm.conf</code>, which actually behaves for most purposes almost +exactly like <code><Directory /></code>).<p> + +Finally, after having served a request which involved reading +<code>.htaccess</code> files, we need to discard the storage allocated +for handling them. That is solved the same way it is solved wherever +else similar problems come up, by tying those structures to the +per-transaction resource pool. 
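The walk down the directory hierarchy described above can be sketched as a loop over the directory prefixes of the translated filename. This is an illustration only: `walk_htaccess`, `walk_log`, and `walk_record` are hypothetical names, the sketch starts at the filesystem root (the text does not say where the real server starts), it assumes an absolute path without repeated slashes, and it merely names the candidate files instead of reading or merging them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Visit each directory on the way down to `filename` (an absolute path),
 * from the root to the directory containing the file, calling visit() with
 * the path of the per-directory config file that would be consulted there.
 * Returns the number of directories visited, or -1 if the path is too long. */
int walk_htaccess(const char *filename,
                  void (*visit)(const char *path, void *ctx), void *ctx)
{
    char buf[1024];
    size_t len, i;
    int visited = 0;

    assert(filename != NULL && filename[0] == '/');
    len = strlen(filename);
    if (len + sizeof("/.htaccess") > sizeof(buf)) return -1;

    for (i = 0; i < len; i++) {
        if (filename[i] != '/') continue;
        /* filename[0..i] ends in '/', so it names a directory prefix. */
        memcpy(buf, filename, i + 1);
        strcpy(buf + i + 1, ".htaccess");
        visit(buf, ctx);
        visited++;
    }
    return visited;
}

/* Demonstration visitor: remembers the last path it was given. */
typedef struct { char last[1024]; } walk_log;

void walk_record(const char *path, void *ctx)
{
    walk_log *log = ctx;
    strcpy(log->last, path);
}
```

Visiting parents before children matters because each directory's settings are merged on top of what was inherited from the directories above it.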
<p> + +<h3><a name="per-dir">Per-directory configuration structures</a></h3> + +Let's look at how all of this plays out in <code>mod_mime.c</code>, +which defines the file typing handler which emulates the NCSA server's +behavior of determining file types from suffixes. What we'll be +looking at, here, is the code which implements the +<code>AddType</code> and <code>AddEncoding</code> commands. These +commands can appear in <code>.htaccess</code> files, so they must be +handled in the module's private per-directory data, which in fact +consists of two separate <code>table</code>s for MIME types and +encoding information, and is declared as follows: + +<pre> +typedef struct { + table *forced_types; /* Additional AddTyped stuff */ + table *encoding_types; /* Added with AddEncoding... */ +} mime_dir_config; +</pre> + +When the server is reading a configuration file, or +<code><Directory></code> section, which includes one of the MIME +module's commands, it needs to create a <code>mime_dir_config</code> +structure, so those commands have something to act on. It does this +by invoking the function it finds in the module's `create per-dir +config' slot, with two arguments: the name of the directory to which +this configuration information applies (or <code>NULL</code> for +<code>srm.conf</code>), and a pointer to a resource pool in which the +allocation should happen. <p> + +(If we are reading a <code>.htaccess</code> file, that resource pool +is the per-request resource pool for the request; otherwise it is a +resource pool which is used for configuration data, and cleared on +restarts. Either way, it is important for the structure being created +to vanish when the pool is cleared, by registering a cleanup on the +pool if necessary). <p> + +For the MIME module, the per-dir config creation function just +<code>palloc</code>s the structure above, and creates a couple of +<code>table</code>s to fill it.
That looks like this: + +<pre> +void *create_mime_dir_config (pool *p, char *dummy) +{ + mime_dir_config *new = + (mime_dir_config *) palloc (p, sizeof(mime_dir_config)); + + new->forced_types = make_table (p, 4); + new->encoding_types = make_table (p, 4); + + return new; +} +</pre> + +Now, suppose we've just read in a <code>.htaccess</code> file. We +already have the per-directory configuration structure for the next +directory up in the hierarchy. If the <code>.htaccess</code> file we +just read in didn't have any <code>AddType</code> or +<code>AddEncoding</code> commands, its per-directory config structure +for the MIME module is still valid, and we can just use it. +Otherwise, we need to merge the two structures somehow. <p> + +To do that, the server invokes the module's per-directory config merge +function, if one is present. That function takes three arguments: +the two structures being merged, and a resource pool in which to +allocate the result. For the MIME module, all that needs to be done +is overlay the tables from the new per-directory config structure with +those from the parent: + +<pre> +void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv) +{ + mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv; + mime_dir_config *subdir = (mime_dir_config *)subdirv; + mime_dir_config *new = + (mime_dir_config *)palloc (p, sizeof(mime_dir_config)); + + new->forced_types = overlay_tables (p, subdir->forced_types, + parent_dir->forced_types); + new->encoding_types = overlay_tables (p, subdir->encoding_types, + parent_dir->encoding_types); + + return new; +} +</pre> + +As a note --- if there is no per-directory merge function present, the +server will just use the subdirectory's configuration info, and ignore +the parent's. 
For some modules, that works just fine (e.g., for the +includes module, whose per-directory configuration information +consists solely of the state of the <code>XBITHACK</code>), and for +those modules, you can just not declare one, and leave the +corresponding structure slot in the module itself <code>NULL</code>.<p> + +<h3><a name="commands">Command handling</a></h3> + +Now that we have these structures, we need to be able to figure out +how to fill them. That involves processing the actual +<code>AddType</code> and <code>AddEncoding</code> commands. To find +commands, the server looks in the module's <code>command table</code>. +That table contains information on how many arguments the commands +take, and in what formats, where it is permitted, and so forth. That +information is sufficient to allow the server to invoke most +command-handling functions with pre-parsed arguments. Without further +ado, let's look at the <code>AddType</code> command handler, which +looks like this (the <code>AddEncoding</code> command looks basically +the same, and won't be shown here): + +<pre> +char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext) +{ + if (*ext == '.') ++ext; + table_set (m->forced_types, ext, ct); + return NULL; +} +</pre> + +This command handler is unusually simple. As you can see, it takes +four arguments, two of which are pre-parsed arguments, the third being +the per-directory configuration structure for the module in question, +and the fourth being a pointer to a <code>cmd_parms</code> structure. 
+That structure contains a bunch of arguments which are frequently of +use to some, but not all, commands, including a resource pool (from +which memory can be allocated, and to which cleanups should be tied), +and the (virtual) server being configured, from which the module's +per-server configuration data can be obtained if required.<p> + +Another way in which this particular command handler is unusually +simple is that there are no error conditions which it can encounter. +If there were, it could return an error message instead of +<code>NULL</code>; this causes an error to be printed out on the +server's <code>stderr</code>, followed by a quick exit, if it is in +the main config files; for a <code>.htaccess</code> file, the syntax +error is logged in the server error log (along with an indication of +where it came from), and the request is bounced with a server error +response (HTTP error status, code 500). <p> + +The MIME module's command table has entries for these commands, which +look like this: + +<pre> +command_rec mime_cmds[] = { +{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2, + "a mime type followed by a file extension" }, +{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2, + "an encoding (e.g., gzip), followed by a file extension" }, +{ NULL } +}; +</pre> + +The entries in these tables are: + +<ul> + <li> The name of the command + <li> The function which handles it + <li> a <code>(void *)</code> pointer, which is passed in the + <code>cmd_parms</code> structure to the command handler --- + this is useful in case many similar commands are handled by the + same function. + <li> A bit mask indicating where the command may appear. There are + mask bits corresponding to each <code>AllowOverride</code> + option, and an additional mask bit, <code>RSRC_CONF</code>, + indicating that the command may appear in the server's own + config files, but <em>not</em> in any <code>.htaccess</code> + file. 
+ <li> A flag indicating how many arguments the command handler wants + pre-parsed, and how they should be passed in. + <code>TAKE2</code> indicates two pre-parsed arguments. Other + options are <code>TAKE1</code>, which indicates one pre-parsed + argument, <code>FLAG</code>, which indicates that the argument + should be <code>On</code> or <code>Off</code>, and is passed in + as a boolean flag, <code>RAW_ARGS</code>, which causes the + server to give the command the raw, unparsed arguments + (everything but the command name itself). There is also + <code>ITERATE</code>, which means that the handler looks the + same as <code>TAKE1</code>, but that if multiple arguments are + present, it should be called multiple times, and finally + <code>ITERATE2</code>, which indicates that the command handler + looks like a <code>TAKE2</code>, but if more arguments are + present, then it should be called multiple times, holding the + first argument constant. + <li> Finally, we have a string which describes the arguments that + should be present. If the arguments in the actual config file + are not as required, this string will be used to help give a + more specific error message. (You can safely leave this + <code>NULL</code>). +</ul> + +Finally, having set this all up, we have to use it. This is +ultimately done in the module's handlers, specifically for its +file-typing handler, which looks more or less like this; note that the +per-directory configuration structure is extracted from the +<code>request_rec</code>'s per-directory configuration vector by using +the <code>get_module_config</code> function. 
+ +<pre> +int find_ct(request_rec *r) +{ + int i; + char *fn = pstrdup (r->pool, r->filename); + mime_dir_config *conf = (mime_dir_config *) + get_module_config(r->per_dir_config, &mime_module); + char *type; + + if (S_ISDIR(r->finfo.st_mode)) { + r->content_type = DIR_MAGIC_TYPE; + return OK; + } + + if((i=rind(fn,'.')) < 0) return DECLINED; + ++i; + + if ((type = table_get (conf->encoding_types, &fn[i]))) + { + r->content_encoding = type; + + /* go back to previous extension to try to use it as a type */ + + fn[i-1] = '\0'; + if((i=rind(fn,'.')) < 0) return OK; + ++i; + } + + if ((type = table_get (conf->forced_types, &fn[i]))) + { + r->content_type = type; + } + + return OK; +} + +</pre> + +<h3><a name="servconf">Side notes --- per-server configuration, virtual servers, etc.</a></h3> + +The basic ideas behind per-server module configuration are basically +the same as those for per-directory configuration; there is a creation +function and a merge function, the latter being invoked where a +virtual server has partially overridden the base server configuration, +and a combined structure must be computed. (As with per-directory +configuration, the default if no merge function is specified, and a +module is configured in some virtual server, is that the base +configuration is simply ignored). <p> + +The only substantial difference is that when a command needs to +configure the per-server private module data, it needs to go to the +<code>cmd_parms</code> data to get at it. 
Here's an example, from the +alias module, which also indicates how a syntax error can be returned +(note that the per-directory configuration argument to the command +handler is declared as a dummy, since the module doesn't actually have +per-directory config data): + +<pre> +char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url) +{ + server_rec *s = cmd->server; + alias_server_conf *conf = (alias_server_conf *) + get_module_config(s->module_config,&alias_module); + alias_entry *new = push_array (conf->redirects); + + if (!is_url (url)) return "Redirect to non-URL"; + + new->fake = f; new->real = url; + return NULL; +} +</pre> +<!--%hypertext --> +</body></html> +<!--/%hypertext --> diff --git a/docs/manual/misc/API.html b/docs/manual/misc/API.html new file mode 100644 index 0000000000..f860996e47 --- /dev/null +++ b/docs/manual/misc/API.html @@ -0,0 +1,988 @@ +<!--%hypertext --> +<html><head> +<title>Apache API notes</title> +</head> +<body> +<!--/%hypertext --> +<h1>Apache API notes</h1> + +These are some notes on the Apache API and the data structures you +have to deal with, etc. They are not yet nearly complete, but +hopefully, they will help you get your bearings. Keep in mind that +the API is still subject to change as we gain experience with it. +(See the TODO file for what <em>might</em> be coming). However, +it will be easy to adapt modules to any changes that are made. +(We have more modules to adapt than you do). +<p> + +A few notes on general pedagogical style here. In the interest of +conciseness, all structure declarations here are incomplete --- the +real ones have more slots that I'm not telling you about. For the +most part, these are reserved to one component of the server core or +another, and should be altered by modules with caution. However, in +some cases, they really are things I just haven't gotten around to +yet. 
Welcome to the bleeding edge.<p> + +Finally, here's an outline, to give you some bare idea of what's +coming up, and in what order: + +<ul> +<li> <a href="#basics">Basic concepts.</a> +<menu> + <li> <a href="#HMR">Handlers, Modules, and Requests</a> + <li> <a href="#moduletour">A brief tour of a module</a> +</menu> +<li> <a href="#handlers">How handlers work</a> +<menu> + <li> <a href="#req_tour">A brief tour of the <code>request_rec</code></a> + <li> <a href="#req_orig">Where request_rec structures come from</a> + <li> <a href="#req_return">Handling requests, declining, and returning error codes</a> + <li> <a href="#resp_handlers">Special considerations for response handlers</a> + <li> <a href="#auth_handlers">Special considerations for authentication handlers</a> + <li> <a href="#log_handlers">Special considerations for logging handlers</a> +</menu> +<li> <a href="#pools">Resource allocation and resource pools</a> +<li> <a href="#config">Configuration, commands and the like</a> +<menu> + <li> <a href="#per-dir">Per-directory configuration structures</a> + <li> <a href="#commands">Command handling</a> + <li> <a href="#servconf">Side notes --- per-server configuration, virtual servers, etc.</a> +</menu> +</ul> + +<h2><a name="basics">Basic concepts.</a></h2> + +We begin with an overview of the basic concepts behind the +API, and how they are manifested in the code. + +<h3><a name="HMR">Handlers, Modules, and Requests</a></h3> + +Apache breaks down request handling into a series of steps, more or +less the same way the Netscape server API does (although this API has +a few more stages than NetSite does, as hooks for stuff I thought +might be useful in the future). These are: + +<ul> + <li> URI -> Filename translation + <li> Auth ID checking [is the user who they say they are?] + <li> Auth access checking [is the user authorized <em>here</em>?] 
+ <li> Access checking other than auth + <li> Determining MIME type of the object requested + <li> `Fixups' --- there aren't any of these yet, but the phase is + intended as a hook for possible extensions like + <code>SetEnv</code>, which don't really fit well elsewhere. + <li> Actually sending a response back to the client. + <li> Logging the request +</ul> + +These phases are handled by looking at each of a succession of +<em>modules</em>, looking to see if each of them has a handler for the +phase, and attempting to invoke it if so. The handler can typically do +one of three things: + +<ul> + <li> <em>Handle</em> the request, and indicate that it has done so + by returning the magic constant <code>OK</code>. + <li> <em>Decline</em> to handle the request, by returning the magic + integer constant <code>DECLINED</code>. In this case, the + server behaves in all respects as if the handler simply hadn't + been there. + <li> Signal an error, by returning one of the HTTP error codes. + This terminates normal handling of the request, although an + ErrorDocument may be invoked to try to mop up, and it will be + logged in any case. +</ul> + +Most phases are terminated by the first module that handles them; +however, for logging, `fixups', and non-access authentication +checking, all handlers always run (barring an error). Also, the +response phase is unique in that modules may declare multiple handlers +for it, via a dispatch table keyed on the MIME type of the requested +object. Modules may declare a response-phase handler which can handle +<em>any</em> request, by giving it the key <code>*/*</code> (i.e., a +wildcard MIME type specification). However, wildcard handlers are +only invoked if the server has already tried and failed to find a more +specific response handler for the MIME type of the requested object +(either none existed, or they all declined).<p> + +The handlers themselves are functions of one argument (a +<code>request_rec</code> structure;
vide infra), which return an +integer, as above.<p> + +<h3><a name="moduletour">A brief tour of a module</a></h3> + +At this point, we need to explain the structure of a module. Our +candidate will be one of the messier ones, the CGI module --- this +handles both CGI scripts and the <code>ScriptAlias</code> config file +command. It's actually a great deal more complicated than most +modules, but if we're going to have only one example, it might as well +be the one with its fingers in every place.<p> + +Let's begin with handlers. In order to handle the CGI scripts, the +module declares a response handler for them. Because of +<code>ScriptAlias</code>, it also has handlers for the name +translation phase (to recognise <code>ScriptAlias</code>ed URIs) and the +type-checking phase (any <code>ScriptAlias</code>ed request is typed +as a CGI script).<p> + +The module needs to maintain some per (virtual) +server information, namely, the <code>ScriptAlias</code>es in effect; +the module structure therefore contains pointers to a function which +builds these structures, and to another which combines two of them (in +case the main server and a virtual server both have +<code>ScriptAlias</code>es declared).<p> + +Finally, this module contains code to handle the +<code>ScriptAlias</code> command itself. This particular module only +declares one command, but there could be more, so modules have +<em>command tables</em> which declare their commands, and describe +where they are permitted, and how they are to be invoked. <p> + +A final note on the declared types of the arguments of some of these +commands: a <code>pool</code> is a pointer to a <em>resource pool</em> +structure; these are used by the server to keep track of the memory +which has been allocated, files opened, etc., either to service a +particular request, or to handle the process of configuring itself.
+That way, when the request is over (or, for the configuration pool, +when the server is restarting), the memory can be freed, and the files +closed, <i>en masse</i>, without anyone having to write explicit code to +track them all down and dispose of them. Also, a +<code>cmd_parms</code> structure contains various information about +the config file being read, and other status information, which is +sometimes of use to the function which processes a config-file command +(such as <code>ScriptAlias</code>). + +With no further ado, the module itself: + +<pre> +/* Declarations of handlers. */ + +int translate_scriptalias (request_rec *); +int type_scriptalias (request_rec *); +int cgi_handler (request_rec *); + +/* Subsidiary dispatch table for response-phase handlers, by MIME type */ + +handler_rec cgi_handlers[] = { +{ "application/x-httpd-cgi", cgi_handler }, +{ NULL } +}; + +/* Declarations of routines to manipulate the module's configuration + * info. Note that these are returned, and passed in, as void *'s; + * the server core keeps track of them, but it doesn't, and can't, + * know their internal structure. 
+ */ + +void *make_cgi_server_config (pool *); +void *merge_cgi_server_config (pool *, void *, void *); + +/* Declarations of routines to handle config-file commands */ + +extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake, + char *real); + +command_rec cgi_cmds[] = { +{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2, + "a fakename and a realname"}, +{ NULL } +}; + +module cgi_module = { + STANDARD_MODULE_STUFF, + NULL, /* initializer */ + NULL, /* dir config creator */ + NULL, /* dir merger --- default is to override */ + make_cgi_server_config, /* server config */ + merge_cgi_server_config, /* merge server config */ + cgi_cmds, /* command table */ + cgi_handlers, /* handlers */ + translate_scriptalias, /* filename translation */ + NULL, /* check_user_id */ + NULL, /* check auth */ + NULL, /* check access */ + type_scriptalias, /* type_checker */ + NULL, /* fixups */ + NULL /* logger */ +}; +</pre> + +<h2><a name="handlers">How handlers work</a></h2> + +The sole argument to handlers is a <code>request_rec</code> structure. +This structure describes a particular request which has been made to +the server, on behalf of a client. In most cases, each connection to +the client generates only one <code>request_rec</code> structure.<p> + +<h3><a name="req_tour">A brief tour of the <code>request_rec</code></a></h3> + +The <code>request_rec</code> contains pointers to a resource pool +which will be cleared when the server is finished handling the +request; to structures containing per-server and per-connection +information, and most importantly, information on the request itself.<p> + +The most important such information is a small set of character +strings describing attributes of the object being requested, including +its URI, filename, content-type and content-encoding (these being filled +in by the translation and type-check handlers which handle the +request, respectively). 
<p> + +Other commonly used data items are tables giving the MIME headers on +the client's original request, MIME headers to be sent back with the +response (which modules can add to at will), and environment variables +for any subprocesses which are spawned off in the course of servicing +the request. These tables are manipulated using the +<code>table_get</code> and <code>table_set</code> routines. <p> + +Finally, there are pointers to two data structures which, in turn, +point to per-module configuration structures. Specifically, these +hold pointers to the data structures which the module has built to +describe the way it has been configured to operate in a given +directory (via <code>.htaccess</code> files or +<code><Directory></code> sections), for private data it has +built in the course of servicing the request (so modules' handlers for +one phase can pass `notes' to their handlers for other phases). There +is another such configuration vector in the <code>server_rec</code> +data structure pointed to by the <code>request_rec</code>, which +contains per (virtual) server configuration data.<p> + +Here is an abridged declaration, giving the fields most commonly used:<p> + +<pre> +struct request_rec { + + pool *pool; + conn_rec *connection; + server_rec *server; + + /* What object is being requested */ + + char *uri; + char *filename; + char *path_info; + char *args; /* QUERY_ARGS, if any */ + struct stat finfo; /* Set by server core; + * st_mode set to zero if no such file */ + + char *content_type; + char *content_encoding; + + /* MIME header environments, in and out. Also, an array containing + * environment variables to be passed to subprocesses, so people can + * write modules to add to that environment. + * + * The difference between headers_out and err_headers_out is that + * the latter are printed even on error, and persist across internal + * redirects (so the headers printed for ErrorDocument handlers will + * have them). 
+ */ + + table *headers_in; + table *headers_out; + table *err_headers_out; + table *subprocess_env; + + /* Info about the request itself... */ + + int header_only; /* HEAD request, as opposed to GET */ + char *protocol; /* Protocol, as given to us, or HTTP/0.9 */ + char *method; /* GET, HEAD, POST, etc. */ + int method_number; /* M_GET, M_POST, etc. */ + + /* Info for logging */ + + char *the_request; + int bytes_sent; + + /* A flag which modules can set, to indicate that the data being + * returned is volatile, and clients should be told not to cache it. + */ + + int no_cache; + + /* Various other config info which may change with .htaccess files + * These are config vectors, with one void* pointer for each module + * (the thing pointed to being the module's business). + */ + + void *per_dir_config; /* Options set in config files, etc. */ + void *request_config; /* Notes on *this* request */ + +}; + +</pre> + +<h3><a name="req_orig">Where request_rec structures come from</a></h3> + +Most <code>request_rec</code> structures are built by reading an HTTP +request from a client, and filling in the fields. However, there are +a few exceptions: + +<ul> + <li> If the request is to an imagemap, a type map (i.e., a + <code>*.var</code> file), or a CGI script which returned a + local `Location:', then the resource which the user requested + is going to be ultimately located by some URI other than what + the client originally supplied. In this case, the server does + an <em>internal redirect</em>, constructing a new + <code>request_rec</code> for the new URI, and processing it + almost exactly as if the client had requested the new URI + directly. <p> + + <li> If some handler signaled an error, and an + <code>ErrorDocument</code> is in scope, the same internal + redirect machinery comes into play.<p> + + <li> Finally, a handler occasionally needs to investigate `what + would happen if' some other request were run. 
For instance, + the directory indexing module needs to know what MIME type + would be assigned to a request for each directory entry, in + order to figure out what icon to use.<p> + + Such handlers can construct a <em>sub-request</em>, using the + functions <code>sub_req_lookup_file</code> and + <code>sub_req_lookup_uri</code>; this constructs a new + <code>request_rec</code> structure and processes it as you + would expect, up to but not including the point of actually + sending a response. (These functions skip over the access + checks if the sub-request is for a file in the same directory + as the original request).<p> + + (Server-side includes work by building sub-requests and then + actually invoking the response handler for them, via the + function <code>run_sub_request</code>). +</ul> + +<h3><a name="req_return">Handling requests, declining, and returning error codes</a></h3> + +As discussed above, each handler, when invoked to handle a particular +<code>request_rec</code>, has to return an <code>int</code> to +indicate what happened. That can be one of: + +<ul> + <li> OK --- the request was handled successfully. This may or may + not terminate the phase. + <li> DECLINED --- no erroneous condition exists, but the module + declines to handle the phase; the server tries to find another. + <li> an HTTP error code, which aborts handling of the request. +</ul> + +Note that if the error code returned is <code>REDIRECT</code>, then +the module should put a <code>Location</code> in the request's +<code>headers_out</code>, to indicate where the client should be +redirected <em>to</em>. <p> + +<h3><a name="resp_handlers">Special considerations for response handlers</a></h3> + +Handlers for most phases do their work by simply setting a few fields +in the <code>request_rec</code> structure (or, in the case of access +checkers, simply by returning the correct error code). However, +response handlers have to actually send a response back to the client.
<p> + +They should begin by sending an HTTP response header, using the +function <code>send_http_header</code>. (You don't have to do +anything special to skip sending the header for HTTP/0.9 requests; the +function figures out on its own that it shouldn't do anything). If +the request is marked <code>header_only</code>, that's all they should +do; they should return after that, without attempting any further +output. <p> + +Otherwise, they should produce a response body which answers the +client as appropriate. The primitives for this are <code>rputc</code> +and <code>rprintf</code>, for internally generated output, and +<code>send_fd</code>, to copy the contents of some <code>FILE *</code> +straight to the client. <p> + +At this point, you should more or less understand the following piece +of code, which is the handler which handles <code>GET</code> requests +which have no more specific handler; it also shows how conditional +<code>GET</code>s can be handled, if it's desirable to do so in a +particular response handler --- <code>set_last_modified</code> checks +against the <code>If-modified-since</code> value supplied by the +client, if any, and returns an appropriate code (which will, if +nonzero, be USE_LOCAL_COPY).
No similar considerations apply for +<code>set_content_length</code>, but it returns an error code for +symmetry.<p> + +<pre> +int default_handler (request_rec *r) +{ + int errstatus; + FILE *f; + + if (r->method_number != M_GET) return DECLINED; + if (r->finfo.st_mode == 0) return NOT_FOUND; + + if ((errstatus = set_content_length (r, r->finfo.st_size)) + || (errstatus = set_last_modified (r, r->finfo.st_mtime))) + return errstatus; + + /* Open through the request pool, so that pfclose below matches */ + f = pfopen (r->pool, r->filename, "r"); + + if (f == NULL) { + log_reason("file permissions deny server access", + r->filename, r); + return FORBIDDEN; + } + + register_timeout ("send", r); + send_http_header (r); + + if (!r->header_only) send_fd (f, r); + pfclose (r->pool, f); + return OK; +} +</pre> + +Finally, if all of this is too much of a challenge, there are a few +ways out of it. First off, as shown above, a response handler which +has not yet produced any output can simply return an error code, in +which case the server will automatically produce an error response. +Secondly, it can punt to some other handler by invoking +<code>internal_redirect</code>, which is how the internal redirection +machinery discussed above is invoked. A response handler which has +internally redirected should always return <code>OK</code>. <p> + +(Invoking <code>internal_redirect</code> from handlers which are +<em>not</em> response handlers will lead to serious confusion). + +<h3><a name="auth_handlers">Special considerations for authentication handlers</a></h3> + +Stuff that should be discussed here in detail: + +<ul> + <li> Authentication-phase handlers not invoked unless auth is + configured for the directory. + <li> Common auth configuration stored in the core per-dir + configuration; it has accessors <code>auth_type</code>, + <code>auth_name</code>, and <code>requires</code>.
+ <li> Common routines, to handle the protocol end of things, at least + for HTTP basic authentication (<code>get_basic_auth_pw</code>, + which sets the <code>connection->user</code> structure field + automatically, and <code>note_basic_auth_failure</code>, which + arranges for the proper <code>WWW-Authenticate:</code> header + to be sent back). +</ul> + +<h3><a name="log_handlers">Special considerations for logging handlers</a></h3> + +When a request has internally redirected, there is the question of +what to log. Apache handles this by bundling the entire chain of +redirects into a list of <code>request_rec</code> structures which are +threaded through the <code>r->prev</code> and <code>r->next</code> +pointers. The <code>request_rec</code> which is passed to the logging +handlers in such cases is the one which was originally built for the +initial request from the client; note that the <code>bytes_sent</code> field will +only be correct in the last request in the chain (the one for which a +response was actually sent). + +<h2><a name="pools">Resource allocation and resource pools</a></h2> + +One of the problems of writing and designing a server-pool server is +that of preventing leakage, that is, allocating resources (memory, +open files, etc.), without subsequently releasing them. The resource +pool machinery is designed to make it easy to prevent this from +happening, by allowing resources to be allocated in such a way that +they are <em>automatically</em> released when the server is done with +them. <p> + +The way this works is as follows: the memory which is allocated, files +opened, etc., to deal with a particular request are tied to a +<em>resource pool</em> which is allocated for the request. The pool +is a data structure which itself tracks the resources in question. <p> + +When the request has been processed, the pool is <em>cleared</em>.
At +that point, all the memory associated with it is released for reuse, +all files associated with it are closed, and any other clean-up +functions which are associated with the pool are run. When this is +over, we can be confident that all the resources tied to the pool have +been released, and that none of them have leaked. <p> + +Server restarts, and allocation of memory and resources for per-server +configuration, are handled in a similar way. There is a +<em>configuration pool</em>, which keeps track of resources which were +allocated while reading the server configuration files, and handling +the commands therein (for instance, the memory that was allocated for +per-server module configuration, log files and other files that were +opened, and so forth). When the server restarts, and has to reread +the configuration files, the configuration pool is cleared, and so the +memory and file descriptors which were taken up by reading them the +last time are made available for reuse. <p> + +It should be noted that use of the pool machinery isn't generally +obligatory, except for situations like logging handlers, where you +really need to register cleanups to make sure that the log file gets +closed when the server restarts (this is most easily done by using the +function <code><a href="#pool-files">pfopen</a></code>, which also +arranges for the underlying file descriptor to be closed before any +child processes, such as for CGI scripts, are <code>exec</code>ed), or +in case you are using the timeout machinery (which isn't yet even +documented here). However, there are two benefits to using it: +resources allocated to a pool never leak (even if you allocate a +scratch string, and just forget about it); also, for memory +allocation, <code>palloc</code> is generally faster than +<code>malloc</code>.<p> + +We begin here by describing how memory is allocated to pools, and then +discuss how other resources are tracked by the resource pool +machinery.
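As a rough picture of what `clearing a pool' means, here is a toy model in C. It is only a sketch under stated assumptions: every name in it (<code>toy_pool</code>, <code>toy_palloc</code>, <code>toy_register_cleanup</code>, <code>toy_clear_pool</code>) is hypothetical, it calls <code>malloc</code> once per allocation for simplicity, and it is not the server's actual pool implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: these names are hypothetical, NOT the Apache pool code. */
typedef union toy_block {
    union toy_block *next;   /* chain of everything the pool handed out */
    long double align;       /* keeps the payload suitably aligned */
} toy_block;

typedef struct toy_cleanup {
    void (*fn)(void *);
    void *data;
    struct toy_cleanup *next;
} toy_cleanup;

typedef struct {
    toy_block *blocks;
    toy_cleanup *cleanups;
} toy_pool;

/* Allocate `size` bytes and remember the block in the pool. */
void *toy_palloc(toy_pool *p, size_t size)
{
    toy_block *b;

    assert(p != NULL);
    b = malloc(sizeof(toy_block) + size);
    if (b == NULL) return NULL;
    b->next = p->blocks;
    p->blocks = b;
    return b + 1;            /* payload starts just after the header */
}

/* Arrange for fn(data) to be called when the pool is cleared. */
void toy_register_cleanup(toy_pool *p, void (*fn)(void *), void *data)
{
    toy_cleanup *c = toy_palloc(p, sizeof(*c));

    if (c == NULL) return;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
}

/* Run the cleanups (most recently registered first), then release
 * every block in one sweep; the caller never frees anything itself. */
void toy_clear_pool(toy_pool *p)
{
    toy_cleanup *c;
    toy_block *b, *next;

    for (c = p->cleanups; c != NULL; c = c->next)
        c->fn(c->data);
    p->cleanups = NULL;

    for (b = p->blocks; b != NULL; b = next) {
        next = b->next;
        free(b);
    }
    p->blocks = NULL;
}

/* Example cleanup: count how many times it has been run. */
void toy_count_cleanup(void *data)
{
    ++*(int *)data;
}
```

The point of the design is that the code which allocates never has to remember what it allocated; whoever owns the pool decides when everything goes away.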
+ +<h3>Allocation of memory in pools</h3> + +Memory is allocated to pools by calling the function +<code>palloc</code>, which takes two arguments, one being a pointer to +a resource pool structure, and the other being the amount of memory to +allocate (in <code>char</code>s). Within handlers for handling +requests, the most common way of getting a resource pool structure is +by looking at the <code>pool</code> slot of the relevant +<code>request_rec</code>; hence the repeated appearance of the +following idiom in module code: + +<pre> +int my_handler(request_rec *r) +{ + struct my_structure *foo; + ... + + foo = (struct my_structure *)palloc (r->pool, sizeof(struct my_structure)); +} +</pre> + +Note that <em>there is no <code>pfree</code></em> --- +<code>palloc</code>ed memory is freed only when the associated +resource pool is cleared. This means that <code>palloc</code> does not +have to do as much accounting as <code>malloc()</code>; all it does in +the typical case is to round up the size, bump a pointer, and do a +range check.<p> + +(It also raises the possibility that heavy use of <code>palloc</code> +could cause a server process to grow excessively large. There are +two ways to deal with this, which are dealt with below; briefly, you +can use <code>malloc</code>, and try to be sure that all of the memory +gets explicitly <code>free</code>d, or you can allocate a sub-pool of +the main pool, allocate your memory in the sub-pool, and clear it out +periodically. The latter technique is discussed in the section on +sub-pools below, and is used in the directory-indexing code, in order +to avoid excessive storage allocation when listing directories with +thousands of files). + +<h3>Allocating initialized memory</h3> + +There are functions which allocate initialized memory, and are +frequently useful. The function <code>pcalloc</code> has the same +interface as <code>palloc</code>, but clears out the memory it +allocates before it returns it.
The function <code>pstrdup</code> +takes a resource pool and a <code>char *</code> as arguments, and +allocates memory for a copy of the string the pointer points to, +returning a pointer to the copy. Finally <code>pstrcat</code> is a +varargs-style function, which takes a pointer to a resource pool, and +at least two <code>char *</code> arguments, the last of which must be +<code>NULL</code>. It allocates enough memory to fit copies of each +of the strings, as a unit; for instance: + +<pre> + pstrcat (r->pool, "foo", "/", "bar", NULL); +</pre> + +returns a pointer to 8 bytes worth of memory, initialized to +<code>"foo/bar"</code>. + +<h3>Tracking open files, etc.</h3> + +As indicated above, resource pools are also used to track other sorts +of resources besides memory. The most common are open files. The +routine which is typically used for this is <code>pfopen</code>, which +takes a resource pool and two strings as arguments; the strings are +the same as the typical arguments to <code>fopen</code>, e.g., + +<pre> + ... + FILE *f = pfopen (r->pool, r->filename, "r"); + + if (f == NULL) { ... } else { ... } +</pre> + +There is also a <code>popenf</code> routine, which parallels the +lower-level <code>open</code> system call. Both of these routines +arrange for the file to be closed when the resource pool in question +is cleared. <p> + +Unlike the case for memory, there <em>are</em> functions to close +files allocated with <code>pfopen</code>, and <code>popenf</code>, +namely <code>pfclose</code> and <code>pclosef</code>. (This is +because, on many systems, the number of files which a single process +can have open is quite limited). It is important to use these +functions to close files allocated with <code>pfopen</code> and +<code>popenf</code>, since to do otherwise could cause fatal errors on +systems such as Linux, which react badly if the same +<code>FILE*</code> is closed more than once. 
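The <code>pstrcat</code> calling convention described earlier in this section (a variable number of <code>char *</code> arguments ended by <code>NULL</code>) can be sketched as follows. This is an illustration under stated assumptions, not the server's code: <code>toy_strcat_all</code> is a hypothetical name, and it allocates with <code>malloc</code> instead of from a resource pool, so the caller must <code>free</code> the result.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pstrcat-style function; uses malloc, not a pool.
 * Concatenates all arguments up to (not including) the terminating NULL. */
static char *toy_strcat_all(const char *first, ...)
{
    va_list ap;
    const char *s;
    size_t len = 0;
    char *res, *cp;

    /* First pass: add up the lengths of all the strings. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        len += strlen(s);
    va_end(ap);

    /* One allocation for the whole result, plus the terminating NUL. */
    res = malloc(len + 1);
    if (res == NULL) return NULL;

    /* Second pass: copy each string into place. */
    cp = res;
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *)) {
        size_t n = strlen(s);
        memcpy(cp, s, n);
        cp += n;
    }
    va_end(ap);
    *cp = '\0';

    return res;
}

/* Example use: builds "foo/bar" and returns the number of bytes it occupies,
 * counting the terminating NUL. */
static size_t toy_strcat_demo(void)
{
    char *s = toy_strcat_all("foo", "/", "bar", (char *)NULL);
    size_t bytes;

    assert(s != NULL);
    bytes = strlen(s) + 1;
    free(s);
    return bytes;
}
```

The two-pass design (measure, then copy) is what lets the result be allocated "as a unit", and the explicit <code>NULL</code> is the only way the function can tell where the argument list ends.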
<p> + +(Using the <code>close</code> functions is not mandatory, since the +file will eventually be closed regardless, but you should consider it +in cases where your module is opening, or could open, a lot of files). + +<h3>Other sorts of resources --- cleanup functions</h3> + +More text goes here. Describe the cleanup primitives in terms of +which the file stuff is implemented; also, <code>spawn_process</code>. + +<h3>Fine control --- creating and dealing with sub-pools, with a note +on sub-requests</h3> + +On rare occasions, too-free use of <code>palloc()</code> and the +associated primitives may result in undesirably profligate resource +allocation. You can deal with such a case by creating a +<em>sub-pool</em>, allocating within the sub-pool rather than the main +pool, and clearing or destroying the sub-pool, which releases the +resources which were associated with it. (This really <em>is</em> a +rare situation; the only case in which it comes up in the standard +module set is in case of listing directories, and then only with +<em>very</em> large directories. Unnecessary use of the primitives +discussed here can hair up your code quite a bit, with very little +gain). <p> + +The primitive for creating a sub-pool is <code>make_sub_pool</code>, +which takes another pool (the parent pool) as an argument. When the +main pool is cleared, the sub-pool will be destroyed. The sub-pool +may also be cleared or destroyed at any time, by calling the functions +<code>clear_pool</code> and <code>destroy_pool</code>, respectively. +(The difference is that <code>clear_pool</code> frees resources +associated with the pool, while <code>destroy_pool</code> also +deallocates the pool itself. In the former case, you can allocate new +resources within the pool, and clear it again, and so forth; in the +latter case, it is simply gone). <p> + +One final note --- sub-requests have their own resource pools, which +are sub-pools of the resource pool for the main request.
The polite +way to reclaim the resources associated with a sub request which you +have allocated (using the <code>sub_req_lookup_...</code> functions) +is <code>destroy_sub_request</code>, which frees the resource pool. +Before calling this function, be sure to copy anything that you care +about which might be allocated in the sub-request's resource pool into +someplace a little less volatile (for instance, the filename in its +<code>request_rec</code> structure). <p> + +(Again, under most circumstances, you shouldn't feel obliged to call +this function; only 2K of memory or so are allocated for a typical sub +request, and it will be freed anyway when the main request pool is +cleared. It is only when you are allocating many, many sub-requests +for a single main request that you should seriously consider the +<code>destroy...</code> functions). + +<h2><a name="config">Configuration, commands and the like</a></h2> + +One of the design goals for this server was to maintain external +compatibility with the NCSA 1.3 server --- that is, to read the same +configuration files, to process all the directives therein correctly, +and in general to be a drop-in replacement for NCSA. On the other +hand, another design goal was to move as much of the server's +functionality into modules which have as little as possible to do with +the monolithic server core. The only way to reconcile these goals is +to move the handling of most commands from the central server into the +modules. <p> + +However, just giving the modules command tables is not enough to +divorce them completely from the server core. The server has to +remember the commands in order to act on them later. That involves +maintaining data which is private to the modules, and which can be +either per-server, or per-directory. 
Most things are per-directory, +including in particular access control and authorization information, +but also information on how to determine file types from suffixes, +which can be modified by <code>AddType</code> and +<code>DefaultType</code> directives, and so forth. In general, the +governing philosophy is that anything which <em>can</em> be made +configurable by directory should be; per-server information is +generally used in the standard set of modules for information like +<code>Alias</code>es and <code>Redirect</code>s which come into play +before the request is tied to a particular place in the underlying +file system. <p> + +Another requirement for emulating the NCSA server is being able to +handle the per-directory configuration files, generally called +<code>.htaccess</code> files, though even in the NCSA server they can +contain directives which have nothing at all to do with access +control. Accordingly, after URI -> filename translation, but before +performing any other phase, the server walks down the directory +hierarchy of the underlying filesystem, following the translated +pathname, to read any <code>.htaccess</code> files which might be +present. The information which is read in then has to be +<em>merged</em> with the applicable information from the server's own +config files (either from the <code><Directory></code> sections +in <code>access.conf</code>, or from defaults in +<code>srm.conf</code>, which actually behaves for most purposes almost +exactly like <code><Directory /></code>).<p> + +Finally, after having served a request which involved reading +<code>.htaccess</code> files, we need to discard the storage allocated +for handling them. That is solved the same way it is solved wherever +else similar problems come up, by tying those structures to the +per-transaction resource pool. 
<p> + +<h3><a name="per-dir">Per-directory configuration structures</a></h3> + +Let's look at how all of this plays out in <code>mod_mime.c</code>, +which defines the file typing handler which emulates the NCSA server's +behavior of determining file types from suffixes. What we'll be +looking at, here, is the code which implements the +<code>AddType</code> and <code>AddEncoding</code> commands. These +commands can appear in <code>.htaccess</code> files, so they must be +handled in the module's private per-directory data, which, in fact, +consists of two separate <code>table</code>s for MIME types and +encoding information, and is declared as follows: + +<pre> +typedef struct { + table *forced_types; /* Additional AddTyped stuff */ + table *encoding_types; /* Added with AddEncoding... */ +} mime_dir_config; +</pre> + +When the server is reading a configuration file, or +<code><Directory></code> section, which includes one of the MIME +module's commands, it needs to create a <code>mime_dir_config</code> +structure, so those commands have something to act on. It does this +by invoking the function it finds in the module's `create per-dir +config slot', with two arguments: the name of the directory to which +this configuration information applies (or <code>NULL</code> for +<code>srm.conf</code>), and a pointer to a resource pool in which the +allocation should happen. <p> + +(If we are reading a <code>.htaccess</code> file, that resource pool +is the per-request resource pool for the request; otherwise it is a +resource pool which is used for configuration data, and cleared on +restarts. Either way, it is important for the structure being created +to vanish when the pool is cleared, by registering a cleanup on the +pool if necessary). <p> + +For the MIME module, the per-dir config creation function just +<code>palloc</code>s the structure above, and creates a couple of +<code>table</code>s to fill it.
That looks like this: + +<pre> +void *create_mime_dir_config (pool *p, char *dummy) +{ + mime_dir_config *new = + (mime_dir_config *) palloc (p, sizeof(mime_dir_config)); + + new->forced_types = make_table (p, 4); + new->encoding_types = make_table (p, 4); + + return new; +} +</pre> + +Now, suppose we've just read in a <code>.htaccess</code> file. We +already have the per-directory configuration structure for the next +directory up in the hierarchy. If the <code>.htaccess</code> file we +just read in didn't have any <code>AddType</code> or +<code>AddEncoding</code> commands, its per-directory config structure +for the MIME module is still valid, and we can just use it. +Otherwise, we need to merge the two structures somehow. <p> + +To do that, the server invokes the module's per-directory config merge +function, if one is present. That function takes three arguments: +the two structures being merged, and a resource pool in which to +allocate the result. For the MIME module, all that needs to be done +is overlay the tables from the new per-directory config structure with +those from the parent: + +<pre> +void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv) +{ + mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv; + mime_dir_config *subdir = (mime_dir_config *)subdirv; + mime_dir_config *new = + (mime_dir_config *)palloc (p, sizeof(mime_dir_config)); + + new->forced_types = overlay_tables (p, subdir->forced_types, + parent_dir->forced_types); + new->encoding_types = overlay_tables (p, subdir->encoding_types, + parent_dir->encoding_types); + + return new; +} +</pre> + +As a note --- if there is no per-directory merge function present, the +server will just use the subdirectory's configuration info, and ignore +the parent's. 
For some modules, that works just fine (e.g., for the +includes module, whose per-directory configuration information +consists solely of the state of the <code>XBITHACK</code>), and for +those modules, you can just not declare one, and leave the +corresponding structure slot in the module itself <code>NULL</code>.<p> + +<h3><a name="commands">Command handling</a></h3> + +Now that we have these structures, we need to be able to figure out +how to fill them. That involves processing the actual +<code>AddType</code> and <code>AddEncoding</code> commands. To find +commands, the server looks in the module's <code>command table</code>. +That table contains information on how many arguments the commands +take, and in what formats, where it is permitted, and so forth. That +information is sufficient to allow the server to invoke most +command-handling functions with pre-parsed arguments. Without further +ado, let's look at the <code>AddType</code> command handler, which +looks like this (the <code>AddEncoding</code> command looks basically +the same, and won't be shown here): + +<pre> +char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext) +{ + if (*ext == '.') ++ext; + table_set (m->forced_types, ext, ct); + return NULL; +} +</pre> + +This command handler is unusually simple. As you can see, it takes +four arguments, two of which are pre-parsed arguments, the third being +the per-directory configuration structure for the module in question, +and the fourth being a pointer to a <code>cmd_parms</code> structure. 
+That structure contains a bunch of arguments which are frequently of +use to some, but not all, commands, including a resource pool (from +which memory can be allocated, and to which cleanups should be tied), +and the (virtual) server being configured, from which the module's +per-server configuration data can be obtained if required.<p> + +Another way in which this particular command handler is unusually +simple is that there are no error conditions which it can encounter. +If there were, it could return an error message instead of +<code>NULL</code>; this causes an error to be printed out on the +server's <code>stderr</code>, followed by a quick exit, if it is in +the main config files; for a <code>.htaccess</code> file, the syntax +error is logged in the server error log (along with an indication of +where it came from), and the request is bounced with a server error +response (HTTP error status, code 500). <p> + +The MIME module's command table has entries for these commands, which +look like this: + +<pre> +command_rec mime_cmds[] = { +{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2, + "a mime type followed by a file extension" }, +{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2, + "an encoding (e.g., gzip), followed by a file extension" }, +{ NULL } +}; +</pre> + +The entries in these tables are: + +<ul> + <li> The name of the command + <li> The function which handles it + <li> a <code>(void *)</code> pointer, which is passed in the + <code>cmd_parms</code> structure to the command handler --- + this is useful in case many similar commands are handled by the + same function. + <li> A bit mask indicating where the command may appear. There are + mask bits corresponding to each <code>AllowOverride</code> + option, and an additional mask bit, <code>RSRC_CONF</code>, + indicating that the command may appear in the server's own + config files, but <em>not</em> in any <code>.htaccess</code> + file. 
+ <li> A flag indicating how many arguments the command handler wants + pre-parsed, and how they should be passed in. + <code>TAKE2</code> indicates two pre-parsed arguments. Other + options are <code>TAKE1</code>, which indicates one pre-parsed + argument, <code>FLAG</code>, which indicates that the argument + should be <code>On</code> or <code>Off</code>, and is passed in + as a boolean flag, <code>RAW_ARGS</code>, which causes the + server to give the command the raw, unparsed arguments + (everything but the command name itself). There is also + <code>ITERATE</code>, which means that the handler looks the + same as <code>TAKE1</code>, but that if multiple arguments are + present, it should be called multiple times, and finally + <code>ITERATE2</code>, which indicates that the command handler + looks like a <code>TAKE2</code>, but if more arguments are + present, then it should be called multiple times, holding the + first argument constant. + <li> Finally, we have a string which describes the arguments that + should be present. If the arguments in the actual config file + are not as required, this string will be used to help give a + more specific error message. (You can safely leave this + <code>NULL</code>). +</ul> + +Finally, having set this all up, we have to use it. This is +ultimately done in the module's handlers, specifically for its +file-typing handler, which looks more or less like this; note that the +per-directory configuration structure is extracted from the +<code>request_rec</code>'s per-directory configuration vector by using +the <code>get_module_config</code> function. 
+ +<pre> +int find_ct(request_rec *r) +{ + int i; + char *fn = pstrdup (r->pool, r->filename); + mime_dir_config *conf = (mime_dir_config *) + get_module_config(r->per_dir_config, &mime_module); + char *type; + + if (S_ISDIR(r->finfo.st_mode)) { + r->content_type = DIR_MAGIC_TYPE; + return OK; + } + + if((i=rind(fn,'.')) < 0) return DECLINED; + ++i; + + if ((type = table_get (conf->encoding_types, &fn[i]))) + { + r->content_encoding = type; + + /* go back to previous extension to try to use it as a type */ + + fn[i-1] = '\0'; + if((i=rind(fn,'.')) < 0) return OK; + ++i; + } + + if ((type = table_get (conf->forced_types, &fn[i]))) + { + r->content_type = type; + } + + return OK; +} + +</pre> + +<h3><a name="servconf">Side notes --- per-server configuration, virtual servers, etc.</a></h3> + +The basic ideas behind per-server module configuration are basically +the same as those for per-directory configuration; there is a creation +function and a merge function, the latter being invoked where a +virtual server has partially overridden the base server configuration, +and a combined structure must be computed. (As with per-directory +configuration, the default if no merge function is specified, and a +module is configured in some virtual server, is that the base +configuration is simply ignored). <p> + +The only substantial difference is that when a command needs to +configure the per-server private module data, it needs to go to the +<code>cmd_parms</code> data to get at it. 
Here's an example, from the +alias module, which also indicates how a syntax error can be returned +(note that the per-directory configuration argument to the command +handler is declared as a dummy, since the module doesn't actually have +per-directory config data): + +<pre> +char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url) +{ + server_rec *s = cmd->server; + alias_server_conf *conf = (alias_server_conf *) + get_module_config(s->module_config,&alias_module); + alias_entry *new = push_array (conf->redirects); + + if (!is_url (url)) return "Redirect to non-URL"; + + new->fake = f; new->real = url; + return NULL; +} +</pre> +<!--%hypertext --> +</body></html> +<!--/%hypertext --> diff --git a/docs/manual/misc/FAQ.html b/docs/manual/misc/FAQ.html new file mode 100644 index 0000000000..b630a283f0 --- /dev/null +++ b/docs/manual/misc/FAQ.html @@ -0,0 +1,162 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> +<HTML> +<HEAD> +<TITLE>Apache server Frequently Asked Questions</TITLE> +</HEAD> + +<BODY> +<IMG SRC="../images/apache_sub.gif" ALT=""> +<H1>Apache server Frequently Asked Questions</H1> + +<H2>The Questions</H2> +<OL> +<LI><A HREF="#what">What is Apache ?</A> +<LI><A HREF="#why">Why was Apache created ?</A> +<LI><A HREF="#relate">How does the Apache group relate to other servers ?</A> +<LI><A HREF="#name">Why the name "Apache" ?</A> +<LI><A HREF="#compatible">How compatible is Apache with my existing NCSA 1.3 setup ?</A> +<LI><A HREF="#compare">OK, so how does Apache compare to other servers ?</A> +<LI><A HREF="#tested">How thoroughly tested is Apache?</A> +<LI><A HREF="#proxy">Does or will Apache act as a Proxy server?</A> +<LI><A HREF="#future">What are the future plans for Apache ?</A> +<LI><A HREF="#support">Who do I contact for support ?</A> +<LI><A HREF="#more">Is there any more information on Apache ?</A> +<LI><A HREF="#where">Where can get Apache ?</A> +</OL> + +<HR> + +<H2>The Answers</H2> +<OL> +<LI><A name="what">What is Apache 
?</A> +<P> + Apache was originally based on code and ideas found in the most +popular HTTP server of the time, NCSA httpd 1.3 (early 1995). It has +since evolved into a far superior system which can rival (and probably +surpass) almost any other UNIX based HTTP server in terms of functionality, +efficiency and speed. +<p>Since it began, it has been completely rewritten, and includes many new +features. Apache is, as of June 1996, the most popular WWW server on +the Internet, according to the <a +href="http://www.netcraft.com/Survey/">Netcraft Survey</a>. + +</P> +<HR> +<LI><A name="relate">How does the Apache group relate to other +server efforts, such as NCSA's?</A> +<P> +We, of course, owe a great debt to NCSA and their programmers for +making the server Apache was based on. We now, however, have our own +server, and our project is mostly our own. The Apache Project is an +entirely independent venture. +</P> +<HR> + +<LI><A name="why">Why was Apache created ?</A> +<P>To address concerns of a group of www providers and part time httpd +programmers, that httpd didn't behave as they wanted it +to. Apache is an entirely volunteer effort, completely funded by its +members, not by commercial sales. +</P> + +<HR> + +<LI><A name="name">Why the name "Apache" ?</A> +<P>A cute name which stuck. Apache is "<B>A PA</B>t<B>CH</B>y server". It was + based on some existing code and a series of "patch files". +</P> +<HR> + + +<LI><A name="compatible">How compatible is Apache with my existing NCSA 1.3 +setup ?</A><P> + +Apache attempts to offer all the features and configuration options +of NCSA httpd 1.3, as well as many of the additional features found in +NCSA httpd 1.4 and NCSA httpd 1.5.<P> + +NCSA httpd appears to be moving toward adding experimental features +which are not generally required at the moment. Some of the experiments +will succeed while others will inevitably be dropped.
The Apache philosophy is +to add what's needed as and when it is needed.<p> + +Friendly interaction between Apache and NCSA developers should ensure +that fundamental feature enhancments stay consistent between the two +servers for the foreseeable future.<p> + +<HR> + +<LI><A name="compare">OK, so how does Apache compare to other servers ?</A> +<P> +For an independent assessment, see <A HREF="http://www.webcompare.com/server-main.html">http://www.webcompare.com/server-main.html</A> +</P> + +<P>Apache has been shown to be substantially faster than many other +free servers. Although certain commercial servers have claimed to +surpass Apache's speed (it has not been demonstrated that any of these +"benchmarks" are a good way of measuring WWW server speed at any +rate), we feel that it is better to have a mostly-fast free server +than an extremely-fast server that costs thousands of dollars. Apache +is run on sites that get millions of hits per day, and they have +experienced no performance difficulties.</p> + +<HR> +<LI><A name="tested">How thoroughly tested is Apache?</A> + +<p>Apache is run on over 100,000 Internet servers (as of July 1996). It has +been tested thoroughly by both developers and users. The Apache Group +maintains rigorous standards before releasing new versions of their +server, and our server runs without a hitch on over one third of all +WWW servers. When bugs do show up, we release patches and new +versions, as soon as they are available.</a> + +<P>See <A HREF="../info/apache_users.html">http://www.apache.org/info/apache_users.html</A> for an incomplete list of sites running Apache.</P> + +<hr> + +<LI><A name="proxy">Does or will Apache act as a Proxy server? +<p>Apache version 1.1 +and above will come with a proxy module. 
If compiled in, this will make +Apache act as a caching-proxy server. +<p> +<HR> + +<LI><A name="future">What are the future plans for Apache ?</A> +<P><UL> +<LI>to continue as a public domain HTTP server, +<LI>to keep up with advances in the HTTP protocol and web developments in general, +<LI>to collect suggestions for fixes/improvements from its users, +<LI>to respond to needs of large volume providers as well as occasional users. +</UL> +</P><HR> + +<LI><A name="support">Who do I contact for support ?</A> +<P>There is no official support for Apache. None of the developers want to +be swamped by a flood of trivial questions that can be resolved elsewhere. +Bug reports and suggestions should be sent via <A HREF="http://www.apache.org/bug_report.html">the bug report page.</A> +Other questions should be directed to +<A HREF="news:comp.infosystems.www.servers.unix">comp.infosystems.www.servers.unix</A>, where some of the Apache team lurk, +in the company of many other httpd gurus who should be able +to help. +<p> +Commercial support for Apache is, however, available from a number of +third parties. +</p> +<HR> + +<LI><A name="more">Is there any more information on Apache ?</A> +<P>Indeed there is. See <A HREF="http://www.apache.org/">http://www.apache.org/</A>. +</P> +<HR> + +<LI><A name="where">Where can I get Apache ?</A> +<P> +You can find the source for Apache at <A HREF="http://www.apache.org/">http://www.apache.org/</A>.
+</P> +<HR> +</OL> + +<A HREF="../"><IMG SRC="../images/apache_home.gif" ALT="Home"></A> +<A HREF="./"><IMG SRC="../images/apache_index.gif" ALT="Index"></A> +</BODY> +</HTML> diff --git a/docs/manual/misc/client_block_api.html b/docs/manual/misc/client_block_api.html new file mode 100644 index 0000000000..c70ee37a66 --- /dev/null +++ b/docs/manual/misc/client_block_api.html @@ -0,0 +1,70 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> +<HTML> +<HEAD> +<TITLE>Reading Client Input in Apache 1.2</TITLE> +</HEAD> + +<BODY> +<IMG SRC="../images/apache_sub.gif" ALT=""> +<H1>Reading Client Input in Apache 1.2</h1> + +<hr> + +<p>Apache 1.1 and earlier let modules handle POST and PUT requests by +themselves. The module would, on its own, determine whether the +request had an entity and how many bytes it was, and then call a +function (<code>read_client_block</code>) to get the data. + +<p>However, HTTP/1.1 requires several things of POST and PUT request +handlers that did not fit into this model, and all existing modules +have to be rewritten. The API calls for handling this have been +further abstracted, so that future HTTP protocol changes can be +accomplished while remaining backwards-compatible.</p> + +<hr> + +<h3>The New API Functions</h3> + +<pre> + int setup_client_block (request_rec *); + int should_client_block (request_rec *); + long get_client_block (request_rec *, char *buffer, int buffer_size); +</pre> + +<ol> +<li>Call <code>setup_client_block()</code> near the beginning of the request + handler. This will set up all the necessary properties, and + will return either OK, or an error code. If the latter, + the module should return that error code. + +<li>When you are ready to possibly accept input, call + <code>should_client_block()</code>. + This will tell the module whether or not to read input. If it is 0, + the module should assume that the input is of a non-entity type + (e.g. a GET request).
A nonzero response indicates that the module + should proceed (to step 3). + This step also sends a 100 Continue response + to HTTP/1.1 clients, so should not be called until the module + is *definitely* ready to read content (otherwise, the point of the + 100 response is defeated). Never call this function more than once. + +<li>Finally, call <code>get_client_block</code> in a loop. Pass it a + buffer and its + size. It will put data into the buffer (not necessarily the full + buffer, in the case of chunked inputs), and return the length of + the input block. When it is done reading, it will return 0, and + the module should proceed. + +</ol> + +<p>As an example, please look at the code in +<code>mod_cgi.c</code>. This is properly written to the new API +guidelines.</p> + +<hr> + +<A HREF="../"><IMG SRC="../images/apache_home.gif" ALT="Home"></A> +<A HREF="./"><IMG SRC="../images/apache_index.gif" ALT="Index"></A> + +</BODY> +</HTML> diff --git a/docs/manual/misc/compat_notes.html b/docs/manual/misc/compat_notes.html new file mode 100644 index 0000000000..efa641f8b7 --- /dev/null +++ b/docs/manual/misc/compat_notes.html @@ -0,0 +1,108 @@ +<HTML><HEAD> +<TITLE>Apache HTTP Server: Compatibility Notes with NCSA's Server</TITLE> +</HEAD> +<BODY> +<IMG SRC="../images/apache_sub.gif" ALT=""> +<H3>Compatibility Notes with NCSA's Server</H3> + +<HR> + +While Apache 0.8.x and beyond are for the most part a drop-in +replacement for NCSA's httpd and earlier versions of Apache, there are +a couple of gotchas to watch out for. These are mostly due to the fact +that the parser for config and access control files was rewritten from +scratch, so certain liberties the earlier servers took may not be +available here. These are all easily fixable. If you know of other +non-fatal problems that belong here, <a +href="mailto:apache-bugs@apache.org">let us know.</a> + +<P>Please also check the <A HREF="known_bugs.html">known bugs</A> page.
+ + + +<OL> + +<LI><CODE>AddType</CODE> only accepts one file extension per line, without +any dots (<code>.</code>) in the extension, and does not take full filenames. +If you need multiple extensions per type, use multiple lines, e.g. +<blockquote><code> +AddType application/foo foo<br> +AddType application/foo bar +</code></blockquote> +to map <code>.foo</code> and <code>.bar</code> to <code>application/foo</code>. +<p> + + + + <LI><P>If you follow the NCSA guidelines for setting up access restrictions + based on client domain, you may well have added entries for + <CODE>AuthType, AuthName, AuthUserFile</CODE> or <CODE>AuthGroupFile</CODE>. + <B>None</B> of these are needed (or appropriate) for restricting access + based on client domain. + + <P>When Apache sees <CODE>AuthType</CODE> it (reasonably) assumes you + are using some authorization type based on username and password. + + <P>Please remove <CODE>AuthType</CODE>; it's unnecessary even for NCSA. + + <P> + + <LI><CODE>AuthUserFile</CODE> requires a full pathname. In earlier + versions of NCSA httpd and Apache, you could use a filename + relative to the .htaccess file. This could be a major security hole, + as it made it trivially easy to make a ".htpass" file in a + directory easily accessible by the world. We recommend you store + your passwords outside your document tree. + + <P> + + <LI><CODE>OldScriptAlias</CODE> is no longer supported. + + <P> + + <LI><CODE>exec cgi=""</CODE> produces reasonable <B>malformed header</B> + responses when used to invoke non-CGI scripts.<BR> + The NCSA code ignores the missing header. (bad idea)<BR> + Solution: write CGI to the CGI spec or use <CODE>exec cmd=""</CODE> instead. + <P>We might add <CODE>virtual</CODE> support to <CODE>exec cmd</CODE> to + make up for this difference.
+ + <P> + + <LI><Limit> silliness - in the old Apache 0.6.5, a + directive of <Limit GET> would also restrict POST methods - Apache 0.8.8's new + core is correct in not presuming a limit on a GET is the same limit on a POST, + so if you are relying on that behavior you need to change your access configurations + to reflect that. + + <P> + + <LI>Icons for FancyIndexing broken - well, no, they're not broken, we've just upgraded the + icons from flat .xbm files to pretty and much smaller .gif files, courtesy of +<a href="mailto:kevinh@eit.com">Kevin Hughes</a> at +<a href="http://www.eit.com">EIT</a>. + If you are using the same srm.conf from an old distribution, make sure you add the new + AddIcon, AddIconByType, and DefaultIcon commands. + + <P> + + <LI>Under IRIX, the "Group" directive in httpd.conf needs to be a valid group name + (e.g. "nogroup"), not the numeric group ID. The distribution httpd.conf, and earlier + ones, had the default Group be "#-1", which was causing silent exits at startup.<p> + +<li><code>.asis</code> files: Apache 0.6.5 did not require a Status header; +it added one automatically if the .asis file contained a Location header. +0.8.14 requires a Status header. <p> + +</OL> + +More to come as we notice them. + + +<hr> + +<A HREF="../"><IMG SRC="../images/apache_home.gif" ALT="Home"></A> +<A HREF="./"><IMG SRC="../images/apache_index.gif" ALT="Index"></A> + +</BODY> +</HTML> diff --git a/docs/manual/misc/security_tips.html b/docs/manual/misc/security_tips.html new file mode 100644 index 0000000000..a805d8cbed --- /dev/null +++ b/docs/manual/misc/security_tips.html @@ -0,0 +1,92 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> +<HTML> +<HEAD> +<TITLE>Apache HTTP Server Documentation</TITLE> +</HEAD> + +<BODY> +<IMG SRC="../images/apache_sub.gif" ALT=""> +<H1>Security tips for server configuration</H1> + +<hr> + +<P>Some hints and tips on security issues in setting up a web server.
Some of +the suggestions are general; others are specific to Apache. + +<HR> + +<H2>Server Side Includes</H2> +<P>Server side includes (SSI) can be configured so that users can execute +arbitrary programs on the server. That thought alone should send a shiver +down the spine of any sys-admin.<p> + +One solution is to disable that part of SSI. To do that you use the +IncludesNOEXEC option to the <A HREF="core.html#options">Options</A> +directive.<p> + +<HR> + +<H2>Non Script Aliased CGI</H2> +<P>Allowing users to execute <B>CGI</B> scripts in any directory should only +be considered if: +<OL> + <LI>You trust your users not to write scripts which will deliberately or +accidentally expose your system to an attack. + <LI>You consider security at your site to be so feeble in other areas as to +make one more potential hole irrelevant. + <LI>You have no users, and nobody ever visits your server. +</OL><p> +<HR> + +<H2>Script Alias'ed CGI</H2> +<P>Limiting <B>CGI</B> to special directories gives the admin control over +what goes into those directories. This is inevitably more secure than +non script aliased CGI, but <strong>only if users with write access to the +directories are trusted</strong> or the admin is willing to test each new CGI +script/program for potential security holes.<P> + +Most sites choose this option over the non script aliased CGI approach.<p> + +<HR> +<H2>CGI in general</H2> +<P>Always remember that you must trust the writers of the CGI script/programs +or your ability to spot potential security holes in CGI, whether they were +deliberate or accidental.<p> + +All the CGI scripts will run as the same user, so they have the potential to +conflict (accidentally or deliberately) with other scripts, e.g.
User A hates +User B, so he writes a script to trash User B's CGI database.<P> + +<HR> + +Please send any other useful security tips to +<A HREF="mailto:apache-bugs@mail.apache.org">apache-bugs@mail.apache.org</A> +<p> +<HR> + +<H2>Stopping users overriding system wide settings...</H2> +<P>To run a really tight ship, you'll want to stop users from setting +up <CODE>.htaccess</CODE> files which can override security features +you've configured. Here's one way to do it...<p> + +In the server configuration file, put +<blockquote><code> +<Directory> <br> +AllowOverride None <br> +Options None <br> +<Limit GET PUT POST> <br> +allow from all <br> +</Limit> <br> +</Directory> <br> +</code></blockquote> + +Then setup for specific directories<P> + +This stops all overrides, Includes and accesses in all directories apart +from those named.<p><hr> + +<A HREF="../"><IMG SRC="../images/apache_home.gif" ALT="Home"></A> +<A HREF="./"><IMG SRC="../images/apache_index.gif" ALT="Index"></A> + +</BODY> +</HTML> diff --git a/docs/manual/platform/perf-bsd44.html b/docs/manual/platform/perf-bsd44.html new file mode 100644 index 0000000000..1f3a6010c8 --- /dev/null +++ b/docs/manual/platform/perf-bsd44.html @@ -0,0 +1,215 @@ +<html> +<head> +<title>Running a High-Performance Web Server for BSD</title> +</head> + +<body> +<A NAME="initial"> +<IMG SRC="../images/apache_sub.gif" ALT=""> +</A> +<H2>Running a High-Performance Web Server for BSD</H2> + +Like other OS's, the listen queue is often the <b>first limit hit</b>. The +following are comments from "Aaron Gifford <agifford@InfoWest.COM>" +on how to fix this on BSDI 1.x, 2.x, and FreeBSD 2.0 (and earlier): + +<p> + +Edit the following two files: +<blockquote><code> /usr/include/sys/socket.h <br> + /usr/src/sys/sys/socket.h </code></blockquote> +In each file, look for the following: +<pre> + /* + * Maximum queue length specifiable by listen. + */ + #define SOMAXCONN 5 +</pre> + +Just change the "5" to whatever appears to work. 
I bumped the two +machines I was having problems with up to 32 and haven't noticed the +problem since. + +<p> + +After the edit, recompile the kernel and recompile the Apache server +then reboot. + +<P> + +FreeBSD 2.1 seems to be perfectly happy, with SOMAXCONN +set to 32 already. + +<p> + +<A NAME="detail"> +<b>Addendum for <i>very</i> heavily loaded BSD servers</b><br> +</A> +from Chuck Murcko <chuck@telebase.com> + +<p> + +If you're running a really busy BSD Apache server, the following are useful +things to do if the system is acting sluggish:<p> + +<ul> + +<li> Run vmstat to check memory usage, page/swap rates, etc. + +<li> Run netstat -m to check mbuf usage + +<li> Run fstat to check file descriptor usage + +</ul> + +These utilities give you an idea what you'll need to tune in your kernel, +and whether it'll help to buy more RAM. + +Here are some BSD kernel config parameters (actually BSDI, but pertinent to +FreeBSD and other 4.4-lite derivatives) from a system getting heavy usage. +The tools mentioned above were used, and the system memory was increased to +48 MB before these tuneups. Other system parameters remained unchanged. + +<p> + +<pre> +maxusers 256 +</pre> + +Maxusers drives a <i>lot</i> of other kernel parameters: + +<ul> + +<li> Maximum # of processes + +<li> Maximum # of processes per user + +<li> System wide open files limit + +<li> Per-process open files limit + +<li> Maximum # of mbuf clusters + +<li> Proc/pgrp hash table size + +</ul> + +The actual formulae for these derived parameters are in +<i>/usr/src/sys/conf/param.c</i>. +These calculated parameters can also be overridden (in part) by specifying +your own values in the kernel configuration file: + +<pre> +# Network options. NMBCLUSTERS defines the number of mbuf clusters and +# defaults to 256. This machine is a server that handles lots of traffic, +# so we crank that value. +options SOMAXCONN=256 # max pending connects +options NMBCLUSTERS=4096 # mbuf clusters at 4096 + +# +# Misc. 
options +# +options CHILD_MAX=512 # maximum number of child processes +options OPEN_MAX=512 # maximum fds (breaks RPC svcs) +</pre> + +SOMAXCONN is not derived from maxusers, so you'll always need to increase +that yourself. We used a value guaranteed to be larger than Apache's +default for the listen() of 128, currently. + +<p> + +In many cases, NMBCLUSTERS must be set much larger than would appear +necessary at first glance. The reason for this is that if the browser +disconnects in mid-transfer, the socket fd associated with that particular +connection ends up in the TIME_WAIT state for several minutes, during +which time its mbufs are not yet freed. + +<p> + +Some more info on mbuf clusters (from sys/mbuf.h): +<pre> +/* + * Mbufs are of a single size, MSIZE (machine/machparam.h), which + * includes overhead. An mbuf may add a single "mbuf cluster" of size + * MCLBYTES (also in machine/machparam.h), which has no additional overhead + * and is used instead of the internal data area; this is done when + * at least MINCLSIZE of data must be stored. + */ +</pre> + +<p> + +CHILD_MAX and OPEN_MAX are set to allow up to 512 child processes (different +than the maximum value for processes per user ID) and file descriptors. +These values may change for your particular configuration (a higher OPEN_MAX +value if you've got modules or CGI scripts opening lots of connections or +files). If you've got a lot of other activity besides httpd on the same +machine, you'll have to set NPROC higher still. In this example, the NPROC +value derived from maxusers proved sufficient for our load. + +<p> + +<b>Caveats</b> + +<p> + +Be aware that your system may not boot with a kernel that is configured +to use more resources than you have available system RAM. <b>ALWAYS</b> +have a known bootable kernel available when tuning your system this way, +and use the system tools beforehand to learn if you need to buy more +memory before tuning. 
+ +<p> + +RPC services will fail when the value of OPEN_MAX is larger than 256. +This is a function of the original implementations of the RPC library, +which used a byte value for holding file descriptors. BSDI has partially +addressed this limit in its 2.1 release, but a real fix may well await +the redesign of RPC itself. + +<p> + +Finally, there's the hard limit of child processes configured in Apache. + +<p> + +For versions of Apache later than 1.0.5 you'll need to change the +definition for <b>HARD_SERVER_LIMIT</b> in <i>httpd.h</i> and recompile +if you need to run more than the default 150 instances of httpd. + +<p> + +From conf/httpd.conf-dist: + +<pre> +# Limit on total number of servers running, i.e., limit on the number +# of clients who can simultaneously connect --- if this limit is ever +# reached, clients will be LOCKED OUT, so it should NOT BE SET TOO LOW. +# It is intended mainly as a brake to keep a runaway server from taking +# Unix with it as it spirals down... + +MaxClients 150 +</pre> + +Know what you're doing if you bump this value up, and make sure you've +done your system monitoring, RAM expansion, and kernel tuning beforehand. +Then you're ready to service some serious hits! + +<p> + +Thanks to <i>Tony Sanders</i> and <i>Chris Torek</i> at BSDI for their +helpful suggestions and information. 
+ +<P><HR> + +<H3>More welcome!</H3> + +If you have tips to contribute, send mail to <a +href="mailto:brian@organic.com">brian@organic.com</a> + +<P><HR><P> +<A HREF="/"><IMG SRC="../images/apache_home.gif" ALT="Home"></A> +<A HREF="."><IMG SRC="../images/apache_index.gif" ALT="Index"></A> +</body></html> + diff --git a/docs/manual/platform/perf-dec.html b/docs/manual/platform/perf-dec.html new file mode 100644 index 0000000000..cd027bfc60 --- /dev/null +++ b/docs/manual/platform/perf-dec.html @@ -0,0 +1,267 @@ +<HEAD> +<TITLE>Performance Tuning Tips for Digital Unix</TITLE> +</HEAD> +<BODY> +<H1>Performance Tuning Tips for Digital Unix</H1> + +Below is a set of newsgroup posts made by an engineer from DEC in +response to queries about how to modify DEC's Digital Unix OS for more +heavily loaded web sites. Copied with permission. + +<HR> + +<H2>Update</H2> +From: Jeffrey Mogul <mogul@pa.dec.com><BR> +Date: Fri, 28 Jun 96 16:07:56 MDT<BR> + +<OL> +<LI> The advice given in the README file regarding the + "tcbhashsize" variable is incorrect. The largest value + this should be set to is 1024. Setting it any higher + will have the perverse result of disabling the hashing + mechanism. + +<LI>Patch ID OSF350-146 has been superseded by +<blockquote> + Patch ID OSF350-195 for V3.2C<BR> + Patch ID OSF360-350195 for V3.2D +</blockquote> + Patch IDs for V3.2E and V3.2F should be available soon. + There is no known reason why the Patch ID OSF360-350195 + won't work on these releases, but such use is not officially + supported by Digital. This patch kit will not be needed for + V3.2G when it is released. 
+</OL> + +<HR> + + +<PRE> +From mogul@pa.dec.com (Jeffrey Mogul) +Organization DEC Western Research +Date 30 May 1996 00:50:25 GMT +Newsgroups <A HREF="news:comp.unix.osf.osf1">comp.unix.osf.osf1</A> +Message-ID <A HREF="news:4oirch$bc8@usenet.pa.dec.com"><4oirch$bc8@usenet.pa.dec.com></A> +Subject Re: Web Site Performance +References 1 + + + +In article <skoogDs54BH.9pF@netcom.com> skoog@netcom.com (Jim Skoog) writes: +>Where are the performance bottlenecks for Alpha AXP running the +>Netscape Commerce Server 1.12 with high volume internet traffic? +>We are evaluating network performance for a variety of Alpha AXP +>running DEC UNIX 3.2C, which run DEC's seal firewall and behind +>that Alpha 1000 and 2100 webservers. + +Our experience (running such Web servers as <A HREF="http://altavista.digital.com">altavista.digital.com</A> +and <A HREF="http://www.digital.com">www.digital.com</A>) is that there is one important kernel tuning +knob to adjust in order to get good performance on V3.2C. You +need to patch the kernel global variable "somaxconn" (use dbx -k +to do this) from its default value of 8 to something much larger. + +How much larger? Well, no larger than 32767 (decimal). And +probably no less than about 2048, if you have a really high volume +(millions of hits per day), like AltaVista does. + +This change allows the system to maintain more than 8 TCP +connections in the SYN_RCVD state for the HTTP server. (You +can use "netstat -An |grep SYN_RCVD" to see how many such +connections exist at any given instant). + +If you don't make this change, you might find that as the load gets +high, some connection attempts take a very long time. And if a lot +of your clients disconnect from the Internet during the process of +TCP connection establishment (this happens a lot with dialup +users), these "embryonic" connections might tie up your somaxconn +quota of SYN_RCVD-state connections.
Until the kernel times out +these embryonic connections, no other connections will be accepted, +and it will appear as if the server has died. + +The default value for somaxconn in Digital UNIX V4.0 will be quite +a bit larger than it has been in previous versions (we inherited +this default from 4.3BSD). + +Digital UNIX V4.0 includes some other performance-related changes +that significantly improve its maximum HTTP connection rate. However, +we've been using V3.2C systems to front-end for altavista.digital.com +with no obvious performance bottlenecks at the millions-of-hits-per-day +level. + +We have some Webstone performance results available at + <A HREF="http://www.digital.com/info/alphaserver/news/webff.html">http://www.digital.com/info/alphaserver/news/webff.html</A> +I'm not sure if these were done using V4.0 or an earlier version +of Digital UNIX, although I suspect they were done using a test +version of V4.0. + +-Jeff + +<HR> + +---------------------------------------------------------------------------- + +From mogul@pa.dec.com (Jeffrey Mogul) +Organization DEC Western Research +Date 31 May 1996 21:01:01 GMT +Newsgroups <A HREF="news:comp.unix.osf.osf1">comp.unix.osf.osf1</A> +Message-ID <A HREF="news:4onmmd$mmd@usenet.pa.dec.com"><4onmmd$mmd@usenet.pa.dec.com></A> +Subject Digital UNIX V3.2C Internet tuning patch info + +---------------------------------------------------------------------------- + +Something that probably few people are aware of is that Digital +has a patch kit available for Digital UNIX V3.2C that may improve +Internet performance, especially for busy web servers. + +This patch kit is one way to increase the value of somaxconn, +which I discussed in a message here a day or two ago. + +I've included in this message the revised README file for this +patch kit below. Note that the original README file in the patch +kit itself may be an earlier version; I'm told that the version +below is the right one. 
+ +Sorry, this patch kit is NOT available for other versions of Digital +UNIX. Most (but not quite all) of these changes also made it into V4.0, +so the description of the various tuning parameters in this README +file might be useful to people running V4.0 systems. + +This patch kit does not appear to be available (yet?) from + <A HREF="http://www.service.digital.com/html/patch_service.html">http://www.service.digital.com/html/patch_service.html</A> +so I guess you'll have to call Digital's Customer Support to get it. + +-Jeff + +DESCRIPTION: Digital UNIX Network tuning patch + + Patch ID: OSF350-146 + + SUPERSEDED PATCHES: OSF350-151, OSF350-158 + + This set of files improves the performance of the network + subsystem on a system being used as a web server. There are + additional tunable parameters included here, to be used + cautiously by an informed system administrator. + +TUNING + + To tune the web server, the number of simultaneous socket + connection requests are limited by: + + somaxconn Sets the maximum number of pending requests + allowed to wait on a listening socket. The + default value in Digital UNIX V3.2 is 8. + This patch kit increases the default to 1024, + which matches the value in Digital UNIX V4.0. + + sominconn Sets the minimum number of pending connections + allowed on a listening socket. When a user + process calls listen with a backlog less + than sominconn, the backlog will be set to + sominconn. sominconn overrides somaxconn. + The default value is 1. + + The effectiveness of tuning these parameters can be monitored by + the sobacklog variables available in the kernel: + + sobacklog_hiwat Tracks the maximum pending requests to any + socket. The initial value is 0. + + sobacklog_drops Tracks the number of drops exceeding the + socket set backlog limit. The initial + value is 0. + + somaxconn_drops Tracks the number of drops exceeding the + somaxconn limit. 
When sominconn is larger + than somaxconn, tracks the number of drops + exceeding sominconn. The initial value is 0. + + TCP timer parameters also affect performance. Tuning the following + requires some knowledge of the characteristics of the network. + + tcp_msl Sets the TCP maximum segment lifetime. + This is the maximum lifetime in half + seconds that a packet can be in transit + on the network. This value, when doubled, + is the length of time a connection remains + in the TIME_WAIT state after an incoming + close request is processed. The unit is + specified in 1/2 seconds, the initial + value is 60. + + tcp_rexmit_interval_min + Sets the minimum TCP retransmit interval. + For some WAN networks the default value may + be too short, causing unnecessary duplicate + packets to be sent. The unit is specified + in 1/2 seconds, the initial value is 1. + + tcp_keepinit This is the amount of time a partially + established connection will sit on the listen + queue before timing out (e.g. if a client + sends a SYN but never answers our SYN/ACK). + Partially established connections tie up slots + on the listen queue. If the queue starts to + fill with connections in SYN_RCVD state, + tcp_keepinit can be decreased to make those + partial connects time out sooner. This should + be used with caution, since there might be + legitimate clients that are taking a while + to respond to SYN/ACK. The unit is specified + in 1/2 seconds, the default value is 150 + (i.e. 75 seconds). + + The hashlist size for the TCP inpcb lookup table is regulated by: + + tcbhashsize The number of hash buckets used for the + TCP connection table used in the kernel. + The initial value is 32. For best results, + should be specified as a power of 2. For + busy Web servers, set this to 2048 or more. + + The hashlist size for the interface alias table is regulated by: + + inifaddr_hsize The number of hash buckets used for the + interface alias table used in the kernel. + The initial value is 32.
For best results, + should be specified as a power of 2. + + ipport_userreserved The maximum number of concurrent non-reserved, + dynamically allocated ports. Default range + is 1025-5000. The maximum value is 65535. + This limits the number of times you can + simultaneously telnet or ftp out to connect + to other systems. + + tcpnodelack Don't delay acknowledging TCP data; this + can sometimes improve performance of locally + run CAD packages. Default value is 0, + the enabled value is 1. + + Digital UNIX version: + + V3.2C +Feature V3.2C patch V4.0 + ======= ===== ===== ==== +somaxconn X X X +sominconn - X X +sobacklog_hiwat - X - +sobacklog_drops - X - +somaxconn_drops - X - +tcpnodelack X X X +tcp_keepidle X X X +tcp_keepintvl X X X +tcp_keepcnt - X X +tcp_keepinit - X X +TCP keepalive per-socket - - X +tcp_msl - X - +tcp_rexmit_interval_min - X - +TCP inpcb hashing - X X +tcbhashsize - X X +interface alias hashing - X X +inifaddr_hsize - X X +ipport_userreserved - X - +sysconfig -q inet - - X +sysconfig -q socket - - X + +</PRE> diff --git a/docs/manual/platform/perf.html b/docs/manual/platform/perf.html new file mode 100644 index 0000000000..d2a88e23b3 --- /dev/null +++ b/docs/manual/platform/perf.html @@ -0,0 +1,134 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> +<html> +<head> +<title>Hints on Running a High-Performance Web Server</title> +</head> + +<body> +<IMG SRC="../images/apache_sub.gif" ALT=""> +<h2>Hints on Running a High-Performance Web Server</H2> + +Running Apache on a heavily loaded web server, one often encounters +problems related to the machine and OS configuration. "Heavy" is +relative, of course - but if you are seeing more than a couple of hits +per second on a sustained basis you should consult the pointers on +this page. In general the suggestions involve how to tune your kernel +for the heavier TCP load, hardware/software conflicts that arise, etc.
+ +<UL> +<LI><A HREF="#AUX">A/UX (Apple's UNIX)</A> +<LI><A HREF="#BSD">BSD-based (BSDI, FreeBSD, etc)</A> +<LI><A HREF="#DEC">Digital UNIX</A> +<LI><A HREF="#HP">Hewlett-Packard</A> +<LI><A HREF="#Linux">Linux</A> +<LI><A HREF="#SGI">SGI</A> +<LI><A HREF="#Solaris">Solaris</A> +<LI><A HREF="#SunOS">SunOS 4.x</A> +</UL> + +<HR> + +<A NAME="AUX"> +<H3>A/UX (Apple's UNIX)</H3> +</A> + +If you are running Apache on A/UX, a page that gives some helpful +performance hints (concerning the <I>listen()</I> queue and using +virtual hosts) +<A HREF="http://www.jaguNET.com/apache.html">can be found here</A> + +<P><HR> + +<A NAME="BSD"> +<H3>BSD-based (BSDI, FreeBSD, etc)</H3> +</A> + +<A HREF="perf-bsd44.html#initial">Quick</A> and +<A HREF="perf-bsd44.html#detail">detailed</A> +performance tuning hints for BSD-derived systems. + +<P><HR> + +<A NAME="DEC"> +<H3>Digital UNIX</H3> +</A> + +We have some <A HREF="perf-dec.html">newsgroup postings</A> on how to +tune Digital UNIX 3.2 and 4.0. + +<P><HR> + +<A NAME="HP"> +<H3>Hewlett-Packard</H3> +</A> + +Some documentation on tuning HP machines can be found at <A +HREF="http://www.software.hp.com/internet/perf/tuning.html">http://www.software.hp.com/internet/perf/tuning.html</A>. + +<P><HR> + +<A NAME="Linux"> +<H3>Linux</H3> +</A> + +The most common problem on Linux shows up on heavily-loaded systems +where the whole server will appear to freeze for a couple of minutes +at a time, and then come back to life. This has been traced to a +listen() queue overload - certain Linux implementations have a low +value set for the incoming connection queue which can cause problems. +Please see our <a +href="http://www.qosina.com/~awm/apache/linux-tcp.html">Using Apache on +Linux</a> page for more info on how to fix this. 
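The listen() queue overload described here comes down to the backlog argument of listen(): a server can ask for any queue length, but the kernel silently caps the request at its own limit (SOMAXCONN or somaxconn on the systems covered in these notes). The following is a minimal sketch in C, assuming POSIX sockets. The helper names effective_backlog and open_listener are hypothetical and are not part of Apache, and the clamping rule is a simplified model of the behaviour described in these pages, not any particular kernel's code.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical model: the backlog a program passes to listen() is only
 * a request; the kernel caps it at its own limit (SOMAXCONN). */
int effective_backlog(int requested, int kernel_limit)
{
    assert(kernel_limit > 0);
    return requested < kernel_limit ? requested : kernel_limit;
}

/* Open a TCP listener on the loopback interface with the requested
 * backlog. Port 0 asks the kernel to pick any free port.
 * Returns the socket descriptor, or -1 on failure. */
int open_listener(int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);

    /* listen() succeeds even when backlog exceeds the kernel limit;
     * the queue is simply shorter than the caller asked for. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Under this model, a server that asks for a backlog of 128 on a kernel whose limit is 5 ends up with a queue of 5, which is why the tuning notes for each platform raise the kernel limit instead of changing the server.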
+ +<P><HR> + +<A NAME="SGI"> +<H3>SGI</H3> + +<UL> +<LI><A HREF="http://www.sgi.com/Products/WebFORCE/TuningGuide.html"> +WebFORCE Web Server Tuning Guidelines for IRIX 5.3, +<http://www.sgi.com/Products/WebFORCE/TuningGuide.html></A> +</UL> + +<P><HR> + +<A NAME="Solaris"> +<H3>Solaris 2.4</H3> +</A> + +The Solaris 2.4 TCP implementation has a few inherent limitations that +only became apparent under heavy loads. This has been fixed to some +extent in 2.5 (and completely revamped in 2.6), but for now consult +the following URL for tips on how to expand the capabilities if you +are finding slowdowns and lags are hurting performance. + +<UL> + +<LI><A href="http://www.sun.com/cgi-bin/show?sun-on-net/Sun.Internet.Solutions/performance/"> +World Wide Web Server Performance, +<http://www.sun.com/cgi-bin/show?sun-on-net/Sun.Internet.Solutions/performance/></a> +</UL> + +<P><HR> + +<A NAME="SunOS"> +<H3>SunOS 4.x</H3> +</A> + +More information on tuning SOMAXCONN on SunOS can be found at +<A HREF="http://www.islandnet.com/~mark/somaxconn.html"> +http://www.islandnet.com/~mark/somaxconn.html</A>. + +<P><HR> + +<H3>More welcome!</H3> + +If you have tips to contribute, send mail to <a +href="mailto:brian@organic.com">brian@organic.com</a> + +<P><HR><P> +<A HREF="/"><IMG SRC="../images/apache_home.gif" ALT="Home"></A> +<A HREF="."><IMG SRC="../images/apache_index.gif" ALT="Index"></A> +</body></html> + |