From 65692a13c06e07a4b5293993e8f865d041000909 Mon Sep 17 00:00:00 2001
From: brian
+
+Apache was originally based on code and ideas found in the most
+popular HTTP server of the time, NCSA httpd 1.3 (early 1995). It has
+since evolved into a far superior system which can rival (and probably
+surpass) almost any other UNIX based HTTP server in terms of functionality,
+efficiency and speed.
+ Since it began, it has been completely rewritten, and includes many new
+features. Apache is, as of June 1996, the most popular WWW server on
+the Internet, according to the Netcraft Survey.
+
+
+We, of course, owe a great debt to NCSA and their programmers for
+making the server Apache was based on. We now, however, have our own
+server, and our project is mostly our own. The Apache Project is an
+entirely independent venture, begun to address the concerns of a group
+of WWW providers and part-time httpd programmers that httpd didn't
+behave as they wanted it to. Apache is an entirely volunteer effort,
+completely funded by its members, not by commercial sales.
+
+A cute name which stuck: Apache is "A PAtCHy server". It was
+based on some existing code and a series of "patch files".
+
+
+Apache attempts to offer all the features and configuration options
+of NCSA httpd 1.3, as well as many of the additional features found in
+NCSA httpd 1.4 and NCSA httpd 1.5.
+
+NCSA httpd appears to be moving toward adding experimental features
+which are not generally required at the moment. Some of the experiments
+will succeed while others will inevitably be dropped. The Apache philosophy is
+to add what's needed as and when it is needed.
+
+Friendly interaction between Apache and NCSA developers should ensure
+that fundamental feature enhancements stay consistent between the two
+servers for the foreseeable future.
+
+
+For an independent assessment, see http://www.webcompare.com/server-main.html
+
+Apache has been shown to be substantially faster than many other
+free servers. Although certain commercial servers have claimed to
+surpass Apache's speed (it has not been demonstrated that any of these
+"benchmarks" are a good way of measuring WWW server speed at any
+rate), we feel that it is better to have a mostly-fast free server
+than an extremely-fast server that costs thousands of dollars. Apache
+is run on sites that get millions of hits per day, and they have
+experienced no performance difficulties.
+
+Apache is run on over 100,000 Internet servers (as of July 1996). It has
+been tested thoroughly by both developers and users. The Apache Group
+maintains rigorous standards before releasing new versions of their
+server, and our server runs without a hitch on over one third of all
+WWW servers. When bugs do show up, we release patches and new
+versions, as soon as they are available.
+
+See http://www.apache.org/info/apache_users.html for an incomplete
+list of sites running Apache.
+
+Apache version 1.1 and above will come with a proxy module. If
+compiled in, this will make Apache act as a caching proxy server.
+
+Apache API notes
+
+These are some notes on the Apache API and the data structures you
+have to deal with, etc. They are not yet nearly complete, but
+hopefully, they will help you get your bearings. Keep in mind that
+the API is still subject to change as we gain experience with it.
+(See the TODO file for what might be coming). However,
+it will be easy to adapt modules to any changes that are made.
+(We have more modules to adapt than you do).
+
+A few notes on general pedagogical style here. In the interest of
+conciseness, all structure declarations here are incomplete --- the
+real ones have more slots that I'm not telling you about. For the
+most part, these are reserved to one component of the server core or
+another, and should be altered by modules with caution. However, in
+some cases, they really are things I just haven't gotten around to
+yet. Welcome to the bleeding edge.
+
+
+
+Basic concepts.
+
+We begin with an overview of the basic concepts behind the
+API, and how they are manifested in the code.
+
+Handlers, Modules, and Requests
+
+Apache breaks down request handling into a series of steps, more or
+less the same way the Netscape server API does (although this API has
+a few more stages than NetSite does, as hooks for stuff I thought
+might be useful in the future). These are:
+
+   URI -> Filename translation
+   Auth ID checking [is the user who they say they are?]
+   Auth access checking [is the user authorized here?]
+   Access checking other than auth
+   `Fixups' --- there aren't any of these yet, but the phase is
+     intended as a hook for possible extensions like SetEnv, which
+     don't really fit well elsewhere
+   Determining the MIME type of the object requested
+   Actually sending a response back to the client
+   Logging the request
+
+These phases are handled by looking at each of a succession of
+modules, looking to see if each of them has a handler for the
+phase, and attempting to invoke it if so. The handler can typically do
+one of three things:
+
+   Handle the request, and indicate that it has done so by
+     returning the magic constant OK.
+   Decline to handle the request, by returning the magic integer
+     constant DECLINED. In this case, the
+     server behaves in all respects as if the handler simply hadn't
+     been there.
+   Signal an error, by returning one of the HTTP error codes. This
+     terminates normal handling of the request, although an
+     ErrorDocument may be invoked to try to mop up, and it will be
+     logged in any case.
+
+Most phases are terminated by the first module that handles them;
+however, for logging, `fixups', and non-access authentication
+checking, all handlers always run (barring an error). Also, the
+response phase is unique in that modules may declare multiple handlers
+for it, via a dispatch table keyed on the MIME type of the requested
+object. Modules may declare a response-phase handler which can handle
+any request, by giving it the key */* (i.e., a
+wildcard MIME type specification). However, wildcard handlers are
+only invoked if the server has already tried and failed to find a more
+specific response handler for the MIME type of the requested object
+(either none existed, or they all declined).
+
+The handlers themselves are functions of one argument (a
+request_rec structure, vide infra), which return an
+integer, as above.
+
+A brief tour of a module
+
+At this point, we need to explain the structure of a module. Our
+candidate will be one of the messier ones, the CGI module --- this
+handles both CGI scripts and the ScriptAlias config file
+command. It's actually a great deal more complicated than most
+modules, but if we're going to have only one example, it might as well
+be the one with its fingers in every place.
+
+Let's begin with handlers. In order to handle the CGI scripts, the
+module declares a response handler for them. Because of
+ScriptAlias, it also has handlers for the name
+translation phase (to recognise ScriptAliased URIs) and the
+type-checking phase (any ScriptAliased request is typed
+as a CGI script).
+
+The module needs to maintain some per (virtual)
+server information, namely, the ScriptAliases in effect;
+the module structure therefore contains pointers to a function which
+builds these structures, and to another which combines two of them (in
+case the main server and a virtual server both have
+ScriptAliases declared).
+
+Finally, this module contains code to handle the
+ScriptAlias command itself. This particular module only
+declares one command, but there could be more, so modules have
+command tables which declare their commands, and describe
+where they are permitted, and how they are to be invoked.
+
+A final note on the declared types of the arguments of some of these
+commands: a pool is a pointer to a resource pool
+structure; these are used by the server to keep track of the memory
+which has been allocated, files opened, etc., either to service a
+particular request, or to handle the process of configuring itself.
+That way, when the request is over (or, for the configuration pool,
+when the server is restarting), the memory can be freed, and the files
+closed, en masse, without anyone having to write explicit code to
+track them all down and dispose of them. Also, a
+cmd_parms structure contains various information about
+the config file being read, and other status information, which is
+sometimes of use to the function which processes a config-file command
+(such as ScriptAlias).
+
+With no further ado, the module itself:
+
+
+/* Declarations of handlers. */
+
+int translate_scriptalias (request_rec *);
+int type_scriptalias (request_rec *);
+int cgi_handler (request_rec *);
+
+/* Subsidiary dispatch table for response-phase handlers, by MIME type */
+
+handler_rec cgi_handlers[] = {
+{ "application/x-httpd-cgi", cgi_handler },
+{ NULL }
+};
+
+/* Declarations of routines to manipulate the module's configuration
+ * info. Note that these are returned, and passed in, as void *'s;
+ * the server core keeps track of them, but it doesn't, and can't,
+ * know their internal structure.
+ */
+
+void *make_cgi_server_config (pool *);
+void *merge_cgi_server_config (pool *, void *, void *);
+
+/* Declarations of routines to handle config-file commands */
+
+extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake,
+ char *real);
+
+command_rec cgi_cmds[] = {
+{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2,
+ "a fakename and a realname"},
+{ NULL }
+};
+
+module cgi_module = {
+ STANDARD_MODULE_STUFF,
+ NULL, /* initializer */
+ NULL, /* dir config creator */
+ NULL, /* dir merger --- default is to override */
+ make_cgi_server_config, /* server config */
+ merge_cgi_server_config, /* merge server config */
+ cgi_cmds, /* command table */
+ cgi_handlers, /* handlers */
+ translate_scriptalias, /* filename translation */
+ NULL, /* check_user_id */
+ NULL, /* check auth */
+ NULL, /* check access */
+ type_scriptalias, /* type_checker */
+ NULL, /* fixups */
+ NULL /* logger */
+};
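The per-phase dispatch described above (walk the module list; the first handler that does not decline terminates the phase) can be sketched in miniature. This is not the server core's actual code: the status values, the handler signature, and the toy handlers are simplified stand-ins, purely for illustration.

```c
#include <stddef.h>

/* Simplified stand-ins for the real status codes and request_rec. */
#define OK        0
#define DECLINED -1

typedef struct request_rec request_rec;
struct request_rec { const char *uri; };

typedef int (*handler_func)(request_rec *);

/* Walk the modules' handlers for one phase: the first handler which
 * does not return DECLINED terminates the phase, and its status (OK
 * or an HTTP error code) becomes the result of the phase.
 */
int run_phase (handler_func *handlers, int n, request_rec *r)
{
    int i, status;

    for (i = 0; i < n; ++i) {
        if (handlers[i] == NULL) continue;   /* module has no handler */
        status = handlers[i](r);
        if (status != DECLINED) return status;
    }
    return DECLINED;                         /* nobody wanted it */
}

/* Two toy handlers: one declines everything, one handles everything. */
int declining_handler (request_rec *r) { (void)r; return DECLINED; }
int accepting_handler (request_rec *r) { (void)r; return OK; }
```

(The real core also special-cases the phases for which all handlers run, such as logging; that is omitted here.)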
+
+
+How handlers work
+
+The sole argument to handlers is a request_rec structure.
+This structure describes a particular request which has been made to
+the server, on behalf of a client. In most cases, each connection to
+the client generates only one request_rec structure.
+
+A brief tour of the request_rec structure
+
+The request_rec contains pointers to a resource pool
+which will be cleared when the server is finished handling the
+request; to structures containing per-server and per-connection
+information; and, most importantly, information on the request itself.
+
+The most important such information is a small set of character
+strings describing attributes of the object being requested, including
+its URI, filename, content-type and content-encoding (these being filled
+in by the translation and type-check handlers which handle the
+request, respectively).
+
+Other commonly used data items are tables giving the MIME headers on
+the client's original request, MIME headers to be sent back with the
+response (which modules can add to at will), and environment variables
+for any subprocesses which are spawned off in the course of servicing
+the request. These tables are manipulated using the table_get
+and table_set routines.
+
+Finally, there are pointers to two data structures which, in turn,
+point to per-module configuration structures. Specifically, these
+hold pointers to the data structures which the module has built to
+describe the way it has been configured to operate in a given
+directory (via .htaccess files or <Directory>
+sections), and for private data it has
+built in the course of servicing the request (so modules' handlers for
+one phase can pass `notes' to their handlers for other phases). There
+is another such configuration vector in the server_rec
+data structure pointed to by the request_rec, which
+contains per (virtual) server configuration data.
+
+Here is an abridged declaration, giving the fields most commonly used:
+
+struct request_rec {
+
+ pool *pool;
+ conn_rec *connection;
+ server_rec *server;
+
+ /* What object is being requested */
+
+ char *uri;
+ char *filename;
+ char *path_info;
+ char *args; /* QUERY_ARGS, if any */
+ struct stat finfo; /* Set by server core;
+ * st_mode set to zero if no such file */
+
+ char *content_type;
+ char *content_encoding;
+
+ /* MIME header environments, in and out. Also, an array containing
+ * environment variables to be passed to subprocesses, so people can
+ * write modules to add to that environment.
+ *
+ * The difference between headers_out and err_headers_out is that
+ * the latter are printed even on error, and persist across internal
+ * redirects (so the headers printed for ErrorDocument handlers will
+ * have them).
+ */
+
+ table *headers_in;
+ table *headers_out;
+ table *err_headers_out;
+ table *subprocess_env;
+
+ /* Info about the request itself... */
+
+ int header_only; /* HEAD request, as opposed to GET */
+ char *protocol; /* Protocol, as given to us, or HTTP/0.9 */
+ char *method; /* GET, HEAD, POST, etc. */
+ int method_number; /* M_GET, M_POST, etc. */
+
+ /* Info for logging */
+
+ char *the_request;
+ int bytes_sent;
+
+ /* A flag which modules can set, to indicate that the data being
+ * returned is volatile, and clients should be told not to cache it.
+ */
+
+ int no_cache;
+
+ /* Various other config info which may change with .htaccess files
+ * These are config vectors, with one void* pointer for each module
+ * (the thing pointed to being the module's business).
+ */
+
+ void *per_dir_config; /* Options set in config files, etc. */
+ void *request_config; /* Notes on *this* request */
+
+};
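The semantics of the table_get/table_set routines mentioned above are simply those of a string-to-string map. The sketch below illustrates them with a toy table (a fixed-size array of key/value pairs); the server's real table type is implemented differently, and the toy_ names are invented for this illustration.

```c
#include <string.h>
#include <stddef.h>

/* A toy stand-in for the server's table type: a small, fixed-size
 * list of key/value string pairs.  Only the get/set semantics match
 * the real thing.
 */
#define TOY_TABLE_SIZE 16

typedef struct {
    const char *key[TOY_TABLE_SIZE];
    const char *val[TOY_TABLE_SIZE];
    int nelts;
} toy_table;

/* Return the value bound to key, or NULL if it is absent. */
const char *toy_table_get (toy_table *t, const char *key)
{
    int i;
    for (i = 0; i < t->nelts; ++i)
        if (strcmp (t->key[i], key) == 0) return t->val[i];
    return NULL;
}

/* Bind key to val, overwriting any existing binding. */
void toy_table_set (toy_table *t, const char *key, const char *val)
{
    int i;
    for (i = 0; i < t->nelts; ++i)
        if (strcmp (t->key[i], key) == 0) { t->val[i] = val; return; }
    if (t->nelts < TOY_TABLE_SIZE) {
        t->key[t->nelts] = key;
        t->val[t->nelts] = val;
        t->nelts++;
    }
}
```

In a real module, the same calls would be made against r->headers_in, r->headers_out, or r->subprocess_env.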
+
+
+
+Where request_rec structures come from
+
+Most request_rec structures are built by reading an HTTP
+request from a client, and filling in the fields. However, there are
+a few exceptions:
+
+   If the request is to an imagemap, a type map (i.e., a
+     *.var file), or a CGI script which returned a
+     local `Location:', then the resource which the user requested
+     is going to be ultimately located by some URI other than what
+     the client originally supplied. In this case, the server does
+     an internal redirect, constructing a new
+     request_rec for the new URI, and processing it
+     almost exactly as if the client had requested the new URI
+     directly.
+
+   If some handler signaled an error, and an
+     ErrorDocument is in scope, the same internal
+     redirect machinery comes into play.
+
+   Finally, a handler sometimes needs to see how some other request
+     would be handled, without actually serving it.
+     Such handlers can construct a sub-request, using the
+     functions sub_req_lookup_file and
+     sub_req_lookup_uri; this constructs a new
+     request_rec structure and processes it as you
+     would expect, up to but not including the point of actually
+     sending a response. (These functions skip over the access
+     checks if the sub-request is for a file in the same directory
+     as the original request).
+
+     (Server-side includes work by building sub-requests and then
+     actually invoking the response handler for them, via the
+     function run_sub_request).
+
+Handling requests, declining, and returning error codes
+
+As discussed above, each handler, when invoked to handle a particular
+request_rec, has to return an int to
+indicate what happened. That can either be:
+
+   OK --- the request was handled successfully. This may or may
+     not terminate the phase.
+   DECLINED --- no erroneous condition exists, but the module
+     declines to handle the phase; the server tries to find another.
+   an HTTP error code, which aborts handling of the request.
+
+Note that if the error code returned is REDIRECT, then
+the module should put a Location in the request's
+headers_out, to indicate where the client should be
+redirected to.
+
+Special considerations for response handlers
+
+Handlers for most phases do their work by simply setting a few fields
+in the request_rec structure (or, in the case of access
+checkers, simply by returning the correct error code). However,
+response handlers have to actually send a response back to the client.
+
+They should begin by sending an HTTP response header, using the
+function send_http_header. (You don't have to do
+anything special to skip sending the header for HTTP/0.9 requests; the
+function figures out on its own that it shouldn't do anything). If
+the request is marked header_only, that's all they should
+do; they should return after that, without attempting any further
+output.
+
+Otherwise, they should produce a response body which responds to the
+client as appropriate. The primitives for this are rputc
+and rprintf, for internally generated output, and
+send_fd, to copy the contents of some FILE *
+straight to the client.
+
+At this point, you should more or less understand the following piece
+of code, which is the handler which handles GET requests
+which have no more specific handler; it also shows how conditional
+GETs can be handled, if it's desirable to do so in a
+particular response handler --- set_last_modified checks
+against the If-modified-since value supplied by the
+client, if any, and returns an appropriate code (which will, if
+nonzero, be USE_LOCAL_COPY). No similar considerations apply for
+set_content_length, but it returns an error code for
+symmetry.
+int default_handler (request_rec *r)
+{
+ int errstatus;
+ FILE *f;
+
+ if (r->method_number != M_GET) return DECLINED;
+ if (r->finfo.st_mode == 0) return NOT_FOUND;
+
+ if ((errstatus = set_content_length (r, r->finfo.st_size))
+ || (errstatus = set_last_modified (r, r->finfo.st_mtime)))
+ return errstatus;
+
+ f = fopen (r->filename, "r");
+
+ if (f == NULL) {
+ log_reason("file permissions deny server access",
+ r->filename, r);
+ return FORBIDDEN;
+ }
+
+ register_timeout ("send", r);
+ send_http_header (r);
+
+ if (!r->header_only) send_fd (f, r);
+ pfclose (r->pool, f);
+ return OK;
+}
+
+
+Finally, if all of this is too much of a challenge, there are a few
+ways out of it. First off, as shown above, a response handler which
+has not yet produced any output can simply return an error code, in
+which case the server will automatically produce an error response.
+Secondly, it can punt to some other handler by invoking
+internal_redirect, which is how the internal redirection
+machinery discussed above is invoked. A response handler which has
+internally redirected should always return OK.
+
+(Invoking internal_redirect from handlers which are
+not response handlers will lead to serious confusion).
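For contrast, the other redirection idiom discussed above --- putting a Location in headers_out and returning REDIRECT --- looks roughly like this in miniature. The request structure here is a pared-down stand-in (a real handler would call table_set on r->headers_out), and the URL is purely illustrative.

```c
#include <stddef.h>
#include <string.h>

#define REDIRECT 302   /* stand-in; the real constant lives in the server headers */

/* A pared-down request_rec, just enough to show the idiom. */
typedef struct {
    const char *uri;
    const char *location_header;   /* stands in for the headers_out table */
} mini_request;

/* Send the client elsewhere: set Location, then return REDIRECT. */
int redirecting_handler (mini_request *r)
{
    (void)r->uri;
    r->location_header = "http://www.apache.org/";
    return REDIRECT;
}
```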
+
+Special considerations for authentication handlers
+
+Stuff that should be discussed here in detail:
+
+   Authentication-phase handlers are not invoked unless auth is
+     configured for the directory.
+   Common auth configuration is stored in the core per-dir
+     configuration; it has accessors auth_type,
+     auth_name, and requires.
+   Common routines, to handle the protocol end of things, at least
+     for HTTP basic authentication (get_basic_auth_pw,
+     which sets the connection->user structure field
+     automatically, and note_basic_auth_failure, which
+     arranges for the proper WWW-Authenticate: header
+     to be sent back).
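The control flow of a basic-auth check handler can be sketched as follows. get_basic_auth_pw and the status names come from the text above, but everything here is mocked so that the sketch stands alone; the hard-coded user/password check is purely illustrative, where a real module would consult a password file or DBM.

```c
#include <string.h>
#include <stddef.h>

#define OK            0
#define DECLINED     -1
#define AUTH_REQUIRED 401   /* stand-in for the real constant */

/* Minimal stand-in for request_rec: the real get_basic_auth_pw
 * parses the Authorization: header and fills in the sent password.
 */
typedef struct {
    const char *user;
    const char *sent_pw;
} mini_request;

static int mock_get_basic_auth_pw (mini_request *r, const char **pw)
{
    if (r->sent_pw == NULL) return AUTH_REQUIRED; /* no credentials sent */
    *pw = r->sent_pw;
    return OK;
}

/* The shape of a check_user_id handler: fetch the password the client
 * sent, then verify it against whatever database the module keeps.
 */
int check_user_id (mini_request *r)
{
    const char *sent_pw;
    int res = mock_get_basic_auth_pw (r, &sent_pw);
    if (res != OK) return res;          /* e.g. no Authorization header */

    /* Illustrative check only; a real failure path would also call
     * note_basic_auth_failure before returning AUTH_REQUIRED. */
    if (strcmp (r->user, "brian") == 0 && strcmp (sent_pw, "secret") == 0)
        return OK;
    return AUTH_REQUIRED;
}
```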
+Special considerations for logging handlers
+
+When a request has internally redirected, there is the question of
+what to log. Apache handles this by bundling the entire chain of
+redirects into a list of request_rec
structures which are
+threaded through the r->prev
and r->next
+pointers. The request_rec
which is passed to the logging
+handlers in such cases is the one which was originally built for the
+initial request from the client; note that the bytes_sent field will
+only be correct in the last request in the chain (the one for which a
+response was actually sent).
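A logging handler can therefore recover the true byte count by walking to the end of the redirect chain. The structure below is pared down to just the two fields involved; the real request_rec has many more.

```c
#include <stddef.h>

/* Pared-down request_rec: just the redirect-chain link and the
 * logging field discussed above.
 */
typedef struct mini_request mini_request;
struct mini_request {
    mini_request *next;   /* later request in an internal-redirect chain */
    int bytes_sent;       /* only valid on the last request in the chain */
};

/* Follow r->next to the request for which a response was actually
 * sent, and report its byte count.
 */
int logged_bytes_sent (mini_request *r)
{
    while (r->next != NULL) r = r->next;
    return r->bytes_sent;
}
```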
+
+Resource allocation and resource pools
+
+One of the problems of writing and designing a server-pool server is
+that of preventing leakage, that is, allocating resources (memory,
+open files, etc.), without subsequently releasing them. The resource
+pool machinery is designed to make it easy to prevent this from
+happening, by allowing resources to be allocated in such a way that
+they are automatically released when the server is done with
+them.
+
+The way this works is as follows: the memory which is allocated,
+files opened, etc., to deal with a particular request are tied to a
+resource pool which is allocated for the request. The pool
+is a data structure which itself tracks the resources in question.
+
+When the request has been processed, the pool is cleared. At
+that point, all the memory associated with it is released for reuse,
+all files associated with it are closed, and any other clean-up
+functions which are associated with the pool are run. When this is
+over, we can be confident that all the resources tied to the pool have
+been released, and that none of them have leaked.
+
+Server restarts, and allocation of memory and resources for per-server
+configuration, are handled in a similar way. There is a
+configuration pool, which keeps track of resources which were
+allocated while reading the server configuration files, and handling
+the commands therein (for instance, the memory that was allocated for
+per-server module configuration, log files and other files that were
+opened, and so forth). When the server restarts, and has to reread
+the configuration files, the configuration pool is cleared, and so the
+memory and file descriptors which were taken up by reading them the
+last time are made available for reuse.
+
+It should be noted that use of the pool machinery isn't generally
+obligatory, except for situations like logging handlers, where you
+really need to register cleanups to make sure that the log file gets
+closed when the server restarts (this is most easily done by using the
+function pfopen, which also
+arranges for the underlying file descriptor to be closed before any
+child processes, such as for CGI scripts, are execed), or
+in case you are using the timeout machinery (which isn't yet even
+documented here). However, there are two benefits to using it:
+resources allocated to a pool never leak (even if you allocate a
+scratch string, and just forget about it); also, for memory
+allocation, palloc is generally faster than
+malloc.
+
+We begin here by describing how memory is allocated to pools, and then
+discuss how other resources are tracked by the resource pool
+machinery.
+
+Allocation of memory in pools
+
+Memory is allocated to pools by calling the function
+palloc, which takes two arguments, one being a pointer to
+a resource pool structure, and the other being the amount of memory to
+allocate (in chars). Within handlers for handling
+requests, the most common way of getting a resource pool structure is
+by looking at the pool slot of the relevant
+request_rec; hence the repeated appearance of the
+following idiom in module code:
+
+
+int my_handler(request_rec *r)
+{
+ struct my_structure *foo;
+ ...
+
+ foo = (struct my_structure *) palloc (r->pool, sizeof (struct my_structure));
+}
+
+
+Note that there is no pfree ---
+palloced memory is freed only when the associated
+resource pool is cleared. This means that palloc does not
+have to do as much accounting as malloc(); all it does in
+the typical case is to round up the size, bump a pointer, and do a
+range check.
+
+(It also raises the possibility that heavy use of palloc
+could cause a server process to grow excessively large. There are
+two ways to deal with this, which are dealt with below; briefly, you
+can use malloc, and try to be sure that all of the memory
+gets explicitly freed, or you can allocate a sub-pool of
+the main pool, allocate your memory in the sub-pool, and clear it out
+periodically. The latter technique is discussed in the section on
+sub-pools below, and is used in the directory-indexing code, in order
+to avoid excessive storage allocation when listing directories with
+thousands of files).
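The palloc/clear behaviour described above can be sketched with a toy pool: a bump allocator over one fixed arena, whose clear operation just resets the pointer. The real pools chain multiple blocks and also track files and cleanup functions; none of that is shown here, and the toy_ names are invented for this sketch.

```c
#include <stddef.h>

/* A toy resource pool: one fixed arena, a bump pointer, no free list.
 * toy_palloc rounds the size up, bumps the pointer, and range-checks;
 * clearing the pool makes all of its memory available again at once.
 */
#define TOY_POOL_SIZE 4096

typedef struct {
    char arena[TOY_POOL_SIZE];
    size_t used;
} toy_pool;

void *toy_palloc (toy_pool *p, size_t nbytes)
{
    /* Round up for alignment, as the text describes. */
    size_t rounded = (nbytes + 7) & ~(size_t)7;
    void *mem;

    if (p->used + rounded > TOY_POOL_SIZE) return NULL;
    mem = p->arena + p->used;
    p->used += rounded;
    return mem;
}

void toy_clear_pool (toy_pool *p)
{
    p->used = 0;   /* everything allocated from the pool is now reusable */
}
```

Note that there is deliberately no per-allocation free; that is the point of the design.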
+
+Allocating initialized memory
+
+There are functions which allocate initialized memory, and are
+frequently useful. The function pcalloc has the same
+interface as palloc, but clears out the memory it
+allocates before it returns it. The function pstrdup
+takes a resource pool and a char * as arguments, and
+allocates memory for a copy of the string the pointer points to,
+returning a pointer to the copy. Finally pstrcat is a
+varargs-style function, which takes a pointer to a resource pool, and
+at least two char * arguments, the last of which must be
+NULL. It allocates enough memory to fit copies of each
+of the strings, as a unit; for instance:
+
+
+ pstrcat (r->pool, "foo", "/", "bar", NULL);
+
+
+returns a pointer to 8 bytes worth of memory, initialized to
+"foo/bar".
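+The varargs walk that pstrcat has to do can be sketched in ordinary C.
+The toy below uses malloc in place of a pool (a hypothetical stand-in;
+the real function takes a pool * as its first argument), but the
+two-pass measure-then-copy structure is the natural way to meet the
+description above:
+
```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Toy pstrcat: concatenate a NULL-terminated list of strings into
 * one freshly allocated buffer. */
char *toy_strcat(const char *first, ...)
{
    va_list ap;
    size_t len = strlen(first);
    const char *s;

    /* First pass: add up the lengths, stopping at the NULL sentinel. */
    va_start(ap, first);
    while ((s = va_arg(ap, const char *)) != NULL)
        len += strlen(s);
    va_end(ap);

    /* One allocation big enough for all the pieces, as a unit. */
    char *result = malloc(len + 1);
    strcpy(result, first);

    /* Second pass: append each argument in turn. */
    va_start(ap, first);
    while ((s = va_arg(ap, const char *)) != NULL)
        strcat(result, s);
    va_end(ap);

    return result;
}
```
+
+Here toy_strcat("foo", "/", "bar", NULL) yields an 8-byte buffer
+holding "foo/bar", matching the example above.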
+
+Tracking open files, etc.
+
+As indicated above, resource pools are also used to track other sorts
+of resources besides memory. The most common are open files. The
+routine which is typically used for this is pfopen, which
+takes a resource pool and two strings as arguments; the strings are
+the same as the typical arguments to fopen, e.g.,
+
+
+ ...
+ FILE *f = pfopen (r->pool, r->filename, "r");
+
+ if (f == NULL) { ... } else { ... }
+
+
+There is also a popenf routine, which parallels the
+lower-level open system call. Both of these routines
+arrange for the file to be closed when the resource pool in question
+is cleared.
+
+Unlike the case for memory, there are functions to close
+files allocated with pfopen and popenf,
+namely pfclose and pclosef. (This is
+because, on many systems, the number of files which a single process
+can have open is quite limited). It is important to use these
+functions to close files allocated with pfopen and
+popenf, since to do otherwise could cause fatal errors on
+systems such as Linux, which react badly if the same
+FILE* is closed more than once.
+
+(Using the close functions is not mandatory, since the
+file will eventually be closed regardless, but you should consider it
+in cases where your module is opening, or could open, a lot of files).
+
+Other sorts of resources --- cleanup functions
+
+More text goes here. Describe the cleanup primitives in terms of
+which the file stuff is implemented; also, spawn_process.
+
+Fine control --- creating and dealing with sub-pools, with a note
+on sub-requests
+
+On rare occasions, too-free use of palloc() and the
+associated primitives may result in undesirably profligate resource
+allocation. You can deal with such a case by creating a
+sub-pool, allocating within the sub-pool rather than the main
+pool, and clearing or destroying the sub-pool, which releases the
+resources which were associated with it. (This really is a
+rare situation; the only case in which it comes up in the standard
+module set is in case of listing directories, and then only with
+very large directories. Unnecessary use of the primitives
+discussed here can hair up your code quite a bit, with very little
+gain).
+
+Sub-pools are created with make_sub_pool,
+which takes another pool (the parent pool) as an argument. When the
+main pool is cleared, the sub-pool will be destroyed. The sub-pool
+may also be cleared or destroyed at any time, by calling the functions
+clear_pool and destroy_pool, respectively.
+(The difference is that clear_pool frees resources
+associated with the pool, while destroy_pool also
+deallocates the pool itself. In the former case, you can allocate new
+resources within the pool, and clear it again, and so forth; in the
+latter case, it is simply gone).
+
+One final note --- sub-requests have their own resource pools, which
+are sub-pools of the resource pool for the main request. The polite
+way to reclaim the resources associated with a sub-request which you
+have allocated (using the sub_req_lookup_... functions)
+is destroy_sub_request, which frees the resource pool.
+Before calling this function, be sure to copy anything that you care
+about which might be allocated in the sub-request's resource pool into
+someplace a little less volatile (for instance, the filename in its
+request_rec structure). (Again, under most circumstances,
+you shouldn't feel obliged to call this function; a sub-request's
+resources will be freed anyway when the main request pool is cleared.
+It is only when you are allocating many, many sub-requests for a
+single main request that you should seriously consider the
+destroy... functions).
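+The lifetime rules above can be sketched with a toy model (invented
+toy_ names; only the parent/child bookkeeping is shown, not any real
+allocation):
+
```c
#include <stdlib.h>

/* Toy model of pool nesting: each pool keeps a list of sub-pools,
 * and clearing a pool destroys its sub-pools, mirroring the rules
 * described above. */
struct toy_pool {
    struct toy_pool *sub_pools;  /* children */
    struct toy_pool *sibling;    /* next child of our parent */
};

int toy_live_pools = 0;          /* for demonstration only */

struct toy_pool *toy_make_sub_pool(struct toy_pool *parent)
{
    struct toy_pool *p = calloc(1, sizeof(*p));
    if (parent) {
        p->sibling = parent->sub_pools;
        parent->sub_pools = p;
    }
    ++toy_live_pools;
    return p;
}

/* Clearing a pool destroys its sub-pools (recursively)... */
void toy_clear_pool(struct toy_pool *p)
{
    while (p->sub_pools) {
        struct toy_pool *sub = p->sub_pools;
        p->sub_pools = sub->sibling;
        toy_clear_pool(sub);  /* a destroyed pool is first cleared... */
        free(sub);            /* ...and then deallocated itself */
        --toy_live_pools;
    }
}

/* ...while a destroy operation would additionally free p itself. */
```
+
+The cleared pool survives and can be allocated into again; only the
+destroyed sub-pools are gone.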
+
+Configuration, commands and the like
+
+One of the design goals for this server was to maintain external
+compatibility with the NCSA 1.3 server --- that is, to read the same
+configuration files, to process all the directives therein correctly,
+and in general to be a drop-in replacement for NCSA. On the other
+hand, another design goal was to move as much of the server's
+functionality into modules which have as little as possible to do with
+the monolithic server core. The only way to reconcile these goals is
+to move the handling of most commands from the central server into the
+modules.
+
+However, just giving the modules command tables is not enough to
+divorce them completely from the server core. The server has to
+remember the commands in order to act on them later. That involves
+maintaining data which is private to the modules, and which can be
+either per-server, or per-directory. Most things are per-directory,
+including in particular access control and authorization information,
+but also information on how to determine file types from suffixes,
+which can be modified by AddType and
+DefaultType directives, and so forth. In general, the
+governing philosophy is that anything which can be made
+configurable by directory should be; per-server information is
+generally used in the standard set of modules for information like
+Aliases and Redirects which come into play
+before the request is tied to a particular place in the underlying
+file system.
+
+Another requirement for emulating the NCSA server is being able to
+handle the per-directory configuration files, generally called
+.htaccess files, though even in the NCSA server they can
+contain directives which have nothing at all to do with access
+control. Accordingly, after URI -> filename translation, but before
+performing any other phase, the server walks down the directory
+hierarchy of the underlying filesystem, following the translated
+pathname, to read any .htaccess files which might be
+present. The information which is read in then has to be
+merged with the applicable information from the server's own
+config files (either from the <Directory> sections
+in access.conf, or from defaults in
+srm.conf, which actually behaves for most purposes almost
+exactly like <Directory />).
+
+Finally, after having served a request which involved reading
+.htaccess files, we need to discard the storage allocated
+for handling them. That is solved the same way it is solved wherever
+else similar problems come up, by tying those structures to the
+per-transaction resource pool.
+
+Per-directory configuration structures
+
+Let's look at how all of this plays out in mod_mime.c,
+which defines the file typing handler which emulates the NCSA server's
+behavior of determining file types from suffixes. What we'll be
+looking at, here, is the code which implements the
+AddType and AddEncoding commands. These
+commands can appear in .htaccess files, so they must be
+handled in the module's private per-directory data, which in fact
+consists of two separate tables for MIME types and
+encoding information, and is declared as follows:
+
+
+typedef struct {
+ table *forced_types; /* Additional AddTyped stuff */
+ table *encoding_types; /* Added with AddEncoding... */
+} mime_dir_config;
+
+
+When the server is reading a configuration file, or
+<Directory> section, which includes one of the MIME
+module's commands, it needs to create a mime_dir_config
+structure, so those commands have something to act on. It does this
+by invoking the function it finds in the module's `create per-dir
+config slot', with two arguments: the name of the directory to which
+this configuration information applies (or NULL for
+srm.conf), and a pointer to a resource pool in which the
+allocation should happen.
+
+(If we are reading a .htaccess file, that resource pool
+is the per-request resource pool for the request; otherwise it is a
+resource pool which is used for configuration data, and cleared on
+restarts. Either way, it is important for the structure being created
+to vanish when the pool is cleared, by registering a cleanup on the
+pool if necessary).
+
+The MIME module's function for creating per-directory config data
+just pallocs the structure above, and creates a couple of
+tables to fill it. That looks like this:
+
+
+void *create_mime_dir_config (pool *p, char *dummy)
+{
+ mime_dir_config *new =
+ (mime_dir_config *) palloc (p, sizeof(mime_dir_config));
+
+ new->forced_types = make_table (p, 4);
+ new->encoding_types = make_table (p, 4);
+
+ return new;
+}
+
+
+Now, suppose we've just read in a .htaccess file. We
+already have the per-directory configuration structure for the next
+directory up in the hierarchy. If the .htaccess file we
+just read in didn't have any AddType or
+AddEncoding commands, its per-directory config structure
+for the MIME module is still valid, and we can just use it.
+Otherwise, we need to merge the two structures somehow.
+
+To do that, the server invokes the module's per-directory config
+merge function, if one is present. That function takes three
+arguments: the two structures being merged, and a resource pool in
+which to allocate the result. For the MIME module, all that needs to
+be done is overlay the tables from the new per-directory config
+structure with those from the parent:
+
+void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv)
+{
+ mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv;
+ mime_dir_config *subdir = (mime_dir_config *)subdirv;
+ mime_dir_config *new =
+ (mime_dir_config *)palloc (p, sizeof(mime_dir_config));
+
+ new->forced_types = overlay_tables (p, subdir->forced_types,
+ parent_dir->forced_types);
+ new->encoding_types = overlay_tables (p, subdir->encoding_types,
+ parent_dir->encoding_types);
+
+ return new;
+}
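+The effect of the overlay_tables calls above is that an entry in the
+subdirectory's table shadows any entry with the same key in the
+parent's. As a toy model of that lookup rule (invented toy_ names;
+the real function builds a fresh table in the pool p rather than
+chaining the inputs together):
+
```c
#include <string.h>

/* Toy overlay: a table may have a base table, consulted only after
 * the table's own entries, so the overlay's entries win. */
struct toy_entry { const char *key, *val; };

struct toy_table {
    const struct toy_entry *entries;
    int nelts;
    const struct toy_table *base;  /* parent entries, checked last */
};

const char *toy_table_get(const struct toy_table *t, const char *key)
{
    for (; t; t = t->base) {
        int i;
        for (i = 0; i < t->nelts; ++i)
            if (strcmp(t->entries[i].key, key) == 0)
                return t->entries[i].val;
    }
    return NULL;
}
```
+
+So a subdirectory AddType for a given extension overrides the
+parent's, while the parent's other entries remain visible.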
+
+
+As a note --- if there is no per-directory merge function present, the
+server will just use the subdirectory's configuration info, and ignore
+the parent's. For some modules, that works just fine (e.g., for the
+includes module, whose per-directory configuration information
+consists solely of the state of the XBITHACK), and for
+those modules, you can just not declare one, and leave the
+corresponding structure slot in the module itself NULL.
+
+Command handling
+
+Now that we have these structures, we need to be able to figure out
+how to fill them. That involves processing the actual
+AddType and AddEncoding commands. To find
+commands, the server looks in the module's command table.
+That table contains information on how many arguments the commands
+take, and in what formats, where they are permitted, and so forth.
+That information is sufficient to allow the server to invoke most
+command-handling functions with pre-parsed arguments. Without further
+ado, let's look at the AddType command handler, which
+looks like this (the AddEncoding command looks basically
+the same, and won't be shown here):
+
+
+char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext)
+{
+ if (*ext == '.') ++ext;
+ table_set (m->forced_types, ext, ct);
+ return NULL;
+}
+
+
+This command handler is unusually simple. As you can see, it takes
+four arguments: a pointer to a cmd_parms structure, the
+per-directory configuration structure for the module in question, and
+two pre-parsed arguments. The cmd_parms structure
+contains a bunch of arguments which are frequently of use to some,
+but not all, commands, including a resource pool (from which memory
+can be allocated, and to which cleanups should be tied), and the
+(virtual) server being configured, from which the module's
+per-server configuration data can be obtained if required.
+
+Another way in which this particular command handler is unusually
+simple is that there are no error conditions which it can encounter.
+If there were, it could return an error message instead of
+NULL; this causes an error to be printed out on the
+server's stderr, followed by a quick exit, if it is in
+the main config files; for a .htaccess file, the syntax
+error is logged in the server error log (along with an indication of
+where it came from), and the request is bounced with a server error
+response (HTTP error status, code 500).
+
+The MIME module's command table has entries for these commands,
+which look like this:
+
+command_rec mime_cmds[] = {
+{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2,
+ "a mime type followed by a file extension" },
+{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2,
+ "an encoding (e.g., gzip), followed by a file extension" },
+{ NULL }
+};
+
+
+The entries in these tables are:
+
+- The name of the command, as it appears in the config file.
+
+- The function which handles the command.
+
+- A (void *) pointer, which is passed in the
+  cmd_parms structure to the command handler ---
+  this is useful in case many similar commands are handled by the
+  same function.
+
+- A bit mask indicating where the command may appear. There are
+  mask bits corresponding to each AllowOverride
+  option, and an additional mask bit, RSRC_CONF,
+  indicating that the command may appear in the server's own
+  config files, but not in any .htaccess
+  file.
+
+- A flag indicating how many arguments the command handler wants
+  pre-parsed, and how they should be passed in.
+  TAKE2 indicates two pre-parsed arguments. Other
+  options are TAKE1, which indicates one pre-parsed
+  argument, FLAG, which indicates that the argument
+  should be On or Off, and is passed in
+  as a boolean flag, and RAW_ARGS, which causes the
+  server to give the command the raw, unparsed arguments
+  (everything but the command name itself). There is also
+  ITERATE, which means that the handler looks the
+  same as TAKE1, but that if multiple arguments are
+  present, it should be called multiple times, and finally
+  ITERATE2, which indicates that the command handler
+  looks like a TAKE2, but if more arguments are
+  present, then it should be called multiple times, holding the
+  first argument constant.
+
+- Finally, a string describing the arguments the command expects,
+  which is used as an error message if the command is given the
+  wrong number of arguments. (The table itself is terminated by an
+  entry whose name slot is NULL).
+
+Finally, having set this all up, we have to use it. This is
+ultimately done in the module's handlers, specifically for its
+file-typing handler, which looks more or less like this; note that
+the per-directory configuration structure is extracted from the
+request_rec's per-directory configuration vector by using
+the get_module_config function.
+
+
+int find_ct(request_rec *r)
+{
+ int i;
+ char *fn = pstrdup (r->pool, r->filename);
+ mime_dir_config *conf = (mime_dir_config *)
+ get_module_config(r->per_dir_config, &mime_module);
+ char *type;
+
+ if (S_ISDIR(r->finfo.st_mode)) {
+ r->content_type = DIR_MAGIC_TYPE;
+ return OK;
+ }
+
+ if((i=rind(fn,'.')) < 0) return DECLINED;
+ ++i;
+
+ if ((type = table_get (conf->encoding_types, &fn[i])))
+ {
+ r->content_encoding = type;
+
+ /* go back to previous extension to try to use it as a type */
+
+ fn[i-1] = '\0';
+ if((i=rind(fn,'.')) < 0) return OK;
+ ++i;
+ }
+
+ if ((type = table_get (conf->forced_types, &fn[i])))
+ {
+ r->content_type = type;
+ }
+
+ return OK;
+}
+
+
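+To see the suffix-walking logic of find_ct in isolation, here is a
+self-contained toy version, using the standard strrchr in place of
+the server's rind, and hard-wired lookups standing in for the
+configured tables (the toy_ names and sample types are invented):
+
```c
#include <string.h>

/* Toy suffix walk: the last extension may set the content encoding,
 * in which case it is stripped and the previous extension is tried
 * as the content type --- so "index.html.gz" yields both. */
void toy_find_ct(char *fn, const char **type, const char **encoding)
{
    char *dot;

    *type = NULL;
    *encoding = NULL;

    if ((dot = strrchr(fn, '.')) == NULL) return;

    /* Stand-in for the encoding_types table lookup. */
    if (strcmp(dot + 1, "gz") == 0) {
        *encoding = "x-gzip";
        *dot = '\0';   /* go back to the previous extension */
        if ((dot = strrchr(fn, '.')) == NULL) return;
    }

    /* Stand-in for the forced_types table lookup. */
    if (strcmp(dot + 1, "html") == 0)
        *type = "text/html";
}
```
+
+Note that, as in find_ct itself, the filename is modified in place,
+which is why the real handler works on a pstrduped copy of
+r->filename.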
+
+Side notes --- per-server configuration, virtual servers, etc.
+
+The basic ideas behind per-server module configuration are largely
+the same as those for per-directory configuration; there is a creation
+function and a merge function, the latter being invoked where a
+virtual server has partially overridden the base server configuration,
+and a combined structure must be computed. (As with per-directory
+configuration, the default if no merge function is specified, and a
+module is configured in some virtual server, is that the base
+configuration is simply ignored).
+
+Commands which use per-server configuration data must dig it out
+themselves, using the server_rec available from the
+cmd_parms data to get at it. Here's an example, from the
+alias module, which also indicates how a syntax error can be returned
+(note that the per-directory configuration argument to the command
+handler is declared as a dummy, since the module doesn't actually have
+per-directory config data):
+
+
+char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url)
+{
+ server_rec *s = cmd->server;
+ alias_server_conf *conf = (alias_server_conf *)
+ get_module_config(s->module_config,&alias_module);
+ alias_entry *new = push_array (conf->redirects);
+
+ if (!is_url (url)) return "Redirect to non-URL";
+
+ new->fake = f; new->real = url;
+ return NULL;
+}
+
+
+
+
diff --git a/docs/manual/misc/API.html b/docs/manual/misc/API.html
new file mode 100644
index 0000000000..f860996e47
--- /dev/null
+++ b/docs/manual/misc/API.html
@@ -0,0 +1,988 @@
+
+
+Apache API notes
+
+These are some notes on the Apache API and the data structures you
+have to deal with, etc. They are not yet nearly complete, but
+hopefully, they will help you get your bearings. Keep in mind that
+the API is still subject to change as we gain experience with it.
+(See the TODO file for what might be coming). However,
+it will be easy to adapt modules to any changes that are made.
+(We have more modules to adapt than you do).
+
+
+
+Basic concepts.
+
+We begin with an overview of the basic concepts behind the
+API, and how they are manifested in the code.
+
+Handlers, Modules, and Requests
+
+Apache breaks down request handling into a series of steps, more or
+less the same way the Netscape server API does (although this API has
+a few more stages than NetSite does, as hooks for stuff I thought
+might be useful in the future). These are:
+
+
+
+
+These phases are handled by looking at each of a succession of
+modules, looking to see if each of them has a handler for the
+phase, and attempting invoking it if so. The handler can typically do
+one of three things:
+
+SetEnv
, which don't really fit well elsewhere.
+
+
+
+Most phases are terminated by the first module that handles them;
+however, for logging, `fixups', and non-access authentication
+checking, all handlers always run (barring an error). Also, the
+response phase is unique in that modules may declare multiple handlers
+for it, via a dispatch table keyed on the MIME type of the requested
+object. Modules may declare a response-phase handler which can handle
+any request, by giving it the key OK
.
+ DECLINED
. In this case, the
+ server behaves in all respects as if the handler simply hadn't
+ been there.
+ */*
(i.e., a
+wildcard MIME type specification). However, wildcard handlers are
+only invoked if the server has already tried and failed to find a more
+specific response handler for the MIME type of the requested object
+(either none existed, or they all declined).request_rec
structure. vide infra), which returns an
+integer, as above.A brief tour of a module
+
+At this point, we need to explain the structure of a module. Our
+candidate will be one of the messier ones, the CGI module --- this
+handles both CGI scripts and the ScriptAlias
config file
+command. It's actually a great deal more complicated than most
+modules, but if we're going to have only one example, it might as well
+be the one with its fingers in every place.ScriptAlias
, it also has handlers for the name
+translation phase (to recognise ScriptAlias
ed URIs), the
+type-checking phase (any ScriptAlias
ed request is typed
+as a CGI script).ScriptAlias
es in effect;
+the module structure therefore contains pointers to a functions which
+builds these structures, and to another which combines two of them (in
+case the main server and a virtual server both have
+ScriptAlias
es declared).ScriptAlias
command itself. This particular module only
+declares one command, but there could be more, so modules have
+command tables which declare their commands, and describe
+where they are permitted, and how they are to be invoked. pool
is a pointer to a resource pool
+structure; these are used by the server to keep track of the memory
+which has been allocated, files opened, etc., either to service a
+particular request, or to handle the process of configuring itself.
+That way, when the request is over (or, for the configuration pool,
+when the server is restarting), the memory can be freed, and the files
+closed, en masse, without anyone having to write explicit code to
+track them all down and dispose of them. Also, a
+cmd_parms
structure contains various information about
+the config file being read, and other status information, which is
+sometimes of use to the function which processes a config-file command
+(such as ScriptAlias
).
+
+With no further ado, the module itself:
+
+
+/* Declarations of handlers. */
+
+int translate_scriptalias (request_rec *);
+int type_scriptalias (request_rec *);
+int cgi_handler (request_rec *);
+
+/* Subsidiary dispatch table for response-phase handlers, by MIME type */
+
+handler_rec cgi_handlers[] = {
+{ "application/x-httpd-cgi", cgi_handler },
+{ NULL }
+};
+
+/* Declarations of routines to manipulate the module's configuration
+ * info. Note that these are returned, and passed in, as void *'s;
+ * the server core keeps track of them, but it doesn't, and can't,
+ * know their internal structure.
+ */
+
+void *make_cgi_server_config (pool *);
+void *merge_cgi_server_config (pool *, void *, void *);
+
+/* Declarations of routines to handle config-file commands */
+
+extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake,
+ char *real);
+
+command_rec cgi_cmds[] = {
+{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2,
+ "a fakename and a realname"},
+{ NULL }
+};
+
+module cgi_module = {
+ STANDARD_MODULE_STUFF,
+ NULL, /* initializer */
+ NULL, /* dir config creator */
+ NULL, /* dir merger --- default is to override */
+ make_cgi_server_config, /* server config */
+ merge_cgi_server_config, /* merge server config */
+ cgi_cmds, /* command table */
+ cgi_handlers, /* handlers */
+ translate_scriptalias, /* filename translation */
+ NULL, /* check_user_id */
+ NULL, /* check auth */
+ NULL, /* check access */
+ type_scriptalias, /* type_checker */
+ NULL, /* fixups */
+ NULL /* logger */
+};
+
+
+How handlers work
+
+The sole argument to handlers is a request_rec
structure.
+This structure describes a particular request which has been made to
+the server, on behalf of a client. In most cases, each connection to
+the client generates only one request_rec
structure.A brief tour of the
+
+The request_rec
request_rec
contains pointers to a resource pool
+which will be cleared when the server is finished handling the
+request; to structures containing per-server and per-connection
+information, and most importantly, information on the request itself.table_get
and table_set
routines. .htaccess
files or
+<Directory>
sections), for private data it has
+built in the course of servicing the request (so modules' handlers for
+one phase can pass `notes' to their handlers for other phases). There
+is another such configuration vector in the server_rec
+data structure pointed to by the request_rec
, which
+contains per (virtual) server configuration data.
+struct request_rec {
+
+ pool *pool;
+ conn_rec *connection;
+ server_rec *server;
+
+ /* What object is being requested */
+
+ char *uri;
+ char *filename;
+ char *path_info;
+ char *args; /* QUERY_ARGS, if any */
+ struct stat finfo; /* Set by server core;
+ * st_mode set to zero if no such file */
+
+ char *content_type;
+ char *content_encoding;
+
+ /* MIME header environments, in and out. Also, an array containing
+ * environment variables to be passed to subprocesses, so people can
+ * write modules to add to that environment.
+ *
+ * The difference between headers_out and err_headers_out is that
+ * the latter are printed even on error, and persist across internal
+ * redirects (so the headers printed for ErrorDocument handlers will
+ * have them).
+ */
+
+ table *headers_in;
+ table *headers_out;
+ table *err_headers_out;
+ table *subprocess_env;
+
+ /* Info about the request itself... */
+
+ int header_only; /* HEAD request, as opposed to GET */
+ char *protocol; /* Protocol, as given to us, or HTTP/0.9 */
+ char *method; /* GET, HEAD, POST, etc. */
+ int method_number; /* M_GET, M_POST, etc. */
+
+ /* Info for logging */
+
+ char *the_request;
+ int bytes_sent;
+
+ /* A flag which modules can set, to indicate that the data being
+ * returned is volatile, and clients should be told not to cache it.
+ */
+
+ int no_cache;
+
+ /* Various other config info which may change with .htaccess files
+ * These are config vectors, with one void* pointer for each module
+ * (the thing pointed to being the module's business).
+ */
+
+ void *per_dir_config; /* Options set in config files, etc. */
+ void *request_config; /* Notes on *this* request */
+
+};
+
+
+
+Where request_rec structures come from
+
+Most request_rec
structures are built by reading an HTTP
+request from a client, and filling in the fields. However, there are
+a few exceptions:
+
+
+
+
+*.var
file), or a CGI script which returned a
+ local `Location:', then the resource which the user requested
+ is going to be ultimately located by some URI other than what
+ the client originally supplied. In this case, the server does
+ an internal redirect, constructing a new
+ request_rec
for the new URI, and processing it
+ almost exactly as if the client had requested the new URI
+ directly. ErrorDocument
is in scope, the same internal
+ redirect machinery comes into play.sub_req_lookup_file
and
+ sub_req_lookup_uri
; this constructs a new
+ request_rec
structure and processes it as you
+ would expect, up to but not including the point of actually
+ sending a response. (These functions skip over the access
+ checks if the sub-request is for a file in the same directory
+ as the original request).run_sub_request
).
+Handling requests, declining, and returning error codes
+
+As discussed above, each handler, when invoked to handle a particular
+request_rec
, has to return an int
to
+indicate what happened. That can either be
+
+
+
+
+Note that if the error code returned is REDIRECT
, then
+the module should put a Location
in the request's
+headers_out
, to indicate where the client should be
+redirected to. Special considerations for response handlers
+
+Handlers for most phases do their work by simply setting a few fields
+in the request_rec
structure (or, in the case of access
+checkers, simply by returning the correct error code). However,
+response handlers have to actually send a request back to the client. send_http_header
. (You don't have to do
+anything special to skip sending the header for HTTP/0.9 requests; the
+function figures out on its own that it shouldn't do anything). If
+the request is marked header_only
, that's all they should
+do; they should return after that, without attempting any further
+output. rputc
+and rprintf
, for internally generated output, and
+send_fd
, to copy the contents of some FILE *
+straight to the client. GET
requests
+which have no more specific handler; it also shows how conditional
+GET
s can be handled, if it's desirable to do so in a
+particular response handler --- set_last_modified
checks
+against the If-modified-since
value supplied by the
+client, if any, and returns an appropriate code (which will, if
+nonzero, be USE_LOCAL_COPY). No similar considerations apply for
+set_content_length
, but it returns an error code for
+symmetry.
+int default_handler (request_rec *r)
+{
+ int errstatus;
+ FILE *f;
+
+ if (r->method_number != M_GET) return DECLINED;
+ if (r->finfo.st_mode == 0) return NOT_FOUND;
+
+ if ((errstatus = set_content_length (r, r->finfo.st_size))
+ || (errstatus = set_last_modified (r, r->finfo.st_mtime)))
+ return errstatus;
+
+ f = fopen (r->filename, "r");
+
+ if (f == NULL) {
+ log_reason("file permissions deny server access",
+ r->filename, r);
+ return FORBIDDEN;
+ }
+
+ register_timeout ("send", r);
+ send_http_header (r);
+
+ if (!r->header_only) send_fd (f, r);
+ pfclose (r->pool, f);
+ return OK;
+}
+
+
+Finally, if all of this is too much of a challenge, there are a few
+ways out of it. First off, as shown above, a response handler which
+has not yet produced any output can simply return an error code, in
+which case the server will automatically produce an error response.
+Secondly, it can punt to some other handler by invoking
+internal_redirect
, which is how the internal redirection
+machinery discussed above is invoked. A response handler which has
+internally redirected should always return OK
. internal_redirect
from handlers which are
+not response handlers will lead to serious confusion).
+
+Special considerations for authentication handlers
+
+Stuff that should be discussed here in detail:
+
+
+
+
+auth_type
,
+ auth_name
, and requires
.
+ get_basic_auth_pw
,
+ which sets the connection->user
structure field
+ automatically, and note_basic_auth_failure
, which
+ arranges for the proper WWW-Authenticate:
header
+ to be sent back).
+Special considerations for logging handlers
+
+When a request has internally redirected, there is the question of
+what to log. Apache handles this by bundling the entire chain of
+redirects into a list of request_rec
structures which are
+threaded through the r->prev
and r->next
+pointers. The request_rec
which is passed to the logging
+handlers in such cases is the one which was originally built for the
+initial request from the client; note that the bytes_sent field will
+only be correct in the last request in the chain (the one for which a
+response was actually sent).
+
+Resource allocation and resource pools
+
+One of the problems of writing and designing a server-pool server is
+that of preventing leakage, that is, allocating resources (memory,
+open files, etc.), without subsequently releasing them. The resource
+pool machinery is designed to make it easy to prevent this from
+happening, by allowing resource to be allocated in such a way that
+they are automatically released when the server is done with
+them. pfopen
, which also
+arranges for the underlying file descriptor to be closed before any
+child processes, such as for CGI scripts, are exec
ed), or
+in case you are using the timeout machinery (which isn't yet even
+documented here). However, there are two benefits to using it:
+resources allocated to a pool never leak (even if you allocate a
+scratch string, and just forget about it); also, for memory
+allocation, palloc
is generally faster than
+malloc
.Allocation of memory in pools
+
+Memory is allocated to pools by calling the function
+palloc
, which takes two arguments, one being a pointer to
+a resource pool structure, and the other being the amount of memory to
+allocate (in char
s). Within handlers for handling
+requests, the most common way of getting a resource pool structure is
+by looking at the pool
slot of the relevant
+request_rec
; hence the repeated appearance of the
+following idiom in module code:
+
+
+int my_handler(request_rec *r)
+{
+ struct my_structure *foo;
+ ...
+
+ foo = (foo *)palloc (r->pool, sizeof(my_structure));
+}
+
+
+Note that there is no pfree
---
+palloc
ed memory is freed only when the associated
+resource pool is cleared. This means that palloc
does not
+have to do as much accounting as malloc()
; all it does in
+the typical case is to round up the size, bump a pointer, and do a
+range check.palloc
+could cause a server process to grow excessively large. There are
+two ways to deal with this, which are dealt with below; briefly, you
+can use malloc
, and try to be sure that all of the memory
+gets explicitly free
d, or you can allocate a sub-pool of
+the main pool, allocate your memory in the sub-pool, and clear it out
+periodically. The latter technique is discussed in the section on
+sub-pools below, and is used in the directory-indexing code, in order
+to avoid excessive storage allocation when listing directories with
+thousands of files).
+
+Allocating initialized memory
+
+There are functions which allocate initialized memory, and are
+frequently useful. The function pcalloc
has the same
+interface as palloc
, but clears out the memory it
+allocates before it returns it. The function pstrdup
+takes a resource pool and a char *
as arguments, and
+allocates memory for a copy of the string the pointer points to,
+returning a pointer to the copy. Finally pstrcat
is a
+varargs-style function, which takes a pointer to a resource pool, and
+at least two char *
arguments, the last of which must be
+NULL
. It allocates enough memory to fit copies of each
+of the strings, as a unit; for instance:
+
+
+ pstrcat (r->pool, "foo", "/", "bar", NULL);
+
+
+returns a pointer to 8 bytes worth of memory, initialized to
+"foo/bar"
.
+
+Tracking open files, etc.
+
+As indicated above, resource pools are also used to track other sorts
+of resources besides memory. The most common are open files. The
+routine which is typically used for this is pfopen
, which
+takes a resource pool and two strings as arguments; the strings are
+the same as the typical arguments to fopen
, e.g.,
+
+
+ ...
+ FILE *f = pfopen (r->pool, r->filename, "r");
+
+ if (f == NULL) { ... } else { ... }
+
+
+There is also a popenf
routine, which parallels the
+lower-level open
system call. Both of these routines
+arrange for the file to be closed when the resource pool in question
+is cleared. pfopen
, and popenf
,
+namely pfclose
and pclosef
. (This is
+because, on many systems, the number of files which a single process
+can have open is quite limited). It is important to use these
+functions to close files allocated with pfopen
and
+popenf
, since to do otherwise could cause fatal errors on
+systems such as Linux, which react badly if the same
+FILE*
is closed more than once. close
functions is not mandatory, since the
+file will eventually be closed regardless, but you should consider it
+in cases where your module is opening, or could open, a lot of files).
+
+Other sorts of resources --- cleanup functions
+
+More text goes here. Describe the the cleanup primitives in terms of
+which the file stuff is implemented; also, spawn_process
.
+
+Fine control --- creating and dealing with sub-pools, with a note
+on sub-requests
+
+On rare occasions, too-free use of palloc()
and the
+associated primitives may result in undesirably profligate resource
+allocation. You can deal with such a case by creating a
+sub-pool, allocating within the sub-pool rather than the main
+pool, and clearing or destroying the sub-pool, which releases the
+resources which were associated with it. (This really is a
+rare situation; the only case in which it comes up in the standard
+module set is in case of listing directories, and then only with
+very large directories. Unnecessary use of the primitives
+discussed here can hair up your code quite a bit, with very little
+gain).
+
+The primitive for creating a sub-pool is make_sub_pool,
+which takes another pool (the parent pool) as an argument. When the
+main pool is cleared, the sub-pool will be destroyed. The sub-pool
+may also be cleared or destroyed at any time, by calling the functions
+clear_pool and destroy_pool, respectively.
+(The difference is that clear_pool frees resources
+associated with the pool, while destroy_pool also
+deallocates the pool itself. In the former case, you can allocate new
+resources within the pool, and clear it again, and so forth; in the
+latter case, it is simply gone).
+
+One final note: sub-requests have their own resource pools, which are
+sub-pools of the resource pool for the main request. The polite way to
+reclaim the resources associated with a sub-request which you have
+allocated (using the sub_req_lookup_... functions)
+is destroy_sub_request, which frees the resource pool.
+Before calling this function, be sure to copy anything that you care
+about which might be allocated in the sub-request's resource pool into
+someplace a little less volatile (for instance, the filename in its
+request_rec structure). (Under most circumstances you
+shouldn't feel obliged to call this function; it is only when you are
+allocating many, many sub-requests for a single main request that you
+should seriously consider the destroy... functions).
+
+Configuration, commands and the like
+
+One of the design goals for this server was to maintain external
+compatibility with the NCSA 1.3 server --- that is, to read the same
+configuration files, to process all the directives therein correctly,
+and in general to be a drop-in replacement for NCSA. On the other
+hand, another design goal was to move as much of the server's
+functionality into modules which have as little as possible to do with
+the monolithic server core. The only way to reconcile these goals is
+to move the handling of most commands from the central server into the
+modules.
+
+However, just giving the modules command tables is not enough to
+divorce them completely from the server core. The server has to
+remember the commands in order to act on them later. That involves
+maintaining data which is private to the modules, and which can be
+either per-server, or per-directory. Most things are per-directory,
+including in particular access control and authorization information,
+but also information on how to determine file types from suffixes,
+which can be modified by AddType and
+DefaultType directives, and so forth. In general, the
+governing philosophy is that anything which can be made
+configurable by directory should be; per-server information is
+generally used in the standard set of modules for information like
+Aliases and Redirects which come into play
+before the request is tied to a particular place in the underlying
+file system.
+
+Another requirement for emulating the NCSA server is being able to
+handle the per-directory configuration files, generally called
+.htaccess files, though even in the NCSA server they can
+contain directives which have nothing at all to do with access
+control. Accordingly, after URI -> filename translation, but before
+performing any other phase, the server walks down the directory
+hierarchy of the underlying filesystem, following the translated
+pathname, to read any .htaccess
files which might be
+present. The information which is read in then has to be
+merged with the applicable information from the server's own
+config files (either from the <Directory>
sections
+in access.conf
, or from defaults in
+srm.conf
, which actually behaves for most purposes almost
+exactly like <Directory />
+).
+
+Finally, after having served a request which involved reading
+.htaccess files, we need to discard the storage allocated
+for handling them. That is solved the same way it is solved wherever
+else similar problems come up, by tying those structures to the
+per-transaction resource pool.
+
+Per-directory configuration structures
+
+Let's look at how all of this plays out in mod_mime.c,
+which defines the file typing handler which emulates the NCSA server's
+behavior of determining file types from suffixes. What we'll be
+looking at, here, is the code which implements the
+AddType
and AddEncoding
commands. These
+commands can appear in .htaccess
files, so they must be
+handled in the module's private per-directory data, which in fact,
+consists of two separate table
s for MIME types and
+encoding information, and is declared as follows:
+
+
+typedef struct {
+ table *forced_types; /* Additional AddTyped stuff */
+ table *encoding_types; /* Added with AddEncoding... */
+} mime_dir_config;
+
+
+When the server is reading a configuration file, or
+<Directory>
section, which includes one of the MIME
+module's commands, it needs to create a mime_dir_config
+structure, so those commands have something to act on. It does this
+by invoking the function it finds in the module's `create per-dir
+config slot', with two arguments: the name of the directory to which
+this configuration information applies (or NULL
for
+srm.conf
), and a pointer to a resource pool in which the
+allocation should happen.
+
+(If we are reading a .htaccess file, that resource pool
+is the per-request resource pool for the request; otherwise it is a
+resource pool which is used for configuration data, and cleared on
+restarts. Either way, it is important for the structure being created
+to vanish when the pool is cleared, by registering a cleanup on the
+pool if necessary).
+
+For the MIME module, the per-directory config creation function just
+pallocs the structure above, and creates a couple of
+tables to fill it. That looks like this:
+
+
+void *create_mime_dir_config (pool *p, char *dummy)
+{
+ mime_dir_config *new =
+ (mime_dir_config *) palloc (p, sizeof(mime_dir_config));
+
+ new->forced_types = make_table (p, 4);
+ new->encoding_types = make_table (p, 4);
+
+ return new;
+}
+
+
+Now, suppose we've just read in a .htaccess
file. We
+already have the per-directory configuration structure for the next
+directory up in the hierarchy. If the .htaccess
file we
+just read in didn't have any AddType
or
+AddEncoding
commands, its per-directory config structure
+for the MIME module is still valid, and we can just use it.
+Otherwise, we need to merge the two structures somehow.
+
+To do that, the server invokes the module's per-directory config
+merge function, if one is present. That function takes three
+arguments: the two structures being merged, and a resource pool in
+which to allocate the result. For the MIME module, all that needs to
+be done is overlay the tables from the new per-directory config
+structure with those from the parent:
+void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv)
+{
+ mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv;
+ mime_dir_config *subdir = (mime_dir_config *)subdirv;
+ mime_dir_config *new =
+ (mime_dir_config *)palloc (p, sizeof(mime_dir_config));
+
+ new->forced_types = overlay_tables (p, subdir->forced_types,
+ parent_dir->forced_types);
+ new->encoding_types = overlay_tables (p, subdir->encoding_types,
+ parent_dir->encoding_types);
+
+ return new;
+}
+
+
+As a note --- if there is no per-directory merge function present, the
+server will just use the subdirectory's configuration info, and ignore
+the parent's. For some modules, that works just fine (e.g., for the
+includes module, whose per-directory configuration information
+consists solely of the state of the XBITHACK
), and for
+those modules, you can just not declare one, and leave the
+corresponding structure slot in the module itself NULL.
+
+Command handling
+
+Now that we have these structures, we need to be able to figure out
+how to fill them. That involves processing the actual
+AddType
and AddEncoding
commands. To find
+commands, the server looks in the module's command table.
+That table contains information on how many arguments the commands
+take, and in what formats, where they are permitted, and so forth. That
+information is sufficient to allow the server to invoke most
+command-handling functions with pre-parsed arguments. Without further
+ado, let's look at the AddType
command handler, which
+looks like this (the AddEncoding
command looks basically
+the same, and won't be shown here):
+
+
+char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext)
+{
+ if (*ext == '.') ++ext;
+ table_set (m->forced_types, ext, ct);
+ return NULL;
+}
+
+
+This command handler is unusually simple. As you can see, it takes
+four arguments, two of which are pre-parsed arguments, the third being
+the per-directory configuration structure for the module in question,
+and the fourth being a pointer to a cmd_parms
structure.
+That structure contains a bunch of arguments which are frequently of
+use to some, but not all, commands, including a resource pool (from
+which memory can be allocated, and to which cleanups should be tied),
+and the (virtual) server being configured, from which the module's
+per-server configuration data can be obtained if required.
+
+Another way in which this command handler is unusually simple is that
+there are no error conditions which it can encounter. If there were,
+it could return an error message instead of NULL; this
+causes an error to be printed out on the
+server's stderr, followed by a quick exit, if it is in
+the main config files; for a .htaccess file, the syntax
+error is logged in the server error log (along with an indication of
+where it came from), and the request is bounced with a server error
+response (HTTP error status, code 500).
+
+The MIME module's command table has entries for these commands, which
+look like this:
+command_rec mime_cmds[] = {
+{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2,
+ "a mime type followed by a file extension" },
+{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2,
+ "an encoding (e.g., gzip), followed by a file extension" },
+{ NULL }
+};
+
+
+The entries in these tables are:
+
+ The name of the command, as it appears in the config file.
+ The function which handles the command.
+ A (void *) pointer, which is passed in the
+ cmd_parms structure to the command handler ---
+ this is useful in case many similar commands are handled by the
+ same function.
+ A bit mask indicating where the command may appear. There are
+ mask bits corresponding to each AllowOverride
+ option, and an additional mask bit, RSRC_CONF,
+ indicating that the command may appear in the server's own
+ config files, but not in any .htaccess
+ file.
+ A flag indicating how many arguments the command handler wants
+ pre-parsed, and how they should be passed in.
+ TAKE2 indicates two pre-parsed arguments. Other
+ options are TAKE1, which indicates one pre-parsed
+ argument, FLAG, which indicates that the argument
+ should be On or Off, and is passed in
+ as a boolean flag, and RAW_ARGS, which causes the
+ server to give the command the raw, unparsed arguments
+ (everything but the command name itself). There is also
+ ITERATE, which means that the handler looks the
+ same as TAKE1, but that if multiple arguments are
+ present, it should be called multiple times, and finally
+ ITERATE2, which indicates that the command handler
+ looks like a TAKE2, but if more arguments are
+ present, then it should be called multiple times, holding the
+ first argument constant.
+ Finally, a string which describes the arguments that should be
+ present, if there are any errors (or NULL).
+
+Finally, having set this all up, we have to use it. This is
+ultimately done in the module's handlers, specifically for its
+file-typing handler, which looks more or less like this; note that the
+per-directory configuration structure is extracted from the
+request_rec's per-directory configuration vector by using
+the get_module_config function.
+
+
+int find_ct(request_rec *r)
+{
+ int i;
+ char *fn = pstrdup (r->pool, r->filename);
+ mime_dir_config *conf = (mime_dir_config *)
+ get_module_config(r->per_dir_config, &mime_module);
+ char *type;
+
+ if (S_ISDIR(r->finfo.st_mode)) {
+ r->content_type = DIR_MAGIC_TYPE;
+ return OK;
+ }
+
+ if((i=rind(fn,'.')) < 0) return DECLINED;
+ ++i;
+
+ if ((type = table_get (conf->encoding_types, &fn[i])))
+ {
+ r->content_encoding = type;
+
+ /* go back to previous extension to try to use it as a type */
+
+ fn[i-1] = '\0';
+ if((i=rind(fn,'.')) < 0) return OK;
+ ++i;
+ }
+
+ if ((type = table_get (conf->forced_types, &fn[i])))
+ {
+ r->content_type = type;
+ }
+
+ return OK;
+}
+
+
+
+Side notes --- per-server configuration, virtual servers, etc.
+
+The basic ideas behind per-server module configuration are basically
+the same as those for per-directory configuration; there is a creation
+function and a merge function, the latter being invoked where a
+virtual server has partially overridden the base server configuration,
+and a combined structure must be computed. (As with per-directory
+configuration, the default if no merge function is specified, and a
+module is configured in some virtual server, is that the base
+configuration is simply ignored).
+
+The only substantial difference is that when a command needs to
+configure the per-server private module data, it needs to go to the
+cmd_parms data to get at it. Here's an example, from the
+alias module, which also indicates how a syntax error can be returned
+(note that the per-directory configuration argument to the command
+handler is declared as a dummy, since the module doesn't actually have
+per-directory config data):
+
+
+char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url)
+{
+ server_rec *s = cmd->server;
+ alias_server_conf *conf = (alias_server_conf *)
+ get_module_config(s->module_config,&alias_module);
+ alias_entry *new = push_array (conf->redirects);
+
+ if (!is_url (url)) return "Redirect to non-URL";
+
+ new->fake = f; new->real = url;
+ return NULL;
+}
+
+
+
+
diff --git a/docs/manual/misc/FAQ.html b/docs/manual/misc/FAQ.html
new file mode 100644
index 0000000000..b630a283f0
--- /dev/null
+++ b/docs/manual/misc/FAQ.html
@@ -0,0 +1,162 @@
+
+
+
+
+
Apache server Frequently Asked Questions
+
+The Questions
+
+
+
+
+
+The Answers
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
There is no official support for Apache. None of the developers want to +be swamped by a flood of trivial questions that can be resolved elsewhere. +Bug reports and suggestions should be sent via the bug report page. +Other questions should be directed to +comp.infosystems.www.servers.unix, where some of the Apache team lurk, +in the company of many other httpd gurus who should be able +to help. +
+Commercial support for Apache is, however, available from a number of
+third parties.
+
+Indeed there is. See http://www.apache.org/. +
++You can find the source for Apache at http://www.apache.org/. +
+Apache 1.1 and earlier let modules handle POST and PUT requests by
+themselves. The module would, on its own, determine whether the
+request had an entity, how many bytes it was, and then called a
+function (read_client_block
) to get the data.
+
+
However, HTTP/1.1 requires several things of POST and PUT request
+handlers that did not fit into this model, and all existing modules
+have to be rewritten. The API calls for handling this have been
+further abstracted, so that future HTTP protocol changes can be
+accomplished while remaining backwards-compatible.
+ ++ int setup_client_block (request_rec *); + int should_client_block (request_rec *); + long get_client_block (request_rec *, char *buffer, int buffer_size); ++ +
Call setup_client_block() near the beginning of the request
+ handler. This will set up all the necessary properties, and
+ will return either OK, or an error code. If the latter,
+ the module should return that error code.
+
+When ready to possibly accept input, call should_client_block().
+ This will tell the module whether or not to read input. If it is 0,
+ the module should assume that the input is of a non-entity type
+ (e.g. a GET request). A nonzero response indicates that the module
+ should proceed (to step 3).
+ This step also sends a 100 Continue response
+ to HTTP/1.1 clients, so should not be called until the module
+ is *definitely* ready to read content (otherwise, the point of the
+ 100 response is defeated). Never call this function more than once.
+
+Get the data by calling get_client_block in a loop. Pass it a
+ buffer and its
+ size. It will put data into the buffer (not necessarily the full
+ buffer, in the case of chunked inputs), and return the length of
+ the input block. When it is done reading, it will return 0, and
+ the module should proceed.
+
+As an example, please look at the code in
+mod_cgi.c
. This is properly written to the new API
+guidelines.
Please also check the known bugs page. + + + +
AddType
only accepts one file extension per line, without
+any dots (.
) in the extension, and does not take full filenames.
+If you need multiple extensions per type, use multiple lines, e.g.
+
+AddType application/foo foo
+AddType application/foo bar
+
+To map .foo and .bar to application/foo.
++ + + +
If you follow the NCSA guidelines for setting up access restrictions
+ based on client domain, you may well have added entries for
+ AuthType, AuthName, AuthUserFile
or AuthGroupFile
.
+ None of these are needed (or appropriate) for restricting access
+ based on client domain.
+
+
When Apache sees AuthType
it (reasonably) assumes you
+ are using some authorization type based on username and password.
+
+
Please remove AuthType
, it's unnecessary even for NCSA.
+
+
+ +
AuthUserFile
requires a full pathname. In earlier
+ versions of NCSA httpd and Apache, you could use a filename
+ relative to the .htaccess file. This could be a major security hole,
+ as it made it trivially easy to make a ".htpass" file in a
+ directory easily accessible by the world. We recommend you store
+ your passwords outside your document tree.
+
+ + +
OldScriptAlias
is no longer supported.
+
+ + +
exec cgi=""
produces reasonable malformed header
+ responses when used to invoke non-CGI scripts. Use exec cmd=""
instead.
+ We might add virtual
support to exec cmd
to
+ make up for this difference.
+
+
+ +
+ +
+ +
+ +
.asis
files: Apache 0.6.5 did not require a Status header;
+it added one automatically if the .asis file contained a Location header.
+0.8.14 requires a Status header. + +
Some hints and tips on security issues in setting up a web server. Some of
+the suggestions will be general, others specific to Apache.
+
Server side includes (SSI) can be configured so that users can execute +arbitrary programs on the server. That thought alone should send a shiver +down the spine of any sys-admin.
+ +One solution is to disable that part of SSI. To do that you use the +IncludesNOEXEC option to the Options +directive.
+ +
Allowing users to execute CGI scripts in any directory should only
+be considered if:
+
+
Limiting CGI to special directories gives the admin control over +what goes into those directories. This is inevitably more secure than +non script aliased CGI, but only if users with write access to the +directories are trusted or the admin is willing to test each new CGI +script/program for potential security holes.
+ +Most sites choose this option over the non script aliased CGI approach.
+ +
Always remember that you must trust the writers of the CGI script/programs +or your ability to spot potential security holes in CGI, whether they were +deliberate or accidental.
+ +All the CGI scripts will run as the same user, so they have potential to +conflict (accidentally or deliberately) with other scripts e.g. User A hates +User B, so he writes a script to trash User B's CGI database.
+ +
+
To run a really tight ship, you'll want to stop users from setting
+up .htaccess
files which can override security features
+you've configured. Here's one way to do it...
+ +In the server configuration file, put +
+<Directory />
+AllowOverride None
+Options None
+<Limit GET PUT POST>
+allow from all
+</Limit>
+</Directory>
+
+
Then add entries for the specific directories you want to allow.
+
+This stops all overrides, Includes and accesses in all directories apart
+from those named.
+ +Edit the following two files: +
/usr/include/sys/socket.h
+ /usr/src/sys/sys/socket.h
+In each file, look for the following:
++ /* + * Maximum queue length specifiable by listen. + */ + #define SOMAXCONN 5 ++ +Just change the "5" to whatever appears to work. I bumped the two +machines I was having problems with up to 32 and haven't noticed the +problem since. + +
+ +After the edit, recompile the kernel and recompile the Apache server +then reboot. + +
+ +FreeBSD 2.1 seems to be perfectly happy, with SOMAXCONN +set to 32 already. + +
+
+
+Addendum for very heavily loaded BSD servers
+
+from Chuck Murcko <chuck@telebase.com>
+
+
+ +If you're running a really busy BSD Apache server, the following are useful +things to do if the system is acting sluggish:
+ +
+ +
+maxusers 256 ++ +Maxusers drives a lot of other kernel parameters: + +
+# Network options. NMBCLUSTERS defines the number of mbuf clusters and +# defaults to 256. This machine is a server that handles lots of traffic, +# so we crank that value. +options SOMAXCONN=256 # max pending connects +options NMBCLUSTERS=4096 # mbuf clusters at 4096 + +# +# Misc. options +# +options CHILD_MAX=512 # maximum number of child processes +options OPEN_MAX=512 # maximum fds (breaks RPC svcs) ++ +SOMAXCONN is not derived from maxusers, so you'll always need to increase +that yourself. We used a value guaranteed to be larger than Apache's +default for the listen() of 128, currently. + +
+ +In many cases, NMBCLUSTERS must be set much larger than would appear +necessary at first glance. The reason for this is that if the browser +disconnects in mid-transfer, the socket fd associated with that particular +connection ends up in the TIME_WAIT state for several minutes, during +which time its mbufs are not yet freed. + +
+ +Some more info on mbuf clusters (from sys/mbuf.h): +
+/* + * Mbufs are of a single size, MSIZE (machine/machparam.h), which + * includes overhead. An mbuf may add a single "mbuf cluster" of size + * MCLBYTES (also in machine/machparam.h), which has no additional overhead + * and is used instead of the internal data area; this is done when + * at least MINCLSIZE of data must be stored. + */ ++ +
+ +CHILD_MAX and OPEN_MAX are set to allow up to 512 child processes (different +than the maximum value for processes per user ID) and file descriptors. +These values may change for your particular configuration (a higher OPEN_MAX +value if you've got modules or CGI scripts opening lots of connections or +files). If you've got a lot of other activity besides httpd on the same +machine, you'll have to set NPROC higher still. In this example, the NPROC +value derived from maxusers proved sufficient for our load. + +
+ +Caveats + +
+ +Be aware that your system may not boot with a kernel that is configured +to use more resources than you have available system RAM. ALWAYS +have a known bootable kernel available when tuning your system this way, +and use the system tools beforehand to learn if you need to buy more +memory before tuning. + +
+ +RPC services will fail when the value of OPEN_MAX is larger than 256. +This is a function of the original implementations of the RPC library, +which used a byte value for holding file descriptors. BSDI has partially +addressed this limit in its 2.1 release, but a real fix may well await +the redesign of RPC itself. + +
+ +Finally, there's the hard limit of child processes configured in Apache. + +
+ +For versions of Apache later than 1.0.5 you'll need to change the +definition for HARD_SERVER_LIMIT in httpd.h and recompile +if you need to run more than the default 150 instances of httpd. + +
+ +From conf/httpd.conf-dist: + +
+# Limit on total number of servers running, i.e., limit on the number +# of clients who can simultaneously connect --- if this limit is ever +# reached, clients will be LOCKED OUT, so it should NOT BE SET TOO LOW. +# It is intended mainly as a brake to keep a runaway server from taking +# Unix with it as it spirals down... + +MaxClients 150 ++ +Know what you're doing if you bump this value up, and make sure you've +done your system monitoring, RAM expansion, and kernel tuning beforehand. +Then you're ready to service some serious hits! + +
+ +Thanks to Tony Sanders and Chris Torek at BSDI for their +helpful suggestions and information. + +
+
+
+
+
diff --git a/docs/manual/platform/perf-dec.html b/docs/manual/platform/perf-dec.html
new file mode 100644
index 0000000000..cd027bfc60
--- /dev/null
+++ b/docs/manual/platform/perf-dec.html
@@ -0,0 +1,267 @@
+
+ Patch ID OSF350-195 for V3.2C+ Patch IDs for V3.2E and V3.2F should be available soon. + There is no known reason why the Patch ID OSF360-350195 + won't work on these releases, but such use is not officially + supported by Digital. This patch kit will not be needed for + V3.2G when it is released. + + +
+ Patch ID OSF360-350195 for V3.2D +
+From mogul@pa.dec.com (Jeffrey Mogul) +Organization DEC Western Research +Date 30 May 1996 00:50:25 GMT +Newsgroups comp.unix.osf.osf1 +Message-ID <4oirch$bc8@usenet.pa.dec.com> +Subject Re: Web Site Performance +References 1 + + + +In article <skoogDs54BH.9pF@netcom.com> skoog@netcom.com (Jim Skoog) writes: +>Where are the performance bottlenecks for Alpha AXP running the +>Netscape Commerce Server 1.12 with high volume internet traffic? +>We are evaluating network performance for a variety of Alpha AXP +>runing DEC UNIX 3.2C, which run DEC's seal firewall and behind +>that Alpha 1000 and 2100 webservers. + +Our experience (running such Web servers as altavista.digital.com +and www.digital.com) is that there is one important kernel tuning +knob to adjust in order to get good performance on V3.2C. You +need to patch the kernel global variable "somaxconn" (use dbx -k +to do this) from its default value of 8 to something much larger. + +How much larger? Well, no larger than 32767 (decimal). And +probably no less than about 2048, if you have a really high volume +(millions of hits per day), like AltaVista does. + +This change allows the system to maintain more than 8 TCP +connections in the SYN_RCVD state for the HTTP server. (You +can use "netstat -An |grep SYN_RCVD" to see how many such +connections exist at any given instant). + +If you don't make this change, you might find that as the load gets +high, some connection attempts take a very long time. And if a lot +of your clients disconnect from the Internet during the process of +TCP connection establishment (this happens a lot with dialup +users), these "embryonic" connections might tie up your somaxconn +quota of SYN_RCVD-state connections. Until the kernel times out +these embryonic connections, no other connections will be accepted, +and it will appear as if the server has died. 
+ +The default value for somaxconn in Digital UNIX V4.0 will be quite +a bit larger than it has been in previous versions (we inherited +this default from 4.3BSD). + +Digital UNIX V4.0 includes some other performance-related changes +that significantly improve its maximum HTTP connection rate. However, +we've been using V3.2C systems to front-end for altavista.digital.com +with no obvious performance bottlenecks at the millions-of-hits-per-day +level. + +We have some Webstone performance results available at + http://www.digital.com/info/alphaserver/news/webff.html +I'm not sure if these were done using V4.0 or an earlier version +of Digital UNIX, although I suspect they were done using a test +version of V4.0. + +-Jeff + +diff --git a/docs/manual/platform/perf.html b/docs/manual/platform/perf.html new file mode 100644 index 0000000000..d2a88e23b3 --- /dev/null +++ b/docs/manual/platform/perf.html @@ -0,0 +1,134 @@ + + + +
+ +---------------------------------------------------------------------------- + +From mogul@pa.dec.com (Jeffrey Mogul) +Organization DEC Western Research +Date 31 May 1996 21:01:01 GMT +Newsgroups comp.unix.osf.osf1 +Message-ID <4onmmd$mmd@usenet.pa.dec.com> +Subject Digital UNIX V3.2C Internet tuning patch info + +---------------------------------------------------------------------------- + +Something that probably few people are aware of is that Digital +has a patch kit available for Digital UNIX V3.2C that may improve +Internet performance, especially for busy web servers. + +This patch kit is one way to increase the value of somaxconn, +which I discussed in a message here a day or two ago. + +I've included in this message the revised README file for this +patch kit below. Note that the original README file in the patch +kit itself may be an earlier version; I'm told that the version +below is the right one. + +Sorry, this patch kit is NOT available for other versions of Digital +UNIX. Most (but not quite all) of these changes also made it into V4.0, +so the description of the various tuning parameters in this README +file might be useful to people running V4.0 systems. + +This patch kit does not appear to be available (yet?) from + http://www.service.digital.com/html/patch_service.html +so I guess you'll have to call Digital's Customer Support to get it. + +-Jeff + +DESCRIPTION: Digital UNIX Network tuning patch + + Patch ID: OSF350-146 + + SUPERSEDED PATCHES: OSF350-151, OSF350-158 + + This set of files improves the performance of the network + subsystem on a system being used as a web server. There are + additional tunable parameters included here, to be used + cautiously by an informed system administrator. + +TUNING + + To tune the web server, the number of simultaneous socket + connection requests are limited by: + + somaxconn Sets the maximum number of pending requests + allowed to wait on a listening socket. 
The + default value in Digital UNIX V3.2 is 8. + This patch kit increases the default to 1024, + which matches the value in Digital UNIX V4.0. + + sominconn Sets the minimum number of pending connections + allowed on a listening socket. When a user + process calls listen with a backlog less + than sominconn, the backlog will be set to + sominconn. sominconn overrides somaxconn. + The default value is 1. + + The effectiveness of tuning these parameters can be monitored by + the sobacklog variables available in the kernel: + + sobacklog_hiwat Tracks the maximum pending requests to any + socket. The initial value is 0. + + sobacklog_drops Tracks the number of drops exceeding the + socket set backlog limit. The initial + value is 0. + + somaxconn_drops Tracks the number of drops exceeding the + somaxconn limit. When sominconn is larger + than somaxconn, tracks the number of drops + exceeding sominconn. The initial value is 0. + + TCP timer parameters also affect performance. Tuning the following + require some knowledge of the characteristics of the network. + + tcp_msl Sets the tcp maximum segment lifetime. + This is the maximum lifetime in half + seconds that a packet can be in transit + on the network. This value, when doubled, + is the length of time a connection remains + in the TIME_WAIT state after a incoming + close request is processed. The unit is + specified in 1/2 seconds, the initial + value is 60. + + tcp_rexmit_interval_min + Sets the minimum TCP retransmit interval. + For some WAN networks the default value may + be too short, causing unnecessary duplicate + packets to be sent. The unit is specified + in 1/2 seconds, the initial value is 1. + + tcp_keepinit This is the amount of time a partially + established connection will sit on the listen + queue before timing out (e.g. if a client + sends a SYN but never answers our SYN/ACK). + Partially established connections tie up slots + on the listen queue. 
If the queue starts to + fill with connections in SYN_RCVD state, + tcp_keepinit can be decreased to make those + partial connects time out sooner. This should + be used with caution, since there might be + legitimate clients that are taking a while + to respond to SYN/ACK. The unit is specified + in 1/2 seconds, the default value is 150 + (ie. 75 seconds). + + The hashlist size for the TCP inpcb lookup table is regulated by: + + tcbhashsize The number of hash buckets used for the + TCP connection table used in the kernel. + The initial value is 32. For best results, + should be specified as a power of 2. For + busy Web servers, set this to 2048 or more. + + The hashlist size for the interface alias table is regulated by: + + inifaddr_hsize The number of hash buckets used for the + interface alias table used in the kernel. + The initial value is 32. For best results, + should be specified as a power of 2. + + ipport_userreserved The maximum number of concurrent non-reserved, + dynamically allocated ports. Default range + is 1025-5000. The maximum value is 65535. + This limits the numer of times you can + simultaneously telnet or ftp out to connect + to other systems. + + tcpnodelack Don't delay acknowledging TCP data; this + can sometimes improve performance of locally + run CAD packages. Default is value is 0, + the enabled value is 1. + + Digital UNIX version: + + V3.2C +Feature V3.2C patch V4.0 + ======= ===== ===== ==== +somaxconn X X X +sominconn - X X +sobacklog_hiwat - X - +sobacklog_drops - X - +somaxconn_drops - X - +tcpnodelack X X X +tcp_keepidle X X X +tcp_keepintvl X X X +tcp_keepcnt - X X +tcp_keepinit - X X +TCP keepalive per-socket - - X +tcp_msl - X - +tcp_rexmit_interval_min - X - +TCP inpcb hashing - X X +tcbhashsize - X X +interface alias hashing - X X +inifaddr_hsize - X X +ipport_userreserved - X - +sysconfig -q inet - - X +sysconfig -q socket - - X + +