From 65692a13c06e07a4b5293993e8f865d041000909 Mon Sep 17 00:00:00 2001
From: brian
Date: Thu, 21 Nov 1996 09:17:50 +0000
Subject: Realized "misc" was better than "info" to describe this subdir.
 Yeah, I know, I apologize.

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@77008 13f79535-47bb-0310-9956-ffa450edef68
---
 docs/manual/developer/API.html         | 988 +++++++++++++++++++++++++++++++++
 docs/manual/misc/API.html              | 988 +++++++++++++++++++++++++++++++++
 docs/manual/misc/FAQ.html              | 162 ++++++
 docs/manual/misc/client_block_api.html |  70 +++
 docs/manual/misc/compat_notes.html     | 108 ++++
 docs/manual/misc/security_tips.html    |  92 +++
 docs/manual/platform/perf-bsd44.html   | 215 +++++++
 docs/manual/platform/perf-dec.html     | 267 +++++++++
 docs/manual/platform/perf.html         | 134 +++++
 9 files changed, 3024 insertions(+)
 create mode 100644 docs/manual/developer/API.html
 create mode 100644 docs/manual/misc/API.html
 create mode 100644 docs/manual/misc/FAQ.html
 create mode 100644 docs/manual/misc/client_block_api.html
 create mode 100644 docs/manual/misc/compat_notes.html
 create mode 100644 docs/manual/misc/security_tips.html
 create mode 100644 docs/manual/platform/perf-bsd44.html
 create mode 100644 docs/manual/platform/perf-dec.html
 create mode 100644 docs/manual/platform/perf.html

diff --git a/docs/manual/developer/API.html b/docs/manual/developer/API.html
new file mode 100644
index 0000000000..f860996e47
--- /dev/null
+++ b/docs/manual/developer/API.html
@@ -0,0 +1,988 @@

Apache API notes

+ +These are some notes on the Apache API and the data structures you +have to deal with, etc. They are not yet nearly complete, but +hopefully, they will help you get your bearings. Keep in mind that +the API is still subject to change as we gain experience with it. +(See the TODO file for what might be coming). However, +it will be easy to adapt modules to any changes that are made. +(We have more modules to adapt than you do). +

+ +A few notes on general pedagogical style here. In the interest of +conciseness, all structure declarations here are incomplete --- the +real ones have more slots that I'm not telling you about. For the +most part, these are reserved to one component of the server core or +another, and should be altered by modules with caution. However, in +some cases, they really are things I just haven't gotten around to +yet. Welcome to the bleeding edge.

+ +Finally, here's an outline, to give you some bare idea of what's coming up, and in what order:

  - Basic concepts
  - How handlers work
  - Resource allocation and resource pools
  - Configuration, commands and the like

+ +

Basic concepts.

+ +We begin with an overview of the basic concepts behind the +API, and how they are manifested in the code. + +

Handlers, Modules, and Requests

+ +Apache breaks down request handling into a series of steps, more or less the same way the Netscape server API does (although this API has a few more stages than NetSite does, as hooks for stuff I thought might be useful in the future). These are:

  - URI -> filename translation;
  - authentication ID checking (is the user who they say they are?);
  - authentication access checking (is the user authorized to be here?);
  - access checking other than authentication;
  - determining the MIME type of the object requested;
  - `fixups';
  - actually sending a response back to the client; and
  - logging the request.

These phases are handled by looking at each of a succession of modules, looking to see if each of them has a handler for the phase, and attempting to invoke it if so. The handler can typically do one of three things:

  - handle the request, and indicate that it has done so, by returning the magic constant OK;
  - decline to handle the request, by returning the magic integer constant DECLINED, in which case the server carries on as if the handler had never been invoked; or
  - signal an error, by returning one of the HTTP error codes.

Most phases are terminated by the first module that handles them; however, for logging, `fixups', and non-access authentication checking, all handlers always run (barring an error). Also, the response phase is unique in that modules may declare multiple handlers for it, via a dispatch table keyed on the MIME type of the requested object. Modules may declare a response-phase handler which can handle any request, by giving it the key */* (i.e., a wildcard MIME type specification). However, wildcard handlers are only invoked if the server has already tried and failed to find a more specific response handler for the MIME type of the requested object (either none existed, or they all declined).

+ +The handlers themselves are functions of one argument (a request_rec structure; vide infra), which return an integer, as above.
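To make that concrete, here is a minimal sketch of what such a handler looks like; example_handler is a made-up name, and the phase it would serve is determined purely by which slot of the module structure (shown in the next section) it is listed in:

/* A hypothetical handler: one request_rec * argument, integer result. */
int example_handler (request_rec *r)
{
    if (r->method_number != M_GET)
        return DECLINED;        /* not our business; let another module try */

    /* ... do whatever work this phase calls for, reading and filling
     * in fields of r ...
     */

    return OK;                  /* handled; for most phases, no further
                                 * handlers are run */
}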

+ +

A brief tour of a module

+ +At this point, we need to explain the structure of a module. Our +candidate will be one of the messier ones, the CGI module --- this +handles both CGI scripts and the ScriptAlias config file +command. It's actually a great deal more complicated than most +modules, but if we're going to have only one example, it might as well +be the one with its fingers in every place.

+ +Let's begin with handlers. In order to handle the CGI scripts, the module declares a response handler for them. Because of ScriptAlias, it also has handlers for the name translation phase (to recognise ScriptAliased URIs) and for the type-checking phase (any ScriptAliased request is typed as a CGI script).

+ +The module needs to maintain some per (virtual) server information, namely, the ScriptAliases in effect; the module structure therefore contains a pointer to a function which builds these structures, and to another which combines two of them (in case the main server and a virtual server both have ScriptAliases declared).

+ +Finally, this module contains code to handle the +ScriptAlias command itself. This particular module only +declares one command, but there could be more, so modules have +command tables which declare their commands, and describe +where they are permitted, and how they are to be invoked.

+ +A final note on the declared types of the arguments of some of these +commands: a pool is a pointer to a resource pool +structure; these are used by the server to keep track of the memory +which has been allocated, files opened, etc., either to service a +particular request, or to handle the process of configuring itself. +That way, when the request is over (or, for the configuration pool, +when the server is restarting), the memory can be freed, and the files +closed, en masse, without anyone having to write explicit code to +track them all down and dispose of them. Also, a +cmd_parms structure contains various information about +the config file being read, and other status information, which is +sometimes of use to the function which processes a config-file command +(such as ScriptAlias). + +With no further ado, the module itself: + +

+/* Declarations of handlers. */
+
+int translate_scriptalias (request_rec *);
+int type_scriptalias (request_rec *);
+int cgi_handler (request_rec *);
+
+/* Subsidiary dispatch table for response-phase handlers, by MIME type */
+
+handler_rec cgi_handlers[] = {
+{ "application/x-httpd-cgi", cgi_handler },
+{ NULL }
+};
+
+/* Declarations of routines to manipulate the module's configuration
+ * info.  Note that these are returned, and passed in, as void *'s;
+ * the server core keeps track of them, but it doesn't, and can't,
+ * know their internal structure.
+ */
+
+void *make_cgi_server_config (pool *);
+void *merge_cgi_server_config (pool *, void *, void *);
+
+/* Declarations of routines to handle config-file commands */
+
+extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake,
+                          char *real);
+
+command_rec cgi_cmds[] = {
+{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2,
+    "a fakename and a realname"},
+{ NULL }
+};
+
+module cgi_module = {
+   STANDARD_MODULE_STUFF,
+   NULL,                     /* initializer */
+   NULL,                     /* dir config creator */
+   NULL,                     /* dir merger --- default is to override */
+   make_cgi_server_config,   /* server config */
+   merge_cgi_server_config,  /* merge server config */
+   cgi_cmds,                 /* command table */
+   cgi_handlers,             /* handlers */
+   translate_scriptalias,    /* filename translation */
+   NULL,                     /* check_user_id */
+   NULL,                     /* check auth */
+   NULL,                     /* check access */
+   type_scriptalias,         /* type_checker */
+   NULL,                     /* fixups */
+   NULL                      /* logger */
+};
+
+ +

How handlers work

+ +The sole argument to handlers is a request_rec structure. +This structure describes a particular request which has been made to +the server, on behalf of a client. In most cases, each connection to +the client generates only one request_rec structure.

+ +

A brief tour of the request_rec

+ +The request_rec contains pointers to a resource pool +which will be cleared when the server is finished handling the +request; to structures containing per-server and per-connection +information, and most importantly, information on the request itself.

+ +The most important such information is a small set of character +strings describing attributes of the object being requested, including +its URI, filename, content-type and content-encoding (these being filled +in by the translation and type-check handlers which handle the +request, respectively).

+ +Other commonly used data items are tables giving the MIME headers on +the client's original request, MIME headers to be sent back with the +response (which modules can add to at will), and environment variables +for any subprocesses which are spawned off in the course of servicing +the request. These tables are manipulated using the +table_get and table_set routines.
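To give a rough idea of how these are used in practice (this is only a sketch, and the header and variable names are made up), a handler might do something like the following:

/* Look at one of the client's request headers, add a header of our own
 * to the response, and pass a variable down to any subprocess (e.g., a
 * CGI script) spawned for this request.  The header and variable names
 * here are purely illustrative.
 */
char *agent = table_get (r->headers_in, "User-Agent");

table_set (r->headers_out, "X-Example-Note", "hello");
table_set (r->subprocess_env, "EXAMPLE_AGENT", agent ? agent : "unknown");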

+ +Finally, there are pointers to two data structures which, in turn, point to per-module configuration structures. Specifically, these hold pointers to the data structures which the module has built to describe the way it has been configured to operate in a given directory (via .htaccess files or <Directory> sections), and to private data it has built in the course of servicing the request (so modules' handlers for one phase can pass `notes' to their handlers for other phases). There is another such configuration vector in the server_rec data structure pointed to by the request_rec, which contains per (virtual) server configuration data.
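As a sketch of how the `notes' mechanism can be used (assuming a set_module_config routine which is the storing counterpart of the get_module_config function used in the examples further on; my_module and my_note_rec are made-up names, and palloc is the pool allocator discussed later):

/* In an early phase, stash a note for later phases of the same request. */
my_note_rec *note = (my_note_rec *) palloc (r->pool, sizeof (my_note_rec));
set_module_config (r->request_config, &my_module, note);

/* In a later phase (the logger, say), pick it back up. */
note = (my_note_rec *) get_module_config (r->request_config, &my_module);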

+ +Here is an abridged declaration, giving the fields most commonly used:

+ +

+struct request_rec {
+
+  pool *pool;
+  conn_rec *connection;
+  server_rec *server;
+
+  /* What object is being requested */
+  
+  char *uri;
+  char *filename;
+  char *path_info;
+  char *args;           /* QUERY_ARGS, if any */
+  struct stat finfo;    /* Set by server core;
+                         * st_mode set to zero if no such file */
+  
+  char *content_type;
+  char *content_encoding;
+  
+  /* MIME header environments, in and out.  Also, an array containing
+   * environment variables to be passed to subprocesses, so people can
+   * write modules to add to that environment.
+   *
+   * The difference between headers_out and err_headers_out is that
+   * the latter are printed even on error, and persist across internal
+   * redirects (so the headers printed for ErrorDocument handlers will
+   * have them).
+   */
+  
+  table *headers_in;
+  table *headers_out;
+  table *err_headers_out;
+  table *subprocess_env;
+
+  /* Info about the request itself... */
+  
+  int header_only;     /* HEAD request, as opposed to GET */
+  char *protocol;      /* Protocol, as given to us, or HTTP/0.9 */
+  char *method;        /* GET, HEAD, POST, etc. */
+  int method_number;   /* M_GET, M_POST, etc. */
+
+  /* Info for logging */
+
+  char *the_request;
+  int bytes_sent;
+
+  /* A flag which modules can set, to indicate that the data being
+   * returned is volatile, and clients should be told not to cache it.
+   */
+
+  int no_cache;
+
+  /* Various other config info which may change with .htaccess files
+   * These are config vectors, with one void* pointer for each module
+   * (the thing pointed to being the module's business).
+   */
+  
+  void *per_dir_config;   /* Options set in config files, etc. */
+  void *request_config;   /* Notes on *this* request */
+  
+};
+
+
+ +

Where request_rec structures come from

+ +Most request_rec structures are built by reading an HTTP request from a client, and filling in the fields. However, there are a few exceptions:

  - When the server decides to serve some object other than the one the client actually asked for (as with ErrorDocument handling), it performs an internal redirect, building a fresh request_rec for the new target; the internal redirect machinery is discussed further below.
  - Modules may also make sub-requests, via the sub_req_lookup_... functions described below, to ask `what would happen if this file or URI were requested?'; these too get their own request_rec structures.

Handling requests, declining, and returning error codes

+ +As discussed above, each handler, when invoked to handle a particular request_rec, has to return an int to indicate what happened. That can either be

  - OK --- the handler dealt with the request successfully, and, for most phases, no further handlers need be run;
  - DECLINED --- the handler is not interested in this request, and the server should carry on as if it had never been invoked; or
  - an HTTP error code (NOT_FOUND, FORBIDDEN, REDIRECT, SERVER_ERROR, and so forth), which aborts ordinary handling of the request.

Note that if the error code returned is REDIRECT, then the module should put a Location in the request's headers_out, to indicate where the client should be redirected to.
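For instance, a handler which wants to send the client elsewhere might look like this (the handler name and destination URL are purely illustrative):

int example_redirect_handler (request_rec *r)
{
    /* Tell the client where to go, then return REDIRECT; the server
     * takes care of generating the actual redirect response.
     */
    table_set (r->headers_out, "Location", "http://www.example.com/elsewhere");
    return REDIRECT;
}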

+ +

Special considerations for response handlers

+ +Handlers for most phases do their work by simply setting a few fields in the request_rec structure (or, in the case of access checkers, simply by returning the correct error code). However, response handlers have to actually send a response back to the client.

+ +They should begin by sending an HTTP response header, using the +function send_http_header. (You don't have to do +anything special to skip sending the header for HTTP/0.9 requests; the +function figures out on its own that it shouldn't do anything). If +the request is marked header_only, that's all they should +do; they should return after that, without attempting any further +output.

+ +Otherwise, they should produce a response body which answers the client as appropriate. The primitives for this are rputc and rprintf, for internally generated output, and send_fd, to copy the contents of some FILE * straight to the client.

+ +At this point, you should more or less understand the following piece +of code, which is the handler which handles GET requests +which have no more specific handler; it also shows how conditional +GETs can be handled, if it's desirable to do so in a +particular response handler --- set_last_modified checks +against the If-modified-since value supplied by the +client, if any, and returns an appropriate code (which will, if +nonzero, be USE_LOCAL_COPY). No similar considerations apply for +set_content_length, but it returns an error code for +symmetry.

+ +

+int default_handler (request_rec *r)
+{
+    int errstatus;
+    FILE *f;
+    
+    if (r->method_number != M_GET) return DECLINED;
+    if (r->finfo.st_mode == 0) return NOT_FOUND;
+
+    if ((errstatus = set_content_length (r, r->finfo.st_size))
+        || (errstatus = set_last_modified (r, r->finfo.st_mtime)))
+        return errstatus;
+    
+    f = fopen (r->filename, "r");
+
+    if (f == NULL) {
+        log_reason("file permissions deny server access",
+                   r->filename, r);
+        return FORBIDDEN;
+    }
+      
+    register_timeout ("send", r);
+    send_http_header (r);
+
+    if (!r->header_only) send_fd (f, r);
+    pfclose (r->pool, f);
+    return OK;
+}
+
+ +Finally, if all of this is too much of a challenge, there are a few +ways out of it. First off, as shown above, a response handler which +has not yet produced any output can simply return an error code, in +which case the server will automatically produce an error response. +Secondly, it can punt to some other handler by invoking +internal_redirect, which is how the internal redirection +machinery discussed above is invoked. A response handler which has +internally redirected should always return OK.
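A sketch of that last option (assuming internal_redirect takes the new URI and the current request, which is how the standard modules use it; the handler name and URI are made up):

int example_punt_handler (request_rec *r)
{
    /* Hand the whole request off to whatever would ordinarily serve
     * the other URI, then report success for this handler.
     */
    internal_redirect ("/some/other/uri", r);
    return OK;
}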

+ +(Invoking internal_redirect from handlers which are +not response handlers will lead to serious confusion). + +

Special considerations for authentication handlers

+ +Stuff that should be discussed here in detail:

  - the authentication-phase return constants, and when to use them;
  - the note_basic_auth_failure and get_basic_auth_pw routines.

Special considerations for logging handlers

+ +When a request has internally redirected, there is the question of +what to log. Apache handles this by bundling the entire chain of +redirects into a list of request_rec structures which are +threaded through the r->prev and r->next +pointers. The request_rec which is passed to the logging +handlers in such cases is the one which was originally built for the +initial request from the client; note that the bytes_sent field will +only be correct in the last request in the chain (the one for which a +response was actually sent). + +
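So a logging handler which wants to report the number of bytes actually sent has to walk to the end of that chain; a sketch (example_logger is a made-up name):

int example_logger (request_rec *orig)
{
    request_rec *last = orig;

    /* orig describes the request as the client originally made it; the
     * last request_rec in the chain is the one for which a response
     * was actually sent, so that is where bytes_sent is meaningful.
     */
    while (last->next != NULL)
        last = last->next;

    /* ... log orig->the_request along with last->bytes_sent ... */

    return OK;
}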

Resource allocation and resource pools

+ +One of the problems of writing and designing a server-pool server is that of preventing leakage, that is, allocating resources (memory, open files, etc.) without subsequently releasing them. The resource pool machinery is designed to make it easy to prevent this from happening, by allowing resources to be allocated in such a way that they are automatically released when the server is done with them.

+ +The way this works is as follows: the memory which is allocated, the files which are opened, etc., in order to deal with a particular request are tied to a resource pool which is allocated for the request. The pool is a data structure which itself tracks the resources in question.

+ +When the request has been processed, the pool is cleared. At that point, all the memory associated with it is released for reuse, all files associated with it are closed, and any other clean-up functions which are associated with the pool are run. When this is over, we can be confident that all the resources tied to the pool have been released, and that none of them have leaked.

+ +Server restarts, and allocation of memory and resources for per-server +configuration, are handled in a similar way. There is a +configuration pool, which keeps track of resources which were +allocated while reading the server configuration files, and handling +the commands therein (for instance, the memory that was allocated for +per-server module configuration, log files and other files that were +opened, and so forth). When the server restarts, and has to reread +the configuration files, the configuration pool is cleared, and so the +memory and file descriptors which were taken up by reading them the +last time are made available for reuse.

+ +It should be noted that use of the pool machinery isn't generally +obligatory, except for situations like logging handlers, where you +really need to register cleanups to make sure that the log file gets +closed when the server restarts (this is most easily done by using the +function pfopen, which also +arranges for the underlying file descriptor to be closed before any +child processes, such as for CGI scripts, are execed), or +in case you are using the timeout machinery (which isn't yet even +documented here). However, there are two benefits to using it: +resources allocated to a pool never leak (even if you allocate a +scratch string, and just forget about it); also, for memory +allocation, palloc is generally faster than +malloc.

+ +We begin here by describing how memory is allocated to pools, and then +discuss how other resources are tracked by the resource pool +machinery. + +

Allocation of memory in pools

+ +Memory is allocated to pools by calling the function +palloc, which takes two arguments, one being a pointer to +a resource pool structure, and the other being the amount of memory to +allocate (in chars). Within handlers for handling +requests, the most common way of getting a resource pool structure is +by looking at the pool slot of the relevant +request_rec; hence the repeated appearance of the +following idiom in module code: + +
+int my_handler(request_rec *r)
+{
+    struct my_structure *foo;
+    ...
+
+    foo = (struct my_structure *) palloc (r->pool, sizeof (struct my_structure));
+}
+
+ +Note that there is no pfree --- +palloced memory is freed only when the associated +resource pool is cleared. This means that palloc does not +have to do as much accounting as malloc(); all it does in +the typical case is to round up the size, bump a pointer, and do a +range check.

+ +(It also raises the possibility that heavy use of palloc +could cause a server process to grow excessively large. There are +two ways to deal with this, which are dealt with below; briefly, you +can use malloc, and try to be sure that all of the memory +gets explicitly freed, or you can allocate a sub-pool of +the main pool, allocate your memory in the sub-pool, and clear it out +periodically. The latter technique is discussed in the section on +sub-pools below, and is used in the directory-indexing code, in order +to avoid excessive storage allocation when listing directories with +thousands of files). + +

Allocating initialized memory

+ +There are functions which allocate initialized memory, and are +frequently useful. The function pcalloc has the same +interface as palloc, but clears out the memory it +allocates before it returns it. The function pstrdup +takes a resource pool and a char * as arguments, and +allocates memory for a copy of the string the pointer points to, +returning a pointer to the copy. Finally pstrcat is a +varargs-style function, which takes a pointer to a resource pool, and +at least two char * arguments, the last of which must be +NULL. It allocates enough memory to fit copies of each +of the strings, as a unit; for instance: + +
+     pstrcat (r->pool, "foo", "/", "bar", NULL);
+
+ +returns a pointer to 8 bytes worth of memory, initialized to +"foo/bar". + +
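The other two are equally small in use; for instance (my_rec is a made-up structure name):

/* A zero-filled structure, and a private copy of the filename which can
 * be scribbled on without disturbing r->filename itself.
 */
my_rec *blank = (my_rec *) pcalloc (r->pool, sizeof (my_rec));
char *fn_copy = pstrdup (r->pool, r->filename);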

Tracking open files, etc.

+ +As indicated above, resource pools are also used to track other sorts +of resources besides memory. The most common are open files. The +routine which is typically used for this is pfopen, which +takes a resource pool and two strings as arguments; the strings are +the same as the typical arguments to fopen, e.g., + +
+     ...
+     FILE *f = pfopen (r->pool, r->filename, "r");
+
+     if (f == NULL) { ... } else { ... }
+
+ +There is also a popenf routine, which parallels the +lower-level open system call. Both of these routines +arrange for the file to be closed when the resource pool in question +is cleared.

+ +Unlike the case for memory, there are functions to close +files allocated with pfopen, and popenf, +namely pfclose and pclosef. (This is +because, on many systems, the number of files which a single process +can have open is quite limited). It is important to use these +functions to close files allocated with pfopen and +popenf, since to do otherwise could cause fatal errors on +systems such as Linux, which react badly if the same +FILE* is closed more than once.

+ +(Using the close functions is not mandatory, since the +file will eventually be closed regardless, but you should consider it +in cases where your module is opening, or could open, a lot of files). + +

Other sorts of resources --- cleanup functions

+ +More text goes here. Describe the cleanup primitives in terms of which the file stuff is implemented; also, spawn_process.
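In the meantime, here is a rough sketch of the usual shape of the thing, assuming a register_cleanup primitive which takes the pool, a data pointer, and a pair of cleanup functions (one run when the pool is cleared, and one run in child processes just before they exec something, so the resource doesn't leak into CGI scripts); all the my_* names are made up:

static void my_cleanup (void *data)
{
    my_release ((my_thing *) data);    /* give the resource back */
}

    ...
    my_thing *thing = my_acquire ();

    /* Have the pool machinery call my_cleanup when r->pool is cleared,
     * and also in any child process which is about to exec.
     */
    register_cleanup (r->pool, (void *) thing, my_cleanup, my_cleanup);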

Fine control --- creating and dealing with sub-pools, with a note on sub-requests

+ +On rare occasions, too-free use of palloc() and the +associated primitives may result in undesirably profligate resource +allocation. You can deal with such a case by creating a +sub-pool, allocating within the sub-pool rather than the main +pool, and clearing or destroying the sub-pool, which releases the +resources which were associated with it. (This really is a +rare situation; the only case in which it comes up in the standard +module set is in case of listing directories, and then only with +very large directories. Unnecessary use of the primitives +discussed here can hair up your code quite a bit, with very little +gain).

+ +The primitive for creating a sub-pool is make_sub_pool, +which takes another pool (the parent pool) as an argument. When the +main pool is cleared, the sub-pool will be destroyed. The sub-pool +may also be cleared or destroyed at any time, by calling the functions +clear_pool and destroy_pool, respectively. +(The difference is that clear_pool frees resources +associated with the pool, while destroy_pool also +deallocates the pool itself. In the former case, you can allocate new +resources within the pool, and clear it again, and so forth; in the +latter case, it is simply gone).
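Here is a sketch of the directory-listing sort of usage described above (num_entries and entries[] are made up):

/* Do a potentially large amount of per-entry scratch allocation without
 * tying it all up until the end of the request.
 */
pool *scratch = make_sub_pool (r->pool);
int i;

for (i = 0; i < num_entries; ++i) {
    char *name = pstrdup (scratch, entries[i]);

    /* ... work with name ... */

    clear_pool (scratch);   /* release this entry's allocations for reuse */
}

destroy_pool (scratch);     /* finished with the sub-pool altogether */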

+ +One final note --- sub-requests have their own resource pools, which +are sub-pools of the resource pool for the main request. The polite +way to reclaim the resources associated with a sub request which you +have allocated (using the sub_req_lookup_... functions) +is destroy_sub_request, which frees the resource pool. +Before calling this function, be sure to copy anything that you care +about which might be allocated in the sub-request's resource pool into +someplace a little less volatile (for instance, the filename in its +request_rec structure).

+ +(Again, under most circumstances, you shouldn't feel obliged to call +this function; only 2K of memory or so are allocated for a typical sub +request, and it will be freed anyway when the main request pool is +cleared. It is only when you are allocating many, many sub-requests +for a single main request that you should seriously consider the +destroy... functions). + +

Configuration, commands and the like

+ +One of the design goals for this server was to maintain external +compatibility with the NCSA 1.3 server --- that is, to read the same +configuration files, to process all the directives therein correctly, +and in general to be a drop-in replacement for NCSA. On the other +hand, another design goal was to move as much of the server's +functionality into modules which have as little as possible to do with +the monolithic server core. The only way to reconcile these goals is +to move the handling of most commands from the central server into the +modules.

+ +However, just giving the modules command tables is not enough to +divorce them completely from the server core. The server has to +remember the commands in order to act on them later. That involves +maintaining data which is private to the modules, and which can be +either per-server, or per-directory. Most things are per-directory, +including in particular access control and authorization information, +but also information on how to determine file types from suffixes, +which can be modified by AddType and +DefaultType directives, and so forth. In general, the +governing philosophy is that anything which can be made +configurable by directory should be; per-server information is +generally used in the standard set of modules for information like +Aliases and Redirects which come into play +before the request is tied to a particular place in the underlying +file system.

+ +Another requirement for emulating the NCSA server is being able to +handle the per-directory configuration files, generally called +.htaccess files, though even in the NCSA server they can +contain directives which have nothing at all to do with access +control. Accordingly, after URI -> filename translation, but before +performing any other phase, the server walks down the directory +hierarchy of the underlying filesystem, following the translated +pathname, to read any .htaccess files which might be +present. The information which is read in then has to be +merged with the applicable information from the server's own +config files (either from the <Directory> sections +in access.conf, or from defaults in +srm.conf, which actually behaves for most purposes almost +exactly like <Directory />).

+ +Finally, after having served a request which involved reading +.htaccess files, we need to discard the storage allocated +for handling them. That is solved the same way it is solved wherever +else similar problems come up, by tying those structures to the +per-transaction resource pool.

+ +

Per-directory configuration structures

+ +Let's look at how all of this plays out in mod_mime.c, which defines the file typing handler which emulates the NCSA server's behavior of determining file types from suffixes. What we'll be looking at, here, is the code which implements the AddType and AddEncoding commands. These commands can appear in .htaccess files, so they must be handled in the module's private per-directory data, which in fact consists of two separate tables for MIME types and encoding information, and is declared as follows:
+typedef struct {
+    table *forced_types;      /* Additional AddTyped stuff */
+    table *encoding_types;    /* Added with AddEncoding... */
+} mime_dir_config;
+
+ +When the server is reading a configuration file, or +<Directory> section, which includes one of the MIME +module's commands, it needs to create a mime_dir_config +structure, so those commands have something to act on. It does this +by invoking the function it finds in the module's `create per-dir +config slot', with two arguments: the name of the directory to which +this configuration information applies (or NULL for +srm.conf), and a pointer to a resource pool in which the +allocation should happen.

+ +(If we are reading a .htaccess file, that resource pool +is the per-request resource pool for the request; otherwise it is a +resource pool which is used for configuration data, and cleared on +restarts. Either way, it is important for the structure being created +to vanish when the pool is cleared, by registering a cleanup on the +pool if necessary).

+ +For the MIME module, the per-dir config creation function just pallocs the structure above, and creates a couple of tables to fill it. That looks like this:

+void *create_mime_dir_config (pool *p, char *dummy)
+{
+    mime_dir_config *new =
+      (mime_dir_config *) palloc (p, sizeof(mime_dir_config));
+
+    new->forced_types = make_table (p, 4);
+    new->encoding_types = make_table (p, 4);
+    
+    return new;
+}
+
+ +Now, suppose we've just read in a .htaccess file. We +already have the per-directory configuration structure for the next +directory up in the hierarchy. If the .htaccess file we +just read in didn't have any AddType or +AddEncoding commands, its per-directory config structure +for the MIME module is still valid, and we can just use it. +Otherwise, we need to merge the two structures somehow.

+ +To do that, the server invokes the module's per-directory config merge +function, if one is present. That function takes three arguments: +the two structures being merged, and a resource pool in which to +allocate the result. For the MIME module, all that needs to be done +is overlay the tables from the new per-directory config structure with +those from the parent: + +

+void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv)
+{
+    mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv;
+    mime_dir_config *subdir = (mime_dir_config *)subdirv;
+    mime_dir_config *new =
+      (mime_dir_config *)palloc (p, sizeof(mime_dir_config));
+
+    new->forced_types = overlay_tables (p, subdir->forced_types,
+                                        parent_dir->forced_types);
+    new->encoding_types = overlay_tables (p, subdir->encoding_types,
+                                          parent_dir->encoding_types);
+
+    return new;
+}
+
+ +As a note --- if there is no per-directory merge function present, the +server will just use the subdirectory's configuration info, and ignore +the parent's. For some modules, that works just fine (e.g., for the +includes module, whose per-directory configuration information +consists solely of the state of the XBITHACK), and for +those modules, you can just not declare one, and leave the +corresponding structure slot in the module itself NULL.

+ +

Command handling

+ +Now that we have these structures, we need to be able to figure out how to fill them. That involves processing the actual AddType and AddEncoding commands. To find commands, the server looks in the module's command table. That table contains information on how many arguments the commands take, and in what formats, where they are permitted, and so forth. That information is sufficient to allow the server to invoke most command-handling functions with pre-parsed arguments. Without further ado, let's look at the AddType command handler, which looks like this (the AddEncoding command looks basically the same, and won't be shown here):
+char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext)
+{
+    if (*ext == '.') ++ext;
+    table_set (m->forced_types, ext, ct);
+    return NULL;
+}
+
+ +This command handler is unusually simple. As you can see, it takes four arguments: a pointer to a cmd_parms structure, the per-directory configuration structure for the module in question, and the two pre-parsed arguments from the config file. The cmd_parms structure contains a bunch of things which are frequently of use to some, but not all, commands, including a resource pool (from which memory can be allocated, and to which cleanups should be tied), and the (virtual) server being configured, from which the module's per-server configuration data can be obtained if required.

+ +Another way in which this particular command handler is unusually +simple is that there are no error conditions which it can encounter. +If there were, it could return an error message instead of +NULL; this causes an error to be printed out on the +server's stderr, followed by a quick exit, if it is in +the main config files; for a .htaccess file, the syntax +error is logged in the server error log (along with an indication of +where it came from), and the request is bounced with a server error +response (HTTP error status, code 500).

+ +The MIME module's command table has entries for these commands, which +look like this: + +

+command_rec mime_cmds[] = {
+{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2, 
+    "a mime type followed by a file extension" },
+{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2, 
+    "an encoding (e.g., gzip), followed by a file extension" },
+{ NULL }
+};
+
+ +The entries in these tables are:

  - the name of the command, as it appears in the config files;
  - the function which handles it;
  - a (void *) data pointer, passed through to that function (NULL in the examples above);
  - a bit mask indicating where the command may appear (RSRC_CONF, OR_FILEINFO, and so forth);
  - a flag saying how many arguments the command takes, and how they are parsed (TAKE2 in both cases here); and
  - a short description of the arguments, used in syntax error messages.

Finally, having set this all up, we have to use it. This is ultimately done in the module's handlers, specifically for its file-typing handler, which looks more or less like this; note that the per-directory configuration structure is extracted from the request_rec's per-directory configuration vector by using the get_module_config function.
+int find_ct(request_rec *r)
+{
+    int i;
+    char *fn = pstrdup (r->pool, r->filename);
+    mime_dir_config *conf = (mime_dir_config *)
+             get_module_config(r->per_dir_config, &mime_module);
+    char *type;
+
+    if (S_ISDIR(r->finfo.st_mode)) {
+        r->content_type = DIR_MAGIC_TYPE;
+        return OK;
+    }
+    
+    if((i=rind(fn,'.')) < 0) return DECLINED;
+    ++i;
+
+    if ((type = table_get (conf->encoding_types, &fn[i])))
+    {
+        r->content_encoding = type;
+
+        /* go back to previous extension to try to use it as a type */
+
+        fn[i-1] = '\0';
+        if((i=rind(fn,'.')) < 0) return OK;
+        ++i;
+    }
+
+    if ((type = table_get (conf->forced_types, &fn[i])))
+    {
+        r->content_type = type;
+    }
+    
+    return OK;
+}
+
+
+ +

Side notes --- per-server configuration, virtual servers, etc.

+ +The basic ideas behind per-server module configuration are the same as those for per-directory configuration; there is a creation function and a merge function, the latter being invoked where a virtual server has partially overridden the base server configuration, and a combined structure must be computed. (As with per-directory configuration, the default if no merge function is specified, and a module is configured in some virtual server, is that the base configuration is simply ignored).
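A sketch, using the same shapes as the make_cgi_server_config and merge_cgi_server_config declarations in the module tour above (my_server_conf and the my_* function names are made up):

typedef struct {
    table *things;     /* whatever per-server state this module accumulates */
} my_server_conf;

void *make_my_server_config (pool *p)
{
    my_server_conf *new = (my_server_conf *) palloc (p, sizeof (my_server_conf));
    new->things = make_table (p, 4);
    return new;
}

void *merge_my_server_configs (pool *p, void *basev, void *virtv)
{
    my_server_conf *base = (my_server_conf *) basev;
    my_server_conf *virt = (my_server_conf *) virtv;
    my_server_conf *new = (my_server_conf *) palloc (p, sizeof (my_server_conf));

    /* Entries from the virtual server win over the base server's. */
    new->things = overlay_tables (p, virt->things, base->things);
    return new;
}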

+ +The only substantial difference is that when a command needs to +configure the per-server private module data, it needs to go to the +cmd_parms data to get at it. Here's an example, from the +alias module, which also indicates how a syntax error can be returned +(note that the per-directory configuration argument to the command +handler is declared as a dummy, since the module doesn't actually have +per-directory config data): + +

+char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url)
+{
+    server_rec *s = cmd->server;
+    alias_server_conf *conf = (alias_server_conf *)
+            get_module_config(s->module_config,&alias_module);
+    alias_entry *new = push_array (conf->redirects);
+
+    if (!is_url (url)) return "Redirect to non-URL";
+    
+    new->fake = f; new->real = url;
+    return NULL;
+}
+
+

diff --git a/docs/manual/misc/API.html b/docs/manual/misc/API.html
new file mode 100644
index 0000000000..f860996e47
--- /dev/null
+++ b/docs/manual/misc/API.html
@@ -0,0 +1,988 @@

Apache API notes

+ +These are some notes on the Apache API and the data structures you +have to deal with, etc. They are not yet nearly complete, but +hopefully, they will help you get your bearings. Keep in mind that +the API is still subject to change as we gain experience with it. +(See the TODO file for what might be coming). However, +it will be easy to adapt modules to any changes that are made. +(We have more modules to adapt than you do). +

+ +A few notes on general pedagogical style here. In the interest of +conciseness, all structure declarations here are incomplete --- the +real ones have more slots that I'm not telling you about. For the +most part, these are reserved to one component of the server core or +another, and should be altered by modules with caution. However, in +some cases, they really are things I just haven't gotten around to +yet. Welcome to the bleeding edge.

+ +Finally, here's an outline, to give you some bare idea of what's +coming up, and in what order: + +

+ +

Basic concepts.

+ +We begin with an overview of the basic concepts behind the +API, and how they are manifested in the code. + +

Handlers, Modules, and Requests

+ +Apache breaks down request handling into a series of steps, more or +less the same way the Netscape server API does (although this API has +a few more stages than NetSite does, as hooks for stuff I thought +might be useful in the future). These are: + + + +These phases are handled by looking at each of a succession of +modules, looking to see if each of them has a handler for the +phase, and attempting invoking it if so. The handler can typically do +one of three things: + + + +Most phases are terminated by the first module that handles them; +however, for logging, `fixups', and non-access authentication +checking, all handlers always run (barring an error). Also, the +response phase is unique in that modules may declare multiple handlers +for it, via a dispatch table keyed on the MIME type of the requested +object. Modules may declare a response-phase handler which can handle +any request, by giving it the key */* (i.e., a +wildcard MIME type specification). However, wildcard handlers are +only invoked if the server has already tried and failed to find a more +specific response handler for the MIME type of the requested object +(either none existed, or they all declined).

+ +The handlers themselves are functions of one argument (a +request_rec structure. vide infra), which returns an +integer, as above.

+ +

A brief tour of a module

+ +At this point, we need to explain the structure of a module. Our +candidate will be one of the messier ones, the CGI module --- this +handles both CGI scripts and the ScriptAlias config file +command. It's actually a great deal more complicated than most +modules, but if we're going to have only one example, it might as well +be the one with its fingers in every place.

+ +Let's begin with handlers. In order to handle the CGI scripts, the +module declares a response handler for them. Because of +ScriptAlias, it also has handlers for the name +translation phase (to recognise ScriptAliased URIs), the +type-checking phase (any ScriptAliased request is typed +as a CGI script).

+ +The module needs to maintain some per (virtual) +server information, namely, the ScriptAliases in effect; +the module structure therefore contains pointers to a functions which +builds these structures, and to another which combines two of them (in +case the main server and a virtual server both have +ScriptAliases declared).

+ +Finally, this module contains code to handle the +ScriptAlias command itself. This particular module only +declares one command, but there could be more, so modules have +command tables which declare their commands, and describe +where they are permitted, and how they are to be invoked.

+ +A final note on the declared types of the arguments of some of these +commands: a pool is a pointer to a resource pool +structure; these are used by the server to keep track of the memory +which has been allocated, files opened, etc., either to service a +particular request, or to handle the process of configuring itself. +That way, when the request is over (or, for the configuration pool, +when the server is restarting), the memory can be freed, and the files +closed, en masse, without anyone having to write explicit code to +track them all down and dispose of them. Also, a +cmd_parms structure contains various information about +the config file being read, and other status information, which is +sometimes of use to the function which processes a config-file command +(such as ScriptAlias). + +With no further ado, the module itself: + +

+/* Declarations of handlers. */
+
+int translate_scriptalias (request_rec *);
+int type_scriptalias (request_rec *);
+int cgi_handler (request_rec *);
+
+/* Subsidiary dispatch table for response-phase handlers, by MIME type */
+
+handler_rec cgi_handlers[] = {
+{ "application/x-httpd-cgi", cgi_handler },
+{ NULL }
+};
+
+/* Declarations of routines to manipulate the module's configuration
+ * info.  Note that these are returned, and passed in, as void *'s;
+ * the server core keeps track of them, but it doesn't, and can't,
+ * know their internal structure.
+ */
+
+void *make_cgi_server_config (pool *);
+void *merge_cgi_server_config (pool *, void *, void *);
+
+/* Declarations of routines to handle config-file commands */
+
+extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake,
+                          char *real);
+
+command_rec cgi_cmds[] = {
+{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2,
+    "a fakename and a realname"},
+{ NULL }
+};
+
+module cgi_module = {
+   STANDARD_MODULE_STUFF,
+   NULL,                     /* initializer */
+   NULL,                     /* dir config creator */
+   NULL,                     /* dir merger --- default is to override */
+   make_cgi_server_config,   /* server config */
+   merge_cgi_server_config,  /* merge server config */
+   cgi_cmds,                 /* command table */
+   cgi_handlers,             /* handlers */
+   translate_scriptalias,    /* filename translation */
+   NULL,                     /* check_user_id */
+   NULL,                     /* check auth */
+   NULL,                     /* check access */
+   type_scriptalias,         /* type_checker */
+   NULL,                     /* fixups */
+   NULL                      /* logger */
+};
+
+ +

How handlers work

+ +The sole argument to handlers is a request_rec structure. +This structure describes a particular request which has been made to +the server, on behalf of a client. In most cases, each connection to +the client generates only one request_rec structure.

+ +

A brief tour of the request_rec

+ +The request_rec contains pointers to a resource pool +which will be cleared when the server is finished handling the +request; to structures containing per-server and per-connection +information, and most importantly, information on the request itself.

+ +The most important such information is a small set of character +strings describing attributes of the object being requested, including +its URI, filename, content-type and content-encoding (these being filled +in by the translation and type-check handlers which handle the +request, respectively).

+ +Other commonly used data items are tables giving the MIME headers on +the client's original request, MIME headers to be sent back with the +response (which modules can add to at will), and environment variables +for any subprocesses which are spawned off in the course of servicing +the request. These tables are manipulated using the +table_get and table_set routines.

+ +Finally, there are pointers to two data structures which, in turn, +point to per-module configuration structures. Specifically, these +hold pointers to the data structures which the module has built to +describe the way it has been configured to operate in a given +directory (via .htaccess files or +<Directory> sections), for private data it has +built in the course of servicing the request (so modules' handlers for +one phase can pass `notes' to their handlers for other phases). There +is another such configuration vector in the server_rec +data structure pointed to by the request_rec, which +contains per (virtual) server configuration data.

+ +Here is an abridged declaration, giving the fields most commonly used:

+ +

+struct request_rec {
+
+  pool *pool;
+  conn_rec *connection;
+  server_rec *server;
+
+  /* What object is being requested */
+  
+  char *uri;
+  char *filename;
+  char *path_info;
+  char *args;           /* QUERY_ARGS, if any */
+  struct stat finfo;    /* Set by server core;
+                         * st_mode set to zero if no such file */
+  
+  char *content_type;
+  char *content_encoding;
+  
+  /* MIME header environments, in and out.  Also, an array containing
+   * environment variables to be passed to subprocesses, so people can
+   * write modules to add to that environment.
+   *
+   * The difference between headers_out and err_headers_out is that
+   * the latter are printed even on error, and persist across internal
+   * redirects (so the headers printed for ErrorDocument handlers will
+   * have them).
+   */
+  
+  table *headers_in;
+  table *headers_out;
+  table *err_headers_out;
+  table *subprocess_env;
+
+  /* Info about the request itself... */
+  
+  int header_only;     /* HEAD request, as opposed to GET */
+  char *protocol;      /* Protocol, as given to us, or HTTP/0.9 */
+  char *method;        /* GET, HEAD, POST, etc. */
+  int method_number;   /* M_GET, M_POST, etc. */
+
+  /* Info for logging */
+
+  char *the_request;
+  int bytes_sent;
+
+  /* A flag which modules can set, to indicate that the data being
+   * returned is volatile, and clients should be told not to cache it.
+   */
+
+  int no_cache;
+
+  /* Various other config info which may change with .htaccess files
+   * These are config vectors, with one void* pointer for each module
+   * (the thing pointed to being the module's business).
+   */
+  
+  void *per_dir_config;   /* Options set in config files, etc. */
+  void *request_config;   /* Notes on *this* request */
+  
+};
+
+
+ +

Where request_rec structures come from

+ +Most request_rec structures are built by reading an HTTP +request from a client, and filling in the fields. However, there are +a few exceptions: + + + +

Handling requests, declining, and returning error codes

+ +As discussed above, each handler, when invoked to handle a particular +request_rec, has to return an int to +indicate what happened. That can either be + + + +Note that if the error code returned is REDIRECT, then +the module should put a Location in the request's +headers_out, to indicate where the client should be +redirected to.

+ +

Special considerations for response handlers

+ +Handlers for most phases do their work by simply setting a few fields +in the request_rec structure (or, in the case of access +checkers, simply by returning the correct error code). However, +response handlers have to actually send a request back to the client.

+ +They should begin by sending an HTTP response header, using the +function send_http_header. (You don't have to do +anything special to skip sending the header for HTTP/0.9 requests; the +function figures out on its own that it shouldn't do anything). If +the request is marked header_only, that's all they should +do; they should return after that, without attempting any further +output.

+ +Otherwise, they should produce a request body which responds to the +client as appropriate. The primitives for this are rputc +and rprintf, for internally generated output, and +send_fd, to copy the contents of some FILE * +straight to the client.

+ +At this point, you should more or less understand the following piece +of code, which is the handler which handles GET requests +which have no more specific handler; it also shows how conditional +GETs can be handled, if it's desirable to do so in a +particular response handler --- set_last_modified checks +against the If-modified-since value supplied by the +client, if any, and returns an appropriate code (which will, if +nonzero, be USE_LOCAL_COPY). No similar considerations apply for +set_content_length, but it returns an error code for +symmetry.

+ +

+int default_handler (request_rec *r)
+{
+    int errstatus;
+    FILE *f;
+    
+    if (r->method_number != M_GET) return DECLINED;
+    if (r->finfo.st_mode == 0) return NOT_FOUND;
+
+    if ((errstatus = set_content_length (r, r->finfo.st_size))
+        || (errstatus = set_last_modified (r, r->finfo.st_mtime)))
+        return errstatus;
+    
+    f = fopen (r->filename, "r");
+
+    if (f == NULL) {
+        log_reason("file permissions deny server access",
+                   r->filename, r);
+        return FORBIDDEN;
+    }
+      
+    register_timeout ("send", r);
+    send_http_header (r);
+
+    if (!r->header_only) send_fd (f, r);
+    pfclose (r->pool, f);
+    return OK;
+}
+
+ +Finally, if all of this is too much of a challenge, there are a few +ways out of it. First off, as shown above, a response handler which +has not yet produced any output can simply return an error code, in +which case the server will automatically produce an error response. +Secondly, it can punt to some other handler by invoking +internal_redirect, which is how the internal redirection +machinery discussed above is invoked. A response handler which has +internally redirected should always return OK.

+ +(Invoking internal_redirect from handlers which are +not response handlers will lead to serious confusion). + +

Special considerations for authentication handlers

+ +Stuff that should be discussed here in detail: + + + +

Special considerations for logging handlers

+ +When a request has internally redirected, there is the question of +what to log. Apache handles this by bundling the entire chain of +redirects into a list of request_rec structures which are +threaded through the r->prev and r->next +pointers. The request_rec which is passed to the logging +handlers in such cases is the one which was originally built for the +initial request from the client; note that the bytes_sent field will +only be correct in the last request in the chain (the one for which a +response was actually sent). + +

Resource allocation and resource pools

+ +One of the problems of writing and designing a server-pool server is +that of preventing leakage, that is, allocating resources (memory, +open files, etc.), without subsequently releasing them. The resource +pool machinery is designed to make it easy to prevent this from +happening, by allowing resource to be allocated in such a way that +they are automatically released when the server is done with +them.

+ +The way this works is as follows: the memory which is allocated, file +opened, etc., to deal with a particular request are tied to a +resource pool which is allocated for the request. The pool +is a data structure which itself tracks the resources in question.

+ +When the request has been processed, the pool is cleared. At +that point, all the memory associated with it is released for reuse, +all files associated with it are closed, and any other clean-up +functions which are associated with the pool are run. When this is +over, we can be confident that all the resource tied to the pool have +been released, and that none of them have leaked.

+ +Server restarts, and allocation of memory and resources for per-server +configuration, are handled in a similar way. There is a +configuration pool, which keeps track of resources which were +allocated while reading the server configuration files, and handling +the commands therein (for instance, the memory that was allocated for +per-server module configuration, log files and other files that were +opened, and so forth). When the server restarts, and has to reread +the configuration files, the configuration pool is cleared, and so the +memory and file descriptors which were taken up by reading them the +last time are made available for reuse.

+ +It should be noted that use of the pool machinery isn't generally +obligatory, except for situations like logging handlers, where you +really need to register cleanups to make sure that the log file gets +closed when the server restarts (this is most easily done by using the +function pfopen, which also +arranges for the underlying file descriptor to be closed before any +child processes, such as for CGI scripts, are execed), or +in case you are using the timeout machinery (which isn't yet even +documented here). However, there are two benefits to using it: +resources allocated to a pool never leak (even if you allocate a +scratch string, and just forget about it); also, for memory +allocation, palloc is generally faster than +malloc.

+ +We begin here by describing how memory is allocated to pools, and then +discuss how other resources are tracked by the resource pool +machinery. + +

Allocation of memory in pools

+ +Memory is allocated to pools by calling the function +palloc, which takes two arguments, one being a pointer to +a resource pool structure, and the other being the amount of memory to +allocate (in chars). Within handlers for handling +requests, the most common way of getting a resource pool structure is +by looking at the pool slot of the relevant +request_rec; hence the repeated appearance of the +following idiom in module code: + +
+int my_handler(request_rec *r)
+{
+    struct my_structure *foo;
+    ...
+
+    foo = (foo *)palloc (r->pool, sizeof(my_structure));
+}
+
+ +Note that there is no pfree --- +palloced memory is freed only when the associated +resource pool is cleared. This means that palloc does not +have to do as much accounting as malloc(); all it does in +the typical case is to round up the size, bump a pointer, and do a +range check.

+ +(It also raises the possibility that heavy use of palloc +could cause a server process to grow excessively large. There are +two ways to deal with this, which are dealt with below; briefly, you +can use malloc, and try to be sure that all of the memory +gets explicitly freed, or you can allocate a sub-pool of +the main pool, allocate your memory in the sub-pool, and clear it out +periodically. The latter technique is discussed in the section on +sub-pools below, and is used in the directory-indexing code, in order +to avoid excessive storage allocation when listing directories with +thousands of files). + +

Allocating initialized memory

+ +There are functions which allocate initialized memory, and are +frequently useful. The function pcalloc has the same +interface as palloc, but clears out the memory it +allocates before it returns it. The function pstrdup +takes a resource pool and a char * as arguments, and +allocates memory for a copy of the string the pointer points to, +returning a pointer to the copy. Finally pstrcat is a +varargs-style function, which takes a pointer to a resource pool, and +at least two char * arguments, the last of which must be +NULL. It allocates enough memory to fit copies of each +of the strings, as a unit; for instance: + +
+     pstrcat (r->pool, "foo", "/", "bar", NULL);
+
+ +returns a pointer to 8 bytes worth of memory, initialized to +"foo/bar". + +

Tracking open files, etc.

+ +As indicated above, resource pools are also used to track other sorts +of resources besides memory. The most common are open files. The +routine which is typically used for this is pfopen, which +takes a resource pool and two strings as arguments; the strings are +the same as the typical arguments to fopen, e.g., + +
+     ...
+     FILE *f = pfopen (r->pool, r->filename, "r");
+
+     if (f == NULL) { ... } else { ... }
+
+ +There is also a popenf routine, which parallels the +lower-level open system call. Both of these routines +arrange for the file to be closed when the resource pool in question +is cleared.

+ +Unlike the case for memory, there are functions to close +files allocated with pfopen, and popenf, +namely pfclose and pclosef. (This is +because, on many systems, the number of files which a single process +can have open is quite limited). It is important to use these +functions to close files allocated with pfopen and +popenf, since to do otherwise could cause fatal errors on +systems such as Linux, which react badly if the same +FILE* is closed more than once.

+ +(Using the close functions is not mandatory, since the +file will eventually be closed regardless, but you should consider it +in cases where your module is opening, or could open, a lot of files). + +

Other sorts of resources --- cleanup functions

+
+More text goes here. Describe the cleanup primitives in terms of
+which the file stuff is implemented; also, spawn_process.
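+
+(In the meantime, here is a rough sketch of the general shape of the
+cleanup interface on which the file routines are built; the name and
+signature of register_cleanup shown here are assumptions, so check
+them against alloc.h before relying on them:)
+
+    static void close_my_thing (void *data)
+    {
+        /* release whatever resource `data' refers to */
+    }
+
+    ...
+    /* arrange for close_my_thing(thing) to run when the pool is cleared;
+     * the last argument is the variant to run in child processes.
+     */
+    register_cleanup (r->pool, thing, close_my_thing, close_my_thing);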

Fine control --- creating and dealing with sub-pools, with a note +on sub-requests

+ +On rare occasions, too-free use of palloc() and the +associated primitives may result in undesirably profligate resource +allocation. You can deal with such a case by creating a +sub-pool, allocating within the sub-pool rather than the main +pool, and clearing or destroying the sub-pool, which releases the +resources which were associated with it. (This really is a +rare situation; the only case in which it comes up in the standard +module set is in case of listing directories, and then only with +very large directories. Unnecessary use of the primitives +discussed here can hair up your code quite a bit, with very little +gain).

+ +The primitive for creating a sub-pool is make_sub_pool, +which takes another pool (the parent pool) as an argument. When the +main pool is cleared, the sub-pool will be destroyed. The sub-pool +may also be cleared or destroyed at any time, by calling the functions +clear_pool and destroy_pool, respectively. +(The difference is that clear_pool frees resources +associated with the pool, while destroy_pool also +deallocates the pool itself. In the former case, you can allocate new +resources within the pool, and clear it again, and so forth; in the +latter case, it is simply gone).
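+
+A sketch of the idiom (the loop and some_string are, of course,
+placeholders):
+
+    pool *sub = make_sub_pool (r->pool);
+
+    for (...) {
+        /* allocate scratch data in the sub-pool, not in r->pool */
+        char *scratch = pstrdup (sub, some_string);
+        ...
+        clear_pool (sub);    /* release the scratch data, keep the pool */
+    }
+
+    destroy_pool (sub);      /* release everything, including the pool */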

+ +One final note --- sub-requests have their own resource pools, which +are sub-pools of the resource pool for the main request. The polite +way to reclaim the resources associated with a sub request which you +have allocated (using the sub_req_lookup_... functions) +is destroy_sub_request, which frees the resource pool. +Before calling this function, be sure to copy anything that you care +about which might be allocated in the sub-request's resource pool into +someplace a little less volatile (for instance, the filename in its +request_rec structure).

+ +(Again, under most circumstances, you shouldn't feel obliged to call +this function; only 2K of memory or so are allocated for a typical sub +request, and it will be freed anyway when the main request pool is +cleared. It is only when you are allocating many, many sub-requests +for a single main request that you should seriously consider the +destroy... functions). + +
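+
+For the record, the polite pattern looks something like this (the
+function names follow the text above, and the status test against 200
+is just one plausible way to check that the lookup succeeded):
+
+    request_rec *subr = sub_req_lookup_uri ("/some/other/uri", r);
+    char *sub_filename = NULL;
+
+    if (subr->status == 200 && subr->filename)
+        sub_filename = pstrdup (r->pool, subr->filename);  /* copy out first */
+
+    destroy_sub_request (subr);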

Configuration, commands and the like

+ +One of the design goals for this server was to maintain external +compatibility with the NCSA 1.3 server --- that is, to read the same +configuration files, to process all the directives therein correctly, +and in general to be a drop-in replacement for NCSA. On the other +hand, another design goal was to move as much of the server's +functionality into modules which have as little as possible to do with +the monolithic server core. The only way to reconcile these goals is +to move the handling of most commands from the central server into the +modules.

+ +However, just giving the modules command tables is not enough to +divorce them completely from the server core. The server has to +remember the commands in order to act on them later. That involves +maintaining data which is private to the modules, and which can be +either per-server, or per-directory. Most things are per-directory, +including in particular access control and authorization information, +but also information on how to determine file types from suffixes, +which can be modified by AddType and +DefaultType directives, and so forth. In general, the +governing philosophy is that anything which can be made +configurable by directory should be; per-server information is +generally used in the standard set of modules for information like +Aliases and Redirects which come into play +before the request is tied to a particular place in the underlying +file system.

+ +Another requirement for emulating the NCSA server is being able to +handle the per-directory configuration files, generally called +.htaccess files, though even in the NCSA server they can +contain directives which have nothing at all to do with access +control. Accordingly, after URI -> filename translation, but before +performing any other phase, the server walks down the directory +hierarchy of the underlying filesystem, following the translated +pathname, to read any .htaccess files which might be +present. The information which is read in then has to be +merged with the applicable information from the server's own +config files (either from the <Directory> sections +in access.conf, or from defaults in +srm.conf, which actually behaves for most purposes almost +exactly like <Directory />).

+ +Finally, after having served a request which involved reading +.htaccess files, we need to discard the storage allocated +for handling them. That is solved the same way it is solved wherever +else similar problems come up, by tying those structures to the +per-transaction resource pool.

+ +

Per-directory configuration structures

+
+Let's look at how all of this plays out in mod_mime.c,
+which defines the file typing handler which emulates the NCSA server's
+behavior of determining file types from suffixes. What we'll be
+looking at here is the code which implements the
+AddType and AddEncoding commands. These
+commands can appear in .htaccess files, so they must be
+handled in the module's private per-directory data, which, in fact,
+consists of two separate tables for MIME types and
+encoding information, and is declared as follows:
+typedef struct {
+    table *forced_types;      /* Additional AddTyped stuff */
+    table *encoding_types;    /* Added with AddEncoding... */
+} mime_dir_config;
+
+ +When the server is reading a configuration file, or +<Directory> section, which includes one of the MIME +module's commands, it needs to create a mime_dir_config +structure, so those commands have something to act on. It does this +by invoking the function it finds in the module's `create per-dir +config slot', with two arguments: the name of the directory to which +this configuration information applies (or NULL for +srm.conf), and a pointer to a resource pool in which the +allocation should happen.

+ +(If we are reading a .htaccess file, that resource pool +is the per-request resource pool for the request; otherwise it is a +resource pool which is used for configuration data, and cleared on +restarts. Either way, it is important for the structure being created +to vanish when the pool is cleared, by registering a cleanup on the +pool if necessary).

+
+For the MIME module, the per-dir config creation function just
+pallocs the structure above, and creates a couple of
+tables to fill it. That looks like this:

+void *create_mime_dir_config (pool *p, char *dummy)
+{
+    mime_dir_config *new =
+      (mime_dir_config *) palloc (p, sizeof(mime_dir_config));
+
+    new->forced_types = make_table (p, 4);
+    new->encoding_types = make_table (p, 4);
+    
+    return new;
+}
+
+ +Now, suppose we've just read in a .htaccess file. We +already have the per-directory configuration structure for the next +directory up in the hierarchy. If the .htaccess file we +just read in didn't have any AddType or +AddEncoding commands, its per-directory config structure +for the MIME module is still valid, and we can just use it. +Otherwise, we need to merge the two structures somehow.

+ +To do that, the server invokes the module's per-directory config merge +function, if one is present. That function takes three arguments: +the two structures being merged, and a resource pool in which to +allocate the result. For the MIME module, all that needs to be done +is overlay the tables from the new per-directory config structure with +those from the parent: + +

+void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv)
+{
+    mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv;
+    mime_dir_config *subdir = (mime_dir_config *)subdirv;
+    mime_dir_config *new =
+      (mime_dir_config *)palloc (p, sizeof(mime_dir_config));
+
+    new->forced_types = overlay_tables (p, subdir->forced_types,
+                                        parent_dir->forced_types);
+    new->encoding_types = overlay_tables (p, subdir->encoding_types,
+                                          parent_dir->encoding_types);
+
+    return new;
+}
+
+ +As a note --- if there is no per-directory merge function present, the +server will just use the subdirectory's configuration info, and ignore +the parent's. For some modules, that works just fine (e.g., for the +includes module, whose per-directory configuration information +consists solely of the state of the XBITHACK), and for +those modules, you can just not declare one, and leave the +corresponding structure slot in the module itself NULL.

+ +

Command handling

+ +Now that we have these structures, we need to be able to figure out +how to fill them. That involves processing the actual +AddType and AddEncoding commands. To find +commands, the server looks in the module's command table. +That table contains information on how many arguments the commands +take, and in what formats, where it is permitted, and so forth. That +information is sufficient to allow the server to invoke most +command-handling functions with pre-parsed arguments. Without further +ado, let's look at the AddType command handler, which +looks like this (the AddEncoding command looks basically +the same, and won't be shown here): + +
+char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext)
+{
+    if (*ext == '.') ++ext;
+    table_set (m->forced_types, ext, ct);
+    return NULL;
+}
+
+
+This command handler is unusually simple. As you can see, it takes
+four arguments: the last two are the pre-parsed command arguments, the
+second is the per-directory configuration structure for the module in
+question, and the first is a pointer to a cmd_parms
+structure. That structure contains a bunch of fields which are
+frequently of use to some, but not all, commands, including a resource
+pool (from which memory can be allocated, and to which cleanups should
+be tied), and the (virtual) server being configured, from which the
+module's per-server configuration data can be obtained if required.

+ +Another way in which this particular command handler is unusually +simple is that there are no error conditions which it can encounter. +If there were, it could return an error message instead of +NULL; this causes an error to be printed out on the +server's stderr, followed by a quick exit, if it is in +the main config files; for a .htaccess file, the syntax +error is logged in the server error log (along with an indication of +where it came from), and the request is bounced with a server error +response (HTTP error status, code 500).
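+
+Purely for illustration, here is a hedged variant of the handler above
+which does report an error; the validation rule and the name
+add_type_checked are hypothetical, not part of the module:
+
+    char *add_type_checked (cmd_parms *cmd, mime_dir_config *m,
+                            char *ct, char *ext)
+    {
+        if (strchr (ct, '/') == NULL)   /* strchr from <string.h> */
+            return "AddType: media type should look like type/subtype";
+
+        if (*ext == '.') ++ext;
+        table_set (m->forced_types, ext, ct);
+        return NULL;
+    }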

+ +The MIME module's command table has entries for these commands, which +look like this: + +

+command_rec mime_cmds[] = {
+{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2, 
+    "a mime type followed by a file extension" },
+{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2, 
+    "an encoding (e.g., gzip), followed by a file extension" },
+{ NULL }
+};
+
+
+The entries in these tables are: the name of the command as it appears
+in the config files; the function which handles it; a (void *) data
+pointer which is passed through to that function (NULL here); a bit
+mask indicating where the command may appear (OR_FILEINFO allows it in
+.htaccess files, subject to the FileInfo override); a flag saying how
+the arguments are parsed (TAKE2 means exactly two arguments); and a
+short description of the arguments, used in syntax error messages.
+
+Finally, having set this all up, we have to use it. This is
+ultimately done in the module's handlers, specifically for its
+file-typing handler, which looks more or less like this; note that the
+per-directory configuration structure is extracted from the
+request_rec's per-directory configuration vector by using
+the get_module_config function.
+int find_ct(request_rec *r)
+{
+    int i;
+    char *fn = pstrdup (r->pool, r->filename);
+    mime_dir_config *conf = (mime_dir_config *)
+             get_module_config(r->per_dir_config, &mime_module);
+    char *type;
+
+    if (S_ISDIR(r->finfo.st_mode)) {
+        r->content_type = DIR_MAGIC_TYPE;
+        return OK;
+    }
+    
+    if((i=rind(fn,'.')) < 0) return DECLINED;
+    ++i;
+
+    if ((type = table_get (conf->encoding_types, &fn[i])))
+    {
+        r->content_encoding = type;
+
+        /* go back to previous extension to try to use it as a type */
+
+        fn[i-1] = '\0';
+        if((i=rind(fn,'.')) < 0) return OK;
+        ++i;
+    }
+
+    if ((type = table_get (conf->forced_types, &fn[i])))
+    {
+        r->content_type = type;
+    }
+    
+    return OK;
+}
+
+
+ +

Side notes --- per-server configuration, virtual servers, etc.

+
+The ideas behind per-server module configuration are basically
+the same as those for per-directory configuration; there is a creation
+function and a merge function, the latter being invoked where a
+virtual server has partially overridden the base server configuration,
+and a combined structure must be computed. (As with per-directory
+configuration, the default if no merge function is specified, and a
+module is configured in some virtual server, is that the base
+configuration is simply ignored).
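+
+A sketch of what a per-server creation function might look like, in the
+spirit of the alias module used in the example below; the structure is
+assumed to hold only the redirects array which that example uses, and
+the initial array size of 20 is an arbitrary guess:
+
+    void *create_alias_config (pool *p, server_rec *s)
+    {
+        alias_server_conf *conf = (alias_server_conf *)
+            palloc (p, sizeof(alias_server_conf));
+
+        conf->redirects = make_array (p, 20, sizeof(alias_entry));
+        return conf;
+    }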

+ +The only substantial difference is that when a command needs to +configure the per-server private module data, it needs to go to the +cmd_parms data to get at it. Here's an example, from the +alias module, which also indicates how a syntax error can be returned +(note that the per-directory configuration argument to the command +handler is declared as a dummy, since the module doesn't actually have +per-directory config data): + +

+char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url)
+{
+    server_rec *s = cmd->server;
+    alias_server_conf *conf = (alias_server_conf *)
+            get_module_config(s->module_config,&alias_module);
+    alias_entry *new = push_array (conf->redirects);
+
+    if (!is_url (url)) return "Redirect to non-URL";
+    
+    new->fake = f; new->real = url;
+    return NULL;
+}
+
+ + + diff --git a/docs/manual/misc/FAQ.html b/docs/manual/misc/FAQ.html new file mode 100644 index 0000000000..b630a283f0 --- /dev/null +++ b/docs/manual/misc/FAQ.html @@ -0,0 +1,162 @@ + + + +Apache server Frequently Asked Questions + + + + +

Apache server Frequently Asked Questions

+ +

The Questions

+
    +
  1. What is Apache ? +
  2. How does the Apache group relate to other servers ? 
  3. Why was Apache created ? 
  4. Why the name "Apache" ? +
  5. How compatible is Apache with my existing NCSA 1.3 setup ? +
  6. OK, so how does Apache compare to other servers ? +
  7. How thoroughly tested is Apache? +
  8. Does or will Apache act as a Proxy server? +
  9. What are the future plans for Apache ? +
  10. Who do I contact for support ? +
  11. Is there any more information on Apache ? +
  12. Where can I get Apache ? 
+ +
+ +

The Answers

+
    +
  1. What is Apache ? +

    + Apache was originally based on code and ideas found in the most
+popular HTTP server of the time, NCSA httpd 1.3 (early 1995). It has
+since evolved into a far superior system which can rival (and probably
+surpass) almost any other UNIX-based HTTP server in terms of functionality,
+efficiency and speed.

    Since it began, it has been completely rewritten, and includes many new +features. Apache is, as of June 1996, the most popular WWW server on +the Internet, according to the Netcraft Survey. + +

    +
    +
  2. How does the Apache group relate to other +server efforts, such as NCSA's? +

    +We, of course, owe a great debt to NCSA and their programmers for
+making the server Apache was based on. We now, however, have our own
+server, and our project is mostly our own. The Apache Project is an
+entirely independent venture.

    +
    + +
  3. Why was Apache created ? +

    To address the concerns of a group of WWW providers and part-time httpd
+programmers that httpd didn't behave as they wanted it
+to. Apache is an entirely volunteer effort, completely funded by its
+members, not by commercial sales.

    + +
    + +
  4. Why the name "Apache" ? +

    A cute name which stuck. Apache is "A PAtCHy server". It was + based on some existing code and a series of "patch files". +

    +
    + + +
  5. How compatible is Apache with my existing NCSA 1.3 +setup ?

    + +Apache attempts to offer all the features and configuration options +of NCSA httpd 1.3, as well as many of the additional features found in +NCSA httpd 1.4 and NCSA httpd 1.5.

    + +NCSA httpd appears to be moving toward adding experimental features +which are not generally required at the moment. Some of the experiments +will succeed while others will inevitably be dropped. The Apache philosophy is +to add what's needed as and when it is needed.

+
+Friendly interaction between Apache and NCSA developers should ensure
+that fundamental feature enhancements stay consistent between the two
+servers for the foreseeable future.

    + +


    + +
  6. OK, so how does Apache compare to other servers ? +

    +For an independent assessment, see http://www.webcompare.com/server-main.html +

    + +

    Apache has been shown to be substantially faster than many other +free servers. Although certain commercial servers have claimed to +surpass Apache's speed (it has not been demonstrated that any of these +"benchmarks" are a good way of measuring WWW server speed at any +rate), we feel that it is better to have a mostly-fast free server +than an extremely-fast server that costs thousands of dollars. Apache +is run on sites that get millions of hits per day, and they have +experienced no performance difficulties.

    + +
    +
  7. How thoroughly tested is Apache? + +

    Apache is run on over 100,000 Internet servers (as of July 1996). It has +been tested thoroughly by both developers and users. The Apache Group +maintains rigorous standards before releasing new versions of their +server, and our server runs without a hitch on over one third of all +WWW servers. When bugs do show up, we release patches and new +versions, as soon as they are available. + +

    See http://www.apache.org/info/apache_users.html for an incomplete list of sites running Apache.

    + +
    + +
  8. Does or will Apache act as a Proxy server? +

    Apache version 1.1
+and above will come with a proxy module. If compiled in, this will make
+Apache act as a caching proxy server.

    +


    + +
  9. What are the future plans for Apache ? +

      +
    • to continue as a public domain HTTP server, +
    • to keep up with advances in HTTP protocol and web developments in general +
    • to collect suggestions for fixes/improvements from its users, +
    • to respond to needs of large volume providers as well as occasional users. +
    +


    + +
  10. Who do I contact for support ? +

    There is no official support for Apache. None of the developers want to +be swamped by a flood of trivial questions that can be resolved elsewhere. +Bug reports and suggestions should be sent via the bug report page. +Other questions should be directed to +comp.infosystems.www.servers.unix, where some of the Apache team lurk, +in the company of many other httpd gurus who should be able +to help. +

    +Commercial support for Apache is, however, available from a number of
+third parties.

    +
    + +
  11. Is there any more information on Apache ? +

    Indeed there is. See http://www.apache.org/. +

    +
    + +
  12. Where can I get Apache ? 

    +You can find the source for Apache at http://www.apache.org/. +

    +
    +
+ +Home +Index + + diff --git a/docs/manual/misc/client_block_api.html b/docs/manual/misc/client_block_api.html new file mode 100644 index 0000000000..c70ee37a66 --- /dev/null +++ b/docs/manual/misc/client_block_api.html @@ -0,0 +1,70 @@ + + + +Reading Client Input in Apache 1.2 + + + + +

Reading Client Input in Apache 1.2

+ +
+ +

Apache 1.1 and earlier let modules handle POST and PUT requests by
+themselves. The module would, on its own, determine whether the
+request had an entity body, how many bytes it was, and then call a
+function (read_client_block) to get the data.

However, HTTP/1.1 requires several things of POST and PUT request
+handlers that did not fit into this model, and all existing modules
+have to be rewritten. The API calls for handling this have been
+further abstracted, so that future HTTP protocol changes can be
+accomplished while remaining backwards-compatible.

+ +
+ +

The New API Functions

+ +
+   int setup_client_block (request_rec *);
+   int should_client_block (request_rec *);
+   long get_client_block (request_rec *, char *buffer, int buffer_size);
+
+ +
    +
  1. Call setup_client_block() near the beginning of the request
+ handler. This will set up all the necessary properties, and
+ will return either OK, or an error code. If the latter,
+ the module should return that error code.
  2. When you are ready to possibly accept input, call
+ should_client_block().
+ This will tell the module whether or not to read input. If it is 0,
+ the module should assume that the input is of a non-entity type
+ (e.g. a GET request). A nonzero response indicates that the module
+ should proceed (to step 3).
+ This step also sends a 100 Continue response
+ to HTTP/1.1 clients, so it should not be called until the module
+ is definitely ready to read content (otherwise, the point of the
+ 100 response is defeated). Never call this function more than once.
  3. Finally, call get_client_block in a loop, as shown in the sketch
+ below. Pass it a buffer and its
+ size. It will put data into the buffer (not necessarily the full
+ buffer, in the case of chunked inputs), and return the length of
+ the input block. When it is done reading, it will return 0, and
+ the module should proceed.
+ +

As an example, please look at the code in +mod_cgi.c. This is properly written to the new API +guidelines.
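+
+For a bare-bones illustration of the calling sequence, following the
+prototypes listed above (the handler name and buffer size are
+arbitrary):
+
+    int my_post_handler (request_rec *r)
+    {
+        char buffer[8192];
+        long len;
+        int rc;
+
+        if ((rc = setup_client_block (r)) != OK)
+            return rc;
+
+        if (should_client_block (r)) {
+            while ((len = get_client_block (r, buffer, sizeof(buffer))) > 0) {
+                /* consume `len' bytes of request body from buffer */
+            }
+        }
+
+        /* ... generate the response ... */
+        return OK;
+    }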

+ +
+ +Home +Index + + + diff --git a/docs/manual/misc/compat_notes.html b/docs/manual/misc/compat_notes.html new file mode 100644 index 0000000000..efa641f8b7 --- /dev/null +++ b/docs/manual/misc/compat_notes.html @@ -0,0 +1,108 @@ + +Apache HTTP Server: Compatibility Notes with NCSA's Server + + + +

Compatibility Notes with NCSA's Server

+ +
+
+While Apache 0.8.x and beyond are for the most part a drop-in
+replacement for NCSA's httpd and earlier versions of Apache, there are
+a couple of gotchas to watch out for. These are mostly due to the fact
+that the parser for config and access control files was rewritten from
+scratch, so certain liberties the earlier servers took may not be
+available here. These are all easily fixable. If you know of other
+non-fatal problems that belong here, let us know.

Please also check the known bugs page. + + + +

    + +
  1. AddType only accepts one file extension per line, without +any dots (.) in the extension, and does not take full filenames. +If you need multiple extensions per type, use multiple lines, e.g. +
    +AddType application/foo foo
    +AddType application/foo bar +
    +To map .foo and .bar to application/foo +

    + + + +

  2. If you follow the NCSA guidelines for setting up access restrictions
+ based on client domain, you may well have added entries for
+ AuthType, AuthName, AuthUserFile or AuthGroupFile.
+ None of these are needed (or appropriate) for restricting access
+ based on client domain.

    When Apache sees AuthType it (reasonably) assumes you + are using some authorization type based on username and password. + +

    Please remove AuthType, it's unnecessary even for NCSA. + +

    + +

  3. AuthUserFile requires a full pathname. In earlier
+ versions of NCSA httpd and Apache, you could use a filename
+ relative to the .htaccess file. This could be a major security hole,
+ as it made it trivially easy to make a ".htpass" file in a
+ directory easily accessible by the world. We recommend you store
+ your passwords outside your document tree.

    + +

  4. OldScriptAlias is no longer supported. + +

    + +

  5. exec cgi="" produces a reasonable "malformed header" error
+ response when used to invoke non-CGI scripts.
    + The NCSA code ignores the missing header. (bad idea)
    + Solution: write CGI to the CGI spec or use exec cmd="" instead. +

    We might add virtual support to exec cmd to + make up for this difference. + +

    + +

  6. <Limit> silliness - in the old Apache 0.6.5, a
+ directive of <Limit GET> would also restrict POST methods - Apache 0.8.8's new
+ core is correct in not presuming that a limit on a GET is also a limit on a POST,
+ so if you are relying on that behavior you need to change your access configurations
+ to reflect that.

    + +

  7. Icons for FancyIndexing broken - well, no, they're not broken, we've just upgraded the + icons from flat .xbm files to pretty and much smaller .gif files, courtesy of +Kevin Hughes at +EIT. + If you are using the same srm.conf from an old distribution, make sure you add the new + AddIcon, AddIconByType, and DefaultIcon commands. + +

    + +

  8. Under IRIX, the "Group" directive in httpd.conf needs to be a valid group name
+ (e.g. "nogroup"), not the numeric group ID. The distribution httpd.conf, and earlier
+ ones, had the default Group be "#-1", which was causing silent exits at startup.

    + +

  9. .asis files: Apache 0.6.5 did not require a Status header; +it added one automatically if the .asis file contained a Location header. +0.8.14 requires a Status header.

    + +

+ +More to come when we notice them.... + + +
+ +Home +Index + + + diff --git a/docs/manual/misc/security_tips.html b/docs/manual/misc/security_tips.html new file mode 100644 index 0000000000..a805d8cbed --- /dev/null +++ b/docs/manual/misc/security_tips.html @@ -0,0 +1,92 @@ + + + +Apache HTTP Server Documentation + + + + +

Security tips for server configuration

+ +
+ +

Some hints and tips on security issues in setting up a web server. Some of
+the suggestions will be general, others specific to Apache.


+ +

Server Side Includes

+

Server side includes (SSI) can be configured so that users can execute +arbitrary programs on the server. That thought alone should send a shiver +down the spine of any sys-admin.

+ +One solution is to disable that part of SSI. To do that you use the +IncludesNOEXEC option to the Options +directive.

+ +


+ +

Non Script Aliased CGI

+

Allowing users to execute CGI scripts in any directory should only
+be considered if:

    +
  1. You trust your users not to write scripts which will deliberately or +accidentally expose your system to an attack. +
  2. You consider security at your site to be so feeble in other areas as to
+make one more potential hole irrelevant.
  3. You have no users, and nobody ever visits your server. +

+


+ +

Script Alias'ed CGI

+

Limiting CGI to special directories gives the admin control over +what goes into those directories. This is inevitably more secure than +non script aliased CGI, but only if users with write access to the +directories are trusted or the admin is willing to test each new CGI +script/program for potential security holes.

+ +Most sites choose this option over the non script aliased CGI approach.

+ +


+

CGI in general

+

Always remember that you must trust the writers of the CGI scripts/programs,
+or your ability to spot potential security holes in CGI, whether they were
+deliberate or accidental.

All the CGI scripts will run as the same user, so they have the potential to
+conflict (accidentally or deliberately) with other scripts, e.g. User A hates
+User B, so he writes a script to trash User B's CGI database.

+ +


+ +Please send any other useful security tips to +apache-bugs@mail.apache.org +

+


+ +

Stopping users overriding system wide settings...

+

To run a really tight ship, you'll want to stop users from setting +up .htaccess files which can override security features +you've configured. Here's one way to do it...

+ +In the server configuration file, put +

+<Directory />
+AllowOverride None
+Options None
+<Limit GET PUT POST>
+allow from all
+</Limit>
+</Directory>
+
+
+Then set up overrides for the specific directories where you want to
+allow them.
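+
+For instance, a purely illustrative fragment (the path and the
+particular override and option lists here are assumptions, not
+recommendations):
+
+<Directory /usr/local/etc/httpd/htdocs>
+AllowOverride AuthConfig Limit
+Options Indexes FollowSymLinks
+<Limit GET POST>
+order allow,deny
+allow from all
+</Limit>
+</Directory>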

+ +This stops all overrides, Includes and accesses in all directories apart +from those named.


+ +Home +Index + + + diff --git a/docs/manual/platform/perf-bsd44.html b/docs/manual/platform/perf-bsd44.html new file mode 100644 index 0000000000..1f3a6010c8 --- /dev/null +++ b/docs/manual/platform/perf-bsd44.html @@ -0,0 +1,215 @@ + + +Running a High-Performance Web Server for BSD + + + + + + +

Running a High-Performance Web Server for BSD

+
+As on other OS's, the listen queue is often the first limit hit. The
+following are comments from "Aaron Gifford <agifford@InfoWest.COM>"
+on how to fix this on BSDI 1.x, 2.x, and FreeBSD 2.0 (and earlier):

+ +Edit the following two files: +

/usr/include/sys/socket.h
+ /usr/src/sys/sys/socket.h
+In each file, look for the following: +
+    /*
+     * Maximum queue length specifiable by listen.
+     */
+    #define SOMAXCONN       5
+
+ +Just change the "5" to whatever appears to work. I bumped the two +machines I was having problems with up to 32 and haven't noticed the +problem since. + +

+
+After the edit, recompile the kernel and recompile the Apache server,
+then reboot.

+ +FreeBSD 2.1 seems to be perfectly happy, with SOMAXCONN +set to 32 already. + +

+ + +Addendum for very heavily loaded BSD servers
+
+from Chuck Murcko <chuck@telebase.com> + +

+ +If you're running a really busy BSD Apache server, the following are useful +things to do if the system is acting sluggish:

+ +

+
+These utilities give you an idea of what you'll need to tune in your kernel,
+and whether it'll help to buy more RAM.
+Here are some BSD kernel config parameters (actually BSDI, but pertinent to
+FreeBSD and other 4.4-lite derivatives) from a system getting heavy usage.
+The tools mentioned above were used, and the system memory was increased to
+48 MB before these tuneups. Other system parameters remained unchanged.

+ +

+maxusers        256
+
+ +Maxusers drives a lot of other kernel parameters: + + + +The actual formulae for these derived parameters are in +/usr/src/sys/conf/param.c. +These calculated parameters can also be overridden (in part) by specifying +your own values in the kernel configuration file: + +
+# Network options. NMBCLUSTERS defines the number of mbuf clusters and
+# defaults to 256. This machine is a server that handles lots of traffic,
+# so we crank that value.
+options         SOMAXCONN=256           # max pending connects
+options         NMBCLUSTERS=4096        # mbuf clusters at 4096
+
+#
+# Misc. options
+#
+options         CHILD_MAX=512           # maximum number of child processes
+options         OPEN_MAX=512            # maximum fds (breaks RPC svcs)
+
+
+SOMAXCONN is not derived from maxusers, so you'll always need to increase
+that yourself. We used a value guaranteed to be larger than Apache's
+default listen() backlog, which is currently 128.

+ +In many cases, NMBCLUSTERS must be set much larger than would appear +necessary at first glance. The reason for this is that if the browser +disconnects in mid-transfer, the socket fd associated with that particular +connection ends up in the TIME_WAIT state for several minutes, during +which time its mbufs are not yet freed. + +

+ +Some more info on mbuf clusters (from sys/mbuf.h): +

+/*
+ * Mbufs are of a single size, MSIZE (machine/machparam.h), which
+ * includes overhead.  An mbuf may add a single "mbuf cluster" of size
+ * MCLBYTES (also in machine/machparam.h), which has no additional overhead
+ * and is used instead of the internal data area; this is done when
+ * at least MINCLSIZE of data must be stored.
+ */
+
+ +

+ +CHILD_MAX and OPEN_MAX are set to allow up to 512 child processes (different +than the maximum value for processes per user ID) and file descriptors. +These values may change for your particular configuration (a higher OPEN_MAX +value if you've got modules or CGI scripts opening lots of connections or +files). If you've got a lot of other activity besides httpd on the same +machine, you'll have to set NPROC higher still. In this example, the NPROC +value derived from maxusers proved sufficient for our load. + +

+ +Caveats + +

+ +Be aware that your system may not boot with a kernel that is configured +to use more resources than you have available system RAM. ALWAYS +have a known bootable kernel available when tuning your system this way, +and use the system tools beforehand to learn if you need to buy more +memory before tuning. + +

+ +RPC services will fail when the value of OPEN_MAX is larger than 256. +This is a function of the original implementations of the RPC library, +which used a byte value for holding file descriptors. BSDI has partially +addressed this limit in its 2.1 release, but a real fix may well await +the redesign of RPC itself. + +

+ +Finally, there's the hard limit of child processes configured in Apache. + +

+ +For versions of Apache later than 1.0.5 you'll need to change the +definition for HARD_SERVER_LIMIT in httpd.h and recompile +if you need to run more than the default 150 instances of httpd. + +

+ +From conf/httpd.conf-dist: + +

+# Limit on total number of servers running, i.e., limit on the number
+# of clients who can simultaneously connect --- if this limit is ever
+# reached, clients will be LOCKED OUT, so it should NOT BE SET TOO LOW.
+# It is intended mainly as a brake to keep a runaway server from taking
+# Unix with it as it spirals down...
+
+MaxClients 150
+
+ +Know what you're doing if you bump this value up, and make sure you've +done your system monitoring, RAM expansion, and kernel tuning beforehand. +Then you're ready to service some serious hits! + +

+ +Thanks to Tony Sanders and Chris Torek at BSDI for their +helpful suggestions and information. + +


+ +

More welcome!

+ +If you have tips to contribute, send mail to brian@organic.com + +


+Home +Index + + diff --git a/docs/manual/platform/perf-dec.html b/docs/manual/platform/perf-dec.html new file mode 100644 index 0000000000..cd027bfc60 --- /dev/null +++ b/docs/manual/platform/perf-dec.html @@ -0,0 +1,267 @@ + +Performance Tuning Tips for Digital Unix + + +

Performance Tuning Tips for Digital Unix

+ +Below is a set of newsgroup posts made by an engineer from DEC in +response to queries about how to modify DEC's Digital Unix OS for more +heavily loaded web sites. Copied with permission. + +
+ +

Update

+From: Jeffrey Mogul
+Date: Fri, 28 Jun 96 16:07:56 MDT
+ +
    +
  1. The advice given in the README file regarding the + "tcbhashsize" variable is incorrect. The largest value + this should be set to is 1024. Setting it any higher + will have the perverse result of disabling the hashing + mechanism. + +
  2. Patch ID OSF350-146 has been superseded by +
    + Patch ID OSF350-195 for V3.2C
    + Patch ID OSF360-350195 for V3.2D +
    + Patch IDs for V3.2E and V3.2F should be available soon. + There is no known reason why the Patch ID OSF360-350195 + won't work on these releases, but such use is not officially + supported by Digital. This patch kit will not be needed for + V3.2G when it is released. + + +
    + + +
    +From           mogul@pa.dec.com (Jeffrey Mogul)
    +Organization   DEC Western Research
    +Date           30 May 1996 00:50:25 GMT
    +Newsgroups     comp.unix.osf.osf1
    +Message-ID     <4oirch$bc8@usenet.pa.dec.com>
    +Subject        Re: Web Site Performance
    +References     1
    +
    +
    +
    +In article <skoogDs54BH.9pF@netcom.com> skoog@netcom.com (Jim Skoog) writes:
    +>Where are the performance bottlenecks for Alpha AXP running the
    +>Netscape Commerce Server 1.12 with high volume internet traffic?
    +>We are evaluating network performance for a variety of Alpha AXP
    +>runing DEC UNIX 3.2C, which run DEC's seal firewall and behind
    +>that Alpha 1000 and 2100 webservers.
    +
    +Our experience (running such Web servers as altavista.digital.com
    +and www.digital.com) is that there is one important kernel tuning
    +knob to adjust in order to get good performance on V3.2C.  You
    +need to patch the kernel global variable "somaxconn" (use dbx -k
    +to do this) from its default value of 8 to something much larger.
    +
    +How much larger?  Well, no larger than 32767 (decimal).  And
    +probably no less than about 2048, if you have a really high volume
    +(millions of hits per day), like AltaVista does.
    +
    +This change allows the system to maintain more than 8 TCP
    +connections in the SYN_RCVD state for the HTTP server.  (You
    +can use "netstat -An |grep SYN_RCVD" to see how many such
    +connections exist at any given instant).
    +
    +If you don't make this change, you might find that as the load gets
    +high, some connection attempts take a very long time.  And if a lot
    +of your clients disconnect from the Internet during the process of
    +TCP connection establishment (this happens a lot with dialup
    +users), these "embryonic" connections might tie up your somaxconn
    +quota of SYN_RCVD-state connections.  Until the kernel times out
    +these embryonic connections, no other connections will be accepted,
    +and it will appear as if the server has died.
    +
    +The default value for somaxconn in Digital UNIX V4.0 will be quite
    +a bit larger than it has been in previous versions (we inherited
    +this default from 4.3BSD).
    +
    +Digital UNIX V4.0 includes some other performance-related changes
    +that significantly improve its maximum HTTP connection rate.  However,
    +we've been using V3.2C systems to front-end for altavista.digital.com
    +with no obvious performance bottlenecks at the millions-of-hits-per-day
    +level.
    +
    +We have some Webstone performance results available at
    +        http://www.digital.com/info/alphaserver/news/webff.html
    +I'm not sure if these were done using V4.0 or an earlier version
    +of Digital UNIX, although I suspect they were done using a test
    +version of V4.0.
    +
    +-Jeff
    +
    +
    + +---------------------------------------------------------------------------- + +From mogul@pa.dec.com (Jeffrey Mogul) +Organization DEC Western Research +Date 31 May 1996 21:01:01 GMT +Newsgroups comp.unix.osf.osf1 +Message-ID <4onmmd$mmd@usenet.pa.dec.com> +Subject Digital UNIX V3.2C Internet tuning patch info + +---------------------------------------------------------------------------- + +Something that probably few people are aware of is that Digital +has a patch kit available for Digital UNIX V3.2C that may improve +Internet performance, especially for busy web servers. + +This patch kit is one way to increase the value of somaxconn, +which I discussed in a message here a day or two ago. + +I've included in this message the revised README file for this +patch kit below. Note that the original README file in the patch +kit itself may be an earlier version; I'm told that the version +below is the right one. + +Sorry, this patch kit is NOT available for other versions of Digital +UNIX. Most (but not quite all) of these changes also made it into V4.0, +so the description of the various tuning parameters in this README +file might be useful to people running V4.0 systems. + +This patch kit does not appear to be available (yet?) from + http://www.service.digital.com/html/patch_service.html +so I guess you'll have to call Digital's Customer Support to get it. + +-Jeff + +DESCRIPTION: Digital UNIX Network tuning patch + + Patch ID: OSF350-146 + + SUPERSEDED PATCHES: OSF350-151, OSF350-158 + + This set of files improves the performance of the network + subsystem on a system being used as a web server. There are + additional tunable parameters included here, to be used + cautiously by an informed system administrator. + +TUNING + + To tune the web server, the number of simultaneous socket + connection requests are limited by: + + somaxconn Sets the maximum number of pending requests + allowed to wait on a listening socket. The + default value in Digital UNIX V3.2 is 8. + This patch kit increases the default to 1024, + which matches the value in Digital UNIX V4.0. + + sominconn Sets the minimum number of pending connections + allowed on a listening socket. When a user + process calls listen with a backlog less + than sominconn, the backlog will be set to + sominconn. sominconn overrides somaxconn. + The default value is 1. + + The effectiveness of tuning these parameters can be monitored by + the sobacklog variables available in the kernel: + + sobacklog_hiwat Tracks the maximum pending requests to any + socket. The initial value is 0. + + sobacklog_drops Tracks the number of drops exceeding the + socket set backlog limit. The initial + value is 0. + + somaxconn_drops Tracks the number of drops exceeding the + somaxconn limit. When sominconn is larger + than somaxconn, tracks the number of drops + exceeding sominconn. The initial value is 0. + + TCP timer parameters also affect performance. Tuning the following + require some knowledge of the characteristics of the network. + + tcp_msl Sets the tcp maximum segment lifetime. + This is the maximum lifetime in half + seconds that a packet can be in transit + on the network. This value, when doubled, + is the length of time a connection remains + in the TIME_WAIT state after a incoming + close request is processed. The unit is + specified in 1/2 seconds, the initial + value is 60. + + tcp_rexmit_interval_min + Sets the minimum TCP retransmit interval. 
+ For some WAN networks the default value may + be too short, causing unnecessary duplicate + packets to be sent. The unit is specified + in 1/2 seconds, the initial value is 1. + + tcp_keepinit This is the amount of time a partially + established connection will sit on the listen + queue before timing out (e.g. if a client + sends a SYN but never answers our SYN/ACK). + Partially established connections tie up slots + on the listen queue. If the queue starts to + fill with connections in SYN_RCVD state, + tcp_keepinit can be decreased to make those + partial connects time out sooner. This should + be used with caution, since there might be + legitimate clients that are taking a while + to respond to SYN/ACK. The unit is specified + in 1/2 seconds, the default value is 150 + (ie. 75 seconds). + + The hashlist size for the TCP inpcb lookup table is regulated by: + + tcbhashsize The number of hash buckets used for the + TCP connection table used in the kernel. + The initial value is 32. For best results, + should be specified as a power of 2. For + busy Web servers, set this to 2048 or more. + + The hashlist size for the interface alias table is regulated by: + + inifaddr_hsize The number of hash buckets used for the + interface alias table used in the kernel. + The initial value is 32. For best results, + should be specified as a power of 2. + + ipport_userreserved The maximum number of concurrent non-reserved, + dynamically allocated ports. Default range + is 1025-5000. The maximum value is 65535. + This limits the numer of times you can + simultaneously telnet or ftp out to connect + to other systems. + + tcpnodelack Don't delay acknowledging TCP data; this + can sometimes improve performance of locally + run CAD packages. Default is value is 0, + the enabled value is 1. + + Digital UNIX version: + + V3.2C +Feature V3.2C patch V4.0 + ======= ===== ===== ==== +somaxconn X X X +sominconn - X X +sobacklog_hiwat - X - +sobacklog_drops - X - +somaxconn_drops - X - +tcpnodelack X X X +tcp_keepidle X X X +tcp_keepintvl X X X +tcp_keepcnt - X X +tcp_keepinit - X X +TCP keepalive per-socket - - X +tcp_msl - X - +tcp_rexmit_interval_min - X - +TCP inpcb hashing - X X +tcbhashsize - X X +interface alias hashing - X X +inifaddr_hsize - X X +ipport_userreserved - X - +sysconfig -q inet - - X +sysconfig -q socket - - X + +
    diff --git a/docs/manual/platform/perf.html b/docs/manual/platform/perf.html new file mode 100644 index 0000000000..d2a88e23b3 --- /dev/null +++ b/docs/manual/platform/perf.html @@ -0,0 +1,134 @@ + + + +Hints on Running a High-Performance Web Server + + + + +

    Hints on Running a High-Performance Web Server

    + +Running Apache on a heavily loaded web server, one often encounters +problems related to the machine and OS configuration. "Heavy" is +relative, of course - but if you are seeing more than a couple hits +per second on a sustained basis you should consult the pointers on +this page. In general the suggestions involve how to tune your kernel +for the heavier TCP load, hardware/software conflicts that arise, etc. + + + +
    + + +

    A/UX (Apple's UNIX)

    +
    + +If you are running Apache on A/UX, a page that gives some helpful +performance hints (concerning the listen() queue and using +virtual hosts) +can be found here + +


    + + +

    BSD-based (BSDI, FreeBSD, etc)

    +
    + +Quick and +detailed +performance tuning hints for BSD-derived systems. + +


    + + +

    Digital UNIX

    +
    + +We have some newsgroup postings on how to +tune Digital UNIX 3.2 and 4.0. + +


    + + +

    Hewlett-Packard

    +
    + +Some documentation on tuning HP machines can be found at http://www.software.hp.com/internet/perf/tuning.html. + +


    + + +

    Linux

    +
    + +The most common problem on Linux shows up on heavily-loaded systems +where the whole server will appear to freeze for a couple of minutes +at a time, and then come back to life. This has been traced to a +listen() queue overload - certain Linux implementations have a low +value set for the incoming connection queue which can cause problems. +Please see our Using Apache on +Linux page for more info on how to fix this. + +


    + + +

    SGI

    + +
    + +


    + + +

    Solaris 2.4

    +
    + +The Solaris 2.4 TCP implementation has a few inherent limitations that +only became apparent under heavy loads. This has been fixed to some +extent in 2.5 (and completely revamped in 2.6), but for now consult +the following URL for tips on how to expand the capabilities if you +are finding slowdowns and lags are hurting performance. + + + +


    + + +

    SunOS 4.x

    +
    + +More information on tuning SOMAXCONN on SunOS can be found at + +http://www.islandnet.com/~mark/somaxconn.html. + +


    + +

    More welcome!

    + +If you have tips to contribute, send mail to brian@organic.com + +


    +Home +Index + + -- cgit v1.2.1