Diffstat (limited to 'libio/dbz/dbz.3z')
-rw-r--r-- | libio/dbz/dbz.3z | 547 |
1 files changed, 547 insertions, 0 deletions
diff --git a/libio/dbz/dbz.3z b/libio/dbz/dbz.3z
new file mode 100644
index 00000000000..6df25311c70
--- /dev/null
+++ b/libio/dbz/dbz.3z
@@ -0,0 +1,547 @@
+.TH DBZ 3Z "3 Feb 1991"
+.BY "C News"
+.SH NAME
+dbminit, fetch, store, dbmclose \- somewhat dbm-compatible database routines
+.br
+dbzfresh, dbzagain, dbzfetch, dbzstore \- database routines
+.br
+dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug \- database routines
+.SH SYNOPSIS
+.nf
+.B #include <dbz.h>
+.PP
+.B dbminit(base)
+.B char *base;
+.PP
+.B datum
+.B fetch(key)
+.B datum key;
+.PP
+.B store(key, value)
+.B datum key;
+.B datum value;
+.PP
+.B dbmclose()
+.PP
+.B dbzfresh(base, size, fieldsep, cmap, tagmask)
+.B char *base;
+.B long size;
+.B int fieldsep;
+.B int cmap;
+.B long tagmask;
+.PP
+.B dbzagain(base, oldbase)
+.B char *base;
+.B char *oldbase;
+.PP
+.B datum
+.B dbzfetch(key)
+.B datum key;
+.PP
+.B dbzstore(key, value)
+.B datum key;
+.B datum value;
+.PP
+.B dbzsync()
+.PP
+.B long
+.B dbzsize(nentries)
+.B long nentries;
+.PP
+.B dbzincore(newvalue)
+.PP
+.B dbzcancel()
+.PP
+.B dbzdebug(newvalue)
+.SH DESCRIPTION
+These functions provide an indexing system for rapid random access to a
+text file (the
+.I base
+.IR file ).
+Subject to certain constraints, they are call-compatible with
+.IR dbm (3),
+although they also provide some extensions.
+(Note that they are
+.I not
+file-compatible with
+.I dbm
+or any variant thereof.)
+.PP
+In principle,
+.I dbz
+stores key-value pairs, where both key and value are arbitrary sequences
+of bytes, specified to the functions by
+values of type
+.IR datum ,
+typedefed in the header file to be a structure with members
+.I dptr
+(a value of type
+.I char *
+pointing to the bytes)
+and
+.I dsize
+(a value of type
+.I int
+indicating how long the byte sequence is).
+.PP
+In practice,
+.I dbz
+is more restricted than
+.IR dbm .
+A
+.I dbz
+database
+must be an index into a base file,
+with the database
+.IR value s
+being
+.IR fseek (3)
+offsets into the base file.
+Each such
+.I value
+must ``point to'' a place in the base file where the corresponding
+.I key
+sequence is found.
+A key can be no longer than
+.SM DBZMAXKEY
+(a constant defined in the header file) bytes.
+No key can be an initial subsequence of another,
+which in most applications requires that keys be
+either bracketed or terminated in some way (see the
+discussion of the
+.I fieldsep
+parameter of
+.IR dbzfresh ,
+below,
+for a fine point on terminators).
+.PP
+.I Dbminit
+opens a database,
+an index into the base file
+.IR base ,
+consisting of files
+.IB base .dir
+and
+.IB base .pag
+which must already exist.
+(If the database is new, they should be zero-length files.)
+Subsequent accesses go to that database until
+.I dbmclose
+is called to close the database.
+The base file need not exist at the time of the
+.IR dbminit ,
+but it must exist before accesses are attempted.
+.PP
+.I Fetch
+searches the database for the specified
+.IR key ,
+returning the corresponding
+.IR value
+if any.
+.I Store
+stores the
+.IR key - value
+pair in the database.
+.I Store
+will fail unless the database files are writeable.
+See below for a complication arising from case mapping.
+.PP
+.I Dbzfresh
+is a variant of
+.I dbminit
+for creating a new database with more control over details.
+Unlike for
+.IR dbminit ,
+the database files need not exist:
+they will be created if necessary,
+and truncated in any case.
+.PP
+.IR Dbzfresh 's
+.I size
+parameter specifies the size of the first hash table within the database,
+in key-value pairs.
+Performance will be best if
+.I size
+is a prime number and
+the number of key-value pairs stored in the database does not exceed
+about 2/3 of
+.IR size .
+(The
+.I dbzsize
+function, given the expected number of key-value pairs,
+will suggest a database size that meets these criteria.)
+Assuming that an
+.I fseek
+offset is 4 bytes,
+the
+.B .pag
+file will be
+.RI 4* size
+bytes
+(the
+.B .dir
+file is tiny and roughly constant in size)
+until
+the number of key-value pairs exceeds about 80% of
+.IR size .
+(Nothing awful will happen if the database grows beyond 100% of
+.IR size ,
+but accesses will slow down somewhat and the
+.B .pag
+file will grow somewhat.)
+.PP
+.IR Dbzfresh 's
+.I fieldsep
+parameter specifies the field separator in the base file.
+If this is not
+NUL (0), and the last character of a
+.I key
+argument is NUL, that NUL compares equal to either a NUL or a
+.I fieldsep
+in the base file.
+This permits use of NUL to terminate key strings without requiring that
+NULs appear in the base file.
+The
+.I fieldsep
+of a database created with
+.I dbminit
+is the horizontal-tab character.
+.PP
+For use in news systems, various forms of case mapping (e.g. uppercase to
+lowercase) in keys are available.
+The
+.I cmap
+parameter to
+.I dbzfresh
+is a single character specifying which of several mapping algorithms to use.
+Available algorithms are:
+.RS
+.TP
+.B 0
+case-sensitive: no case mapping
+.TP
+.B B
+same as
+.B 0
+.TP
+.B NUL
+same as
+.B 0
+.TP
+.B =
+case-insensitive: uppercase and lowercase equivalent
+.TP
+.B b
+same as
+.B =
+.TP
+.B C
+RFC822 message-ID rules, case-sensitive before `@' (with certain exceptions)
+and case-insensitive after
+.TP
+.B ?
+whatever the local default is, normally
+.B C
+.RE
+.PP
+Mapping algorithm
+.B 0
+(no mapping) is faster than the others and is overwhelmingly the correct
+choice for most applications.
+Unless compatibility constraints interfere, it is more efficient to pre-map
+the keys, storing mapped keys in the base file, than to have
+.I dbz
+do the mapping on every search.
+.PP
+For historical reasons,
+.I fetch
+and
+.I store
+expect their
+.I key
+arguments to be pre-mapped, but expect unmapped keys in the base file.
+.I Dbzfetch
+and
+.I dbzstore
+do the same jobs but handle all case mapping internally,
+so the customer need not worry about it.
+.PP
+.I Dbz
+stores only the database
+.IR value s
+in its files, relying on reference to the base file to confirm a hit on a key.
+References to the base file can be minimized, greatly speeding up searches,
+if a little bit of information about the keys can be stored in the
+.I dbz
+files.
+This is ``free'' if there are some unused bits in an
+.I fseek
+offset,
+so that the offset can be
+.I tagged
+with some information about the key.
+The
+.I tagmask
+parameter of
+.I dbzfresh
+allows specifying the location of unused bits.
+.I Tagmask
+should be a mask with
+one group of
+contiguous
+.B 1
+bits.
+The bits in the mask should
+be unused (0) in
+.I most
+offsets.
+The bit immediately above the mask (the
+.I flag
+bit) should be unused (0) in
+.I all
+offsets;
+.I (dbz)store
+will reject attempts to store a key-value pair in which the
+.I value
+has the flag bit on.
+Apart from this restriction, tagging is invisible to the user.
+As a special case, a
+.I tagmask
+of 1 means ``no tagging'', for use with enormous base files or
+on systems with unusual offset representations.
+.PP
+A
+.I size
+of 0
+given to
+.I dbzfresh
+is synonymous with the local default;
+the normal default is suitable for tables of 90-100,000
+key-value pairs.
+A
+.I cmap
+of 0 (NUL) is synonymous with the character
+.BR 0 ,
+signifying no case mapping
+(note that the character
+.B ?
+specifies the local default mapping,
+normally
+.BR C ).
+A
+.I tagmask
+of 0 is synonymous with the local default tag mask,
+normally 0x7f000000 (specifying the top bit in a 32-bit offset
+as the flag bit, and the next 7 bits as the mask,
+which is suitable for base files up to circa 24MB).
+Calling
+.I dbminit(name)
+with the database files empty is equivalent to calling
+.IR dbzfresh(name,0,'\et','?',0) .
+.PP
+When databases are regenerated periodically, as in news,
+it is simplest to pick the parameters for a new database based on the old one.
+This also permits some memory of past sizes of the old database, so that
+a new database size can be chosen to cover expected fluctuations.
+.I Dbzagain
+is a variant of
+.I dbminit
+for creating a new database as a new generation of an old database.
+The database files for
+.I oldbase
+must exist.
+.I Dbzagain
+is equivalent to calling
+.I dbzfresh
+with the same field separator, case mapping, and tag mask as the old database,
+and a
+.I size
+equal to the result of applying
+.I dbzsize
+to the largest number of entries in the
+.I oldbase
+database and its previous 10 generations.
+.PP
+When many accesses are being done by the same program,
+.I dbz
+is massively faster if its first hash table is in memory.
+If an internal flag is 1,
+an attempt is made to read the table in when
+the database is opened, and
+.I dbmclose
+writes it out to disk again (if it was read successfully and
+has been modified).
+.I Dbzincore
+sets the flag to
+.I newvalue
+(which should be 0 or 1)
+and returns the previous value;
+this does not affect the status of a database that has already been opened.
+The default is 0.
+The attempt to read the table in may fail due to memory shortage;
+in this case
+.I dbz
+quietly falls back on its default behavior.
+.IR Store s
+to an in-memory database are not (in general) written out to the file
+until
+.IR dbmclose
+or
+.IR dbzsync ,
+so if robustness in the presence of crashes
+or concurrent accesses
+is crucial, in-memory databases
+should probably be avoided.
+.PP
+.I Dbzsync
+causes all buffers etc. to be flushed out to the files.
+It is typically used as a precaution against crashes or concurrent accesses
+when a
+.IR dbz -using
+process will be running for a long time.
+It is a somewhat expensive operation,
+especially
+for an in-memory database.
+.PP
+.I Dbzcancel
+cancels any pending writes from buffers.
+This is typically useful only for in-core databases, since writes are
+otherwise done immediately.
+Its main purpose is to let a child process, in the wake of a
+.IR fork ,
+do a
+.I dbmclose
+without writing its parent's data to disk.
+.PP
+If
+.I dbz
+has been compiled with debugging facilities available (which makes it
+bigger and a bit slower),
+.I dbzdebug
+alters the value (and returns the previous value) of an internal flag
+which (when 1; default is 0) causes
+verbose and cryptic debugging output on standard output.
+.PP
+Concurrent reading of databases is fairly safe,
+but there is no (inter)locking,
+so concurrent updating is not.
+.PP
+The database files include a record of the byte order of the processor
+creating the database, and accesses by processors with different byte
+order will work, although they will be slightly slower.
+Byte order is preserved by
+.IR dbzagain .
+However,
+agreement on the size and internal structure of an
+.I fseek
+offset is necessary, as is consensus on
+the character set.
+.PP
+An open database occupies three
+.I stdio
+streams and their corresponding file descriptors;
+a fourth is needed for an in-memory database.
+Memory consumption is negligible (except for
+.I stdio
+buffers) except for in-memory databases.
+.SH SEE ALSO
+dbz(1), dbm(3)
+.SH DIAGNOSTICS
+Functions returning
+.I int
+values return 0 for success, \-1 for failure.
+Functions returning
+.I datum
+values return a value with
+.I dptr
+set to NULL for failure.
+.I Dbminit
+attempts to have
+.I errno
+set plausibly on return, but otherwise this is not guaranteed.
+An
+.I errno
+of
+.B EDOM
+from
+.I dbminit
+indicates that the database did not appear to be in
+.I dbz
+format.
+.SH HISTORY
+The original
+.I dbz
+was written by
+Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us).
+Later contributions by David Butler and Mark Moraes.
+Extensive reworking,
+including this documentation,
+by Henry Spencer (henry@zoo.toronto.edu) as
+part of the C News project.
+Hashing function by Peter Honeyman.
+.SH BUGS
+The
+.I dptr
+members of returned
+.I datum
+values point to static storage which is overwritten by later calls.
+.PP
+Unlike
+.IR dbm ,
+.I dbz
+will misbehave if an existing key-value pair is `overwritten' by
+a new
+.I (dbz)store
+with the same key.
+The user is responsible for avoiding this by using
+.I (dbz)fetch
+first to check for duplicates;
+an internal optimization remembers the result of the
+first search so there is minimal overhead in this.
+.PP
+Waiting until after
+.I dbminit
+to bring the base file into existence
+will fail if
+.IR chdir (2)
+has been used meanwhile.
+.PP
+The RFC822 case mapper implements only a first approximation to the
+hideously-complex RFC822 case rules.
+.PP
+The prime finder in
+.I dbzsize
+is not particularly quick.
+.PP
+Should implement the
+.I dbm
+functions
+.IR delete ,
+.IR firstkey ,
+and
+.IR nextkey .
+.PP
+On C implementations which trap integer overflow,
+.I dbz
+will refuse to
+.I (dbz)store
+an
+.I fseek
+offset equal to the greatest
+representable
+positive number,
+as this would cause overflow in the biased representation used.
+.PP
+.I Dbzagain
+perhaps ought to notice when many offsets
+in the old database were
+too big for
+tagging, and shrink the tag mask to match.
+.PP
+Marking
+.IR dbz 's
+file descriptors
+.RI close-on- exec
+would be a better approach to the problem
+.I dbzcancel
+tries to address, but that's harder to do portably.
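
Note on the tagmask rules in the man page above: the relationship between the mask, the flag bit, and the offsets that can be stored is easier to see with the bit arithmetic written out. The following is a minimal sketch, not code from dbz; the helper names (tag_low_bit, tag_flag_bit, offset_storable, offset_taggable) are hypothetical and only restate what the page says about a mask made of one contiguous group of 1 bits. It does not model the special tagmask value 1 ("no tagging").

```c
#include <assert.h>

/* Illustrative helpers only; none of these names exist in dbz. */

/* Lowest 1 bit of the mask: offsets below this value leave every tag bit 0. */
unsigned long tag_low_bit(unsigned long tagmask)
{
    assert(tagmask != 0);
    return tagmask & (~tagmask + 1UL);
}

/* Flag bit: the bit immediately above the highest 1 bit of the mask. */
unsigned long tag_flag_bit(unsigned long tagmask)
{
    unsigned long top = tagmask;

    assert(tagmask != 0);
    while (top & (top - 1UL))   /* clear low 1 bits until only the highest remains */
        top &= top - 1UL;
    return top << 1;
}

/* Nonzero if the flag bit is clear in the offset, the condition the page
   says (dbz)store checks before accepting a value. */
int offset_storable(unsigned long tagmask, unsigned long offset)
{
    return (offset & tag_flag_bit(tagmask)) == 0;
}

/* Nonzero if the offset also leaves all mask bits free to hold a tag. */
int offset_taggable(unsigned long tagmask, unsigned long offset)
{
    return (offset & (tagmask | tag_flag_bit(tagmask))) == 0;
}
```

With the documented default mask 0x7f000000, tag_flag_bit gives 0x80000000 (the top bit of a 32-bit offset) and tag_low_bit gives 0x01000000, so offsets below 16 MiB never touch the tag bits. The page's larger "circa 24MB" figure is presumably workable because the mask bits only need to be unused in most offsets, not all of them.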