summaryrefslogtreecommitdiff
path: root/libio/dbz/dbz.3z
diff options
context:
space:
mode:
Diffstat (limited to 'libio/dbz/dbz.3z')
-rw-r--r--libio/dbz/dbz.3z547
1 files changed, 547 insertions, 0 deletions
diff --git a/libio/dbz/dbz.3z b/libio/dbz/dbz.3z
new file mode 100644
index 00000000000..6df25311c70
--- /dev/null
+++ b/libio/dbz/dbz.3z
@@ -0,0 +1,547 @@
+.TH DBZ 3Z "3 Feb 1991"
+.BY "C News"
+.SH NAME
+dbminit, fetch, store, dbmclose \- somewhat dbm-compatible database routines
+.br
+dbzfresh, dbzagain, dbzfetch, dbzstore \- database routines
+.br
+dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug \- database routines
+.SH SYNOPSIS
+.nf
+.B #include <dbz.h>
+.PP
+.B dbminit(base)
+.B char *base;
+.PP
+.B datum
+.B fetch(key)
+.B datum key;
+.PP
+.B store(key, value)
+.B datum key;
+.B datum value;
+.PP
+.B dbmclose()
+.PP
+.B dbzfresh(base, size, fieldsep, cmap, tagmask)
+.B char *base;
+.B long size;
+.B int fieldsep;
+.B int cmap;
+.B long tagmask;
+.PP
+.B dbzagain(base, oldbase)
+.B char *base;
+.B char *oldbase;
+.PP
+.B datum
+.B dbzfetch(key)
+.B datum key;
+.PP
+.B dbzstore(key, value)
+.B datum key;
+.B datum value;
+.PP
+.B dbzsync()
+.PP
+.B long
+.B dbzsize(nentries)
+.B long nentries;
+.PP
+.B dbzincore(newvalue)
+.PP
+.B dbzcancel()
+.PP
+.B dbzdebug(newvalue)
+.SH DESCRIPTION
+These functions provide an indexing system for rapid random access to a
+text file (the
+.I base
+.IR file ).
+Subject to certain constraints, they are call-compatible with
+.IR dbm (3),
+although they also provide some extensions.
+(Note that they are
+.I not
+file-compatible with
+.I dbm
+or any variant thereof.)
+.PP
+In principle,
+.I dbz
+stores key-value pairs, where both key and value are arbitrary sequences
+of bytes, specified to the functions by
+values of type
+.IR datum ,
+typedefed in the header file to be a structure with members
+.I dptr
+(a value of type
+.I char *
+pointing to the bytes)
+and
+.I dsize
+(a value of type
+.I int
+indicating how long the byte sequence is).
+.PP
+In practice,
+.I dbz
+is more restricted than
+.IR dbm .
+A
+.I dbz
+database
+must be an index into a base file,
+with the database
+.IR value s
+being
+.IR fseek (3)
+offsets into the base file.
+Each such
+.I value
+must ``point to'' a place in the base file where the corresponding
+.I key
+sequence is found.
+A key can be no longer than
+.SM DBZMAXKEY
+(a constant defined in the header file) bytes.
+No key can be an initial subsequence of another,
+which in most applications requires that keys be
+either bracketed or terminated in some way (see the
+discussion of the
+.I fieldsep
+parameter of
+.IR dbzfresh ,
+below,
+for a fine point on terminators).
+.PP
+.I Dbminit
+opens a database,
+an index into the base file
+.IR base ,
+consisting of files
+.IB base .dir
+and
+.IB base .pag
+which must already exist.
+(If the database is new, they should be zero-length files.)
+Subsequent accesses go to that database until
+.I dbmclose
+is called to close the database.
+The base file need not exist at the time of the
+.IR dbminit ,
+but it must exist before accesses are attempted.
+.PP
+.I Fetch
+searches the database for the specified
+.IR key ,
+returning the corresponding
+.IR value
+if any.
+.I Store
+stores the
+.IR key - value
+pair in the database.
+.I Store
+will fail unless the database files are writeable.
+See below for a complication arising from case mapping.
+.PP
+.I Dbzfresh
+is a variant of
+.I dbminit
+for creating a new database with more control over details.
+Unlike for
+.IR dbminit ,
+the database files need not exist:
+they will be created if necessary,
+and truncated in any case.
+.PP
+.IR Dbzfresh 's
+.I size
+parameter specifies the size of the first hash table within the database,
+in key-value pairs.
+Performance will be best if
+.I size
+is a prime number and
+the number of key-value pairs stored in the database does not exceed
+about 2/3 of
+.IR size .
+(The
+.I dbzsize
+function, given the expected number of key-value pairs,
+will suggest a database size that meets these criteria.)
+Assuming that an
+.I fseek
+offset is 4 bytes,
+the
+.B .pag
+file will be
+.RI 4* size
+bytes
+(the
+.B .dir
+file is tiny and roughly constant in size)
+until
+the number of key-value pairs exceeds about 80% of
+.IR size .
+(Nothing awful will happen if the database grows beyond 100% of
+.IR size ,
+but accesses will slow down somewhat and the
+.B .pag
+file will grow somewhat.)
+.PP
+.IR Dbzfresh 's
+.I fieldsep
+parameter specifies the field separator in the base file.
+If this is not
+NUL (0), and the last character of a
+.I key
+argument is NUL, that NUL compares equal to either a NUL or a
+.I fieldsep
+in the base file.
+This permits use of NUL to terminate key strings without requiring that
+NULs appear in the base file.
+The
+.I fieldsep
+of a database created with
+.I dbminit
+is the horizontal-tab character.
+.PP
+For use in news systems, various forms of case mapping (e.g. uppercase to
+lowercase) in keys are available.
+The
+.I cmap
+parameter to
+.I dbzfresh
+is a single character specifying which of several mapping algorithms to use.
+Available algorithms are:
+.RS
+.TP
+.B 0
+case-sensitive: no case mapping
+.TP
+.B B
+same as
+.B 0
+.TP
+.B NUL
+same as
+.B 0
+.TP
+.B =
+case-insensitive: uppercase and lowercase equivalent
+.TP
+.B b
+same as
+.B =
+.TP
+.B C
+RFC822 message-ID rules, case-sensitive before `@' (with certain exceptions)
+and case-insensitive after
+.TP
+.B ?
+whatever the local default is, normally
+.B C
+.RE
+.PP
+Mapping algorithm
+.B 0
+(no mapping) is faster than the others and is overwhelmingly the correct
+choice for most applications.
+Unless compatibility constraints interfere, it is more efficient to pre-map
+the keys, storing mapped keys in the base file, than to have
+.I dbz
+do the mapping on every search.
+.PP
+For historical reasons,
+.I fetch
+and
+.I store
+expect their
+.I key
+arguments to be pre-mapped, but expect unmapped keys in the base file.
+.I Dbzfetch
+and
+.I dbzstore
+do the same jobs but handle all case mapping internally,
+so the customer need not worry about it.
+.PP
+.I Dbz
+stores only the database
+.IR value s
+in its files, relying on reference to the base file to confirm a hit on a key.
+References to the base file can be minimized, greatly speeding up searches,
+if a little bit of information about the keys can be stored in the
+.I dbz
+files.
+This is ``free'' if there are some unused bits in an
+.I fseek
+offset,
+so that the offset can be
+.I tagged
+with some information about the key.
+The
+.I tagmask
+parameter of
+.I dbzfresh
+allows specifying the location of unused bits.
+.I Tagmask
+should be a mask with
+one group of
+contiguous
+.B 1
+bits.
+The bits in the mask should
+be unused (0) in
+.I most
+offsets.
+The bit immediately above the mask (the
+.I flag
+bit) should be unused (0) in
+.I all
+offsets;
+.I (dbz)store
+will reject attempts to store a key-value pair in which the
+.I value
+has the flag bit on.
+Apart from this restriction, tagging is invisible to the user.
+As a special case, a
+.I tagmask
+of 1 means ``no tagging'', for use with enormous base files or
+on systems with unusual offset representations.
+.PP
+A
+.I size
+of 0
+given to
+.I dbzfresh
+is synonymous with the local default;
+the normal default is suitable for tables of 90-100,000
+key-value pairs.
+A
+.I cmap
+of 0 (NUL) is synonymous with the character
+.BR 0 ,
+signifying no case mapping
+(note that the character
+.B ?
+specifies the local default mapping,
+normally
+.BR C ).
+A
+.I tagmask
+of 0 is synonymous with the local default tag mask,
+normally 0x7f000000 (specifying the top bit in a 32-bit offset
+as the flag bit, and the next 7 bits as the mask,
+which is suitable for base files up to circa 24MB).
+Calling
+.I dbminit(name)
+with the database files empty is equivalent to calling
+.IR dbzfresh(name,0,'\et','?',0) .
+.PP
+When databases are regenerated periodically, as in news,
+it is simplest to pick the parameters for a new database based on the old one.
+This also permits some memory of past sizes of the old database, so that
+a new database size can be chosen to cover expected fluctuations.
+.I Dbzagain
+is a variant of
+.I dbminit
+for creating a new database as a new generation of an old database.
+The database files for
+.I oldbase
+must exist.
+.I Dbzagain
+is equivalent to calling
+.I dbzfresh
+with the same field separator, case mapping, and tag mask as the old database,
+and a
+.I size
+equal to the result of applying
+.I dbzsize
+to the largest number of entries in the
+.I oldbase
+database and its previous 10 generations.
+.PP
+When many accesses are being done by the same program,
+.I dbz
+is massively faster if its first hash table is in memory.
+If an internal flag is 1,
+an attempt is made to read the table in when
+the database is opened, and
+.I dbmclose
+writes it out to disk again (if it was read successfully and
+has been modified).
+.I Dbzincore
+sets the flag to
+.I newvalue
+(which should be 0 or 1)
+and returns the previous value;
+this does not affect the status of a database that has already been opened.
+The default is 0.
+The attempt to read the table in may fail due to memory shortage;
+in this case
+.I dbz
+quietly falls back on its default behavior.
+.IR Store s
+to an in-memory database are not (in general) written out to the file
+until
+.IR dbmclose
+or
+.IR dbzsync ,
+so if robustness in the presence of crashes
+or concurrent accesses
+is crucial, in-memory databases
+should probably be avoided.
+.PP
+.I Dbzsync
+causes all buffers etc. to be flushed out to the files.
+It is typically used as a precaution against crashes or concurrent accesses
+when a
+.IR dbz -using
+process will be running for a long time.
+It is a somewhat expensive operation,
+especially
+for an in-memory database.
+.PP
+.I Dbzcancel
+cancels any pending writes from buffers.
+This is typically useful only for in-core databases, since writes are
+otherwise done immediately.
+Its main purpose is to let a child process, in the wake of a
+.IR fork ,
+do a
+.I dbmclose
+without writing its parent's data to disk.
+.PP
+If
+.I dbz
+has been compiled with debugging facilities available (which makes it
+bigger and a bit slower),
+.I dbzdebug
+alters the value (and returns the previous value) of an internal flag
+which (when 1; default is 0) causes
+verbose and cryptic debugging output on standard output.
+.PP
+Concurrent reading of databases is fairly safe,
+but there is no (inter)locking,
+so concurrent updating is not.
+.PP
+The database files include a record of the byte order of the processor
+creating the database, and accesses by processors with different byte
+order will work, although they will be slightly slower.
+Byte order is preserved by
+.IR dbzagain .
+However,
+agreement on the size and internal structure of an
+.I fseek
+offset is necessary, as is consensus on
+the character set.
+.PP
+An open database occupies three
+.I stdio
+streams and their corresponding file descriptors;
+a fourth is needed for an in-memory database.
+Memory consumption is negligible (except for
+.I stdio
+buffers) except for in-memory databases.
+.SH SEE ALSO
+dbz(1), dbm(3)
+.SH DIAGNOSTICS
+Functions returning
+.I int
+values return 0 for success, \-1 for failure.
+Functions returning
+.I datum
+values return a value with
+.I dptr
+set to NULL for failure.
+.I Dbminit
+attempts to have
+.I errno
+set plausibly on return, but otherwise this is not guaranteed.
+An
+.I errno
+of
+.B EDOM
+from
+.I dbminit
+indicates that the database did not appear to be in
+.I dbz
+format.
+.SH HISTORY
+The original
+.I dbz
+was written by
+Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us).
+Later contributions by David Butler and Mark Moraes.
+Extensive reworking,
+including this documentation,
+by Henry Spencer (henry@zoo.toronto.edu) as
+part of the C News project.
+Hashing function by Peter Honeyman.
+.SH BUGS
+The
+.I dptr
+members of returned
+.I datum
+values point to static storage which is overwritten by later calls.
+.PP
+Unlike
+.IR dbm ,
+.I dbz
+will misbehave if an existing key-value pair is `overwritten' by
+a new
+.I (dbz)store
+with the same key.
+The user is responsible for avoiding this by using
+.I (dbz)fetch
+first to check for duplicates;
+an internal optimization remembers the result of the
+first search so there is minimal overhead in this.
+.PP
+Waiting until after
+.I dbminit
+to bring the base file into existence
+will fail if
+.IR chdir (2)
+has been used meanwhile.
+.PP
+The RFC822 case mapper implements only a first approximation to the
+hideously-complex RFC822 case rules.
+.PP
+The prime finder in
+.I dbzsize
+is not particularly quick.
+.PP
+Should implement the
+.I dbm
+functions
+.IR delete ,
+.IR firstkey ,
+and
+.IR nextkey .
+.PP
+On C implementations which trap integer overflow,
+.I dbz
+will refuse to
+.I (dbz)store
+an
+.I fseek
+offset equal to the greatest
+representable
+positive number,
+as this would cause overflow in the biased representation used.
+.PP
+.I Dbzagain
+perhaps ought to notice when many offsets
+in the old database were
+too big for
+tagging, and shrink the tag mask to match.
+.PP
+Marking
+.IR dbz 's
+file descriptors
+.RI close-on- exec
+would be a better approach to the problem
+.I dbzcancel
+tries to address, but that's harder to do portably.