Diffstat (limited to 'bdb/docs/ref/program/diskspace.html')
-rw-r--r--  bdb/docs/ref/program/diskspace.html  145
1 files changed, 0 insertions, 145 deletions
diff --git a/bdb/docs/ref/program/diskspace.html b/bdb/docs/ref/program/diskspace.html
deleted file mode 100644
index fb8425d8a26..00000000000
--- a/bdb/docs/ref/program/diskspace.html
+++ /dev/null
@@ -1,145 +0,0 @@
-<!--$Id: diskspace.so,v 10.9 2000/03/22 21:56:11 bostic Exp $-->
-<!--Copyright 1997, 1998, 1999, 2000 by Sleepycat Software, Inc.-->
-<!--All rights reserved.-->
-<html>
-<head>
-<title>Berkeley DB Reference Guide: Disk space requirements</title>
-<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
-<meta name="keywords" content="embedded,database,programmatic,toolkit,b+tree,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,java,C,C++">
-</head>
-<body bgcolor=white>
- <a name="2"><!--meow--></a>
-<table><tr valign=top>
-<td><h3><dl><dt>Berkeley DB Reference Guide:<dd>Programmer Notes</dl></h3></td>
-<td width="1%"><a href="../../ref/program/byteorder.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/program/compatible.html"><img src="../../images/next.gif" alt="Next"></a>
-</td></tr></table>
-<p>
-<h1 align=center>Disk space requirements</h1>
-<p>It is possible to estimate the total database size based on the size of
-the data. Simply put, the following calculations estimate how many bytes
-you will need to hold a set of data, and then how many pages it will take
-to store that data on disk.
-<p>Space freed by deleting key/data pairs from a Btree or Hash database is
-never returned to the filesystem, although it is reused where possible.
-This means that Btree and Hash database files are grow-only. If enough
-keys are deleted from a database that shrinking the underlying file is
-desirable, you should create a new database and insert the records from
-the old one into it.
-<p>These are rough estimates at best. For example, they do not take into
-account overflow records, filesystem metadata, or real-life situations
-where the sizes of key and data items vary wildly and the page-fill factor
-changes over time.
-<h3>Btree</h3>
-<p>The formulas for the Btree access method are as follows:
-<p><blockquote><pre>useful-bytes-per-page = (page-size - page-overhead) * page-fill-factor
-<p>
-bytes-of-data = n-records *
- (bytes-per-entry + page-overhead-for-two-entries)
-<p>
-n-pages-of-data = bytes-of-data / useful-bytes-per-page
-<p>
-total-bytes-on-disk = n-pages-of-data * page-size
-</pre></blockquote>
-<p>The <b>useful-bytes-per-page</b> is a measure of the bytes on each page
-that will actually hold the application data. It is computed as the total
-number of bytes on the page that are available to hold application data,
-corrected by the percentage of the page that is likely to contain data.
-The reason for this correction is that the percentage of a page that
-contains application data can vary from close to 50% after a page split,
-to almost 100% if the entries in the database were inserted in sorted
-order. Obviously, the <b>page-fill-factor</b> can drastically alter
-the amount of disk space required to hold any particular data set. The
-page-fill factor of any existing database can be displayed using the
-<a href="../../utility/db_stat.html">db_stat</a> utility.
-<p>As an example, using an 8K page size, a 26-byte page overhead, and an 85%
-page-fill factor, there are 6941 bytes of useful space on each page:
-<p><blockquote><pre>6941 = (8192 - 26) * .85</pre></blockquote>
-<p>The total <b>bytes-of-data</b> is an easy calculation: it is the number
-of key/data pairs multiplied by the size of each pair plus the overhead
-required to store each pair on a page. The overhead to store a single key
-or data item on a Btree page is 5 bytes. So, assuming 60,000,000 key/data
-pairs, where both the key and the data item are 8 bytes long, there are
-1560000000 bytes, or roughly 1.45GB, of total data:
-<p><blockquote><pre>1560000000 = 60000000 * ((8 * 2) + (5 * 2))</pre></blockquote>
-<p>The total pages of data, <b>n-pages-of-data</b>, is the
-<b>bytes-of-data</b> divided by the <b>useful-bytes-per-page</b>. In
-the example, there are 224751 pages of data.
-<p><blockquote><pre>224751 = 1560000000 / 6941</pre></blockquote>
-<p>The total bytes of disk space for the database is <b>n-pages-of-data</b>
-multiplied by the <b>page-size</b>. In the example, the result is
-1841160192 bytes, or roughly 1.71GB.
-<p><blockquote><pre>1841160192 = 224751 * 8192</pre></blockquote>
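-<p>As a quick sanity check, the Btree formulas above can be written out as a
-few lines of Python. This is only a back-of-the-envelope sketch: the function
-name <b>btree_estimate</b> and its parameter defaults are illustrative
-assumptions taken from this example, not part of the Berkeley DB API, and the
-page count is rounded up, so the result can be one page larger than the
-hand-rounded figures in the text, which round the page count down.
-<p><blockquote><pre>def btree_estimate(n_records, key_size, data_size,
-                   page_size=8192, page_overhead=26,
-                   fill_factor=0.85, item_overhead=5):
-    # Bytes on each page that actually hold application data.
-    useful_bytes_per_page = int((page_size - page_overhead) * fill_factor)
-    # Each pair stores one key and one data item, each with its own overhead.
-    bytes_of_data = n_records * (key_size + data_size + 2 * item_overhead)
-    # Round the page count up: a partial page still occupies a full page.
-    n_pages_of_data = -(-bytes_of_data // useful_bytes_per_page)
-    return n_pages_of_data * page_size
-
-# 60,000,000 pairs of 8-byte keys and 8-byte data items:
-# prints 1841168384, roughly 1.71GB.
-print(btree_estimate(60000000, 8, 8))
-</pre></blockquote>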
-<h3>Hash</h3>
-<p>The formulas for the Hash access method are as follows:
-<p><blockquote><pre>useful-bytes-per-page = (page-size - page-overhead)
-<p>
-bytes-of-data = n-records *
- (bytes-per-entry + page-overhead-for-two-entries)
-<p>
-n-pages-of-data = bytes-of-data / useful-bytes-per-page
-<p>
-total-bytes-on-disk = n-pages-of-data * page-size
-</pre></blockquote>
-<p>The <b>useful-bytes-per-page</b> is a measure of the bytes on each page
-that will actually hold the application data. It is computed as the total
-number of bytes on the page that are available to hold application data.
-If the application has explicitly set a page fill factor, then pages will
-not necessarily be kept full. For databases with a preset fill factor,
-see the calculation below. The page-overhead for Hash databases is 26
-bytes and the page-overhead-for-two-entries is 6 bytes.
-<p>As an example, using an 8K page size, there are 8166 bytes of useful space
-on each page:
-<p><blockquote><pre>8166 = (8192 - 26)</pre></blockquote>
-<p>The total <b>bytes-of-data</b> is an easy calculation: it is the number
-of key/data pairs multiplied by the size of each pair plus the overhead
-required to store each pair on a page; in this case, the overhead is 6 bytes
-per pair. So, assuming 60,000,000 key/data pairs, where both the key and the
-data item are 8 bytes long, there are 1320000000 bytes, or roughly 1.23GB,
-of total data:
-<p><blockquote><pre>1320000000 = 60000000 * (16 + 6)</pre></blockquote>
-<p>The total pages of data, <b>n-pages-of-data</b>, is the
-<b>bytes-of-data</b> divided by the <b>useful-bytes-per-page</b>. In
-this example, there are 161646 pages of data.
-<p><blockquote><pre>161646 = 1320000000 / 8166</pre></blockquote>
-<p>The total bytes of disk space for the database is <b>n-pages-of-data</b>
-multiplied by the <b>page-size</b>. In the example, the result is
-1324204032 bytes, or roughly 1.23GB.
-<p><blockquote><pre>1324204032 = 161646 * 8192</pre></blockquote>
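-<p>The same kind of Python sketch for the Hash formulas above; again, the
-function name <b>hash_estimate</b> and its parameter defaults are assumptions
-taken from this example rather than Berkeley DB interfaces, and the page
-count is rounded up.
-<p><blockquote><pre>def hash_estimate(n_records, key_size, data_size,
-                  page_size=8192, page_overhead=26, pair_overhead=6):
-    # With no explicit fill factor, the whole page (minus overhead) is usable.
-    useful_bytes_per_page = page_size - page_overhead
-    bytes_of_data = n_records * (key_size + data_size + pair_overhead)
-    # Round the page count up: a partial page still occupies a full page.
-    n_pages_of_data = -(-bytes_of_data // useful_bytes_per_page)
-    return n_pages_of_data * page_size
-
-# 60,000,000 pairs of 8-byte keys and 8-byte data items:
-# prints 1324204032, roughly 1.23GB.
-print(hash_estimate(60000000, 8, 8))
-</pre></blockquote>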
-<p>Now, let's assume that the application specified a fill factor explicitly.
-The fill factor indicates the target number of items to place on a single
-page (a fill factor might reduce the utilization of each page, but it can
-be useful in avoiding splits and preventing buckets from becoming too
-large). Using our estimates above, each pair takes 22 bytes (16 + 6) and
-there are 8166 useful bytes on a page (8192 - 26). That means that, on
-average, you can fit 371 pairs per page.
-<p><blockquote><pre>371 = 8166 / 22</pre></blockquote>
-<p>However, let's assume that the application designer knows that while most
-items are 8 bytes, they can sometimes be as large as 10, and that it is very
-important to avoid overflowing buckets and splitting. In the worst case each
-pair then takes 26 bytes (10 * 2 + 6), so the application might specify a
-fill factor of 314.
-<p><blockquote><pre>314 = 8166 / 26</pre></blockquote>
-<p>With a fill factor of 314, the formula for computing the database size
-is:
-<p><blockquote><pre>npages = npairs / pairs-per-page</pre></blockquote>
-<p>or 191082.
-<p><blockquote><pre>191082 = 60000000 / 314</pre></blockquote>
-<p>At 191082 pages, the total database size would be 1565343744 bytes, or
-roughly 1.46GB.
-<p><blockquote><pre>1565343744 = 191082 * 8192 </pre></blockquote>
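-<p>The fill-factor arithmetic above, written out as a short Python sketch.
-The 10-byte worst-case item size, the 6-byte per-pair overhead, and the
-variable names are the assumptions used in this example, not values taken
-from a real database; the page count is rounded up, so it comes out one page
-higher than the text, which rounds down.
-<p><blockquote><pre># Estimate for a Hash database with an explicit fill factor.
-page_size, page_overhead, pair_overhead = 8192, 26, 6
-worst_case_item = 10                                    # items may grow to 10 bytes
-useful_bytes_per_page = page_size - page_overhead       # 8166
-worst_case_pair = 2 * worst_case_item + pair_overhead   # 26 bytes per pair
-
-fill_factor = useful_bytes_per_page // worst_case_pair  # 314 pairs per page
-n_pairs = 60000000
-n_pages = -(-n_pairs // fill_factor)                    # 191083 pages, rounded up
-print(n_pages * page_size)                              # roughly 1.46GB
-</pre></blockquote>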
-<p>There are a few additional caveats with respect to Hash databases. First,
-this discussion assumes that the hash function does a good job of evenly
-distributing keys among hash buckets; if it does not, you may find your
-table growing significantly larger than you expected.
-Second, in order to provide support for Hash databases co-existing with
-other databases in a single file, pages within a Hash database are
-allocated in power-of-2 chunks. That means that a Hash database with 65
-buckets will take up as much space as a Hash database with 128 buckets;
-each time the Hash database grows beyond its current power-of-two number
-of buckets, it allocates space for the next power-of-two buckets. This
-space may be sparsely allocated in the file system, but the files will
-appear to be their full size. Finally, because of this need for
-contiguous allocation, overflow pages and duplicate pages can be allocated
-only at specific points in the file, and this too can lead to sparse hash
-tables.
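-<p>To illustrate the power-of-two allocation described above, the following
-Python sketch rounds a bucket count up to the next power of two; it is a toy
-illustration of the growth pattern, not Berkeley DB's internal allocator.
-<p><blockquote><pre># Next power of two at or above n_buckets (illustrative only).
-def allocated_buckets(n_buckets):
-    return 2 ** (n_buckets - 1).bit_length()
-
-print(allocated_buckets(65))    # 128: 65 buckets reserve space for 128
-print(allocated_buckets(128))   # 128
-print(allocated_buckets(129))   # 256
-</pre></blockquote>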
-<table><tr><td><br></td><td width="1%"><a href="../../ref/program/byteorder.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/program/compatible.html"><img src="../../images/next.gif" alt="Next"></a>
-</td></tr></table>
-<p><font size=1><a href="http://www.sleepycat.com">Copyright Sleepycat Software</a></font>
-</body>
-</html>