Diffstat (limited to 'bdb/docs/ref/program/diskspace.html')
-rw-r--r-- | bdb/docs/ref/program/diskspace.html | 145 |
1 files changed, 0 insertions, 145 deletions
diff --git a/bdb/docs/ref/program/diskspace.html b/bdb/docs/ref/program/diskspace.html deleted file mode 100644 index fb8425d8a26..00000000000 --- a/bdb/docs/ref/program/diskspace.html +++ /dev/null @@ -1,145 +0,0 @@ -<!--$Id: diskspace.so,v 10.9 2000/03/22 21:56:11 bostic Exp $--> -<!--Copyright 1997, 1998, 1999, 2000 by Sleepycat Software, Inc.--> -<!--All rights reserved.--> -<html> -<head> -<title>Berkeley DB Reference Guide: Disk space requirements</title> -<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit."> -<meta name="keywords" content="embedded,database,programmatic,toolkit,b+tree,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,java,C,C++"> -</head> -<body bgcolor=white> - <a name="2"><!--meow--></a> -<table><tr valign=top> -<td><h3><dl><dt>Berkeley DB Reference Guide:<dd>Programmer Notes</dl></h3></td> -<td width="1%"><a href="../../ref/program/byteorder.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/program/compatible.html"><img src="../../images/next.gif" alt="Next"></a> -</td></tr></table> -<p> -<h1 align=center>Disk space requirements</h1> -<p>It is possible to estimate the total database size based on the size of -the data. Simply put, the following calculations attempt to figure out -how many bytes you will need to hold a set of data and then how many pages -it will take to actually store it on disk. -<p>Space freed by deleting key/data pairs from a Btree or Hash database is -never returned to the filesystem, although it is reused where possible. -This means that the Btree and Hash databases are grow-only. If enough -keys are deleted from a database that shrinking the underlying file is -desirable, you should create a new database and insert the records from -the old one into it. -<p>These are rough estimates at best. 
For example, they do not take into -account overflow records, filesystem metadata information, or real-life -situations where the sizes of key and data items are wildly variable, and -the page-fill factor changes over time. -<h3>Btree</h3> -<p>The formulas for the Btree access method are as follows: -<p><blockquote><pre>useful-bytes-per-page = (page-size - page-overhead) * page-fill-factor -<p> -bytes-of-data = n-records * - (bytes-per-entry + page-overhead-for-two-entries) -<p> -n-pages-of-data = bytes-of-data / useful-bytes-per-page -<p> -total-bytes-on-disk = n-pages-of-data * page-size -</pre></blockquote> -<p>The <b>useful-bytes-per-page</b> is a measure of the bytes on each page -that will actually hold the application data. It is computed as the total -number of bytes on the page that are available to hold application data, -corrected by the percentage of the page that is likely to contain data. -The reason for this correction is that the percentage of a page that -contains application data can vary from close to 50% after a page split, -to almost 100% if the entries in the database were inserted in sorted -order. Obviously, the <b>page-fill-factor</b> can drastically alter -the amount of disk space required to hold any particular data set. The -page-fill factor of any existing database can be displayed using the -<a href="../../utility/db_stat.html">db_stat</a> utility. -<p>As an example, using an 8K page size, with an 85% page-fill factor, there -are 6941 bytes of useful space on each page: -<p><blockquote><pre>6941 = (8192 - 26) * .85</pre></blockquote> -<p>The total <b>bytes-of-data</b> is an easy calculation: it is the number -of key/data pairs multiplied by the size of each pair plus the overhead -required to store that pair on a page. -The overhead to store a single item on a Btree page is 5 bytes. 
So, -assuming 60,000,000 key/data pairs, in which each key and each data item is 8 bytes long, there -are 1560000000 bytes, or roughly 1.45GB, of total data: -<p><blockquote><pre>1560000000 = 60000000 * ((8 * 2) + (5 * 2))</pre></blockquote> -<p>The total pages of data, <b>n-pages-of-data</b>, is the -<b>bytes-of-data</b> divided by the <b>useful-bytes-per-page</b>. In -the example, there are 224751 pages of data. -<p><blockquote><pre>224751 = 1560000000 / 6941</pre></blockquote> -<p>The total bytes of disk space for the database is <b>n-pages-of-data</b> -multiplied by the <b>page-size</b>. In the example, the result is -1841160192 bytes, or roughly 1.71GB. -<p><blockquote><pre>1841160192 = 224751 * 8192</pre></blockquote> -<h3>Hash</h3> -<p>The formulas for the Hash access method are as follows: -<p><blockquote><pre>useful-bytes-per-page = (page-size - page-overhead) -<p> -bytes-of-data = n-records * - (bytes-per-entry + page-overhead-for-two-entries) -<p> -n-pages-of-data = bytes-of-data / useful-bytes-per-page -<p> -total-bytes-on-disk = n-pages-of-data * page-size -</pre></blockquote> -<p>The <b>useful-bytes-per-page</b> is a measure of the bytes on each page -that will actually hold the application data. It is computed as the total -number of bytes on the page that are available to hold application data. -If the application has explicitly set a page fill factor, then pages will -not necessarily be kept full. For databases with a preset fill factor, -see the calculation below. The page-overhead for Hash databases is 26 -bytes and the page-overhead-for-two-entries is 6 bytes. -<p>As an example, using an 8K page size, there are 8166 bytes of useful space -on each page: -<p><blockquote><pre>8166 = (8192 - 26)</pre></blockquote> -<p>The total <b>bytes-of-data</b> is an easy calculation: it is the number -of key/data pairs multiplied by the size of each pair plus the overhead -required to store that pair on a page. -In this case that's 6 bytes per pair. 
So, assuming 60,000,000 key/data -pairs, in which each key and each data item is 8 bytes long, there are 1320000000 bytes, or -roughly 1.23GB, of total data: -<p><blockquote><pre>1320000000 = 60000000 * (16 + 6)</pre></blockquote> -<p>The total pages of data, <b>n-pages-of-data</b>, is the -<b>bytes-of-data</b> divided by the <b>useful-bytes-per-page</b>. In -this example, there are 161646 pages of data. -<p><blockquote><pre>161646 = 1320000000 / 8166</pre></blockquote> -<p>The total bytes of disk space for the database is <b>n-pages-of-data</b> -multiplied by the <b>page-size</b>. In the example, the result is -1324204032 bytes, or roughly 1.23GB. -<p><blockquote><pre>1324204032 = 161646 * 8192</pre></blockquote> -<p>Now, let's assume that the application specified a fill factor explicitly. -The fill factor indicates the target number of items to place on a single -page (a fill factor might reduce the utilization of each page, but it can -be useful in avoiding splits and preventing buckets from becoming too -large). Using our estimates above, each key/data pair takes 22 bytes (16 + 6) and -there are 8166 useful bytes on a page (8192 - 26). That means that, on -average, you can fit 371 pairs per page. -<p><blockquote><pre>371 = 8166 / 22</pre></blockquote> -<p>However, let's assume that the application designer knows that while most -items are 8 bytes, they can sometimes be as large as 10 bytes, so that a pair -can take up to 26 bytes (20 + 6), and that it's very -important to avoid overflowing buckets and splitting. Then, the -application might specify a fill factor of 314. -<p><blockquote><pre>314 = 8166 / 26</pre></blockquote> -<p>With a fill factor of 314, the formula for computing database size -is: -<p><blockquote><pre>npages = npairs / pairs-per-page</pre></blockquote> -<p>or 191082. -<p><blockquote><pre>191082 = 60000000 / 314</pre></blockquote> -<p>At 191082 pages, the total database size would be 1565343744 bytes, or roughly 1.46GB. 
-<p><blockquote><pre>1565343744 = 191082 * 8192 </pre></blockquote> -<p>There are a few additional caveats with respect to Hash databases. This -discussion assumes that the hash function does a good job of evenly -distributing keys among hash buckets. If the function does not do this, -you may find your table growing significantly larger than you expected. -Secondly, in order to provide support for Hash databases co-existing with -other databases in a single file, pages within a Hash database are -allocated in power-of-2 chunks. That means that a Hash database with 65 -buckets will take up as much space as a Hash database with 128 buckets; -each time the Hash database grows beyond its current power-of-two number -of buckets, it allocates space for the next power-of-two buckets. This -space may be sparsely allocated in the file system, but the files will -appear to be their full size. Finally, because of this need for -contiguous allocation, overflow pages and duplicate pages can be allocated -only at specific points in the file, and this too can lead to sparse hash -tables. -<table><tr><td><br></td><td width="1%"><a href="../../ref/program/byteorder.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/program/compatible.html"><img src="../../images/next.gif" alt="Next"></a> -</td></tr></table> -<p><font size=1><a href="http://www.sleepycat.com">Copyright Sleepycat Software</a></font> -</body> -</html> |
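The arithmetic in the deleted page can be reproduced with a short script. This is a minimal sketch, not part of Berkeley DB: the function names `estimate` and `estimate_hash_ffactor` are hypothetical, and the constants (26-byte page overhead, 5 bytes per Btree item, 6 bytes per Hash pair) are taken from the text above. One assumption differs from the worked figures: the sketch rounds page counts up, because a partially filled page still occupies a whole page, so the Btree and fill-factor results can come out one page larger than the truncated values in the text (224751 and 191082).

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def estimate(n_records, bytes_per_pair, pair_overhead,
             page_size=8192, page_overhead=26, fill=1.0):
    """Return (useful bytes per page, bytes of data, pages, bytes on disk).

    fill is the page-fill factor: about 0.5 after splits, close to 1.0
    for sorted inserts in a Btree; 1.0 for the basic Hash estimate.
    """
    useful = int((page_size - page_overhead) * fill)
    data = n_records * (bytes_per_pair + pair_overhead)
    pages = ceil_div(data, useful)  # assumption: round up, not truncate
    return useful, data, pages, pages * page_size


def estimate_hash_ffactor(n_records, pairs_per_page, page_size=8192):
    """Return (pages, bytes on disk) for a Hash database with a preset fill factor."""
    pages = ceil_div(n_records, pairs_per_page)
    return pages, pages * page_size


# Inputs from the worked examples: 60,000,000 pairs of 8-byte keys and data.
btree = estimate(60_000_000, 16, 2 * 5, fill=0.85)
hash_ = estimate(60_000_000, 16, 6)
ff = estimate_hash_ffactor(60_000_000, 314)

for name, result in (("btree", btree), ("hash", hash_), ("hash ffactor", ff)):
    print(name, result)
```

As the page itself notes, these are rough estimates: overflow records, variable item sizes, and the power-of-two page allocation of Hash databases are not modelled here.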