summaryrefslogtreecommitdiff
path: root/fat_size_calculation.tex
diff options
context:
space:
mode:
Diffstat (limited to 'fat_size_calculation.tex')
-rw-r--r--fat_size_calculation.tex228
1 files changed, 228 insertions, 0 deletions
diff --git a/fat_size_calculation.tex b/fat_size_calculation.tex
new file mode 100644
index 0000000..6737cca
--- /dev/null
+++ b/fat_size_calculation.tex
@@ -0,0 +1,228 @@
+% Copyright 2003,2005,2007 Alain Knaff.
+
+% This documentation is for Mtools which is a collection of tools to
+% allow Unix systems to manipulate MS-DOS files.
+
+% Permission is granted to copy, distribute and/or modify this document
+% under the terms of the GNU Free Documentation License, Version 1.3 or
+% any later version published by the Free Software Foundation; with no
+% Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+%Texts. A copy of the license is included in the section entitled
+% ``GNU Free Documentation License''.
+
+\documentclass[a4paper,12pt]{article}
+
+\usepackage[T1]{fontenc}
+\usepackage[latin1]{inputenc}
+\usepackage{pslatex}
+\usepackage[pdfpagemode=None,colorlinks]{hyperref}
+
+\author{Alain Knaff}
+\title{How mformat-3.9.10 and above calculates needed FAT size}
+
+\begin{document}
+
+\maketitle
+
+This small document explains the formula used by {\tt mformat.c} to
+figure out fat size and number of clusters. Due to the way that
+filesystem parameters are stored in the boot sector, we can afford to
+have a FAT that is larger than need to be to accomodate the clusters
+present on disk, but under no circumstances can we have one that is
+too small.
+
+In this discussion, we use the following variable names:
+
+\begin{tabular}{|r|p{12cm}|}
+
+\hline
+
+$fatNybls$&
+Number of nubbles (4 bit unit per FAT). This is 3 for FAT12, 4 for
+FAT16, and 8 for FAT16\\
+
+\hline
+
+$numClus$&
+Number of clusters on the disk\\
+
+\hline
+
+$clusSiz$&
+Size of a cluster, expressed in sectors\\
+
+\hline
+
+$secSiz$&
+Size of a sector, in bytes. Size of a sector in nybbles is $secSiz$ * 2\\
+
+\hline
+
+$nfats$&
+Number of FAT copies, usually two\\
+
+\hline
+
+$remSects$&
+``Remaining sectors'', after number of boot sectors and root directory
+have been accounted for\\
+
+\hline
+
+$fatLen$&
+Length of the FAT, in sectors\\
+
+\hline
+
+
+\end{tabular}
+
+\ \\
+
+Taking into account both data and fat, each cluster takes up the
+following number of nybbles (units of 4 bytes):
+
+
+$$clusSiz * (2*secSiz) + nfats * fatNybls$$
+
+This accounts for the data of the cluster ($clusSiz * secSiz$),
+as well as for the space taken up by its descriptor.
+
+The space taken up by all clusters together, plus the space taken by
+descriptors for clusters 0 and 1 ($2*nfats*fatNybls$) should be less
+than what is available.
+
+Additional sectors may be lost due to slack (you have to use a full
+FAT sector, you also have to use a full cluster in the data
+area). Thus, an {\em upper bound} for the number of clusters is as
+follows:
+
+$$
+numClus \le {2*remSect*secSiz - 2*nfats*fatNybls \over
+2*clusSiz*secSiz + nfats*fatNybls}
+$$
+
+
+$$
+numClus \le {(remSect+2*clusSiz)*2*secSiz \over
+2*clusSiz*secSiz + nfats*fatNybls} - 2
+$$
+
+
+These will take up at most the following space in one copy of the FAT
+(we have to round up, because even a half-full fat sector must be
+taken in its entirety)
+
+$$
+fatLen \le \left\lceil { (numClus+2)*fatNybls \over secSiz * 2 } \right\rceil
+$$
+
+
+$$
+fatLen \le \left\lceil {
+\left( { 2*(remSect+2*clusSiz)*secSiz \over
+2*clusSiz*secSiz + nfats*fatNybls} \right) * fatNybls \over
+2*secSiz
+} \right\rceil
+$$
+
+
+$$
+fatLen \le \left\lceil {
+(remSect+2*clusSiz)* fatNybls \over
+2*clusSiz*secSiz + nfats*fatNybls
+} \right\rceil
+$$
+
+The space left after FAT sector has been deduced is now {\em less than
+or equal} to what would be needed for the data area of the clusters
+(including fractional clusters), which is good, as we may have under
+no circumstances {\em more} clusters in the data area than in the FAT.
+An important point in this calculation is that we based the needed FAT
+size on a {\em fractional} number of clusters, rather than a rounded
+down amount of clusters. Indeed, using a rounded down number could
+have exposed us to a situation where we had an {\em almost enough}
+space for one more cluster (i.e. not enough space for data + FAT, but
+enough for data alone). This situation, combined with a situation
+where the last FAT sector was flush full could have lead to a
+situation where $numClus$ would become too large for the FAT to
+accomodate. I think this is clearer with an example:
+\begin{itemize}
+\item $remSect=4127$, $clusSiz=1$, $nfats=1$
+\item (Non rounded down) $numClus={(4127+2)*(1024) \over 1032} -
+2 =4094.992$
+\item Rounded down: 4094 clusters
+\item These fit into 16 FAT sectors, exactly
+\item ... leaving us 4095 clusters, which is one to many (because
+these 4095 clusters would now need 17 FAT clusters)
+\end{itemize}
+
+Keeping the fractional part (0.992) allows us to round {\em up} the
+needed number of FAT sectors to 17, nicely solving this problem.
+
+The downside of counting the fractional part however is that we quite
+often waste a sector in the really common situation where both $nfats$
+and $clusSiz$ are even, while $remSect$ is odd. An easy way to address
+this is to substract one from $remSect$ before application of the
+formula, if this case is detected. Such operation carries no risk, as
+the odd final sector cannot be used to make a full cluster.
+
+There is still a case however, where fatLen must be grown manually
+after application of the formula: If numClus exceeds the maximum
+number of clusters allowable for FAT12 or FAT16), we need to shrink
+$numClus$ after application of the formula, and manually make the FAT
+larger in order to take up any excess space.
+
+Also note that as we used upper bounds, we may waste a certain number
+of sectors, which an exact calculation may not have wasted. However,
+normally, we should not lose more than one sector per FAT copy.
+
+N.B. In its document at \url{http://www.microsoft.com/hwdev/download/hardware/fatgen103.pdf},
+Microsoft proposes a much simpler formula. However, this formula is
+both wrong (i.e. occasionally it produces a smaller FAT than is
+needed for the clusters on disk), less generic (only works for sector
+sizes equal to 512), and less efficient (in case of FAT32, it may
+waste up to 8 sectors!)
+
+The formula is the following (for FAT16):
+$$
+fatLen \le \left\lceil { remSect \over 256 * clusSiz + nfats} \right\rceil
+$$
+
+Note that it doesn't account for the dummy clusters 0 and 1. Thus, if
+we have 258 sectors remaining, with a cluster size of 1, and two FAT
+copies, the Microsoft formula mistakenly assumes fatLen = 1. This
+leaves 258 - 2 = 256 sectors for clusters, which yields 256 clusters.
+However, those clusters do not fit into the FAT, as two clusters are
+lost (0 and 1). However, to Micro\$ofts' credit, this is not actually
+the formula they're using (tested with $remsect=160056$ and
+$clusSize=4$), so this seems to be a documentation issue rather than a
+genuine bug.
+
+In case of FAT32, Microsoft just halves the denominator. This is
+wasteful, as only the $256*clusSiz$ part would need to be halved.
+
+If we assume 16777000, and a cluster size of 8, our formula would give
+us:
+$$fatLen = \left\lceil (16777000 + 16) * 8 \over 2 * 8 * 512 + 16
+\right\rceil = 16352$$
+This would leave $16777000-2*16352=16744296$ sectors for the clusters,
+i.e. 2093037 clusters. The FAT descriptors for those 2093037 clusters
+do indeed fit into our 16352 fat sectors.
+
+Microsoft, on the other hand, would get: $$fatLen = \left\lceil{
+16777000 \over (256 * 8 + 2)/2} \right\rceil = 16368$$ This would only
+leave $16777000-2*16368=16744264$, i.e. 2093033 clusters, thus wasting
+4 clusters. The FAT would be 28 sectors too big, i.e. more than the
+mere 8 sectors announced by Microsoft! Unfortunately, I currently
+don't have access to any sufficiently recent Windows to check out
+whether this really happens in the code, or whether it is only a
+documentation issue as well.
+
+Oh, and before somebody points it out: the formula in this document
+occassionnally wastes sectors too, although not as much (Example: With
+$remsect=685$, $clusSiz=1$ and $nfats=2$ our formula gives $fatLen=3$,
+which leaves 679 clusters. However, we could use $fatLen=2$, leaving
+681 clusters.
+
+\end{document}