|
The KVM and VirtualBox deployments use sparse files for raw disk
images. This means they can store a large disk (say, tens or hundreds
of gigabytes) without using more disk space than is required for the
actual content (e.g., a gigabyte or so for the files in the root
filesystem). The kernel and filesystem make the unwritten parts of the
disk image look as if they are filled with zero bytes. This is good.
However, during deployment those sparse files get transferred as if
there really are a lot of zeroes. Those zeroes take a lot of time to
transfer. rsync, for example, does not handle large holes efficiently.
This change introduces a couple of helper tools (morphlib/xfer-hole
and morphlib/recv-hole), which transfer the holes more efficiently.
The xfer-hole program reads a file and outputs records like these:
DATA
123
binary data (exaclyt 123 bytes and no newline at the end)
HOLE
3245
xfer-hole can do this efficiently, without having to read through all
the zeroes in the holes, using the SEEK_DATA and SEEK_HOLE arguments
to lseek.
Using this, the holes take only take a few bytes each, making it
possible to transfer a disk image faster. In my benchmarks,
transferring a 100G byte disk image took about 100 seconds for KVM,
and 220 seconds for VirtualBox (which needs to more work at the
receiver to convert the raw disk to a VDI). Both benchmarks were from
a VM on my laptop to the laptop itself.
The interesting bit here is that the receiver (recv-hole) is simple
enough that it can be implemented in a bit of shell script, and the
text of the shell script can be run on the remote end by giving it to
ssh as a command line argument. This means there is no need to install
any special tools on the receiver, which makes using this improvement
much simpler.
|