<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/libgit2.git, branch ethomson/large_loose_blobs</title>
<subtitle>github.com: libgit2/libgit2.git</subtitle>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/libgit2.git/'/>
<entry>
<title>tests: add GITTEST_SLOW env var check</title>
<updated>2017-12-20T16:21:05+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2017-12-20T16:13:31+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/libgit2.git/commit/?id=456e52189c95315028d668f9e508798d490765e2'/>
<id>456e52189c95315028d668f9e508798d490765e2</id>
<content type='text'>
Writing very large files may be slow, particularly on inefficient
filesystems and when running instrumented code to detect invalid memory
accesses (e.g. within valgrind or similar tools).

Introduce `GITTEST_SLOW` so that tests that are slow can be skipped by
the CI system.
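
The guard can be sketched as follows (a hypothetical Python helper for
illustration; libgit2's actual check lives in its C test harness):

```python
import os

def slow_tests_enabled():
    # Slow tests run only when GITTEST_SLOW is set to a non-empty
    # value in the environment, e.g. by the CI system.
    return bool(os.environ.get("GITTEST_SLOW"))
```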
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Writing very large files may be slow, particularly on inefficient
filesystems and when running instrumented code to detect invalid memory
accesses (e.g. within valgrind or similar tools).

Introduce `GITTEST_SLOW` so that tests that are slow can be skipped by
the CI system.
</pre>
</div>
</content>
</entry>
<entry>
<title>hash: commoncrypto hash should support large files</title>
<updated>2017-12-20T16:08:04+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2017-12-11T16:46:05+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/libgit2.git/commit/?id=bdb542143909fc278c8ba89b0c64cdf72fcaf7d2'/>
<id>bdb542143909fc278c8ba89b0c64cdf72fcaf7d2</id>
<content type='text'>
Teach the CommonCrypto hash mechanisms to support large files.  The hash
primitives take a `CC_LONG` (aka `uint32_t`) at a time, so loop, giving
the hash function at most a 32-bit unsigned integer's worth of bytes per
call, until we have hashed the entire file.
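
The chunking approach can be illustrated with a small Python sketch
(hashlib stands in for the CommonCrypto primitives; the constant mirrors
the `CC_LONG` limit):

```python
import hashlib

CC_LONG_MAX = 0xFFFFFFFF  # CC_LONG is a uint32_t

def sha1_large(data):
    # Feed the hash at most a uint32_t's worth of bytes per update,
    # looping until the entire input has been consumed.
    h = hashlib.sha1()
    for ofs in range(0, len(data), CC_LONG_MAX):
        h.update(data[ofs:ofs + CC_LONG_MAX])
    return h.hexdigest()
```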
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Teach the CommonCrypto hash mechanisms to support large files.  The hash
primitives take a `CC_LONG` (aka `uint32_t`) at a time, so loop, giving
the hash function at most a 32-bit unsigned integer's worth of bytes per
call, until we have hashed the entire file.
</pre>
</div>
</content>
</entry>
<entry>
<title>hash: win32 hash mechanism should support large files</title>
<updated>2017-12-20T16:08:04+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2017-12-10T17:26:43+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/libgit2.git/commit/?id=a89560d5693a2f43cc852cb5806df837dc79b790'/>
<id>a89560d5693a2f43cc852cb5806df837dc79b790</id>
<content type='text'>
Teach the win32 hash mechanisms to support large files.  The hash
primitives take at most `ULONG_MAX` bytes at a time.  Loop, giving the
hash function the maximum supported number of bytes, until we have
hashed the entire file.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Teach the win32 hash mechanisms to support large files.  The hash
primitives take at most `ULONG_MAX` bytes at a time.  Loop, giving the
hash function the maximum supported number of bytes, until we have
hashed the entire file.
</pre>
</div>
</content>
</entry>
<entry>
<title>odb_loose: reject objects that cannot fit in memory</title>
<updated>2017-12-20T16:08:04+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2017-12-10T17:25:00+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/libgit2.git/commit/?id=3e6533ba12c1c567f91efe621bdd155ff801877c'/>
<id>3e6533ba12c1c567f91efe621bdd155ff801877c</id>
<content type='text'>
Check the size of objects being read from the loose odb backend and
reject those that would not fit in memory with an error message that
reflects the actual problem, instead of failing later with an
unintuitive error message about truncation or invalid hashes.
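
The up-front check amounts to the following (a hypothetical Python
sketch; `SIZE_MAX` stands in for the platform allocation limit):

```python
import sys

SIZE_MAX = sys.maxsize  # stand-in for the platform allocation limit

def check_loose_object_size(declared_size):
    # Reject objects whose declared size cannot fit in memory up
    # front, with an error that names the real problem.
    if declared_size > SIZE_MAX:
        raise MemoryError("loose object is too large to fit in memory")
    return declared_size
```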
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Check the size of objects being read from the loose odb backend and
reject those that would not fit in memory with an error message that
reflects the actual problem, instead of failing later with an
unintuitive error message about truncation or invalid hashes.
</pre>
</div>
</content>
</entry>
<entry>
<title>zstream: use UINT_MAX sized chunks</title>
<updated>2017-12-20T16:08:03+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2017-12-10T17:23:44+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/libgit2.git/commit/?id=8642feba7429ac2941a879a0870a84a83a3664cd'/>
<id>8642feba7429ac2941a879a0870a84a83a3664cd</id>
<content type='text'>
Instead of passing data to zlib in INT_MAX sized chunks, we can give it
as many as UINT_MAX bytes at a time.  zlib doesn't care how big a buffer
we give it; this simply results in fewer calls into zlib.
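
In Python, the same chunking strategy looks roughly like this (zlib's
`compressobj` plays the role of the deflate path; the helper name is
illustrative):

```python
import zlib

UINT_MAX = 0xFFFFFFFF

def deflate_chunked(data):
    # Hand zlib up to UINT_MAX bytes per call; a larger input simply
    # takes more iterations, not a different code path.
    z = zlib.compressobj()
    out = []
    for ofs in range(0, len(data), UINT_MAX):
        out.append(z.compress(data[ofs:ofs + UINT_MAX]))
    out.append(z.flush())
    return b"".join(out)
```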
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of passing data to zlib in INT_MAX sized chunks, we can give it
as many as UINT_MAX bytes at a time.  zlib doesn't care how big a buffer
we give it; this simply results in fewer calls into zlib.
</pre>
</div>
</content>
</entry>
<entry>
<title>odb: support large loose objects</title>
<updated>2017-12-20T16:08:03+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2017-11-30T15:55:59+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/libgit2.git/commit/?id=ddefea750adcde06867b49d251760844540919fe'/>
<id>ddefea750adcde06867b49d251760844540919fe</id>
<content type='text'>
zlib will only inflate/deflate an `int`'s worth of data at a time.
We need to loop through large files in order to ensure that we inflate
the entire file, not just an `int`'s worth of data.  Thankfully, we
already have this loop in our `git_zstream` layer.  Handle large objects
using the `git_zstream`.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
zlib will only inflate/deflate an `int`'s worth of data at a time.
We need to loop through large files in order to ensure that we inflate
the entire file, not just an `int`'s worth of data.  Thankfully, we
already have this loop in our `git_zstream` layer.  Handle large objects
using the `git_zstream`.
</pre>
</div>
</content>
</entry>
<entry>
<title>object: introduce git_object_stringn2type</title>
<updated>2017-12-20T16:08:03+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2017-11-30T15:52:47+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/libgit2.git/commit/?id=d1e446550a966a1dbc5d765aa79fe9bc47a1c1a3'/>
<id>d1e446550a966a1dbc5d765aa79fe9bc47a1c1a3</id>
<content type='text'>
Introduce an internal API to get the object type based on a
length-specified (not null terminated) string representation.  This can
be used to compare the (space terminated) object type name in a loose
object.

Reimplement `git_object_string2type` based on this API.
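
A rough Python model of the length-delimited lookup (the type names are
the standard git object types; the helper name is illustrative):

```python
GIT_OBJECT_TYPES = (b"commit", b"tree", b"blob", b"tag")

def stringn2type(buf, length):
    # Look up an object type from a length-delimited (not null
    # terminated) name, as found before the space in a loose object
    # header.  Returns the type name, or None if unrecognized.
    name = bytes(buf[:length])
    if name in GIT_OBJECT_TYPES:
        return name.decode("ascii")
    return None
```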
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce an internal API to get the object type based on a
length-specified (not null terminated) string representation.  This can
be used to compare the (space terminated) object type name in a loose
object.

Reimplement `git_object_string2type` based on this API.
</pre>
</div>
</content>
</entry>
<entry>
<title>odb: test loose reading/writing large objects</title>
<updated>2017-12-20T16:08:02+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2017-11-30T15:49:05+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/libgit2.git/commit/?id=dacc32910e36e79ba108bef507e3aec9b0626e3c'/>
<id>dacc32910e36e79ba108bef507e3aec9b0626e3c</id>
<content type='text'>
Introduce a test for very large objects in the ODB.  Write a large
object (5 GB) and ensure that the write succeeds and provides us the
expected object ID.  Introduce a test that writes that file and
ensures that we can subsequently read it.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce a test for very large objects in the ODB.  Write a large
object (5 GB) and ensure that the write succeeds and provides us the
expected object ID.  Introduce a test that writes that file and
ensures that we can subsequently read it.
</pre>
</div>
</content>
</entry>
<entry>
<title>util: introduce `git__prefixncmp` and consolidate implementations</title>
<updated>2017-12-20T16:08:01+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2017-11-30T15:40:13+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/libgit2.git/commit/?id=86219f40689c85ec4418575223f4376beffa45af'/>
<id>86219f40689c85ec4418575223f4376beffa45af</id>
<content type='text'>
Introduce `git__prefixncmp` that will search up to the first `n`
characters of a string to see if it is prefixed by another string.
This is useful for examining whether a non-null-terminated character
array is prefixed by a particular substring.

Consolidate the various implementations of `git__prefixcmp` around a
single core implementation and add some test cases to validate its
behavior.
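
The semantics can be sketched in Python (a hypothetical model of the
comparison, operating on bytes so no terminator is assumed):

```python
def prefixncmp(s, n, prefix):
    # Compare at most the first n bytes of s against prefix,
    # strcmp-style: 0 means s (within its first n bytes) starts
    # with prefix; nonzero reports the first difference.
    window = s[:n]
    for i, p in enumerate(prefix):
        if i == len(window):
            return -p  # s ran out before the prefix ended
        diff = window[i] - p
        if diff:
            return diff
    return 0
```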
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce `git__prefixncmp` that will search up to the first `n`
characters of a string to see if it is prefixed by another string.
This is useful for examining whether a non-null-terminated character
array is prefixed by a particular substring.

Consolidate the various implementations of `git__prefixcmp` around a
single core implementation and add some test cases to validate its
behavior.
</pre>
</div>
</content>
</entry>
<entry>
<title>zstream: treat `Z_BUF_ERROR` as non-fatal</title>
<updated>2017-12-20T16:08:01+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2017-12-12T12:24:11+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/libgit2.git/commit/?id=b7d36ef4a644c69c37e64c7c813546a68264b924'/>
<id>b7d36ef4a644c69c37e64c7c813546a68264b924</id>
<content type='text'>
zlib will return `Z_BUF_ERROR` whenever there is more input to inflate
or deflate than there is output to store the result.  This is normal for
us as we iterate through the input, particularly with very large input
buffers.
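
The classification can be modeled as follows (status codes per zlib.h;
the helper name is illustrative):

```python
Z_OK, Z_STREAM_END = 0, 1
Z_BUF_ERROR = -5  # zlib: no progress was possible with these buffers

def zstream_error_is_fatal(status):
    # Z_BUF_ERROR just means the output buffer filled up (or the
    # input ran dry); the caller should loop with fresh buffers
    # rather than abort.
    return status not in (Z_OK, Z_STREAM_END, Z_BUF_ERROR)
```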
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
zlib will return `Z_BUF_ERROR` whenever there is more input to inflate
or deflate than there is output to store the result.  This is normal for
us as we iterate through the input, particularly with very large input
buffers.
</pre>
</div>
</content>
</entry>
</feed>
