diff options
author | Jeff King <peff@peff.net> | 2017-02-08 15:53:03 -0500 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2017-02-08 15:39:55 -0800 |
commit | 41a078c60b82bad4edf9d1bd8e826aae5f020ee5 (patch) | |
tree | a9ce65909dbb7521a2776570663fa1e1f60227e3 /object.h | |
parent | a10a17877bcce27b5677b5ac9f1a4e58d5bd8023 (diff) | |
download | git-41a078c60b82bad4edf9d1bd8e826aae5f020ee5.tar.gz |
fetch-pack: cache results of for_each_alternate_ref
We may run for_each_alternate_ref() twice, once in
find_common() and once in everything_local(). This operation
can be expensive, because it involves running a sub-process
which must freshly load all of the alternate's refs from
disk.
Let's cache and reuse the results between the two calls. We
can make some optimizations based on the particular use
pattern in fetch-pack to keep our memory usage down.
The first is that we only care about the sha1s, not the refs
themselves. So it's OK to store only the sha1s, and to
suppress duplicates. The natural fit would therefore be a
sha1_array.
However, sha1_array's de-duplication happens only after it
has read and sorted all entries. It still stores each
duplicate. For an alternate with a large number of refs
pointing to the same commits, this is a needless expense.
Instead, we'd prefer to eliminate duplicates before putting
them in the cache, which implies using a hash. We can
further note that fetch-pack will call parse_object() on
each alternate sha1. We can therefore keep our cache as a
set of pointers to "struct object". That gives us a place to
put our "already seen" bit with an optimized hash lookup.
And as a bonus, the object stores the sha1 for us, so
pointer-to-object is all we need.
There are two extra optimizations I didn't do here:
- we actually store an array of pointer-to-object.
Technically we could just walk the obj_hash table
looking for entries with the ALTERNATE flag set (because
our use case doesn't care about the order here).
But that hash table may be mostly composed of
non-ALTERNATE entries, so we'd waste time walking over
them. So it would be a slight win in memory use, but a
loss in CPU.
- the items we pull out of the cache are actual "struct
object"s, but then we feed "obj->sha1" to our
sub-functions, which promptly call parse_object().
This second parse is cheap, because it starts with
lookup_object() and will bail immediately when it sees
we've already parsed the object. We could save the extra
hash lookup, but it would involve refactoring the
functions we call. It may or may not be worth the
trouble.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'object.h')
-rw-r--r-- | object.h | 2 |
1 files changed, 1 insertions, 1 deletions
@@ -29,7 +29,7 @@ struct object_array { /* * object flag allocation: * revision.h: 0---------10 26 - * fetch-pack.c: 0---4 + * fetch-pack.c: 0---5 * walker.c: 0-2 * upload-pack.c: 4 11----------------19 * builtin/blame.c: 12-13 |