author     Junio C Hamano <gitster@pobox.com>    2014-02-27 14:01:48 -0800
committer  Junio C Hamano <gitster@pobox.com>    2014-02-27 14:01:48 -0800
commit     0f9e62e0847c075678a7a5a748567d1e881d16f8 (patch)
tree       8ef8989069ae40eef891b0964fc2cb8036a74e48
parent     6784fab0ac1f973f22f1d4252f0e513d61be6c6b (diff)
parent     6b5b3a27b7faf9d72efec28fa017408daf45cd00 (diff)
download   git-0f9e62e0847c075678a7a5a748567d1e881d16f8.tar.gz

Merge branch 'jk/pack-bitmap'
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
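
The series above introduces a handful of user-facing knobs, all documented in the patches below: the `pack.writebitmaps` and `pack.usebitmaps` configuration keys, `git repack -b`/`--write-bitmap-index`, and `git rev-list --use-bitmap-index`. As a rough usage sketch (not part of the commit itself), a repository could be prepared and queried like this:

    # Ask pack-objects to write a .bitmap alongside the pack when repacking
    # (equivalently, pass -b / --write-bitmap-index to git repack).
    git config pack.writebitmaps true

    # Bitmaps need a single pack with full closure, so repack everything.
    git repack -ad

    # Let rev-list use the bitmap index instead of a full object traversal.
    git rev-list --use-bitmap-index --count HEAD
    git rev-list --use-bitmap-index --objects HEAD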
-rw-r--r--  Documentation/config.txt                    |   25
-rw-r--r--  Documentation/git-repack.txt                |    9
-rw-r--r--  Documentation/git-rev-list.txt              |    1
-rw-r--r--  Documentation/rev-list-options.txt          |    8
-rw-r--r--  Documentation/technical/bitmap-format.txt   |  164
-rw-r--r--  Makefile                                    |   16
-rw-r--r--  block-sha1/sha1.c                           |   32
-rw-r--r--  builtin/pack-objects.c                      |  451
-rw-r--r--  builtin/repack.c                            |   41
-rw-r--r--  builtin/rev-list.c                          |   39
-rw-r--r--  cache.h                                     |    1
-rw-r--r--  compat/bswap.h                              |  112
-rw-r--r--  ewah/bitmap.c                               |  221
-rw-r--r--  ewah/ewah_bitmap.c                          |  714
-rw-r--r--  ewah/ewah_io.c                              |  204
-rw-r--r--  ewah/ewah_rlw.c                             |  115
-rw-r--r--  ewah/ewok.h                                 |  233
-rw-r--r--  ewah/ewok_rlw.h                             |  114
-rw-r--r--  khash.h                                     |  338
-rw-r--r--  pack-bitmap-write.c                         |  552
-rw-r--r--  pack-bitmap.c                               | 1073
-rw-r--r--  pack-bitmap.h                               |   64
-rw-r--r--  pack-objects.c                              |  111
-rw-r--r--  pack-objects.h                              |   68
-rw-r--r--  pack-revindex.c                             |   43
-rw-r--r--  pack-revindex.h                             |    9
-rw-r--r--  pack-write.c                                |    2
-rw-r--r--  read-cache.c                                |   44
-rw-r--r--  revision.c                                  |    4
-rw-r--r--  revision.h                                  |    2
-rw-r--r--  sha1_file.c                                 |    6
-rwxr-xr-x  t/perf/p5310-pack-bitmaps.sh                |   57
-rwxr-xr-x  t/t5310-pack-bitmaps.sh                     |  139

33 files changed, 4736 insertions(+), 276 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt index 7eec746950..040197b10e 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1870,6 +1870,31 @@ pack.packSizeLimit:: Common unit suffixes of 'k', 'm', or 'g' are supported. +pack.useBitmaps:: + When true, git will use pack bitmaps (if available) when packing + to stdout (e.g., during the server side of a fetch). Defaults to + true. You should not generally need to turn this off unless + you are debugging pack bitmaps. + +pack.writebitmaps:: + When true, git will write a bitmap index when packing all + objects to disk (e.g., when `git repack -a` is run). This + index can speed up the "counting objects" phase of subsequent + packs created for clones and fetches, at the cost of some disk + space and extra time spent on the initial repack. Defaults to + false. + +pack.writeBitmapHashCache:: + When true, git will include a "hash cache" section in the bitmap + index (if one is written). This cache can be used to feed git's + delta heuristics, potentially leading to better deltas between + bitmapped and non-bitmapped objects (e.g., when serving a fetch + between an older, bitmapped pack and objects that have been + pushed since the last gc). The downside is that it consumes 4 + bytes per object of disk space, and that JGit's bitmap + implementation does not understand it, causing it to complain if + Git and JGit are used on the same repository. Defaults to false. + pager.<cmd>:: If the value is boolean, turns on or off pagination of the output of a particular Git subcommand when writing to a tty. diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 509cf73e50..002cfd5eb9 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -9,7 +9,7 @@ git-repack - Pack unpacked objects in a repository SYNOPSIS -------- [verse] -'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [--window=<n>] [--depth=<n>] +'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [--window=<n>] [--depth=<n>] DESCRIPTION ----------- @@ -110,6 +110,13 @@ other objects in that pack they already have locally. The default is unlimited, unless the config variable `pack.packSizeLimit` is set. +-b:: +--write-bitmap-index:: + Write a reachability bitmap index as part of the repack. This + only makes sense when used with `-a` or `-A`, as the bitmaps + must be able to refer to all reachable objects. This option + overrides the setting of `pack.writebitmaps`. + Configuration ------------- diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt index 045b37b82e..7a1585def0 100644 --- a/Documentation/git-rev-list.txt +++ b/Documentation/git-rev-list.txt @@ -55,6 +55,7 @@ SYNOPSIS [ \--reverse ] [ \--walk-reflogs ] [ \--no-walk ] [ \--do-walk ] + [ \--use-bitmap-index ] <commit>... [ \-- <paths>... ] DESCRIPTION diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 03533af715..9a3da3646e 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -257,6 +257,14 @@ See also linkgit:git-reflog[1]. Output excluded boundary commits. Boundary commits are prefixed with `-`. +ifdef::git-rev-list[] +--use-bitmap-index:: + + Try to speed up the traversal using the pack bitmap index (if + one is available). Note that when traversing with `--objects`, + trees and blobs will not have their associated path printed. 
+endif::git-rev-list[] + -- History Simplification diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt new file mode 100644 index 0000000000..f8c18a0f7a --- /dev/null +++ b/Documentation/technical/bitmap-format.txt @@ -0,0 +1,164 @@ +GIT bitmap v1 format +==================== + + - A header appears at the beginning: + + 4-byte signature: {'B', 'I', 'T', 'M'} + + 2-byte version number (network byte order) + The current implementation only supports version 1 + of the bitmap index (the same one as JGit). + + 2-byte flags (network byte order) + + The following flags are supported: + + - BITMAP_OPT_FULL_DAG (0x1) REQUIRED + This flag must always be present. It implies that the bitmap + index has been generated for a packfile with full closure + (i.e. where every single object in the packfile can find + its parent links inside the same packfile). This is a + requirement for the bitmap index format, also present in JGit, + that greatly reduces the complexity of the implementation. + + - BITMAP_OPT_HASH_CACHE (0x4) + If present, the end of the bitmap file contains + `N` 32-bit name-hash values, one per object in the + pack. The format and meaning of the name-hash is + described below. + + 4-byte entry count (network byte order) + + The total count of entries (bitmapped commits) in this bitmap index. + + 20-byte checksum + + The SHA1 checksum of the pack this bitmap index belongs to. + + - 4 EWAH bitmaps that act as type indexes + + Type indexes are serialized after the hash cache in the shape + of four EWAH bitmaps stored consecutively (see Appendix A for + the serialization format of an EWAH bitmap). + + There is a bitmap for each Git object type, stored in the following + order: + + - Commits + - Trees + - Blobs + - Tags + + In each bitmap, the `n`th bit is set to true if the `n`th object + in the packfile is of that type. + + The obvious consequence is that the OR of all 4 bitmaps will result + in a full set (all bits set), and the AND of all 4 bitmaps will + result in an empty bitmap (no bits set). + + - N entries with compressed bitmaps, one for each indexed commit + + Where `N` is the total amount of entries in this bitmap index. + Each entry contains the following: + + - 4-byte object position (network byte order) + The position **in the index for the packfile** where the + bitmap for this commit is found. + + - 1-byte XOR-offset + The xor offset used to compress this bitmap. For an entry + in position `x`, a XOR offset of `y` means that the actual + bitmap representing this commit is composed by XORing the + bitmap for this entry with the bitmap in entry `x-y` (i.e. + the bitmap `y` entries before this one). + + Note that this compression can be recursive. In order to + XOR this entry with a previous one, the previous entry needs + to be decompressed first, and so on. + + The hard-limit for this offset is 160 (an entry can only be + xor'ed against one of the 160 entries preceding it). This + number is always positive, and hence entries are always xor'ed + with **previous** bitmaps, not bitmaps that will come afterwards + in the index. + + - 1-byte flags for this bitmap + At the moment the only available flag is `0x1`, which hints + that this bitmap can be re-used when rebuilding bitmap indexes + for the repository. + + - The compressed bitmap itself, see Appendix A. 
+ +== Appendix A: Serialization format for an EWAH bitmap + +Ewah bitmaps are serialized in the same protocol as the JAVAEWAH +library, making them backwards compatible with the JGit +implementation: + + - 4-byte number of bits of the resulting UNCOMPRESSED bitmap + + - 4-byte number of words of the COMPRESSED bitmap, when stored + + - N x 8-byte words, as specified by the previous field + + This is the actual content of the compressed bitmap. + + - 4-byte position of the current RLW for the compressed + bitmap + +All words are stored in network byte order for their corresponding +sizes. + +The compressed bitmap is stored in a form of run-length encoding, as +follows. It consists of a concatenation of an arbitrary number of +chunks. Each chunk consists of one or more 64-bit words + + H L_1 L_2 L_3 .... L_M + +H is called RLW (run length word). It consists of (from lower to higher +order bits): + + - 1 bit: the repeated bit B + + - 32 bits: repetition count K (unsigned) + + - 31 bits: literal word count M (unsigned) + +The bitstream represented by the above chunk is then: + + - K repetitions of B + + - The bits stored in `L_1` through `L_M`. Within a word, bits at + lower order come earlier in the stream than those at higher + order. + +The next word after `L_M` (if any) must again be a RLW, for the next +chunk. For efficient appending to the bitstream, the EWAH stores a +pointer to the last RLW in the stream. + + +== Appendix B: Optional Bitmap Sections + +These sections may or may not be present in the `.bitmap` file; their +presence is indicated by the header flags section described above. + +Name-hash cache +--------------- + +If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains +a cache of 32-bit values, one per object in the pack. The value at +position `i` is the hash of the pathname at which the `i`th object +(counting in index order) in the pack can be found. This can be fed +into the delta heuristics to compare objects with similar pathnames. + +The hash algorithm used is: + + hash = 0; + while ((c = *name++)) + if (!isspace(c)) + hash = (hash >> 2) + (c << 24); + +Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag. +If implementations want to choose a different hashing scheme, they are +free to do so, but MUST allocate a new header flag (because comparing +hashes made under two different schemes would be pointless). 
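
As a companion to the RLW layout in Appendix A, the following is a small illustrative C sketch (not part of the patch; the struct and function names are invented here) showing how the three fields of a run length word could be unpacked from a 64-bit word, following the lower-to-higher bit order given above. The real accessors are in `ewah/ewok_rlw.h`, added by this series.

    #include <stdint.h>

    /*
     * Illustrative only: unpack an EWAH run length word (RLW) as laid out
     * in Appendix A -- from lower to higher order bits: 1-bit repeated
     * bit B, 32-bit repetition count K, 31-bit literal word count M.
     */
    struct rlw_fields {
            unsigned run_bit;       /* B: the value of the repeated bit */
            uint64_t running_len;   /* K: how many words of B are run-length encoded */
            uint64_t literal_words; /* M: how many literal words follow this RLW */
    };

    static struct rlw_fields rlw_unpack(uint64_t word)
    {
            struct rlw_fields f;
            f.run_bit = word & 1;                        /* bit 0 */
            f.running_len = (word >> 1) & 0xffffffffULL; /* bits 1..32 */
            f.literal_words = word >> 33;                /* bits 33..63 */
            return f;
    }

A decoder reads K words' worth of the repeated bit B, then the M literal words, and then expects the next word to be the RLW of the following chunk.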
@@ -664,6 +664,8 @@ LIB_H += diff.h LIB_H += diffcore.h LIB_H += dir.h LIB_H += exec_cmd.h +LIB_H += ewah/ewok.h +LIB_H += ewah/ewok_rlw.h LIB_H += fetch-pack.h LIB_H += fmt-merge-msg.h LIB_H += fsck.h @@ -691,8 +693,10 @@ LIB_H += notes-merge.h LIB_H += notes-utils.h LIB_H += notes.h LIB_H += object.h +LIB_H += pack-objects.h LIB_H += pack-revindex.h LIB_H += pack.h +LIB_H += pack-bitmap.h LIB_H += parse-options.h LIB_H += patch-ids.h LIB_H += pathspec.h @@ -796,6 +800,10 @@ LIB_OBJS += dir.o LIB_OBJS += editor.o LIB_OBJS += entry.o LIB_OBJS += environment.o +LIB_OBJS += ewah/bitmap.o +LIB_OBJS += ewah/ewah_bitmap.o +LIB_OBJS += ewah/ewah_io.o +LIB_OBJS += ewah/ewah_rlw.o LIB_OBJS += exec_cmd.o LIB_OBJS += fetch-pack.o LIB_OBJS += fsck.o @@ -827,7 +835,10 @@ LIB_OBJS += notes-cache.o LIB_OBJS += notes-merge.o LIB_OBJS += notes-utils.o LIB_OBJS += object.o +LIB_OBJS += pack-bitmap.o +LIB_OBJS += pack-bitmap-write.o LIB_OBJS += pack-check.o +LIB_OBJS += pack-objects.o LIB_OBJS += pack-revindex.o LIB_OBJS += pack-write.o LIB_OBJS += pager.o @@ -2480,8 +2491,9 @@ profile-clean: $(RM) $(addsuffix *.gcno,$(addprefix $(PROFILE_DIR)/, $(object_dirs))) clean: profile-clean coverage-clean - $(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o xdiff/*.o vcs-svn/*.o \ - builtin/*.o $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) + $(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o + $(RM) xdiff/*.o vcs-svn/*.o ewah/*.o builtin/*.o + $(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X $(RM) $(TEST_PROGRAMS) $(NO_INSTALL) $(RM) -r bin-wrappers $(dep_dirs) diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c index e1a1eb6097..22b125cf8c 100644 --- a/block-sha1/sha1.c +++ b/block-sha1/sha1.c @@ -62,38 +62,6 @@ #define setW(x, val) (W(x) = (val)) #endif -/* - * Performance might be improved if the CPU architecture is OK with - * unaligned 32-bit loads and a fast ntohl() is available. - * Otherwise fall back to byte loads and shifts which is portable, - * and is faster on architectures with memory alignment issues. - */ - -#if defined(__i386__) || defined(__x86_64__) || \ - defined(_M_IX86) || defined(_M_X64) || \ - defined(__ppc__) || defined(__ppc64__) || \ - defined(__powerpc__) || defined(__powerpc64__) || \ - defined(__s390__) || defined(__s390x__) - -#define get_be32(p) ntohl(*(unsigned int *)(p)) -#define put_be32(p, v) do { *(unsigned int *)(p) = htonl(v); } while (0) - -#else - -#define get_be32(p) ( \ - (*((unsigned char *)(p) + 0) << 24) | \ - (*((unsigned char *)(p) + 1) << 16) | \ - (*((unsigned char *)(p) + 2) << 8) | \ - (*((unsigned char *)(p) + 3) << 0) ) -#define put_be32(p, v) do { \ - unsigned int __v = (v); \ - *((unsigned char *)(p) + 0) = __v >> 24; \ - *((unsigned char *)(p) + 1) = __v >> 16; \ - *((unsigned char *)(p) + 2) = __v >> 8; \ - *((unsigned char *)(p) + 3) = __v >> 0; } while (0) - -#endif - /* This "rolls" over the 512-bit array */ #define W(x) (array[(x)&15]) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 541667f102..c73337931b 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -14,10 +14,12 @@ #include "diff.h" #include "revision.h" #include "list-objects.h" +#include "pack-objects.h" #include "progress.h" #include "refs.h" #include "streaming.h" #include "thread-utils.h" +#include "pack-bitmap.h" static const char *pack_usage[] = { N_("git pack-objects --stdout [options...] 
[< ref-list | < object-list]"), @@ -25,42 +27,15 @@ static const char *pack_usage[] = { NULL }; -struct object_entry { - struct pack_idx_entry idx; - unsigned long size; /* uncompressed size */ - struct packed_git *in_pack; /* already in pack */ - off_t in_pack_offset; - struct object_entry *delta; /* delta base object */ - struct object_entry *delta_child; /* deltified objects who bases me */ - struct object_entry *delta_sibling; /* other deltified objects who - * uses the same base as me - */ - void *delta_data; /* cached delta (uncompressed) */ - unsigned long delta_size; /* delta data size (uncompressed) */ - unsigned long z_delta_size; /* delta data size (compressed) */ - enum object_type type; - enum object_type in_pack_type; /* could be delta */ - uint32_t hash; /* name hint hash */ - unsigned char in_pack_header_size; - unsigned preferred_base:1; /* - * we do not pack this, but is available - * to be used as the base object to delta - * objects against. - */ - unsigned no_try_delta:1; - unsigned tagged:1; /* near the very tip of refs */ - unsigned filled:1; /* assigned write-order */ -}; - /* - * Objects we are going to pack are collected in objects array (dynamically - * expanded). nr_objects & nr_alloc controls this array. They are stored - * in the order we see -- typically rev-list --objects order that gives us - * nice "minimum seek" order. + * Objects we are going to pack are collected in the `to_pack` structure. + * It contains an array (dynamically expanded) of the object data, and a map + * that can resolve SHA1s to their position in the array. */ -static struct object_entry *objects; +static struct packing_data to_pack; + static struct pack_idx_entry **written_list; -static uint32_t nr_objects, nr_alloc, nr_result, nr_written; +static uint32_t nr_result, nr_written; static int non_empty; static int reuse_delta = 1, reuse_object = 1; @@ -83,6 +58,14 @@ static struct progress *progress_state; static int pack_compression_level = Z_DEFAULT_COMPRESSION; static int pack_compression_seen; +static struct packed_git *reuse_packfile; +static uint32_t reuse_packfile_objects; +static off_t reuse_packfile_offset; + +static int use_bitmap_index = 1; +static int write_bitmap_index; +static uint16_t write_bitmap_options; + static unsigned long delta_cache_size = 0; static unsigned long max_delta_cache_size = 256 * 1024 * 1024; static unsigned long cache_max_small_delta_size = 1000; @@ -90,20 +73,28 @@ static unsigned long cache_max_small_delta_size = 1000; static unsigned long window_memory_limit = 0; /* - * The object names in objects array are hashed with this hashtable, - * to help looking up the entry by object name. - * This hashtable is built after all the objects are seen. 
- */ -static int *object_ix; -static int object_ix_hashsz; -static struct object_entry *locate_object_entry(const unsigned char *sha1); - -/* * stats */ static uint32_t written, written_delta; static uint32_t reused, reused_delta; +/* + * Indexed commits + */ +static struct commit **indexed_commits; +static unsigned int indexed_commits_nr; +static unsigned int indexed_commits_alloc; + +static void index_commit_for_bitmap(struct commit *commit) +{ + if (indexed_commits_nr >= indexed_commits_alloc) { + indexed_commits_alloc = (indexed_commits_alloc + 32) * 2; + indexed_commits = xrealloc(indexed_commits, + indexed_commits_alloc * sizeof(struct commit *)); + } + + indexed_commits[indexed_commits_nr++] = commit; +} static void *get_delta(struct object_entry *entry) { @@ -553,12 +544,12 @@ static int mark_tagged(const char *path, const unsigned char *sha1, int flag, void *cb_data) { unsigned char peeled[20]; - struct object_entry *entry = locate_object_entry(sha1); + struct object_entry *entry = packlist_find(&to_pack, sha1, NULL); if (entry) entry->tagged = 1; if (!peel_ref(path, peeled)) { - entry = locate_object_entry(peeled); + entry = packlist_find(&to_pack, peeled, NULL); if (entry) entry->tagged = 1; } @@ -633,9 +624,10 @@ static struct object_entry **compute_write_order(void) { unsigned int i, wo_end, last_untagged; - struct object_entry **wo = xmalloc(nr_objects * sizeof(*wo)); + struct object_entry **wo = xmalloc(to_pack.nr_objects * sizeof(*wo)); + struct object_entry *objects = to_pack.objects; - for (i = 0; i < nr_objects; i++) { + for (i = 0; i < to_pack.nr_objects; i++) { objects[i].tagged = 0; objects[i].filled = 0; objects[i].delta_child = NULL; @@ -647,7 +639,7 @@ static struct object_entry **compute_write_order(void) * Make sure delta_sibling is sorted in the original * recency order. */ - for (i = nr_objects; i > 0;) { + for (i = to_pack.nr_objects; i > 0;) { struct object_entry *e = &objects[--i]; if (!e->delta) continue; @@ -665,7 +657,7 @@ static struct object_entry **compute_write_order(void) * Give the objects in the original recency order until * we see a tagged tip. */ - for (i = wo_end = 0; i < nr_objects; i++) { + for (i = wo_end = 0; i < to_pack.nr_objects; i++) { if (objects[i].tagged) break; add_to_write_order(wo, &wo_end, &objects[i]); @@ -675,7 +667,7 @@ static struct object_entry **compute_write_order(void) /* * Then fill all the tagged tips. */ - for (; i < nr_objects; i++) { + for (; i < to_pack.nr_objects; i++) { if (objects[i].tagged) add_to_write_order(wo, &wo_end, &objects[i]); } @@ -683,7 +675,7 @@ static struct object_entry **compute_write_order(void) /* * And then all remaining commits and tags. */ - for (i = last_untagged; i < nr_objects; i++) { + for (i = last_untagged; i < to_pack.nr_objects; i++) { if (objects[i].type != OBJ_COMMIT && objects[i].type != OBJ_TAG) continue; @@ -693,7 +685,7 @@ static struct object_entry **compute_write_order(void) /* * And then all the trees. 
*/ - for (i = last_untagged; i < nr_objects; i++) { + for (i = last_untagged; i < to_pack.nr_objects; i++) { if (objects[i].type != OBJ_TREE) continue; add_to_write_order(wo, &wo_end, &objects[i]); @@ -702,17 +694,57 @@ static struct object_entry **compute_write_order(void) /* * Finally all the rest in really tight order */ - for (i = last_untagged; i < nr_objects; i++) { + for (i = last_untagged; i < to_pack.nr_objects; i++) { if (!objects[i].filled) add_family_to_write_order(wo, &wo_end, &objects[i]); } - if (wo_end != nr_objects) - die("ordered %u objects, expected %"PRIu32, wo_end, nr_objects); + if (wo_end != to_pack.nr_objects) + die("ordered %u objects, expected %"PRIu32, wo_end, to_pack.nr_objects); return wo; } +static off_t write_reused_pack(struct sha1file *f) +{ + unsigned char buffer[8192]; + off_t to_write; + int fd; + + if (!is_pack_valid(reuse_packfile)) + die("packfile is invalid: %s", reuse_packfile->pack_name); + + fd = git_open_noatime(reuse_packfile->pack_name); + if (fd < 0) + die_errno("unable to open packfile for reuse: %s", + reuse_packfile->pack_name); + + if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1) + die_errno("unable to seek in reused packfile"); + + if (reuse_packfile_offset < 0) + reuse_packfile_offset = reuse_packfile->pack_size - 20; + + to_write = reuse_packfile_offset - sizeof(struct pack_header); + + while (to_write) { + int read_pack = xread(fd, buffer, sizeof(buffer)); + + if (read_pack <= 0) + die_errno("unable to read from reused packfile"); + + if (read_pack > to_write) + read_pack = to_write; + + sha1write(f, buffer, read_pack); + to_write -= read_pack; + } + + close(fd); + written += reuse_packfile_objects; + return reuse_packfile_offset - sizeof(struct pack_header); +} + static void write_pack_file(void) { uint32_t i = 0, j; @@ -724,7 +756,7 @@ static void write_pack_file(void) if (progress > pack_to_stdout) progress_state = start_progress("Writing objects", nr_result); - written_list = xmalloc(nr_objects * sizeof(*written_list)); + written_list = xmalloc(to_pack.nr_objects * sizeof(*written_list)); write_order = compute_write_order(); do { @@ -737,8 +769,17 @@ static void write_pack_file(void) f = create_tmp_packfile(&pack_tmp_name); offset = write_pack_header(f, nr_remaining); + + if (reuse_packfile) { + off_t packfile_size; + assert(pack_to_stdout); + + packfile_size = write_reused_pack(f); + offset += packfile_size; + } + nr_written = 0; - for (; i < nr_objects; i++) { + for (; i < to_pack.nr_objects; i++) { struct object_entry *e = write_order[i]; if (write_one(f, e, &offset) == WRITE_ONE_BREAK) break; @@ -789,9 +830,31 @@ static void write_pack_file(void) if (sizeof(tmpname) <= strlen(base_name) + 50) die("pack base name '%s' too long", base_name); snprintf(tmpname, sizeof(tmpname), "%s-", base_name); + + if (write_bitmap_index) { + bitmap_writer_set_checksum(sha1); + bitmap_writer_build_type_index(written_list, nr_written); + } + finish_tmp_packfile(tmpname, pack_tmp_name, written_list, nr_written, &pack_idx_opts, sha1); + + if (write_bitmap_index) { + char *end_of_name_prefix = strrchr(tmpname, 0); + sprintf(end_of_name_prefix, "%s.bitmap", sha1_to_hex(sha1)); + + stop_progress(&progress_state); + + bitmap_writer_show_progress(progress); + bitmap_writer_reuse_bitmaps(&to_pack); + bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1); + bitmap_writer_build(&to_pack); + bitmap_writer_finish(written_list, nr_written, + tmpname, write_bitmap_options); + write_bitmap_index = 0; + } + free(pack_tmp_name); 
puts(sha1_to_hex(sha1)); } @@ -801,7 +864,7 @@ static void write_pack_file(void) written_list[j]->offset = (off_t)-1; } nr_remaining -= nr_written; - } while (nr_remaining && i < nr_objects); + } while (nr_remaining && i < to_pack.nr_objects); free(written_list); free(write_order); @@ -811,73 +874,6 @@ static void write_pack_file(void) written, nr_result); } -static int locate_object_entry_hash(const unsigned char *sha1) -{ - int i; - unsigned int ui; - memcpy(&ui, sha1, sizeof(unsigned int)); - i = ui % object_ix_hashsz; - while (0 < object_ix[i]) { - if (!hashcmp(sha1, objects[object_ix[i] - 1].idx.sha1)) - return i; - if (++i == object_ix_hashsz) - i = 0; - } - return -1 - i; -} - -static struct object_entry *locate_object_entry(const unsigned char *sha1) -{ - int i; - - if (!object_ix_hashsz) - return NULL; - - i = locate_object_entry_hash(sha1); - if (0 <= i) - return &objects[object_ix[i]-1]; - return NULL; -} - -static void rehash_objects(void) -{ - uint32_t i; - struct object_entry *oe; - - object_ix_hashsz = nr_objects * 3; - if (object_ix_hashsz < 1024) - object_ix_hashsz = 1024; - object_ix = xrealloc(object_ix, sizeof(int) * object_ix_hashsz); - memset(object_ix, 0, sizeof(int) * object_ix_hashsz); - for (i = 0, oe = objects; i < nr_objects; i++, oe++) { - int ix = locate_object_entry_hash(oe->idx.sha1); - if (0 <= ix) - continue; - ix = -1 - ix; - object_ix[ix] = i + 1; - } -} - -static uint32_t name_hash(const char *name) -{ - uint32_t c, hash = 0; - - if (!name) - return 0; - - /* - * This effectively just creates a sortable number from the - * last sixteen non-whitespace characters. Last characters - * count "most", so things that end in ".c" sort together. - */ - while ((c = *name++) != 0) { - if (isspace(c)) - continue; - hash = (hash >> 2) + (c << 24); - } - return hash; -} - static void setup_delta_attr_check(struct git_attr_check *check) { static struct git_attr *attr_delta; @@ -900,42 +896,69 @@ static int no_try_delta(const char *path) return 0; } -static int add_object_entry(const unsigned char *sha1, enum object_type type, - const char *name, int exclude) +/* + * When adding an object, check whether we have already added it + * to our packing list. If so, we can skip. However, if we are + * being asked to excludei t, but the previous mention was to include + * it, make sure to adjust its flags and tweak our numbers accordingly. + * + * As an optimization, we pass out the index position where we would have + * found the item, since that saves us from having to look it up again a + * few lines later when we want to add the new entry. + */ +static int have_duplicate_entry(const unsigned char *sha1, + int exclude, + uint32_t *index_pos) { struct object_entry *entry; - struct packed_git *p, *found_pack = NULL; - off_t found_offset = 0; - int ix; - uint32_t hash = name_hash(name); - - ix = nr_objects ? locate_object_entry_hash(sha1) : -1; - if (ix >= 0) { - if (exclude) { - entry = objects + object_ix[ix] - 1; - if (!entry->preferred_base) - nr_result--; - entry->preferred_base = 1; - } + + entry = packlist_find(&to_pack, sha1, index_pos); + if (!entry) return 0; + + if (exclude) { + if (!entry->preferred_base) + nr_result--; + entry->preferred_base = 1; } + return 1; +} + +/* + * Check whether we want the object in the pack (e.g., we do not want + * objects found in non-local stores if the "--local" option was used). + * + * As a side effect of this check, we will find the packed version of this + * object, if any. 
We therefore pass out the pack information to avoid having + * to look it up again later. + */ +static int want_object_in_pack(const unsigned char *sha1, + int exclude, + struct packed_git **found_pack, + off_t *found_offset) +{ + struct packed_git *p; + if (!exclude && local && has_loose_object_nonlocal(sha1)) return 0; + *found_pack = NULL; + *found_offset = 0; + for (p = packed_git; p; p = p->next) { off_t offset = find_pack_entry_one(sha1, p); if (offset) { - if (!found_pack) { + if (!*found_pack) { if (!is_pack_valid(p)) { warning("packfile %s cannot be accessed", p->pack_name); continue; } - found_offset = offset; - found_pack = p; + *found_offset = offset; + *found_pack = p; } if (exclude) - break; + return 1; if (incremental) return 0; if (local && !p->pack_local) @@ -945,14 +968,21 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type, } } - if (nr_objects >= nr_alloc) { - nr_alloc = (nr_alloc + 1024) * 3 / 2; - objects = xrealloc(objects, nr_alloc * sizeof(*entry)); - } + return 1; +} + +static void create_object_entry(const unsigned char *sha1, + enum object_type type, + uint32_t hash, + int exclude, + int no_try_delta, + uint32_t index_pos, + struct packed_git *found_pack, + off_t found_offset) +{ + struct object_entry *entry; - entry = objects + nr_objects++; - memset(entry, 0, sizeof(*entry)); - hashcpy(entry->idx.sha1, sha1); + entry = packlist_alloc(&to_pack, sha1, index_pos); entry->hash = hash; if (type) entry->type = type; @@ -965,16 +995,43 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type, entry->in_pack_offset = found_offset; } - if (object_ix_hashsz * 3 <= nr_objects * 4) - rehash_objects(); - else - object_ix[-1 - ix] = nr_objects; + entry->no_try_delta = no_try_delta; +} + +static int add_object_entry(const unsigned char *sha1, enum object_type type, + const char *name, int exclude) +{ + struct packed_git *found_pack; + off_t found_offset; + uint32_t index_pos; - display_progress(progress_state, nr_objects); + if (have_duplicate_entry(sha1, exclude, &index_pos)) + return 0; + + if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) + return 0; - if (name && no_try_delta(name)) - entry->no_try_delta = 1; + create_object_entry(sha1, type, pack_name_hash(name), + exclude, name && no_try_delta(name), + index_pos, found_pack, found_offset); + display_progress(progress_state, to_pack.nr_objects); + return 1; +} + +static int add_object_entry_from_bitmap(const unsigned char *sha1, + enum object_type type, + int flags, uint32_t name_hash, + struct packed_git *pack, off_t offset) +{ + uint32_t index_pos; + + if (have_duplicate_entry(sha1, 0, &index_pos)) + return 0; + + create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); + + display_progress(progress_state, to_pack.nr_objects); return 1; } @@ -1175,7 +1232,7 @@ static void add_preferred_base_object(const char *name) { struct pbase_tree *it; int cmplen; - unsigned hash = name_hash(name); + unsigned hash = pack_name_hash(name); if (!num_preferred_base || check_pbase_path(hash)) return; @@ -1327,7 +1384,7 @@ static void check_object(struct object_entry *entry) break; } - if (base_ref && (base_entry = locate_object_entry(base_ref))) { + if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, NULL))) { /* * If base_ref was set above that means we wish to * reuse delta data, and we even found that base @@ -1401,12 +1458,12 @@ static void get_object_details(void) uint32_t i; struct object_entry **sorted_by_offset; - 
sorted_by_offset = xcalloc(nr_objects, sizeof(struct object_entry *)); - for (i = 0; i < nr_objects; i++) - sorted_by_offset[i] = objects + i; - qsort(sorted_by_offset, nr_objects, sizeof(*sorted_by_offset), pack_offset_sort); + sorted_by_offset = xcalloc(to_pack.nr_objects, sizeof(struct object_entry *)); + for (i = 0; i < to_pack.nr_objects; i++) + sorted_by_offset[i] = to_pack.objects + i; + qsort(sorted_by_offset, to_pack.nr_objects, sizeof(*sorted_by_offset), pack_offset_sort); - for (i = 0; i < nr_objects; i++) { + for (i = 0; i < to_pack.nr_objects; i++) { struct object_entry *entry = sorted_by_offset[i]; check_object(entry); if (big_file_threshold < entry->size) @@ -2032,7 +2089,7 @@ static int add_ref_tag(const char *path, const unsigned char *sha1, int flag, vo if (starts_with(path, "refs/tags/") && /* is a tag? */ !peel_ref(path, peeled) && /* peelable? */ - locate_object_entry(peeled)) /* object packed? */ + packlist_find(&to_pack, peeled, NULL)) /* object packed? */ add_object_entry(sha1, OBJ_TAG, NULL, 0); return 0; } @@ -2055,14 +2112,14 @@ static void prepare_pack(int window, int depth) if (!pack_to_stdout) do_check_packed_object_crc = 1; - if (!nr_objects || !window || !depth) + if (!to_pack.nr_objects || !window || !depth) return; - delta_list = xmalloc(nr_objects * sizeof(*delta_list)); + delta_list = xmalloc(to_pack.nr_objects * sizeof(*delta_list)); nr_deltas = n = 0; - for (i = 0; i < nr_objects; i++) { - struct object_entry *entry = objects + i; + for (i = 0; i < to_pack.nr_objects; i++) { + struct object_entry *entry = to_pack.objects + i; if (entry->delta) /* This happens if we decided to reuse existing @@ -2140,6 +2197,20 @@ static int git_pack_config(const char *k, const char *v, void *cb) cache_max_small_delta_size = git_config_int(k, v); return 0; } + if (!strcmp(k, "pack.writebitmaps")) { + write_bitmap_index = git_config_bool(k, v); + return 0; + } + if (!strcmp(k, "pack.writebitmaphashcache")) { + if (git_config_bool(k, v)) + write_bitmap_options |= BITMAP_OPT_HASH_CACHE; + else + write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE; + } + if (!strcmp(k, "pack.usebitmaps")) { + use_bitmap_index = git_config_bool(k, v); + return 0; + } if (!strcmp(k, "pack.threads")) { delta_search_threads = git_config_int(k, v); if (delta_search_threads < 0) @@ -2198,6 +2269,9 @@ static void show_commit(struct commit *commit, void *data) { add_object_entry(commit->object.sha1, OBJ_COMMIT, NULL, 0); commit->object.flags |= OBJECT_ADDED; + + if (write_bitmap_index) + index_commit_for_bitmap(commit); } static void show_object(struct object *obj, @@ -2340,7 +2414,7 @@ static void loosen_unused_packed_objects(struct rev_info *revs) for (i = 0; i < p->num_objects; i++) { sha1 = nth_packed_object_sha1(p, i); - if (!locate_object_entry(sha1) && + if (!packlist_find(&to_pack, sha1, NULL) && !has_sha1_pack_kept_or_nonlocal(sha1)) if (force_object_loose(sha1, p->mtime)) die("unable to force loose object"); @@ -2348,6 +2422,29 @@ static void loosen_unused_packed_objects(struct rev_info *revs) } } +static int get_object_list_from_bitmap(struct rev_info *revs) +{ + if (prepare_bitmap_walk(revs) < 0) + return -1; + + if (!reuse_partial_packfile_from_bitmap( + &reuse_packfile, + &reuse_packfile_objects, + &reuse_packfile_offset)) { + assert(reuse_packfile_objects); + nr_result += reuse_packfile_objects; + + if (progress) { + fprintf(stderr, "Reusing existing pack: %d, done.\n", + reuse_packfile_objects); + fflush(stderr); + } + } + + traverse_bitmap_commit_list(&add_object_entry_from_bitmap); + 
return 0; +} + static void get_object_list(int ac, const char **av) { struct rev_info revs; @@ -2367,6 +2464,7 @@ static void get_object_list(int ac, const char **av) if (*line == '-') { if (!strcmp(line, "--not")) { flags ^= UNINTERESTING; + write_bitmap_index = 0; continue; } die("not a rev '%s'", line); @@ -2375,6 +2473,9 @@ static void get_object_list(int ac, const char **av) die("bad revision '%s'", line); } + if (use_bitmap_index && !get_object_list_from_bitmap(&revs)) + return; + if (prepare_revision_walk(&revs)) die("revision walk setup failed"); mark_edges_uninteresting(&revs, show_edge); @@ -2504,6 +2605,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) N_("pack compression level")), OPT_SET_INT(0, "keep-true-parents", &grafts_replace_parents, N_("do not hide commits by grafts"), 0), + OPT_BOOL(0, "use-bitmap-index", &use_bitmap_index, + N_("use a bitmap index if available to speed up counting objects")), + OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index, + N_("write a bitmap index together with the pack index")), OPT_END(), }; @@ -2570,6 +2675,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (keep_unreachable && unpack_unreachable) die("--keep-unreachable and --unpack-unreachable are incompatible."); + if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) + use_bitmap_index = 0; + + if (pack_to_stdout || !rev_list_all) + write_bitmap_index = 0; + if (progress && all_progress_implied) progress = 2; diff --git a/builtin/repack.c b/builtin/repack.c index bb2314c9cb..49f5857627 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -94,7 +94,7 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list) static void remove_redundant_pack(const char *dir_name, const char *base_name) { - const char *exts[] = {".pack", ".idx", ".keep"}; + const char *exts[] = {".pack", ".idx", ".keep", ".bitmap"}; int i; struct strbuf buf = STRBUF_INIT; size_t plen; @@ -115,7 +115,14 @@ static void remove_redundant_pack(const char *dir_name, const char *base_name) int cmd_repack(int argc, const char **argv, const char *prefix) { - const char *exts[2] = {".pack", ".idx"}; + struct { + const char *name; + unsigned optional:1; + } exts[] = { + {".pack"}, + {".idx"}, + {".bitmap", 1}, + }; struct child_process cmd; struct string_list_item *item; struct argv_array cmd_args = ARGV_ARRAY_INIT; @@ -137,6 +144,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) int no_update_server_info = 0; int quiet = 0; int local = 0; + int write_bitmap = -1; struct option builtin_repack_options[] = { OPT_BIT('a', NULL, &pack_everything, @@ -155,6 +163,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) OPT__QUIET(&quiet, N_("be quiet")), OPT_BOOL('l', "local", &local, N_("pass --local to git-pack-objects")), + OPT_BOOL('b', "write-bitmap-index", &write_bitmap, + N_("write bitmap index")), OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"), N_("with -A, do not loosen objects older than this")), OPT_STRING(0, "window", &window, N_("n"), @@ -196,6 +206,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix) argv_array_pushf(&cmd_args, "--no-reuse-delta"); if (no_reuse_object) argv_array_pushf(&cmd_args, "--no-reuse-object"); + if (write_bitmap >= 0) + argv_array_pushf(&cmd_args, "--%swrite-bitmap-index", + write_bitmap ? 
"" : "no-"); if (pack_everything & ALL_INTO_ONE) { get_non_kept_pack_filenames(&existing_packs); @@ -256,17 +269,17 @@ int cmd_repack(int argc, const char **argv, const char *prefix) */ failed = 0; for_each_string_list_item(item, &names) { - for (ext = 0; ext < 2; ext++) { + for (ext = 0; ext < ARRAY_SIZE(exts); ext++) { char *fname, *fname_old; fname = mkpathdup("%s/pack-%s%s", packdir, - item->string, exts[ext]); + item->string, exts[ext].name); if (!file_exists(fname)) { free(fname); continue; } fname_old = mkpath("%s/old-%s%s", packdir, - item->string, exts[ext]); + item->string, exts[ext].name); if (file_exists(fname_old)) if (unlink(fname_old)) failed = 1; @@ -313,19 +326,23 @@ int cmd_repack(int argc, const char **argv, const char *prefix) /* Now the ones with the same name are out of the way... */ for_each_string_list_item(item, &names) { - for (ext = 0; ext < 2; ext++) { + for (ext = 0; ext < ARRAY_SIZE(exts); ext++) { char *fname, *fname_old; struct stat statbuffer; + int exists = 0; fname = mkpathdup("%s/pack-%s%s", - packdir, item->string, exts[ext]); + packdir, item->string, exts[ext].name); fname_old = mkpathdup("%s-%s%s", - packtmp, item->string, exts[ext]); + packtmp, item->string, exts[ext].name); if (!stat(fname_old, &statbuffer)) { statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH); chmod(fname_old, statbuffer.st_mode); + exists = 1; + } + if (exists || !exts[ext].optional) { + if (rename(fname_old, fname)) + die_errno(_("renaming '%s' failed"), fname_old); } - if (rename(fname_old, fname)) - die_errno(_("renaming '%s' failed"), fname_old); free(fname); free(fname_old); } @@ -333,12 +350,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix) /* Remove the "old-" files */ for_each_string_list_item(item, &names) { - for (ext = 0; ext < 2; ext++) { + for (ext = 0; ext < ARRAY_SIZE(exts); ext++) { char *fname; fname = mkpath("%s/old-%s%s", packdir, item->string, - exts[ext]); + exts[ext].name); if (remove_path(fname)) warning(_("removing '%s' failed"), fname); } diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 0745e2d053..9f92905379 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -3,6 +3,8 @@ #include "diff.h" #include "revision.h" #include "list-objects.h" +#include "pack.h" +#include "pack-bitmap.h" #include "builtin.h" #include "log-tree.h" #include "graph.h" @@ -257,6 +259,18 @@ static int show_bisect_vars(struct rev_list_info *info, int reaches, int all) return 0; } +static int show_object_fast( + const unsigned char *sha1, + enum object_type type, + int exclude, + uint32_t name_hash, + struct packed_git *found_pack, + off_t found_offset) +{ + fprintf(stdout, "%s\n", sha1_to_hex(sha1)); + return 1; +} + int cmd_rev_list(int argc, const char **argv, const char *prefix) { struct rev_info revs; @@ -265,6 +279,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) int bisect_list = 0; int bisect_show_vars = 0; int bisect_find_all = 0; + int use_bitmap_index = 0; git_config(git_default_config, NULL); init_revisions(&revs, prefix); @@ -306,6 +321,14 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) bisect_show_vars = 1; continue; } + if (!strcmp(arg, "--use-bitmap-index")) { + use_bitmap_index = 1; + continue; + } + if (!strcmp(arg, "--test-bitmap")) { + test_bitmap_walk(&revs); + return 0; + } usage(rev_list_usage); } @@ -333,6 +356,22 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) if (bisect_list) revs.limited = 1; + if (use_bitmap_index) { + if (revs.count && !revs.left_right && 
!revs.cherry_mark) { + uint32_t commit_count; + if (!prepare_bitmap_walk(&revs)) { + count_bitmap_commit_list(&commit_count, NULL, NULL, NULL); + printf("%d\n", commit_count); + return 0; + } + } else if (revs.tag_objects && revs.tree_objects && revs.blob_objects) { + if (!prepare_bitmap_walk(&revs)) { + traverse_bitmap_commit_list(&show_object_fast); + return 0; + } + } + } + if (prepare_revision_walk(&revs)) die("revision walk setup failed"); if (revs.tree_objects) @@ -809,6 +809,7 @@ extern int hash_sha1_file(const void *buf, unsigned long len, const char *type, extern int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1); extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *); extern int force_object_loose(const unsigned char *sha1, time_t mtime); +extern int git_open_noatime(const char *name); extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size); extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz); extern int parse_sha1_header(const char *hdr, unsigned long *sizep); diff --git a/compat/bswap.h b/compat/bswap.h index 5061214f73..120c6c1d37 100644 --- a/compat/bswap.h +++ b/compat/bswap.h @@ -17,7 +17,20 @@ static inline uint32_t default_swab32(uint32_t val) ((val & 0x000000ff) << 24)); } +static inline uint64_t default_bswap64(uint64_t val) +{ + return (((val & (uint64_t)0x00000000000000ffULL) << 56) | + ((val & (uint64_t)0x000000000000ff00ULL) << 40) | + ((val & (uint64_t)0x0000000000ff0000ULL) << 24) | + ((val & (uint64_t)0x00000000ff000000ULL) << 8) | + ((val & (uint64_t)0x000000ff00000000ULL) >> 8) | + ((val & (uint64_t)0x0000ff0000000000ULL) >> 24) | + ((val & (uint64_t)0x00ff000000000000ULL) >> 40) | + ((val & (uint64_t)0xff00000000000000ULL) >> 56)); +} + #undef bswap32 +#undef bswap64 #if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__)) @@ -32,15 +45,42 @@ static inline uint32_t git_bswap32(uint32_t x) return result; } +#define bswap64 git_bswap64 +#if defined(__x86_64__) +static inline uint64_t git_bswap64(uint64_t x) +{ + uint64_t result; + if (__builtin_constant_p(x)) + result = default_bswap64(x); + else + __asm__("bswap %q0" : "=r" (result) : "0" (x)); + return result; +} +#else +static inline uint64_t git_bswap64(uint64_t x) +{ + union { uint64_t i64; uint32_t i32[2]; } tmp, result; + if (__builtin_constant_p(x)) + result.i64 = default_bswap64(x); + else { + tmp.i64 = x; + result.i32[0] = git_bswap32(tmp.i32[1]); + result.i32[1] = git_bswap32(tmp.i32[0]); + } + return result.i64; +} +#endif + #elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64)) #include <stdlib.h> #define bswap32(x) _byteswap_ulong(x) +#define bswap64(x) _byteswap_uint64(x) #endif -#ifdef bswap32 +#if defined(bswap32) #undef ntohl #undef htonl @@ -48,3 +88,73 @@ static inline uint32_t git_bswap32(uint32_t x) #define htonl(x) bswap32(x) #endif + +#if defined(bswap64) + +#undef ntohll +#undef htonll +#define ntohll(x) bswap64(x) +#define htonll(x) bswap64(x) + +#else + +#undef ntohll +#undef htonll + +#if !defined(__BYTE_ORDER) +# if defined(BYTE_ORDER) && defined(LITTLE_ENDIAN) && defined(BIG_ENDIAN) +# define __BYTE_ORDER BYTE_ORDER +# define __LITTLE_ENDIAN LITTLE_ENDIAN +# define __BIG_ENDIAN BIG_ENDIAN +# endif +#endif + +#if !defined(__BYTE_ORDER) +# error "Cannot determine endianness" +#endif + +#if __BYTE_ORDER == __BIG_ENDIAN +# define ntohll(n) (n) +# define htonll(n) (n) +#else +# define 
ntohll(n) default_bswap64(n) +# define htonll(n) default_bswap64(n) +#endif + +#endif + +/* + * Performance might be improved if the CPU architecture is OK with + * unaligned 32-bit loads and a fast ntohl() is available. + * Otherwise fall back to byte loads and shifts which is portable, + * and is faster on architectures with memory alignment issues. + */ + +#if defined(__i386__) || defined(__x86_64__) || \ + defined(_M_IX86) || defined(_M_X64) || \ + defined(__ppc__) || defined(__ppc64__) || \ + defined(__powerpc__) || defined(__powerpc64__) || \ + defined(__s390__) || defined(__s390x__) + +#define get_be16(p) ntohs(*(unsigned short *)(p)) +#define get_be32(p) ntohl(*(unsigned int *)(p)) +#define put_be32(p, v) do { *(unsigned int *)(p) = htonl(v); } while (0) + +#else + +#define get_be16(p) ( \ + (*((unsigned char *)(p) + 0) << 8) | \ + (*((unsigned char *)(p) + 1) << 0) ) +#define get_be32(p) ( \ + (*((unsigned char *)(p) + 0) << 24) | \ + (*((unsigned char *)(p) + 1) << 16) | \ + (*((unsigned char *)(p) + 2) << 8) | \ + (*((unsigned char *)(p) + 3) << 0) ) +#define put_be32(p, v) do { \ + unsigned int __v = (v); \ + *((unsigned char *)(p) + 0) = __v >> 24; \ + *((unsigned char *)(p) + 1) = __v >> 16; \ + *((unsigned char *)(p) + 2) = __v >> 8; \ + *((unsigned char *)(p) + 3) = __v >> 0; } while (0) + +#endif diff --git a/ewah/bitmap.c b/ewah/bitmap.c new file mode 100644 index 0000000000..710e58c1bf --- /dev/null +++ b/ewah/bitmap.c @@ -0,0 +1,221 @@ +/** + * Copyright 2013, GitHub, Inc + * Copyright 2009-2013, Daniel Lemire, Cliff Moon, + * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 
+ */ +#include "git-compat-util.h" +#include "ewok.h" + +#define MASK(x) ((eword_t)1 << (x % BITS_IN_WORD)) +#define BLOCK(x) (x / BITS_IN_WORD) + +struct bitmap *bitmap_new(void) +{ + struct bitmap *bitmap = ewah_malloc(sizeof(struct bitmap)); + bitmap->words = ewah_calloc(32, sizeof(eword_t)); + bitmap->word_alloc = 32; + return bitmap; +} + +void bitmap_set(struct bitmap *self, size_t pos) +{ + size_t block = BLOCK(pos); + + if (block >= self->word_alloc) { + size_t old_size = self->word_alloc; + self->word_alloc = block * 2; + self->words = ewah_realloc(self->words, + self->word_alloc * sizeof(eword_t)); + + memset(self->words + old_size, 0x0, + (self->word_alloc - old_size) * sizeof(eword_t)); + } + + self->words[block] |= MASK(pos); +} + +void bitmap_clear(struct bitmap *self, size_t pos) +{ + size_t block = BLOCK(pos); + + if (block < self->word_alloc) + self->words[block] &= ~MASK(pos); +} + +int bitmap_get(struct bitmap *self, size_t pos) +{ + size_t block = BLOCK(pos); + return block < self->word_alloc && + (self->words[block] & MASK(pos)) != 0; +} + +struct ewah_bitmap *bitmap_to_ewah(struct bitmap *bitmap) +{ + struct ewah_bitmap *ewah = ewah_new(); + size_t i, running_empty_words = 0; + eword_t last_word = 0; + + for (i = 0; i < bitmap->word_alloc; ++i) { + if (bitmap->words[i] == 0) { + running_empty_words++; + continue; + } + + if (last_word != 0) + ewah_add(ewah, last_word); + + if (running_empty_words > 0) { + ewah_add_empty_words(ewah, 0, running_empty_words); + running_empty_words = 0; + } + + last_word = bitmap->words[i]; + } + + ewah_add(ewah, last_word); + return ewah; +} + +struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah) +{ + struct bitmap *bitmap = bitmap_new(); + struct ewah_iterator it; + eword_t blowup; + size_t i = 0; + + ewah_iterator_init(&it, ewah); + + while (ewah_iterator_next(&blowup, &it)) { + if (i >= bitmap->word_alloc) { + bitmap->word_alloc *= 1.5; + bitmap->words = ewah_realloc( + bitmap->words, bitmap->word_alloc * sizeof(eword_t)); + } + + bitmap->words[i++] = blowup; + } + + bitmap->word_alloc = i; + return bitmap; +} + +void bitmap_and_not(struct bitmap *self, struct bitmap *other) +{ + const size_t count = (self->word_alloc < other->word_alloc) ? 
+ self->word_alloc : other->word_alloc; + + size_t i; + + for (i = 0; i < count; ++i) + self->words[i] &= ~other->words[i]; +} + +void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other) +{ + size_t original_size = self->word_alloc; + size_t other_final = (other->bit_size / BITS_IN_WORD) + 1; + size_t i = 0; + struct ewah_iterator it; + eword_t word; + + if (self->word_alloc < other_final) { + self->word_alloc = other_final; + self->words = ewah_realloc(self->words, + self->word_alloc * sizeof(eword_t)); + memset(self->words + original_size, 0x0, + (self->word_alloc - original_size) * sizeof(eword_t)); + } + + ewah_iterator_init(&it, other); + + while (ewah_iterator_next(&word, &it)) + self->words[i++] |= word; +} + +void bitmap_each_bit(struct bitmap *self, ewah_callback callback, void *data) +{ + size_t pos = 0, i; + + for (i = 0; i < self->word_alloc; ++i) { + eword_t word = self->words[i]; + uint32_t offset; + + if (word == (eword_t)~0) { + for (offset = 0; offset < BITS_IN_WORD; ++offset) + callback(pos++, data); + } else { + for (offset = 0; offset < BITS_IN_WORD; ++offset) { + if ((word >> offset) == 0) + break; + + offset += ewah_bit_ctz64(word >> offset); + callback(pos + offset, data); + } + pos += BITS_IN_WORD; + } + } +} + +size_t bitmap_popcount(struct bitmap *self) +{ + size_t i, count = 0; + + for (i = 0; i < self->word_alloc; ++i) + count += ewah_bit_popcount64(self->words[i]); + + return count; +} + +int bitmap_equals(struct bitmap *self, struct bitmap *other) +{ + struct bitmap *big, *small; + size_t i; + + if (self->word_alloc < other->word_alloc) { + small = self; + big = other; + } else { + small = other; + big = self; + } + + for (i = 0; i < small->word_alloc; ++i) { + if (small->words[i] != big->words[i]) + return 0; + } + + for (; i < big->word_alloc; ++i) { + if (big->words[i] != 0) + return 0; + } + + return 1; +} + +void bitmap_reset(struct bitmap *bitmap) +{ + memset(bitmap->words, 0x0, bitmap->word_alloc * sizeof(eword_t)); +} + +void bitmap_free(struct bitmap *bitmap) +{ + if (bitmap == NULL) + return; + + free(bitmap->words); + free(bitmap); +} diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c new file mode 100644 index 0000000000..9ced2dadfe --- /dev/null +++ b/ewah/ewah_bitmap.c @@ -0,0 +1,714 @@ +/** + * Copyright 2013, GitHub, Inc + * Copyright 2009-2013, Daniel Lemire, Cliff Moon, + * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ +#include "git-compat-util.h" +#include "ewok.h" +#include "ewok_rlw.h" + +static inline size_t min_size(size_t a, size_t b) +{ + return a < b ? a : b; +} + +static inline size_t max_size(size_t a, size_t b) +{ + return a > b ? 
a : b; +} + +static inline void buffer_grow(struct ewah_bitmap *self, size_t new_size) +{ + size_t rlw_offset = (uint8_t *)self->rlw - (uint8_t *)self->buffer; + + if (self->alloc_size >= new_size) + return; + + self->alloc_size = new_size; + self->buffer = ewah_realloc(self->buffer, + self->alloc_size * sizeof(eword_t)); + self->rlw = self->buffer + (rlw_offset / sizeof(size_t)); +} + +static inline void buffer_push(struct ewah_bitmap *self, eword_t value) +{ + if (self->buffer_size + 1 >= self->alloc_size) + buffer_grow(self, self->buffer_size * 3 / 2); + + self->buffer[self->buffer_size++] = value; +} + +static void buffer_push_rlw(struct ewah_bitmap *self, eword_t value) +{ + buffer_push(self, value); + self->rlw = self->buffer + self->buffer_size - 1; +} + +static size_t add_empty_words(struct ewah_bitmap *self, int v, size_t number) +{ + size_t added = 0; + eword_t runlen, can_add; + + if (rlw_get_run_bit(self->rlw) != v && rlw_size(self->rlw) == 0) { + rlw_set_run_bit(self->rlw, v); + } else if (rlw_get_literal_words(self->rlw) != 0 || + rlw_get_run_bit(self->rlw) != v) { + buffer_push_rlw(self, 0); + if (v) rlw_set_run_bit(self->rlw, v); + added++; + } + + runlen = rlw_get_running_len(self->rlw); + can_add = min_size(number, RLW_LARGEST_RUNNING_COUNT - runlen); + + rlw_set_running_len(self->rlw, runlen + can_add); + number -= can_add; + + while (number >= RLW_LARGEST_RUNNING_COUNT) { + buffer_push_rlw(self, 0); + added++; + if (v) rlw_set_run_bit(self->rlw, v); + rlw_set_running_len(self->rlw, RLW_LARGEST_RUNNING_COUNT); + number -= RLW_LARGEST_RUNNING_COUNT; + } + + if (number > 0) { + buffer_push_rlw(self, 0); + added++; + + if (v) rlw_set_run_bit(self->rlw, v); + rlw_set_running_len(self->rlw, number); + } + + return added; +} + +size_t ewah_add_empty_words(struct ewah_bitmap *self, int v, size_t number) +{ + if (number == 0) + return 0; + + self->bit_size += number * BITS_IN_WORD; + return add_empty_words(self, v, number); +} + +static size_t add_literal(struct ewah_bitmap *self, eword_t new_data) +{ + eword_t current_num = rlw_get_literal_words(self->rlw); + + if (current_num >= RLW_LARGEST_LITERAL_COUNT) { + buffer_push_rlw(self, 0); + + rlw_set_literal_words(self->rlw, 1); + buffer_push(self, new_data); + return 2; + } + + rlw_set_literal_words(self->rlw, current_num + 1); + + /* sanity check */ + assert(rlw_get_literal_words(self->rlw) == current_num + 1); + + buffer_push(self, new_data); + return 1; +} + +void ewah_add_dirty_words( + struct ewah_bitmap *self, const eword_t *buffer, + size_t number, int negate) +{ + size_t literals, can_add; + + while (1) { + literals = rlw_get_literal_words(self->rlw); + can_add = min_size(number, RLW_LARGEST_LITERAL_COUNT - literals); + + rlw_set_literal_words(self->rlw, literals + can_add); + + if (self->buffer_size + can_add >= self->alloc_size) + buffer_grow(self, (self->buffer_size + can_add) * 3 / 2); + + if (negate) { + size_t i; + for (i = 0; i < can_add; ++i) + self->buffer[self->buffer_size++] = ~buffer[i]; + } else { + memcpy(self->buffer + self->buffer_size, + buffer, can_add * sizeof(eword_t)); + self->buffer_size += can_add; + } + + self->bit_size += can_add * BITS_IN_WORD; + + if (number - can_add == 0) + break; + + buffer_push_rlw(self, 0); + buffer += can_add; + number -= can_add; + } +} + +static size_t add_empty_word(struct ewah_bitmap *self, int v) +{ + int no_literal = (rlw_get_literal_words(self->rlw) == 0); + eword_t run_len = rlw_get_running_len(self->rlw); + + if (no_literal && run_len == 0) { + 
rlw_set_run_bit(self->rlw, v); + assert(rlw_get_run_bit(self->rlw) == v); + } + + if (no_literal && rlw_get_run_bit(self->rlw) == v && + run_len < RLW_LARGEST_RUNNING_COUNT) { + rlw_set_running_len(self->rlw, run_len + 1); + assert(rlw_get_running_len(self->rlw) == run_len + 1); + return 0; + } else { + buffer_push_rlw(self, 0); + + assert(rlw_get_running_len(self->rlw) == 0); + assert(rlw_get_run_bit(self->rlw) == 0); + assert(rlw_get_literal_words(self->rlw) == 0); + + rlw_set_run_bit(self->rlw, v); + assert(rlw_get_run_bit(self->rlw) == v); + + rlw_set_running_len(self->rlw, 1); + assert(rlw_get_running_len(self->rlw) == 1); + assert(rlw_get_literal_words(self->rlw) == 0); + return 1; + } +} + +size_t ewah_add(struct ewah_bitmap *self, eword_t word) +{ + self->bit_size += BITS_IN_WORD; + + if (word == 0) + return add_empty_word(self, 0); + + if (word == (eword_t)(~0)) + return add_empty_word(self, 1); + + return add_literal(self, word); +} + +void ewah_set(struct ewah_bitmap *self, size_t i) +{ + const size_t dist = + (i + BITS_IN_WORD) / BITS_IN_WORD - + (self->bit_size + BITS_IN_WORD - 1) / BITS_IN_WORD; + + assert(i >= self->bit_size); + + self->bit_size = i + 1; + + if (dist > 0) { + if (dist > 1) + add_empty_words(self, 0, dist - 1); + + add_literal(self, (eword_t)1 << (i % BITS_IN_WORD)); + return; + } + + if (rlw_get_literal_words(self->rlw) == 0) { + rlw_set_running_len(self->rlw, + rlw_get_running_len(self->rlw) - 1); + add_literal(self, (eword_t)1 << (i % BITS_IN_WORD)); + return; + } + + self->buffer[self->buffer_size - 1] |= + ((eword_t)1 << (i % BITS_IN_WORD)); + + /* check if we just completed a stream of 1s */ + if (self->buffer[self->buffer_size - 1] == (eword_t)(~0)) { + self->buffer[--self->buffer_size] = 0; + rlw_set_literal_words(self->rlw, + rlw_get_literal_words(self->rlw) - 1); + add_empty_word(self, 1); + } +} + +void ewah_each_bit(struct ewah_bitmap *self, void (*callback)(size_t, void*), void *payload) +{ + size_t pos = 0; + size_t pointer = 0; + size_t k; + + while (pointer < self->buffer_size) { + eword_t *word = &self->buffer[pointer]; + + if (rlw_get_run_bit(word)) { + size_t len = rlw_get_running_len(word) * BITS_IN_WORD; + for (k = 0; k < len; ++k, ++pos) + callback(pos, payload); + } else { + pos += rlw_get_running_len(word) * BITS_IN_WORD; + } + + ++pointer; + + for (k = 0; k < rlw_get_literal_words(word); ++k) { + int c; + + /* todo: zero count optimization */ + for (c = 0; c < BITS_IN_WORD; ++c, ++pos) { + if ((self->buffer[pointer] & ((eword_t)1 << c)) != 0) + callback(pos, payload); + } + + ++pointer; + } + } +} + +struct ewah_bitmap *ewah_new(void) +{ + struct ewah_bitmap *self; + + self = ewah_malloc(sizeof(struct ewah_bitmap)); + if (self == NULL) + return NULL; + + self->buffer = ewah_malloc(32 * sizeof(eword_t)); + self->alloc_size = 32; + + ewah_clear(self); + return self; +} + +void ewah_clear(struct ewah_bitmap *self) +{ + self->buffer_size = 1; + self->buffer[0] = 0; + self->bit_size = 0; + self->rlw = self->buffer; +} + +void ewah_free(struct ewah_bitmap *self) +{ + if (!self) + return; + + if (self->alloc_size) + free(self->buffer); + + free(self); +} + +static void read_new_rlw(struct ewah_iterator *it) +{ + const eword_t *word = NULL; + + it->literals = 0; + it->compressed = 0; + + while (1) { + word = &it->buffer[it->pointer]; + + it->rl = rlw_get_running_len(word); + it->lw = rlw_get_literal_words(word); + it->b = rlw_get_run_bit(word); + + if (it->rl || it->lw) + return; + + if (it->pointer < it->buffer_size - 1) { + it->pointer++; + 
} else { + it->pointer = it->buffer_size; + return; + } + } +} + +int ewah_iterator_next(eword_t *next, struct ewah_iterator *it) +{ + if (it->pointer >= it->buffer_size) + return 0; + + if (it->compressed < it->rl) { + it->compressed++; + *next = it->b ? (eword_t)(~0) : 0; + } else { + assert(it->literals < it->lw); + + it->literals++; + it->pointer++; + + assert(it->pointer < it->buffer_size); + + *next = it->buffer[it->pointer]; + } + + if (it->compressed == it->rl && it->literals == it->lw) { + if (++it->pointer < it->buffer_size) + read_new_rlw(it); + } + + return 1; +} + +void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent) +{ + it->buffer = parent->buffer; + it->buffer_size = parent->buffer_size; + it->pointer = 0; + + it->lw = 0; + it->rl = 0; + it->compressed = 0; + it->literals = 0; + it->b = 0; + + if (it->pointer < it->buffer_size) + read_new_rlw(it); +} + +void ewah_not(struct ewah_bitmap *self) +{ + size_t pointer = 0; + + while (pointer < self->buffer_size) { + eword_t *word = &self->buffer[pointer]; + size_t literals, k; + + rlw_xor_run_bit(word); + ++pointer; + + literals = rlw_get_literal_words(word); + for (k = 0; k < literals; ++k) { + self->buffer[pointer] = ~self->buffer[pointer]; + ++pointer; + } + } +} + +void ewah_xor( + struct ewah_bitmap *ewah_i, + struct ewah_bitmap *ewah_j, + struct ewah_bitmap *out) +{ + struct rlw_iterator rlw_i; + struct rlw_iterator rlw_j; + size_t literals; + + rlwit_init(&rlw_i, ewah_i); + rlwit_init(&rlw_j, ewah_j); + + while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) { + while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) { + struct rlw_iterator *prey, *predator; + size_t index; + int negate_words; + + if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) { + prey = &rlw_i; + predator = &rlw_j; + } else { + prey = &rlw_j; + predator = &rlw_i; + } + + negate_words = !!predator->rlw.running_bit; + index = rlwit_discharge(prey, out, + predator->rlw.running_len, negate_words); + + ewah_add_empty_words(out, negate_words, + predator->rlw.running_len - index); + + rlwit_discard_first_words(predator, + predator->rlw.running_len); + } + + literals = min_size( + rlw_i.rlw.literal_words, + rlw_j.rlw.literal_words); + + if (literals) { + size_t k; + + for (k = 0; k < literals; ++k) { + ewah_add(out, + rlw_i.buffer[rlw_i.literal_word_start + k] ^ + rlw_j.buffer[rlw_j.literal_word_start + k] + ); + } + + rlwit_discard_first_words(&rlw_i, literals); + rlwit_discard_first_words(&rlw_j, literals); + } + } + + if (rlwit_word_size(&rlw_i) > 0) + rlwit_discharge(&rlw_i, out, ~0, 0); + else + rlwit_discharge(&rlw_j, out, ~0, 0); + + out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size); +} + +void ewah_and( + struct ewah_bitmap *ewah_i, + struct ewah_bitmap *ewah_j, + struct ewah_bitmap *out) +{ + struct rlw_iterator rlw_i; + struct rlw_iterator rlw_j; + size_t literals; + + rlwit_init(&rlw_i, ewah_i); + rlwit_init(&rlw_j, ewah_j); + + while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) { + while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) { + struct rlw_iterator *prey, *predator; + + if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) { + prey = &rlw_i; + predator = &rlw_j; + } else { + prey = &rlw_j; + predator = &rlw_i; + } + + if (predator->rlw.running_bit == 0) { + ewah_add_empty_words(out, 0, + predator->rlw.running_len); + rlwit_discard_first_words(prey, + predator->rlw.running_len); + rlwit_discard_first_words(predator, + predator->rlw.running_len); + } else { + 
size_t index = rlwit_discharge(prey, out, + predator->rlw.running_len, 0); + ewah_add_empty_words(out, 0, + predator->rlw.running_len - index); + rlwit_discard_first_words(predator, + predator->rlw.running_len); + } + } + + literals = min_size( + rlw_i.rlw.literal_words, + rlw_j.rlw.literal_words); + + if (literals) { + size_t k; + + for (k = 0; k < literals; ++k) { + ewah_add(out, + rlw_i.buffer[rlw_i.literal_word_start + k] & + rlw_j.buffer[rlw_j.literal_word_start + k] + ); + } + + rlwit_discard_first_words(&rlw_i, literals); + rlwit_discard_first_words(&rlw_j, literals); + } + } + + if (rlwit_word_size(&rlw_i) > 0) + rlwit_discharge_empty(&rlw_i, out); + else + rlwit_discharge_empty(&rlw_j, out); + + out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size); +} + +void ewah_and_not( + struct ewah_bitmap *ewah_i, + struct ewah_bitmap *ewah_j, + struct ewah_bitmap *out) +{ + struct rlw_iterator rlw_i; + struct rlw_iterator rlw_j; + size_t literals; + + rlwit_init(&rlw_i, ewah_i); + rlwit_init(&rlw_j, ewah_j); + + while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) { + while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) { + struct rlw_iterator *prey, *predator; + + if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) { + prey = &rlw_i; + predator = &rlw_j; + } else { + prey = &rlw_j; + predator = &rlw_i; + } + + if ((predator->rlw.running_bit && prey == &rlw_i) || + (!predator->rlw.running_bit && prey != &rlw_i)) { + ewah_add_empty_words(out, 0, + predator->rlw.running_len); + rlwit_discard_first_words(prey, + predator->rlw.running_len); + rlwit_discard_first_words(predator, + predator->rlw.running_len); + } else { + size_t index; + int negate_words; + + negate_words = (&rlw_i != prey); + index = rlwit_discharge(prey, out, + predator->rlw.running_len, negate_words); + ewah_add_empty_words(out, negate_words, + predator->rlw.running_len - index); + rlwit_discard_first_words(predator, + predator->rlw.running_len); + } + } + + literals = min_size( + rlw_i.rlw.literal_words, + rlw_j.rlw.literal_words); + + if (literals) { + size_t k; + + for (k = 0; k < literals; ++k) { + ewah_add(out, + rlw_i.buffer[rlw_i.literal_word_start + k] & + ~(rlw_j.buffer[rlw_j.literal_word_start + k]) + ); + } + + rlwit_discard_first_words(&rlw_i, literals); + rlwit_discard_first_words(&rlw_j, literals); + } + } + + if (rlwit_word_size(&rlw_i) > 0) + rlwit_discharge(&rlw_i, out, ~0, 0); + else + rlwit_discharge_empty(&rlw_j, out); + + out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size); +} + +void ewah_or( + struct ewah_bitmap *ewah_i, + struct ewah_bitmap *ewah_j, + struct ewah_bitmap *out) +{ + struct rlw_iterator rlw_i; + struct rlw_iterator rlw_j; + size_t literals; + + rlwit_init(&rlw_i, ewah_i); + rlwit_init(&rlw_j, ewah_j); + + while (rlwit_word_size(&rlw_i) > 0 && rlwit_word_size(&rlw_j) > 0) { + while (rlw_i.rlw.running_len > 0 || rlw_j.rlw.running_len > 0) { + struct rlw_iterator *prey, *predator; + + if (rlw_i.rlw.running_len < rlw_j.rlw.running_len) { + prey = &rlw_i; + predator = &rlw_j; + } else { + prey = &rlw_j; + predator = &rlw_i; + } + + if (predator->rlw.running_bit) { + ewah_add_empty_words(out, 0, + predator->rlw.running_len); + rlwit_discard_first_words(prey, + predator->rlw.running_len); + rlwit_discard_first_words(predator, + predator->rlw.running_len); + } else { + size_t index = rlwit_discharge(prey, out, + predator->rlw.running_len, 0); + ewah_add_empty_words(out, 0, + predator->rlw.running_len - index); + rlwit_discard_first_words(predator, + 
predator->rlw.running_len); + } + } + + literals = min_size( + rlw_i.rlw.literal_words, + rlw_j.rlw.literal_words); + + if (literals) { + size_t k; + + for (k = 0; k < literals; ++k) { + ewah_add(out, + rlw_i.buffer[rlw_i.literal_word_start + k] | + rlw_j.buffer[rlw_j.literal_word_start + k] + ); + } + + rlwit_discard_first_words(&rlw_i, literals); + rlwit_discard_first_words(&rlw_j, literals); + } + } + + if (rlwit_word_size(&rlw_i) > 0) + rlwit_discharge(&rlw_i, out, ~0, 0); + else + rlwit_discharge(&rlw_j, out, ~0, 0); + + out->bit_size = max_size(ewah_i->bit_size, ewah_j->bit_size); +} + + +#define BITMAP_POOL_MAX 16 +static struct ewah_bitmap *bitmap_pool[BITMAP_POOL_MAX]; +static size_t bitmap_pool_size; + +struct ewah_bitmap *ewah_pool_new(void) +{ + if (bitmap_pool_size) + return bitmap_pool[--bitmap_pool_size]; + + return ewah_new(); +} + +void ewah_pool_free(struct ewah_bitmap *self) +{ + if (self == NULL) + return; + + if (bitmap_pool_size == BITMAP_POOL_MAX || + self->alloc_size == 0) { + ewah_free(self); + return; + } + + ewah_clear(self); + bitmap_pool[bitmap_pool_size++] = self; +} + +uint32_t ewah_checksum(struct ewah_bitmap *self) +{ + const uint8_t *p = (uint8_t *)self->buffer; + uint32_t crc = (uint32_t)self->bit_size; + size_t size = self->buffer_size * sizeof(eword_t); + + while (size--) + crc = (crc << 5) - crc + (uint32_t)*p++; + + return crc; +} diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c new file mode 100644 index 0000000000..f7f700ef51 --- /dev/null +++ b/ewah/ewah_io.c @@ -0,0 +1,204 @@ +/** + * Copyright 2013, GitHub, Inc + * Copyright 2009-2013, Daniel Lemire, Cliff Moon, + * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 
+ */ +#include "git-compat-util.h" +#include "ewok.h" + +int ewah_serialize_native(struct ewah_bitmap *self, int fd) +{ + uint32_t write32; + size_t to_write = self->buffer_size * 8; + + /* 32 bit -- bit size for the map */ + write32 = (uint32_t)self->bit_size; + if (write(fd, &write32, 4) != 4) + return -1; + + /** 32 bit -- number of compressed 64-bit words */ + write32 = (uint32_t)self->buffer_size; + if (write(fd, &write32, 4) != 4) + return -1; + + if (write(fd, self->buffer, to_write) != to_write) + return -1; + + /** 32 bit -- position for the RLW */ + write32 = self->rlw - self->buffer; + if (write(fd, &write32, 4) != 4) + return -1; + + return (3 * 4) + to_write; +} + +int ewah_serialize_to(struct ewah_bitmap *self, + int (*write_fun)(void *, const void *, size_t), + void *data) +{ + size_t i; + eword_t dump[2048]; + const size_t words_per_dump = sizeof(dump) / sizeof(eword_t); + uint32_t bitsize, word_count, rlw_pos; + + const eword_t *buffer; + size_t words_left; + + /* 32 bit -- bit size for the map */ + bitsize = htonl((uint32_t)self->bit_size); + if (write_fun(data, &bitsize, 4) != 4) + return -1; + + /** 32 bit -- number of compressed 64-bit words */ + word_count = htonl((uint32_t)self->buffer_size); + if (write_fun(data, &word_count, 4) != 4) + return -1; + + /** 64 bit x N -- compressed words */ + buffer = self->buffer; + words_left = self->buffer_size; + + while (words_left >= words_per_dump) { + for (i = 0; i < words_per_dump; ++i, ++buffer) + dump[i] = htonll(*buffer); + + if (write_fun(data, dump, sizeof(dump)) != sizeof(dump)) + return -1; + + words_left -= words_per_dump; + } + + if (words_left) { + for (i = 0; i < words_left; ++i, ++buffer) + dump[i] = htonll(*buffer); + + if (write_fun(data, dump, words_left * 8) != words_left * 8) + return -1; + } + + /** 32 bit -- position for the RLW */ + rlw_pos = (uint8_t*)self->rlw - (uint8_t *)self->buffer; + rlw_pos = htonl(rlw_pos / sizeof(eword_t)); + + if (write_fun(data, &rlw_pos, 4) != 4) + return -1; + + return (3 * 4) + (self->buffer_size * 8); +} + +static int write_helper(void *fd, const void *buf, size_t len) +{ + return write((intptr_t)fd, buf, len); +} + +int ewah_serialize(struct ewah_bitmap *self, int fd) +{ + return ewah_serialize_to(self, write_helper, (void *)(intptr_t)fd); +} + +int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len) +{ + uint8_t *ptr = map; + size_t i; + + self->bit_size = get_be32(ptr); + ptr += sizeof(uint32_t); + + self->buffer_size = self->alloc_size = get_be32(ptr); + ptr += sizeof(uint32_t); + + self->buffer = ewah_realloc(self->buffer, + self->alloc_size * sizeof(eword_t)); + + if (!self->buffer) + return -1; + + /* + * Copy the raw data for the bitmap as a whole chunk; + * if we're in a little-endian platform, we'll perform + * the endianness conversion in a separate pass to ensure + * we're loading 8-byte aligned words. 
+ */ + memcpy(self->buffer, ptr, self->buffer_size * sizeof(uint64_t)); + ptr += self->buffer_size * sizeof(uint64_t); + + for (i = 0; i < self->buffer_size; ++i) + self->buffer[i] = ntohll(self->buffer[i]); + + self->rlw = self->buffer + get_be32(ptr); + + return (3 * 4) + (self->buffer_size * 8); +} + +int ewah_deserialize(struct ewah_bitmap *self, int fd) +{ + size_t i; + eword_t dump[2048]; + const size_t words_per_dump = sizeof(dump) / sizeof(eword_t); + uint32_t bitsize, word_count, rlw_pos; + + eword_t *buffer = NULL; + size_t words_left; + + ewah_clear(self); + + /* 32 bit -- bit size for the map */ + if (read(fd, &bitsize, 4) != 4) + return -1; + + self->bit_size = (size_t)ntohl(bitsize); + + /** 32 bit -- number of compressed 64-bit words */ + if (read(fd, &word_count, 4) != 4) + return -1; + + self->buffer_size = self->alloc_size = (size_t)ntohl(word_count); + self->buffer = ewah_realloc(self->buffer, + self->alloc_size * sizeof(eword_t)); + + if (!self->buffer) + return -1; + + /** 64 bit x N -- compressed words */ + buffer = self->buffer; + words_left = self->buffer_size; + + while (words_left >= words_per_dump) { + if (read(fd, dump, sizeof(dump)) != sizeof(dump)) + return -1; + + for (i = 0; i < words_per_dump; ++i, ++buffer) + *buffer = ntohll(dump[i]); + + words_left -= words_per_dump; + } + + if (words_left) { + if (read(fd, dump, words_left * 8) != words_left * 8) + return -1; + + for (i = 0; i < words_left; ++i, ++buffer) + *buffer = ntohll(dump[i]); + } + + /** 32 bit -- position for the RLW */ + if (read(fd, &rlw_pos, 4) != 4) + return -1; + + self->rlw = self->buffer + ntohl(rlw_pos); + return 0; +} diff --git a/ewah/ewah_rlw.c b/ewah/ewah_rlw.c new file mode 100644 index 0000000000..c723f1aefd --- /dev/null +++ b/ewah/ewah_rlw.c @@ -0,0 +1,115 @@ +/** + * Copyright 2013, GitHub, Inc + * Copyright 2009-2013, Daniel Lemire, Cliff Moon, + * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 
+ */ +#include "git-compat-util.h" +#include "ewok.h" +#include "ewok_rlw.h" + +static inline int next_word(struct rlw_iterator *it) +{ + if (it->pointer >= it->size) + return 0; + + it->rlw.word = &it->buffer[it->pointer]; + it->pointer += rlw_get_literal_words(it->rlw.word) + 1; + + it->rlw.literal_words = rlw_get_literal_words(it->rlw.word); + it->rlw.running_len = rlw_get_running_len(it->rlw.word); + it->rlw.running_bit = rlw_get_run_bit(it->rlw.word); + it->rlw.literal_word_offset = 0; + + return 1; +} + +void rlwit_init(struct rlw_iterator *it, struct ewah_bitmap *from_ewah) +{ + it->buffer = from_ewah->buffer; + it->size = from_ewah->buffer_size; + it->pointer = 0; + + next_word(it); + + it->literal_word_start = rlwit_literal_words(it) + + it->rlw.literal_word_offset; +} + +void rlwit_discard_first_words(struct rlw_iterator *it, size_t x) +{ + while (x > 0) { + size_t discard; + + if (it->rlw.running_len > x) { + it->rlw.running_len -= x; + return; + } + + x -= it->rlw.running_len; + it->rlw.running_len = 0; + + discard = (x > it->rlw.literal_words) ? it->rlw.literal_words : x; + + it->literal_word_start += discard; + it->rlw.literal_words -= discard; + x -= discard; + + if (x > 0 || rlwit_word_size(it) == 0) { + if (!next_word(it)) + break; + + it->literal_word_start = + rlwit_literal_words(it) + it->rlw.literal_word_offset; + } + } +} + +size_t rlwit_discharge( + struct rlw_iterator *it, struct ewah_bitmap *out, size_t max, int negate) +{ + size_t index = 0; + + while (index < max && rlwit_word_size(it) > 0) { + size_t pd, pl = it->rlw.running_len; + + if (index + pl > max) + pl = max - index; + + ewah_add_empty_words(out, it->rlw.running_bit ^ negate, pl); + index += pl; + + pd = it->rlw.literal_words; + if (pd + index > max) + pd = max - index; + + ewah_add_dirty_words(out, + it->buffer + it->literal_word_start, pd, negate); + + rlwit_discard_first_words(it, pd + pl); + index += pd; + } + + return index; +} + +void rlwit_discharge_empty(struct rlw_iterator *it, struct ewah_bitmap *out) +{ + while (rlwit_word_size(it) > 0) { + ewah_add_empty_words(out, 0, rlwit_word_size(it)); + rlwit_discard_first_words(it, rlwit_word_size(it)); + } +} diff --git a/ewah/ewok.h b/ewah/ewok.h new file mode 100644 index 0000000000..43adeb5c68 --- /dev/null +++ b/ewah/ewok.h @@ -0,0 +1,233 @@ +/** + * Copyright 2013, GitHub, Inc + * Copyright 2009-2013, Daniel Lemire, Cliff Moon, + * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 
+ */ +#ifndef __EWOK_BITMAP_H__ +#define __EWOK_BITMAP_H__ + +#ifndef ewah_malloc +# define ewah_malloc xmalloc +#endif +#ifndef ewah_realloc +# define ewah_realloc xrealloc +#endif +#ifndef ewah_calloc +# define ewah_calloc xcalloc +#endif + +typedef uint64_t eword_t; +#define BITS_IN_WORD (sizeof(eword_t) * 8) + +/** + * Do not use __builtin_popcountll. The GCC implementation + * is notoriously slow on all platforms. + * + * See: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041 + */ +static inline uint32_t ewah_bit_popcount64(uint64_t x) +{ + x = (x & 0x5555555555555555ULL) + ((x >> 1) & 0x5555555555555555ULL); + x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); + x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL); + return (x * 0x0101010101010101ULL) >> 56; +} + +#ifdef __GNUC__ +#define ewah_bit_ctz64(x) __builtin_ctzll(x) +#else +static inline int ewah_bit_ctz64(uint64_t x) +{ + int n = 0; + if ((x & 0xffffffff) == 0) { x >>= 32; n += 32; } + if ((x & 0xffff) == 0) { x >>= 16; n += 16; } + if ((x & 0xff) == 0) { x >>= 8; n += 8; } + if ((x & 0xf) == 0) { x >>= 4; n += 4; } + if ((x & 0x3) == 0) { x >>= 2; n += 2; } + if ((x & 0x1) == 0) { x >>= 1; n += 1; } + return n + !x; +} +#endif + +struct ewah_bitmap { + eword_t *buffer; + size_t buffer_size; + size_t alloc_size; + size_t bit_size; + eword_t *rlw; +}; + +typedef void (*ewah_callback)(size_t pos, void *); + +struct ewah_bitmap *ewah_pool_new(void); +void ewah_pool_free(struct ewah_bitmap *self); + +/** + * Allocate a new EWAH Compressed bitmap + */ +struct ewah_bitmap *ewah_new(void); + +/** + * Clear all the bits in the bitmap. Does not free or resize + * memory. + */ +void ewah_clear(struct ewah_bitmap *self); + +/** + * Free all the memory of the bitmap + */ +void ewah_free(struct ewah_bitmap *self); + +int ewah_serialize_to(struct ewah_bitmap *self, + int (*write_fun)(void *out, const void *buf, size_t len), + void *out); +int ewah_serialize(struct ewah_bitmap *self, int fd); +int ewah_serialize_native(struct ewah_bitmap *self, int fd); + +int ewah_deserialize(struct ewah_bitmap *self, int fd); +int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len); +int ewah_read_mmap_native(struct ewah_bitmap *self, void *map, size_t len); + +uint32_t ewah_checksum(struct ewah_bitmap *self); + +/** + * Logical not (bitwise negation) in-place on the bitmap + * + * This operation is linear time based on the size of the bitmap. + */ +void ewah_not(struct ewah_bitmap *self); + +/** + * Call the given callback with the position of every single bit + * that has been set on the bitmap. + * + * This is an efficient operation that does not fully decompress + * the bitmap. + */ +void ewah_each_bit(struct ewah_bitmap *self, ewah_callback callback, void *payload); + +/** + * Set a given bit on the bitmap. + * + * The bit at position `pos` will be set to true. Because of the + * way that the bitmap is compressed, a set bit cannot be unset + * later on. + * + * Furthermore, since the bitmap uses streaming compression, bits + * can only set incrementally. + * + * E.g. 
+ * ewah_set(bitmap, 1); // ok + * ewah_set(bitmap, 76); // ok + * ewah_set(bitmap, 77); // ok + * ewah_set(bitmap, 8712800127); // ok + * ewah_set(bitmap, 25); // failed, assert raised + */ +void ewah_set(struct ewah_bitmap *self, size_t i); + +struct ewah_iterator { + const eword_t *buffer; + size_t buffer_size; + + size_t pointer; + eword_t compressed, literals; + eword_t rl, lw; + int b; +}; + +/** + * Initialize a new iterator to run through the bitmap in uncompressed form. + * + * The iterator can be stack allocated. The underlying bitmap must not be freed + * before the iteration is over. + * + * E.g. + * + * struct ewah_bitmap *bitmap = ewah_new(); + * struct ewah_iterator it; + * + * ewah_iterator_init(&it, bitmap); + */ +void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent); + +/** + * Yield every single word in the bitmap in uncompressed form. This is: + * yield single words (32-64 bits) where each bit represents an actual + * bit from the bitmap. + * + * Return: true if a word was yield, false if there are no words left + */ +int ewah_iterator_next(eword_t *next, struct ewah_iterator *it); + +void ewah_or( + struct ewah_bitmap *ewah_i, + struct ewah_bitmap *ewah_j, + struct ewah_bitmap *out); + +void ewah_and_not( + struct ewah_bitmap *ewah_i, + struct ewah_bitmap *ewah_j, + struct ewah_bitmap *out); + +void ewah_xor( + struct ewah_bitmap *ewah_i, + struct ewah_bitmap *ewah_j, + struct ewah_bitmap *out); + +void ewah_and( + struct ewah_bitmap *ewah_i, + struct ewah_bitmap *ewah_j, + struct ewah_bitmap *out); + +/** + * Direct word access + */ +size_t ewah_add_empty_words(struct ewah_bitmap *self, int v, size_t number); +void ewah_add_dirty_words( + struct ewah_bitmap *self, const eword_t *buffer, size_t number, int negate); +size_t ewah_add(struct ewah_bitmap *self, eword_t word); + + +/** + * Uncompressed, old-school bitmap that can be efficiently compressed + * into an `ewah_bitmap`. + */ +struct bitmap { + eword_t *words; + size_t word_alloc; +}; + +struct bitmap *bitmap_new(void); +void bitmap_set(struct bitmap *self, size_t pos); +void bitmap_clear(struct bitmap *self, size_t pos); +int bitmap_get(struct bitmap *self, size_t pos); +void bitmap_reset(struct bitmap *self); +void bitmap_free(struct bitmap *self); +int bitmap_equals(struct bitmap *self, struct bitmap *other); +int bitmap_is_subset(struct bitmap *self, struct bitmap *super); + +struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap); +struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah); + +void bitmap_and_not(struct bitmap *self, struct bitmap *other); +void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other); +void bitmap_or(struct bitmap *self, const struct bitmap *other); + +void bitmap_each_bit(struct bitmap *self, ewah_callback callback, void *data); +size_t bitmap_popcount(struct bitmap *self); + +#endif diff --git a/ewah/ewok_rlw.h b/ewah/ewok_rlw.h new file mode 100644 index 0000000000..63efdf9698 --- /dev/null +++ b/ewah/ewok_rlw.h @@ -0,0 +1,114 @@ +/** + * Copyright 2013, GitHub, Inc + * Copyright 2009-2013, Daniel Lemire, Cliff Moon, + * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ +#ifndef __EWOK_RLW_H__ +#define __EWOK_RLW_H__ + +#define RLW_RUNNING_BITS (sizeof(eword_t) * 4) +#define RLW_LITERAL_BITS (sizeof(eword_t) * 8 - 1 - RLW_RUNNING_BITS) + +#define RLW_LARGEST_RUNNING_COUNT (((eword_t)1 << RLW_RUNNING_BITS) - 1) +#define RLW_LARGEST_LITERAL_COUNT (((eword_t)1 << RLW_LITERAL_BITS) - 1) + +#define RLW_LARGEST_RUNNING_COUNT_SHIFT (RLW_LARGEST_RUNNING_COUNT << 1) + +#define RLW_RUNNING_LEN_PLUS_BIT (((eword_t)1 << (RLW_RUNNING_BITS + 1)) - 1) + +static int rlw_get_run_bit(const eword_t *word) +{ + return *word & (eword_t)1; +} + +static inline void rlw_set_run_bit(eword_t *word, int b) +{ + if (b) { + *word |= (eword_t)1; + } else { + *word &= (eword_t)(~1); + } +} + +static inline void rlw_xor_run_bit(eword_t *word) +{ + if (*word & 1) { + *word &= (eword_t)(~1); + } else { + *word |= (eword_t)1; + } +} + +static inline void rlw_set_running_len(eword_t *word, eword_t l) +{ + *word |= RLW_LARGEST_RUNNING_COUNT_SHIFT; + *word &= (l << 1) | (~RLW_LARGEST_RUNNING_COUNT_SHIFT); +} + +static inline eword_t rlw_get_running_len(const eword_t *word) +{ + return (*word >> 1) & RLW_LARGEST_RUNNING_COUNT; +} + +static inline eword_t rlw_get_literal_words(const eword_t *word) +{ + return *word >> (1 + RLW_RUNNING_BITS); +} + +static inline void rlw_set_literal_words(eword_t *word, eword_t l) +{ + *word |= ~RLW_RUNNING_LEN_PLUS_BIT; + *word &= (l << (RLW_RUNNING_BITS + 1)) | RLW_RUNNING_LEN_PLUS_BIT; +} + +static inline eword_t rlw_size(const eword_t *self) +{ + return rlw_get_running_len(self) + rlw_get_literal_words(self); +} + +struct rlw_iterator { + const eword_t *buffer; + size_t size; + size_t pointer; + size_t literal_word_start; + + struct { + const eword_t *word; + int literal_words; + int running_len; + int literal_word_offset; + int running_bit; + } rlw; +}; + +void rlwit_init(struct rlw_iterator *it, struct ewah_bitmap *bitmap); +void rlwit_discard_first_words(struct rlw_iterator *it, size_t x); +size_t rlwit_discharge( + struct rlw_iterator *it, struct ewah_bitmap *out, size_t max, int negate); +void rlwit_discharge_empty(struct rlw_iterator *it, struct ewah_bitmap *out); + +static inline size_t rlwit_word_size(struct rlw_iterator *it) +{ + return it->rlw.running_len + it->rlw.literal_words; +} + +static inline size_t rlwit_literal_words(struct rlw_iterator *it) +{ + return it->pointer - it->rlw.literal_words; +} + +#endif diff --git a/khash.h b/khash.h new file mode 100644 index 0000000000..57ff6038c5 --- /dev/null +++ b/khash.h @@ -0,0 +1,338 @@ +/* The MIT License + + Copyright (c) 2008, 2009, 2011 by Attractive Chaos <attractor@live.co.uk> + + Permission is hereby granted, free of charge, to any person obtaining + a copy of this software and associated documentation files (the + "Software"), to deal in the Software without restriction, including + without limitation the rights to use, copy, modify, merge, publish, + distribute, sublicense, and/or sell copies of the Software, and to + permit persons to whom the Software is furnished to do so, subject to + the following conditions: + + 
The above copyright notice and this permission notice shall be + included in all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + SOFTWARE. +*/ + +#ifndef __AC_KHASH_H +#define __AC_KHASH_H + +#define AC_VERSION_KHASH_H "0.2.8" + +typedef uint32_t khint32_t; +typedef uint64_t khint64_t; + +typedef khint32_t khint_t; +typedef khint_t khiter_t; + +#define __ac_isempty(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&2) +#define __ac_isdel(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&1) +#define __ac_iseither(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&3) +#define __ac_set_isdel_false(flag, i) (flag[i>>4]&=~(1ul<<((i&0xfU)<<1))) +#define __ac_set_isempty_false(flag, i) (flag[i>>4]&=~(2ul<<((i&0xfU)<<1))) +#define __ac_set_isboth_false(flag, i) (flag[i>>4]&=~(3ul<<((i&0xfU)<<1))) +#define __ac_set_isdel_true(flag, i) (flag[i>>4]|=1ul<<((i&0xfU)<<1)) + +#define __ac_fsize(m) ((m) < 16? 1 : (m)>>4) + +#define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x)) + +static inline khint_t __ac_X31_hash_string(const char *s) +{ + khint_t h = (khint_t)*s; + if (h) for (++s ; *s; ++s) h = (h << 5) - h + (khint_t)*s; + return h; +} + +#define kh_str_hash_func(key) __ac_X31_hash_string(key) +#define kh_str_hash_equal(a, b) (strcmp(a, b) == 0) + +static const double __ac_HASH_UPPER = 0.77; + +#define __KHASH_TYPE(name, khkey_t, khval_t) \ + typedef struct { \ + khint_t n_buckets, size, n_occupied, upper_bound; \ + khint32_t *flags; \ + khkey_t *keys; \ + khval_t *vals; \ + } kh_##name##_t; + +#define __KHASH_PROTOTYPES(name, khkey_t, khval_t) \ + extern kh_##name##_t *kh_init_##name(void); \ + extern void kh_destroy_##name(kh_##name##_t *h); \ + extern void kh_clear_##name(kh_##name##_t *h); \ + extern khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key); \ + extern int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets); \ + extern khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret); \ + extern void kh_del_##name(kh_##name##_t *h, khint_t x); + +#define __KHASH_IMPL(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \ + SCOPE kh_##name##_t *kh_init_##name(void) { \ + return (kh_##name##_t*)xcalloc(1, sizeof(kh_##name##_t)); \ + } \ + SCOPE void kh_destroy_##name(kh_##name##_t *h) \ + { \ + if (h) { \ + free((void *)h->keys); free(h->flags); \ + free((void *)h->vals); \ + free(h); \ + } \ + } \ + SCOPE void kh_clear_##name(kh_##name##_t *h) \ + { \ + if (h && h->flags) { \ + memset(h->flags, 0xaa, __ac_fsize(h->n_buckets) * sizeof(khint32_t)); \ + h->size = h->n_occupied = 0; \ + } \ + } \ + SCOPE khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key) \ + { \ + if (h->n_buckets) { \ + khint_t k, i, last, mask, step = 0; \ + mask = h->n_buckets - 1; \ + k = __hash_func(key); i = k & mask; \ + last = i; \ + while (!__ac_isempty(h->flags, i) && (__ac_isdel(h->flags, i) || !__hash_equal(h->keys[i], key))) { \ + i = (i + (++step)) & mask; \ + if (i == last) return h->n_buckets; \ + } \ + return __ac_iseither(h->flags, i)? 
h->n_buckets : i; \ + } else return 0; \ + } \ + SCOPE int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets) \ + { /* This function uses 0.25*n_buckets bytes of working space instead of [sizeof(key_t+val_t)+.25]*n_buckets. */ \ + khint32_t *new_flags = NULL; \ + khint_t j = 1; \ + { \ + kroundup32(new_n_buckets); \ + if (new_n_buckets < 4) new_n_buckets = 4; \ + if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0; /* requested size is too small */ \ + else { /* hash table size to be changed (shrink or expand); rehash */ \ + new_flags = (khint32_t*)xmalloc(__ac_fsize(new_n_buckets) * sizeof(khint32_t)); \ + if (!new_flags) return -1; \ + memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \ + if (h->n_buckets < new_n_buckets) { /* expand */ \ + khkey_t *new_keys = (khkey_t*)xrealloc((void *)h->keys, new_n_buckets * sizeof(khkey_t)); \ + if (!new_keys) return -1; \ + h->keys = new_keys; \ + if (kh_is_map) { \ + khval_t *new_vals = (khval_t*)xrealloc((void *)h->vals, new_n_buckets * sizeof(khval_t)); \ + if (!new_vals) return -1; \ + h->vals = new_vals; \ + } \ + } /* otherwise shrink */ \ + } \ + } \ + if (j) { /* rehashing is needed */ \ + for (j = 0; j != h->n_buckets; ++j) { \ + if (__ac_iseither(h->flags, j) == 0) { \ + khkey_t key = h->keys[j]; \ + khval_t val; \ + khint_t new_mask; \ + new_mask = new_n_buckets - 1; \ + if (kh_is_map) val = h->vals[j]; \ + __ac_set_isdel_true(h->flags, j); \ + while (1) { /* kick-out process; sort of like in Cuckoo hashing */ \ + khint_t k, i, step = 0; \ + k = __hash_func(key); \ + i = k & new_mask; \ + while (!__ac_isempty(new_flags, i)) i = (i + (++step)) & new_mask; \ + __ac_set_isempty_false(new_flags, i); \ + if (i < h->n_buckets && __ac_iseither(h->flags, i) == 0) { /* kick out the existing element */ \ + { khkey_t tmp = h->keys[i]; h->keys[i] = key; key = tmp; } \ + if (kh_is_map) { khval_t tmp = h->vals[i]; h->vals[i] = val; val = tmp; } \ + __ac_set_isdel_true(h->flags, i); /* mark it as deleted in the old hash table */ \ + } else { /* write the element and jump out of the loop */ \ + h->keys[i] = key; \ + if (kh_is_map) h->vals[i] = val; \ + break; \ + } \ + } \ + } \ + } \ + if (h->n_buckets > new_n_buckets) { /* shrink the hash table */ \ + h->keys = (khkey_t*)xrealloc((void *)h->keys, new_n_buckets * sizeof(khkey_t)); \ + if (kh_is_map) h->vals = (khval_t*)xrealloc((void *)h->vals, new_n_buckets * sizeof(khval_t)); \ + } \ + free(h->flags); /* free the working space */ \ + h->flags = new_flags; \ + h->n_buckets = new_n_buckets; \ + h->n_occupied = h->size; \ + h->upper_bound = (khint_t)(h->n_buckets * __ac_HASH_UPPER + 0.5); \ + } \ + return 0; \ + } \ + SCOPE khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret) \ + { \ + khint_t x; \ + if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \ + if (h->n_buckets > (h->size<<1)) { \ + if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \ + *ret = -1; return h->n_buckets; \ + } \ + } else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \ + *ret = -1; return h->n_buckets; \ + } \ + } /* TODO: to implement automatically shrinking; resize() already support shrinking */ \ + { \ + khint_t k, i, site, last, mask = h->n_buckets - 1, step = 0; \ + x = site = h->n_buckets; k = __hash_func(key); i = k & mask; \ + if (__ac_isempty(h->flags, i)) x = i; /* for speed up */ \ + else { \ + last = i; \ + while (!__ac_isempty(h->flags, i) && (__ac_isdel(h->flags, i) || 
!__hash_equal(h->keys[i], key))) { \ + if (__ac_isdel(h->flags, i)) site = i; \ + i = (i + (++step)) & mask; \ + if (i == last) { x = site; break; } \ + } \ + if (x == h->n_buckets) { \ + if (__ac_isempty(h->flags, i) && site != h->n_buckets) x = site; \ + else x = i; \ + } \ + } \ + } \ + if (__ac_isempty(h->flags, x)) { /* not present at all */ \ + h->keys[x] = key; \ + __ac_set_isboth_false(h->flags, x); \ + ++h->size; ++h->n_occupied; \ + *ret = 1; \ + } else if (__ac_isdel(h->flags, x)) { /* deleted */ \ + h->keys[x] = key; \ + __ac_set_isboth_false(h->flags, x); \ + ++h->size; \ + *ret = 2; \ + } else *ret = 0; /* Don't touch h->keys[x] if present and not deleted */ \ + return x; \ + } \ + SCOPE void kh_del_##name(kh_##name##_t *h, khint_t x) \ + { \ + if (x != h->n_buckets && !__ac_iseither(h->flags, x)) { \ + __ac_set_isdel_true(h->flags, x); \ + --h->size; \ + } \ + } + +#define KHASH_DECLARE(name, khkey_t, khval_t) \ + __KHASH_TYPE(name, khkey_t, khval_t) \ + __KHASH_PROTOTYPES(name, khkey_t, khval_t) + +#define KHASH_INIT2(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \ + __KHASH_TYPE(name, khkey_t, khval_t) \ + __KHASH_IMPL(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) + +#define KHASH_INIT(name, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \ + KHASH_INIT2(name, static inline, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) + +/* Other convenient macros... */ + +/*! @function + @abstract Test whether a bucket contains data. + @param h Pointer to the hash table [khash_t(name)*] + @param x Iterator to the bucket [khint_t] + @return 1 if containing data; 0 otherwise [int] + */ +#define kh_exist(h, x) (!__ac_iseither((h)->flags, (x))) + +/*! @function + @abstract Get key given an iterator + @param h Pointer to the hash table [khash_t(name)*] + @param x Iterator to the bucket [khint_t] + @return Key [type of keys] + */ +#define kh_key(h, x) ((h)->keys[x]) + +/*! @function + @abstract Get value given an iterator + @param h Pointer to the hash table [khash_t(name)*] + @param x Iterator to the bucket [khint_t] + @return Value [type of values] + @discussion For hash sets, calling this results in segfault. + */ +#define kh_val(h, x) ((h)->vals[x]) + +/*! @function + @abstract Alias of kh_val() + */ +#define kh_value(h, x) ((h)->vals[x]) + +/*! @function + @abstract Get the start iterator + @param h Pointer to the hash table [khash_t(name)*] + @return The start iterator [khint_t] + */ +#define kh_begin(h) (khint_t)(0) + +/*! @function + @abstract Get the end iterator + @param h Pointer to the hash table [khash_t(name)*] + @return The end iterator [khint_t] + */ +#define kh_end(h) ((h)->n_buckets) + +/*! @function + @abstract Get the number of elements in the hash table + @param h Pointer to the hash table [khash_t(name)*] + @return Number of elements in the hash table [khint_t] + */ +#define kh_size(h) ((h)->size) + +/*! @function + @abstract Get the number of buckets in the hash table + @param h Pointer to the hash table [khash_t(name)*] + @return Number of buckets in the hash table [khint_t] + */ +#define kh_n_buckets(h) ((h)->n_buckets) + +/*! 
@function + @abstract Iterate over the entries in the hash table + @param h Pointer to the hash table [khash_t(name)*] + @param kvar Variable to which key will be assigned + @param vvar Variable to which value will be assigned + @param code Block of code to execute + */ +#define kh_foreach(h, kvar, vvar, code) { khint_t __i; \ + for (__i = kh_begin(h); __i != kh_end(h); ++__i) { \ + if (!kh_exist(h,__i)) continue; \ + (kvar) = kh_key(h,__i); \ + (vvar) = kh_val(h,__i); \ + code; \ + } } + +/*! @function + @abstract Iterate over the values in the hash table + @param h Pointer to the hash table [khash_t(name)*] + @param vvar Variable to which value will be assigned + @param code Block of code to execute + */ +#define kh_foreach_value(h, vvar, code) { khint_t __i; \ + for (__i = kh_begin(h); __i != kh_end(h); ++__i) { \ + if (!kh_exist(h,__i)) continue; \ + (vvar) = kh_val(h,__i); \ + code; \ + } } + +static inline khint_t __kh_oid_hash(const unsigned char *oid) +{ + khint_t hash; + memcpy(&hash, oid, sizeof(hash)); + return hash; +} + +#define __kh_oid_cmp(a, b) (hashcmp(a, b) == 0) + +KHASH_INIT(sha1, const unsigned char *, void *, 1, __kh_oid_hash, __kh_oid_cmp) +typedef kh_sha1_t khash_sha1; + +KHASH_INIT(sha1_pos, const unsigned char *, int, 1, __kh_oid_hash, __kh_oid_cmp) +typedef kh_sha1_pos_t khash_sha1_pos; + +#endif /* __AC_KHASH_H */ diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c new file mode 100644 index 0000000000..1218befaf2 --- /dev/null +++ b/pack-bitmap-write.c @@ -0,0 +1,552 @@ +#include "cache.h" +#include "commit.h" +#include "tag.h" +#include "diff.h" +#include "revision.h" +#include "list-objects.h" +#include "progress.h" +#include "pack-revindex.h" +#include "pack.h" +#include "pack-bitmap.h" +#include "sha1-lookup.h" +#include "pack-objects.h" + +struct bitmapped_commit { + struct commit *commit; + struct ewah_bitmap *bitmap; + struct ewah_bitmap *write_as; + int flags; + int xor_offset; + uint32_t commit_pos; +}; + +struct bitmap_writer { + struct ewah_bitmap *commits; + struct ewah_bitmap *trees; + struct ewah_bitmap *blobs; + struct ewah_bitmap *tags; + + khash_sha1 *bitmaps; + khash_sha1 *reused; + struct packing_data *to_pack; + + struct bitmapped_commit *selected; + unsigned int selected_nr, selected_alloc; + + struct progress *progress; + int show_progress; + unsigned char pack_checksum[20]; +}; + +static struct bitmap_writer writer; + +void bitmap_writer_show_progress(int show) +{ + writer.show_progress = show; +} + +/** + * Build the initial type index for the packfile + */ +void bitmap_writer_build_type_index(struct pack_idx_entry **index, + uint32_t index_nr) +{ + uint32_t i; + + writer.commits = ewah_new(); + writer.trees = ewah_new(); + writer.blobs = ewah_new(); + writer.tags = ewah_new(); + + for (i = 0; i < index_nr; ++i) { + struct object_entry *entry = (struct object_entry *)index[i]; + enum object_type real_type; + + entry->in_pack_pos = i; + + switch (entry->type) { + case OBJ_COMMIT: + case OBJ_TREE: + case OBJ_BLOB: + case OBJ_TAG: + real_type = entry->type; + break; + + default: + real_type = sha1_object_info(entry->idx.sha1, NULL); + break; + } + + switch (real_type) { + case OBJ_COMMIT: + ewah_set(writer.commits, i); + break; + + case OBJ_TREE: + ewah_set(writer.trees, i); + break; + + case OBJ_BLOB: + ewah_set(writer.blobs, i); + break; + + case OBJ_TAG: + ewah_set(writer.tags, i); + break; + + default: + die("Missing type information for %s (%d/%d)", + sha1_to_hex(entry->idx.sha1), real_type, entry->type); + } + } +} + +/** + * 
Compute the actual bitmaps + */ +static struct object **seen_objects; +static unsigned int seen_objects_nr, seen_objects_alloc; + +static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused) +{ + if (writer.selected_nr >= writer.selected_alloc) { + writer.selected_alloc = (writer.selected_alloc + 32) * 2; + writer.selected = xrealloc(writer.selected, + writer.selected_alloc * sizeof(struct bitmapped_commit)); + } + + writer.selected[writer.selected_nr].commit = commit; + writer.selected[writer.selected_nr].bitmap = reused; + writer.selected[writer.selected_nr].flags = 0; + + writer.selected_nr++; +} + +static inline void mark_as_seen(struct object *object) +{ + ALLOC_GROW(seen_objects, seen_objects_nr + 1, seen_objects_alloc); + seen_objects[seen_objects_nr++] = object; +} + +static inline void reset_all_seen(void) +{ + unsigned int i; + for (i = 0; i < seen_objects_nr; ++i) { + seen_objects[i]->flags &= ~(SEEN | ADDED | SHOWN); + } + seen_objects_nr = 0; +} + +static uint32_t find_object_pos(const unsigned char *sha1) +{ + struct object_entry *entry = packlist_find(writer.to_pack, sha1, NULL); + + if (!entry) { + die("Failed to write bitmap index. Packfile doesn't have full closure " + "(object %s is missing)", sha1_to_hex(sha1)); + } + + return entry->in_pack_pos; +} + +static void show_object(struct object *object, const struct name_path *path, + const char *last, void *data) +{ + struct bitmap *base = data; + bitmap_set(base, find_object_pos(object->sha1)); + mark_as_seen(object); +} + +static void show_commit(struct commit *commit, void *data) +{ + mark_as_seen((struct object *)commit); +} + +static int +add_to_include_set(struct bitmap *base, struct commit *commit) +{ + khiter_t hash_pos; + uint32_t bitmap_pos = find_object_pos(commit->object.sha1); + + if (bitmap_get(base, bitmap_pos)) + return 0; + + hash_pos = kh_get_sha1(writer.bitmaps, commit->object.sha1); + if (hash_pos < kh_end(writer.bitmaps)) { + struct bitmapped_commit *bc = kh_value(writer.bitmaps, hash_pos); + bitmap_or_ewah(base, bc->bitmap); + return 0; + } + + bitmap_set(base, bitmap_pos); + return 1; +} + +static int +should_include(struct commit *commit, void *_data) +{ + struct bitmap *base = _data; + + if (!add_to_include_set(base, commit)) { + struct commit_list *parent = commit->parents; + + mark_as_seen((struct object *)commit); + + while (parent) { + parent->item->object.flags |= SEEN; + mark_as_seen((struct object *)parent->item); + parent = parent->next; + } + + return 0; + } + + return 1; +} + +static void compute_xor_offsets(void) +{ + static const int MAX_XOR_OFFSET_SEARCH = 10; + + int i, next = 0; + + while (next < writer.selected_nr) { + struct bitmapped_commit *stored = &writer.selected[next]; + + int best_offset = 0; + struct ewah_bitmap *best_bitmap = stored->bitmap; + struct ewah_bitmap *test_xor; + + for (i = 1; i <= MAX_XOR_OFFSET_SEARCH; ++i) { + int curr = next - i; + + if (curr < 0) + break; + + test_xor = ewah_pool_new(); + ewah_xor(writer.selected[curr].bitmap, stored->bitmap, test_xor); + + if (test_xor->buffer_size < best_bitmap->buffer_size) { + if (best_bitmap != stored->bitmap) + ewah_pool_free(best_bitmap); + + best_bitmap = test_xor; + best_offset = i; + } else { + ewah_pool_free(test_xor); + } + } + + stored->xor_offset = best_offset; + stored->write_as = best_bitmap; + + next++; + } +} + +void bitmap_writer_build(struct packing_data *to_pack) +{ + static const double REUSE_BITMAP_THRESHOLD = 0.2; + + int i, reuse_after, need_reset; + struct bitmap 
*base = bitmap_new(); + struct rev_info revs; + + writer.bitmaps = kh_init_sha1(); + writer.to_pack = to_pack; + + if (writer.show_progress) + writer.progress = start_progress("Building bitmaps", writer.selected_nr); + + init_revisions(&revs, NULL); + revs.tag_objects = 1; + revs.tree_objects = 1; + revs.blob_objects = 1; + revs.no_walk = 0; + + revs.include_check = should_include; + reset_revision_walk(); + + reuse_after = writer.selected_nr * REUSE_BITMAP_THRESHOLD; + need_reset = 0; + + for (i = writer.selected_nr - 1; i >= 0; --i) { + struct bitmapped_commit *stored; + struct object *object; + + khiter_t hash_pos; + int hash_ret; + + stored = &writer.selected[i]; + object = (struct object *)stored->commit; + + if (stored->bitmap == NULL) { + if (i < writer.selected_nr - 1 && + (need_reset || + !in_merge_bases(writer.selected[i + 1].commit, + stored->commit))) { + bitmap_reset(base); + reset_all_seen(); + } + + add_pending_object(&revs, object, ""); + revs.include_check_data = base; + + if (prepare_revision_walk(&revs)) + die("revision walk setup failed"); + + traverse_commit_list(&revs, show_commit, show_object, base); + + revs.pending.nr = 0; + revs.pending.alloc = 0; + revs.pending.objects = NULL; + + stored->bitmap = bitmap_to_ewah(base); + need_reset = 0; + } else + need_reset = 1; + + if (i >= reuse_after) + stored->flags |= BITMAP_FLAG_REUSE; + + hash_pos = kh_put_sha1(writer.bitmaps, object->sha1, &hash_ret); + if (hash_ret == 0) + die("Duplicate entry when writing index: %s", + sha1_to_hex(object->sha1)); + + kh_value(writer.bitmaps, hash_pos) = stored; + display_progress(writer.progress, writer.selected_nr - i); + } + + bitmap_free(base); + stop_progress(&writer.progress); + + compute_xor_offsets(); +} + +/** + * Select the commits that will be bitmapped + */ +static inline unsigned int next_commit_index(unsigned int idx) +{ + static const unsigned int MIN_COMMITS = 100; + static const unsigned int MAX_COMMITS = 5000; + + static const unsigned int MUST_REGION = 100; + static const unsigned int MIN_REGION = 20000; + + unsigned int offset, next; + + if (idx <= MUST_REGION) + return 0; + + if (idx <= MIN_REGION) { + offset = idx - MUST_REGION; + return (offset < MIN_COMMITS) ? offset : MIN_COMMITS; + } + + offset = idx - MIN_REGION; + next = (offset < MAX_COMMITS) ? offset : MAX_COMMITS; + + return (next > MIN_COMMITS) ? 
next : MIN_COMMITS; +} + +static int date_compare(const void *_a, const void *_b) +{ + struct commit *a = *(struct commit **)_a; + struct commit *b = *(struct commit **)_b; + return (long)b->date - (long)a->date; +} + +void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack) +{ + if (prepare_bitmap_git() < 0) + return; + + writer.reused = kh_init_sha1(); + rebuild_existing_bitmaps(to_pack, writer.reused, writer.show_progress); +} + +static struct ewah_bitmap *find_reused_bitmap(const unsigned char *sha1) +{ + khiter_t hash_pos; + + if (!writer.reused) + return NULL; + + hash_pos = kh_get_sha1(writer.reused, sha1); + if (hash_pos >= kh_end(writer.reused)) + return NULL; + + return kh_value(writer.reused, hash_pos); +} + +void bitmap_writer_select_commits(struct commit **indexed_commits, + unsigned int indexed_commits_nr, + int max_bitmaps) +{ + unsigned int i = 0, j, next; + + qsort(indexed_commits, indexed_commits_nr, sizeof(indexed_commits[0]), + date_compare); + + if (writer.show_progress) + writer.progress = start_progress("Selecting bitmap commits", 0); + + if (indexed_commits_nr < 100) { + for (i = 0; i < indexed_commits_nr; ++i) + push_bitmapped_commit(indexed_commits[i], NULL); + return; + } + + for (;;) { + struct ewah_bitmap *reused_bitmap = NULL; + struct commit *chosen = NULL; + + next = next_commit_index(i); + + if (i + next >= indexed_commits_nr) + break; + + if (max_bitmaps > 0 && writer.selected_nr >= max_bitmaps) { + writer.selected_nr = max_bitmaps; + break; + } + + if (next == 0) { + chosen = indexed_commits[i]; + reused_bitmap = find_reused_bitmap(chosen->object.sha1); + } else { + chosen = indexed_commits[i + next]; + + for (j = 0; j <= next; ++j) { + struct commit *cm = indexed_commits[i + j]; + + reused_bitmap = find_reused_bitmap(cm->object.sha1); + if (reused_bitmap || (cm->object.flags & NEEDS_BITMAP) != 0) { + chosen = cm; + break; + } + + if (cm->parents && cm->parents->next) + chosen = cm; + } + } + + push_bitmapped_commit(chosen, reused_bitmap); + + i += next + 1; + display_progress(writer.progress, i); + } + + stop_progress(&writer.progress); +} + + +static int sha1write_ewah_helper(void *f, const void *buf, size_t len) +{ + /* sha1write will die on error */ + sha1write(f, buf, len); + return len; +} + +/** + * Write the bitmap index to disk + */ +static inline void dump_bitmap(struct sha1file *f, struct ewah_bitmap *bitmap) +{ + if (ewah_serialize_to(bitmap, sha1write_ewah_helper, f) < 0) + die("Failed to write bitmap index"); +} + +static const unsigned char *sha1_access(size_t pos, void *table) +{ + struct pack_idx_entry **index = table; + return index[pos]->sha1; +} + +static void write_selected_commits_v1(struct sha1file *f, + struct pack_idx_entry **index, + uint32_t index_nr) +{ + int i; + + for (i = 0; i < writer.selected_nr; ++i) { + struct bitmapped_commit *stored = &writer.selected[i]; + struct bitmap_disk_entry on_disk; + + int commit_pos = + sha1_pos(stored->commit->object.sha1, index, index_nr, sha1_access); + + if (commit_pos < 0) + die("BUG: trying to write commit not in index"); + + on_disk.object_pos = htonl(commit_pos); + on_disk.xor_offset = stored->xor_offset; + on_disk.flags = stored->flags; + + sha1write(f, &on_disk, sizeof(on_disk)); + dump_bitmap(f, stored->write_as); + } +} + +static void write_hash_cache(struct sha1file *f, + struct pack_idx_entry **index, + uint32_t index_nr) +{ + uint32_t i; + + for (i = 0; i < index_nr; ++i) { + struct object_entry *entry = (struct object_entry *)index[i]; + uint32_t hash_value = 
htonl(entry->hash); + sha1write(f, &hash_value, sizeof(hash_value)); + } +} + +void bitmap_writer_set_checksum(unsigned char *sha1) +{ + hashcpy(writer.pack_checksum, sha1); +} + +void bitmap_writer_finish(struct pack_idx_entry **index, + uint32_t index_nr, + const char *filename, + uint16_t options) +{ + static char tmp_file[PATH_MAX]; + static uint16_t default_version = 1; + static uint16_t flags = BITMAP_OPT_FULL_DAG; + struct sha1file *f; + + struct bitmap_disk_header header; + + int fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_bitmap_XXXXXX"); + + if (fd < 0) + die_errno("unable to create '%s'", tmp_file); + f = sha1fd(fd, tmp_file); + + memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)); + header.version = htons(default_version); + header.options = htons(flags | options); + header.entry_count = htonl(writer.selected_nr); + memcpy(header.checksum, writer.pack_checksum, 20); + + sha1write(f, &header, sizeof(header)); + dump_bitmap(f, writer.commits); + dump_bitmap(f, writer.trees); + dump_bitmap(f, writer.blobs); + dump_bitmap(f, writer.tags); + write_selected_commits_v1(f, index, index_nr); + + if (options & BITMAP_OPT_HASH_CACHE) + write_hash_cache(f, index, index_nr); + + sha1close(f, NULL, CSUM_FSYNC); + + if (adjust_shared_perm(tmp_file)) + die_errno("unable to make temporary bitmap file readable"); + + if (rename(tmp_file, filename)) + die_errno("unable to rename temporary bitmap file to '%s'", filename); +} diff --git a/pack-bitmap.c b/pack-bitmap.c new file mode 100644 index 0000000000..ae0b57b950 --- /dev/null +++ b/pack-bitmap.c @@ -0,0 +1,1073 @@ +#include "cache.h" +#include "commit.h" +#include "tag.h" +#include "diff.h" +#include "revision.h" +#include "progress.h" +#include "list-objects.h" +#include "pack.h" +#include "pack-bitmap.h" +#include "pack-revindex.h" +#include "pack-objects.h" + +/* + * An entry on the bitmap index, representing the bitmap for a given + * commit. + */ +struct stored_bitmap { + unsigned char sha1[20]; + struct ewah_bitmap *root; + struct stored_bitmap *xor; + int flags; +}; + +/* + * The currently active bitmap index. By design, repositories only have + * a single bitmap index available (the index for the biggest packfile in + * the repository), since bitmap indexes need full closure. + * + * If there is more than one bitmap index available (e.g. because of alternates), + * the active bitmap index is the largest one. + */ +static struct bitmap_index { + /* Packfile to which this bitmap index belongs to */ + struct packed_git *pack; + + /* reverse index for the packfile */ + struct pack_revindex *reverse_index; + + /* + * Mark the first `reuse_objects` in the packfile as reused: + * they will be sent as-is without using them for repacking + * calculations + */ + uint32_t reuse_objects; + + /* mmapped buffer of the whole bitmap index */ + unsigned char *map; + size_t map_size; /* size of the mmaped buffer */ + size_t map_pos; /* current position when loading the index */ + + /* + * Type indexes. + * + * Each bitmap marks which objects in the packfile are of the given + * type. This provides type information when yielding the objects from + * the packfile during a walk, which allows for better delta bases. 
+ */
+ struct ewah_bitmap *commits;
+ struct ewah_bitmap *trees;
+ struct ewah_bitmap *blobs;
+ struct ewah_bitmap *tags;
+
+ /* Map from SHA1 -> `stored_bitmap` for all the bitmapped commits */
+ khash_sha1 *bitmaps;
+
+ /* Number of bitmapped commits */
+ uint32_t entry_count;
+
+ /* Name-hash cache (or NULL if not present). */
+ uint32_t *hashes;
+
+ /*
+ * Extended index.
+ *
+ * When trying to perform bitmap operations with objects that are not
+ * packed in `pack`, these objects are added to this "fake index" and
+ * are assumed to appear at the end of the packfile for all operations
+ */
+ struct eindex {
+ struct object **objects;
+ uint32_t *hashes;
+ uint32_t count, alloc;
+ khash_sha1_pos *positions;
+ } ext_index;
+
+ /* Bitmap result of the last performed walk */
+ struct bitmap *result;
+
+ /* Version of the bitmap index */
+ unsigned int version;
+
+ unsigned loaded : 1;
+
+} bitmap_git;
+
+static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
+{
+ struct ewah_bitmap *parent;
+ struct ewah_bitmap *composed;
+
+ if (st->xor == NULL)
+ return st->root;
+
+ composed = ewah_pool_new();
+ parent = lookup_stored_bitmap(st->xor);
+ ewah_xor(st->root, parent, composed);
+
+ ewah_pool_free(st->root);
+ st->root = composed;
+ st->xor = NULL;
+
+ return composed;
+}
+
+/*
+ * Read a bitmap from the current read position on the mmaped
+ * index, and increase the read position accordingly
+ */
+static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
+{
+ struct ewah_bitmap *b = ewah_pool_new();
+
+ int bitmap_size = ewah_read_mmap(b,
+ index->map + index->map_pos,
+ index->map_size - index->map_pos);
+
+ if (bitmap_size < 0) {
+ error("Failed to load bitmap index (corrupted?)");
+ ewah_pool_free(b);
+ return NULL;
+ }
+
+ index->map_pos += bitmap_size;
+ return b;
+}
+
+static int load_bitmap_header(struct bitmap_index *index)
+{
+ struct bitmap_disk_header *header = (void *)index->map;
+
+ if (index->map_size < sizeof(*header) + 20)
+ return error("Corrupted bitmap index (missing header data)");
+
+ if (memcmp(header->magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)) != 0)
+ return error("Corrupted bitmap index file (wrong header)");
+
+ index->version = ntohs(header->version);
+ if (index->version != 1)
+ return error("Unsupported version for bitmap index file (%d)", index->version);
+
+ /* Parse known bitmap format options */
+ {
+ uint32_t flags = ntohs(header->options);
+
+ if ((flags & BITMAP_OPT_FULL_DAG) == 0)
+ return error("Unsupported options for bitmap index file "
+ "(Git requires BITMAP_OPT_FULL_DAG)");
+
+ if (flags & BITMAP_OPT_HASH_CACHE) {
+ unsigned char *end = index->map + index->map_size - 20;
+ index->hashes = ((uint32_t *)end) - index->pack->num_objects;
+ }
+ }
+
+ index->entry_count = ntohl(header->entry_count);
+ index->map_pos += sizeof(*header);
+ return 0;
+}
+
+static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
+ struct ewah_bitmap *root,
+ const unsigned char *sha1,
+ struct stored_bitmap *xor_with,
+ int flags)
+{
+ struct stored_bitmap *stored;
+ khiter_t hash_pos;
+ int ret;
+
+ stored = xmalloc(sizeof(struct stored_bitmap));
+ stored->root = root;
+ stored->xor = xor_with;
+ stored->flags = flags;
+ hashcpy(stored->sha1, sha1);
+
+ hash_pos = kh_put_sha1(index->bitmaps, stored->sha1, &ret);
+
+ /* a 0 return code means the insertion succeeded with no changes,
+ * because the SHA1 already existed on the map.
this is bad, there + * shouldn't be duplicated commits in the index */ + if (ret == 0) { + error("Duplicate entry in bitmap index: %s", sha1_to_hex(sha1)); + return NULL; + } + + kh_value(index->bitmaps, hash_pos) = stored; + return stored; +} + +static int load_bitmap_entries_v1(struct bitmap_index *index) +{ + static const size_t MAX_XOR_OFFSET = 160; + + uint32_t i; + struct stored_bitmap **recent_bitmaps; + struct bitmap_disk_entry *entry; + + recent_bitmaps = xcalloc(MAX_XOR_OFFSET, sizeof(struct stored_bitmap)); + + for (i = 0; i < index->entry_count; ++i) { + int xor_offset, flags; + struct ewah_bitmap *bitmap = NULL; + struct stored_bitmap *xor_bitmap = NULL; + uint32_t commit_idx_pos; + const unsigned char *sha1; + + entry = (struct bitmap_disk_entry *)(index->map + index->map_pos); + index->map_pos += sizeof(struct bitmap_disk_entry); + + commit_idx_pos = ntohl(entry->object_pos); + sha1 = nth_packed_object_sha1(index->pack, commit_idx_pos); + + xor_offset = (int)entry->xor_offset; + flags = (int)entry->flags; + + bitmap = read_bitmap_1(index); + if (!bitmap) + return -1; + + if (xor_offset > MAX_XOR_OFFSET || xor_offset > i) + return error("Corrupted bitmap pack index"); + + if (xor_offset > 0) { + xor_bitmap = recent_bitmaps[(i - xor_offset) % MAX_XOR_OFFSET]; + + if (xor_bitmap == NULL) + return error("Invalid XOR offset in bitmap pack index"); + } + + recent_bitmaps[i % MAX_XOR_OFFSET] = store_bitmap( + index, bitmap, sha1, xor_bitmap, flags); + } + + return 0; +} + +static int open_pack_bitmap_1(struct packed_git *packfile) +{ + int fd; + struct stat st; + char *idx_name; + + if (open_pack_index(packfile)) + return -1; + + idx_name = pack_bitmap_filename(packfile); + fd = git_open_noatime(idx_name); + free(idx_name); + + if (fd < 0) + return -1; + + if (fstat(fd, &st)) { + close(fd); + return -1; + } + + if (bitmap_git.pack) { + warning("ignoring extra bitmap file: %s", packfile->pack_name); + close(fd); + return -1; + } + + bitmap_git.pack = packfile; + bitmap_git.map_size = xsize_t(st.st_size); + bitmap_git.map = xmmap(NULL, bitmap_git.map_size, PROT_READ, MAP_PRIVATE, fd, 0); + bitmap_git.map_pos = 0; + close(fd); + + if (load_bitmap_header(&bitmap_git) < 0) { + munmap(bitmap_git.map, bitmap_git.map_size); + bitmap_git.map = NULL; + bitmap_git.map_size = 0; + return -1; + } + + return 0; +} + +static int load_pack_bitmap(void) +{ + assert(bitmap_git.map && !bitmap_git.loaded); + + bitmap_git.bitmaps = kh_init_sha1(); + bitmap_git.ext_index.positions = kh_init_sha1_pos(); + bitmap_git.reverse_index = revindex_for_pack(bitmap_git.pack); + + if (!(bitmap_git.commits = read_bitmap_1(&bitmap_git)) || + !(bitmap_git.trees = read_bitmap_1(&bitmap_git)) || + !(bitmap_git.blobs = read_bitmap_1(&bitmap_git)) || + !(bitmap_git.tags = read_bitmap_1(&bitmap_git))) + goto failed; + + if (load_bitmap_entries_v1(&bitmap_git) < 0) + goto failed; + + bitmap_git.loaded = 1; + return 0; + +failed: + munmap(bitmap_git.map, bitmap_git.map_size); + bitmap_git.map = NULL; + bitmap_git.map_size = 0; + return -1; +} + +char *pack_bitmap_filename(struct packed_git *p) +{ + char *idx_name; + int len; + + len = strlen(p->pack_name) - strlen(".pack"); + idx_name = xmalloc(len + strlen(".bitmap") + 1); + + memcpy(idx_name, p->pack_name, len); + memcpy(idx_name + len, ".bitmap", strlen(".bitmap") + 1); + + return idx_name; +} + +static int open_pack_bitmap(void) +{ + struct packed_git *p; + int ret = -1; + + assert(!bitmap_git.map && !bitmap_git.loaded); + + prepare_packed_git(); + for (p = packed_git; 
p; p = p->next) { + if (open_pack_bitmap_1(p) == 0) + ret = 0; + } + + return ret; +} + +int prepare_bitmap_git(void) +{ + if (bitmap_git.loaded) + return 0; + + if (!open_pack_bitmap()) + return load_pack_bitmap(); + + return -1; +} + +struct include_data { + struct bitmap *base; + struct bitmap *seen; +}; + +static inline int bitmap_position_extended(const unsigned char *sha1) +{ + khash_sha1_pos *positions = bitmap_git.ext_index.positions; + khiter_t pos = kh_get_sha1_pos(positions, sha1); + + if (pos < kh_end(positions)) { + int bitmap_pos = kh_value(positions, pos); + return bitmap_pos + bitmap_git.pack->num_objects; + } + + return -1; +} + +static inline int bitmap_position_packfile(const unsigned char *sha1) +{ + off_t offset = find_pack_entry_one(sha1, bitmap_git.pack); + if (!offset) + return -1; + + return find_revindex_position(bitmap_git.reverse_index, offset); +} + +static int bitmap_position(const unsigned char *sha1) +{ + int pos = bitmap_position_packfile(sha1); + return (pos >= 0) ? pos : bitmap_position_extended(sha1); +} + +static int ext_index_add_object(struct object *object, const char *name) +{ + struct eindex *eindex = &bitmap_git.ext_index; + + khiter_t hash_pos; + int hash_ret; + int bitmap_pos; + + hash_pos = kh_put_sha1_pos(eindex->positions, object->sha1, &hash_ret); + if (hash_ret > 0) { + if (eindex->count >= eindex->alloc) { + eindex->alloc = (eindex->alloc + 16) * 3 / 2; + eindex->objects = xrealloc(eindex->objects, + eindex->alloc * sizeof(struct object *)); + eindex->hashes = xrealloc(eindex->hashes, + eindex->alloc * sizeof(uint32_t)); + } + + bitmap_pos = eindex->count; + eindex->objects[eindex->count] = object; + eindex->hashes[eindex->count] = pack_name_hash(name); + kh_value(eindex->positions, hash_pos) = bitmap_pos; + eindex->count++; + } else { + bitmap_pos = kh_value(eindex->positions, hash_pos); + } + + return bitmap_pos + bitmap_git.pack->num_objects; +} + +static void show_object(struct object *object, const struct name_path *path, + const char *last, void *data) +{ + struct bitmap *base = data; + int bitmap_pos; + + bitmap_pos = bitmap_position(object->sha1); + + if (bitmap_pos < 0) { + char *name = path_name(path, last); + bitmap_pos = ext_index_add_object(object, name); + free(name); + } + + bitmap_set(base, bitmap_pos); +} + +static void show_commit(struct commit *commit, void *data) +{ +} + +static int add_to_include_set(struct include_data *data, + const unsigned char *sha1, + int bitmap_pos) +{ + khiter_t hash_pos; + + if (data->seen && bitmap_get(data->seen, bitmap_pos)) + return 0; + + if (bitmap_get(data->base, bitmap_pos)) + return 0; + + hash_pos = kh_get_sha1(bitmap_git.bitmaps, sha1); + if (hash_pos < kh_end(bitmap_git.bitmaps)) { + struct stored_bitmap *st = kh_value(bitmap_git.bitmaps, hash_pos); + bitmap_or_ewah(data->base, lookup_stored_bitmap(st)); + return 0; + } + + bitmap_set(data->base, bitmap_pos); + return 1; +} + +static int should_include(struct commit *commit, void *_data) +{ + struct include_data *data = _data; + int bitmap_pos; + + bitmap_pos = bitmap_position(commit->object.sha1); + if (bitmap_pos < 0) + bitmap_pos = ext_index_add_object((struct object *)commit, NULL); + + if (!add_to_include_set(data, commit->object.sha1, bitmap_pos)) { + struct commit_list *parent = commit->parents; + + while (parent) { + parent->item->object.flags |= SEEN; + parent = parent->next; + } + + return 0; + } + + return 1; +} + +static struct bitmap *find_objects(struct rev_info *revs, + struct object_list *roots, + struct bitmap 
*seen) +{ + struct bitmap *base = NULL; + int needs_walk = 0; + + struct object_list *not_mapped = NULL; + + /* + * Go through all the roots for the walk. The ones that have bitmaps + * on the bitmap index will be `or`ed together to form an initial + * global reachability analysis. + * + * The ones without bitmaps in the index will be stored in the + * `not_mapped_list` for further processing. + */ + while (roots) { + struct object *object = roots->item; + roots = roots->next; + + if (object->type == OBJ_COMMIT) { + khiter_t pos = kh_get_sha1(bitmap_git.bitmaps, object->sha1); + + if (pos < kh_end(bitmap_git.bitmaps)) { + struct stored_bitmap *st = kh_value(bitmap_git.bitmaps, pos); + struct ewah_bitmap *or_with = lookup_stored_bitmap(st); + + if (base == NULL) + base = ewah_to_bitmap(or_with); + else + bitmap_or_ewah(base, or_with); + + object->flags |= SEEN; + continue; + } + } + + object_list_insert(object, ¬_mapped); + } + + /* + * Best case scenario: We found bitmaps for all the roots, + * so the resulting `or` bitmap has the full reachability analysis + */ + if (not_mapped == NULL) + return base; + + roots = not_mapped; + + /* + * Let's iterate through all the roots that don't have bitmaps to + * check if we can determine them to be reachable from the existing + * global bitmap. + * + * If we cannot find them in the existing global bitmap, we'll need + * to push them to an actual walk and run it until we can confirm + * they are reachable + */ + while (roots) { + struct object *object = roots->item; + int pos; + + roots = roots->next; + pos = bitmap_position(object->sha1); + + if (pos < 0 || base == NULL || !bitmap_get(base, pos)) { + object->flags &= ~UNINTERESTING; + add_pending_object(revs, object, ""); + needs_walk = 1; + } else { + object->flags |= SEEN; + } + } + + if (needs_walk) { + struct include_data incdata; + + if (base == NULL) + base = bitmap_new(); + + incdata.base = base; + incdata.seen = seen; + + revs->include_check = should_include; + revs->include_check_data = &incdata; + + if (prepare_revision_walk(revs)) + die("revision walk setup failed"); + + traverse_commit_list(revs, show_commit, show_object, base); + } + + return base; +} + +static void show_extended_objects(struct bitmap *objects, + show_reachable_fn show_reach) +{ + struct eindex *eindex = &bitmap_git.ext_index; + uint32_t i; + + for (i = 0; i < eindex->count; ++i) { + struct object *obj; + + if (!bitmap_get(objects, bitmap_git.pack->num_objects + i)) + continue; + + obj = eindex->objects[i]; + show_reach(obj->sha1, obj->type, 0, eindex->hashes[i], NULL, 0); + } +} + +static void show_objects_for_type( + struct bitmap *objects, + struct ewah_bitmap *type_filter, + enum object_type object_type, + show_reachable_fn show_reach) +{ + size_t pos = 0, i = 0; + uint32_t offset; + + struct ewah_iterator it; + eword_t filter; + + if (bitmap_git.reuse_objects == bitmap_git.pack->num_objects) + return; + + ewah_iterator_init(&it, type_filter); + + while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) { + eword_t word = objects->words[i] & filter; + + for (offset = 0; offset < BITS_IN_WORD; ++offset) { + const unsigned char *sha1; + struct revindex_entry *entry; + uint32_t hash = 0; + + if ((word >> offset) == 0) + break; + + offset += ewah_bit_ctz64(word >> offset); + + if (pos + offset < bitmap_git.reuse_objects) + continue; + + entry = &bitmap_git.reverse_index->revindex[pos + offset]; + sha1 = nth_packed_object_sha1(bitmap_git.pack, entry->nr); + + if (bitmap_git.hashes) + hash = 
ntohl(bitmap_git.hashes[entry->nr]); + + show_reach(sha1, object_type, 0, hash, bitmap_git.pack, entry->offset); + } + + pos += BITS_IN_WORD; + i++; + } +} + +static int in_bitmapped_pack(struct object_list *roots) +{ + while (roots) { + struct object *object = roots->item; + roots = roots->next; + + if (find_pack_entry_one(object->sha1, bitmap_git.pack) > 0) + return 1; + } + + return 0; +} + +int prepare_bitmap_walk(struct rev_info *revs) +{ + unsigned int i; + unsigned int pending_nr = revs->pending.nr; + struct object_array_entry *pending_e = revs->pending.objects; + + struct object_list *wants = NULL; + struct object_list *haves = NULL; + + struct bitmap *wants_bitmap = NULL; + struct bitmap *haves_bitmap = NULL; + + if (!bitmap_git.loaded) { + /* try to open a bitmapped pack, but don't parse it yet + * because we may not need to use it */ + if (open_pack_bitmap() < 0) + return -1; + } + + for (i = 0; i < pending_nr; ++i) { + struct object *object = pending_e[i].item; + + if (object->type == OBJ_NONE) + parse_object_or_die(object->sha1, NULL); + + while (object->type == OBJ_TAG) { + struct tag *tag = (struct tag *) object; + + if (object->flags & UNINTERESTING) + object_list_insert(object, &haves); + else + object_list_insert(object, &wants); + + if (!tag->tagged) + die("bad tag"); + object = parse_object_or_die(tag->tagged->sha1, NULL); + } + + if (object->flags & UNINTERESTING) + object_list_insert(object, &haves); + else + object_list_insert(object, &wants); + } + + /* + * if we have a HAVES list, but none of those haves is contained + * in the packfile that has a bitmap, we don't have anything to + * optimize here + */ + if (haves && !in_bitmapped_pack(haves)) + return -1; + + /* if we don't want anything, we're done here */ + if (!wants) + return -1; + + /* + * now we're going to use bitmaps, so load the actual bitmap entries + * from disk. 
this is the point of no return; after this the rev_list + * becomes invalidated and we must perform the revwalk through bitmaps + */ + if (!bitmap_git.loaded && load_pack_bitmap() < 0) + return -1; + + revs->pending.nr = 0; + revs->pending.alloc = 0; + revs->pending.objects = NULL; + + if (haves) { + haves_bitmap = find_objects(revs, haves, NULL); + reset_revision_walk(); + + if (haves_bitmap == NULL) + die("BUG: failed to perform bitmap walk"); + } + + wants_bitmap = find_objects(revs, wants, haves_bitmap); + + if (!wants_bitmap) + die("BUG: failed to perform bitmap walk"); + + if (haves_bitmap) + bitmap_and_not(wants_bitmap, haves_bitmap); + + bitmap_git.result = wants_bitmap; + + bitmap_free(haves_bitmap); + return 0; +} + +int reuse_partial_packfile_from_bitmap(struct packed_git **packfile, + uint32_t *entries, + off_t *up_to) +{ + /* + * Reuse the packfile content if we need more than + * 90% of its objects + */ + static const double REUSE_PERCENT = 0.9; + + struct bitmap *result = bitmap_git.result; + uint32_t reuse_threshold; + uint32_t i, reuse_objects = 0; + + assert(result); + + for (i = 0; i < result->word_alloc; ++i) { + if (result->words[i] != (eword_t)~0) { + reuse_objects += ewah_bit_ctz64(~result->words[i]); + break; + } + + reuse_objects += BITS_IN_WORD; + } + +#ifdef GIT_BITMAP_DEBUG + { + const unsigned char *sha1; + struct revindex_entry *entry; + + entry = &bitmap_git.reverse_index->revindex[reuse_objects]; + sha1 = nth_packed_object_sha1(bitmap_git.pack, entry->nr); + + fprintf(stderr, "Failed to reuse at %d (%016llx)\n", + reuse_objects, result->words[i]); + fprintf(stderr, " %s\n", sha1_to_hex(sha1)); + } +#endif + + if (!reuse_objects) + return -1; + + if (reuse_objects >= bitmap_git.pack->num_objects) { + bitmap_git.reuse_objects = *entries = bitmap_git.pack->num_objects; + *up_to = -1; /* reuse the full pack */ + *packfile = bitmap_git.pack; + return 0; + } + + reuse_threshold = bitmap_popcount(bitmap_git.result) * REUSE_PERCENT; + + if (reuse_objects < reuse_threshold) + return -1; + + bitmap_git.reuse_objects = *entries = reuse_objects; + *up_to = bitmap_git.reverse_index->revindex[reuse_objects].offset; + *packfile = bitmap_git.pack; + + return 0; +} + +void traverse_bitmap_commit_list(show_reachable_fn show_reachable) +{ + assert(bitmap_git.result); + + show_objects_for_type(bitmap_git.result, bitmap_git.commits, + OBJ_COMMIT, show_reachable); + show_objects_for_type(bitmap_git.result, bitmap_git.trees, + OBJ_TREE, show_reachable); + show_objects_for_type(bitmap_git.result, bitmap_git.blobs, + OBJ_BLOB, show_reachable); + show_objects_for_type(bitmap_git.result, bitmap_git.tags, + OBJ_TAG, show_reachable); + + show_extended_objects(bitmap_git.result, show_reachable); + + bitmap_free(bitmap_git.result); + bitmap_git.result = NULL; +} + +static uint32_t count_object_type(struct bitmap *objects, + enum object_type type) +{ + struct eindex *eindex = &bitmap_git.ext_index; + + uint32_t i = 0, count = 0; + struct ewah_iterator it; + eword_t filter; + + switch (type) { + case OBJ_COMMIT: + ewah_iterator_init(&it, bitmap_git.commits); + break; + + case OBJ_TREE: + ewah_iterator_init(&it, bitmap_git.trees); + break; + + case OBJ_BLOB: + ewah_iterator_init(&it, bitmap_git.blobs); + break; + + case OBJ_TAG: + ewah_iterator_init(&it, bitmap_git.tags); + break; + + default: + return 0; + } + + while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) { + eword_t word = objects->words[i++] & filter; + count += ewah_bit_popcount64(word); + } + + for (i = 0; i < 
eindex->count; ++i) { + if (eindex->objects[i]->type == type && + bitmap_get(objects, bitmap_git.pack->num_objects + i)) + count++; + } + + return count; +} + +void count_bitmap_commit_list(uint32_t *commits, uint32_t *trees, + uint32_t *blobs, uint32_t *tags) +{ + assert(bitmap_git.result); + + if (commits) + *commits = count_object_type(bitmap_git.result, OBJ_COMMIT); + + if (trees) + *trees = count_object_type(bitmap_git.result, OBJ_TREE); + + if (blobs) + *blobs = count_object_type(bitmap_git.result, OBJ_BLOB); + + if (tags) + *tags = count_object_type(bitmap_git.result, OBJ_TAG); +} + +struct bitmap_test_data { + struct bitmap *base; + struct progress *prg; + size_t seen; +}; + +static void test_show_object(struct object *object, + const struct name_path *path, + const char *last, void *data) +{ + struct bitmap_test_data *tdata = data; + int bitmap_pos; + + bitmap_pos = bitmap_position(object->sha1); + if (bitmap_pos < 0) + die("Object not in bitmap: %s\n", sha1_to_hex(object->sha1)); + + bitmap_set(tdata->base, bitmap_pos); + display_progress(tdata->prg, ++tdata->seen); +} + +static void test_show_commit(struct commit *commit, void *data) +{ + struct bitmap_test_data *tdata = data; + int bitmap_pos; + + bitmap_pos = bitmap_position(commit->object.sha1); + if (bitmap_pos < 0) + die("Object not in bitmap: %s\n", sha1_to_hex(commit->object.sha1)); + + bitmap_set(tdata->base, bitmap_pos); + display_progress(tdata->prg, ++tdata->seen); +} + +void test_bitmap_walk(struct rev_info *revs) +{ + struct object *root; + struct bitmap *result = NULL; + khiter_t pos; + size_t result_popcnt; + struct bitmap_test_data tdata; + + if (prepare_bitmap_git()) + die("failed to load bitmap indexes"); + + if (revs->pending.nr != 1) + die("you must specify exactly one commit to test"); + + fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n", + bitmap_git.version, bitmap_git.entry_count); + + root = revs->pending.objects[0].item; + pos = kh_get_sha1(bitmap_git.bitmaps, root->sha1); + + if (pos < kh_end(bitmap_git.bitmaps)) { + struct stored_bitmap *st = kh_value(bitmap_git.bitmaps, pos); + struct ewah_bitmap *bm = lookup_stored_bitmap(st); + + fprintf(stderr, "Found bitmap for %s. 
%d bits / %08x checksum\n", + sha1_to_hex(root->sha1), (int)bm->bit_size, ewah_checksum(bm)); + + result = ewah_to_bitmap(bm); + } + + if (result == NULL) + die("Commit %s doesn't have an indexed bitmap", sha1_to_hex(root->sha1)); + + revs->tag_objects = 1; + revs->tree_objects = 1; + revs->blob_objects = 1; + + result_popcnt = bitmap_popcount(result); + + if (prepare_revision_walk(revs)) + die("revision walk setup failed"); + + tdata.base = bitmap_new(); + tdata.prg = start_progress("Verifying bitmap entries", result_popcnt); + tdata.seen = 0; + + traverse_commit_list(revs, &test_show_commit, &test_show_object, &tdata); + + stop_progress(&tdata.prg); + + if (bitmap_equals(result, tdata.base)) + fprintf(stderr, "OK!\n"); + else + fprintf(stderr, "Mismatch!\n"); +} + +static int rebuild_bitmap(uint32_t *reposition, + struct ewah_bitmap *source, + struct bitmap *dest) +{ + uint32_t pos = 0; + struct ewah_iterator it; + eword_t word; + + ewah_iterator_init(&it, source); + + while (ewah_iterator_next(&word, &it)) { + uint32_t offset, bit_pos; + + for (offset = 0; offset < BITS_IN_WORD; ++offset) { + if ((word >> offset) == 0) + break; + + offset += ewah_bit_ctz64(word >> offset); + + bit_pos = reposition[pos + offset]; + if (bit_pos > 0) + bitmap_set(dest, bit_pos - 1); + else /* can't reuse, we don't have the object */ + return -1; + } + + pos += BITS_IN_WORD; + } + return 0; +} + +int rebuild_existing_bitmaps(struct packing_data *mapping, + khash_sha1 *reused_bitmaps, + int show_progress) +{ + uint32_t i, num_objects; + uint32_t *reposition; + struct bitmap *rebuild; + struct stored_bitmap *stored; + struct progress *progress = NULL; + + khiter_t hash_pos; + int hash_ret; + + if (prepare_bitmap_git() < 0) + return -1; + + num_objects = bitmap_git.pack->num_objects; + reposition = xcalloc(num_objects, sizeof(uint32_t)); + + for (i = 0; i < num_objects; ++i) { + const unsigned char *sha1; + struct revindex_entry *entry; + struct object_entry *oe; + + entry = &bitmap_git.reverse_index->revindex[i]; + sha1 = nth_packed_object_sha1(bitmap_git.pack, entry->nr); + oe = packlist_find(mapping, sha1, NULL); + + if (oe) + reposition[i] = oe->in_pack_pos + 1; + } + + rebuild = bitmap_new(); + i = 0; + + if (show_progress) + progress = start_progress("Reusing bitmaps", 0); + + kh_foreach_value(bitmap_git.bitmaps, stored, { + if (stored->flags & BITMAP_FLAG_REUSE) { + if (!rebuild_bitmap(reposition, + lookup_stored_bitmap(stored), + rebuild)) { + hash_pos = kh_put_sha1(reused_bitmaps, + stored->sha1, + &hash_ret); + kh_value(reused_bitmaps, hash_pos) = + bitmap_to_ewah(rebuild); + } + bitmap_reset(rebuild); + display_progress(progress, ++i); + } + }); + + stop_progress(&progress); + + free(reposition); + bitmap_free(rebuild); + return 0; +} diff --git a/pack-bitmap.h b/pack-bitmap.h new file mode 100644 index 0000000000..8b7f4e9f0d --- /dev/null +++ b/pack-bitmap.h @@ -0,0 +1,64 @@ +#ifndef PACK_BITMAP_H +#define PACK_BITMAP_H + +#include "ewah/ewok.h" +#include "khash.h" +#include "pack-objects.h" + +struct bitmap_disk_entry { + uint32_t object_pos; + uint8_t xor_offset; + uint8_t flags; +} __attribute__((packed)); + +struct bitmap_disk_header { + char magic[4]; + uint16_t version; + uint16_t options; + uint32_t entry_count; + unsigned char checksum[20]; +}; + +static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'}; + +#define NEEDS_BITMAP (1u<<22) + +enum pack_bitmap_opts { + BITMAP_OPT_FULL_DAG = 1, + BITMAP_OPT_HASH_CACHE = 4, +}; + +enum pack_bitmap_flags { + BITMAP_FLAG_REUSE = 0x1 +}; + 
+typedef int (*show_reachable_fn)( + const unsigned char *sha1, + enum object_type type, + int flags, + uint32_t hash, + struct packed_git *found_pack, + off_t found_offset); + +int prepare_bitmap_git(void); +void count_bitmap_commit_list(uint32_t *commits, uint32_t *trees, uint32_t *blobs, uint32_t *tags); +void traverse_bitmap_commit_list(show_reachable_fn show_reachable); +void test_bitmap_walk(struct rev_info *revs); +char *pack_bitmap_filename(struct packed_git *p); +int prepare_bitmap_walk(struct rev_info *revs); +int reuse_partial_packfile_from_bitmap(struct packed_git **packfile, uint32_t *entries, off_t *up_to); +int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bitmaps, int show_progress); + +void bitmap_writer_show_progress(int show); +void bitmap_writer_set_checksum(unsigned char *sha1); +void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t index_nr); +void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack); +void bitmap_writer_select_commits(struct commit **indexed_commits, + unsigned int indexed_commits_nr, int max_bitmaps); +void bitmap_writer_build(struct packing_data *to_pack); +void bitmap_writer_finish(struct pack_idx_entry **index, + uint32_t index_nr, + const char *filename, + uint16_t options); + +#endif diff --git a/pack-objects.c b/pack-objects.c new file mode 100644 index 0000000000..d01d851ce9 --- /dev/null +++ b/pack-objects.c @@ -0,0 +1,111 @@ +#include "cache.h" +#include "object.h" +#include "pack.h" +#include "pack-objects.h" + +static uint32_t locate_object_entry_hash(struct packing_data *pdata, + const unsigned char *sha1, + int *found) +{ + uint32_t i, hash, mask = (pdata->index_size - 1); + + memcpy(&hash, sha1, sizeof(uint32_t)); + i = hash & mask; + + while (pdata->index[i] > 0) { + uint32_t pos = pdata->index[i] - 1; + + if (!hashcmp(sha1, pdata->objects[pos].idx.sha1)) { + *found = 1; + return i; + } + + i = (i + 1) & mask; + } + + *found = 0; + return i; +} + +static inline uint32_t closest_pow2(uint32_t v) +{ + v = v - 1; + v |= v >> 1; + v |= v >> 2; + v |= v >> 4; + v |= v >> 8; + v |= v >> 16; + return v + 1; +} + +static void rehash_objects(struct packing_data *pdata) +{ + uint32_t i; + struct object_entry *entry; + + pdata->index_size = closest_pow2(pdata->nr_objects * 3); + if (pdata->index_size < 1024) + pdata->index_size = 1024; + + pdata->index = xrealloc(pdata->index, sizeof(uint32_t) * pdata->index_size); + memset(pdata->index, 0, sizeof(int) * pdata->index_size); + + entry = pdata->objects; + + for (i = 0; i < pdata->nr_objects; i++) { + int found; + uint32_t ix = locate_object_entry_hash(pdata, entry->idx.sha1, &found); + + if (found) + die("BUG: Duplicate object in hash"); + + pdata->index[ix] = i + 1; + entry++; + } +} + +struct object_entry *packlist_find(struct packing_data *pdata, + const unsigned char *sha1, + uint32_t *index_pos) +{ + uint32_t i; + int found; + + if (!pdata->index_size) + return NULL; + + i = locate_object_entry_hash(pdata, sha1, &found); + + if (index_pos) + *index_pos = i; + + if (!found) + return NULL; + + return &pdata->objects[pdata->index[i] - 1]; +} + +struct object_entry *packlist_alloc(struct packing_data *pdata, + const unsigned char *sha1, + uint32_t index_pos) +{ + struct object_entry *new_entry; + + if (pdata->nr_objects >= pdata->nr_alloc) { + pdata->nr_alloc = (pdata->nr_alloc + 1024) * 3 / 2; + pdata->objects = xrealloc(pdata->objects, + pdata->nr_alloc * sizeof(*new_entry)); + } + + new_entry = pdata->objects + pdata->nr_objects++; + + 
memset(new_entry, 0, sizeof(*new_entry)); + hashcpy(new_entry->idx.sha1, sha1); + + if (pdata->index_size * 3 <= pdata->nr_objects * 4) + rehash_objects(pdata); + else + pdata->index[index_pos] = pdata->nr_objects; + + return new_entry; +} diff --git a/pack-objects.h b/pack-objects.h new file mode 100644 index 0000000000..d1b98b30ff --- /dev/null +++ b/pack-objects.h @@ -0,0 +1,68 @@ +#ifndef PACK_OBJECTS_H +#define PACK_OBJECTS_H + +struct object_entry { + struct pack_idx_entry idx; + unsigned long size; /* uncompressed size */ + struct packed_git *in_pack; /* already in pack */ + off_t in_pack_offset; + struct object_entry *delta; /* delta base object */ + struct object_entry *delta_child; /* deltified objects who bases me */ + struct object_entry *delta_sibling; /* other deltified objects who + * uses the same base as me + */ + void *delta_data; /* cached delta (uncompressed) */ + unsigned long delta_size; /* delta data size (uncompressed) */ + unsigned long z_delta_size; /* delta data size (compressed) */ + enum object_type type; + enum object_type in_pack_type; /* could be delta */ + uint32_t hash; /* name hint hash */ + unsigned int in_pack_pos; + unsigned char in_pack_header_size; + unsigned preferred_base:1; /* + * we do not pack this, but is available + * to be used as the base object to delta + * objects against. + */ + unsigned no_try_delta:1; + unsigned tagged:1; /* near the very tip of refs */ + unsigned filled:1; /* assigned write-order */ +}; + +struct packing_data { + struct object_entry *objects; + uint32_t nr_objects, nr_alloc; + + int32_t *index; + uint32_t index_size; +}; + +struct object_entry *packlist_alloc(struct packing_data *pdata, + const unsigned char *sha1, + uint32_t index_pos); + +struct object_entry *packlist_find(struct packing_data *pdata, + const unsigned char *sha1, + uint32_t *index_pos); + +static inline uint32_t pack_name_hash(const char *name) +{ + uint32_t c, hash = 0; + + if (!name) + return 0; + + /* + * This effectively just creates a sortable number from the + * last sixteen non-whitespace characters. Last characters + * count "most", so things that end in ".c" sort together. + */ + while ((c = *name++) != 0) { + if (isspace(c)) + continue; + hash = (hash >> 2) + (c << 24); + } + return hash; +} + +#endif diff --git a/pack-revindex.c b/pack-revindex.c index b4d2b35bb3..5bd7c61980 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -16,11 +16,6 @@ * get the object sha1 from the main index. 
*/ -struct pack_revindex { - struct packed_git *p; - struct revindex_entry *revindex; -}; - static struct pack_revindex *pack_revindex; static int pack_revindex_hashsz; @@ -201,15 +196,14 @@ static void create_pack_revindex(struct pack_revindex *rix) sort_revindex(rix->revindex, num_ent, p->pack_size); } -struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) +struct pack_revindex *revindex_for_pack(struct packed_git *p) { int num; - unsigned lo, hi; struct pack_revindex *rix; - struct revindex_entry *revindex; if (!pack_revindex_hashsz) init_pack_revindex(); + num = pack_revindex_ix(p); if (num < 0) die("internal error: pack revindex fubar"); @@ -217,30 +211,37 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) rix = &pack_revindex[num]; if (!rix->revindex) create_pack_revindex(rix); - revindex = rix->revindex; - lo = 0; - hi = p->num_objects + 1; + return rix; +} + +int find_revindex_position(struct pack_revindex *pridx, off_t ofs) +{ + int lo = 0; + int hi = pridx->p->num_objects + 1; + struct revindex_entry *revindex = pridx->revindex; + do { unsigned mi = lo + (hi - lo) / 2; if (revindex[mi].offset == ofs) { - return revindex + mi; + return mi; } else if (ofs < revindex[mi].offset) hi = mi; else lo = mi + 1; } while (lo < hi); + error("bad offset for revindex"); - return NULL; + return -1; } -void discard_revindex(void) +struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) { - if (pack_revindex_hashsz) { - int i; - for (i = 0; i < pack_revindex_hashsz; i++) - free(pack_revindex[i].revindex); - free(pack_revindex); - pack_revindex_hashsz = 0; - } + struct pack_revindex *pridx = revindex_for_pack(p); + int pos = find_revindex_position(pridx, ofs); + + if (pos < 0) + return NULL; + + return pridx->revindex + pos; } diff --git a/pack-revindex.h b/pack-revindex.h index 8d5027ad91..d737f98965 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -6,7 +6,14 @@ struct revindex_entry { unsigned int nr; }; +struct pack_revindex { + struct packed_git *p; + struct revindex_entry *revindex; +}; + +struct pack_revindex *revindex_for_pack(struct packed_git *p); +int find_revindex_position(struct pack_revindex *pridx, off_t ofs); + struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs); -void discard_revindex(void); #endif diff --git a/pack-write.c b/pack-write.c index 605d01b25c..9b8308b759 100644 --- a/pack-write.c +++ b/pack-write.c @@ -364,5 +364,7 @@ void finish_tmp_packfile(char *name_buffer, if (rename(idx_tmp_name, name_buffer)) die_errno("unable to rename temporary index file"); + *end_of_name_prefix = '\0'; + free((void *)idx_tmp_name); } diff --git a/read-cache.c b/read-cache.c index 6d8ee3a792..fb440b4d9d 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1314,26 +1314,6 @@ int read_index(struct index_state *istate) return read_index_from(istate, get_index_file()); } -#ifndef NEEDS_ALIGNED_ACCESS -#define ntoh_s(var) ntohs(var) -#define ntoh_l(var) ntohl(var) -#else -static inline uint16_t ntoh_s_force_align(void *p) -{ - uint16_t x; - memcpy(&x, p, sizeof(x)); - return ntohs(x); -} -static inline uint32_t ntoh_l_force_align(void *p) -{ - uint32_t x; - memcpy(&x, p, sizeof(x)); - return ntohl(x); -} -#define ntoh_s(var) ntoh_s_force_align(&(var)) -#define ntoh_l(var) ntoh_l_force_align(&(var)) -#endif - static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk, unsigned int flags, const char *name, @@ -1341,16 +1321,16 @@ static struct cache_entry 
*cache_entry_from_ondisk(struct ondisk_cache_entry *on { struct cache_entry *ce = xmalloc(cache_entry_size(len)); - ce->ce_stat_data.sd_ctime.sec = ntoh_l(ondisk->ctime.sec); - ce->ce_stat_data.sd_mtime.sec = ntoh_l(ondisk->mtime.sec); - ce->ce_stat_data.sd_ctime.nsec = ntoh_l(ondisk->ctime.nsec); - ce->ce_stat_data.sd_mtime.nsec = ntoh_l(ondisk->mtime.nsec); - ce->ce_stat_data.sd_dev = ntoh_l(ondisk->dev); - ce->ce_stat_data.sd_ino = ntoh_l(ondisk->ino); - ce->ce_mode = ntoh_l(ondisk->mode); - ce->ce_stat_data.sd_uid = ntoh_l(ondisk->uid); - ce->ce_stat_data.sd_gid = ntoh_l(ondisk->gid); - ce->ce_stat_data.sd_size = ntoh_l(ondisk->size); + ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec); + ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec); + ce->ce_stat_data.sd_ctime.nsec = get_be32(&ondisk->ctime.nsec); + ce->ce_stat_data.sd_mtime.nsec = get_be32(&ondisk->mtime.nsec); + ce->ce_stat_data.sd_dev = get_be32(&ondisk->dev); + ce->ce_stat_data.sd_ino = get_be32(&ondisk->ino); + ce->ce_mode = get_be32(&ondisk->mode); + ce->ce_stat_data.sd_uid = get_be32(&ondisk->uid); + ce->ce_stat_data.sd_gid = get_be32(&ondisk->gid); + ce->ce_stat_data.sd_size = get_be32(&ondisk->size); ce->ce_flags = flags & ~CE_NAMEMASK; ce->ce_namelen = len; hashcpy(ce->sha1, ondisk->sha1); @@ -1390,14 +1370,14 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk, unsigned int flags; /* On-disk flags are just 16 bits */ - flags = ntoh_s(ondisk->flags); + flags = get_be16(&ondisk->flags); len = flags & CE_NAMEMASK; if (flags & CE_EXTENDED) { struct ondisk_cache_entry_extended *ondisk2; int extended_flags; ondisk2 = (struct ondisk_cache_entry_extended *)ondisk; - extended_flags = ntoh_s(ondisk2->flags2) << 16; + extended_flags = get_be16(&ondisk2->flags2) << 16; /* We do not yet understand any bit out of CE_EXTENDED_FLAGS */ if (extended_flags & ~CE_EXTENDED_FLAGS) die("Unknown index entry format %08x", extended_flags); diff --git a/revision.c b/revision.c index 61be08754a..bd027bc015 100644 --- a/revision.c +++ b/revision.c @@ -774,6 +774,10 @@ static int add_parents_to_list(struct rev_info *revs, struct commit *commit, return 0; commit->object.flags |= ADDED; + if (revs->include_check && + !revs->include_check(commit, revs->include_check_data)) + return 0; + /* * If the commit is uninteresting, don't try to * prune parents - we want the maximal uninteresting diff --git a/revision.h b/revision.h index 88967d6a24..1eb94c1548 100644 --- a/revision.h +++ b/revision.h @@ -172,6 +172,8 @@ struct rev_info { unsigned long min_age; int min_parents; int max_parents; + int (*include_check)(struct commit *, void *); + void *include_check_data; /* diff info for patches and for paths limiting */ struct diff_options diffopt; diff --git a/sha1_file.c b/sha1_file.c index 6e8c05d108..019628add5 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -252,8 +252,6 @@ char *sha1_pack_index_name(const unsigned char *sha1) struct alternate_object_database *alt_odb_list; static struct alternate_object_database **alt_odb_tail; -static int git_open_noatime(const char *name); - /* * Prepare alternate object database registry. 
* @@ -1232,6 +1230,7 @@ static void prepare_packed_git_one(char *objdir, int local) if (has_extension(de->d_name, ".idx") || has_extension(de->d_name, ".pack") || + has_extension(de->d_name, ".bitmap") || has_extension(de->d_name, ".keep")) string_list_append(&garbage, path); else @@ -1316,7 +1315,6 @@ void prepare_packed_git(void) void reprepare_packed_git(void) { - discard_revindex(); prepare_packed_git_run_once = 0; prepare_packed_git(); } @@ -1393,7 +1391,7 @@ int check_sha1_signature(const unsigned char *sha1, void *map, return hashcmp(sha1, real_sha1) ? -1 : 0; } -static int git_open_noatime(const char *name) +int git_open_noatime(const char *name) { static int sha1_file_open_flag = O_NOATIME; diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh new file mode 100755 index 0000000000..685d46f8b7 --- /dev/null +++ b/t/perf/p5310-pack-bitmaps.sh @@ -0,0 +1,57 @@ +#!/bin/sh + +test_description='Tests pack performance using bitmaps' +. ./perf-lib.sh + +test_perf_large_repo + +# note that we do everything through config, +# since we want to be able to compare bitmap-aware +# git versus non-bitmap git +test_expect_success 'setup bitmap config' ' + git config pack.writebitmaps true && + git config pack.writebitmaphashcache true +' + +test_perf 'repack to disk' ' + git repack -ad +' + +test_perf 'simulated clone' ' + git pack-objects --stdout --all </dev/null >/dev/null +' + +test_perf 'simulated fetch' ' + have=$(git rev-list HEAD~100 -1) && + { + echo HEAD && + echo ^$have + } | git pack-objects --revs --stdout >/dev/null +' + +test_expect_success 'create partial bitmap state' ' + # pick a commit to represent the repo tip in the past + cutoff=$(git rev-list HEAD~100 -1) && + orig_tip=$(git rev-parse HEAD) && + + # now kill off all of the refs and pretend we had + # just the one tip + rm -rf .git/logs .git/refs/* .git/packed-refs + git update-ref HEAD $cutoff + + # and then repack, which will leave us with a nice + # big bitmap pack of the "old" history, and all of + # the new history will be loose, as if it had been pushed + # up incrementally and exploded via unpack-objects + git repack -Ad + + # and now restore our original tip, as if the pushes + # had happened + git update-ref HEAD $orig_tip +' + +test_perf 'partial bitmap' ' + git pack-objects --stdout --all </dev/null >/dev/null +' + +test_done diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh new file mode 100755 index 0000000000..d3a3afaba8 --- /dev/null +++ b/t/t5310-pack-bitmaps.sh @@ -0,0 +1,139 @@ +#!/bin/sh + +test_description='exercise basic bitmap functionality' +. 
./test-lib.sh + +test_expect_success 'setup repo with moderate-sized history' ' + for i in $(test_seq 1 10); do + test_commit $i + done && + git checkout -b other HEAD~5 && + for i in $(test_seq 1 10); do + test_commit side-$i + done && + git checkout master && + blob=$(echo tagged-blob | git hash-object -w --stdin) && + git tag tagged-blob $blob && + git config pack.writebitmaps true && + git config pack.writebitmaphashcache true +' + +test_expect_success 'full repack creates bitmaps' ' + git repack -ad && + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output +' + +test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' + git rev-list --test-bitmap HEAD +' + +rev_list_tests() { + state=$1 + + test_expect_success "counting commits via bitmap ($state)" ' + git rev-list --count HEAD >expect && + git rev-list --use-bitmap-index --count HEAD >actual && + test_cmp expect actual + ' + + test_expect_success "counting partial commits via bitmap ($state)" ' + git rev-list --count HEAD~5..HEAD >expect && + git rev-list --use-bitmap-index --count HEAD~5..HEAD >actual && + test_cmp expect actual + ' + + test_expect_success "counting non-linear history ($state)" ' + git rev-list --count other...master >expect && + git rev-list --use-bitmap-index --count other...master >actual && + test_cmp expect actual + ' + + test_expect_success "enumerate --objects ($state)" ' + git rev-list --objects --use-bitmap-index HEAD >tmp && + cut -d" " -f1 <tmp >tmp2 && + sort <tmp2 >actual && + git rev-list --objects HEAD >tmp && + cut -d" " -f1 <tmp >tmp2 && + sort <tmp2 >expect && + test_cmp expect actual + ' + + test_expect_success "bitmap --objects handles non-commit objects ($state)" ' + git rev-list --objects --use-bitmap-index HEAD tagged-blob >actual && + grep $blob actual + ' +} + +rev_list_tests 'full bitmap' + +test_expect_success 'clone from bitmapped repository' ' + git clone --no-local --bare . clone.git && + git rev-parse HEAD >expect && + git --git-dir=clone.git rev-parse HEAD >actual && + test_cmp expect actual +' + +test_expect_success 'setup further non-bitmapped commits' ' + for i in $(test_seq 1 10); do + test_commit further-$i + done +' + +rev_list_tests 'partial bitmap' + +test_expect_success 'fetch (partial bitmap)' ' + git --git-dir=clone.git fetch origin master:master && + git rev-parse HEAD >expect && + git --git-dir=clone.git rev-parse HEAD >actual && + test_cmp expect actual +' + +test_expect_success 'incremental repack cannot create bitmaps' ' + test_commit more-1 && + test_must_fail git repack -d +' + +test_expect_success 'incremental repack can disable bitmaps' ' + test_commit more-2 && + git repack -d --no-write-bitmap-index +' + +test_expect_success 'full repack, reusing previous bitmaps' ' + git repack -ad && + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output +' + +test_expect_success 'fetch (full bitmap)' ' + git --git-dir=clone.git fetch origin master:master && + git rev-parse HEAD >expect && + git --git-dir=clone.git rev-parse HEAD >actual && + test_cmp expect actual +' + +test_lazy_prereq JGIT ' + type jgit +' + +test_expect_success JGIT 'we can read jgit bitmaps' ' + git clone . compat-jgit && + ( + cd compat-jgit && + rm -f .git/objects/pack/*.bitmap && + jgit gc && + git rev-list --test-bitmap HEAD + ) +' + +test_expect_success JGIT 'jgit can read our bitmaps' ' + git clone . compat-us && + ( + cd compat-us && + git repack -adb && + # jgit gc will barf if it does not like our bitmaps + jgit gc + ) +' + +test_done |
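The following is a minimal usage sketch, not part of the merged diff, showing how the pieces introduced by this series fit together from the command line. It assumes a hypothetical repository at /path/to/repo and simply mirrors what t/t5310-pack-bitmaps.sh exercises: the pack.writebitmaps and pack.writeBitmapHashCache configuration, the new `git repack -b`/`--write-bitmap-index` option, and the `--test-bitmap`/`--use-bitmap-index` modes of git rev-list.

#!/bin/sh
# Illustrative sketch only; /path/to/repo is a hypothetical repository path.
cd /path/to/repo

# Opt in to bitmap writing and the optional name-hash cache.
git config pack.writebitmaps true
git config pack.writeBitmapHashCache true

# Bitmaps are only written on a full repack (-a/-A); -b / --write-bitmap-index
# requests one explicitly, overriding the pack.writebitmaps setting.
git repack -adb
ls .git/objects/pack/ | grep '\.bitmap$'

# Verify the stored bitmap for the current tip against a real object walk...
git rev-list --test-bitmap HEAD

# ...then let rev-list use the bitmap index to speed up counting and
# object enumeration.
git rev-list --use-bitmap-index --count HEAD
git rev-list --use-bitmap-index --objects HEAD >/dev/null

As in the tests above, the bitmap-assisted counts should match those of a plain traversal; the bitmap index only changes how reachability is computed, with the caveat noted in rev-list-options.txt that `--objects` output omits the associated path names.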