summaryrefslogtreecommitdiff
path: root/diff-delta.c
Commit message (Collapse)AuthorAgeFilesLines
* Revert "move pack creation to version 3"Junio C Hamano2006-10-141-6/+2
| | | | | | | | | | | | | | This reverts commit 16854571aae6302f457c5fbee41ac64669b09595. Git as recent as v1.1.6 do not understand version 3 delta. v1.2.0 is Ok and I personally would say it is old enough, but the improvement between version 2 and version 3 delta is not bit enough to justify breaking older clients. We should resurrect this later, but when we do so we shold make it conditional. Signed-off-by: Junio C Hamano <junkio@cox.net>
* move pack creation to version 3Nicolas Pitre2006-09-221-2/+6
| | | | | | | | It's been quite a while now that GIT is able to read version 3 packs. Let's create them at last. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Use xrealloc instead of reallocJonas Fonseca2006-08-261-1/+1
| | | | | | | | | Change places that use realloc, without a proper error path, to instead use xrealloc. Drop an erroneous error path in the daemon code that used errno in the die message in favour of the simpler xrealloc. Signed-off-by: Jonas Fonseca <fonseca@diku.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Fix a comparison bug in diff-delta.cPierre Habouzit2006-08-231-1/+1
| | | | | | | | | (1 << i) < hspace is compared in the `int` space rather that in the unsigned one. the result will be wrong if hspace is between 0x40000000 and 0x80000000. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Fix more typos, primarily in the codePavel Roskin2006-07-101-3/+3
| | | | | | | | | The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Remove all void-pointer arithmetic.Florian Forster2006-06-201-1/+1
| | | | | | | | ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Initialize FAMs using `FLEX_ARRAY'.Florian Forster2006-06-181-1/+2
| | | | | | | | When initializing a `flexible array member' the macro `FLEX_ARRAY' should be used. This was forgotten in `diff-delta.c'. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* fix diff-delta bad memory accessNicolas Pitre2006-05-101-5/+0
| | | | | | | | | | | | It cannot be assumed that the given buffer will never be moved when shrinking the allocated memory size with realloc(). So let's ignore that optimization for now. This patch makes Electric Fence happy on Linux. Signed-off-by: Nicolas Pitre <nico@cam.org> Acked-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* improve diff-delta with sparse and/or repetitive dataNicolas Pitre2006-05-021-13/+27
| | | | | | | | | | | | | | | | | | It is useless to preserve multiple hash entries for consecutive blocks with the same hash. Keeping only the first one will allow for matching the longest string of identical bytes while subsequent blocks will only allow for shorter matches. The backward matching code will match the end of it as necessary. This improves both performances (no repeated string compare with long successions of identical bytes, or even small group of bytes), as well as compression (less likely to need random hash bucket entry culling), especially with sparse files. With well behaved data sets this patch doesn't change much. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* tiny optimization to diff-deltaNicolas Pitre2006-05-021-3/+2
| | | | | | | | | | This is my assembly freak side looking at generated code again. And since create_delta() is certainly pretty high on the radar every bits count. In this case shorter code is generated if hash_mask is not copied to a local variable. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* replace adler32 with Rabin's polynomial in diff-deltaNicolas Pitre2006-04-281-34/+150
| | | | | | | | | | | | | | | This brings another small repacking speedup for sensibly the same pack size. On the Linux kernel repo, git-repack -a -f is 3.7% faster for a 0.4% larger pack. Credits to Geert Bosch who brought the Rabin's polynomial idea to my attention. This also eliminate the issue of adler32() reading past the data buffer, as noticed by Johannes Schindelin. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* split the diff-delta interfaceNicolas Pitre2006-04-241-83/+85
| | | | | | | | | | | | | | | This patch splits the diff-delta interface into index creation and delta generation. A wrapper is provided to preserve the diff-delta() call. This will allow for an optimization in pack-objects.c where the source object could be fixed and a full window of objects tentatively tried against that same source object without recomputing the source index each time. This patch only restructure things, plus a couple cleanups for good measure. There is no performance change yet. Signed-off-by: Nicolas Pitre <nico@cam.org>
* 3% tighter packs for freeNicolas Pitre2006-03-171-1/+16
| | | | | | | | | | This patch makes for 3.4% smaller pack with the git repository, and a bit more than 3% smaller pack with the kernel repository. And so with _no_ measurable CPU difference. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* diff-delta: bound hash list length to avoid O(m*n) behaviorNicolas Pitre2006-03-091-30/+71
| | | | | | | | | | | | | | | | | The diff-delta code can exhibit O(m*n) behavior with some patological data set where most hash entries end up in the same hash bucket. To prevent this, a limit is imposed to the number of entries that can exist in the same hash bucket. Because of the above the code is a tiny bit more expensive on average, even if some small optimizations were added as well to atenuate the overhead. But the problematic samples used to diagnoze the issue are now orders of magnitude less expensive to process with only a slight loss in compression. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* diff-delta: big code simplificationNicolas Pitre2006-02-221-146/+85
| | | | | | | This is much smaller and hopefully clearer code now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* diff-delta: fold two special tests into one plus cleanupsNicolas Pitre2006-02-221-10/+14
| | | | | | | | | | | | | | | Testing for realloc and size limit can be done with only one test per loop. Make it so and fix a theoretical off-by-one comparison error in the process. The output buffer memory allocation is also bounded by max_size when specified. Finally make some variable unsigned to allow the handling of files up to 4GB in size instead of 2GB. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Use adler32() from zlib instead of defining our own.Peter Eriksen2006-02-051-38/+1
| | | | | | | | Since we already depend on zlib, we don't need to define our own adler32(). Spotted by oprofile. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>
* small cleanup for diff-delta.cNicolas Pitre2005-12-151-22/+12
| | | | | | | | This patch removes unused remnants of the original xdiff source. No functional change. Possible tiny speed improvement. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Revert "diff-delta.c: allow delta with empty blob."Junio C Hamano2005-12-121-1/+1
| | | | | This reverts 962537a3eb03a118cf27d9d0da365a3216ed1caa commit to play safe.
* diff-delta.c: allow delta with empty blob.Junio C Hamano2005-12-121-1/+1
| | | | | | | | Delta computation with an empty blob used to punt and returned NULL. This commit allows creation with empty blob; all combination of empty->empty, empty->something, and something->empty are allowed. Signed-off-by: Junio C Hamano <junkio@cox.net>
* [PATCH] assorted delta code cleanupNicolas Pitre2005-06-291-1/+2
| | | | | | | | | This is a wrap-up patch including all the cleanups I've done to the delta code and its usage. The most important change is the factorization of the delta header handling code. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] denser delta header encodingNicolas Pitre2005-06-281-20/+14
| | | | | | | | | | | Since the delta data format is not tied to any actual git object anymore, now is the time to add a small improvement to the delta data header as it is been done for packed object header. This patch allows for reducing the delta header of about 2 bytes and makes for simpler code. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Add a "max_size" parameter to diff_delta()Linus Torvalds2005-06-251-1/+7
| | | | | | | Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens.
* [PATCH] Deltification library work by Nicolas Pitre.Nicolas Pitre2005-05-191-0/+333
This patch adds the basic library functions to create and replay delta information. Also included is a test-delta utility to validate the code. diff-delta was based on LibXDiff written by Davide Libenzi Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>