author     René Scharfe <l.s.r@web.de> 2022-06-15 19:02:33 +0200
committer  Junio C Hamano <gitster@pobox.com> 2022-06-15 13:19:47 -0700
commit     76d7602631a9d0cb67cc1b848d580b862dc5de8b
tree       fb72986437cfb791d2a109f891ab08382014abe5 /t/t5000-tar-tree.sh
parent     dfce1186c6034d6f4ea283f5178fd25cbd8f4fc0
archive-tar: add internal gzip implementation
Git uses zlib for its own object store, but calls gzip when creating tgz archives. Add an option to perform the gzip compression for the latter using zlib, without depending on the external gzip binary.

Plug it in by making write_block a function pointer and switching to a compressing variant if the filter command has the magic value "git archive gzip". Does that indirection slow down tar creation? Not really, at least not in this test:

$ hyperfine -w3 -L rev HEAD,origin/main -p 'git checkout {rev} && make' \
  './git -C ../linux archive --format=tar HEAD # {rev}'
Benchmark #1: ./git -C ../linux archive --format=tar HEAD # HEAD
  Time (mean ± σ):      4.044 s ±  0.007 s    [User: 3.901 s, System: 0.137 s]
  Range (min … max):    4.038 s …  4.059 s    10 runs

Benchmark #2: ./git -C ../linux archive --format=tar HEAD # origin/main
  Time (mean ± σ):      4.047 s ±  0.009 s    [User: 3.903 s, System: 0.138 s]
  Range (min … max):    4.038 s …  4.066 s    10 runs

How does tgz creation perform?

$ hyperfine -w3 -L command 'gzip -cn','git archive gzip' \
  './git -c tar.tgz.command="{command}" -C ../linux archive --format=tgz HEAD'
Benchmark #1: ./git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD
  Time (mean ± σ):     20.404 s ±  0.006 s    [User: 23.943 s, System: 0.401 s]
  Range (min … max):   20.395 s … 20.414 s    10 runs

Benchmark #2: ./git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD
  Time (mean ± σ):     23.807 s ±  0.023 s    [User: 23.655 s, System: 0.145 s]
  Range (min … max):   23.782 s … 23.857 s    10 runs

Summary
  './git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD' ran
    1.17 ± 0.00 times faster than './git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD'

So on the Linux repo the internal implementation takes 17% longer in wall-clock time, but uses 2% less CPU time (User + System). That's because the external gzip can run in parallel with git archive on its own processor, while the internal one works sequentially but avoids the inter-process communication overhead.
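The two percentages in the summary follow from the hyperfine means above: a wall-clock time of 23.807 s versus 20.404 s, and a CPU time, taken here as User + System, of 23.800 s versus 24.344 s. The following POSIX sh/awk sketch recomputes them. The helper names (pct_longer, pct_less, cpu_total) are hypothetical and are not part of Git or its test suite.

```sh
# Percentage by which $2 exceeds $1, rounded to the nearest integer.
pct_longer () {
	awk -v base="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (new / base - 1) * 100 }'
}

# Percentage by which $2 falls short of $1, rounded to the nearest integer.
pct_less () {
	awk -v base="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (1 - new / base) * 100 }'
}

# CPU time as reported by hyperfine: User + System seconds.
cpu_total () {
	awk -v u="$1" -v s="$2" 'BEGIN { printf "%.3f\n", u + s }'
}

ext_cpu=$(cpu_total 23.943 0.401)   # tar.tgz.command="gzip -cn"
int_cpu=$(cpu_total 23.655 0.145)   # tar.tgz.command="git archive gzip"

echo "wall-clock: internal takes $(pct_longer 20.404 23.807)% longer"
echo "CPU time:   internal uses $(pct_less "$ext_cpu" "$int_cpu")% less"
```

Both figures are rounded. The unrounded values are about 16.7% and 2.2%.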
What are the benefits? Only an internal, sequential implementation can offer this eco mode (less total CPU time at the cost of wall-clock time), and it removes the dependency on gzip(1).

This implementation uses the helper functions from our zlib.c instead of the convenient gz* functions from zlib, because the latter don't give us the control over the generated gzip header that the next patch requires.

Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
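The "control over the generated gzip header" mentioned above concerns the fixed 10-byte member header defined by RFC 1952. Its fields are ID1 and ID2 (0x1f 0x8b), CM (compression method, 8 = deflate), FLG, a little-endian 4-byte MTIME, XFL and OS. The following POSIX sh helper dumps those fields, for example to compare internal_gzip.tgz with the output of gzip -cn. It is only an illustration, and the function name gzip_header is hypothetical, not part of Git.

```sh
# Print the fixed 10-byte gzip member header fields (RFC 1952) of file $1.
gzip_header () {
	# od prints the first ten bytes as unsigned decimals:
	# ID1 ID2 CM FLG MTIME0 MTIME1 MTIME2 MTIME3 XFL OS
	set -- $(od -An -tu1 -N10 "$1")
	if [ $# -ne 10 ] || [ "$1" -ne 31 ] || [ "$2" -ne 139 ]; then
		echo "not a gzip stream" >&2
		return 1
	fi
	# MTIME is stored least significant byte first.
	mtime=$(( $5 + $6 * 256 + $7 * 65536 + $8 * 16777216 ))
	echo "CM=$3 FLG=$4 MTIME=$mtime XFL=$9 OS=${10}"
}
```

Usage: run `gzip_header internal_gzip.tgz` in the test's trash directory. The exact values printed depend on the compressor and its settings.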
Diffstat (limited to 't/t5000-tar-tree.sh')
-rwxr-xr-x t/t5000-tar-tree.sh 16
1 file changed, 16 insertions, 0 deletions
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 7f8d2ab0a7..9ac0ec67fe 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -374,6 +374,22 @@ test_expect_success GZIP 'remote tar.gz can be disabled' '
>remote.tar.gz
'
+test_expect_success 'git archive --format=tgz (internal gzip)' '
+ test_config tar.tgz.command "git archive gzip" &&
+ git archive --format=tgz HEAD >internal_gzip.tgz
+'
+
+test_expect_success 'git archive --format=tar.gz (internal gzip)' '
+ test_config tar.tar.gz.command "git archive gzip" &&
+ git archive --format=tar.gz HEAD >internal_gzip.tar.gz &&
+ test_cmp_bin internal_gzip.tgz internal_gzip.tar.gz
+'
+
+test_expect_success GZIP 'extract tgz file (internal gzip)' '
+ gzip -d -c <internal_gzip.tgz >internal_gzip.tar &&
+ test_cmp_bin b.tar internal_gzip.tar
+'
+
test_expect_success 'archive and :(glob)' '
git archive -v HEAD -- ":(glob)**/sh" >/dev/null 2>actual &&
cat >expect <<EOF &&