<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/git.git/t/perf, branch jk/commit-date-approxidate</title>
<subtitle>github.com: git/git.git
</subtitle>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/git.git/'/>
<entry>
<title>Merge branch 'jk/pack-bitmap'</title>
<updated>2014-02-27T22:01:48+00:00</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2014-02-27T22:01:48+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/git.git/commit/?id=0f9e62e0847c075678a7a5a748567d1e881d16f8'/>
<id>0f9e62e0847c075678a7a5a748567d1e881d16f8</id>
<content type='text'>
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
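
As a rough illustration of why bitmaps avoid a full traversal, suppose each bitmapped commit stores an integer in which bit i is set when the object at position i in the pack is reachable from that commit. The objects in a commit range then fall out of a few bitwise operations. This is a toy model, not git's code. Python ints stand in for the EWAH-compressed bitmaps, and the helper names and the commit names "A" and "B" are hypothetical.

```python
def reachable(bitmaps, tips):
    """OR together the stored bitmaps of the given commits."""
    result = 0
    for tip in tips:
        result |= bitmaps[tip]
    return result

def objects_to_send(bitmaps, wants, haves):
    """Objects reachable from wants but not from haves (want AND NOT have)."""
    want_bits = reachable(bitmaps, wants)
    have_bits = reachable(bitmaps, haves)
    # (w | h) ^ h clears every bit that is set in h.
    return (want_bits | have_bits) ^ have_bits

def bit_positions(bits):
    """Positions of the set bits, lowest first."""
    positions = []
    index = 0
    while bits:
        if bits % 2:
            positions.append(index)
        bits >>= 1
        index += 1
    return positions
```

For example, with bitmaps {"A": 0b000111, "B": 0b111111}, objects_to_send(bitmaps, ["B"], ["A"]) gives 0b111000, i.e. objects 3, 4 and 5. No commit or tree had to be walked to get that answer.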

* jk/pack-bitmap: (26 commits)
  ewah: unconditionally ntohll ewah data
  ewah: support platforms that require aligned reads
  read-cache: use get_be32 instead of hand-rolled ntoh_l
  block-sha1: factor out get_be and put_be wrappers
  do not discard revindex when re-preparing packfiles
  pack-bitmap: implement optional name_hash cache
  t/perf: add tests for pack bitmaps
  t: add basic bitmap functionality tests
  count-objects: recognize .bitmap in garbage-checking
  repack: consider bitmaps when performing repacks
  repack: handle optional files created by pack-objects
  repack: turn exts array into array-of-struct
  repack: stop using magic number for ARRAY_SIZE(exts)
  pack-objects: implement bitmap writing
  rev-list: add bitmap mode to speed up object lists
  pack-objects: use bitmaps when packing objects
  pack-objects: split add_object_entry
  pack-bitmap: add support for bitmap indexes
  documentation: add documentation for the bitmap format
  ewah: compressed bitmap implementation
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.

* jk/pack-bitmap: (26 commits)
  ewah: unconditionally ntohll ewah data
  ewah: support platforms that require aligned reads
  read-cache: use get_be32 instead of hand-rolled ntoh_l
  block-sha1: factor out get_be and put_be wrappers
  do not discard revindex when re-preparing packfiles
  pack-bitmap: implement optional name_hash cache
  t/perf: add tests for pack bitmaps
  t: add basic bitmap functionality tests
  count-objects: recognize .bitmap in garbage-checking
  repack: consider bitmaps when performing repacks
  repack: handle optional files created by pack-objects
  repack: turn exts array into array-of-struct
  repack: stop using magic number for ARRAY_SIZE(exts)
  pack-objects: implement bitmap writing
  rev-list: add bitmap mode to speed up object lists
  pack-objects: use bitmaps when packing objects
  pack-objects: split add_object_entry
  pack-bitmap: add support for bitmap indexes
  documentation: add documentation for the bitmap format
  ewah: compressed bitmap implementation
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'jk/mark-edges-uninteresting'</title>
<updated>2014-01-27T18:45:08+00:00</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2014-01-27T18:45:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/git.git/commit/?id=a6bec00145da3013e693072122f2fa53076e73cd'/>
<id>a6bec00145da3013e693072122f2fa53076e73cd</id>
<content type='text'>
Fix performance regression in v1.8.4.x and later.

* jk/mark-edges-uninteresting:
  list-objects: only look at cmdline trees with edge_hint
  t/perf: time rev-list with UNINTERESTING commits
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix performance regression in v1.8.4.x and later.

* jk/mark-edges-uninteresting:
  list-objects: only look at cmdline trees with edge_hint
  t/perf: time rev-list with UNINTERESTING commits
</pre>
</div>
</content>
</entry>
<entry>
<title>t/perf: time rev-list with UNINTERESTING commits</title>
<updated>2014-01-21T22:46:17+00:00</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-01-21T02:25:12+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/git.git/commit/?id=ea97002fc9f682a804ac05212d069e38fa3e365c'/>
<id>ea97002fc9f682a804ac05212d069e38fa3e365c</id>
<content type='text'>
We time a straight "rev-list --all" and its "--objects"
counterpart, both going all the way to the root. However, we
do not time a partial history walk. This patch adds an
extreme case: a walk over a very small slice of history, but
with a very large set of UNINTERESTING tips. This is similar
to the connectivity check run by git on a small fetch, or
the walk done by any pre-receive hooks that want to check
incoming commits.
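
The following is a minimal sketch of such a partial walk. It assumes commit dates always increase from parent to child. Commits are popped newest first and the UNINTERESTING mark is propagated to parents. The walk stops once only UNINTERESTING commits remain queued, so a small slice of history stays cheap even with many UNINTERESTING tips. This is a simplified model of a date-ordered revision walk, not git's implementation. The function and parameter names are hypothetical, and it lists commits only (no `--objects`).

```python
import heapq

def rev_list(parents, dates, tips, uninteresting_tips):
    """Toy model of `rev-list tips --not uninteresting_tips`.

    parents maps a commit to its parent list; dates maps a commit to a
    timestamp assumed to increase strictly from parent to child.
    """
    flags = {}   # commit -> True once marked UNINTERESTING
    queue = []   # newest-first priority queue (dates are negated)

    def push(commit, uninteresting):
        if commit in flags:
            if uninteresting:
                flags[commit] = True  # propagate the mark to a queued commit
            return
        flags[commit] = uninteresting
        heapq.heappush(queue, (-dates[commit], commit))

    for commit in uninteresting_tips:
        push(commit, True)
    for commit in tips:
        push(commit, False)

    result = []
    # Once every queued commit is UNINTERESTING, any commit not yet
    # visited can only be reached through them, so the walk stops early.
    while queue and not all(flags[c] for _, c in queue):
        _, commit = heapq.heappop(queue)
        if not flags[commit]:
            result.append(commit)
        for parent in parents.get(commit, []):
            push(parent, flags[commit])
    return result
```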

This test reveals a performance regression in git v1.8.4.2,
caused by fbd4a70 (list-objects: mark more commits as edges
in mark_edges_uninteresting, 2013-08-16):

Test                                             fbd4a703^         fbd4a703
------------------------------------------------------------------------------------------
0001.1: rev-list --all                           0.69(0.67+0.02)   0.69(0.68+0.01) +0.0%
0001.2: rev-list --all --objects                 3.47(3.44+0.02)   3.48(3.44+0.03) +0.3%
0001.4: rev-list $commit --not --all             0.04(0.04+0.00)   0.04(0.04+0.00) +0.0%
0001.5: rev-list --objects $commit --not --all   0.04(0.03+0.00)   0.27(0.24+0.02) +575.0%
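
The percentages appear to be the relative change in wall-clock time, which is the first number in each cell. The parenthesized numbers are user and system CPU seconds. A small sketch that assumes this reading of the format reproduces the figures above. For 0001.5, (0.27 - 0.04) / 0.04 = 5.75, i.e. +575.0%.

```python
import re

# One t/perf cell, e.g. "0.27(0.24+0.02)": wall-clock seconds followed
# by user and system CPU seconds in parentheses.
CELL = re.compile(r"^([\d.]+)\(([\d.]+)\+([\d.]+)\)$")

def parse_cell(cell):
    """Return (wall, user, system) seconds from one table cell."""
    match = CELL.match(cell.strip())
    if match is None:
        raise ValueError("not a perf cell: %r" % cell)
    return tuple(float(group) for group in match.groups())

def percent_change(before, after):
    """Relative change in wall-clock time, formatted like "+575.0%"."""
    old = parse_cell(before)[0]
    new = parse_cell(after)[0]
    return "%+.1f%%" % (100.0 * (new - old) / old)
```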

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We time a straight "rev-list --all" and its "--objects"
counterpart, both going all the way to the root. However, we
do not time a partial history walk. This patch adds an
extreme case: a walk over a very small slice of history, but
with a very large set of UNINTERESTING tips. This is similar
to the connectivity check run by git on a small fetch, or
the walk done by any pre-receive hooks that want to check
incoming commits.

This test reveals a performance regression in git v1.8.4.2,
caused by fbd4a70 (list-objects: mark more commits as edges
in mark_edges_uninteresting, 2013-08-16):

Test                                             fbd4a703^         fbd4a703
------------------------------------------------------------------------------------------
0001.1: rev-list --all                           0.69(0.67+0.02)   0.69(0.68+0.01) +0.0%
0001.2: rev-list --all --objects                 3.47(3.44+0.02)   3.48(3.44+0.03) +0.3%
0001.4: rev-list $commit --not --all             0.04(0.04+0.00)   0.04(0.04+0.00) +0.0%
0001.5: rev-list --objects $commit --not --all   0.04(0.03+0.00)   0.27(0.24+0.02) +575.0%

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pack-bitmap: implement optional name_hash cache</title>
<updated>2013-12-30T20:19:23+00:00</updated>
<author>
<name>Vicent Marti</name>
<email>tanoku@gmail.com</email>
</author>
<published>2013-12-21T14:00:45+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/git.git/commit/?id=ae4f07fbccaab6dc93be52c0f34e137dd9fcbcf4'/>
<id>ae4f07fbccaab6dc93be52c0f34e137dd9fcbcf4</id>
<content type='text'>
When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.

In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.

As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have "have" tips in the
bitmapped history, but "want" newer objects.

The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.
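
Here is a hypothetical sketch of that mixed strategy. It assumes every object, old or new, has already been assigned a bit position, which simplifies what a real implementation must do for objects outside the bitmapped pack. Commits that carry a bitmap contribute it in one step and are not walked further. Newer commits are walked by hand. The names are illustrative only.

```python
def reachable_objects(parents, commit_objects, bitmaps, tips):
    """Toy model: an int whose set bits are the objects reachable from tips.

    bitmaps maps bitmapped commits to a precomputed reachability int;
    commit_objects maps every other commit to the bit positions of the
    objects it introduces (a simplifying assumption).
    """
    result = 0
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        if commit in bitmaps:
            # Bitmapped history: take the whole bitmap, stop walking here.
            result |= bitmaps[commit]
            continue
        # Newer commit without a bitmap: walk it the old-fashioned way.
        for position in commit_objects.get(commit, []):
            result |= 2 ** position
        stack.extend(parents.get(commit, []))
    return result
```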

But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, and between
"new" and "old" objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.

This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.
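
For illustration, here is one way such a cache could be computed and stored. The hash is modeled on the path hash pack-objects uses for its delta heuristics. It skips whitespace and lets the last characters of the path weigh most heavily. The on-disk layout, one big-endian 32-bit value per object in pack order, reflects our reading of the bitmap format. Treat both the hash and the layout as assumptions, and check the bitmap format documentation before relying on either. The function names are hypothetical.

```python
import struct

def name_hash(path):
    """32-bit hash of a path given as bytes; later characters weigh most.

    Whitespace is skipped. Each step shifts the running hash right by two
    bits and adds the new byte shifted into the top eight bits, wrapping
    the result to 32 bits.
    """
    h = 0
    for c in path:
        if bytes([c]).isspace():
            continue
        h = ((h >> 2) + c * 0x1000000) % 0x100000000
    return h

def encode_hash_cache(hashes):
    """Serialize one big-endian (network order) 32-bit value per object."""
    return struct.pack(">%dI" % len(hashes), *hashes)

def decode_hash_cache(data):
    """Inverse of encode_hash_cache: 4 bytes per object."""
    return list(struct.unpack(">%dI" % (len(data) // 4), data))
```

Because the final bytes of a path dominate the high bits, sorting delta candidates by such a hash roughly groups files with similar endings (for example, many files ending in ".c").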

Here are perf results for p5310:

Test                      origin/master       HEAD^                      HEAD
-------------------------------------------------------------------------------------------------
5310.2: repack to disk    36.81(37.82+1.43)   47.70(48.74+1.41) +29.6%   47.75(48.70+1.51) +29.7%
5310.3: simulated clone   30.78(29.70+2.14)   1.08(0.97+0.10) -96.5%     1.07(0.94+0.12) -96.5%
5310.4: simulated fetch   3.16(6.10+0.08)     3.54(10.65+0.06) +12.0%    1.70(3.07+0.06) -46.2%
5310.6: partial bitmap    36.76(43.19+1.81)   6.71(11.25+0.76) -81.7%    4.08(6.26+0.46) -88.9%

You can see that the time spent on an incremental fetch goes
down, as our delta heuristics are able to do their work.
And we save time on the partial bitmap clone for the same
reason.

Signed-off-by: Vicent Marti &lt;tanoku@gmail.com&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.

In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.

As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have "have" tips in the
bitmapped history, but "want" newer objects.

The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.

But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, and between
"new" and "old" objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.

This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.

Here are perf results for p5310:

Test                      origin/master       HEAD^                      HEAD
-------------------------------------------------------------------------------------------------
5310.2: repack to disk    36.81(37.82+1.43)   47.70(48.74+1.41) +29.6%   47.75(48.70+1.51) +29.7%
5310.3: simulated clone   30.78(29.70+2.14)   1.08(0.97+0.10) -96.5%     1.07(0.94+0.12) -96.5%
5310.4: simulated fetch   3.16(6.10+0.08)     3.54(10.65+0.06) +12.0%    1.70(3.07+0.06) -46.2%
5310.6: partial bitmap    36.76(43.19+1.81)   6.71(11.25+0.76) -81.7%    4.08(6.26+0.46) -88.9%

You can see that the time spent on an incremental fetch goes
down, as our delta heuristics are able to do their work.
And we save time on the partial bitmap clone for the same
reason.

Signed-off-by: Vicent Marti &lt;tanoku@gmail.com&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>t/perf: add tests for pack bitmaps</title>
<updated>2013-12-30T20:19:23+00:00</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2013-12-21T14:00:42+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/git.git/commit/?id=bbcefa1f3f8355921137dd7a097b3ee3db66f023'/>
<id>bbcefa1f3f8355921137dd7a097b3ee3db66f023</id>
<content type='text'>
This adds a few basic perf tests for the pack bitmap code to
show off its improvements. The tests are:

  1. How long does it take to do a repack (it gets slower
     with bitmaps, since we have to do extra work)?

  2. How long does it take to do a clone (it gets faster
     with bitmaps)?

  3. How does a small fetch perform when we've just
     repacked?

  4. How does a clone perform when a week of pushes has
     accumulated since the last repack?

Here are results against linux.git:

Test                      origin/master       this tree
-----------------------------------------------------------------------
5310.2: repack to disk    33.64(32.64+2.04)   67.67(66.75+1.84) +101.2%
5310.3: simulated clone   30.49(29.47+2.05)   1.20(1.10+0.10) -96.1%
5310.4: simulated fetch   3.49(6.79+0.06)     5.57(22.35+0.07) +59.6%
5310.6: partial bitmap    36.70(43.87+1.81)   8.18(21.92+0.73) -77.7%

You can see that we do take longer to repack, but we do much
better for subsequent clones. A small fetch is noticeably
slower, as we spend far more time on delta compression (note
the heavy user CPU time, as we have 8 threads) due to the
lack of name hashes for the bitmapped objects.
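
Each cell in these tables reads wall(user+sys). User time is summed across all threads, so a user figure several times the wall-clock figure means the work was spread over several threads. The fetch row shows this: 22.35 seconds of user time against 5.57 seconds of wall time. Below is a small sketch of how such a cell could be measured for a child process. It uses only the Python standard library and Unix-style child CPU accounting (os.times() reports zero child times on Windows). It is not how t/perf itself is implemented.

```python
import os
import subprocess
import time

def time_command(argv):
    """Run argv to completion and return a "wall(user+sys)" string.

    User and system time come from os.times() child accounting, which
    covers children that have exited and been waited for.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    wall = time.perf_counter() - start
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return "%.2f(%.2f+%.2f)" % (wall, user, system)
```

For example, running time_command(["git", "rev-list", "--all"]) inside a repository returns a cell in the same shape as the tables above.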

The final test shows how the bitmaps degrade over time
between packs. There's still a significant speedup over the
non-bitmap case, but we don't do quite as well (we have to
spend time accessing the "new" objects the old-fashioned
way, including delta compression).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This adds a few basic perf tests for the pack bitmap code to
show off its improvements. The tests are:

  1. How long does it take to do a repack (it gets slower
     with bitmaps, since we have to do extra work)?

  2. How long does it take to do a clone (it gets faster
     with bitmaps)?

  3. How does a small fetch perform when we've just
     repacked?

  4. How does a clone perform when a week of pushes has
     accumulated since the last repack?

Here are results against linux.git:

Test                      origin/master       this tree
-----------------------------------------------------------------------
5310.2: repack to disk    33.64(32.64+2.04)   67.67(66.75+1.84) +101.2%
5310.3: simulated clone   30.49(29.47+2.05)   1.20(1.10+0.10) -96.1%
5310.4: simulated fetch   3.49(6.79+0.06)     5.57(22.35+0.07) +59.6%
5310.6: partial bitmap    36.70(43.87+1.81)   8.18(21.92+0.73) -77.7%

You can see that we do take longer to repack, but we do much
better for subsequent clones. A small fetch is noticeably
slower, as we spend far more time on delta compression (note
the heavy user CPU time, as we have 8 threads) due to the
lack of name hashes for the bitmapped objects.

The final test shows how the bitmaps degrade over time
between packs. There's still a significant speedup over the
non-bitmap case, but we don't do quite as well (we have to
spend time accessing the "new" objects the old-fashioned
way, including delta compression).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'tg/diff-no-index-refactor'</title>
<updated>2013-12-27T22:58:17+00:00</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2013-12-27T22:58:17+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/git.git/commit/?id=73b063130befa7475316a142343de87da61f31e3'/>
<id>73b063130befa7475316a142343de87da61f31e3</id>
<content type='text'>
"git diff ../else/where/A ../else/where/B" when ../else/where is
clearly outside the repository, and "git diff --no-index A B", do
not have to look at the index at all, but we used to read the index
unconditionally.
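
Here is a hypothetical sketch of the decision described above. It follows the rule as git's documentation for `git diff --no-index` states it. The index is unnecessary when the option is given, when there is no repository, or when at least one of exactly two paths lies outside the working tree. The function name and arguments are illustrative, not git's API.

```python
import os

def needs_no_index(no_index_flag, worktree, paths):
    """Hypothetical helper: True when a diff of two filesystem paths can
    skip the index entirely.

    worktree is the working tree root, or None outside any repository.
    """
    if no_index_flag or worktree is None:
        return True
    if len(paths) != 2:
        return False
    root = os.path.realpath(worktree)

    def outside(path):
        full = os.path.realpath(path)
        # commonpath avoids treating "/repo2" as inside "/repo".
        return os.path.commonpath([root, full]) != root

    return any(outside(p) for p in paths)
```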

* tg/diff-no-index-refactor:
  diff: avoid some nesting
  diff: add test for --no-index executed outside repo
  diff: don't read index when --no-index is given
  diff: move no-index detection to builtin/diff.c
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
"git diff ../else/where/A ../else/where/B" when ../else/where is
clearly outside the repository, and "git diff --no-index A B", do
not have to look at the index at all, but we used to read the index
unconditionally.

* tg/diff-no-index-refactor:
  diff: avoid some nesting
  diff: add test for --no-index executed outside repo
  diff: don't read index when --no-index is given
  diff: move no-index detection to builtin/diff.c
</pre>
</div>
</content>
</entry>
<entry>
<title>diff: don't read index when --no-index is given</title>
<updated>2013-12-12T20:23:02+00:00</updated>
<author>
<name>Thomas Gummerer</name>
<email>t.gummerer@gmail.com</email>
</author>
<published>2013-12-11T09:58:43+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/git.git/commit/?id=6df5762db354ca55a0cf77451d06b332b7de0b82'/>
<id>6df5762db354ca55a0cf77451d06b332b7de0b82</id>
<content type='text'>
git diff --no-index ... currently reads the index, during setup, when
calling gitmodules_config().  This results in worse performance when the
index is not actually needed.  This patch avoids calling
gitmodules_config() when the --no-index option is given.  The times for
executing "git diff --no-index" in the WebKit repository are improved as
follows:

Test                      HEAD~3            HEAD
------------------------------------------------------------------
4001.1: diff --no-index   0.24(0.15+0.09)   0.01(0.00+0.00) -95.8%

An additional benefit of this patch is that "git diff --no-index" no
longer breaks when the index file is corrupt, which makes it possible to
use it for investigating a broken repository.

To make it more useful as a tool for investigating broken
repositories, setup_git_directory_gently() is also skipped when the
--no-index option is given.

Also add a test to guard against future breakages, and a performance
test to show the improvements.

Signed-off-by: Thomas Gummerer &lt;t.gummerer@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
git diff --no-index ... currently reads the index, during setup, when
calling gitmodules_config().  This results in worse performance when the
index is not actually needed.  This patch avoids calling
gitmodules_config() when the --no-index option is given.  The times for
executing "git diff --no-index" in the WebKit repository are improved as
follows:

Test                      HEAD~3            HEAD
------------------------------------------------------------------
4001.1: diff --no-index   0.24(0.15+0.09)   0.01(0.00+0.00) -95.8%

An additional benefit of this patch is that "git diff --no-index" no
longer breaks when the index file is corrupt, which makes it possible to
use it for investigating a broken repository.

To make it more useful as a tool for investigating broken
repositories, setup_git_directory_gently() is also skipped when the
--no-index option is given.

Also add a test to guard against future breakages, and a performance
test to show the improvements.

Signed-off-by: Thomas Gummerer &lt;t.gummerer@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>test: replace shebangs with descriptions in shell libraries</title>
<updated>2013-11-26T22:23:52+00:00</updated>
<author>
<name>Jonathan Nieder</name>
<email>jrnieder@gmail.com</email>
</author>
<published>2013-11-25T21:03:06+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/git.git/commit/?id=c74c72034f061d1d9d5b8b1fba20ce0138d423b4'/>
<id>c74c72034f061d1d9d5b8b1fba20ce0138d423b4</id>
<content type='text'>
A #! line in these files is misleading, since these scriptlets are
meant to be sourced with '.' (using whatever shell sources them)
instead of run directly using the interpreter named on the #! line.

Removing the #! line shouldn't hurt syntax highlighting since
these files have filenames ending with '.sh'.  For documentation,
add a brief description of how the files are meant to be used in
place of the shebang line.

Signed-off-by: Jonathan Nieder &lt;jrnieder@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A #! line in these files is misleading, since these scriptlets are
meant to be sourced with '.' (using whatever shell sources them)
instead of run directly using the interpreter named on the #! line.

Removing the #! line shouldn't hurt syntax highlighting since
these files have filenames ending with '.sh'.  For documentation,
add a brief description of how the files are meant to be used in
place of the shebang line.

Signed-off-by: Jonathan Nieder &lt;jrnieder@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'lf/echo-n-is-not-portable'</title>
<updated>2013-08-01T18:52:43+00:00</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2013-08-01T18:52:43+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/git.git/commit/?id=a5203a3f044ced7904800f3f233451474c1d5120'/>
<id>a5203a3f044ced7904800f3f233451474c1d5120</id>
<content type='text'>
* lf/echo-n-is-not-portable:
  Avoid using `echo -n` anywhere
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* lf/echo-n-is-not-portable:
  Avoid using `echo -n` anywhere
</pre>
</div>
</content>
</entry>
<entry>
<title>Avoid using `echo -n` anywhere</title>
<updated>2013-07-29T16:56:58+00:00</updated>
<author>
<name>Lukas Fleischer</name>
<email>git@cryptocrack.de</email>
</author>
<published>2013-07-27T12:11:33+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/git.git/commit/?id=19c3c5fdcb35b66b792534c5dc4e8d87a3952d2a'/>
<id>19c3c5fdcb35b66b792534c5dc4e8d87a3952d2a</id>
<content type='text'>
`echo -n` is non-portable. The POSIX specification says:

    Conforming applications that wish to do prompting without &lt;newline&gt;
    characters or that could possibly be expecting to echo a -n, should
    use the printf utility derived from the Ninth Edition system.

Since all of the affected shell scripts use a POSIX shell shebang,
replace `echo -n` invocations with printf.
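
Here is a small sketch of the replacement, driving the shell from Python. It assumes a POSIX `sh` is available on PATH. `printf "%s"` writes its argument verbatim with no trailing newline, so even the string "-n" comes out literally. `echo` cannot guarantee that, because POSIX leaves the behavior of an initial "-n" operand implementation-defined. The function name is hypothetical.

```python
import subprocess

def portable_print(text):
    """Write text with no trailing newline via POSIX printf in sh."""
    # "$1" is passed as a separate argument, so the shell never
    # reinterprets text; "%s" makes printf treat it as plain data.
    result = subprocess.run(
        ["sh", "-c", 'printf "%s" "$1"', "sh", text],
        capture_output=True, check=True)
    return result.stdout
```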

Signed-off-by: Lukas Fleischer &lt;git@cryptocrack.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
`echo -n` is non-portable. The POSIX specification says:

    Conforming applications that wish to do prompting without &lt;newline&gt;
    characters or that could possibly be expecting to echo a -n, should
    use the printf utility derived from the Ninth Edition system.

Since all of the affected shell scripts use a POSIX shell shebang,
replace `echo -n` invocations with printf.

Signed-off-by: Lukas Fleischer &lt;git@cryptocrack.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
