summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* pack-bitmap: pass object filter to fill-in traversalJeff King2020-05-042-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes a bitmap traversal still has to walk some commits manually, because those commits aren't included in the bitmap packfile (e.g., due to a push or commit since the last full repack). If we're given an object filter, we don't pass it down to this traversal. It's not necessary for correctness because the bitmap code has its own filters to post-process the bitmap result (which it must, to filter out the objects that _are_ mentioned in the bitmapped packfile). And with blob filters, there was no performance reason to pass along those filters, either. The fill-in traversal could omit them from the result, but it wouldn't save us any time to do so, since we'd still have to walk each tree entry to see if it's a blob or not. But now that we support tree filters, there's opportunity for savings. A tree:depth=0 filter means we can avoid accessing trees entirely, since we know we won't them (or any of the subtrees or blobs they point to). The new test in p5310 shows this off (the "partial bitmap" state is one where HEAD~100 and its ancestors are all in a bitmapped pack, but HEAD~100..HEAD are not). Here are the results (run against linux.git): Test HEAD^ HEAD ------------------------------------------------------------------------------------------------- [...] 5310.16: rev-list with tree filter (partial bitmap) 0.19(0.17+0.02) 0.03(0.02+0.01) -84.2% The absolute number of savings isn't _huge_, but keep in mind that we only omitted 100 first-parent links (in the version of linux.git here, that's 894 actual commits). In a more pathological case, we might have a much larger proportion of non-bitmapped commits. I didn't bother creating such a case in the perf script because the setup is expensive, and this is plenty to show the savings as a percentage. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pack-bitmap.c: support 'tree:0' filteringTaylor Blau2020-05-043-1/+50
| | | | | | | | | | | | | | | | | | | | | | | | | In the previous patch, we made it easy to define other filters that exclude all objects of a certain type. Use that in order to implement bitmap-level filtering for the '--filter=tree:<n>' filter when 'n' is equal to 0. The general case is not helped by bitmaps, since for values of 'n > 0', the object filtering machinery requires a full-blown tree traversal in order to determine the depth of a given tree. Caching this is non-obvious, too, since the same tree object can have a different depth depending on the context (e.g., a tree was moved up in the directory hierarchy between two commits). But, the 'n = 0' case can be helped, and this patch does so. Running p5310.11 in this tree and on master with the kernel, we can see that this case is helped substantially: Test master this tree -------------------------------------------------------------------------------- 5310.11: rev-list count with tree:0 10.68(10.39+0.27) 0.06(0.04+0.01) -99.4% Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pack-bitmap.c: make object filtering functions genericTaylor Blau2020-05-041-11/+24
| | | | | | | | | | | | | | | | | | | | In 4f3bd5606a (pack-bitmap: implement BLOB_NONE filtering, 2020-02-14), filtering support for bitmaps was added for the 'LOFC_BLOB_NONE' filter. In the future, we would like to add support for filters that behave as if they exclude a certain type of object, for e.g., the tree depth filter with depth 0. To prepare for this, make some of the functions used for filtering more generic, such as 'find_tip_blobs' and 'filter_bitmap_blob_none' so that they can work over arbitrary object types. To that end, create 'find_tip_objects' and 'filter_bitmap_exclude_type', and redefine the aforementioned functions in terms of those. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* list-objects-filter: treat NULL filter_options as "disabled"Jeff King2020-05-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | In most callers, we have an actual list_objects_filter_options struct, and if no filtering is desired its "choice" element will be LOFC_DISABLED. However, some code may have only a pointer to such a struct which may be NULL (because _their_ callers didn't care about filtering, either). Rather than forcing them to handle this explicitly like: if (filter_options) traverse_commit_list_filtered(filter_options, revs, show_commit, show_object, show_data, NULL); else traverse_commit_list(revs, show_commit, show_object, show_data); let's just treat a NULL filter_options the same as LOFC_DISABLED. We only need a small change, since that option struct is converted into a real filter only in the "init" function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* The sixth batchJunio C Hamano2020-05-011-0/+52
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'jt/v2-fetch-nego-fix'Junio C Hamano2020-05-012-12/+86
|\ | | | | | | | | | | | | | | | | | | | | | | The upload-pack protocol v2 gave up too early before finding a common ancestor, resulting in a wasteful fetch from a fork of a project. This has been corrected to match the behaviour of v0 protocol. * jt/v2-fetch-nego-fix: fetch-pack: in protocol v2, reset in_vain upon ACK fetch-pack: in protocol v2, in_vain only after ACK fetch-pack: return enum from process_acks()
| * fetch-pack: in protocol v2, reset in_vain upon ACKJonathan Tan2020-04-282-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the function process_acks() in fetch-pack.c, the variable received_ack is meant to track that an ACK was received, but it was never set. This results in negotiation terminating prematurely through the in_vain counter, when the counter should have been reset upon every ACK. Therefore, reset the in_vain counter upon every ACK. Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * fetch-pack: in protocol v2, in_vain only after ACKJonathan Tan2020-04-282-4/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When fetching, Git stops negotiation when it has sent at least MAX_IN_VAIN (which is 256) "have" lines without having any of them ACK-ed. But this is supposed to trigger only after the first ACK, as pack-protocol.txt says: However, the 256 limit *only* turns on in the canonical client implementation if we have received at least one "ACK %s continue" during a prior round. This helps to ensure that at least one common ancestor is found before we give up entirely. The code path for protocol v0 observes this, but not protocol v2, resulting in shorter negotiation rounds but significantly larger packfiles. Teach the code path for protocol v2 to check this criterion only after at least one ACK was received. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * fetch-pack: return enum from process_acks()Jonathan Tan2020-04-281-8/+28
| | | | | | | | | | | | | | | | | | | | process_acks() returns 0, 1, or 2, depending on whether "ready" was received and if not, whether at least one commit was found to be common. Replace these magic numbers with a documented enum. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'js/anonymise-push-url-in-errors'Junio C Hamano2020-05-011-2/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | Error and verbose trace messages from "git push" did not redact credential material embedded in URLs. * js/anonymise-push-url-in-errors: push: anonymize URLs in error messages and warnings
| * | push: anonymize URLs in error messages and warningsJohannes Schindelin2020-04-281-2/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | Just like 47abd85ba0 (fetch: Strip usernames from url's before storing them, 2009-04-17) and later 882d49ca5c (push: anonymize URL in status output, 2016-07-13), and even later c1284b21f243 (curl: anonymize URLs in error messages and warnings, 2019-03-04) this change anonymizes URLs (read: strips them of user names and especially passwords) in user-facing error messages and warnings. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'es/bugreport'Junio C Hamano2020-05-0116-131/+458
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "bugreport" tool. * es/bugreport: bugreport: drop extraneous includes bugreport: add compiler info bugreport: add uname info bugreport: gather git version and build info bugreport: add tool to generate debugging info help: move list_config_help to builtin/help
| * | bugreport: drop extraneous includesEmily Shaffer2020-04-271-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the generic parts of the source files, system headers like <time.h> and <stdio.h> are supposed to be included indirectly by including "git-compat-util.h", which manages portability issues. Drop our explicit inclusions and rely on "cache.h", which includes "git-compat-util.h". Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | bugreport: add compiler infoEmily Shaffer2020-04-163-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To help pinpoint the source of a regression, it is useful to know some info about the compiler which the user's Git client was built with. By adding a generic get_compiler_info() in 'compat/' we can choose which relevant information to share per compiler; to get started, let's demonstrate the version of glibc if the user built with 'gcc'. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | bugreport: add uname infoEmily Shaffer2020-04-162-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The contents of uname() can give us some insight into what sort of system the user is running on, and help us replicate their setup if need be. The domainname field is not guaranteed to be available, so don't collect it. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | bugreport: gather git version and build infoEmily Shaffer2020-04-164-19/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Knowing which version of Git a user has and how it was built allows us to more precisely pin down the circumstances when a certain issue occurs, so teach bugreport how to tell us the same output as 'git version --build-options'. It's not ideal to directly call 'git version --build-options' because that output goes to stdout. Instead, wrap the version string in a helper within help.[ch] library, and call that helper from within the bugreport library. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | bugreport: add tool to generate debugging infoEmily Shaffer2020-04-168-0/+224
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach Git how to prompt the user for a good bug report: reproduction steps, expected behavior, and actual behavior. Later, Git can learn how to collect some diagnostic information from the repository. If users can send us a well-written bug report which contains diagnostic information we would otherwise need to ask the user for, we can reduce the number of question-and-answer round trips between the reporter and the Git contributor. Users may also wish to send a report like this to their local "Git expert" if they have put their repository into a state they are confused by. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | help: move list_config_help to builtin/helpEmily Shaffer2020-04-169-113/+123
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Starting in 3ac68a93fd2, help.o began to depend on builtin/branch.o, builtin/clean.o, and builtin/config.o. This meant that help.o was unusable outside of the context of the main Git executable. To make help.o usable by other commands again, move list_config_help() into builtin/help.c (where it makes sense to assume other builtin libraries are present). When command-list.h is included but a member is not used, we start to hear a compiler warning. Since the config list is generated in a fairly different way than the command list, and since commands and config options are semantically different, move the config list into its own header and move the generator into its own script and build rule. For reasons explained in 976aaedc (msvc: add a Makefile target to pre-generate the Visual Studio solution, 2019-07-29), some build artifacts we consider non-source files cannot be generated in the Visual Studio environment, and we already have some Makefile tweaks to help Visual Studio to use generated command-list.h header file. Do the same to a new generated file, config-list.h, introduced by this change. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
* | | Merge branch 'en/rebase-root-and-fork-point-are-incompatible'Junio C Hamano2020-05-012-2/+8
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Incompatible options "--root" and "--fork-point" of "git rebase" have been marked and documented as being incompatible. * en/rebase-root-and-fork-point-are-incompatible: rebase: display an error if --root and --fork-point are both provided
| * | | rebase: display an error if --root and --fork-point are both providedElijah Newren2020-04-272-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | --root implies we want to rebase all commits since the beginning of history. --fork-point means we want to use the reflog of the specified upstream to find the best common ancestor between <upstream> and <branch> and only rebase commits since that common ancestor. These options are clearly contradictory, so throw an error (instead of segfaulting on a NULL pointer) if both are specified. Reported-by: Alexander Berg <alexander.berg@atos.net> Documentation-by: Alban Gruin <alban.gruin@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'ds/build-homebrew-gettext-fix'Junio C Hamano2020-05-011-2/+11
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent update to Homebrew used by macOS folks breaks build by moving gettext library and necessary headers. * ds/build-homebrew-gettext-fix: macOS/brew: let the build find gettext headers/libraries/msgfmt
| * | | | macOS/brew: let the build find gettext headers/libraries/msgfmtJohannes Schindelin2020-04-271-2/+11
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Apparently a recent Homebrew update now installs `gettext` into the subdirectory /usr/local/opt/gettext/[lib/include]. Sometimes the ci job succeeds: brew link --force gettext Linking /usr/local/Cellar/gettext/0.20.1... 179 symlinks created And sometimes installing the package "gettext" with force-link fails: brew link --force gettext Warning: Refusing to link macOS provided/shadowed software: gettext If you need to have gettext first in your PATH run: echo 'export PATH="/usr/local/opt/gettext/bin:$PATH"' >> ~/.bash_profile (And the is not the final word either, since macOS itself says: The default interactive shell is now zsh.) Anyway, The latter requires CFLAGS to include /usr/local/opt/gettext/include and LDFLAGS to include /usr/local/opt/gettext/lib. Likewise, the `msgfmt` tool is no longer in the `PATH`. While it is unclear which change is responsible for this breakage (that most notably only occurs on CI build agents that updated very recently), https://github.com/Homebrew/homebrew-core/pull/53489 has fixed it. Nevertheless, let's work around this issue, as there are still quite a few build agents out there that need some help in this regard: we explicitly do not call `brew update` in our CI/PR builds anymore. Helped-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'dd/sparse-fixes'Junio C Hamano2020-05-018-15/+16
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compilation fix. * dd/sparse-fixes: progress.c: silence cgcc suggestion about internal linkage graph.c: limit linkage of internal variable compat/regex: move stdlib.h up in inclusion chain test-parse-pathspec-file.c: s/0/NULL/ for pointer type
| * | | | progress.c: silence cgcc suggestion about internal linkageĐoàn Trần Công Danh2020-04-273-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Reviewed-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | graph.c: limit linkage of internal variableĐoàn Trần Công Danh2020-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Reviewed-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | compat/regex: move stdlib.h up in inclusion chainĐoàn Trần Công Danh2020-04-272-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In Linux with musl libc, we have this inclusion chain: compat/regex/regex.c:69 `-> compat/regex/regex_internal.h `-> /usr/include/stdlib.h `-> /usr/include/features.h `-> /usr/include/alloca.h In that inclusion chain, `<features.h>` claims it's _BSD_SOURCE compatible when it's NOT asked to be either {_POSIX,_GNU,_XOPEN,_BSD}_SOURCE, or __STRICT_ANSI__. And, `<stdlib.h>` will include `<alloca.h>` to be compatible with software written for GNU and BSD. Thus, redefine `alloca` macro, which was defined before at compat/regex/regex.c:66. Considering this is only compat code, we've taken from other project, it's not our business to decide which source should we adhere to. Include `<stdlib.h>` early to prevent the redefinition of alloca. This also remove a potential warning about alloca not defined on: #undef alloca Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | test-parse-pathspec-file.c: s/0/NULL/ for pointer typeĐoàn Trần Công Danh2020-04-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Reviewed-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | | Merge branch 'mt/doc-worktree-ref'Junio C Hamano2020-05-011-5/+6
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Docfix. * mt/doc-worktree-ref: config doc: fix reference to config.worktree info
| * | | | | config doc: fix reference to config.worktree infoMatheus Tavares2020-04-241-5/+6
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 356aea6 ("doc: move extensions.worktreeConfig to the right place", 2018-11-14) moved the explanation of extension.worktreeConfig from config.txt to technical/repository-version.txt. However, the former still contains a reference to the removed paragraph. We could fix it referencing the gitrepository-layout man page, which contains the moved explanation. But the git-worktree man page has additional information and recommendations for the worktree config file, so let's reference it instead. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | | Merge branch 'eb/gitweb-more-trailers'Junio C Hamano2020-05-011-1/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Gitweb updates. * eb/gitweb-more-trailers: gitweb: Recognize *-to and Closes/Fixes trailers
| * | | | | gitweb: Recognize *-to and Closes/Fixes trailersEmma Brooks2020-04-241-1/+1
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit trailers like "Thanks-to:", "Fixes:", and "Closes:" are fairly common, but gitweb didn't highlight them like other trailers. Signed-off-by: Emma Brooks <me@pluvano.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | | Merge branch 'ds/multi-pack-index'Junio C Hamano2020-05-012-5/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The multi-pack-index left mmapped file descriptors open when it does not have to. * ds/multi-pack-index: multi-pack-index: close file descriptor after mmap
| * | | | | multi-pack-index: close file descriptor after mmapDerrick Stolee2020-04-242-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The multi-pack-index subsystem was not closing its file descriptor after memory-mapping the file contents. After this mmap() succeeds, there is no need to keep the file descriptor open. In fact, there is signficant reason to close it so we do not run out of descriptors. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | | | Merge branch 'ds/blame-on-bloom'Junio C Hamano2020-05-019-23/+211
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "git blame" learns to take advantage of the "changed-paths" Bloom filter stored in the commit-graph file. * ds/blame-on-bloom: test-bloom: check that we have expected arguments test-bloom: fix some whitespace issues blame: drop unused parameter from maybe_changed_path blame: use changed-path Bloom filters tests: write commit-graph with Bloom filters revision: complicated pathspecs disable filters
| * | | | | | test-bloom: check that we have expected argumentsJeff King2020-04-231-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If "test-tool bloom" is not fed a command, or if arguments are missing for some commands, it will just segfault. Let's check argc and write a friendlier usage message. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | test-bloom: fix some whitespace issuesJeff King2020-04-231-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | blame: drop unused parameter from maybe_changed_pathJeff King2020-04-231-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't use the "parent" parameter at all (probably because the bloom filter for a commit is always defined against a single parent anyway). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | blame: use changed-path Bloom filtersDerrick Stolee2020-04-163-9/+146
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The changed-path Bloom filters help reduce the amount of tree parsing required during history queries. Before calculating a diff, we can ask the filter if a path changed between a commit and its first parent. If the filter says "no" then we can move on without parsing trees. If the filter says "maybe" then we parse trees to discover if the answer is actually "yes" or "no". When computing a blame, there is a section in find_origin() that computes a diff between a commit and one of its parents. When this is the first parent, we can check the Bloom filters before calling diff_tree_oid(). In order to make this work with the blame machinery, we need to initialize a struct bloom_key with the initial path. But also, we need to add more keys to a list if a rename is detected. We then check to see if _any_ of these keys answer "maybe" in the diff. During development, I purposefully left out this "add a new key when a rename is detected" to see if the test suite would catch my error. That is how I discovered the issues with GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS from the previous change. With that change, we can feel some confidence in the coverage of this change. If a user requests copy detection using "git blame -C", then there are more places where the set of "important" files can expand. I do not know enough about how this happens in the blame machinery. Thus, the Bloom filter integration is explicitly disabled in this mode. A later change could expand the bloom_key data with an appropriate call (or calls) to add_bloom_key(). If we did not disable this mode, then the following tests would fail: t8003-blame-corner-cases.sh t8011-blame-split-file.sh Generally, this is a performance enhancement and should not change the behavior of 'git blame' in any way. If a repo has a commit-graph file with computed changed-path Bloom filters, then they should notice improved performance for their 'git blame' commands. Here are some example timings that I found by blaming some paths in the Linux kernel repository: git blame arch/x86/kernel/topology.c >/dev/null Before: 0.83s After: 0.24s git blame kernel/time/time.c >/dev/null Before: 0.72s After: 0.24s git blame tools/perf/ui/stdio/hist.c >/dev/null Before: 0.27s After: 0.11s I specifically looked for "deep" paths that were also edited many times. As a counterpoint, the MAINTAINERS file was edited many times but is located in the root tree. This means that the cost of computing a diff relative to the pathspec is very small. Here are the timings for that command: git blame MAINTAINERS >/dev/null Before: 20.1s After: 18.0s These timings are the best of five. The worst-case runs were on the order of 2.5 minutes for both cases. Note that the MAINTAINERS file has 18,740 lines across 17,000+ commits. This happens to be one of the cases where this change provides the least improvement. The lack of improvement for the MAINTAINERS file and the relatively modest improvement for the other examples can be easily explained. The blame machinery needs to compute line-level diffs to determine which lines were changed by each commit. That makes up a large proportion of the computation time, and this change does not attempt to improve on that section of the algorithm. The MAINTAINERS file is large and changed often, so it takes time to determine which lines were updated by which commit. In contrast, the code files are much smaller, and it takes longer to comute the line-by-line diff for a single patch on the Linux mailing lists. Outside of the "-C" integration, I believe there is little more to gain from the changed-path Bloom filters for 'git blame' after this patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | tests: write commit-graph with Bloom filtersDerrick Stolee2020-04-164-5/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GIT_TEST_COMMIT_GRAPH environment variable updates the commit- graph file whenever "git commit" is run, ensuring that we always have an updated commit-graph throughout the test suite. The GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS environment variable was introduced to write the changed-path Bloom filters whenever "git commit-graph write" is run. However, the GIT_TEST_COMMIT_GRAPH trick doesn't launch a separate process and instead writes it directly. To expand the number of tests that have commits in the commit-graph file, add a helper method that computes the commit-graph and place that helper inside "git commit" and "git merge". In the helper method, check GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS to ensure we are writing changed-path Bloom filters whenever possible. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | revision: complicated pathspecs disable filtersDerrick Stolee2020-04-161-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The changed-path Bloom filters work only when we can compute an explicit Bloom filter key in advance. When a pathspec is given that allows case-insensitive checks or wildcard matching, we must disable the Bloom filter performance checks. By checking the pathspec in prepare_to_use_bloom_filters(), we avoid setting up the Bloom filter data and thus revert to the usual logic. Before this change, the following tests would fail*: t6004-rev-list-path-optim.sh (Tests 6-7) t6130-pathspec-noglob.sh (Tests 3-6) t6131-pathspec-icase.sh (Tests 3-5) *These tests would fail when using GIT_TEST_COMMIT_GRAPH and GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS except that the latter environment variable was not set up correctly to write the changed- path Bloom filters in the test suite. That will be fixed in the next change. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | | | | Merge branch 'gs/commit-graph-path-filter'Junio C Hamano2020-05-0122-11/+1140
|\ \ \ \ \ \ \ | |/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce an extension to the commit-graph to make it efficient to check for the paths that were modified at each commit using Bloom filters. * gs/commit-graph-path-filter: bloom: ignore renames when computing changed paths commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag t4216: add end to end tests for git log with Bloom filters revision.c: add trace2 stats around Bloom filter usage revision.c: use Bloom filters to speed up path based revision walks commit-graph: add --changed-paths option to write subcommand commit-graph: reuse existing Bloom filters during write commit-graph: write Bloom filters to commit graph file commit-graph: examine commits by generation number commit-graph: examine changed-path objects in pack order commit-graph: compute Bloom filters for changed paths diff: halt tree-diff early after max_changes bloom.c: core Bloom filter implementation for changed paths. bloom.c: introduce core Bloom filter constructs bloom.c: add the murmur3 hash implementation commit-graph: define and use MAX_NUM_CHUNKS
| * | | | | | bloom: ignore renames when computing changed pathsDerrick Stolee2020-04-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The changed-path Bloom filters record an entry in the filter for every path that was changed. This includes every add and delete, regardless of whether a rename was detected. Detecting renames causes significant performance issues, but also will trigger downloading missing blobs in partial clone. The simple fix is to disable rename detection when computing a changed-path Bloom filter. This should already be disabled by default, but it is good to explicitly enforce the intended behavior. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flagGarima Singh2020-04-066-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag to the test setup suite in order to toggle writing Bloom filters when running any of the git tests. If set to true, we will compute and write Bloom filters every time a test calls `git commit-graph write`, as if the `--changed-paths` option was passed in. The test suite passes when GIT_TEST_COMMIT_GRAPH and GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS are enabled. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | t4216: add end to end tests for git log with Bloom filtersGarima Singh2020-04-062-0/+159
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These tests exercises writing commit graph with Bloom filters and exercises 'git log -- path' with all the applicable options. They check that the output is the same with and without Bloom filters, confirm Bloom filters were used by checking if trace2 statistics were logged correctly. Also confirms cases where Bloom filters are not used: 1. Multiple path specs, 2. --walk-reflogs (see patch titled 'revision.c: use Bloom filters...' for details, 3. If the latest commit graph does not have Bloom filters Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | revision.c: add trace2 stats around Bloom filter usageGarima Singh2020-04-061-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add trace2 statistics around Bloom filter usage and behavior for 'git log -- path' commands that are hoping to benefit from the presence of computed changed paths Bloom filters. These statistics are great for performance analysis work and for formal testing, which we will see in the commit following this one. Helped-by: Derrick Stolee <dstolee@microsoft.com Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | revision.c: use Bloom filters to speed up path based revision walksGarima Singh2020-04-064-2/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revision walk will now use Bloom filters for commits to speed up revision walks for a particular path (for computing history for that path), if they are present in the commit-graph file. We load the Bloom filters during the prepare_revision_walk step, currently only when dealing with a single pathspec. Extending it to work with multiple pathspecs can be explored and built on top of this series in the future. While comparing trees in rev_compare_trees(), if the Bloom filter says that the file is not different between the two trees, we don't need to compute the expensive diff. This is where we get our performance gains. The other response of the Bloom filter is '`:maybe', in which case we fall back to the full diff calculation to determine if the path was changed in the commit. We do not try to use Bloom filters when the '--walk-reflogs' option is specified. The '--walk-reflogs' option does not walk the commit ancestry chain like the rest of the options. Incorporating the performance gains when walking reflog entries would add more complexity, and can be explored in a later series. Performance Gains: We tested the performance of `git log -- <path>` on the git repo, the linux and some internal large repos, with a variety of paths of varying depths. On the git and linux repos: - we observed a 2x to 5x speed up. On a large internal repo with files seated 6-10 levels deep in the tree: - we observed 10x to 20x speed ups, with some paths going up to 28 times faster. Helped-by: Derrick Stolee <dstolee@microsoft.com Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | commit-graph: add --changed-paths option to write subcommandGarima Singh2020-04-062-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add --changed-paths option to git commit-graph write. This option will allow users to compute information about the paths that have changed between a commit and its first parent, and write it into the commit graph file. If the option is passed to the write subcommand we set the COMMIT_GRAPH_WRITE_BLOOM_FILTERS flag and pass it down to the commit-graph logic. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | commit-graph: reuse existing Bloom filters during writeGarima Singh2020-04-064-6/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add logic to a) parse Bloom filter information from the commit graph file and, b) re-use existing Bloom filters. See Documentation/technical/commit-graph-format for the format in which the Bloom filter information is written to the commit graph file. To read Bloom filter for a given commit with lexicographic position 'i' we need to: 1. Read BIDX[i] which essentially gives us the starting index in BDAT for filter of commit i+1. It is essentially the index past the end of the filter of commit i. It is called end_index in the code. 2. For i>0, read BIDX[i-1] which will give us the starting index in BDAT for filter of commit i. It is called the start_index in the code. For the first commit, where i = 0, Bloom filter data starts at the beginning, just past the header in the BDAT chunk. Hence, start_index will be 0. 3. The length of the filter will be end_index - start_index, because BIDX[i] gives the cumulative 8-byte words including the ith commit's filter. We toggle whether Bloom filters should be recomputed based on the compute_if_not_present flag. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | commit-graph: write Bloom filters to commit graph fileGarima Singh2020-04-063-1/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the technical documentation for commit-graph-format with the formats for the Bloom filter index (BIDX) and Bloom filter data (BDAT) chunks. Write the computed Bloom filters information to the commit graph file using this format. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | | | commit-graph: examine commits by generation numberGarima Singh2020-03-301-3/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running 'git commit-graph write --changed-paths', we sort the commits by pack-order to save time when computing the changed-paths bloom filters. This does not help when finding the commits via the '--reachable' flag. If not using pack-order, then sort by generation number before examining the diff. Commits with similar generation are more likely to have many trees in common, making the diff faster. On the Linux kernel repository, this change reduced the computation time for 'git commit-graph write --reachable --changed-paths' from 3m00s to 1m37s. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>