diff options
author | SZEDER Gábor <szeder.dev@gmail.com> | 2018-04-17 00:41:13 +0200 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2018-04-17 12:49:36 +0900 |
commit | c1bc0a0e927265c38aa6e4cf1c9b9e7866225b68 (patch) | |
tree | dfc0b7b270102121842aa3ae33abae10fe72369a /contrib | |
parent | 9703797c9d6b4d1831bf32bc0ab721a9411c716a (diff) | |
download | git-c1bc0a0e927265c38aa6e4cf1c9b9e7866225b68.tar.gz |
completion: remove repeated dirnames with 'awk' during path completion
During git-aware path completion, after all the trailing path
components have been removed from the output of 'git ls-files' and
'git diff-index' (see previous patch), each directory name is repeated
as many times as the number of listed paths it contains. This can be
a lot of repetitions, especially when invoking path completion close
to the root of a big worktree, which would cause a considerable
overhead downstream of __git_index_files(), in particular in the shell
loop that fills the COMPREPLY array. To reduce this overhead,
__git_index_files() runs the classic '... |sort |uniq' pattern to
remove those repetitions from the function's output.
While removing repeated directory names is effective in reducing the
number of iterations in that shell loop, it still imposes the overhead
of fork()+exec()ing two external processes, and two additional stages
in the pipeline, where potentially relatively large amount of data can
be passed between two subsequent pipeline stages.
Extend __git_index_files()'s 'awk' script to remove repeated path
components by first creating and filling an associative array indexed
by all encountered path components (after the trailing path components
have been removed), and then iterating over this array and printing
the indices, i.e. unique path components. This way we can remove the
'|sort |uniq' pipeline stages, and their eliminated overhead results
in faster path completion.
Listing all tracked files (12) and directories (23) at the top of the
worktree in linux.git (over 62k files), i.e. what's doing all the hard
work behind 'git rm <TAB>':
Before this patch, best of five, using GNU awk on Linux:
real 0m0.069s
user 0m0.089s
sys 0m0.026s
After:
real 0m0.052s
user 0m0.072s
sys 0m0.014s
Difference: -24.6%
Note that this changes order of elements in __git_index_files()'s
output. This is not an issue, because this function was only ever
intended to feed paths into the COMPREPLY array, and Bash will sort
its elements (according to the users locale) anyway.
Note also that using 'awk' to remove repeated path components is also
beneficial for the performance of the next two patches:
- The first will extend this 'awk' script to dequote quoted paths in
the output of 'git ls-files' and 'git diff-index'. With this
patch it will only have to dequote unique path components, not
all.
- The second will, among other things, extend this 'awk' script to
prepend prefix path components from the command line to the
currently completed path component. Consequently, each line in
'awk's output will grow longer. Without this patch that '|sort
|uniq' would have to exchange and process that much more data.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'contrib')
-rw-r--r-- | contrib/completion/git-completion.bash | 8 |
1 files changed, 6 insertions, 2 deletions
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index a71db47d09..1cd019d2e9 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -458,8 +458,12 @@ __git_index_files () __git_ls_files_helper "$root" "$1" "$match" | awk -F / '{ - print $1 - }' | sort | uniq + paths[$1] = 1 + } + END { + for (p in paths) + print p + }' } # __git_complete_index_file requires 1 argument: |