From 83c094ad0dd2104adbbec034f802dceb1d052981 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?= Date: Sun, 8 Mar 2015 17:12:33 +0700 Subject: untracked cache: save to an index extension MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Helped-by: Stefan Beller Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- Documentation/technical/index-format.txt | 58 ++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) (limited to 'Documentation') diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index 35112e4966..db59a13600 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -233,3 +233,61 @@ Git index format The remaining index entries after replaced ones will be added to the final index. These added entries are also sorted by entry name then stage. + +== Untracked cache + + Untracked cache saves the untracked file list and necessary data to + verify the cache. The signature for this extension is { 'U', 'N', + 'T', 'R' }. + + The extension starts with + + - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from + ctime field until "file size". + + - Stat data of core.excludesfile + + - 32-bit dir_flags (see struct dir_struct) + + - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file + does not exist. + + - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does + not exist. + + - NUL-terminated string of per-dir exclude file name. This usually + is ".gitignore". + + - The number of following directory blocks, variable width + encoding. If this number is zero, the extension ends here with a + following NUL. + + - A number of directory blocks in depth-first-search order, each + consists of + + - The number of untracked entries, variable width encoding. + + - The number of sub-directory blocks, variable width encoding. + + - The directory name terminated by NUL. + + - A number of untrached file/dir names terminated by NUL. + +The remaining data of each directory block is grouped by type: + + - An ewah bitmap, the n-th bit marks whether the n-th directory has + valid untracked cache entries. + + - An ewah bitmap, the n-th bit records "check-only" bit of + read_directory_recursive() for the n-th directory. + + - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data + is valid for the n-th directory and exists in the next data. + + - An array of stat data. The n-th data corresponds with the n-th + "one" bit in the previous ewah bitmap. + + - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit + in the previous ewah bitmap. + + - One NUL. -- cgit v1.2.1 From 9e5972413b4873dc143c4046c6e74eb608ace32b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?= Date: Sun, 8 Mar 2015 17:12:42 +0700 Subject: update-index: manually enable or disable untracked cache MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Overall time saving on "git status" is about 40% in the best case scenario, removing ..collect_untracked() as the most time consuming function. read and refresh index operations are now at the top (which should drop when index-helper and/or watchman support is added). More numbers and analysis below. webkit.git ========== 169k files. 6k dirs. Lots of test data (i.e. not touched most of the time) Base status ----------- Index version 4 in split index mode and cache-tree populated. No untracked cache. It shows how time is consumed by "git status". The same settings are used for other repos below. 18:28:10.199679 builtin/commit.c:1394 performance: 0.000000451 s: cmd_status:setup 18:28:10.474847 read-cache.c:1407 performance: 0.274873831 s: read_index 18:28:10.475295 read-cache.c:1407 performance: 0.000000656 s: read_index 18:28:10.728443 preload-index.c:131 performance: 0.253147487 s: read_index_preload 18:28:10.741422 read-cache.c:1254 performance: 0.012868340 s: refresh_index 18:28:10.752300 wt-status.c:623 performance: 0.010421357 s: wt_status_collect_changes_worktree 18:28:10.762069 wt-status.c:629 performance: 0.009644748 s: wt_status_collect_changes_index 18:28:11.601019 wt-status.c:632 performance: 0.838859547 s: wt_status_collect_untracked 18:28:11.605939 builtin/commit.c:1421 performance: 0.004835004 s: cmd_status:update_index 18:28:11.606580 trace.c:415 performance: 1.407878388 s: git command: 'git' 'status' Populating status ----------------- This is after enabling untracked cache and the cache is still empty. We see a slight increase in .._collect_untracked() and update_index (because new cache has to be written to $GIT_DIR/index). 18:28:18.915213 builtin/commit.c:1394 performance: 0.000000326 s: cmd_status:setup 18:28:19.197364 read-cache.c:1407 performance: 0.281901416 s: read_index 18:28:19.197754 read-cache.c:1407 performance: 0.000000546 s: read_index 18:28:19.451355 preload-index.c:131 performance: 0.253599607 s: read_index_preload 18:28:19.464400 read-cache.c:1254 performance: 0.012935336 s: refresh_index 18:28:19.475115 wt-status.c:623 performance: 0.010236920 s: wt_status_collect_changes_worktree 18:28:19.486022 wt-status.c:629 performance: 0.010801685 s: wt_status_collect_changes_index 18:28:20.362660 wt-status.c:632 performance: 0.876551366 s: wt_status_collect_untracked 18:28:20.396199 builtin/commit.c:1421 performance: 0.033447969 s: cmd_status:update_index 18:28:20.396939 trace.c:415 performance: 1.482695902 s: git command: 'git' 'status' Populated status ---------------- After the cache is populated, wt_status_collect_untracked() drops 82% from 0.838s to 0.144s. Overall time drops 45%. Top offenders are now read_index() and read_index_preload(). 18:28:20.408605 builtin/commit.c:1394 performance: 0.000000457 s: cmd_status:setup 18:28:20.692864 read-cache.c:1407 performance: 0.283980458 s: read_index 18:28:20.693273 read-cache.c:1407 performance: 0.000000661 s: read_index 18:28:20.958814 preload-index.c:131 performance: 0.265540254 s: read_index_preload 18:28:20.972375 read-cache.c:1254 performance: 0.013437429 s: refresh_index 18:28:20.983959 wt-status.c:623 performance: 0.011146646 s: wt_status_collect_changes_worktree 18:28:20.993948 wt-status.c:629 performance: 0.009879094 s: wt_status_collect_changes_index 18:28:21.138125 wt-status.c:632 performance: 0.144084737 s: wt_status_collect_untracked 18:28:21.173678 builtin/commit.c:1421 performance: 0.035463949 s: cmd_status:update_index 18:28:21.174251 trace.c:415 performance: 0.766707355 s: git command: 'git' 'status' gentoo-x86.git ============== This repository is a strange one with a balanced, wide and shallow worktree (about 100k files and 23k dirs) and no .gitignore in worktree. .._collect_untracked() time drops 88%, total time drops 56%. Base status ----------- 18:20:40.828642 builtin/commit.c:1394 performance: 0.000000496 s: cmd_status:setup 18:20:41.027233 read-cache.c:1407 performance: 0.198130532 s: read_index 18:20:41.027670 read-cache.c:1407 performance: 0.000000581 s: read_index 18:20:41.171716 preload-index.c:131 performance: 0.144045594 s: read_index_preload 18:20:41.179171 read-cache.c:1254 performance: 0.007320424 s: refresh_index 18:20:41.185785 wt-status.c:623 performance: 0.006144638 s: wt_status_collect_changes_worktree 18:20:41.192701 wt-status.c:629 performance: 0.006780184 s: wt_status_collect_changes_index 18:20:41.991723 wt-status.c:632 performance: 0.798927029 s: wt_status_collect_untracked 18:20:41.994664 builtin/commit.c:1421 performance: 0.002852772 s: cmd_status:update_index 18:20:41.995458 trace.c:415 performance: 1.168427502 s: git command: 'git' 'status' Populating status ----------------- 18:20:48.968848 builtin/commit.c:1394 performance: 0.000000380 s: cmd_status:setup 18:20:49.172918 read-cache.c:1407 performance: 0.203734214 s: read_index 18:20:49.173341 read-cache.c:1407 performance: 0.000000562 s: read_index 18:20:49.320013 preload-index.c:131 performance: 0.146671391 s: read_index_preload 18:20:49.328039 read-cache.c:1254 performance: 0.007921957 s: refresh_index 18:20:49.334680 wt-status.c:623 performance: 0.006172020 s: wt_status_collect_changes_worktree 18:20:49.342526 wt-status.c:629 performance: 0.007731746 s: wt_status_collect_changes_index 18:20:50.257510 wt-status.c:632 performance: 0.914864222 s: wt_status_collect_untracked 18:20:50.338371 builtin/commit.c:1421 performance: 0.080776477 s: cmd_status:update_index 18:20:50.338900 trace.c:415 performance: 1.371462446 s: git command: 'git' 'status' Populated status ---------------- 18:20:50.351160 builtin/commit.c:1394 performance: 0.000000571 s: cmd_status:setup 18:20:50.577358 read-cache.c:1407 performance: 0.225917338 s: read_index 18:20:50.577794 read-cache.c:1407 performance: 0.000000617 s: read_index 18:20:50.734140 preload-index.c:131 performance: 0.156345564 s: read_index_preload 18:20:50.745717 read-cache.c:1254 performance: 0.011463075 s: refresh_index 18:20:50.755176 wt-status.c:623 performance: 0.008877929 s: wt_status_collect_changes_worktree 18:20:50.763768 wt-status.c:629 performance: 0.008471633 s: wt_status_collect_changes_index 18:20:50.854885 wt-status.c:632 performance: 0.090988721 s: wt_status_collect_untracked 18:20:50.857765 builtin/commit.c:1421 performance: 0.002789097 s: cmd_status:update_index 18:20:50.858411 trace.c:415 performance: 0.508647673 s: git command: 'git' 'status' linux-2.6 ========= Reference repo. Not too big. .._collect_status() drops 84%. Total time drops 42%. Base status ----------- 18:34:09.870122 builtin/commit.c:1394 performance: 0.000000385 s: cmd_status:setup 18:34:09.943218 read-cache.c:1407 performance: 0.072871177 s: read_index 18:34:09.943614 read-cache.c:1407 performance: 0.000000491 s: read_index 18:34:10.004364 preload-index.c:131 performance: 0.060748102 s: read_index_preload 18:34:10.008190 read-cache.c:1254 performance: 0.003714285 s: refresh_index 18:34:10.012087 wt-status.c:623 performance: 0.002775446 s: wt_status_collect_changes_worktree 18:34:10.016054 wt-status.c:629 performance: 0.003862140 s: wt_status_collect_changes_index 18:34:10.214747 wt-status.c:632 performance: 0.198604837 s: wt_status_collect_untracked 18:34:10.216102 builtin/commit.c:1421 performance: 0.001244166 s: cmd_status:update_index 18:34:10.216817 trace.c:415 performance: 0.347670735 s: git command: 'git' 'status' Populating status ----------------- 18:34:16.595102 builtin/commit.c:1394 performance: 0.000000456 s: cmd_status:setup 18:34:16.666600 read-cache.c:1407 performance: 0.070992413 s: read_index 18:34:16.667012 read-cache.c:1407 performance: 0.000000606 s: read_index 18:34:16.729375 preload-index.c:131 performance: 0.062362492 s: read_index_preload 18:34:16.732565 read-cache.c:1254 performance: 0.003075517 s: refresh_index 18:34:16.736148 wt-status.c:623 performance: 0.002422201 s: wt_status_collect_changes_worktree 18:34:16.739990 wt-status.c:629 performance: 0.003746618 s: wt_status_collect_changes_index 18:34:16.948505 wt-status.c:632 performance: 0.208426710 s: wt_status_collect_untracked 18:34:16.961744 builtin/commit.c:1421 performance: 0.013151887 s: cmd_status:update_index 18:34:16.962233 trace.c:415 performance: 0.368537535 s: git command: 'git' 'status' Populated status ---------------- 18:34:16.970026 builtin/commit.c:1394 performance: 0.000000631 s: cmd_status:setup 18:34:17.046235 read-cache.c:1407 performance: 0.075904673 s: read_index 18:34:17.046644 read-cache.c:1407 performance: 0.000000681 s: read_index 18:34:17.113564 preload-index.c:131 performance: 0.066920253 s: read_index_preload 18:34:17.117281 read-cache.c:1254 performance: 0.003604055 s: refresh_index 18:34:17.121115 wt-status.c:623 performance: 0.002508345 s: wt_status_collect_changes_worktree 18:34:17.125089 wt-status.c:629 performance: 0.003871636 s: wt_status_collect_changes_index 18:34:17.156089 wt-status.c:632 performance: 0.030895703 s: wt_status_collect_untracked 18:34:17.169861 builtin/commit.c:1421 performance: 0.013686404 s: cmd_status:update_index 18:34:17.170391 trace.c:415 performance: 0.201474531 s: git command: 'git' 'status' Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- Documentation/git-update-index.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation') diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt index aff01798cd..6bc296787e 100644 --- a/Documentation/git-update-index.txt +++ b/Documentation/git-update-index.txt @@ -170,6 +170,14 @@ may not support it yet. the shared index file. This mode is designed for very large indexes that take a significant amount of time to read or write. +--untracked-cache:: +--no-untracked-cache:: + Enable or disable untracked cache extension. This could speed + up for commands that involve determining untracked files such + as `git status`. The underlying operating system and file + system must change `st_mtime` field of a directory if files + are added or deleted in that directory. + \--:: Do not interpret any more arguments as options. -- cgit v1.2.1 From f64cb88d3521b64b2db9353d14148328063745dc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?= Date: Sun, 8 Mar 2015 17:12:43 +0700 Subject: update-index: test the system before enabling untracked cache MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Helped-by: Eric Sunshine Helped-by: Junio C Hamano Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- Documentation/git-update-index.txt | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt index 6bc296787e..1a296bc29a 100644 --- a/Documentation/git-update-index.txt +++ b/Documentation/git-update-index.txt @@ -178,6 +178,12 @@ may not support it yet. system must change `st_mtime` field of a directory if files are added or deleted in that directory. +--force-untracked-cache:: + For safety, `--untracked-cache` performs tests on the working + directory to make sure untracked cache can be used. These + tests can take a few seconds. `--force-untracked-cache` can be + used to skip the tests. + \--:: Do not interpret any more arguments as options. -- cgit v1.2.1 From 1e8fef609e78110e276df633c5ba1fb1f1589fa5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?= Date: Sun, 8 Mar 2015 17:12:46 +0700 Subject: untracked cache: guard and disable on system changes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If the user enables untracked cache, then - move worktree to an unsupported filesystem - or simply upgrade OS - or move the whole (portable) disk from one machine to another - or access a shared fs from another machine there's no guarantee that untracked cache can still function properly. Record the worktree location and OS footprint in the cache. If it changes, err on the safe side and disable the cache. The user can 'update-index --untracked-cache' again to make sure all conditions are met. This adds a new requirement that setup_git_directory* must be called before read_cache() because we need worktree location by then, or the cache is dropped. This change does not cover all bases, you can fool it if you try hard. The point is to stop accidents. Helped-by: Eric Sunshine Helped-by: brian m. carlson Helped-by: Torsten Bögershausen Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- Documentation/technical/index-format.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation') diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index db59a13600..b7093af8b2 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -242,6 +242,10 @@ Git index format The extension starts with + - A sequence of NUL-terminated strings, preceded by the size of the + sequence in variable width encoding. Each string describes the + environment where the cache can be used. + - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from ctime field until "file size". -- cgit v1.2.1 From aeb6f8b3a2bbfd8b48a967139fbf4581e5345182 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?= Date: Sun, 8 Mar 2015 17:12:47 +0700 Subject: git-status.txt: advertisement for untracked cache MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When a good user sees the "too long, consider -uno" advice when running `git status`, they should check out the man page to find out more. This change suggests they try untracked cache before -uno. Helped-by: brian m. carlson Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- Documentation/git-status.txt | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt index 4d8d530d35..56573bd86e 100644 --- a/Documentation/git-status.txt +++ b/Documentation/git-status.txt @@ -58,7 +58,10 @@ When `-u` option is not used, untracked files and directories are shown (i.e. the same as specifying `normal`), to help you avoid forgetting to add newly created files. Because it takes extra work to find untracked files in the filesystem, this mode may take some -time in a large working tree. You can use `no` to have `git status` +time in a large working tree. +Consider enabling untracked cache and split index if supported (see +`git update-index --untracked-cache` and `git update-index +--split-index`), Otherwise you can use `no` to have `git status` return more quickly without showing untracked files. + The default can be changed using the status.showUntrackedFiles -- cgit v1.2.1