diff options
author | Elijah Newren <newren@gmail.com> | 2019-12-19 21:28:24 +0000 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2019-12-19 13:45:47 -0800 |
commit | b9670c1f5e6b98837c489a03ac0d343d30e08505 (patch) | |
tree | 070525c94163f410143e34d99fdebc4057e789d4 /dir.c | |
parent | c5c4eddd56ccd2ffd6a193856a660573993e9a41 (diff) | |
download | git-b9670c1f5e6b98837c489a03ac0d343d30e08505.tar.gz |
dir: fix checks on common prefix directory
Many years ago, the directory traversing logic had an optimization that
would always recurse into any directory that was a common prefix of all
the pathspecs without walking the leading directories to get down to
the desired directory. Thus,
git ls-files -o .git/ # case A
would notice that .git/ was a common prefix of all pathspecs (since
it is the only pathspec listed), and then traverse into it and start
showing unknown files under that directory. Unfortunately, .git/ is not
a directory we should be traversing into, which made this optimization
problematic. This also affected cases like
git ls-files -o --exclude-standard t/ # case B
where t/ was in the .gitignore file and thus isn't interesting and
shouldn't be recursed into. It also affected cases like
git ls-files -o --directory untracked_dir/ # case C
where untracked_dir/ is indeed untracked and thus interesting, but the
--directory flag means we only want to show the directory itself, not
recurse into it and start listing untracked files below it.
The case B class of bugs were noted and fixed in commits 16e2cfa90993
("read_directory(): further split treat_path()", 2010-01-08) and
48ffef966c76 ("ls-files: fix overeager pathspec optimization",
2010-01-08), with the idea being that we first wanted to check whether
the common prefix was interesting. The former patch noted that
treat_path() couldn't be used when checking the common prefix because
treat_path() requires a dir_entry() and we haven't read any directories
at the point we are checking the common prefix. So, that patch split
treat_one_path() out of treat_path(). The latter patch then created a
new treat_leading_path() which duplicated by hand the bits of
treat_path() that couldn't be broken out and then called
treat_one_path() for the remainder. There were three problems with this
approach:
* The duplicated logic in treat_leading_path() accidentally missed the
check for special paths (such as is_dot_or_dotdot and matching
".git"), causing case A types of bugs to continue to be an issue.
* The treat_leading_path() logic assumed we should traverse into
anything where path_treatment was not path_none, i.e. it perpetuated
class C types of bugs.
* It meant we had split logic that needed to kept in sync, running the
risk that people introduced new inconsistencies (such as in commit
be8a84c52669, which we reverted earlier in this series, or in commit
df5bcdf83ae which we'll fix in a subsequent commit)
Fix most these problems by making treat_leading_path() not only loop
over each leading path component, but calling treat_path() directly on
each. To do so, we have to create a synthetic dir_entry, but that only
takes a few lines. Then, pay attention to the path_treatment result we
get from treat_path() and don't treat path_excluded, path_untracked, and
path_recurse all the same as path_recurse.
This leaves one remaining problem, the new inconsistency from commit
df5bcdf83ae. That will be addressed in a subsequent commit.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'dir.c')
-rw-r--r-- | dir.c | 67 |
1 files changed, 56 insertions, 11 deletions
@@ -2102,37 +2102,82 @@ static int treat_leading_path(struct dir_struct *dir, const struct pathspec *pathspec) { struct strbuf sb = STRBUF_INIT; - int baselen, rc = 0; + int prevlen, baselen; const char *cp; + struct cached_dir cdir; + struct dirent *de; + enum path_treatment state = path_none; + + /* + * For each directory component of path, we are going to check whether + * that path is relevant given the pathspec. For example, if path is + * foo/bar/baz/ + * then we will ask treat_path() whether we should go into foo, then + * whether we should go into bar, then whether baz is relevant. + * Checking each is important because e.g. if path is + * .git/info/ + * then we need to check .git to know we shouldn't traverse it. + * If the return from treat_path() is: + * * path_none, for any path, we return false. + * * path_recurse, for all path components, we return true + * * <anything else> for some intermediate component, we make sure + * to add that path to the relevant list but return false + * signifying that we shouldn't recurse into it. + */ while (len && path[len - 1] == '/') len--; if (!len) return 1; + + /* + * We need a manufactured dirent with sufficient space to store a + * leading directory component of path in its d_name. Here, we + * assume that the dirent's d_name is either declared as + * char d_name[BIG_ENOUGH] + * or that it is declared at the end of the struct as + * char d_name[] + * For either case, padding with len+1 bytes at the end will ensure + * sufficient storage space. + */ + de = xcalloc(1, sizeof(struct dirent)+len+1); + memset(&cdir, 0, sizeof(cdir)); + cdir.de = de; +#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT) + de->d_type = DT_DIR; +#endif baselen = 0; + prevlen = 0; while (1) { - cp = path + baselen + !!baselen; + prevlen = baselen + !!baselen; + cp = path + prevlen; cp = memchr(cp, '/', path + len - cp); if (!cp) baselen = len; else baselen = cp - path; - strbuf_setlen(&sb, 0); + strbuf_reset(&sb); strbuf_add(&sb, path, baselen); if (!is_directory(sb.buf)) break; - if (simplify_away(sb.buf, sb.len, pathspec)) - break; - if (treat_one_path(dir, NULL, istate, &sb, baselen, pathspec, - DT_DIR, NULL) == path_none) + strbuf_reset(&sb); + strbuf_add(&sb, path, prevlen); + memcpy(de->d_name, path+prevlen, baselen-prevlen); + de->d_name[baselen-prevlen] = '\0'; + state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, + pathspec); + if (state != path_recurse) break; /* do not recurse into it */ - if (len <= baselen) { - rc = 1; + if (len <= baselen) break; /* finished checking */ - } } + add_path_to_appropriate_result_list(dir, NULL, &cdir, istate, + &sb, baselen, pathspec, + state); + + free(de); strbuf_release(&sb); - return rc; + return state == path_recurse; } static const char *get_ident_string(void) |