From 712d2c7dd893212756c21787fc12d6f71327e167 Mon Sep 17 00:00:00 2001
From: Johan Herland
Date: Fri, 29 Apr 2011 11:36:20 +0200
Subject: Allow specifying --dirstat cut-off percentage as a floating point number

Only the first digit after the decimal point is kept, as the dirstat
calculations all happen in permille.

Selftests verifying floating-point percentage input have been added.

Improved-by: Junio C Hamano
Improved-by: Linus Torvalds
Signed-off-by: Johan Herland
Signed-off-by: Junio C Hamano
---
 diff.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'diff.h')

diff --git a/diff.h b/diff.h
index 6fe1597785..90f491e9e7 100644
--- a/diff.h
+++ b/diff.h
@@ -114,7 +114,7 @@ struct diff_options {
 	int needed_rename_limit;
 	int degraded_cc_to_c;
 	int show_rename_progress;
-	int dirstat_percent;
+	int dirstat_permille;
 	int setup;
 	int abbrev;
 	const char *prefix;
-- 
cgit v1.2.1

From 1c57a627bf269f3c83c48ad724cd8b14292502ef Mon Sep 17 00:00:00 2001
From: Johan Herland
Date: Fri, 29 Apr 2011 11:36:21 +0200
Subject: New --dirstat=lines mode, doing dirstat analysis based on diffstat

This patch adds an alternative implementation of show_dirstat(), called
show_dirstat_by_line(), which uses the more expensive diffstat analysis
(as opposed to show_dirstat()'s own (relatively inexpensive) analysis)
to derive the numbers from which the --dirstat output is computed.

The alternative implementation is controlled by the new "lines" parameter
to the --dirstat option (or the diff.dirstat config variable).

For binary files, the diffstat analysis counts bytes instead of lines,
so to prevent binary files from dominating the dirstat results, the
byte counts for binary files are divided by 64 before being compared to
their textual/line-based counterparts. This is a stupid and ugly - but
very cheap - heuristic.

In linux-2.6.git, running the three different --dirstat modes:

  time git diff v2.6.20..v2.6.30 --dirstat=changes > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=lines > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=files > /dev/null

yields the following average runtimes on my machine:

 - "changes" (default): ~6.0 s
 - "lines":             ~9.6 s
 - "files":             ~0.1 s

So, as expected, there's a considerable performance hit (~60%) by going
through the full diffstat analysis as compared to the default "changes"
analysis (obviously, "files" is much faster than both). As such, the
"lines" mode is probably only useful if you really need the --dirstat
numbers to be consistent with the numbers returned from the other
--*stat options.

The patch also includes documentation and tests for the new dirstat mode.

Improved-by: Junio C Hamano
Signed-off-by: Johan Herland
Signed-off-by: Junio C Hamano
---
 diff.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'diff.h')

diff --git a/diff.h b/diff.h
index 90f491e9e7..937a903dce 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DIRSTAT_BY_LINE (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag) ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag) ((opts)->flags |= DIFF_OPT_##flag)
-- 
cgit v1.2.1
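
Only the diff.h side of these two commits is shown here, so the arithmetic
the commit messages describe is not visible. The following is a minimal
sketch of that arithmetic, not the code from diff.c: both helper names
(sketch_percent_to_permille, sketch_dirstat_damage) are hypothetical, and
the rejection of cut-offs above 100% is an assumption of this sketch. It
shows a percentage string being reduced to permille with only the first
decimal digit kept, and a binary file's byte count being divided by 64
before it is compared against line counts.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not git's parser: convert a percentage string such
 * as "7.5" to permille, keeping only the first digit after the decimal
 * point. Returns -1 for anything that is not a plain decimal number, and
 * (assumption of this sketch) for values above 100%.
 */
static int sketch_percent_to_permille(const char *s)
{
	char *end;
	unsigned long whole;
	int permille;

	if (!isdigit((unsigned char)*s))
		return -1;
	whole = strtoul(s, &end, 10);
	if (whole > 100)
		return -1;
	permille = (int)whole * 10;
	if (*end == '.' && isdigit((unsigned char)end[1])) {
		permille += end[1] - '0';	/* first decimal digit only */
		end += 2;
		while (isdigit((unsigned char)*end))
			end++;			/* further digits are ignored */
	}
	if (*end != '\0' || permille > 1000)
		return -1;
	return permille;
}

/*
 * Hypothetical helper: the weight one file contributes in "lines" mode.
 * For text files the counts are lines; for binary files they are bytes,
 * which are divided by 64 as the commit message describes.
 */
static unsigned long sketch_dirstat_damage(unsigned long added,
					   unsigned long deleted,
					   int is_binary)
{
	unsigned long damage = added + deleted;

	return is_binary ? damage / 64 : damage;
}
```

Under these assumptions, a cut-off of "7.59" and a cut-off of "7.5" both
become 75 permille, and a binary file with 640 changed bytes weighs the
same as a text file with 10 changed lines.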