diff options
author | Stefan Beller <sbeller@google.com> | 2017-06-30 13:53:07 -0700 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2017-06-30 13:59:42 -0700 |
commit | 2e2d5ac184de8facde4e14cec8b4e2a154480ed8 (patch) | |
tree | 7000d0063d6eacf35fab609487416284da838b05 /diff.h | |
parent | e6e045f80314a4f37745676cbb92c8271ad07815 (diff) | |
download | git-2e2d5ac184de8facde4e14cec8b4e2a154480ed8.tar.gz |
diff.c: color moved lines differently
When a patch consists mostly of moving blocks of code around, it can
be quite tedious to ensure that the blocks are moved verbatim, and not
undesirably modified in the move. To that end, color blocks that are
moved within the same patch differently. For example (OM, del, add,
and NM are different colors):
[OM] -void sensitive_stuff(void)
[OM] -{
[OM] - if (!is_authorized_user())
[OM] - die("unauthorized");
[OM] - sensitive_stuff(spanning,
[OM] - multiple,
[OM] - lines);
[OM] -}
void another_function()
{
[del] - printf("foo");
[add] + printf("bar");
}
[NM] +void sensitive_stuff(void)
[NM] +{
[NM] + if (!is_authorized_user())
[NM] + die("unauthorized");
[NM] + sensitive_stuff(spanning,
[NM] + multiple,
[NM] + lines);
[NM] +}
However adjacent blocks may be problematic. For example, in this
potentially malicious patch, the swapping of blocks can be spotted:
[OM] -void sensitive_stuff(void)
[OM] -{
[OMA] - if (!is_authorized_user())
[OMA] - die("unauthorized");
[OM] - sensitive_stuff(spanning,
[OM] - multiple,
[OM] - lines);
[OMA] -}
void another_function()
{
[del] - printf("foo");
[add] + printf("bar");
}
[NM] +void sensitive_stuff(void)
[NM] +{
[NMA] + sensitive_stuff(spanning,
[NMA] + multiple,
[NMA] + lines);
[NM] + if (!is_authorized_user())
[NM] + die("unauthorized");
[NMA] +}
If the moved code is larger, it is easier to hide some permutation in the
code, which is why some alternative coloring is needed.
This patch implements the first mode:
* basic alternating 'Zebra' mode
This conveys all information needed to the user. Defer customization to
later patches.
First I implemented an alternative design, which would try to fingerprint
a line by its neighbors to detect if we are in a block or at the boundary.
This idea iss error prone as it inspected each line and its neighboring
lines to determine if the line was (a) moved and (b) if was deep inside
a hunk by having matching neighboring lines. This is unreliable as the
we can construct hunks which have equal neighbors that just exceed the
number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter
as a line, that is permutated to AXYZCXYZBXYZD..').
Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then has several modes on highlighting bounds.
A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches both submodules
as well as word diff output carefully chose to call emit_line with sign=0.
All output with sign=0 is ignored for move detection purposes in this
patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we'd do
so only line moves within a repository boundary are marked up.
It is useful to have moved lines colored, but there are annoying corner
cases, such as a single line moved, that is very common. For example
in a typical patch of C code, we have closing braces that end statement
blocks or functions.
While it is technically true that these lines are moved as they show up
elsewhere, it is harmful for the review as the reviewers attention is
drawn to such a minor side annoyance.
For now let's have a simple solution of hardcoding the number of
moved lines to be at least 3 before coloring them. Note, that the
length is applied across all blocks to find the 'lonely' blocks
that pollute new code, but do not interfere with a permutated
block where each permutation has less lines than 3.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'diff.h')
-rw-r--r-- | diff.h | 12 |
1 files changed, 11 insertions, 1 deletions
@@ -188,6 +188,12 @@ struct diff_options { int diff_path_counter; struct emitted_diff_symbols *emitted_symbols; + enum { + COLOR_MOVED_NO = 0, + COLOR_MOVED_ZEBRA = 2, + } color_moved; + #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA + #define COLOR_MOVED_MIN_BLOCK_LENGTH 3 }; void diff_emit_submodule_del(struct diff_options *o, const char *line); @@ -208,7 +214,11 @@ enum color_diff { DIFF_FILE_NEW = 5, DIFF_COMMIT = 6, DIFF_WHITESPACE = 7, - DIFF_FUNCINFO = 8 + DIFF_FUNCINFO = 8, + DIFF_FILE_OLD_MOVED = 9, + DIFF_FILE_OLD_MOVED_ALT = 10, + DIFF_FILE_NEW_MOVED = 11, + DIFF_FILE_NEW_MOVED_ALT = 12 }; const char *diff_get_color(int diff_use_color, enum color_diff ix); #define diff_get_color_opt(o, ix) \ |