| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change introduces a new API function
`git_graph_reachable_from_any()`, that answers the question whether a
commit is reachable from any of the provided commits through following
parent edges.
This function can take advantage of optimizations provided by the
existence of a `commit-graph` file, since it makes it faster to know
whether, given two commits X and Y, X cannot possibly be an reachable
from Y.
Part of: #5757
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change makes calculations of merge-bases a bit faster when there
are complex graphs and the commit times cause visiting nodes multiple
times. This is done by visiting the nodes in the graph in reverse
generation order when the generation number is available instead of
commit timestamp. If the generation number is missing in any pair of
commits, it can safely fall back to the old heuristic with no negative
side-effects.
Part of: #5757
|
| |
|
|
|
|
| |
insert_head_ids can fail due to allocation error
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When compiling libgit2 with -DDEPRECATE_HARD, we add a preprocessor
definition `GIT_DEPRECATE_HARD` which causes the "git2/deprecated.h"
header to be empty. As a result, no function declarations are made
available to callers, but the implementations are still available to
link against. This has the problem that function declarations also
aren't visible to the implementations, meaning that the symbol's
visibility will not be set up correctly. As a result, the resulting
library may not expose those deprecated symbols at all on some platforms
and thus cause linking errors.
Fix the issue by conditionally compiling deprecated functions, only.
While it becomes impossible to link against such a library in case one
uses deprecated functions, distributors of libgit2 aren't expected to
pass -DDEPRECATE_HARD anyway. Instead, users of libgit2 should manually
define GIT_DEPRECATE_HARD to hide deprecated functions. Using "real"
hard deprecation still makes sense in the context of CI to test we don't
use deprecated symbols ourselves and in case a dependant uses libgit2 in
a vendored way and knows it won't ever use any of the deprecated symbols
anyway.
|
|
|
|
|
|
|
| |
We've accumulated quite some functions which are never used outside of
their respective code unit, but which are lacking the `static` keyword.
Add it to reduce their linkage scope and allow the compiler to optimize
better.
|
|
|
|
| |
Propagate failures caused by pool initialization errors.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When computing renames, we cache the hash signatures for each of the
potentially conflicting entries so that we do not need to repeatedly
read the file and can at least halfway efficiently determine whether two
files are similar enough to be deemed a rename. In order to make the
hash signatures meaningful, we require at least four lines of data to be
present, resulting in at least four different hashes that can be
compared. Files that are deemed too small are not cached at all and
will thus be repeatedly re-hashed, which is usually not a huge issue.
The issue with above heuristic is in case a file does _not_ have at
least four lines, where a line is anything separated by a consecutive
run of "\n" or "\0" characters. For example "a\nb" is two lines, but
"a\0\0b" is also just two lines. Taken to the extreme, a file that has
megabytes of consecutive space- or NUL-only may also be deemed as too
small and thus not get cached. As a result, we will repeatedly load its
blob, calculate its hash signature just to finally throw it away as we
notice it's not of any value. When you've got a comparitively big file
that you compare against a big set of potentially renamed files, then
the cost simply expodes.
The issue can be trivially fixed by introducing negative cache entries.
Whenever we determine that a given blob does not have a meaningful
representation via a hash signature, we store this negative cache marker
and will from then on not hash it again, but also ignore it as a
potential rename target. This should help the "normal" case already
where you have a lot of small files as rename candidates, but in the
above scenario it's savings are extraordinarily high.
To verify we do not hit the issue anymore with described solution, this
commit adds a test that uses the exact same setup described above with
one 50 megabyte blob of '\0' characters and 1000 other files that get
renamed. Without the negative cache:
$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
real 11m48.377s
user 11m11.576s
sys 0m35.187s
And with the negative cache:
$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
real 0m1.972s
user 0m1.851s
sys 0m0.118s
So this represents a ~350-fold performance improvement, but it obviously
depends on how many files you have and how big the blob is. The test
number were chosen in a way that one will immediately notice as soon as
the bug resurfaces.
|
|
|
|
|
| |
Instead of using a signed type (`off_t`) use a new `git_object_size_t`
for the sizes of objects.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The flag GIT_FILEBUF_FORCE currently does two things:
1. It will cause the filebuf to create non-existing leading
directories for the file that is about to be written.
2. It will forcibly remove any pre-existing locks.
While most call sites actually do want (1), they do not want to
remove pre-existing locks, as that renders the locking mechanisms
effectively useless.
Introduce a new flag `GIT_FILEBUF_CREATE_LEADING_DIRS` to
separate both behaviours cleanly from each other and convert
callers to use it instead of `GIT_FILEBUF_FORCE` to have them
honor locked files correctly.
As this conversion removes all current users of `GIT_FILEBUF_FORCE`,
this commit removes the flag altogether.
|
|
|
|
|
|
|
|
|
| |
The function `git_commit_list_insert` dynamically allocates memory and
may thus fail to insert a given commit, but we didn't check for that in
several places in "merge.c".
Convert surrounding functions to return error codes and check whether
`git_commit_list_insert` was successful, returning an error if not.
|
|
|
|
| |
Explicitly truncate the file size to a `uint32_t`.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In libgit2 nomenclature, when we need to verb a direct object, we name
a function `git_directobject_verb`. Thus, if we need to init an options
structure named `git_foo_options`, then the name of the function that
does that should be `git_foo_options_init`.
The previous names of `git_foo_init_options` is close - it _sounds_ as
if it's initializing the options of a `foo`, but in fact
`git_foo_options` is its own noun that should be respected.
Deprecate the old names; they'll now call directly to the new ones.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, one would use either `git_oidmap_insert` to insert key/value pairs
into a map or `git_oidmap_put` to insert a key only. These function have
historically been macros, which is why their syntax is kind of weird: instead of
returning an error code directly, they instead have to be passed a pointer to
where the return value shall be stored. This does not match libgit2's common
idiom of directly returning error codes.Furthermore, `git_oidmap_put` is tightly
coupled with implementation details of the map as it exposes the index of
inserted entries.
Introduce a new function `git_oidmap_set`, which takes as parameters the map,
key and value and directly returns an error code. Convert all trivial callers of
`git_oidmap_insert` and `git_oidmap_put` to make use of it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current way of looking up an entry from a map is tightly coupled with the
map implementation, as one first has to look up the index of the key and then
retrieve the associated value by using the index. As a caller, you usually do
not care about any indices at all, though, so this is more complicated than
really necessary. Furthermore, it invites for errors to happen if the correct
error checking sequence is not being followed.
Introduce a new high-level function `git_oidmap_get` that takes a map and a key
and returns a pointer to the associated value if such a key exists. Otherwise,
a `NULL` pointer is returned. Adjust all callers that can trivially be
converted.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the lifecycle functions for maps (allocation, deallocation, resize)
are not named in a uniform way and do not have a uniform function signature.
Rename the functions to fix that, and stick to libgit2's naming scheme of saying
`git_foo_new`. This results in the following new interface for allocation:
- `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an
error code if we ran out of memory
- `void git_<t>map_free(git_<t>map *map)` to free a map
- `void git_<t>map_clear(git<t>map *map)` to remove all entries from a map
This commit also fixes all existing callers.
|
|
|
|
|
| |
Move to the `git_error` name in the internal API for error-related
functions.
|
|
|
|
| |
Use the new object_type enumeration names within the codebase.
|
|\
| |
| | |
Allow merge analysis against any reference
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
This moves the current merge analysis code into a more generic version
that can work against any reference.
Also change the tests to check returned analysis values exactly.
|
| |
| |
| |
| |
| |
| |
| | |
Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types,
simply use `size_t` instead. This decouples code from the khash stuff
and makes it possible to move the khash includes into the implementation
files.
|
|\ \
| |/
|/| |
|
| | |
|
| |
| |
| |
| | |
Cleans up should git_repository_index or git_index_read fail
|
| |
| |
| |
| |
| |
| |
| |
| | |
If the index in memory is different from the index on the disk,
previously merge would abort with GIT_ECONFLICT.
Reload the index before merging to fix this.
Fixes #4203
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Our virtual commit must be the last argument to merge-base: since our
algorithm pushes _both_ parents of the virtual commit, it needs to be
the last argument, since merge-base:
> Given three commits A, B and C, git merge-base A B C will compute the
> merge base between A and a hypothetical commit M
We want to calculate the merge base between the actual commit ("two")
and the virtual commit ("one") - since one actually pushes its parents
to the merge-base calculation, we need to calculate the merge base of
"two" and the parents of one.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When the commits being merged have multiple merge bases, reverse the
order when creating the virtual merge base. This is for compatibility
with git's merge-recursive algorithm, and ensures that we build
identical trees.
Git does this to try to use older merge bases first. Per 8918b0c:
> It seems to be the only sane way to do it: when a two-head merge is
> done, and the merge-base and one of the two branches agree, the
> merge assumes that the other branch has something new.
>
> If we start creating virtual commits from newer merge-bases, and go
> back to older merge-bases, and then merge with newer commits again,
> chances are that a patch is lost, _because_ the merge-base and the
> head agree on it. Unlikely, yes, but it happened to me.
|
|/
|
|
|
|
|
|
|
|
|
| |
Git uses longer conflict markers in the recursive merge base - two more
than the default (thus, 9 character long conflict markers). This allows
users to tell the difference between the recursive merge conflicts and
conflicts between the ours and theirs branches.
This was introduced in git d694a17986a28bbc19e2a6c32404ca24572e400f.
Update our tests to expect this as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.
This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as its first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.
This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.
|
|
|
|
|
|
|
|
|
|
| |
The function `merge_diff_mark_similarity_exact` may error our early and,
when it does so, free the `ours_deletes_by_oid` and
`theirs_deletes_by_oid` variables. While the first one can never be
uninitialized due to the first call actually assigning to it, the second
variable can be freed without being initialized.
Fix the issue by initializing both variables to `NULL`.
|
|
|
|
|
|
|
|
|
| |
The current exact rename detection has order n^2 complexity.
We can do better by using a map to first aggregate deletes and
using that to match deletes to adds.
This results in a substantial performance improvement for merges
with a large quantity of adds and deletes.
|
|\ |
|
| | |
|
| |
| |
| |
| | |
When one side of a merge is treesame to the ancestor, we can take the other side and skip all the expensive merge operations. This optimization can only be performed when the generation of REUC extension data is skipped.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The `path_repository` variable is actually confusing to think
about, as it is not always clear what the repository actually is.
It may either be the path to the folder containing worktree and
.git directory, the path to .git itself, a worktree or something
entirely different. Actually, the intent of the variable is to
hold the path to the gitdir, which is either the .git directory
or the bare repository.
Rename the variable to `gitdir` to avoid confusion. While at it,
also rename `path_gitlink` to `gitlink` to improve consistency.
|
| | |
|
|\ \
| | |
| | | |
merge: set default rename threshold
|
| | |
| | |
| | |
| | |
| | | |
When `GIT_MERGE_FIND_RENAMES` is set, provide a default for
`rename_threshold` when it is unset.
|
|/ /
| |
| |
| |
| |
| |
| |
| | |
Error messages should be sentence fragments, and therefore:
1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one
|
|/ |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Since the `apply` callback can defer, the `check` callback is not
necessary. Removing the `check` callback further makes the `payload`
unnecessary along with the `cleanup` callback.
|
| |
|
| |
|