pack: Improve error handling for get_delta_base()
|
This makes get_delta_base() return the error code as the return value
and the delta base as an out-parameter.
|
This change moves the responsibility of setting the error upon failures
of get_delta_base() to get_delta_base() instead of its callers. That
way, the caller can always check if the return value is negative and
mark the whole operation as an error instead of using garbage values,
which can lead to crashes if the .pack files are malformed.
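The calling convention described above can be sketched as follows. This is a minimal illustration, not libgit2's actual code: `offset_t`, `find_delta_base` and `resolve_base` are hypothetical names, and the validity check is a simplified stand-in for real packfile parsing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in type; not libgit2's real offset type. */
typedef long long offset_t;

/*
 * Return 0 and store the base offset in *base_out on success; return a
 * negative error code and leave *base_out untouched on failure.
 */
static int find_delta_base(offset_t *base_out, offset_t delta_pos, offset_t distance)
{
    assert(base_out != NULL);

    /* A malformed pack could claim a base at or past the delta itself. */
    if (distance <= 0 || distance > delta_pos)
        return -1;

    *base_out = delta_pos - distance;
    return 0;
}

/* Caller: check the sign of the return value before using the out-parameter. */
static int resolve_base(offset_t *resolved, offset_t delta_pos, offset_t distance)
{
    offset_t base;
    int error;

    if ((error = find_delta_base(&base, delta_pos, distance)) < 0)
        return error; /* propagate instead of using a garbage value */

    *resolved = base;
    return 0;
}
```

Because the error travels in the return value, a caller cannot mistake a failure for a valid offset.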
|
merge: cache negative cache results for similarity metrics
|
When computing renames, we cache the hash signatures for each of the
potentially conflicting entries so that we do not need to repeatedly
read the file and can at least halfway efficiently determine whether two
files are similar enough to be deemed a rename. In order to make the
hash signatures meaningful, we require at least four lines of data to be
present, resulting in at least four different hashes that can be
compared. Files that are deemed too small are not cached at all and
will thus be repeatedly re-hashed, which is usually not a huge issue.
The issue with the above heuristic arises when a file does _not_ have at
least four lines, where a line is anything separated by a consecutive
run of "\n" or "\0" characters. For example "a\nb" is two lines, but
"a\0\0b" is also just two lines. Taken to the extreme, a file that consists
of megabytes of consecutive space or NUL characters only may also be deemed
too small and thus not get cached. As a result, we will repeatedly load its
blob and calculate its hash signature, just to finally throw it away as we
notice it's not of any value. When you've got a comparatively big file
that you compare against a big set of potentially renamed files, then
the cost simply explodes.
The issue can be trivially fixed by introducing negative cache entries.
Whenever we determine that a given blob does not have a meaningful
representation via a hash signature, we store this negative cache marker
and will from then on not hash it again, but also ignore it as a
potential rename target. This should help the "normal" case already
where you have a lot of small files as rename candidates, but in the
above scenario its savings are extraordinarily high.
To verify we do not hit the issue anymore with the described solution, this
commit adds a test that uses the exact same setup described above with
one 50 megabyte blob of '\0' characters and 1000 other files that get
renamed. Without the negative cache:
    $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
    real 11m48.377s
    user 11m11.576s
    sys 0m35.187s
And with the negative cache:
    $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
    real 0m1.972s
    user 0m1.851s
    sys 0m0.118s
So this represents a ~350-fold performance improvement, but it obviously
depends on how many files you have and how big the blob is. The test
numbers were chosen such that one will immediately notice as soon as
the bug resurfaces.
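The negative-cache idea can be illustrated with a small sketch. It is not the actual libgit2 implementation: `sig_cache_entry`, `compute_signature`, `cached_signature` and the toy hash are all hypothetical, and only the "fewer than four lines" rule and the line definition are taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache slot states; libgit2's real data structures differ. */
enum sig_state { SIG_UNKNOWN = 0, SIG_PRESENT, SIG_TOO_SMALL };

struct sig_cache_entry {
    enum sig_state state;
    unsigned long signature; /* only meaningful if state == SIG_PRESENT */
};

static int hash_calls; /* counts how often the expensive path runs */

/* Stand-in for the expensive step: hash the data, fail if fewer than four lines. */
static int compute_signature(unsigned long *out, const char *data, size_t len)
{
    size_t i, lines = 0;
    int in_line = 0;
    unsigned long h = 5381;

    hash_calls++;
    for (i = 0; i < len; i++) {
        if (data[i] == '\n' || data[i] == '\0') {
            in_line = 0; /* runs of separators end the current line */
        } else {
            if (!in_line) {
                lines++;
                in_line = 1;
            }
            h = h * 33 + (unsigned char)data[i];
        }
    }
    if (lines < 4)
        return -1;
    *out = h;
    return 0;
}

/* Returns 1 if a signature is available, 0 if the blob is known to be too small. */
static int cached_signature(struct sig_cache_entry *e, const char *data, size_t len)
{
    assert(e != NULL);
    if (e->state == SIG_UNKNOWN)
        e->state = compute_signature(&e->signature, data, len) == 0
            ? SIG_PRESENT : SIG_TOO_SMALL;
    return e->state == SIG_PRESENT;
}
```

The key point is that `SIG_TOO_SMALL` is itself a cached result, so a blob that yields no usable signature is hashed only once.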
|
Handle repository format v1
|
Git has supported repository format version 1 for some time. This
format is just like version 0, but it supports extensions.
Implementations must reject extensions that they don't support.
Add support for this format version and reject any extensions but
extensions.noop, which is the only extension we currently support.
While we're at it, also clean up an error message.
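As a rough sketch of the rule "reject any extension we do not support", consider the following. The function name, the allowlist array and the treatment of version 0 (extensions are simply not consulted) are assumptions made for illustration, not libgit2's actual code.

```c
#include <assert.h>
#include <string.h>

/* Extensions this hypothetical implementation understands. */
static const char *const supported_extensions[] = { "noop" };

static int extension_supported(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(supported_extensions) / sizeof(supported_extensions[0]); i++)
        if (strcmp(name, supported_extensions[i]) == 0)
            return 1;
    return 0;
}

/*
 * Return 0 if a repository with the given format version and extension
 * names may be opened, -1 otherwise.
 */
static int check_repo_format(int version, const char *const *exts, size_t n)
{
    size_t i;

    assert(exts != NULL || n == 0);

    if (version == 0)
        return 0; /* assumption: extensions are not consulted for version 0 */
    if (version != 1)
        return -1; /* unknown format version */

    for (i = 0; i < n; i++)
        if (!extension_supported(exts[i]))
            return -1; /* unsupported extension: must reject */

    return 0;
}
```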
|
refdb_fs: remove unused header file
|
The "refdb_fs.h" header contains a single struct `git_refcache` that is
not used anywhere. As a result, we can just delete the header altogether
as it doesn't have any purpose and may confuse readers.
|
When generating a patch for a renamed file whose mode bits have changed
in addition to the rename, then we currently fail to parse the generated
patch. Furthermore, when generating a diff we output mode bits after the
similarity metric, which is different to how upstream git handles it.
Fix both issues by adding another state transition that allows
similarity indices after mode changes and by printing mode changes
before the similarity index.
|
Fix segfault when calling git_blame_buffer()
|
This change makes sure that the hunk is not null before trying to
dereference it. This avoids segfaults, especially when blaming against a
modified buffer (i.e. the index).
Fixes: #5443
|
Signed-off-by: Utkarsh Gupta <utkarsh@debian.org>
|
While the `git_refdb_backend` struct has a version, we do not
initialize it correctly when calling `git_refdb_backend_fs()`. Fix this
by adding the call to `git_refdb_init_backend()`.
|
cmake: use install directories provided via GNUInstallDirs
|
We currently hand-code logic to configure where to install our artifacts
via the `LIB_INSTALL_DIR`, `INCLUDE_INSTALL_DIR` and `BIN_INSTALL_DIR`
variables. This is reinventing the wheel, as CMake already provides a way
to do that via `CMAKE_INSTALL_<DIR>` paths, e.g. `CMAKE_INSTALL_LIBDIR`.
This requires users of libgit2 to know about the discrepancy and will
require special hacks for any build systems that handle these variables
in an automated way. One such example is Gentoo Linux, which sets up
these paths in both the cmake and cmake-utils eclass.
So let's stop doing that: the GNUInstallDirs module handles it in a
better way for us, especially so as the actual values are dependent on
CMAKE_INSTALL_PREFIX. This commit removes our own set of variables and
instead refers users to the standard ones.
As a second benefit, this commit also fixes our pkgconfig generation to
use the GNUInstallDirs module. We had a bug there where we ignored the
CMAKE_INSTALL_PREFIX when configuring the libdir and includedir keys, so
if libdir was set to "lib64", then libdir would be an invalid path. With
GNUInstallDirs, we can now use `CMAKE_INSTALL_FULL_LIBDIR`, which
handles the prefix for us.
|
The Secure Transport interface we're currently using has been deprecated
with macOS 10.15. As we're currently in code freeze, we cannot migrate
to newer interfaces. As such, let's disable deprecation warnings for
our "schannel.c" stream.
|
Don't canonicalize symlink targets; our win32 path canonicalization
routines expect an absolute path. In particular, stop using the path
canonicalization routines for symlink targets (introduced in commit
7d55bee6d, "win32: fix relative symlinks pointing into dirs",
2020-01-10).
Now, use the utf8 -> utf16 relative path handling functions, so that
paths like "../foo" will be translated to "..\foo".
|
Add a function that takes a (possibly) relative UTF-8 path and emits a
UTF-16 path with forward slashes translated to backslashes. If the
given path is, in fact, absolute, it will be translated according to
absolute path handling rules.
|
The path canonicalization functions on win32 are intended to
canonicalize absolute paths, that is, those with prefixes. In other words,
paths that start with drive letters (`C:\`), share names (`\\server\share`),
or other prefixes (`\\?\`).
This function removes leading `..` that occur after the prefix but
before the directory/file portion (e.g., turning `C:\..\..\..\foo` into
`C:\foo`). This translation is not appropriate for local paths.
|
CMake booleans
|
In order to check whether tracing support should be turned on, we check
whether ENABLE_TRACE equals "ON". This is much too strict, as
CMake will also treat "on", "true", "yes" and others as true-ish, but
passing any of them currently disables tracing support.
Fix the issue by simply removing the STREQUAL, which will cause CMake to
do the right thing automatically.
|
Set proper pkg-config dependency for pcre2
|
Signed-off-by: Igor Raits <i.gnatenko.brain@gmail.com>
|
Use a 16kb read buffer for compatibility with macOS SecureTransport.
SecureTransport `SSLRead` has the following behavior:
1. It will return _at most_ one TLS packet's worth of data, and
2. It will try to give you as much data as you asked for
This means that if you call `SSLRead` with a buffer size that is smaller
than what _it_ reads (in other words, the maximum size of a TLS packet),
then it will buffer that data for subsequent calls. However, it will
also attempt to give you as much data as you requested in your SSLRead
call. This means that it will guarantee a network read in the event
that it has buffered data.
Consider our 8kb buffer and a server sending us 12kb of data on an HTTP
Keep-Alive session. Our first `SSLRead` will read the TLS packet off
the network. It will return us the 8kb that we requested and buffer the
remaining 4kb. Our second `SSLRead` call will see the 4kb that's
buffered and decide that it could give us an additional 4kb. So it will
do a network read.
But there's nothing left to read; that was the end of the data. The
HTTP server is waiting for us to provide a new request. The server will
eventually time out, our `read` system call will return, `SSLRead` can
return back to us and we can make progress.
While technically correct, this is wildly inefficient. (Thanks, Tim
Apple!)
Moving us to use an internal buffer that is the maximum size of a TLS
packet (16kb) ensures that `SSLRead` will never buffer and it will
always return everything that it read (albeit decrypted).
|
deps: ntlmclient: fix missing htonll symbols on FreeBSD and SunOS
|
In the NTLM authentication code, we accidentally use strdup(3P) and
strndup(3P) instead of our own wrappers git__strdup and git__strndup,
respectively.
Fix the issue by using our own functions.
|
Signed-off-by: Sven Strickroth <email@cs-ware.de>
|
sha1_lookup: inline its only function into "pack.c"
|
The file "sha1_lookup.c" contains only a single function, `sha1_position`,
which is used only in the packfile implementation. As the function
is comparatively small, to enable the compiler to optimize better and to
remove symbol visibility, move it into "pack.c".
|
Coverity fixes
|
OpenSSL pre-v1.1 required us to set up a locking function to properly
support multithreading. The locking function signature cannot return any
error codes, and as a result we can't do anything if `git_mutex_lock`
fails. To silence static analysis tools, let's just explicitly ignore
its return value by casting it to `void`.
|
When adding a new entry to our cache where an entry with the same OID
exists already, then we only update the existing entry in case it is
unparsed and the new entry is parsed. Currently, we do not check the
return value of `git_oidmap_set` though when updating the existing
entry. As a result, we will _not_ have updated the existing entry if
`git_oidmap_set` fails, but have decremented its refcount and
incremented the new entry's refcount. Later on, this may likely lead to
dereferencing invalid memory.
Fix the issue by checking the return value of `git_oidmap_set`. In case
it fails, we will simply keep the existing stored entry instead, even though
it's unparsed.
|
Git worktrees have the ability to be locked in order to spare them from
deletion, e.g. if a worktree is absent due to being located on a
removable disk it is a good idea to lock it. When locking such
worktrees, it is possible to give a locking reason in order to help the
user later on when inspecting status of any such locked trees.
The function `git_worktree_is_locked` serves to read out the locking
status. It currently does not properly report any errors when reading
the reason file, and callers do not expect any negative return
values, either. Fix this by converting callers to expect error codes and
checking the return code of `git_futils_readbuffer`.
|
When checking whether a path is a valid repository path, we try to read
the "commondir" link file. In the process, we neither confirm that
constructing the file's path succeeded nor do we verify that reading the
file succeeded, which might cause us to verify repositories on an empty
or bogus path later on.
Fix this by checking return values. As the function to verify repos
doesn't currently support returning errors, this commit also refactors
the function to return an error code, passing validity of the repo via
an out parameter instead, and adjusts all existing callers.
|
While `git_zstream_set_input` cannot fail right now, this might change in
the future if we ever decide to have it check its parameters more
rigorously. Let's thus check whether its return code signals an error.
|
Initialization of the hashing context may fail on some systems, most
notably on Win32 via the legacy hashing context. As such, we need to
always check the error code of `git_hash_ctx_init`, which is not done
when creating a new indexer.
Fix the issue by adding checks.
|
When queueing objects we want to push, we call `git_revwalk_hide` to
hide all objects already known to the remote from our revwalk. We do not
check its return value though, where the original intent was to ignore
the case where the pushed OID is not a known committish. As
`git_revwalk_hide` can fail due to other reasons like out-of-memory
exceptions, we should still check its return value.
Fix the issue by checking the function's return value, ignoring
errors hinting that it's not a committish. As `git_revwalk__push_commit`
currently clobbers these error codes, we need to adjust it as well in
order to make them available downstream.
|
When calling `git_note_next`, we end up calling `git_iterator_advance`
but ignore its error code. The intent is that we do not want to return
an error if it returns `GIT_ITEROVER`, as we want to return that value
on the next invocation of `git_note_next`. We should still check for any
other error codes returned by `git_iterator_advance` to catch unexpected
internal errors.
Fix this by checking the function's return value, ignoring
`GIT_ITEROVER`.
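The pattern of swallowing only the "iteration over" code while still propagating other failures might look like the sketch below. The iterator type, the function names and the numeric error codes are hypothetical stand-ins, not libgit2's real definitions.

```c
#include <assert.h>
#include <stddef.h>

#define ITER_OVER (-31) /* hypothetical "no more items" code */
#define ITER_FAIL (-1)  /* hypothetical generic failure */

struct iter {
    const int *items;
    size_t count, pos;
    int fail_on_advance; /* test hook: make iter_advance() report an error */
};

/* Move to the next item; ITER_OVER once the current item was the last one. */
static int iter_advance(struct iter *it)
{
    if (it->fail_on_advance)
        return ITER_FAIL;
    it->pos++;
    return it->pos >= it->count ? ITER_OVER : 0;
}

/* Yield the current item, then advance. */
static int iter_next(int *out, struct iter *it)
{
    int error;

    assert(out != NULL && it != NULL);

    if (it->pos >= it->count)
        return ITER_OVER;

    *out = it->items[it->pos];

    /* Exhaustion is reported by the next call, so swallow it here;
     * any other error must still reach the caller. */
    if ((error = iter_advance(it)) < 0 && error != ITER_OVER)
        return error;

    return 0;
}
```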
|
As OpenSSL loves using uninitialized bytes as another source of entropy,
we need to mark them as defined so that Valgrind won't complain about
use of these bytes. Traditionally, we've been using the macro
`VALGRIND_MAKE_MEM_DEFINED` provided by Valgrind, but starting with
OpenSSL 1.1 the code doesn't compile anymore due to `struct SSL` having
become opaque. As such, we also can't set it as defined anymore, as we
have no way of knowing its size.
Let's change gears instead by just swapping out the allocator functions
of OpenSSL with our own ones. The twist is that instead of calling
`malloc`, we just call `calloc` to have the bytes initialized
automatically. Next to soothing Valgrind, this approach has the benefit
of being completely agnostic of the memory sanitizer and is neatly
contained at a single place.
Note that we shouldn't do this for non-Valgrind builds. As we cannot
set up memory functions for a single SSL context only, we need to swap
them globally. Furthermore, as it's possible to call
`OPENSSL_set_mem_functions` only once, we'd no longer allow users of libgit2 to
set up their own allocators.
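A minimal sketch of such zero-initializing allocator wrappers follows. The wrapper names are hypothetical; the `(size, file, line)` parameter shape is what OpenSSL 1.1 expects for custom memory functions as far as I know, and the registration call is only mentioned in a comment so that the block stays independent of OpenSSL headers.

```c
#include <stdlib.h>

/* Allocate zeroed memory so that every byte starts out defined. */
static void *zeroing_malloc(size_t n, const char *file, int line)
{
    (void)file;
    (void)line;
    /* calloc instead of malloc is the whole trick. */
    return calloc(1, n);
}

static void *plain_realloc(void *ptr, size_t n, const char *file, int line)
{
    (void)file;
    (void)line;
    return realloc(ptr, n);
}

static void plain_free(void *ptr, const char *file, int line)
{
    (void)file;
    (void)line;
    free(ptr);
}

/*
 * In a Valgrind-only build these would be registered once, globally and
 * before any other OpenSSL call, presumably through OpenSSL 1.1's
 * CRYPTO_set_mem_functions(zeroing_malloc, plain_realloc, plain_free).
 */
```

Note that memory grown through the realloc wrapper is not zeroed in this sketch; only fresh allocations are.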
|
OpenSSL doesn't initialize bytes on purpose in order to generate
additional entropy. Valgrind isn't too happy about that though, causing
it to generate warnings about various issues regarding use of
uninitialized bytes.
We traditionally had some infrastructure to silence these errors in our
OpenSSL stream implementation, where we invoke the Valgrind macro
`VALGRIND_MAKE_MEM_DEFINED` in various callbacks that we provide to
OpenSSL. Naturally, we only include these instructions if a preprocessor
define "VALGRIND" is set, and that in turn is only set if passing
"-DVALGRIND" to CMake. We do that in our usual Azure pipelines, but we
in fact forgot to do this in our nightly build. As a result, we get a
slew of warnings for these nightly builds, but not for our normal
builds.
To fix this, we could just add "-DVALGRIND" to our nightly builds. But
starting with commit d827b11b6 (tests: execute leak checker via CTest
directly, 2019-06-28), we do have a secondary variable that directs
whether we want to use memory sanitizers for our builds. As such, every
user wishing to use Valgrind for our tests needs to pass both options
"VALGRIND" and "USE_LEAK_CHECKER", which is cumbersome and error prone,
as can be seen by our own builds.
Instead, let's consolidate this into a single option, removing the old
"-DVALGRIND" one: we now just add the preprocessor directive if
USE_LEAK_CHECKER equals "valgrind" and remove "-DVALGRIND" from our own
pipelines.
|
In commit b9c5b15a7 (http: use the new httpclient, 2019-12-22), the HTTP
code got refactored to extract a generic HTTP client that operates
independently of the Git protocol. Part of the refactoring was the creation
of a new `git_http_request` struct that encapsulates the generation of
requests. Our Git-specific HTTP transport was converted to use that in
`generate_request`, but during the process we forgot to set up custom
headers for the `git_http_request` and as a result we do not send out
these headers anymore.
Fix the issue by correctly setting up the request's custom headers and
add a test to verify we correctly send them.
|
If fetching from an anonymous remote via its URL, then the URL gets
written into the FETCH_HEAD reference. This is mainly done to give
valuable context to some commands, like for example git-merge(1), which
will put the URL into the generated MERGE_MSG. As a result, what gets
written into FETCH_HEAD may become public in some cases. This is
especially important considering that URLs may contain credentials, e.g.
when cloning 'https://foo:bar@example.com/repo' we persist the complete
URL into FETCH_HEAD and put it without any kind of sanitization into the
MERGE_MSG. This is obviously bad, as your login data has now just leaked
as soon as you do git-push(1).
When writing the URL into FETCH_HEAD, upstream git does strip
credentials first. Let's do the same by trying to parse the remote URL
as a "real" URL, removing any credentials and then re-formatting the
URL. In case this fails, e.g. when it's a file path or not a valid URL,
we just fall back to using the URL as-is without any sanitization. Add
tests to verify our behaviour.
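The sanitization step could be approximated as in the sketch below. It is a simplified illustration, not libgit2's URL parser: `strip_credentials` is a hypothetical helper that only looks for a "scheme://" prefix and an '@' inside the authority part, and copies anything else unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy `url` into `out` without any "user:password@" part. Input that does
 * not look like "scheme://..." is copied as-is. Returns 0 on success, -1 if
 * `out` is too small.
 */
static int strip_credentials(char *out, size_t out_len, const char *url)
{
    const char *sep, *authority, *end, *at;
    int n;

    assert(out != NULL && url != NULL);

    if ((sep = strstr(url, "://")) == NULL) {
        n = snprintf(out, out_len, "%s", url); /* e.g. a file path */
        return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
    }

    authority = sep + 3;
    /* The authority ends at the first '/', '?' or '#', or at the end of the string. */
    end = authority + strcspn(authority, "/?#");

    /* Walk back to just past the last '@' in the authority, if there is one. */
    for (at = end; at > authority && at[-1] != '@'; at--)
        ;

    /* Emit "scheme://" followed by everything after the userinfo part. */
    n = snprintf(out, out_len, "%.*s%s", (int)(authority - url), url, at);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

An '@' that appears only in the path is left alone, because the search is limited to the authority.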
|
cred: change enum to git_credential_t and GIT_CREDENTIAL_*
|
We avoid abbreviations where possible; rename git_cred to
git_credential.
In addition, we have standardized on a trailing `_t` for enum types,
instead of using "type" in the name. So `git_credtype_t` has become
`git_credential_t` and its members have become `GIT_CREDENTIAL` instead
of `GIT_CREDTYPE`.
Finally, the source and header files have been renamed to `credential`
instead of `cred`.
Keep previous name and values as deprecated, and include the new header
files from the previous ones.
|
Stop returning a void for functions, future-proofing them to allow them
to fail.
|
Stop returning a void for functions, future-proofing them to allow them
to fail.
|
Stop returning a void for functions, future-proofing them to allow them
to fail.
|