| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Return `GIT_EINVALID` on parse errors so that direct callers of parse
functions can determine when there was a failure to parse the object.
The object parser functions will swallow this error code to prevent it
from propagating down the chain to end-users. (`git_merge` should not
return `GIT_EINVALID` when a commit it tries to look up is not valid,
this would be too vague to be useful.)
The only public function that this affects is
`git_signature_from_buffer`, which is now documented as returning
`GIT_EINVALID` when appropriate.
|
| |
|
|
|
| |
The filebuf functions should use hashes directly, not indirectly
using the oid functions.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libgit2 has two distinct requirements that were previously solved by
`git_buf`. We require:
1. A general purpose string class that provides a number of utility APIs
for manipulating data (eg, concatenating, truncating, etc).
2. A structure that we can use to return strings to callers that they
can take ownership of.
By using a single class (`git_buf`) for both of these purposes, we have
confused the API to the point that refactorings are difficult and
reasoning about correctness is also difficult.
Move the utility class `git_buf` to be called `git_str`: this represents
its general purpose, as an internal string buffer class. The name also
is an homage to Junio Hamano ("gitstr").
The public API remains `git_buf`, and has a much smaller footprint. It
is generally only used as an "out" param with strict requirements that
follow the documentation. (Exceptions exist for some legacy APIs to
avoid breaking callers unnecessarily.)
Utility functions exist to convert a user-specified `git_buf` to a
`git_str` so that we can call internal functions, then converting it
back again.
|
| |
|
|
|
|
| |
Separate the concerns of the hash functions from the git_oid functions.
The git_oid structure will need to understand either SHA1 or SHA256; the
hash functions should only deal with the appropriate one of these.
|
| | |
|
| |
|
|
|
| |
Turns out, double negatives are harder to parse than positive
statements.
|
| |
|
|
|
|
|
|
|
|
| |
It turns out that if we use `mmap(2)`, non-Windows remote filesystems
break due to permissions. If we don't, _Windows_ remote filesystems
break due to lack of coherence between memory mapped views of the file
and direct I/O operations done to the files.
To break out of this impossible situation, conditionally-compile
versions of Windows-specific `write_at` and `append_to_pack`.
|
| |
|
|
|
|
| |
Now that we're not using `mmap(2)` for writing stuff, we don't need to
truncate the file afterwards, since it'll have the correct size at the
end of the process. Whee~!
|
| |
|
|
|
|
|
|
|
| |
This change makes `append_to_pack` completely rely on `p_pwrite` to do
all its I/O instead of splitting it between `p_pwrite` and a
`mmap(2)`/`munmap(2)`+`memcpy(3)`. This saves a good chunk of user CPU
time and avoids making two syscalls per round, but doesn't really cut
down a lot of wall time (~1% on cloning the
[git](https://github.com/git/git.git) repository).
|
| |
|
|
|
|
| |
This change stops using the seek+read/write combo to perform I/O with an
offset, since this is faster by one system call (and also more atomic
and therefore safer).
|
| |
|
|
|
| |
* Use pread/pwrite to avoid updating position in file descriptor
* Emulate missing pread/pwrite on win32 using overlapped file IO
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change fixes a packfile heap corruption that can happen when
interacting with multiple packfiles concurrently across multiple
threads. This is exacerbated by setting a lower mwindow open file limit.
This change:
* Renames most of the internal methods in pack.c to clearly indicate
that they expect to be called with a certain lock held, making
reasoning about the state of locks a bit easier.
* Splits the `git_pack_file` lock in two: the one in `git_pack_file`
only protects the `index_map`. The protection to `git_mwindow_file` is
now in that struct.
* Explicitly checks for freshness of the `git_pack_file` in
`git_packfile_unpack_header`: this allows the mwindow implementation
to close files whenever there is enough cache pressure, and
`git_packfile_unpack_header` will reopen the packfile if needed.
* After a call to `p_munmap()`, the `data` and `len` fields are poisoned
with `NULL` to make use-after-frees more evident and crash rather than
being open to the possibility of heap corruption.
* Adds a test case to prevent this from regressing in the future.
Fixes: #5591
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
| |
This change:
* Initializes a few variables that were being read before being
initialized.
* Includes https://github.com/madler/zlib/pull/393. As such,
it only works reliably with `-DUSE_BUNDLED_ZLIB=ON`.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When compiling libgit2 with -DDEPRECATE_HARD, we add a preprocessor
definition `GIT_DEPRECATE_HARD` which causes the "git2/deprecated.h"
header to be empty. As a result, no function declarations are made
available to callers, but the implementations are still available to
link against. This has the problem that function declarations also
aren't visible to the implementations, meaning that the symbol's
visibility will not be set up correctly. As a result, the resulting
library may not expose those deprecated symbols at all on some platforms
and thus cause linking errors.
Fix the issue by conditionally compiling deprecated functions, only.
While it becomes impossible to link against such a library in case one
uses deprecated functions, distributors of libgit2 aren't expected to
pass -DDEPRECATE_HARD anyway. Instead, users of libgit2 should manually
define GIT_DEPRECATE_HARD to hide deprecated functions. Using "real"
hard deprecation still makes sense in the context of CI to test we don't
use deprecated symbols ourselves and in case a dependant uses libgit2 in
a vendored way and knows it won't ever use any of the deprecated symbols
anyway.
|
| |
|
|
|
| |
This makes get_delta_base() return the error code as the return value
and the delta base as an out-parameter.
|
| |
|
|
|
|
|
|
|
| |
Initialization of the hashing context may fail on some systems, most
notably on Win32 via the legacy hashing context. As such, we need to
always check the error code of `git_hash_ctx_init`, which is not done
when creating a new indexer.
Fix the issue by adding checks.
|
| |
|
|
| |
Prefer `off64_t` internally.
|
| |
|
|
|
|
|
|
|
|
|
| |
When adding OIDs to the indexer's map of yet-to-be-seen OIDs to verify
that packfiles are complete, we do so by first allocating a new OID and
then calling `git_oidmap_set` on it. There was no check for memory
allocation errors in place, though, leading to possible segfaults due to
trying to copy data to a `NULL` pointer.
Verify the result of `git__malloc` with `GIT_ERROR_CHECK_ALLOC` to fix
the issue.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
In libgit2 nomenclature, when we need to verb a direct object, we name
a function `git_directobject_verb`. Thus, if we need to init an options
structure named `git_foo_options`, then the name of the function that
does that should be `git_foo_options_init`.
The previous names of `git_foo_init_options` is close - it _sounds_ as
if it's initializing the options of a `foo`, but in fact
`git_foo_options` is its own noun that should be respected.
Deprecate the old names; they'll now call directly to the new ones.
|
| |
|
|
|
| |
Update internal usage of `git_transfer_progress` to
`git_indexer_progreses`.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
To compute whether there are objects missing in a packfile, the indexer keeps
around a map of OIDs that it still expects to see. This map does not store any
values at all, but in fact the keys are owned by the map itself. Right now, we
free these keys by iterating over the map and freeing the key itself, which is
kind of awkward as keys are expected to be constant.
We can make this a bit prettier by inserting the OID as value, too. As we
already store the `NULL` pointer either way, this does not increase memory
usage, but makes the code a tad more clear. Furthermore, we convert the
previously existing map iteration via indices to make use of an iterator,
instead.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, one would use either `git_oidmap_insert` to insert key/value pairs
into a map or `git_oidmap_put` to insert a key only. These function have
historically been macros, which is why their syntax is kind of weird: instead of
returning an error code directly, they instead have to be passed a pointer to
where the return value shall be stored. This does not match libgit2's common
idiom of directly returning error codes.Furthermore, `git_oidmap_put` is tightly
coupled with implementation details of the map as it exposes the index of
inserted entries.
Introduce a new function `git_oidmap_set`, which takes as parameters the map,
key and value and directly returns an error code. Convert all trivial callers of
`git_oidmap_insert` and `git_oidmap_put` to make use of it.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the lifecycle functions for maps (allocation, deallocation, resize)
are not named in a uniform way and do not have a uniform function signature.
Rename the functions to fix that, and stick to libgit2's naming scheme of saying
`git_foo_new`. This results in the following new interface for allocation:
- `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an
error code if we ran out of memory
- `void git_<t>map_free(git_<t>map *map)` to free a map
- `void git_<t>map_clear(git<t>map *map)` to remove all entries from a map
This commit also fixes all existing callers.
|
| |
|
|
|
| |
Move to the `git_error` name in the internal API for error-related
functions.
|
| |
|
|
| |
Use the new object_type enumeration names within the codebase.
|
| |
|
|
|
|
|
| |
Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types,
simply use `size_t` instead. This decouples code from the khash stuff
and makes it possible to move the khash includes into the implementation
files.
|
| |\
| |
| | |
Pack file verification
|
| | | |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Right now, we simply turn on connectivity checks in the indexer as soon
as we have access to an object database. But seeing that the
connectivity checks may incur additional overhead, we do want the user
to decide for himself whether he wants to allow those checks.
Furthermore, it might also be desirable to check connectivity in case
where no object database is given at all, e.g. in case where a fully
connected pack file is expected.
Add a flag `verify` to `git_indexer_options` to enable additional
verification checks. Also avoid to query the ODB in case none is given
to allow users to enable checks when they do not have an ODB.
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We strive to keep an options structure to many functions to be able to
extend options in the future without breaking the API. `git_indexer_new`
doesn't have one right now, but we want to be able to add an option
for enabling strict packfile verification.
Add a new `git_indexer_options` structure and adjust callers to use
that.
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When passing `--strict` to `git-unpack-objects`, core git will verify
the pack file that is currently being read. In addition to the typical
checksum verification, this will especially cause it to verify object
connectivity of the received pack file. So it checks, for every received
object, if all the objects it references are either part of the local
object database or part of the pack file. In libgit2, we currently have
no such mechanism, which leaves us unable to verify received pack files
prior to writing them into our local object database.
This commit introduce the concept of `expected_oids` to the indexer.
When pack file verification is turned on by a new flag, the indexer will
try to parse each received object first. If the object has any links to
other objects, it will check if those links are already satisfied by
known objects either part of the object database or objects it has
already seen as part of that pack file. If not, it will add them to the
list of `expected_oids`. Furthermore, the indexer will remove the
current object from the `expected_oids` if it is currently being
expected.
Like this, we are able to verify whether all object links are being
satisfied. As soon as we hit the end of the object stream and have
resolved all objects as well as deltified objects, we assert that
`expected_oids` is in fact empty. This should always be the case for a
valid pack file with full connectivity.
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The loop inside of `git_indexer_append` iterates over every object that
is to be stored as part of the index. While the logic to retrieve every
object from the packfile stream is rather involved, it currently just
part of the loop, making it unnecessarily hard to follow.
Move the logic into its own function `read_stream_object`, which unpacks
a single object from the stream. Note that there is some subtletly here
involving the special error `GIT_EBUFS`, which indicates to the indexer
that no more data is currently available. So instead of returning an
error and aborting the whole loop in that case, we do have to catch that
value and return successfully to wait for more data to be read.
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The `processed` variable local to `git_indexer_append` counts how many
objects have already been processed. But actually, whenever it gets
assigned to, we are also assigning the same value to the
`stats->indexed_objects` struct member. So in fact, it is being quite
useless due to always having the same value as the `indexer_objects`
member and makes it a bit harder to understand the code. We can just
remove the variable to fix that.
|
| | |
| |
| | |
This replicates the old behavior of limiting to 2³² by default.
|
| | | |
|
| | | |
|
| |/ |
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The function `git_packfile_stream_free` frees all state of the packfile
stream without freeing the structure itself. This naming makes it hard
to spot whether it will try to free the pointer itself or not, causing
potential future errors. Due to this reason, we have decided to name a
function freeing state without freeing the actual struture a "dispose"
function.
Rename `git_packfile_stream_free` to `git_packfile_stream_dispose` as a
first example following this rule.
|
| |
|
|
|
| |
Return an error to the caller when we can't create an object header for
some reason (printf failure) instead of simply asserting.
|
| |
|
|
|
|
|
|
|
|
|
| |
If an element has been cached, but then the call to
packfile_unpack_compressed() fails, the very next thing that happens is
that its data is freed and then the element is not removed from the
cache, which frees the data again.
This change sets obj->data to NULL to avoid the double-free. It also
stops trying to resolve deltas after two continuous failed rounds of
resolution, and adds a test for this.
|
| |
|
|
|
|
|
| |
This change fixes an invalid memory access when the trailer is missing /
corrupt.
Found using libFuzzer.
|
| |
|
|
|
|
|
| |
This change ensures that the git_packfile_stream object in
git_indexer_append() does not leak when the stream has errors.
Found using libFuzzer.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.
This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as its first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.
This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.
|
| |\
| |
| | |
Ensure packfiles with different contents have different names
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Upstream git.git has changed the way how packfiles are named.
Previously, they were using a hash of the contained object's OIDs, which
has then been changed to use the hash of the complete packfile instead.
See 1190a1acf (pack-objects: name pack files after trailer hash,
2013-12-05) in the git.git repository for more information on this
change.
This commit changes our logic to match the behavior of core git.
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Initially, the setting has been solely used to enable the use of
`fsync()` when creating objects. Since then, the use has been extended
to also cover references and index files. As the option is not yet part
of any release, we can still correct this by renaming the option to
something more sensible, indicating not only correlation to objects.
This commit renames the option to `GIT_OPT_ENABLE_FSYNC_GITDIR`. We also
move the variable from the object to repository source code.
|