| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Continue the zlib stream as long as we can make progress; stop when we
stop getting output _or_ when zlib stops taking input from us.
|
|
|
|
|
| |
Clarify the `git_atomic` type and functions now that we have a 64 bit
version as well (`git_atomic64`).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change fixes a packfile heap corruption that can happen when
interacting with multiple packfiles concurrently across multiple
threads. This is exacerbated by setting a lower mwindow open file limit.
This change:
* Renames most of the internal methods in pack.c to clearly indicate
that they expect to be called with a certain lock held, making
reasoning about the state of locks a bit easier.
* Splits the `git_pack_file` lock in two: the one in `git_pack_file`
only protects the `index_map`. The protection to `git_mwindow_file` is
now in that struct.
* Explicitly checks for freshness of the `git_pack_file` in
`git_packfile_unpack_header`: this allows the mwindow implementation
to close files whenever there is enough cache pressure, and
`git_packfile_unpack_header` will reopen the packfile if needed.
* After a call to `p_munmap()`, the `data` and `len` fields are poisoned
with `NULL` to make use-after-frees more evident and crash rather than
being open to the possibility of heap corruption.
* Adds a test case to prevent this from regressing in the future.
Fixes: #5591
|
|
|
|
|
|
|
| |
This change adds all the necessary locking to the odb to avoid races in
the backends.
Part of: #5592
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds support for reading multi-pack-index files from the
packfile odb backend. This also makes git_pack_file objects open their
backing failes lazily in more scenarios, since the multi-pack-index can
avoid having to open them in some cases (yay!).
This change also refreshes the documentation found in src/odb_pack.c to
match the updated code.
Part of: #5399
|
| |
|
|
|
|
|
|
|
| |
This change is the first in a series to add support for git's
multi-pack-index. This should speed up large repositories significantly.
Part of: #5399
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our processing loop in git_zstream_get_output_chunk does not handle
`Z_BUF_ERROR` appropriately at the end of a compressed window.
From the zlib manual, inflate will return:
> Z_BUF_ERROR if no progress was possible or if there was not enough
> room in the output buffer when Z_FINISH is used. Note that Z_BUF_ERROR
> is not fatal, and inflate() can be called again with more input and
> more output space to continue decompressing.
In our loop, we were waiting until we got the expected size, then
ensuring that we were at `Z_STREAM_END`. We are not guaranteed to be,
since zlib may be in the `Z_BUF_ERROR` state where it has consumed a
full window's worth of data, but it doesn't know that it's really at the
end of the stream. There _could_ be more compressed data, but it
doesn't _know_ that there's not until we make a subsequent call.
We can change the loop to look for the end of stream instead of our
expected size. This allows us to call inflate one last time when we are
at the end of a window (and in the `Z_BUF_ERROR` state), allowing it to
recognize the end of the stream, and move from the `Z_BUF_ERROR` state
to the `Z_STREAM_END` state.
If we do this, we need another exit condition: when `bytes == 0`, then
no progress could be made and we should stop trying to inflate. This
will be an error case, caught by the size and/or end-of-stream test.
|
| |
|
|
|
|
|
| |
This makes get_delta_base() return the error code as the return value
and the delta base as an out-parameter.
|
|
|
|
|
|
|
|
| |
This change moves the responsibility of setting the error upon failures
of get_delta_base() to get_delta_base() instead of its callers. That
way, the caller chan always check if the return value is negative and
mark the whole operation as an error instead of using garbage values,
which can lead to crashes if the .pack files are malformed.
|
|
|
|
|
|
|
| |
The file "sha1_lookup.c" contains a single function `sha1_position`
only which is used only in the packfile implementation. As the function
is comparatively small, to enable the compiler to optimize better and to
remove symbol visibility, move it into "pack.c".
|
|
|
|
|
|
| |
While we do have a `git_zstream` abstraction that encapsulates all the
calls to zlib as well as its error handling, we do not use it in our
pack file code. Refactor it to make the code a lot easier to understand.
|
|
|
|
|
|
| |
While we do have a zstream abstraction that encapsulates all the calls
to zlib as well as its error handling, we do not use it in our pack file
code. Refactor it to make the code a lot easier to understand.
|
|
|
|
| |
Prefer `off64_t` internally.
|
|
|
|
|
|
|
|
|
| |
Our file utils functions all have a "futils" prefix, e.g.
`git_futils_touch`. One would thus naturally guess that their
definitions and implementation would live in files "futils.h" and
"futils.c", respectively, but in fact they live in "fileops.h".
Rename the files to match expectations.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, there is only one caller that adds entries into an offset map, and
this caller first uses `git_offmap_put` to add a key and then set the value at
the returned index by using `git_offmap_set_value_at`. This is just too tighlty
coupled with implementation details of the map as it exposes the index of
inserted entries, which we really do not care about at all.
Introduce a new function `git_offmap_set`, which takes as parameters the map,
key and value and directly returns an error code. Convert the caller to make use
of it instead.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current way of looking up an entry from a map is tightly coupled with the
map implementation, as one first has to look up the index of the key and then
retrieve the associated value by using the index. As a caller, you usually do
not care about any indices at all, though, so this is more complicated than
really necessary. Furthermore, it invites for errors to happen if the correct
error checking sequence is not being followed.
Introduce a new high-level function `git_offmap_get` that takes a map and a key
and returns a pointer to the associated value if such a key exists. Otherwise,
a `NULL` pointer is returned. Adjust all callers that can trivially be
converted.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current way of looking up an entry from a map is tightly coupled with the
map implementation, as one first has to look up the index of the key and then
retrieve the associated value by using the index. As a caller, you usually do
not care about any indices at all, though, so this is more complicated than
really necessary. Furthermore, it invites for errors to happen if the correct
error checking sequence is not being followed.
Introduce a new high-level function `git_oidmap_get` that takes a map and a key
and returns a pointer to the associated value if such a key exists. Otherwise,
a `NULL` pointer is returned. Adjust all callers that can trivially be
converted.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the lifecycle functions for maps (allocation, deallocation, resize)
are not named in a uniform way and do not have a uniform function signature.
Rename the functions to fix that, and stick to libgit2's naming scheme of saying
`git_foo_new`. This results in the following new interface for allocation:
- `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an
error code if we ran out of memory
- `void git_<t>map_free(git_<t>map *map)` to free a map
- `void git_<t>map_clear(git<t>map *map)` to remove all entries from a map
This commit also fixes all existing callers.
|
|
|
|
| |
'GIT_OPT_IGNORE_PACK_KEEP_FILE_CHECK'
|
|
|
|
|
| |
Move to the `git_error` name in the internal API for error-related
functions.
|
|
|
|
|
|
|
| |
We use the term "invalid" to refer to bad or malformed data, eg
`GIT_REF_INVALID` and `GIT_EINVALIDSPEC`. Since we're changing the
names of the `git_object_t`s in this release, update it to be
`GIT_OBJECT_INVALID` instead of `BAD`.
|
|
|
|
| |
Use the new object_type enumeration names within the codebase.
|
|
|
|
|
|
|
| |
Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types,
simply use `size_t` instead. This decouples code from the khash stuff
and makes it possible to move the khash includes into the implementation
files.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function `git_packfile_stream_free` frees all state of the packfile
stream without freeing the structure itself. This naming makes it hard
to spot whether it will try to free the pointer itself or not, causing
potential future errors. Due to this reason, we have decided to name a
function freeing state without freeing the actual struture a "dispose"
function.
Rename `git_packfile_stream_free` to `git_packfile_stream_dispose` as a
first example following this rule.
|
|
|
|
|
|
|
|
|
|
|
| |
If an element has been cached, but then the call to
packfile_unpack_compressed() fails, the very next thing that happens is
that its data is freed and then the element is not removed from the
cache, which frees the data again.
This change sets obj->data to NULL to avoid the double-free. It also
stops trying to resolve deltas after two continuous failed rounds of
resolution, and adds a test for this.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
The type of |base_offset| in get_delta_base() is `git_off_t`, which is a
signed `long`. That means that we need to make sure that the 8 most
significant bits are zero (instead of 7) to avoid an overflow when it is
shifted by 7 bits.
Found using libFuzzer.
|
|\
| |
| | |
Include fixups
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.
This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as its first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.
This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
This was pulled over from git.git, and is an experiment in
making binary-searching lists of sha1s faster. It was never
compiled by default (nor was it used upstream by default
without a special environment variable).
Unfortunately, it is actually slower in practice, and
upstream is planning to drop it in
git/git@f1068efefe6dd3beaa89484db5e2db730b094e0b (which has
some timing results). It's worth doing the same here for
simplicity.
|
|
|
|
|
|
|
|
| |
The `git_buf_init` function has an optional length parameter, which will
cause the buffer to be initialized and allocated in one step. This can
be used instead of static initialization with `GIT_BUF_INIT` followed by
a `git_buf_grow`. This patch does so for two functions where it is
applicable.
|
|
|
|
|
|
|
|
| |
The function `git_buf_try_grow` consistently calls `giterr_set_oom`
whenever growing the buffer fails due to insufficient memory being
available. So in fact, we do not have to do this ourselves when a call
to any buffer-growing function has failed due to an OOM situation. But
we still do so in two functions, which this patch cleans up.
|
|
|
|
|
|
| |
Fixes a regression from #4092. This is a crash on 32-bit and I assume that
it doesn't do the right thing on 64-bit either. MSVC emits a warning for this,
but of course, it's easy to get lost among all of the similar 'possible loss
of data' warnings.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Encapsulation!
|
|
|
|
|
|
|
|
| |
Error messages should be sentence fragments, and therefore:
1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one
|