delta/libgit2.git - github.com: libgit2/libgit2.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	object: return GIT_EINVALID on parse errors	Edward Thomson	2021-11-30	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Return `GIT_EINVALID` on parse errors so that direct callers of parse functions can determine when there was a failure to parse the object. The object parser functions will swallow this error code to prevent it from propagating down the chain to end-users. (`git_merge` should not return `GIT_EINVALID` when a commit it tries to look up is not valid, this would be too vague to be useful.) The only public function that this affects is `git_signature_from_buffer`, which is now documented as returning `GIT_EINVALID` when appropriate.
*	filebuf: use hashes not oids	Edward Thomson	2021-11-22	1	-1/+1
\| \| \| \| \|	The filebuf functions should use hashes directly, not indirectly using the oid functions.
*	str: introduce `git_str` for internal, `git_buf` is externalethomson/gitstr	Edward Thomson	2021-10-17	1	-33/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
*	hash: hash functions operate on byte arrays not git_oids	Edward Thomson	2021-10-02	1	-3/+3
\| \| \| \| \| \|	Separate the concerns of the hash functions from the git_oid functions. The git_oid structure will need to understand either SHA1 or SHA256; the hash functions should only deal with the appropriate one of these.
*	hash: accept the algorithm in inputs	Edward Thomson	2021-10-01	1	-2/+2
\|
*	Avoid double negatives in the justification for truncation	lhchavez	2021-09-08	1	-2/+5
\| \| \| \| \|	Turns out, double negatives are harder to parse than positive statements.
*	Only avoid `mmap(2)`/`ftruncate(2)` when in non-Windows	lhchavez	2021-09-08	1	-2/+84
\| \| \| \| \| \| \| \| \| \|	It turns out that if we use `mmap(2)`, non-Windows remote filesystems break due to permissions. If we don't, _Windows_ remote filesystems break due to lack of coherence between memory mapped views of the file and direct I/O operations done to the files. To break out of this impossible situation, conditionally-compile versions of Windows-specific `write_at` and `append_to_pack`.
*	Also remove a `ftruncate(2)` call in `git_indexer_commit`	lhchavez	2021-09-07	1	-6/+0
\| \| \| \| \| \|	Now that we're not using `mmap(2)` for writing stuff, we don't need to truncate the file afterwards, since it'll have the correct size at the end of the process. Whee~!
*	indexer: Avoid one `mmap(2)`/`munmap(2)` pair per `git_indexer_append` call	lhchavez	2021-09-07	1	-47/+2
\| \| \| \| \| \| \| \| \|	This change makes `append_to_pack` completely rely on `p_pwrite` to do all its I/O instead of splitting it between `p_pwrite` and a `mmap(2)`/`munmap(2)`+`memcpy(3)`. This saves a good chunk of user CPU time and avoids making two syscalls per round, but doesn't really cut down a lot of wall time (~1% on cloning the [git](https://github.com/git/git.git) repository).
*	Use `p_pwrite`/`p_pread` consistently throughout the codebase	lhchavez	2021-01-07	1	-3/+1
\| \| \| \| \| \|	This change stops using the seek+read/write combo to perform I/O with an offset, since this is faster by one system call (and also more atomic and therefore safer).
*	Support build with NO_MMAP to disable use of system mmap	Dhruva Krishnamurthy	2020-12-30	1	-0/+18
\| \| \| \| \|	* Use pread/pwrite to avoid updating position in file descriptor * Emulate missing pread/pwrite on win32 using overlapped file IO
*	Make the pack and mwindow implementations data-race-free	lhchavez	2020-11-28	1	-11/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change fixes a packfile heap corruption that can happen when interacting with multiple packfiles concurrently across multiple threads. This is exacerbated by setting a lower mwindow open file limit. This change: * Renames most of the internal methods in pack.c to clearly indicate that they expect to be called with a certain lock held, making reasoning about the state of locks a bit easier. * Splits the `git_pack_file` lock in two: the one in `git_pack_file` only protects the `index_map`. The protection to `git_mwindow_file` is now in that struct. * Explicitly checks for freshness of the `git_pack_file` in `git_packfile_unpack_header`: this allows the mwindow implementation to close files whenever there is enough cache pressure, and `git_packfile_unpack_header` will reopen the packfile if needed. * After a call to `p_munmap()`, the `data` and `len` fields are poisoned with `NULL` to make use-after-frees more evident and crash rather than being open to the possibility of heap corruption. * Adds a test case to prevent this from regressing in the future. Fixes: #5591
*	pack: use GIT_ASSERT	Edward Thomson	2020-11-27	1	-2/+2
\|
*	mwindow: use GIT_ASSERT	Edward Thomson	2020-11-27	1	-10/+16
\|
*	indexer: use GIT_ASSERT	Edward Thomson	2020-11-27	1	-6/+10
\|
*	Make the tests pass cleanly with MemorySanitizer	lhchavez	2020-06-30	1	-1/+4
\| \| \| \| \| \| \| \| \|	This change: * Initializes a few variables that were being read before being initialized. * Includes https://github.com/madler/zlib/pull/393. As such, it only works reliably with `-DUSE_BUNDLED_ZLIB=ON`.
*	tree-wide: do not compile deprecated functions with hard deprecation	Patrick Steinhardt	2020-06-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When compiling libgit2 with -DDEPRECATE_HARD, we add a preprocessor definition `GIT_DEPRECATE_HARD` which causes the "git2/deprecated.h" header to be empty. As a result, no function declarations are made available to callers, but the implementations are still available to link against. This has the problem that function declarations also aren't visible to the implementations, meaning that the symbol's visibility will not be set up correctly. As a result, the resulting library may not expose those deprecated symbols at all on some platforms and thus cause linking errors. Fix the issue by conditionally compiling deprecated functions, only. While it becomes impossible to link against such a library in case one uses deprecated functions, distributors of libgit2 aren't expected to pass -DDEPRECATE_HARD anyway. Instead, users of libgit2 should manually define GIT_DEPRECATE_HARD to hide deprecated functions. Using "real" hard deprecation still makes sense in the context of CI to test we don't use deprecated symbols ourselves and in case a dependant uses libgit2 in a vendored way and knows it won't ever use any of the deprecated symbols anyway.
*	Making get_delta_base() conform to the general error-handling pattern	lhchavez	2020-04-01	1	-3/+4
\| \| \| \| \|	This makes get_delta_base() return the error code as the return value and the delta base as an out-parameter.
*	indexer: check return code of `git_hash_ctx_init`	Patrick Steinhardt	2020-02-07	1	-3/+3
\| \| \| \| \| \| \| \| \|	Initialization of the hashing context may fail on some systems, most notably on Win32 via the legacy hashing context. As such, we need to always check the error code of `git_hash_ctx_init`, which is not done when creating a new indexer. Fix the issue by adding checks.
*	internal: use off64_t instead of git_off_tethomson/off_t	Edward Thomson	2019-11-25	1	-19/+19
\| \| \| \|	Prefer `off64_t` internally.
*	indexer: catch OOM when adding expected OIDs	Patrick Steinhardt	2019-08-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	When adding OIDs to the indexer's map of yet-to-be-seen OIDs to verify that packfiles are complete, we do so by first allocating a new OID and then calling `git_oidmap_set` on it. There was no check for memory allocation errors in place, though, leading to possible segfaults due to trying to copy data to a `NULL` pointer. Verify the result of `git__malloc` with `GIT_ERROR_CHECK_ALLOC` to fix the issue.
*	Rename opt init functions to `options_init`	Edward Thomson	2019-06-14	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	In libgit2 nomenclature, when we need to verb a direct object, we name a function `git_directobject_verb`. Thus, if we need to init an options structure named `git_foo_options`, then the name of the function that does that should be `git_foo_options_init`. The previous names of `git_foo_init_options` is close - it _sounds_ as if it's initializing the options of a `foo`, but in fact `git_foo_options` is its own noun that should be respected. Deprecate the old names; they'll now call directly to the new ones.
*	indexer: use git_indexer_progress throughout	Edward Thomson	2019-02-22	1	-8/+8
\| \| \| \| \|	Update internal usage of `git_transfer_progress` to `git_indexer_progreses`.
*	indexer: use map iterator to delete expected OIDs	Patrick Steinhardt	2019-02-15	1	-16/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	To compute whether there are objects missing in a packfile, the indexer keeps around a map of OIDs that it still expects to see. This map does not store any values at all, but in fact the keys are owned by the map itself. Right now, we free these keys by iterating over the map and freeing the key itself, which is kind of awkward as keys are expected to be constant. We can make this a bit prettier by inserting the OID as value, too. As we already store the `NULL` pointer either way, this does not increase memory usage, but makes the code a tad more clear. Furthermore, we convert the previously existing map iteration via indices to make use of an iterator, instead.
*	oidmap: introduce high-level setter for key/value pairs	Patrick Steinhardt	2019-02-15	1	-23/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, one would use either `git_oidmap_insert` to insert key/value pairs into a map or `git_oidmap_put` to insert a key only. These function have historically been macros, which is why their syntax is kind of weird: instead of returning an error code directly, they instead have to be passed a pointer to where the return value shall be stored. This does not match libgit2's common idiom of directly returning error codes.Furthermore, `git_oidmap_put` is tightly coupled with implementation details of the map as it exposes the index of inserted entries. Introduce a new function `git_oidmap_set`, which takes as parameters the map, key and value and directly returns an error code. Convert all trivial callers of `git_oidmap_insert` and `git_oidmap_put` to make use of it.
*	maps: use uniform lifecycle management functions	Patrick Steinhardt	2019-02-15	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the lifecycle functions for maps (allocation, deallocation, resize) are not named in a uniform way and do not have a uniform function signature. Rename the functions to fix that, and stick to libgit2's naming scheme of saying `git_foo_new`. This results in the following new interface for allocation: - `int git_<t>map_new(git_<t>map *out)` to allocate a new map, returning an error code if we ran out of memory - `void git_<t>map_free(git_<t>map map)` to free a map - `void git_<t>map_clear(git<t>map *map)` to remove all entries from a map This commit also fixes all existing callers.
*	git_error: use new names in internal APIs and usage	Edward Thomson	2019-01-22	1	-32/+32
\| \| \| \| \|	Move to the `git_error` name in the internal API for error-related functions.
*	object_type: use new enumeration namesethomson/index_fixes	Edward Thomson	2018-12-01	1	-17/+17
\| \| \| \|	Use the new object_type enumeration names within the codebase.
*	khash: remove intricate knowledge of khash types	Patrick Steinhardt	2018-11-28	1	-3/+3
\| \| \| \| \| \| \|	Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types, simply use `size_t` instead. This decouples code from the khash stuff and makes it possible to move the khash includes into the implementation files.
*	Merge pull request #4374 from pks-t/pks/pack-file-verify	Edward Thomson	2018-08-26	1	-89/+259
\|\ \| \| \| \|	Pack file verification
\| *	indexer: correctly initialize struct with {0}	Patrick Steinhardt	2018-06-22	1	-1/+1
\| \|
\| *	indexer: add ability to select connectivity checks	Patrick Steinhardt	2018-06-22	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Right now, we simply turn on connectivity checks in the indexer as soon as we have access to an object database. But seeing that the connectivity checks may incur additional overhead, we do want the user to decide for himself whether he wants to allow those checks. Furthermore, it might also be desirable to check connectivity in case where no object database is given at all, e.g. in case where a fully connected pack file is expected. Add a flag `verify` to `git_indexer_options` to enable additional verification checks. Also avoid to query the ODB in case none is given to allow users to enable checks when they do not have an ODB.
\| *	indexer: introduce options struct to `git_indexer_new`	Patrick Steinhardt	2018-06-22	1	-4/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We strive to keep an options structure to many functions to be able to extend options in the future without breaking the API. `git_indexer_new` doesn't have one right now, but we want to be able to add an option for enabling strict packfile verification. Add a new `git_indexer_options` structure and adjust callers to use that.
\| *	indexer: check pack file connectivity	Patrick Steinhardt	2018-06-22	1	-1/+151
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When passing `--strict` to `git-unpack-objects`, core git will verify the pack file that is currently being read. In addition to the typical checksum verification, this will especially cause it to verify object connectivity of the received pack file. So it checks, for every received object, if all the objects it references are either part of the local object database or part of the pack file. In libgit2, we currently have no such mechanism, which leaves us unable to verify received pack files prior to writing them into our local object database. This commit introduce the concept of `expected_oids` to the indexer. When pack file verification is turned on by a new flag, the indexer will try to parse each received object first. If the object has any links to other objects, it will check if those links are already satisfied by known objects either part of the object database or objects it has already seen as part of that pack file. If not, it will add them to the list of `expected_oids`. Furthermore, the indexer will remove the current object from the `expected_oids` if it is currently being expected. Like this, we are able to verify whether all object links are being satisfied. As soon as we hit the end of the object stream and have resolved all objects as well as deltified objects, we assert that `expected_oids` is in fact empty. This should always be the case for a valid pack file with full connectivity.
\| *	indexer: extract function reading stream objects	Patrick Steinhardt	2018-06-22	1	-78/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The loop inside of `git_indexer_append` iterates over every object that is to be stored as part of the index. While the logic to retrieve every object from the packfile stream is rather involved, it currently just part of the loop, making it unnecessarily hard to follow. Move the logic into its own function `read_stream_object`, which unpacks a single object from the stream. Note that there is some subtletly here involving the special error `GIT_EBUFS`, which indicates to the indexer that no more data is currently available. So instead of returning an error and aborting the whole loop in that case, we do have to catch that value and return successfully to wait for more data to be read.
\| *	indexer: remove useless local variable	Patrick Steinhardt	2018-06-22	1	-6/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The `processed` variable local to `git_indexer_append` counts how many objects have already been processed. But actually, whenever it gets assigned to, we are also assigning the same value to the `stats->indexed_objects` struct member. So in fact, it is being quite useless due to always having the same value as the `indexer_objects` member and makes it a bit harder to understand the code. We can just remove the variable to fix that.
* \|	Use UINT32_MAX as the default object limit	Nelson Elhage	2018-07-20	1	-1/+1
\| \| \| \| \| \|	This replicates the old behavior of limiting to 2³² by default.
* \|	See if this fixes 32-bit build	Nelson Elhage	2018-07-16	1	-1/+1
\| \|
* \|	Add a git_libgit2_opts option to set the max indexer object count	Nelson Elhage	2018-07-12	1	-9/+6
\| \|
* \|	while fuzzing, limit # objects read	Nelson Elhage	2018-07-12	1	-0/+5
\|/
*	Convert usage of `git_buf_free` to new `git_buf_dispose`	Patrick Steinhardt	2018-06-10	1	-7/+7
\|
*	pack: rename `git_packfile_stream_free`	Patrick Steinhardt	2018-06-10	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	The function `git_packfile_stream_free` frees all state of the packfile stream without freeing the structure itself. This naming makes it hard to spot whether it will try to free the pointer itself or not, causing potential future errors. Due to this reason, we have decided to name a function freeing state without freeing the actual struture a "dispose" function. Rename `git_packfile_stream_free` to `git_packfile_stream_dispose` as a first example following this rule.
*	odb: error when we can't create object header	Edward Thomson	2018-02-09	1	-4/+11
\| \| \| \| \|	Return an error to the caller when we can't create an object header for some reason (printf failure) instead of simply asserting.
*	Fix unpack double free	lhchavez	2017-12-23	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \|	If an element has been cached, but then the call to packfile_unpack_compressed() fails, the very next thing that happens is that its data is freed and then the element is not removed from the cache, which frees the data again. This change sets obj->data to NULL to avoid the double-free. It also stops trying to resolve deltas after two continuous failed rounds of resolution, and adds a test for this.
*	libFuzzer: Fix missing trailer crash	lhchavez	2017-12-08	1	-0/+4
\| \| \| \| \| \| \|	This change fixes an invalid memory access when the trailer is missing / corrupt. Found using libFuzzer.
*	libFuzzer: Fix a git_packfile_stream leak	lhchavez	2017-12-06	1	-0/+3
\| \| \| \| \| \| \|	This change ensures that the git_packfile_stream object in git_indexer_append() does not leak when the stream has errors. Found using libFuzzer.
*	Make sure to always include "common.h" first	Patrick Steinhardt	2017-07-03	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
*	Merge pull request #4088 from chescock/packfile-name-using-complete-hash	Edward Thomson	2017-06-11	1	-8/+4
\|\ \| \| \| \|	Ensure packfiles with different contents have different names
\| *	indexer: name pack files after trailer hash	Chris Hescock	2017-05-19	1	-8/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Upstream git.git has changed the way how packfiles are named. Previously, they were using a hash of the contained object's OIDs, which has then been changed to use the hash of the complete packfile instead. See 1190a1acf (pack-objects: name pack files after trailer hash, 2013-12-05) in the git.git repository for more information on this change. This commit changes our logic to match the behavior of core git.
* \|	settings: rename `GIT_OPT_ENABLE_SYNCHRONOUS_OBJECT_CREATION`	Patrick Steinhardt	2017-06-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Initially, the setting has been solely used to enable the use of `fsync()` when creating objects. Since then, the use has been extended to also cover references and index files. As the option is not yet part of any release, we can still correct this by renaming the option to something more sensible, indicating not only correlation to objects. This commit renames the option to `GIT_OPT_ENABLE_FSYNC_GITDIR`. We also move the variable from the object to repository source code.