summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | config: add config values for packed-refs v2Derrick Stolee2022-11-079-9/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When updating the file format version for something as critical as ref storage, the file format version must come with an extension change. The extensions.refFormat config value is a multi-valued config value that defaults to the pair "files" and "packed". Add "packed-v2" as a possible value to extensions.refFormat. This value specifies that the packed-refs file may exist in the version 2 format. (If the "packed" value does not exist, then the packed-refs file must exist in version 2, not version 1.) In order to select version 2 for writing, the user will have two options. First, the user could remove "packed" and add "packed-v2" to the extensions.refFormat list. This would imply that version 2 is the only format available. However, this also means that version 1 files would be ignored at read time, so this does not allow users to upgrade repositories with existing packed-refs files. Add a new refs.packedRefsVersion config option which allows specifying which version to use during writes. Thus, when both "packed" and "packed-v2" are in the extensions.refFormat list, the user can upgrade from version 1 to version 2, or downgrade from 2 to 1. Currently, the implementation does not use refs.packedRefsVersion, as that is delayed until we have the code to write that file format version. However, we can add the necessary enum values and flag constants to communicate the presence of "packed-v2" in the extensions.refFormat list. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | packed-backend: create abstraction for writing refsDerrick Stolee2022-11-073-15/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The packed-refs file is a plaintext file format that starts with a header line, then each ref is given as one or two lines (two if there is a peeled value). These lines are written as part of a sequence of updates which are merged with the existing ref iterator in merge_iterator_and_updates(). That method is currently tied directly to write_packed_entry_v1(). When creating a new version of the packed-file format, it would be valuable to use this merging logic in an identical way. Create a new function pointer type, write_ref_fn, and use that type in merge_iterator_and_updates(). Notably, the function pointer type no longer depends on a FILE pointer, but instead takes an arbitrary "void *write_data" parameter. This flexibility will be critical in the future, since the planned v2 format will use the chunk-format API and need a more complicated structure than the output FILE. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | packed-backend: extract iterator/updates mergeDerrick Stolee2022-11-071-53/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TBD Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | packed-backend: extract add_write_error()Derrick Stolee2022-11-071-10/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The write_with_updates() method uses a write_error label to jump to code that adds an error message before exiting with an error. This appears both when the packed-refs file header is written, but also when a ref line is written to the packed-refs file. A future change will abstract the loop that writes the refs out of write_with_updates(), making the goto an inconvenient pattern. For now, remove the distinction between "goto write_error" and "goto error" by adding the message in-line using the new static method add_write_error(). This is functionally equivalent, but will make the next step easier. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | refs: extract packfile format to new fileDerrick Stolee2022-11-074-577/+667
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for adding a new packed-refs file format, extract all code from refs/packed-backend.c that involves knowledge of the plaintext file format. This includes any parsing logic that cares about the header, plaintext lines of the form "<oid> <ref>" or "^<peeled>", and the error messages when there is an issue in the file. This also includes the writing logic that writes the header or the individual references. Future changes will perform more refactoring to abstract away more of the writing process to be more generic, but this is enough of a chunk of code movement. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | chunk-format: parse trailing table of contentsDerrick Stolee2022-11-072-0/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new read_trailing_table_of_contents() mimics read_table_of_contents() except that it reads the table of contents in reverse from the end of the given hashfile. The file is given as a memory-mapped section of memory and a size. Automatically calculate the start of the trailing hash and read the table of contents in revers from that position. The errors come along from those in read_table_of_contents(). The one exception is that the chunk_offset cannot be checked as going into the table of contents since we do not have that length automatically. That may have some surprising results for some narrow forms of corruption. However, we do still limit the size to the size of the file plus the part of the table of contents read so far. At minimum, the given sizes can be used to limit parsing within the file itself. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | chunk-format: allow trailing table of contentsDerrick Stolee2022-11-074-19/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing chunk formats use the table of contents at the beginning of the file. This is intended as a way to speed up the initial loading of the file, but comes at a cost during writes. Each example needs to fully compute how big each chunk will be in advance, which usually requires storing the full file contents in memory. Future file formats may want to use the chunk format API in cases where the writing stage is critical to performance, so we may want to stream updates from an existing file and then only write the table of contents at the end. Add a new 'flags' parameter to write_chunkfile() that allows this behavior. When this is specified, the defensive programming that checks that the chunks are written with the precomputed sizes is disabled. Then, the table of contents is written in reverse order at the end of the hashfile, so a parser can read the chunk list starting from the end of the file (minus the hash). The parsing of these table of contents will come in a later change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | chunk-format: store chunk offset during writeDerrick Stolee2022-11-071-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As a preparatory step to allowing trailing table of contents, store the offsets of each chunk as we write them. This replaces an existing use of a local variable, but the stored value will be used in the next change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | chunk-format: document trailing table of contentsDerrick Stolee2022-11-071-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It will be helpful to allow a trailing table of contents when writing some file types with the chunk-format API. The main reason is that it allows dynamically computing the chunk sizes while writing the file. This can use fewer resources than precomputing all chunk sizes in advance. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | chunk-format: number of chunks is optionalDerrick Stolee2022-11-071-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even though the commit-graph and multi-pack-index file formats specify a number of chunks in their header information, this is optional. The table of contents terminates with a null chunk ID, which can be used instead. The extra value is helpful for some checks, but is ultimately not necessary for the format. This will be important in some future formats. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | refs: allow loose files without packed-refsDerrick Stolee2022-11-075-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The extensions.refFormat extension is a multi-valued config that specifies which ref formats are available to the current repository. By default, Git assumes the list of "files" and "packed", unless there is at least one of these extensions specified. With the current values, it is possible for a user to specify only "files" or only "packed". The only-"packed" option was already ruled as invalid since Git's current code has too many places that require a loose reference. This could change in the future. However, we can now allow the user to specify extensions.refFormat=files alone, making it impossible to create a packed-refs file (or to read one that might exist). Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | repository: wire ref extensions to ref backendsDerrick Stolee2022-11-078-4/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous change introduced the extensions.refFormat config option. It is a multi-valued config option that currently understands "files" and "packed", with both values assumed by default. If any value is provided explicitly, this default is ignored and the provided settings are used instead. The multi-valued nature of this extension presents a way to allow a user to specify that they never want a packed-refs file (only use "files") or that they never want loose reference files (only use "packed"). However, that functionality is not currently connected. Before actually modifying the files backend to understand these extension settings, do the basic wiring that connects the extensions.refFormat parsing to the creation of the ref backend. A future change will actually change the ref backend initialization based on these settings, but this communication of the extension is sufficiently complicated to be worth an isolated change. For now, also forbid the setting of only "packed". This is done by redirecting the choice of backend to the packed backend when that selection is made. A later change will make the "files"-only extension value ignore the packed backend. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | config: fix multi-level bulleted listDerrick Stolee2022-11-071-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The documentation for 'extensions.worktreeConfig' includes a bulletted list describing certain config values that need to be moved into the worktree config instead of the repository config file. However, since we are already in a bulletted list, the documentation tools do not know when that inner list is complete. Paragraphs intended to not be part of that inner list are rendered as part of the last bullet. Modify the format to match a similar doubly-nested list from the 'column.ui' config documentation. Reword the descriptions slightly to make the config keys appear as their own heading in the inner list. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | extensions: add refFormat extensionDerrick Stolee2022-11-073-0/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Git's reference storage is critical to its function. Creating new storage formats for references requires adding an extension. This prevents third-party tools that do not understand that format from operating incorrectly on the repository. This makes updating ref formats more difficult than other optional indexes, such as the commit-graph or multi-pack-index. However, there are a number of potential ref storage enhancements that are underway or could be created. Git needs an established mechanism for coordinating between these different options. The first obvious format update is the reftable format as documented in Documentation/technical/reftable.txt. This format has much of its implementation already in Git, but its connection as a ref backend is not complete. This change is similar to some changes within one of the patches intended for the reftable effort [1]. [1] https://lore.kernel.org/git/pull.1215.git.git.1644351400761.gitgitgadget@gmail.com/ However, this change makes a distinct strategy change from the one recommended by reftable. Here, the extensions.refFormat extension is provided as a multi-valued list. In the reftable RFC, the extension has a single value, "files" or "reftable" and explicitly states that this should not change after 'git init' or 'git clone'. The single-valued approach has some major drawbacks, including the idea that the "files" backend cannot coexist with the "reftable" backend at the same time. In this way, it would not be possible to create a repository that can write loose references and combine them into a reftable in the background. With the multi-valued approach, we could integrate reftable as a drop-in replacement for the packed-refs file and allow that to be a faster way to do the integration since the test suite would only need updates when the test is explicitly testing packed-refs. When upgrading a repository from the "files" backend to the "reftable" backend, it can help to have a transition period where both are present, then finally removing the "files" backend after all loose refs are collected into the reftable. But the reftable is not the only approach available. One obvious improvement could be a new file format version for the packed-refs file. Its current plaintext-based format is inefficient due to storing object IDs as hexadecimal representations instead of in their raw format. This extra cost will get worse with SHA-256. In addition, binary searches need to guess a position and scan to find newlines for a refname entry. A structured binary format could allow for more compact representation and faster access. Adding such a format could be seen as "files-v2", but it is really "packed-v2". The reftable approach has a concept of a "stack" of reftable files. This idea would also work for a stack of packed-refs files (in v1 or v2 format). It would be helpful to describe that the refs could be stored in a stack of packed-ref files independently of whether that is in file format v1 or v2. Even in these two options, it might be helpful to indicate whether or not loose ref files are present. That is one reason to not make them appear as "files-v2" or "files-v3" options in a single-valued extension. Even as "packed-v2" or "packed-v3" options, this approach would require third-party tools to understand the "v2" version if they want to support the "v3" options. Instead, by splitting the format from the layout, we can allow third-party tools to integrate only with the most-desired format options. For these reasons, this change is defining the extensions.refFormat extension as well as how the two existing values interact. By default, Git will assume "files" and "packed" in the list. If any other value is provided, then the extension is marked as unrecognized. Add tests that check the behavior of extensions.refFormat, both in that it requires core.repositoryFormatVersion=1, and Git will refuse to work with an unknown value of the extension. There is a gap in the current implementation, though. What happens if exactly one of "files" or "packed" is provided? The presence of only one would imply that the other is not available. A later change can communicate the list contents to the repository struct and then the reference backend could ignore one of these two layers. Specifically, having only "files" would mean that Git should not read or write the packed-refs file and instead only read and write loose ref files. By contrast, having only "packed" would mean that Git should not read or write loose ref files and instead always update the packed-refs file on every ref update. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | read-cache: add index.computeHash config optionDerrick Stolee2022-11-073-1/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous change allowed skipping the hashing portion of the hashwrite API, using it instead as a buffered write API. Disabling the hashwrite can be particularly helpful when the write operation is in a critical path. One such critical path is the writing of the index. This operation is so critical that the sparse index was created specifically to reduce the size of the index to make these writes (and reads) faster. Following a similar approach to one used in the microsoft/git fork [1], add a new config option that allows disabling this hashing during the index write. The cost is that we can no longer validate the contents for corruption-at-rest using the trailing hash. [1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201 While older Git versions will not recognize the null hash as a special case, the file format itself is still being met in terms of its structure. Using this null hash will still allow Git operations to function across older versions. The one exception is 'git fsck' which checks the hash of the index file. Here, we disable this check if the trailing hash is all zeroes. We add a warning to the config option that this may cause undesirable behavior with older Git versions. As a quick comparison, I tested 'git update-index --force-write' with and without index.computHash=false on a copy of the Linux kernel repository. Benchmark 1: with hash Time (mean ± σ): 46.3 ms ± 13.8 ms [User: 34.3 ms, System: 11.9 ms] Range (min … max): 34.3 ms … 79.1 ms 82 runs Benchmark 2: without hash Time (mean ± σ): 26.0 ms ± 7.9 ms [User: 11.8 ms, System: 14.2 ms] Range (min … max): 16.3 ms … 42.0 ms 69 runs Summary 'without hash' ran 1.78 ± 0.76 times faster than 'with hash' These performance benefits are substantial enough to allow users the ability to opt-in to this feature, even with the potential confusion with older 'git fsck' versions. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | hashfile: allow skipping the hash functionDerrick Stolee2022-11-072-3/+18
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hashfile API is useful for generating files that include a trailing hash of the file's contents up to that point. Using such a hash is helpful for verifying the file for corruption-at-rest, such as a faulty drive causing flipped bits. Since the commit-graph and multi-pack-index files both use this trailing hash, the chunk-format API uses a 'struct hashfile' to handle the I/O to the file. This was very convenient to allow using the hashfile methods during these operations. However, hashing the file contents during write comes at a performance penalty. It's slower to hash the bytes on their way to the disk than without that step. If we wish to use the chunk-format API to upgrade other file types, then this hashing is a performance penalty that might not be worth the benefit of a trailing hash. For example, if we create a chunk-format version of the packed-refs file, then the file format could shrink by using raw object IDs instead of hexadecimal representations in ASCII. That reduction in size is not enough to counteract the performance penalty of hashing the file contents. In cases such as deleting a reference that appears in the packed-refs file, that write-time performance is critical. This is in contrast to the commit-graph and multi-pack-index files which are mainly updated in non-critical paths such as background maintenance. One way to allow future chunked formats to not suffer this penalty would be to create an abstraction layer around the 'struct hashfile' using a vtable of function pointers. This would allow placing a different representation in place of the hashfile. This option would be cumbersome for a few reasons. First, the hashfile's buffered writes are already highly optimized and would need to be duplicated in another code path. The second is that the chunk-format API calls the chunk_write_fn pointers using a hashfile. If we change that to an abstraction layer, then those that _do_ use the hashfile API would need to change all of their instances of hashwrite(), hashwrite_be32(), and others to use the new abstraction layer. Instead, this change opts for a simpler change. Introduce a new 'skip_hash' option to 'struct hashfile'. When set, the update_fn and final_fn members of the_hash_algo are skipped. When finalizing the hashfile, the trailing hash is replaced with the null hash. This use of a trailing null hash would be desireable in either case, since we do not want to special case a file format to have a different length depending on whether it was hashed or not. When the final bytes of a file are all zero, we can infer that it was written without hashing, and thus that verification is not available as a check for file consistency. This also means that we could easily toggle hashing for any file format we desire. For the commit-graph and multi-pack-index file, it may be possible to allow the null hash without incrementing the file format version, since it technically fits the structure of the file format. The only issue is that older versions would trigger a failure during 'git fsck'. For these file formats, we may want to delay such a change until it is justified. However, the index file is written in critical paths. It is also frequently updated, so corruption at rest is less likely to be an issue than in those other file formats. This could be a good candidate to create an option that skips the hashing operation. A version of this patch has existed in the microsoft/git fork since 2017 [1] (the linked commit was rebased in 2018, but the original dates back to January 2017). Here, the change to make the index use this fast path is delayed until a later change. [1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201 Co-authored-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | Merge branch 'ja/worktree-orphan' into jchTaylor Blau2022-11-183-8/+115
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'git worktree add' learned how to create a worktree based on an orphaned branch with `--orphan`. * ja/worktree-orphan: worktree add: add --orphan flag worktree add: Include -B in usage docs
| * | | | | | worktree add: add --orphan flagJacob Abel2022-11-103-8/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for creating an orphan branch when adding a new worktree. This functionality is equivalent to git switch's --orphan flag. The original reason this feature was implemented was to allow a user to initialise a new repository using solely the worktree oriented workflow. Example usage included below. $ GIT_DIR=".git" git init --bare $ git worktree add --orphan master master/ Signed-off-by: Jacob Abel <jacobabel@nullpo.dev> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | worktree add: Include -B in usage docsJacob Abel2022-11-102-2/+2
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While -B behavior is already documented, it was not included in the usage docs for either the man page or the help text. This change fixes that and brings the usage docs in line with how the flags are documented in other commands such as git checkout. Signed-off-by: Jacob Abel <jacobabel@nullpo.dev> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | Merge branch 'aw/complete-case-insensitive' into jchTaylor Blau2022-11-182-3/+80
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a case insensitive mode to the Bash completion helpers. * aw/complete-case-insensitive: completion: add case-insensitive match of pseudorefs completion: add optional ignore-case when matching refs
| * | | | | | completion: add case-insensitive match of pseudorefsAlison Winters2022-11-072-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When GIT_COMPLETION_IGNORE_CASE=1, also allow lowercase completion text like "head" to match HEAD and other pseudorefs. Signed-off-by: Alison Winters <alisonatwork@outlook.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | completion: add optional ignore-case when matching refsAlison Winters2022-11-072-0/+57
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If GIT_COMPLETION_IGNORE_CASE=1 is set, --ignore-case will be added to git for-each-ref calls so that branches and tags can be matched case insensitively. Signed-off-by: Alison Winters <alisonatwork@outlook.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | Merge branch 'rr/long-status-advice' into jchTaylor Blau2022-11-183-5/+152
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The advice message emitted by a slow "status" run is amended to mention fsmonitor. * rr/long-status-advice: status: long status advice adapted to recent capabilities
| * | | | | | status: long status advice adapted to recent capabilitiesRudy Rigot2022-11-153-5/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve the advice displayed when `git status` is slow because of excessive numbers of untracked files. Update the `git status` man page to explain the various configuration options. `git status` can be slow when there are a large number of untracked files and directories, because Git must search the entire worktree to enumerate them. Previously, Git would print an advice message with the elapsed search time and a suggestion to disable the search using the `-uno` option. This suggestion also carried a warning that might scare off some users. Git can reduce the size and time of the untracked file search when the `core.untrackedCache` and `core.fsmonitor` features are enabled by caching results from previous `git status` invocations. Update the advice to explain the various combinations of additional configuration options and refer to (new) documentation in the man page that explains it in more detail than what can be printed in an advice message. Finally, add new tests to verify the new functionality. Signed-off-by: Rudy Rigot <rudy.rigot@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | | Merge branch 'sz/macos-fsmonitor-symlinks' into jchTaylor Blau2022-11-181-1/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix an issue where core.fsmonitor on macOS would not notice created or modified symbolic links. * sz/macos-fsmonitor-symlinks: fsmonitor--daemon: on macOS support symlink
| * | | | | | | fsmonitor--daemon: on macOS support symlinksrz_zumix2022-11-081-1/+1
| | |/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resolves a problem where symbolic links were not showing up in diff when created or modified. kFSEventStreamEventFlagItemIsSymlink is also treated as a file update. This is because kFSEventStreamEventFlagItemIsFile is not included in FSEvents when creating or deleting symbolic links. For example: $ ln -snf t test fsevent: '/path/to/dir/test', flags=0x40100 ItemCreated|ItemIsSymlink| $ ln -snf ci test fsevent: '/path/to/dir/test', flags=0x40200 ItemIsSymlink|ItemRemoved| fsevent: '/path/to/dir/test', flags=0x40100 ItemCreated|ItemIsSymlink| Signed-off-by: srz_zumix <zumix.cpp@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | | Merge branch 'ew/delta-islands-free' into jchTaylor Blau2022-11-181-20/+51
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Free structures related to delta islands after use. * ew/delta-islands-free: delta-islands: free island-related data after use
| * | | | | | | delta-islands: free island-related data after useEric Wong2022-11-181-20/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On my use case involving 771 islands of Linux on kernel.org, this reduces memory usage by around 25MB. The bulk of that comes from free_remote_islands, since free_config_regexes only saves around 40k. This memory is saved early in the memory-intensive pack process, making it available for the remainder of the long process. Signed-off-by: Eric Wong <e@80x24.org> Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | | | Merge branch 'jk/parse-object-type-mismatch' into jchTaylor Blau2022-11-182-5/+4
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `parse_object()` hardening when checking for the existence of a suspected blob object. * jk/parse-object-type-mismatch: parse_object(): check on-disk type of suspected blob parse_object(): drop extra "has" check before checking object type
| * | | | | | | | parse_object(): check on-disk type of suspected blobJeff King2022-11-182-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In parse_object(), we try to handle blobs by streaming rather than loading them entirely into memory. The most common case here will be that we haven't seen the object yet and check oid_object_info(), which tells us we have a blob. But we trigger this code on one other case: when we have an in-memory object struct with type OBJ_BLOB (and without its "parsed" flag set, since otherwise we'd return early from the function). This indicates that some other part of the code suspected we have a blob (e.g., it was mentioned by a tree or tag) but we haven't yet looked at the on-disk copy. In this case before hitting the streaming path, we check if we have the object on-disk at all. This is mostly pointless extra work, as the streaming path would complain if it couldn't open the object (albeit with the message "hash mismatch", which is a little misleading). But it's also insufficient to catch all problems. The streaming code will only tell us "yes, the on-disk object matches the oid". But it doesn't actually confirm that what we found was indeed a blob, and neither does repo_has_object_file(). One way to improve this would be to teach stream_object_signature() to check the type (either by returning it to us to check, or taking an "expected" type). But there's an even simpler fix here: if we suspect the object is a blob, just call oid_object_info() to confirm that we have it on-disk, and that it really is a blob. This is slightly less efficient than teaching stream_object_signature() to do it (since it has to open the object already). But this case very rarely comes up. In practice, we usually don't have any clue what the type is, in which case we already call oid_object_info(). This "suspected" case happens only when some other code created an object struct but didn't actually parse the blob, which is actually tricky to trigger at all (see the discussion of the test below). I reworked the conditional a bit so that instead of: if ((suspected_blob && oid_object_info() == OBJ_BLOB) (no_clue && oid_object_info() == OBJ_BLOB) we have the simpler: if ((suspected_blob || no_clue) && oid_object_info() == OBJ_BLOB) This is shorter, but also reflects what we really want say, which is "have we ruled out this being a blob; if not, check it on-disk". In either case, if oid_object_info() fails to tell us it's a blob, we'll skip the streaming code path and call repo_read_object_file(), just as before. And if we really do have a mismatch with the existing object struct, we'll eventually call lookup_commit(), etc, via parse_object_buffer(), which will complain that it doesn't match our existing obj->type. So this fixes one of the lingering expect_failure cases from 0616617c7e (t: introduce tests for unexpected object types, 2019-04-09). That test works by peeling a tag that claims to point to a blob (triggering us to create the struct), but really points to something else, which we later discover when we call parse_object() as part of the actual traversal). Prior to this commit, we'd quietly check the sha1 and mark the blob as "parsed". Now we correctly complain about the mismatch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | | | parse_object(): drop extra "has" check before checking object typeJeff King2022-11-181-2/+1
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When parsing an object of unknown type, we check to see if it's a blob, so we can use our streaming code path. This uses oid_object_info() to check the type, but before doing so we call repo_has_object_file(). This latter is pointless, as oid_object_info() will already fail if the object is missing. Checking it ahead of time just complicates the code and is a waste of resources (albeit small). Let's drop the redundant check. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | | | Merge branch 'mh/gitcredentials-generate' into jchTaylor Blau2022-11-181-3/+5
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Doc update. * mh/gitcredentials-generate: Docs: describe how a credential-generating helper works
| * | | | | | | | Docs: describe how a credential-generating helper worksM Hickford2022-11-141-3/+5
| | |_|_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously the docs only described storage helpers. A concrete example: Git Credential Manager can generate credentials for GitHub and GitLab via OAuth. https://github.com/GitCredentialManager/git-credential-manager Signed-off-by: M Hickford <mirth.hickford@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | | | Merge branch 'mg/notes-newline' into jchTaylor Blau2022-11-181-1/+1
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid a stray empty newline in the template when creating new notes. * mg/notes-newline: notes: avoid empty line in template
| * | | | | | | | notes: avoid empty line in templateMichael J Gruber2022-11-161-1/+1
| | |/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When `git notes` prepares the template it adds an empty newline between the comment header and the content: > > # > # Write/edit the notes for the following object: > > # commit 0f3c55d4c2b7864bffb2d92278eff08d0b2e083f > # etc This is wrong structurally because that newline is part of the comment, too, and thus should be commented. Also, it throws off some positioning strategies of editors and plugins, and it differs from how we do commit templates. Change this to follow the standard set by `git commit`: > > # > # Write/edit the notes for the following object: > # > # commit 0f3c55d4c2b7864bffb2d92278eff08d0b2e083f > Tests pass unchanged after this code change. Signed-off-by: Michael J Gruber <git@grubix.eu> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | | | Merge branch 'jt/submodule-on-demand' into jchTaylor Blau2022-11-185-15/+73
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Push all submodules recursively with '--recurse-submodules=on-demand'. * jt/submodule-on-demand: Doc: document push.recurseSubmodules=only
| * | | | | | | | Doc: document push.recurseSubmodules=onlyJonathan Tan2022-11-145-15/+73
| | |_|/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Git learned pushing submodules without pushing the superproject by the user specifying --recurse-submodules=only through 6c656c3fe4 ("submodules: add RECURSE_SUBMODULES_ONLY value", 2016-12-20) and 225e8bf778 ("push: add option to push only submodules", 2016-12-20). For users who use this feature regularly, it is desirable to have an equivalent configuration. It turns out that such a configuration (push.recurseSubmodules=only) is already supported, even though it is neither documented nor mentioned in the commit messages, due to the way the --recurse-submodules=only feature was implemented (a function used to parse --recurse-submodules was updated to support "only", but that same function is used to parse push.recurseSubmodules too). What is left is to document it and test it, which is what this commit does. There is a possible point of confusion when recursing into a submodule that itself has the push.recurseSubmodules=only configuration, because if a repository has only its submodules pushed and not itself, its superproject can never be pushed. Therefore, treat such configurations as being "on-demand", and print a warning message. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | | | Merge branch 'pw/rebase-no-reflog-action' into jchTaylor Blau2022-11-180-0/+0
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid setting GIT_REFLOG_ACTION to improve readability of the sequencer internals. * pw/rebase-no-reflog-action: rebase: stop exporting GIT_REFLOG_ACTION sequencer: stop exporting GIT_REFLOG_ACTION
| * | | | | | | | rebase: stop exporting GIT_REFLOG_ACTIONPhillip Wood2022-11-091-12/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that struct replay_opts has a reflog_action member we no longer need to export GIT_REFLOG_ACTION when starting a rebase. If the user has set GIT_REFLOG_ACTION then we use it when initializing reflog_action. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | | | sequencer: stop exporting GIT_REFLOG_ACTIONPhillip Wood2022-11-092-20/+31
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each time it picks a commit the sequencer copies the GIT_REFLOG_ACITON environment variable so it can temporarily change it and then restore the previous value. This results in code that is hard to follow and also leaks memory because (i) we fail to free the copy when we've finished with it and (ii) each call to setenv() leaks the previous value. Instead pass the reflog action around in a variable and use it to set GIT_REFLOG_ACTION in the child environment when running "git commit". Within the sequencer GIT_REFLOG_ACTION is no longer set and is only read by sequencer_reflog_action(). It is still set by rebase before calling the sequencer, that will be addressed in the next commit. cherry-pick and revert are unaffected as they do not set GIT_REFLOG_ACTION before calling the sequencer. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | | | Merge branch 'ps/receive-use-only-advertised' into jchTaylor Blau2022-11-1815-82/+410
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "git receive-pack" used to use all the local refs as the boundary for checking connectivity of the data "git push" sent, but now it uses only the refs that it advertised to the pusher. In a repository with the .hideRefs configuration, this reduces the resources needed to perform the check. * ps/receive-use-only-advertised: receive-pack: only use visible refs for connectivity check rev-parse: add `--exclude-hidden=` option revision: add new parameter to exclude hidden refs revision: introduce struct to handle exclusions revision: move together exclusion-related functions refs: get rid of global list of hidden refs refs: fix memory leak when parsing hideRefs config
| * | | | | | | | receive-pack: only use visible refs for connectivity checkPatrick Steinhardt2022-11-173-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When serving a push, git-receive-pack(1) needs to verify that the packfile sent by the client contains all objects that are required by the updated references. This connectivity check works by marking all preexisting references as uninteresting and using the new reference tips as starting point for a graph walk. Marking all preexisting references as uninteresting can be a problem when it comes to performance. Git forges tend to do internal bookkeeping to keep alive sets of objects for internal use or make them easy to find via certain references. These references are typically hidden away from the user so that they are neither advertised nor writeable. At GitLab, we have one particular repository that contains a total of 7 million references, of which 6.8 million are indeed internal references. With the current connectivity check we are forced to load all these references in order to mark them as uninteresting, and this alone takes around 15 seconds to compute. We can optimize this by only taking into account the set of visible refs when marking objects as uninteresting. This means that we may now walk more objects until we hit any object that is marked as uninteresting. But it is rather unlikely that clients send objects that make large parts of objects reachable that have previously only ever been hidden, whereas the common case is to push incremental changes that build on top of the visible object graph. This provides a huge boost to performance in the mentioned repository, where the vast majority of its refs hidden. Pushing a new commit into this repo with `transfer.hideRefs` set up to hide 6.8 million of 7 refs as it is configured in Gitaly leads to a 4.5-fold speedup: Benchmark 1: main Time (mean ± σ): 30.977 s ± 0.157 s [User: 30.226 s, System: 1.083 s] Range (min … max): 30.796 s … 31.071 s 3 runs Benchmark 2: pks-connectivity-check-hide-refs Time (mean ± σ): 6.799 s ± 0.063 s [User: 6.803 s, System: 0.354 s] Range (min … max): 6.729 s … 6.850 s 3 runs Summary 'pks-connectivity-check-hide-refs' ran 4.56 ± 0.05 times faster than 'main' As we mostly go through the same codepaths even in the case where there are no hidden refs at all compared to the code before there is no change in performance when no refs are hidden: Benchmark 1: main Time (mean ± σ): 48.188 s ± 0.432 s [User: 49.326 s, System: 5.009 s] Range (min … max): 47.706 s … 48.539 s 3 runs Benchmark 2: pks-connectivity-check-hide-refs Time (mean ± σ): 48.027 s ± 0.500 s [User: 48.934 s, System: 5.025 s] Range (min … max): 47.504 s … 48.500 s 3 runs Summary 'pks-connectivity-check-hide-refs' ran 1.00 ± 0.01 times faster than 'main' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | | | rev-parse: add `--exclude-hidden=` optionPatrick Steinhardt2022-11-173-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new `--exclude-hidden=` option that is similar to the one we just added to git-rev-list(1). Given a section name `uploadpack` or `receive` as argument, it causes us to exclude all references that would be hidden by the respective `$section.hideRefs` configuration. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | | | revision: add new parameter to exclude hidden refsPatrick Steinhardt2022-11-175-1/+241
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Users can optionally hide refs from remote users in git-upload-pack(1), git-receive-pack(1) and others via the `transfer.hideRefs`, but there is not an easy way to obtain the list of all visible or hidden refs right now. We'll require just that though for a performance improvement in our connectivity check. Add a new option `--exclude-hidden=` that excludes any hidden refs from the next pseudo-ref like `--all` or `--branches`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | | | revision: introduce struct to handle exclusionsPatrick Steinhardt2022-11-173-36/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The functions that handle exclusion of refs work on a single string list. We're about to add a second mechanism for excluding refs though, and it makes sense to reuse much of the same architecture for both kinds of exclusion. Introduce a new `struct ref_exclusions` that encapsulates all the logic related to excluding refs and move the `struct string_list` that holds all wildmatch patterns of excluded refs into it. Rename functions that operate on this struct to match its name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | | | revision: move together exclusion-related functionsPatrick Steinhardt2022-11-171-26/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move together the definitions of functions that handle exclusions of refs so that related functionality sits in a single place, only. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | | | refs: get rid of global list of hidden refsPatrick Steinhardt2022-11-175-31/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're about to add a new argument to git-rev-list(1) that allows it to add all references that are visible when taking `transfer.hideRefs` et al into account. This will require us to potentially parse multiple sets of hidden refs, which is not easily possible right now as there is only a single, global instance of the list of parsed hidden refs. Refactor `parse_hide_refs_config()` and `ref_is_hidden()` so that both take the list of hidden references as input and adjust callers to keep a local list, instead. This allows us to easily use multiple hidden-ref lists. Furthermore, it allows us to properly free this list before we exit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
| * | | | | | | | refs: fix memory leak when parsing hideRefs configPatrick Steinhardt2022-11-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When parsing the hideRefs configuration, we first duplicate the config value so that we can modify it. We then subsequently append it to the `hide_refs` string list, which is initialized with `strdup_strings` enabled. As a consequence we again reallocate the string, but never free the first duplicate and thus have a memory leak. While we never clean up the static `hide_refs` variable anyway, this is no excuse to make the leak worse by leaking every value twice. We are also about to change the way this variable will be handled so that we do indeed start to clean it up. So let's fix the memory leak by using the `string_list_append_nodup()` so that we pass ownership of the allocated string to `hide_refs`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* | | | | | | | | Merge branch 'ab/submodule-no-abspath' into jchTaylor Blau2022-11-182-9/+35
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove an absolute path in the "Migrating git directory" message. * ab/submodule-no-abspath: submodule--helper absorbgitdirs: no abspaths in "Migrating git..."
| * | | | | | | | | submodule--helper absorbgitdirs: no abspaths in "Migrating git..."Ævar Arnfjörð Bjarmason2022-11-092-9/+35
| | |_|_|/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the "Migrating git directory" messages to avoid emitting absolute paths. We could use "old_git_dir" and "new_gitdir.buf" here sometimes, but not in all the cases covered by these tests, i.e. sometimes the latter will be an absolute path with a different prefix. So let's just strip off the common prefix of the two strings, which handles the cases where we have nested submodules nicely. Note that this case is different than the one in get_submodule_displaypath() in "builtin/submodule--helper.c" handles, as we're dealing with the paths to the two ".git" here, not worktree paths. Before this change we had no tests at all for this "Migrating git directory" output. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>