From 078b895fefdca94995862a4cc8644198b00a89bf Mon Sep 17 00:00:00 2001 From: Ivan Todoroski Date: Mon, 2 Apr 2012 17:13:48 +0200 Subject: fetch-pack: new --stdin option to read refs from stdin If a remote repo has too many tags (or branches), cloning it over the smart HTTP transport can fail because remote-curl.c puts all the refs from the remote repo on the fetch-pack command line. This can make the command line longer than the global OS command line limit, causing fetch-pack to fail. This is especially a problem on Windows where the command line limit is orders of magnitude shorter than Linux. There are already real repos out there that msysGit cannot clone over smart HTTP due to this problem. Here is an easy way to trigger this problem: git init too-many-refs cd too-many-refs echo bla > bla.txt git add . git commit -m test sha=$(git rev-parse HEAD) tag=$(perl -e 'print "bla" x 30') for i in `seq 50000`; do echo $sha refs/tags/$tag-$i >> .git/packed-refs done Then share this repo over the smart HTTP protocol and try cloning it: $ git clone http://localhost/.../too-many-refs/.git Cloning into 'too-many-refs'... fatal: cannot exec 'fetch-pack': Argument list too long 50k tags is obviously an absurd number, but it is required to demonstrate the problem on Linux because it has a much more generous command line limit. On Windows the clone fails with as little as 500 tags in the above loop, which is getting uncomfortably close to the number of tags you might see in real long lived repos. This is not just theoretical, msysGit is already failing to clone our company repo due to this. It's a large repo converted from CVS, nearly 10 years of history. Four possible solutions were discussed on the Git mailing list (in no particular order): 1) Call fetch-pack multiple times with smaller batches of refs. This was dismissed as inefficient and inelegant. 2) Add option --refs-fd=$n to pass a an fd from where to read the refs. This was rejected because inheriting descriptors other than stdin/stdout/stderr through exec() is apparently problematic on Windows, plus it would require changes to the run-command API to open extra pipes. 3) Add option --refs-from=$tmpfile to pass the refs using a temp file. This was not favored because of the temp file requirement. 4) Add option --stdin to pass the refs on stdin, one per line. In the end this option was chosen as the most efficient and most desirable from scripting perspective. There was however a small complication when using stdin to pass refs to fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for communication with the remote server. If we are going to sneak refs on stdin line by line, it would have to be done very carefully in the presence of --stateless-rpc, because when reading refs line by line we might read ahead too much data into our buffer and eat some of the remote protocol data which is also coming on stdin. One way to solve this would be to refactor get_remote_heads() in fetch-pack.c to accept a residual buffer from our stdin line parsing above, but this function is used in several places so other callers would be burdened by this residual buffer interface even when most of them don't need it. In the end we settled on the following solution: If --stdin is specified without --stateless-rpc, fetch-pack would read the refs from stdin one per line, in a script friendly format. However if --stdin is specified together with --stateless-rpc, fetch-pack would read the refs from stdin in packetized format (pkt-line) with a flush packet terminating the list of refs. This way we can read the exact number of bytes that we need from stdin, and then get_remote_heads() can continue reading from the same fd without losing a single byte of remote protocol data. This way the --stdin option only loses generality and scriptability when used together with --stateless-rpc, which is not easily scriptable anyway because it also uses pkt-line when talking to the remote server. Signed-off-by: Ivan Todoroski Signed-off-by: Junio C Hamano --- fetch-pack.h | 1 + 1 file changed, 1 insertion(+) (limited to 'fetch-pack.h') diff --git a/fetch-pack.h b/fetch-pack.h index 0608edae3f..7c2069c972 100644 --- a/fetch-pack.h +++ b/fetch-pack.h @@ -10,6 +10,7 @@ struct fetch_pack_args { lock_pack:1, use_thin_pack:1, fetch_all:1, + stdin_refs:1, verbose:1, no_progress:1, include_tag:1, -- cgit v1.2.1 From 63c694534bbe93a188ed74396a1a3ace0b24d6ad Mon Sep 17 00:00:00 2001 From: Michael Haggerty Date: Sun, 9 Sep 2012 08:19:39 +0200 Subject: fetch_pack(): reindent function decl and defn Signed-off-by: Michael Haggerty Signed-off-by: Junio C Hamano --- fetch-pack.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'fetch-pack.h') diff --git a/fetch-pack.h b/fetch-pack.h index 7c2069c972..1dbe90fa43 100644 --- a/fetch-pack.h +++ b/fetch-pack.h @@ -18,11 +18,11 @@ struct fetch_pack_args { }; struct ref *fetch_pack(struct fetch_pack_args *args, - int fd[], struct child_process *conn, - const struct ref *ref, - const char *dest, - int nr_heads, - char **heads, - char **pack_lockfile); + int fd[], struct child_process *conn, + const struct ref *ref, + const char *dest, + int nr_heads, + char **heads, + char **pack_lockfile); #endif -- cgit v1.2.1 From 8bee93dd24731a1d2ef7c82d484a893cf68b6572 Mon Sep 17 00:00:00 2001 From: Michael Haggerty Date: Sun, 9 Sep 2012 08:19:40 +0200 Subject: Change fetch_pack() and friends to take string_list arguments Instead of juggling (sometimes called ), pass around the list of references to be sought in a single string_list variable called "sought". Future commits will make more use of string_list functionality. Signed-off-by: Michael Haggerty Signed-off-by: Junio C Hamano --- fetch-pack.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'fetch-pack.h') diff --git a/fetch-pack.h b/fetch-pack.h index 1dbe90fa43..a6a8a73311 100644 --- a/fetch-pack.h +++ b/fetch-pack.h @@ -1,6 +1,8 @@ #ifndef FETCH_PACK_H #define FETCH_PACK_H +#include "string-list.h" + struct fetch_pack_args { const char *uploadpack; int unpacklimit; @@ -21,8 +23,7 @@ struct ref *fetch_pack(struct fetch_pack_args *args, int fd[], struct child_process *conn, const struct ref *ref, const char *dest, - int nr_heads, - char **heads, + struct string_list *sought, char **pack_lockfile); #endif -- cgit v1.2.1 From 4ba159996f6c1b0d6dd0a2a8bd9d6f5b342a4aa5 Mon Sep 17 00:00:00 2001 From: Michael Haggerty Date: Sun, 9 Sep 2012 08:19:43 +0200 Subject: filter_refs(): delete matched refs from sought list Remove any references that are available from the remote from the sought list (rather than overwriting their names with NUL characters, as previously). Mark matching entries by writing a non-NULL pointer to string_list_item::util during the iteration, then use filter_string_list() later to filter out the entries that have been marked. Document this aspect of fetch_pack() in a comment in the header file. (More documentation is obviously still needed.) Signed-off-by: Michael Haggerty Signed-off-by: Junio C Hamano --- fetch-pack.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'fetch-pack.h') diff --git a/fetch-pack.h b/fetch-pack.h index a6a8a73311..cb148719bf 100644 --- a/fetch-pack.h +++ b/fetch-pack.h @@ -19,6 +19,13 @@ struct fetch_pack_args { stateless_rpc:1; }; +/* + * sought contains the full names of remote references that should be + * updated from. On return, the names that were found on the remote + * will have been removed from the list. The util members of the + * string_list_items are used internally; they must be NULL on entry + * (and will be NULL on exit). + */ struct ref *fetch_pack(struct fetch_pack_args *args, int fd[], struct child_process *conn, const struct ref *ref, -- cgit v1.2.1 From f2db854d24f32de7b03dde5a7d7134f5e31100b9 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Tue, 29 Jan 2013 14:02:15 -0800 Subject: fetch: use struct ref to represent refs to be fetched Even though "git fetch" has full infrastructure to parse refspecs to be fetched and match them against the list of refs to come up with the final list of refs to be fetched, the list of refs that are requested to be fetched were internally converted to a plain list of strings at the transport layer and then passed to the underlying fetch-pack driver. Stop this conversion and instead pass around an array of refs. Signed-off-by: Junio C Hamano --- fetch-pack.h | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) (limited to 'fetch-pack.h') diff --git a/fetch-pack.h b/fetch-pack.h index cb148719bf..dc5266c970 100644 --- a/fetch-pack.h +++ b/fetch-pack.h @@ -20,17 +20,16 @@ struct fetch_pack_args { }; /* - * sought contains the full names of remote references that should be - * updated from. On return, the names that were found on the remote - * will have been removed from the list. The util members of the - * string_list_items are used internally; they must be NULL on entry - * (and will be NULL on exit). + * sought represents remote references that should be updated from. + * On return, the names that were found on the remote will have been + * marked as such. */ struct ref *fetch_pack(struct fetch_pack_args *args, int fd[], struct child_process *conn, const struct ref *ref, const char *dest, - struct string_list *sought, + struct ref **sought, + int nr_sought, char **pack_lockfile); #endif -- cgit v1.2.1 From c6807a40dcd29f7e5ad1e2f4fc44f1729c9afa11 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?= Date: Sun, 26 May 2013 08:16:17 +0700 Subject: clone: open a shortcut for connectivity check MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- fetch-pack.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'fetch-pack.h') diff --git a/fetch-pack.h b/fetch-pack.h index dc5266c970..40f08bab24 100644 --- a/fetch-pack.h +++ b/fetch-pack.h @@ -16,7 +16,9 @@ struct fetch_pack_args { verbose:1, no_progress:1, include_tag:1, - stateless_rpc:1; + stateless_rpc:1, + check_self_contained_and_connected:1, + self_contained_and_connected:1; }; /* -- cgit v1.2.1