author | Shawn O. Pearce <spearce@spearce.org> | 2007-02-06 14:58:30 -0500 |
---|---|---|
committer | Shawn O. Pearce <spearce@spearce.org> | 2007-02-06 14:58:30 -0500 |
commit | 63e0c8b364e334fc7cc975edf1f16fb4c89594b3 | |
tree | 85f4ed7849cf2799bb1dbcd0b696415f4b748d6a | |
parent | ef94edb53c9a5fd1e5fca9f548adc713d3d8ffe1 | |
Support RFC 2822 date parsing in fast-import.

Since some frontends may be working with source material where
the dates are only readily available as RFC 2822 strings, it is
more friendly if fast-import exposes Git's parse_date() function
to handle the conversion. This way the frontend doesn't need
to perform the parsing itself.

The new --date-format option to fast-import can be used by a
frontend to select which format it will supply date strings in.
The default is the standard `raw` Git format, which fast-import
has always supported. Format rfc2822 can be used to activate the
parse_date() function instead.
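To make the `rfc2822` option concrete, here is a minimal C sketch of the conversion a frontend no longer has to perform itself: turning a date string into the `raw` form `<time> SP <tz>`. This is not Git's parse_date(). The names `example_date_to_raw` and `days_from_civil` are hypothetical, and the sketch assumes exactly the layout used in this commit's test (`Tue Feb 6 11:22:18 2007 -0500`), which is far stricter than the real parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Days from 1970-01-01 to the given Gregorian date (may be negative). */
static long days_from_civil(long y, int m, int d)
{
	long era, yoe, doy, doe;

	y -= m <= 2;	/* count Jan and Feb as the end of the previous year */
	era = (y >= 0 ? y : y - 399) / 400;
	yoe = y - era * 400;
	doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
	doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
	return era * 146097 + doe - 719468;
}

/*
 * Hypothetical helper, NOT Git's parse_date(): convert a date laid out
 * exactly like "Tue Feb 6 11:22:18 2007 -0500" into "<time> SP <tz>".
 * Returns 0 on success, -1 on any mismatch or if out is too small.
 */
static int example_date_to_raw(const char *src, char *out, size_t outlen)
{
	static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
	char wday[4], mon[4], sign, extra;
	int day, hh, mm, ss, year, tz, tz_secs, n;
	const char *p;
	long long when;

	assert(src && out);
	/* The trailing %c only matches if junk follows the offset. */
	if (sscanf(src, "%3s %3s %d %d:%d:%d %d %c%4d%c", wday, mon, &day,
		   &hh, &mm, &ss, &year, &sign, &tz, &extra) != 9)
		return -1;

	/* The month must be one of the twelve 3-letter names. */
	p = strlen(mon) == 3 ? strstr(months, mon) : NULL;
	if (!p || (p - months) % 3)
		return -1;
	if ((sign != '+' && sign != '-') || tz < 0 || tz % 100 > 59 ||
	    day < 1 || day > 31 || hh < 0 || hh > 23 ||
	    mm < 0 || mm > 59 || ss < 0 || ss > 60 || year < 1970)
		return -1;

	/* Local wall-clock seconds, then shift to UTC by the offset. */
	when = days_from_civil(year, (int)(p - months) / 3 + 1, day) * 86400LL
		+ hh * 3600 + mm * 60 + ss;
	tz_secs = (tz / 100) * 3600 + (tz % 100) * 60;
	when += sign == '-' ? tz_secs : -tz_secs;
	if (when < 0)
		return -1;

	n = snprintf(out, outlen, "%lld %c%04d", when, sign, tz);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For the author date in the new test, `Tue Feb 6 11:22:18 2007 -0500`, this produces `1170778938 -0500`, the value the test expects to find in the resulting commit.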

Because fast-import could also be useful for creating new, current
commits, the format `now` is also supported to generate the current
system timestamp. The implementation of `now` is a trivial call
to datestamp(), but is actually a whole whopping 3 lines so that
fast-import can verify the frontend really meant `now`.
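As a rough illustration of the `now` behaviour described above, the following C sketch first checks that the supplied value really is the literal `now` and only then produces a `<time> SP <tz>` string for the current moment. It is an approximation built on standard C library calls, not Git's datestamp(); `example_now_when` is a hypothetical name, and falling back to `+0000` when the local offset cannot be determined is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical stand-in for the `now` format; NOT Git's datestamp().
 * Requires when to be the literal "now", then writes "<time> SP <tz>"
 * for the current system time into out.
 * Returns 0 on success, -1 on a bad literal or a too-small buffer.
 */
static int example_now_when(const char *when, char *out, size_t outlen)
{
	time_t now;
	struct tm *local;
	char tz[8];
	int n;

	assert(when && out);
	/* Make sure the frontend really meant `now`. */
	if (strcmp(when, "now"))
		return -1;

	now = time(NULL);
	local = localtime(&now);
	/* %z yields the numeric UTC offset, e.g. "-0500"; else assume UTC. */
	if (!local || strftime(tz, sizeof tz, "%z", local) != 5)
		strcpy(tz, "+0000");

	n = snprintf(out, outlen, "%lld %s", (long long)now, tz);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Calling the helper twice can return two different timestamps, which is the same reason separate `author` and `committer` commands may end up with mismatched times under `now`.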

As part of this change I have added validation of the `raw` date
format. Prior to this change fast-import would accept anything
in a `committer` command, even if it was seriously malformed.
Now fast-import requires the '> ' near the end of the string and
verifies the timestamp is formatted properly.
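The kind of strict check described here can be sketched in a few lines of C. This is an illustrative checker, not the validate_raw_date() added by this commit: `example_check_raw_when` is a hypothetical name, and requiring exactly four offset digits is an assumption taken from the documented `-0500` style rather than from the patch. As far as can be told from the diff, the committed function compares `endp` with `src` rather than `src + 1` after the sign, so an offset consisting of a sign with no digits appears to slip through; the sketch rejects that case.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Hypothetical strict checker for "<time> SP <tz>", written for
 * illustration; it is NOT the validate_raw_date() from this commit.
 * Returns 0 if src is well formed, -1 otherwise.
 */
static int example_check_raw_when(const char *src)
{
	const char *p = src;
	int i;

	assert(src);
	/* <time>: one or more ASCII decimal digits */
	if (!isdigit((unsigned char)*p))
		return -1;
	while (isdigit((unsigned char)*p))
		p++;

	/* exactly one space, then the sign of the offset */
	if (*p++ != ' ')
		return -1;
	if (*p != '+' && *p != '-')
		return -1;
	p++;

	/* <tz>: exactly four digits (hhmm), then end of string */
	for (i = 0; i < 4; i++, p++)
		if (!isdigit((unsigned char)*p))
			return -1;
	return *p ? -1 : 0;
}
```

With this checker `1170778938 -0500` passes, while an RFC 2822 string, a missing sign, a bare sign, or trailing characters are all rejected.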

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
-rw-r--r-- | Documentation/git-fast-import.txt | 95 |
-rw-r--r-- | fast-import.c | 107 |
-rwxr-xr-x | t/t9300-fast-import.sh | 36 |
3 files changed, 214 insertions, 24 deletions
diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 6fc78bff3e..08450de9ac 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -32,6 +32,12 @@ the frontend program in use.
 
 OPTIONS
 -------
+--date-format=<fmt>::
+	Specify the type of dates the frontend will supply to
+	gfi within `author`, `committer` and `tagger` commands.
+	See ``Date Formats'' below for details about which formats
+	are supported, and their syntax.
+
 --max-pack-size=<n>::
 	Maximum size of each output packfile, expressed in MiB.
 	The default is 4096 (4 GiB) as that is the maximum allowed
@@ -53,7 +59,6 @@ OPTIONS
 	Frontends can use this file to validate imports after they
 	have been completed.
 
-
 Performance
 -----------
 The design of gfi allows it to import large projects in a minimum
@@ -127,6 +132,78 @@ results, such as branch names or file names with leading or trailing
 spaces in their name, or early termination of gfi when it encounters
 unexpected input.
 
+Date Formats
+~~~~~~~~~~~~
+The following date formats are supported. A frontend should select
+the format it will use for this import by passing the format name
+in the `--date-format=<fmt>` command line option.
+
+`raw`::
+	This is the Git native format and is `<time> SP <tz>`.
+	It is also gfi's default format, if `--date-format` was
+	not specified.
++
+The time of the event is specified by `<time>` as the number of
+seconds since the UNIX epoch (midnight, Jan 1, 1970, UTC) and is
+written as an ASCII decimal integer.
++
+The timezone is specified by `<tz>` as a positive or negative offset
+from UTC. For example EST (which is typically 5 hours behind GMT)
+would be expressed in `<tz>` by ``-0500'' while GMT is ``+0000''.
++
+If the timezone is not available in the source material, use
+``+0000'', or the most common local timezone. For example many
+organizations have a CVS repository which has only ever been accessed
+by users who are located in the same location and timezone. In this
+case the user's timezone can be easily assumed.
++
+Unlike the `rfc2822` format, this format is very strict. Any
+variation in formatting will cause gfi to reject the value.
+
+`rfc2822`::
+	This is the standard email format as described by RFC 2822.
++
+An example value is ``Tue Feb 6 11:22:18 2007 -0500''. The Git
+parser is accurate, but a little on the lenient side. Its the
+same parser used by gitlink:git-am[1] when applying patches
+received from email.
++
+Some malformed strings may be accepted as valid dates. In some of
+these cases Git will still be able to obtain the correct date from
+the malformed string. There are also some types of malformed
+strings which Git will parse wrong, and yet consider valid.
+Seriously malformed strings will be rejected.
++
+If the source material is formatted in RFC 2822 style dates,
+the frontend should let gfi handle the parsing and conversion
+(rather than attempting to do it itself) as the Git parser has
+been well tested in the wild.
++
+Frontends should prefer the `raw` format if the source material
+is already in UNIX-epoch format, or is easily convertible to
+that format, as there is no ambiguity in parsing.
+
+`now`::
+	Always use the current time and timezone. The literal
+	`now` must always be supplied for `<when>`.
++
+This is a toy format. The current time and timezone of this system
+is always copied into the identity string at the time it is being
+created by gfi. There is no way to specify a different time or
+timezone.
++
+This particular format is supplied as its short to implement and
+may be useful to a process that wants to create a new commit
+right now, without needing to use a working directory or
+gitlink:git-update-index[1].
++
+If separate `author` and `committer` commands are used in a `commit`
+the timestamps may not match, as the system clock will be polled
+twice (once for each command). The only way to ensure that both
+author and committer identity information has the same timestamp
+is to omit `author` (thus copying from `committer`) or to use a
+date format other than `now`.
+
 Commands
 ~~~~~~~~
 gfi accepts several commands to update the current repository
@@ -168,8 +245,8 @@ change to the project.
 ....
 	'commit' SP <ref> LF
 	mark?
-	('author' SP <name> SP LT <email> GT SP <time> SP <tz> LF)?
-	'committer' SP <name> SP LT <email> GT SP <time> SP <tz> LF
+	('author' SP <name> SP LT <email> GT SP <when> LF)?
+	'committer' SP <name> SP LT <email> GT SP <when> LF
 	data
 	('from' SP <committish> LF)?
 	('merge' SP <committish> LF)?
@@ -222,12 +299,10 @@ the email address from the other fields in the line. Note that
 `<name>` is free-form and may contain any sequence of bytes, except
 `LT` and `LF`. It is typically UTF-8 encoded.
 
-The time of the change is specified by `<time>` as the number of
-seconds since the UNIX epoc (midnight, Jan 1, 1970, UTC) and is
-written as an ASCII decimal integer. The committer's
-timezone is specified by `<tz>` as a positive or negative offset
-from UTC. For example EST (which is typically 5 hours behind GMT)
-would be expressed in `<tz>` by ``-0500'' while GMT is ``+0000''.
+The time of the change is specified by `<when>` using the date format
+that was selected by the `--date-format=<fmt>` command line option.
+See ``Date Formats'' above for the set of supported formats, and
+their syntax.
 
 `from`
 ^^^^^^
@@ -394,7 +469,7 @@ lightweight (non-annotated) tags see the `reset` command below.
 ....
 	'tag' SP <name> LF
 	'from' SP <committish> LF
-	'tagger' SP <name> SP LT <email> GT SP <time> SP <tz> LF
+	'tagger' SP <name> SP LT <email> GT SP <when> LF
 	data
 	LF
 ....
diff --git a/fast-import.c b/fast-import.c
index 4dcba416e0..ee4777fdaf 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -17,8 +17,8 @@ Format of STDIN stream:
 
   new_commit ::= 'commit' sp ref_str lf
     mark?
-    ('author' sp name '<' email '>' ts tz lf)?
-    'committer' sp name '<' email '>' ts tz lf
+    ('author' sp name '<' email '>' when lf)?
+    'committer' sp name '<' email '>' when lf
     commit_msg
     ('from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)?
     ('merge' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)*
@@ -34,7 +34,7 @@ Format of STDIN stream:
 
   new_tag ::= 'tag' sp tag_str lf
     'from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf
-    'tagger' sp name '<' email '>' ts tz lf
+    'tagger' sp name '<' email '>' when lf
     tag_msg;
   tag_msg ::= data;
 
@@ -88,6 +88,10 @@ Format of STDIN stream:
   bigint ::= # unsigned integer value, ascii base10 notation;
   binary_data ::= # file content, not interpreted;
 
+  when ::= raw_when | rfc2822_when;
+  raw_when ::= ts sp tz;
+  rfc2822_when ::= # Valid RFC 2822 date and time;
+
   sp ::= # ASCII space character;
   lf ::= # ASCII newline (LF) character;
 
@@ -234,6 +238,12 @@ struct hash_list
 	unsigned char sha1[20];
 };
 
+typedef enum {
+	WHENSPEC_RAW = 1,
+	WHENSPEC_RFC2822,
+	WHENSPEC_NOW,
+} whenspec_type;
+
 /* Configured limits on output */
 static unsigned long max_depth = 10;
 static unsigned long max_packsize = (1LL << 32) - 1;
@@ -294,6 +304,7 @@ static struct tag *first_tag;
 static struct tag *last_tag;
 
 /* Input stream parsing */
+static whenspec_type whenspec = WHENSPEC_RAW;
 static struct strbuf command_buf;
 static uintmax_t next_mark;
 static struct dbuf new_data;
@@ -1396,6 +1407,64 @@ static void *cmd_data (size_t *size)
 	return buffer;
 }
 
+static int validate_raw_date(const char *src, char *result, int maxlen)
+{
+	const char *orig_src = src;
+	char *endp, sign;
+
+	strtoul(src, &endp, 10);
+	if (endp == src || *endp != ' ')
+		return -1;
+
+	src = endp + 1;
+	if (*src != '-' && *src != '+')
+		return -1;
+	sign = *src;
+
+	strtoul(src + 1, &endp, 10);
+	if (endp == src || *endp || (endp - orig_src) >= maxlen)
+		return -1;
+
+	strcpy(result, orig_src);
+	return 0;
+}
+
+static char *parse_ident(const char *buf)
+{
+	const char *gt;
+	size_t name_len;
+	char *ident;
+
+	gt = strrchr(buf, '>');
+	if (!gt)
+		die("Missing > in ident string: %s", buf);
+	gt++;
+	if (*gt != ' ')
+		die("Missing space after > in ident string: %s", buf);
+	gt++;
+	name_len = gt - buf;
+	ident = xmalloc(name_len + 24);
+	strncpy(ident, buf, name_len);
+
+	switch (whenspec) {
+	case WHENSPEC_RAW:
+		if (validate_raw_date(gt, ident + name_len, 24) < 0)
+			die("Invalid raw date \"%s\" in ident: %s", gt, buf);
+		break;
+	case WHENSPEC_RFC2822:
+		if (parse_date(gt, ident + name_len, 24) < 0)
+			die("Invalid rfc2822 date \"%s\" in ident: %s", gt, buf);
+		break;
+	case WHENSPEC_NOW:
+		if (strcmp("now", gt))
+			die("Date in ident must be 'now': %s", buf);
+		datestamp(ident + name_len, 24);
+		break;
+	}
+
+	return ident;
+}
+
 static void cmd_new_blob(void)
 {
 	size_t l;
@@ -1655,11 +1724,11 @@ static void cmd_new_commit(void)
 	read_next_command();
 	cmd_mark();
 	if (!strncmp("author ", command_buf.buf, 7)) {
-		author = strdup(command_buf.buf);
+		author = parse_ident(command_buf.buf + 7);
 		read_next_command();
 	}
 	if (!strncmp("committer ", command_buf.buf, 10)) {
-		committer = strdup(command_buf.buf);
+		committer = parse_ident(command_buf.buf + 10);
 		read_next_command();
 	}
 	if (!committer)
@@ -1692,7 +1761,7 @@ static void cmd_new_commit(void)
 	store_tree(&b->branch_tree);
 	hashcpy(b->branch_tree.versions[0].sha1,
 		b->branch_tree.versions[1].sha1);
-	size_dbuf(&new_data, 97 + msglen
+	size_dbuf(&new_data, 114 + msglen
 		+ merge_count * 49
 		+ (author
 			? strlen(author) + strlen(committer)
@@ -1708,11 +1777,9 @@ static void cmd_new_commit(void)
 		free(merge_list);
 		merge_list = next;
 	}
-	if (author)
-		sp += sprintf(sp, "%s\n", author);
-	else
-		sp += sprintf(sp, "author %s\n", committer + 10);
-	sp += sprintf(sp, "%s\n\n", committer);
+	sp += sprintf(sp, "author %s\n", author ? author : committer);
+	sp += sprintf(sp, "committer %s\n", committer);
+	*sp++ = '\n';
 	memcpy(sp, msg, msglen);
 	sp += msglen;
 	free(author);
@@ -1780,7 +1847,7 @@ static void cmd_new_tag(void)
 	/* tagger ... */
 	if (strncmp("tagger ", command_buf.buf, 7))
 		die("Expected tagger command, got %s", command_buf.buf);
-	tagger = strdup(command_buf.buf);
+	tagger = parse_ident(command_buf.buf + 7);
 
 	/* tag payload/message */
 	read_next_command();
@@ -1792,7 +1859,8 @@ static void cmd_new_tag(void)
 	sp += sprintf(sp, "object %s\n", sha1_to_hex(sha1));
 	sp += sprintf(sp, "type %s\n", type_names[OBJ_COMMIT]);
 	sp += sprintf(sp, "tag %s\n", t->name);
-	sp += sprintf(sp, "%s\n\n", tagger);
+	sp += sprintf(sp, "tagger %s\n", tagger);
+	*sp++ = '\n';
 	memcpy(sp, msg, msglen);
 	sp += msglen;
 	free(tagger);
@@ -1835,7 +1903,7 @@ static void cmd_checkpoint(void)
 }
 
 static const char fast_import_usage[] =
-"git-fast-import [--depth=n] [--active-branches=n] [--export-marks=marks.file] [--branch-log=log]";
+"git-fast-import [--date-format=f] [--max-pack-size=n] [--depth=n] [--active-branches=n] [--export-marks=marks.file]";
 
 int main(int argc, const char **argv)
 {
@@ -1849,6 +1917,17 @@ int main(int argc, const char **argv)
 
 		if (*a != '-' || !strcmp(a, "--"))
 			break;
+		else if (!strncmp(a, "--date-format=", 14)) {
+			const char *fmt = a + 14;
+			if (!strcmp(fmt, "raw"))
+				whenspec = WHENSPEC_RAW;
+			else if (!strcmp(fmt, "rfc2822"))
+				whenspec = WHENSPEC_RFC2822;
+			else if (!strcmp(fmt, "now"))
+				whenspec = WHENSPEC_NOW;
+			else
+				die("unknown --date-format argument %s", fmt);
+		}
 		else if (!strncmp(a, "--max-pack-size=", 16))
 			max_packsize = strtoumax(a + 16, NULL, 0) * 1024 * 1024;
 		else if (!strncmp(a, "--depth=", 8))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index a5cc846b34..84b3c12a50 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -240,4 +240,40 @@ test_expect_success \
 	'git-cat-file blob branch:newdir/exec.sh >actual &&
 	 diff -u expect actual'
 
+###
+### series E
+###
+
+cat >input <<INPUT_END
+commit refs/heads/branch
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> Tue Feb 6 11:22:18 2007 -0500
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> Tue Feb 6 12:35:02 2007 -0500
+data <<COMMIT
+RFC 2822 type date
+COMMIT
+
+from refs/heads/branch^0
+
+INPUT_END
+test_expect_failure \
+	'E: rfc2822 date, --date-format=raw' \
+	'git-fast-import --date-format=raw <input'
+test_expect_success \
+	'E: rfc2822 date, --date-format=rfc2822' \
+	'git-fast-import --date-format=rfc2822 <input'
+test_expect_success \
+	'E: verify pack' \
+	'for p in .git/objects/pack/*.pack;do git-verify-pack $p||exit;done'
+
+cat >expect <<EOF
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 1170778938 -0500
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1170783302 -0500
+
+RFC 2822 type date
+EOF
+test_expect_success \
+	'E: verify commit' \
+	'git-cat-file commit branch | sed 1,2d >actual &&
+	 diff -u expect actual'
+
 test_done