Diffstat (limited to 'Documentation/user-manual.txt')
-rw-r--r-- | Documentation/user-manual.txt | 219 |
1 files changed, 219 insertions, 0 deletions
diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index dd1578dc8d..9e61798344 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -3160,6 +3160,225 @@ confusing and scary messages, but it won't actually do anything bad. In
 contrast, running "git prune" while somebody is actively changing the
 repository is a *BAD* idea).
 
+[[birdview-on-the-source-code]]
+A birdview on Git's source code
+-----------------------------
+
+While Git's source code is quite elegant, it is not always easy for
+new developers to find their way through it. A good idea is to look
+at the contents of the initial commit:
+_e83c5163316f89bfbde7d9ab23ca2e25604af290_ (also known as _v0.99~954_).
+
+Tip: you can see what files are in there with
+
+----------------------------------------------------
+$ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:
+----------------------------------------------------
+
+and look at those files with something like
+
+-----------------------------------------------------------
+$ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:cache.h
+-----------------------------------------------------------
+
+Be sure to read the README in that revision _after_ you are familiar with
+the terminology (<<glossary>>), since the terminology has changed a little
+since then. For example, we call the things "commits" now, which are
+described in that README as "changesets".
+
+Actually a lot of the structure as it is now can be explained by that
+initial commit.
+
+For example, we do not call it "cache" any more, but "index", however, the
+file is still called `cache.h`. Remark: Not much reason to change it now,
+especially since there is no good single name for it anyway, because it is
+basically _the_ header file which is included by _all_ of Git's C sources.
+
+If you grasp the ideas in that initial commit (it is really small and you
+can get into it really fast, and it will help you recognize things in the
+much larger code base we have now), you should go on skimming `cache.h`,
+`object.h` and `commit.h` in the current version.
+
+In the early days, Git (in the tradition of UNIX) was a bunch of programs
+which were extremely simple, and which you used in scripts, piping the
+output of one into another. This turned out to be good for initial
+development, since it was easier to test new things. However, recently
+many of these parts have become builtins, and some of the core has been
+"libified", i.e. put into libgit.a for performance, portability reasons,
+and to avoid code duplication.
+
+By now, you know what the index is (and find the corresponding data
+structures in `cache.h`), and that there are just a couple of object types
+(blobs, trees, commits and tags) which inherit their common structure from
+`struct object`, which is their first member (and thus, you can cast e.g.
+`(struct object *)commit` to achieve the _same_ as `&commit->object`, i.e.
+get at the object name and flags).
+
+Now is a good point to take a break to let this information sink in.
+
+Next step: get familiar with the object naming. Read <<naming-commits>>.
+There are quite a few ways to name an object (and not only revisions!).
+All of these are handled in `sha1_name.c`. Just have a quick look at
+the function `get_sha1()`. A lot of the special handling is done by
+functions like `get_sha1_basic()` or the likes.
+
+This is just to get you into the groove for the most libified part of Git:
+the revision walker.
+
+Basically, the initial version of `git log` was a shell script:
+
+----------------------------------------------------------------
+$ git-rev-list --pretty $(git-rev-parse --default HEAD "$@") | \
+	LESS=-S ${PAGER:-less}
+----------------------------------------------------------------
+
+What does this mean?
+
+`git-rev-list` is the original version of the revision walker, which
+_always_ printed a list of revisions to stdout. It is still functional,
+and needs to, since most new Git programs start out as scripts using
+`git-rev-list`.
+
+`git-rev-parse` is not as important any more; it was only used to filter out
+options that were relevant for the different plumbing commands that were
+called by the script.
+
+Most of what `git-rev-list` did is contained in `revision.c` and
+`revision.h`. It wraps the options in a struct named `rev_info`, which
+controls how and what revisions are walked, and more.
+
+The original job of `git-rev-parse` is now taken by the function
+`setup_revisions()`, which parses the revisions and the common command line
+options for the revision walker. This information is stored in the struct
+`rev_info` for later consumption. You can do your own command line option
+parsing after calling `setup_revisions()`. After that, you have to call
+`prepare_revision_walk()` for initialization, and then you can get the
+commits one by one with the function `get_revision()`.
+
+If you are interested in more details of the revision walking process,
+just have a look at the first implementation of `cmd_log()`; call
+`git-show v1.3.0~155^2~4` and scroll down to that function (note that you
+no longer need to call `setup_pager()` directly).
+
+Nowadays, `git log` is a builtin, which means that it is _contained_ in the
+command `git`. The source side of a builtin is
+
+- a function called `cmd_<bla>`, typically defined in `builtin-<bla>.c`,
+  and declared in `builtin.h`,
+
+- an entry in the `commands[]` array in `git.c`, and
+
+- an entry in `BUILTIN_OBJECTS` in the `Makefile`.
+
+Sometimes, more than one builtin is contained in one source file. For
+example, `cmd_whatchanged()` and `cmd_log()` both reside in `builtin-log.c`,
+since they share quite a bit of code. In that case, the commands which are
+_not_ named like the `.c` file in which they live have to be listed in
+`BUILT_INS` in the `Makefile`.
+
+`git log` looks more complicated in C than it does in the original script,
+but that allows for a much greater flexibility and performance.
+
+Here again it is a good point to take a pause.
+
+Lesson three is: study the code. Really, it is the best way to learn about
+the organization of Git (after you know the basic concepts).
+
+So, think about something which you are interested in, say, "how can I
+access a blob just knowing the object name of it?". The first step is to
+find a Git command with which you can do it. In this example, it is either
+`git show` or `git cat-file`.
+
+For the sake of clarity, let's stay with `git cat-file`, because it
+
+- is plumbing, and
+
+- was around even in the initial commit (it literally went only through
+  some 20 revisions as `cat-file.c`, was renamed to `builtin-cat-file.c`
+  when made a builtin, and then saw less than 10 versions).
+
+So, look into `builtin-cat-file.c`, search for `cmd_cat_file()` and look what
+it does.
+
+------------------------------------------------------------------
+	git_config(git_default_config);
+	if (argc != 3)
+		usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
+	if (get_sha1(argv[2], sha1))
+		die("Not a valid object name %s", argv[2]);
+------------------------------------------------------------------
+
+Let's skip over the obvious details; the only really interesting part
+here is the call to `get_sha1()`. It tries to interpret `argv[2]` as an
+object name, and if it refers to an object which is present in the current
+repository, it writes the resulting SHA-1 into the variable `sha1`.
+
+Two things are interesting here:
+
+- `get_sha1()` returns 0 on _success_. This might surprise some new
+  Git hackers, but there is a long tradition in UNIX to return different
+  negative numbers in case of different errors -- and 0 on success.
+
+- the variable `sha1` in the function signature of `get_sha1()` is `unsigned
+  char *`, but is actually expected to be a pointer to `unsigned
+  char[20]`. This variable will contain the 160-bit SHA-1 of the given
+  commit. Note that whenever a SHA-1 is passed as "unsigned char *", it
+  is the binary representation, as opposed to the ASCII representation in
+  hex characters, which is passed as "char *".
+
+You will see both of these things throughout the code.
+
+Now, for the meat:
+
+-----------------------------------------------------------------------------
+	case 0:
+		buf = read_object_with_reference(sha1, argv[1], &size, NULL);
+-----------------------------------------------------------------------------
+
+This is how you read a blob (actually, not only a blob, but any type of
+object). To know how the function `read_object_with_reference()` actually
+works, find the source code for it (something like `git grep
+read_object_with | grep ":[a-z]"` in the git repository), and read
+the source.
+
+To find out how the result can be used, just read on in `cmd_cat_file()`:
+
+-----------------------------------
+	write_or_die(1, buf, size);
+-----------------------------------
+
+Sometimes, you do not know where to look for a feature. In many such cases,
+it helps to search through the output of `git log`, and then `git show` the
+corresponding commit.
+
+Example: If you know that there was some test case for `git bundle`, but
+do not remember where it was (yes, you _could_ `git grep bundle t/`, but that
+does not illustrate the point!):
+
+------------------------
+$ git log --no-merges t/
+------------------------
+
+In the pager (`less`), just search for "bundle", go a few lines back,
+and see that it is in commit 18449ab0... Now just copy this object name,
+and paste it into the command line
+
+-------------------
+$ git show 18449ab0
+-------------------
+
+Voila.
+
+Another example: Find out what to do in order to make some script a
+builtin:
+
+-------------------------------------------------
+$ git log --no-merges --diff-filter=A builtin-*.c
+-------------------------------------------------
+
+You see, Git is actually the best tool to find out about the source of Git
+itself!
+
 [[glossary]]
 include::glossary.txt[]
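Reader's sketch, not part of the patch: the added text says that every object type has `struct object` as its first member, so that `(struct object *)commit` gives the same pointer as `&commit->object`. The snippet below illustrates that idea with simplified stand-in types. The struct layouts and the helper `commit_as_object()` are hypothetical and far smaller than Git's real definitions; only the first-member technique itself is taken from the text.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Hypothetical, simplified stand-in for the common object header. */
struct object {
	unsigned flags;
	unsigned char sha1[20];   /* binary object name, not hex */
};

/* Hypothetical commit type: the common header must come first. */
struct commit {
	struct object object;
	unsigned long date;
};

/* Compile-time check that the cast below is layout-safe. */
static_assert(offsetof(struct commit, object) == 0,
	      "struct object must be the first member");

/*
 * View a commit through its generic object header.  C guarantees that a
 * pointer to a struct, suitably converted, points to its first member,
 * so this yields the same address as &c->object.
 */
struct object *commit_as_object(struct commit *c)
{
	return (struct object *)c;
}
```

Given a `struct commit c`, the expressions `commit_as_object(&c)` and `&c.object` compare equal, which is why generic code can read the flags and object name without knowing the concrete type.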