====
TODO
====

General
-------
* Classes requiring repo actually only need the git command - this should be 
  changed to limit their access level and make things a little safer.
* Check for correct usage of id, ref and hexsha and define their meanings. 
  Currently it's not clear what id may be in each case - afaik it's usually 
  a sha or ref, unless cat-file is used, where it must be a sha.
* Overhaul command caching - currently it's possible to create many instances of 
  the std-in command types, as it appears they are not killed when the repo gets 
  deleted. A clear() method could already help by allowing long-running programs
  to remove cached commands after an idle time.
* References should be parsed 'manually' to get around command invocation, but 
  be sure to be able to read packed refs.
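
As a rough sketch of what 'manual' reference parsing could look like, the 
snippet below reads loose ref files first and falls back to the packed-refs 
file. The function names read_packed_refs and resolve_ref are hypothetical and 
not part of the existing code base; the sketch assumes the usual packed-refs 
layout of "<hexsha> <refname>" lines plus '#' header and '^' peeled-tag lines.

```python
import os


def read_packed_refs(git_dir):
    """Return {refname: hexsha} parsed from <git_dir>/packed-refs (hypothetical helper)."""
    refs = {}
    path = os.path.join(git_dir, "packed-refs")
    if not os.path.isfile(path):
        return refs
    with open(path, "r") as fp:
        for line in fp:
            line = line.strip()
            # '#' starts the header comment, '^' carries the peeled sha of the
            # annotated tag on the preceding line; neither names a ref itself.
            if not line or line.startswith("#") or line.startswith("^"):
                continue
            hexsha, name = line.split(" ", 1)
            refs[name] = hexsha
    return refs


def resolve_ref(git_dir, name):
    """Resolve a ref name to a hexsha without invoking git (hypothetical helper).

    Loose ref files win over packed refs; 'ref: <target>' indirection
    (symbolic refs such as HEAD) is followed recursively.
    """
    loose = os.path.join(git_dir, *name.split("/"))
    if os.path.isfile(loose):
        with open(loose, "r") as fp:
            value = fp.read().strip()
        if value.startswith("ref: "):
            return resolve_ref(git_dir, value[len("ref: "):])
        return value
    return read_packed_refs(git_dir).get(name)
```

A real implementation would probably cache the parsed packed-refs content 
instead of re-reading the file on every lookup.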

Object
------
* DataStream method should read the data itself. This would be easy once you have 
  the actual loose object, but will be hard if it is in a pack. In a distant future, 
  we might be able to do that or at least implement direct object reading for loose
  objects (to save a command call). Currently object information comes from 
  persistent commands anyway, so the penalty is not that high. The data_stream, 
  though, is not based on persistent commands.
  It would be good to improve things there, as cat-file keeps all the data in a buffer
  before it writes it. Hence it does not write to a stream directly, which can be 
  bad if files are large, say 1GB :).
* Effectively, Objects only store hexshas in their id attributes, so in fact 
  it should be renamed to 'sha'. There was a time when references were allowed as 
  well, but now objects will be 'baked' to the actual sha to ensure comparisons work.
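
To illustrate the direct reading of loose objects mentioned above, here is a 
minimal sketch that inflates a loose object chunk by chunk instead of buffering 
it completely. The name open_loose_object and its parameters are assumptions 
made for this example; packed objects are not handled at all.

```python
import os
import zlib


def open_loose_object(git_dir, hexsha, chunk_size=8192):
    """Return (type_name, size, data_iterator) for a loose object (hypothetical helper).

    A loose object is a zlib stream of "<type> <size>\\0<data>" stored at
    <git_dir>/objects/<first 2 hex chars>/<remaining 38 hex chars>.
    """
    path = os.path.join(git_dir, "objects", hexsha[:2], hexsha[2:])
    fp = open(path, "rb")
    decomp = zlib.decompressobj()
    buf = b""
    # Inflate only until the header terminator has been seen.
    while b"\0" not in buf:
        chunk = fp.read(chunk_size)
        if not chunk:
            fp.close()
            raise ValueError("truncated object header: %s" % hexsha)
        buf += decomp.decompress(chunk)
    header, rest = buf.split(b"\0", 1)
    type_name, size = header.split(b" ")

    def stream():
        # Yields the object data piece by piece; the file is closed once the
        # generator is exhausted or closed (not if it is never iterated).
        try:
            if rest:
                yield rest
            while True:
                chunk = fp.read(chunk_size)
                if not chunk:
                    break
                data = decomp.decompress(chunk)
                if data:
                    yield data
            tail = decomp.flush()
            if tail:
                yield tail
        finally:
            fp.close()

    return type_name.decode("ascii"), int(size), stream()
```

For small objects the first chunk already contains everything, so the benefit 
only shows for large blobs, where memory use stays bounded by the chunk size 
and the compression ratio.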
  
Commit
------
* message is stripped during parsing, which is wrong unless we parse from 
  rev-list output. In fact we don't know that, and can't really tell either.
  Currently we strip away white space that might actually belong to the message.
  
Config
------
* Expand .get* methods of GitConfigParser to support a default value. If it is not None, 
  it will be returned instead of raising. This way the class will be much more usable, 
  and ... I truly hate this config reader as it is so 'old' style. It's not even a new-style
  class yet, showing that it must be ten years old.
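
The default-value behaviour described above could look roughly like the sketch 
below. It is written against the standard library configparser rather than 
GitConfigParser itself; the class name DefaultingConfigParser and the method 
name get_value are hypothetical.

```python
import configparser


class DefaultingConfigParser(configparser.RawConfigParser):
    """Stand-in for GitConfigParser, used only to illustrate the idea."""

    def get_value(self, section, option, default=None):
        """Return the option's value, or default if it is missing.

        As proposed in the TODO item, a default of None means 'no default',
        so the original exception propagates in that case.
        """
        try:
            return self.get(section, option)
        except (configparser.NoSectionError, configparser.NoOptionError):
            if default is None:
                raise
            return default
```

Note that the Python 3 configparser module already offers a similar 
keyword-only fallback argument on its get* methods, which might make a 
custom implementation unnecessary there.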

Diff
----
* Check docs on diff-core to be sure the raw-format presented there can be read
  properly: 
  - http://www.kernel.org/pub/software/scm/git-core/docs/gitdiffcore.html
  
Docs
----
* Overhaul docs - check examples, check looks, improve existing docs

Index
-----
* [advanced]
  write_tree should write a tree directly, which would require ability to create
  objects in the first place. Should be rather simple as it is 
  "tree" bytes datablock | sha1sum and zipped.
  Currently we use some file swapping and the git command to do it which probably 
  is much slower. The thing is that properly writing a tree from an index involves
  creating several tree objects, so in the end it might be slower. 
  Hmm, probably it's okay to use the command unless we go C(++).
* Implement diff so that temporary indices can be used as well ( using file swapping )
* Proper merge handling with index and working copy
* Checkout individual blobs using the index and git-checkout. Blobs can already 
  be written using their stream_data method.
* index.add: could be implemented in python together with hash-object, allowing 
  to keep the internal entry cache and write once everything is done. The problem 
  would be that all other git commands are unaware of the changes unless the index
  gets written. It's worth an evaluation at least.
  A related problem is that there might be shell-related limitations on non-Unix
  systems, where the command line grows too large.
* index.remove: On Windows, there can be a command line length overflow 
  as we pass the paths directly as argv. This is because we use git-rm to be able 
  to remove whole directories easily. This could be implemented using 
  git-update-index if this becomes an issue, but then we would have to do all the 
  globbing and directory removal ourselves.
* commit: advance head = False - tree object should get the base commit wrapping
  that index uses after writing itself as tree. Perhaps it would even be better
  to have a Commit.create method from a tree or from an index. Allowing the 
  latter would be quite flexible and would fit into the system as refs have 
  create methods as well

Refs
-----
* When adjusting the reference of a symbolic reference, the ref log might need 
  adjustments as well. This is not critical, but would make things totally 'right'
  - same with adjusting references directly
 !! - Could simply rewrite it using git-update-ref which works nicely for symbolic 
    and for normal refs !!
* Check whether we are the active reference HEAD.reference == this_ref
  - NO: The reference doesn't need to know - in fact it does not know about the 
  main HEAD, so it may not use it. This is to be done in client code only. 
  Remove me
* Reference.from_path may return a symbolic reference although it is not related 
  to the reference type. Split that up into two from_path on each of the types, 
  and provide a general method outside of the type that tries both.

Remote
------
* 'push' method needs a test; a true test repository is required though - a fork 
  of a fork would do :)!
* Fetch should return heads that were updated, pull as well.
* Creation and deletion methods for references should be part of the interface, allowing
  repo.create_head(...) instead of Head.create(repo, ...). It's a convenience thing, clearly.
* When parsing fetch-info, the regex will not allow spaces in the target remote ref as 
  I couldn't properly parse the optional space separated note in that case. Probably 
  the regex should be improved to handle this gracefully.
  
Repo
----
* Blame: Read the blame format making assumptions about its structure; 
  currently regexes are used a lot although we can deduce what will be next.
  - Read data from a stream directly from git command
* Figure out how to implement a proper merge API
  
Submodules
----------
* add submodule support

Tree
----
* Should return submodules during iteration ( identifies as commit )
* Work through tests and check for test-case cleanup and completeness ( what about
  testing whether it raises on invalid input ? ). See 6dc7799d44e1e5b9b77fd19b47309df69ec01a99