====
TODO
====

General
-------
* Classes requiring repo actually only need the git command - this should be 
  changed to limit their access level and make things a little safer.
* Check for correct usage of id, ref and hexsha and define their meanings. 
  Currently it's not clear what id may be in each case - afaik it's usually 
  a sha or ref, unless cat-file is used, where it must be a sha.
* Overhaul command caching - currently it's possible to create many instances of 
  the std-in command types, as it appears they are not killed when the repo gets 
  deleted.
* The git command on windows may be killed using TASKKILL's /F option, which is probably
  like SIGKILL. This is bad as the process might be doing something, leaving the resource
  locked and/or in an inconsistent state. Without the /F option, git does not terminate 
  itself; it probably listens to SIGINT, which is not sent by TASKKILL. We can't really
  help it, but maybe at some point git will handle windows signals properly.
  
Object
------
* DataStream method should read the data itself. This would be easy once you have 
  the actual loose object, but will be hard if it is in a pack. In the distant future, 
  we might be able to do that, or at least implement direct object reading for loose
  objects ( to save a command call ). Currently object information comes from 
  persistent commands anyway, so the penalty is not that high. The data_stream, 
  though, is not based on persistent commands.
  It would be good to improve things there, as cat-file keeps all the data in a buffer
  before it writes it. Hence it does not write to a stream directly, which can be 
  bad if files are large, say 1GB :).
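
A minimal sketch of direct loose object reading, assuming the standard loose
object layout ( zlib-compressed "<type> <size>\0<data>" stored under
objects/<first two hex digits>/<remaining 38> ). The names loose_object_path and
stream_loose_object are hypothetical and not part of the current code base.
Packed objects are not handled.

```python
import os
import zlib


def loose_object_path(git_dir, hexsha):
    """Path of a loose object: <git_dir>/objects/<first 2 hex>/<remaining 38>."""
    return os.path.join(git_dir, "objects", hexsha[:2], hexsha[2:])


def stream_loose_object(git_dir, hexsha, chunk_size=64 * 1024):
    """Return (type, size, chunk iterator) for a loose object.

    The file is read and inflated chunk by chunk instead of being loaded
    into one buffer first.
    """
    fp = open(loose_object_path(git_dir, hexsha), "rb")
    inflater = zlib.decompressobj()

    # Inflate until the "<type> <size>\0" header is complete.
    buf = b""
    while b"\0" not in buf:
        raw = fp.read(chunk_size)
        if not raw:
            fp.close()
            raise ValueError("truncated loose object %s" % hexsha)
        buf += inflater.decompress(raw)
    header, _, rest = buf.partition(b"\0")
    obj_type, size = header.split(b" ")

    def chunks():
        try:
            if rest:
                yield rest  # data that was inflated together with the header
            while True:
                raw = fp.read(chunk_size)
                if not raw:
                    break
                data = inflater.decompress(raw)
                if data:
                    yield data
            tail = inflater.flush()
            if tail:
                yield tail
        finally:
            fp.close()

    return obj_type.decode("ascii"), int(size), chunks()
```

The caller can copy the chunks into any writable stream, which avoids the
cat-file buffering described above for loose objects only.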
  
Config
------
* Cache the config_readers of the repository and check whether they need to 
  update their information because the local file(s) have changed. Currently
  we re-read all configuration data each time a config-reader is created.
  In a way this leaves it to the user to actually keep the config-reader for 
  multiple uses, but there are cases when the user can hardly do that.
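
One possible way to do the change check is to remember the file's mtime and
size and re-parse only when they differ. The sketch below assumes a single
config file; CachedConfig is a hypothetical name, and the standard library
configparser is only a stand-in, not a complete git-config parser.

```python
import configparser
import os


class CachedConfig:
    """Re-parse a config file only when its mtime or size changed."""

    def __init__(self, path):
        self._path = path
        self._stamp = None
        self._parser = None
        self.parse_count = 0  # only here to make the caching visible

    def _current_stamp(self):
        st = os.stat(self._path)
        return (st.st_mtime_ns, st.st_size)

    def parser(self):
        """Return a parser that reflects the file as it is on disk now."""
        stamp = self._current_stamp()
        if self._parser is None or stamp != self._stamp:
            parser = configparser.RawConfigParser()
            parser.read(self._path)
            self._parser, self._stamp = parser, stamp
            self.parse_count += 1
        return self._parser
```

A repository could keep one such object per configuration level and hand it out
whenever a config-reader is requested, at the cost of one stat call per access.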

Diff
----
* Check docs on diff-core to be sure the raw-format presented there can be read
  properly: 
  - http://www.kernel.org/pub/software/scm/git-core/docs/gitdiffcore.html
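
For reference while checking, a rough sketch of a raw-format line parser. It
assumes the non-z output with one entry per line
( ":<mode> <mode> <sha> <sha> <status>[score]<TAB>path[<TAB>path]" ), ignores
combined-diff lines and does not unquote paths. RawDiffEntry and parse_raw_diff
are hypothetical names.

```python
import re
from collections import namedtuple

RawDiffEntry = namedtuple(
    "RawDiffEntry",
    "a_mode b_mode a_sha b_sha status score a_path b_path",
)

# Abbreviated shas may be followed by "..." depending on the git version.
_RAW_LINE = re.compile(
    r"^:(?P<a_mode>[0-7]{6}) (?P<b_mode>[0-7]{6}) "
    r"(?P<a_sha>[0-9a-f]+)(?:\.\.\.)? (?P<b_sha>[0-9a-f]+)(?:\.\.\.)? "
    r"(?P<status>[ABCDMRTUX])(?P<score>\d*)"
    r"\t(?P<a_path>[^\t]+)(?:\t(?P<b_path>[^\t]+))?$"
)


def parse_raw_diff(text):
    """Parse raw diff output into a list of RawDiffEntry tuples."""
    entries = []
    for line in text.splitlines():
        match = _RAW_LINE.match(line)
        if match is None:
            continue  # not a raw-format line ( e.g. patch text )
        fields = match.groupdict()
        fields["score"] = int(fields["score"]) if fields["score"] else None
        # Only copies and renames carry a second path.
        if fields["b_path"] is None:
            fields["b_path"] = fields["a_path"]
        entries.append(RawDiffEntry(**fields))
    return entries
```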
  
Docs
----
* Overhaul docs - check examples, check looks, improve existing docs
* Config: the auto-generated module does not appear at all ( except for two lines )
  - it's probably related to some fishy error lines:
  :0: (ERROR/3) Unexpected indentation.
  :0: (ERROR/3) Unexpected indentation.

Index
-----
* [advanced]
  write_tree should write a tree directly, which would require the ability to create
  objects in the first place. Should be rather simple as it is 
  "tree" bytes datablock | sha1sum and zipped.
  Currently we use some file swapping and the git command to do it, which probably 
  is much slower. The thing is that properly writing a tree from an index involves
  creating several tree objects, so in the end it might be slower. 
  Hmm, probably it's okay to use the command unless we go c(++)
* Implement diff so that temporary indices can be used as well ( using file swapping )
* Proper merge handling with index and working copy
* Checkout individual blobs using the index and git-checkout. Blobs can already 
  be written using their stream_data method.
* index.add: could be implemented in python together with hash-object, allowing 
  us to keep the internal entry cache and write once everything is done. The problem 
  would be that all other git commands are unaware of the changes unless the index
  gets written. It's worth an evaluation at least.
  A related problem is that there might be shell-related limitations on non-unix
  systems, where the commandline grows too large.
* index.remove: On windows, there can be a command line length overflow 
  as we pass the paths directly as argv. This is because we use git-rm to be able 
  to remove whole directories easily. This could be implemented using 
  git-update-index if this becomes an issue, but then we would have to do all the
  globbing and directory removal ourselves
* commit: advance head = False - the tree object should get the base commit wrapping
  that index uses after writing itself as tree. Perhaps it would even be better
  to have a Commit.create method from a tree or from an index. Allowing the 
  latter would be quite flexible and would fit into the system, as refs have 
  create methods as well
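
The '"tree" bytes datablock | sha1sum and zipped' layout from the write_tree
item can be sketched as below. It assumes the usual object encoding
( "<type> <size>\0<payload>", sha1 over that, zlib for storage ) and tree
entries of the form "<mode> <name>\0<20 raw sha bytes>". encode_object and
tree_payload are hypothetical helpers; nothing is written to the object
database here.

```python
import hashlib
import zlib


def encode_object(obj_type, payload):
    """Return (hexsha, compressed bytes) for one git object."""
    header = obj_type.encode("ascii") + b" " + str(len(payload)).encode("ascii")
    store = header + b"\0" + payload
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def _sort_key(entry):
    mode, name, _ = entry
    # Git orders tree entries by name, comparing directories as "name/".
    return name.encode("utf-8") + (b"/" if mode == "40000" else b"")


def tree_payload(entries):
    """Serialize (mode, name, hexsha) entries into the body of a tree object."""
    parts = []
    for mode, name, hexsha in sorted(entries, key=_sort_key):
        parts.append(
            mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
            + bytes.fromhex(hexsha)
        )
    return b"".join(parts)
```

Writing a whole index would call tree_payload once per directory, bottom-up,
and feed each result through encode_object("tree", ...), which is the "several
tree objects" cost mentioned above.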

Refs
----
* For performance reasons it would be good to reimplement git-update-ref to be 
  fully equivalent to what the command does. Currently it does some checking and 
  handles symbolic refs as well as normal refs, updating the reflog if required.
  It's low-priority though, as we don't set references directly that often.
* I have read that refs can be symbolic refs as well, which would imply the need
  to possibly dereference them. This makes sense, as they originally were possibly 
  a symbolic link. This would mean References could be derived from SymbolicReference
  officially, but it would still be bad as not all References are symbolic ones.
* Making the reflog available as command might be useful actually. This way historical 
  references/commits can be returned. Git internally manages this if refs are specified
  with HEAD@{0} for instance
* Possibly follow symbolic links when manually parsing references by walking the 
  directory tree. Currently the packed-refs file wouldn't be followed either.
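
A small sketch of how manual ref parsing could take the packed-refs file and
"ref: " style symbolic refs into account. It assumes the packed-refs text
format ( "<sha> <refname>" lines, "#" comments, "^<sha>" lines carrying the
peeled value of the tag on the previous line ) and that loose refs take
precedence over packed ones. parse_packed_refs and resolve_ref are hypothetical
names. Filesystem symlinks need no special code here, since open() follows
them.

```python
import os


def parse_packed_refs(text):
    """Return {refname: (hexsha, peeled_hexsha_or_None)} from packed-refs text."""
    refs = {}
    last = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("^"):
            # Peeled value of the annotated tag on the previous line.
            if last is not None:
                refs[last] = (refs[last][0], line[1:])
            continue
        hexsha, _, name = line.partition(" ")
        refs[name] = (hexsha, None)
        last = name
    return refs


def resolve_ref(git_dir, name, max_depth=5):
    """Resolve a ref name to a hexsha, or None if it does not exist."""
    for _ in range(max_depth):
        path = os.path.join(git_dir, *name.split("/"))
        if os.path.isfile(path):
            with open(path) as fp:
                value = fp.read().strip()
            if value.startswith("ref: "):
                name = value[len("ref: "):]  # symbolic ref, follow it
                continue
            return value
        packed = os.path.join(git_dir, "packed-refs")
        if os.path.isfile(packed):
            with open(packed) as fp:
                entry = parse_packed_refs(fp.read()).get(name)
            if entry is not None:
                return entry[0]
        return None
    raise ValueError("symbolic ref chain too deep: %s" % name)
```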
 
Remote
------
* When parsing fetch-info, the regex will not allow spaces in the target remote ref as 
  I couldn't properly parse the optional space separated note in that case. Probably 
  the regex should be improved to handle this gracefully.
  
Repo
----
* Blame: Read the blame format making assumptions about its structure, 
  currently regexes are used a lot although we can deduce what will be next.
  - Read data from a stream directly from git command
* Figure out how to implement a proper merge API. It should be index based, but provide 
  all necessary information to the ones willing to ask for it. The index implementation 
  actually provides this already, but some real use-cases would be great to have at least.
  
  
Submodules
----------
* add submodule support
* see tree

TestSystem
----------
* Figure out a good way to indicate the required presence of a git-daemon to host
  a specific path. Ideally, the system would detect the missing daemon and inform 
  the user about the required command-line to start the daemon where we need it.
  The reason we are unable to start a daemon ourselves is that it will always fork - 
  we can only kill the daemon itself, but not its children. Even if we did a pgrep-like 
  match, we still would not know whether the processes truly are our daemons - in that 
  case user permissions should stop us though.
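
One way the detection could look: try to connect to the daemon's port and, if
that fails, tell the user how to start it. This is only a sketch; it assumes
the daemon listens on the default git port 9418 on localhost, and the suggested
command line ( --port, --base-path, --export-all ) is an assumption about what
the tests need. daemon_is_listening and daemon_hint are hypothetical names.

```python
import shlex
import socket

GIT_DAEMON_PORT = 9418  # default port of the git protocol


def daemon_is_listening(host="127.0.0.1", port=GIT_DAEMON_PORT, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def daemon_hint(base_path, port=GIT_DAEMON_PORT):
    """Build a message telling the user how to start the daemon manually."""
    command = "git daemon --port=%d --base-path=%s --export-all" % (
        port,
        shlex.quote(base_path),
    )
    return "No git-daemon found on port %d, please start one with: %s" % (
        port,
        command,
    )
```

A test needing the daemon could call daemon_is_listening() during setup and
fail with daemon_hint(path) as the message. A successful connect only shows
that something listens on the port, not that it is a git-daemon serving the
right path.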

Tree
----
* Should return submodules during iteration ( identifies as commit )
* Work through the tests and check for test-case cleanup and completeness ( what about
  testing whether it raises on invalid input ? ). See 6dc7799d44e1e5b9b77fd19b47309df69ec01a99