====
TODO
====
General
-------
* Classes requiring repo actually only need the git command - this should be
changed to limit their access level and make things a little safer.
* Check for correct usage of id, ref and hexsha and define their meanings.
Currently it's not clear what an id may be in each case - afaik it's usually
a sha or ref, unless cat-file is used, where it must be a sha.
* Overhaul command caching - currently it's possible to create many instances of
the std-in command types, as it appears they are not killed when the repo gets
deleted.
* The git command on windows may be killed using the /F option, which probably is like
SIGKILL. This is bad as the process might be doing something, leaving the resource
locked and/or in an inconsistent state. Without the /F option, git does not terminate
itself; it probably listens for SIGINT, which is not sent by TASKKILL. We can't really
help it, but maybe at some point git will handle windows signals properly.
Object
------
* The DataStream method should read the data itself. This would be easy once you have
the actual loose object, but will be hard if it is in a pack. In the distant future,
we might be able to do that, or at least implement direct object reading for loose
objects ( to save a command call ). Currently object information comes from
persistent commands anyway, so the penalty is not that high. The data_stream,
though, is not based on persistent commands.
It would be good to improve things there, as cat-file keeps all the data in a buffer
before it writes it. Hence it does not write to a stream directly, which can be
bad if files are large, say 1GB :).
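A minimal sketch of what direct reading of a loose object could look like, streaming the inflated data in chunks instead of buffering it all. It assumes the standard loose-object layout (a zlib-compressed ``<type> <size>\0<data>`` file under ``objects/``); the helper names ``copy_loose_object`` and ``_inflate`` are hypothetical and not part of the existing code, and packed objects are not handled at all.

```python
import os
import zlib


def _inflate(fp, chunk_size):
    """Yield decompressed pieces of a zlib stream read from fp."""
    inflater = zlib.decompressobj()
    for chunk in iter(lambda: fp.read(chunk_size), b""):
        yield inflater.decompress(chunk)
    yield inflater.flush()


def copy_loose_object(git_dir, hexsha, out_stream, chunk_size=8192):
    """Write the data of a loose object to out_stream; return (type, size).

    Hypothetical helper: only loose objects are supported, not packs.
    """
    path = os.path.join(git_dir, "objects", hexsha[:2], hexsha[2:])
    header = None
    pending = b""
    written = 0
    with open(path, "rb") as fp:
        for data in _inflate(fp, chunk_size):
            if header is None:
                # Collect bytes until the "<type> <size>\0" header is complete.
                pending += data
                if b"\x00" not in pending:
                    continue
                header, _, data = pending.partition(b"\x00")
            out_stream.write(data)
            written += len(data)
    if header is None:
        raise ValueError("no object header found in %s" % path)
    obj_type, size = header.split(b" ")
    if int(size) != written:
        raise ValueError("size mismatch in object %s" % hexsha)
    return obj_type.decode("ascii"), written
```

Only the compressed chunk and its inflated output are held in memory at any time, so large blobs would not need a full in-memory buffer.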
Config
------
* Cache the config_reader of the repository and check whether it needs to
update its information because the local file(s) have changed. Currently
we re-read all configuration data each time a config-reader is created.
In a way this leaves it to the user to keep the config-reader around for
multiple uses, but there are cases when the user can hardly do that.
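One way the caching idea could be sketched: remember the modification time and size of each config file and re-parse only when that stamp changes. ``CachedConfigReader`` is a hypothetical name, and the standard library ``configparser`` is used here only as a stand-in for the project's own config parser, which handles git's syntax more faithfully.

```python
import configparser
import os


class CachedConfigReader:
    """Hypothetical helper: re-parse config files only when they change."""

    def __init__(self, *paths):
        self._paths = paths
        self._stamp = None
        self._parser = None

    def _current_stamp(self):
        # One (mtime, size) pair per file, or None if the file is missing.
        stamp = []
        for path in self._paths:
            try:
                st = os.stat(path)
                stamp.append((st.st_mtime_ns, st.st_size))
            except OSError:
                stamp.append(None)
        return tuple(stamp)

    def parser(self):
        """Return a parser reflecting the current state of the files."""
        stamp = self._current_stamp()
        if self._parser is None or stamp != self._stamp:
            parser = configparser.RawConfigParser(strict=False)
            parser.read(self._paths)  # missing files are silently skipped
            self._parser, self._stamp = parser, stamp
        return self._parser
```

A repository could hold one such object and hand out ``parser()`` results, so callers would no longer pay for a full re-read on every access. The stamp check is only as precise as the filesystem's timestamps.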
Diff
----
* Check docs on diff-core to be sure the raw-format presented there can be read
properly:
- http://www.kernel.org/pub/software/scm/git-core/docs/gitdiffcore.html
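As a starting point for that check, here is a small sketch of a parser for one line of the raw format as I understand it from the git documentation: a colon, source and destination mode, source and destination sha, a status letter with an optional score, then one or two tab-separated paths. ``RawDiff`` and ``parse_raw_diff_line`` are hypothetical names. Combined diffs for merges, quoted paths and ``-z`` output are deliberately not handled.

```python
from collections import namedtuple

RawDiff = namedtuple(
    "RawDiff", "a_mode b_mode a_sha b_sha status score a_path b_path"
)


def parse_raw_diff_line(line):
    """Parse a single, non-combined raw diff line into a RawDiff tuple."""
    if not line.startswith(":") or line.startswith("::"):
        raise ValueError("not a plain raw diff line: %r" % line)
    meta, _, paths = line[1:].rstrip("\n").partition("\t")
    a_mode, b_mode, a_sha, b_sha, status = meta.split(" ")
    path_parts = paths.split("\t")
    a_path = path_parts[0]
    # Renames and copies carry a second path; otherwise both sides match.
    b_path = path_parts[1] if len(path_parts) > 1 else a_path
    # Status may carry a similarity score, e.g. "R86".
    score = int(status[1:]) if len(status) > 1 else None
    return RawDiff(a_mode, b_mode, a_sha, b_sha, status[0], score, a_path, b_path)
```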
Docs
----
* Overhaul docs - check examples, check looks, improve existing docs
* Config: auto-generated module does not appear at all ( except for two lines )
- it's probably related to some fishy error lines:
:0: (ERROR/3) Unexpected indentation.
:0: (ERROR/3) Unexpected indentation.
Index
-----
* [advanced]
write_tree should write a tree directly, which would require ability to create
objects in the first place. Should be rather simple as it is
"tree" bytes datablock | sha1sum and zipped.
Currently we use some file swapping and the git command to do it, which probably
is much slower. The thing is that properly writing a tree from an index involves
creating several tree objects, so in the end a pure python implementation might be slower.
Hmm, it's probably okay to use the command unless we go c(++)
* Implement diff so that temporary indices can be used as well ( using file swapping )
* Proper merge handling with index and working copy
* Checkout individual blobs using the index and git-checkout. Blobs can already
be written using their stream_data method.
* index.add: could be implemented in python together with hash-object, allowing us
to keep the internal entry cache and write once everything is done. The problem
would be that all other git commands are unaware of the changes unless the index
gets written. It's worth an evaluation at least.
A related problem is that there might be shell-related limitations on non-unix
systems, where the commandline grows too large.
* index.remove: On windows, there can be a command line length overflow
as we pass the paths directly as argv. This is because we use git-rm to be able
to remove whole directories easily. This could be implemented using
git-update-index if this becomes an issue, but then we would have to do all the
globbing and directory removal ourselves
* commit: advance head = False - tree object should get the base commit wrapping
that index uses after writing itself as tree. Perhaps it would even be better
to have a Commit.create method from a tree or from an index. Allowing the
latter would be quite flexible and would fit into the system as refs have
create methods as well
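To illustrate the object-creation step that write_tree and a python hash-object would need, here is a rough sketch of serializing an object the way git stores it: a ``<type> <size>\0`` header plus the body, with the sha1 taken over both and the result zlib-compressed. The function names are hypothetical, and writing the payload to ``objects/`` as well as building nested trees from index entries is left out.

```python
import binascii
import hashlib
import zlib

TREE_MODE = 0o040000  # mode git uses for sub-tree entries


def encode_object(obj_type, body):
    """Return (hexsha, compressed payload) for an object of the given type."""
    store = obj_type + b" %d\x00" % len(body) + body
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def encode_tree_body(entries):
    """Serialize (mode, name, hexsha) entries into a tree body.

    Names are bytes; sub-trees sort as if their name ended with "/".
    """
    def sort_key(entry):
        mode, name, _ = entry
        return name + b"/" if mode == TREE_MODE else name

    body = b""
    for mode, name, hexsha in sorted(entries, key=sort_key):
        body += b"%o %s\x00" % (mode, name) + binascii.unhexlify(hexsha)
    return body
```

For example, ``encode_object(b"tree", encode_tree_body([]))`` yields the hexsha of the empty tree, and a real write_tree would call this once per directory level, bottom up.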
Refs
-----
* For performance reasons it would be good to reimplement git-update-ref to be
fully equivalent to what the command does. Currently it does some checking and
handles symbolic refs as well as normal refs, updating the reflog if required.
It's low-priority though, as we don't set references directly that often.
* I have read that refs can be symbolic refs as well, which would imply the need
to possibly dereference them. This makes sense, as they originally were possibly
symbolic links. This would mean References could be derived from SymbolicReference
officially, but it would still be bad as not all References are symbolic ones.
* Making the reflog available as command might be useful actually. This way historical
references/commits can be returned. Git internally manages this if refs are specified
with HEAD@{0} for instance
* Possibly follow symbolic links when manually parsing references by walking the
directory tree. Currently the packed-refs file wouldn't be followed either.
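The dereferencing and packed-refs lookup discussed above could be sketched roughly as follows. It assumes the usual on-disk layout: a ref file contains either a hexsha or ``ref: <name>``, and ``packed-refs`` holds ``<hexsha> <name>`` lines plus ``#`` comments and ``^`` peeled entries. ``read_packed_refs`` and ``resolve_ref`` are hypothetical names, and filesystem-level symbolic links are not treated specially.

```python
import os


def read_packed_refs(git_dir):
    """Return a {ref name: hexsha} dict from packed-refs, if it exists."""
    refs = {}
    path = os.path.join(git_dir, "packed-refs")
    if not os.path.isfile(path):
        return refs
    with open(path) as fp:
        for line in fp:
            line = line.strip()
            # Skip blank lines, header comments and peeled-tag lines.
            if not line or line[0] in "#^":
                continue
            hexsha, name = line.split(" ", 1)
            refs[name] = hexsha
    return refs


def resolve_ref(git_dir, name, max_depth=5):
    """Follow symbolic refs until a hexsha is found.

    Loose ref files win over packed-refs; raises KeyError for unknown refs.
    """
    for _ in range(max_depth):
        path = os.path.join(git_dir, *name.split("/"))
        if not os.path.isfile(path):
            return read_packed_refs(git_dir)[name]
        with open(path) as fp:
            content = fp.read().strip()
        if not content.startswith("ref: "):
            return content
        name = content[len("ref: "):]
    raise ValueError("symbolic ref chain too deep")
```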
Remote
------
* When parsing fetch-info, the regex will not allow spaces in the target remote ref as
I couldn't properly parse the optional space separated note in that case. Probably
the regex should be improved to handle this gracefully.
Repo
----
* Blame: Read the blame format making assumptions about its structure.
Currently regexes are used a lot, although we can deduce what will come next.
- Read data from a stream directly from git command
* Figure out how to implement a proper merge API. It should be index based, but provide
all necessary information to the ones willing to ask for it. The index implementation
actually provides this already, but some real use-cases would be great to have at least.
Submodules
----------
* add submodule support
* see tree
TestSystem
----------
* Figure out a good way to indicate the required presence of a git-daemon to host
a specific path. Ideally, the system would detect the missing daemon and inform
the user about the required command-line to start the daemon where we need it.
The reason we are unable to start a daemon ourselves is that it will always fork - we can
only kill the daemon itself, but not its children. Even if we did a pgrep-like match, we still
would not know whether it truly is our daemon - in that case user permissions should
stop us though.
Tree
----
* Should return submodules during iteration ( identifies as commit )
* Work through test and check for test-case cleanup and completeness ( what about
testing whether it raises on invalid input ? ). See 6dc7799d44e1e5b9b77fd19b47309df69ec01a99