author Paolo Bonzini <bonzini@gnu.org> 2010-03-08 17:14:51 +0100
committer Paolo Bonzini <bonzini@gnu.org> 2010-03-08 17:14:51 +0100
commit 5acb1dc0dffbf8a8e9db87bc6caf9fa7c3dc170e
tree ccbbfb03892df6ced03134574450eea902a3fe9f
parent 564ba765e84b27e2d8aa415def3841c879f668e4
more work on TODO
* TODO: More work on the first section. Use clearer section headers.
Diffstat (limited to 'TODO')
-rw-r--r-- TODO 99
1 files changed, 47 insertions, 52 deletions
diff --git a/TODO b/TODO
index 62e302e9..2cfd0ce0 100644
--- a/TODO
+++ b/TODO
@@ -4,58 +4,52 @@
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.
-Get sane performance with UTF-8 locales.
+===============
+Short term work
+===============
-Improve the test infrastructure.
+See where we are with UTF-8 performance.
-Other small patches which wait for a test case.
+Merge Debian patches 55-bigfile.patch, 69-mbtowc.patch and
+70-man_apostrophe.patch. Go through patches in Savannah.
-Some _minimal_ cleanup of the grep(), grepdir(), recursion (the "main
-loop") and fix --directories=read
+Cleanup of the grep(), grepdir(), recursion (the "main loop") to use fts.
+Fix --directories=read.
Write better Texinfo documentation for grep. The manual page would be a
good place to start, but Info documents are also supposed to contain a
tutorial and examples.
-Fix the DFA matcher to never use exponential space. (Fortunately, these
-cases are rare.)
-
-Improve the performance of the regex backtracking matcher. This matcher
-is agonizingly slow, and is responsible for grep sometimes being slower
-than Unix grep when backreferences are used.
+Some test in tests/spencer2.tests should have failed! Need to filter out
+some bugs in dfa.[ch]/regex.[ch].
-Some test in tests/spencer2.tests should have failed!
-Need to filter out some bugs in dfa.[ch]/regex.[ch].
+Multithreading?
-Threads for grep?
-
-GNU grep does 32-bit arithmetic, it needs to move to 64-bit.
+GNU grep does 32-bit arithmetic, it needs to move to 64-bit (i.e.
+size_t/ptrdiff_t).
Clean up, too many #ifdefs!
-Check some new algorithms for matching; talk to Karl Berry and Nelson.
-Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
-claim that his algorithm is faster than Boyer-More. Worth checking.
-
-Lazy dynamic linking of libpcre, libz, and libbz2?
+Lazy dynamic linking of libpcre.
Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one
binary. Is there a possibility of doing even better by automatically
checking the magic of binary files ourselves (0x1F 0x8B for gzip, 0x1F
-0x9D for compress, and 0x42 0x5A 0x68 for bzip2)?
+0x9D for compress, and 0x42 0x5A 0x68 for bzip2)? Once what to do with
+libpcre is decided, do the same for libz and libbz2.
-##
+
+==================
+Matching algorithms
+==================
-Check <http://tony.abou-assaleh.net/greps.html>.
-Take a look at these and consider opportunities
-for merging or cloning:
+Check <http://tony.abou-assaleh.net/greps.html>. Take a look at these
+and consider opportunities for merging or cloning:
-- ja-grep's mlb2 patch (Japanese grep)
<ftp://ftp.freebsd.org/pub/FreeBSD/ports/distfiles/grep-2.4.2-mlb2.patch.gz>
-- lgrep (from lv, a Powerful Multilingual File Viewer / Grep)
<http://www.ff.iij4u.or.jp/~nrt/lv/>;
- -- pcregrep (from Perl-Compatible Regular Expressions library)
- <http://www.pcre.org/>;
-- cgrep (Context grep) <http://plg.uwaterloo.ca/~ftp/mt/cgrep/>
seems like nice work;
-- sgrep (Struct grep) <http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html>;
@@ -65,25 +59,38 @@ for merging or cloning:
<http://www.dcc.uchile.cl/~gnavarro/software/>;
-- ggrep (Grouse grep) <http://www.grouse.com.au/ggrep/>;
-- grep.py (Python grep) <http://www.vdesmedt.com/~vds2212/grep.html>;
- -- freegrep (a BSD-licensed grep for those who can't stand the GNU GPL)
- <http://www.vocito.com/downloads/software/grep/>;
+ -- freegrep <http://www.vocito.com/downloads/software/grep/>;
-##
+Check some new algorithms for matching; talk to Karl Berry and Nelson.
+Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
+claim that his algorithm is faster than Boyer-More. Worth checking.
-POSIX Compliance: see p10003.x
+Fix the DFA matcher to never use exponential space. (Fortunately, these
+cases are rare.)
-In general, interesting things to check in POSIX/OpenGroup include:
+
+============================
+Standards: POSIX and Unicode
+============================
-Provide support for the POSIX [= =] and [. .] constructs. This is
-difficult because it requires locale-dependent details of the
-character set and collating sequence, but POSIX does not standardize
-any method for accessing this information!
+For POSIX compliance, see p10003.x. Current support for the POSIX [= =]
+and [. .] constructs is limited. This is difficult because it requires
+locale-dependent details of the character set and collating sequence,
+but POSIX does not standardize any method for accessing this information!
-Moving away from GNU regex API for POSIX regex API.
+For Unicode, interesting things to check include the Unicode Standard
+<http://www.unicode.org/standard/standard.html> and the Unicode Technical
+Standard #18 (<http://www.unicode.org/reports/tr18/> “Unicode Regular
+Expressions”). Talk to Bruno Haible who's mantaining GNU libunistring.
+See also Unicode Standard Annex #15 (<http://www.unicode.org/reports/tr15/>
+“Unicode Normalization Forms”), already implemented by GNU libunistring.
-##
+In particular, --ignore-case needs to be evaluated against the standards.
+We may want to deviate from POSIX if Unicode provides better or clearer
+semantics.
POSIX and --ignore-case
+-----------------------
For this issue, interesting things to check in POSIX include the
Volume “Base Definitions (XBD)”, Chapter “Regular Expressions” and in
@@ -215,21 +222,9 @@ a composition of the two conversions.
Any optimization in the implementation of each logic
must not change its basic semantic.
-##
-
-In general, interesting things to check in Unicode include:
-
-The <http://www.unicode.org/standard/standard.html> Unicode Standard.
-
-Unicode Technical Standard #18 (<http://www.unicode.org/reports/tr18/>
-“Unicode Regular Expressions”).
-
-Unicode Standard Annex #15 (<http://www.unicode.org/reports/tr15/>
-“Unicode Normalization Forms”).
-
-##
Unicode and --ignore-case
+-------------------------
For this issue, interesting things to check in Unicode include: