more work on TODO

* TODO: More work on the first section. Use clearer section headers.
author: Paolo Bonzini <bonzini@gnu.org> 2010-03-08 17:14:51 +0100
committer: Paolo Bonzini <bonzini@gnu.org> 2010-03-08 17:14:51 +0100
commit: 5acb1dc0dffbf8a8e9db87bc6caf9fa7c3dc170e (patch)
tree: ccbbfb03892df6ced03134574450eea902a3fe9f /TODO
parent: 564ba765e84b27e2d8aa415def3841c879f668e4 (diff)
download: grep-5acb1dc0dffbf8a8e9db87bc6caf9fa7c3dc170e.tar.gz
1 files changed, 47 insertions, 52 deletions
diff --git a/TODO b/TODO
index 62e302e9..2cfd0ce0 100644
--- a/TODO
+++ b/TODO
@@ -4,58 +4,52 @@
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.
 
-Get sane performance with UTF-8 locales.
+===============
+Short term work
+===============
 
-Improve the test infrastructure.
+See where we are with UTF-8 performance.
 
-Other small patches which wait for a test case.
+Merge Debian patches 55-bigfile.patch, 69-mbtowc.patch and
+70-man_apostrophe.patch.  Go through patches in Savannah.
 
-Some _minimal_ cleanup of the grep(), grepdir(), recursion (the "main
-loop") and fix --directories=read
+Cleanup of the grep(), grepdir(), recursion (the "main loop") to use fts.
+Fix --directories=read.
 
 Write better Texinfo documentation for grep.  The manual page would be a
 good place to start, but Info documents are also supposed to contain a
 tutorial and examples.
 
-Fix the DFA matcher to never use exponential space.  (Fortunately, these
-cases are rare.)
-
-Improve the performance of the regex backtracking matcher.  This matcher
-is agonizingly slow, and is responsible for grep sometimes being slower
-than Unix grep when backreferences are used.
+Some test in tests/spencer2.tests should have failed!  Need to filter out
+some bugs in dfa.[ch]/regex.[ch].
 
-Some test in tests/spencer2.tests should have failed!
-Need to filter out some bugs in dfa.[ch]/regex.[ch].
+Multithreading?
 
-Threads for grep?
-
-GNU grep does 32-bit arithmetic, it needs to move to 64-bit.
+GNU grep does 32-bit arithmetic, it needs to move to 64-bit (i.e.
+size_t/ptrdiff_t).
 
 Clean up, too many #ifdefs!
 
-Check some new algorithms for matching; talk to Karl Berry and Nelson.
-Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
-claims that his algorithm is faster than Boyer-Moore. Worth checking.
-
-Lazy dynamic linking of libpcre, libz, and libbz2?
+Lazy dynamic linking of libpcre.
 
 Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one
 binary. Is there a possibility of doing even better by automatically
 checking the magic of binary files ourselves (0x1F 0x8B for gzip, 0x1F
-0x9D for compress, and 0x42 0x5A 0x68 for bzip2)?
+0x9D for compress, and 0x42 0x5A 0x68 for bzip2)?  Once what to do with
+libpcre is decided, do the same for libz and libbz2.
 
-##
+
+==================
+Matching algorithms
+==================
 
-Check <http://tony.abou-assaleh.net/greps.html>.
-Take a look at these and consider opportunities
-for merging or cloning:
+Check <http://tony.abou-assaleh.net/greps.html>.  Take a look at these
+and consider opportunities for merging or cloning:
 
    -- ja-grep's mlb2 patch (Japanese grep)
       <ftp://ftp.freebsd.org/pub/FreeBSD/ports/distfiles/grep-2.4.2-mlb2.patch.gz>
    -- lgrep (from lv, a Powerful Multilingual File Viewer / Grep)
       <http://www.ff.iij4u.or.jp/~nrt/lv/>;
-   -- pcregrep (from Perl-Compatible Regular Expressions library)
-      <http://www.pcre.org/>;
    -- cgrep (Context grep) <http://plg.uwaterloo.ca/~ftp/mt/cgrep/>
       seems like nice work;
    -- sgrep (Struct grep) <http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html>;
@@ -65,25 +59,38 @@ for merging or cloning:
       <http://www.dcc.uchile.cl/~gnavarro/software/>;
    -- ggrep (Grouse grep) <http://www.grouse.com.au/ggrep/>;
    -- grep.py (Python grep) <http://www.vdesmedt.com/~vds2212/grep.html>;
-   -- freegrep (a BSD-licensed grep for those who can't stand the GNU GPL)
-      <http://www.vocito.com/downloads/software/grep/>;
+   -- freegrep <http://www.vocito.com/downloads/software/grep/>;
 
-##
+Check some new algorithms for matching; talk to Karl Berry and Nelson.
+Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
+claims that his algorithm is faster than Boyer-Moore. Worth checking.
 
-POSIX Compliance: see p10003.x
+Fix the DFA matcher to never use exponential space.  (Fortunately, these
+cases are rare.)
 
-In general, interesting things to check in POSIX/OpenGroup include:
+
+============================
+Standards: POSIX and Unicode
+============================
 
-Provide support for the POSIX [= =] and [. .] constructs. This is
-difficult because it requires locale-dependent details of the
-character set and collating sequence, but POSIX does not standardize
-any method for accessing this information!
+For POSIX compliance, see p10003.x. Current support for the POSIX [= =]
+and [. .] constructs is limited. This is difficult because it requires
+locale-dependent details of the character set and collating sequence,
+but POSIX does not standardize any method for accessing this information!
 
-Moving away from GNU regex API for POSIX regex API.
+For Unicode, interesting things to check include the Unicode Standard
+<http://www.unicode.org/standard/standard.html> and the Unicode Technical
+Standard #18 (<http://www.unicode.org/reports/tr18/> “Unicode Regular
+Expressions”).  Talk to Bruno Haible who's maintaining GNU libunistring.
+See also Unicode Standard Annex #15 (<http://www.unicode.org/reports/tr15/>
+“Unicode Normalization Forms”), already implemented by GNU libunistring.
 
-##
+In particular, --ignore-case needs to be evaluated against the standards.
+We may want to deviate from POSIX if Unicode provides better or clearer
+semantics.
 
 POSIX and --ignore-case
+-----------------------
 
 For this issue, interesting things to check in POSIX include the
 Volume “Base Definitions (XBD)”, Chapter “Regular Expressions” and in
@@ -215,21 +222,9 @@ a composition of the two conversions.
 Any optimization in the implementation of each logic
 must not change its basic semantic.
 
-##
-
-In general, interesting things to check in Unicode include:
-
-The <http://www.unicode.org/standard/standard.html> Unicode Standard.
-
-Unicode Technical Standard #18 (<http://www.unicode.org/reports/tr18/>
-“Unicode Regular Expressions”).
-
-Unicode Standard Annex #15 (<http://www.unicode.org/reports/tr15/>
-“Unicode Normalization Forms”).
-
-##
 
 Unicode and --ignore-case
+-------------------------
 
 For this issue, interesting things to check in Unicode include:
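
The TODO entry on zgrep/bzgrep integration asks whether grep could check
the magic of compressed files itself.  A minimal sketch of that check
follows, using only the byte values quoted in the entry (0x1F 0x8B for
gzip, 0x1F 0x9D for compress, 0x42 0x5A 0x68 for bzip2).  The enum and
the function name detect_compression are hypothetical and do not exist
in grep; how the result would be wired to libz or libbz2 is left open,
as in the TODO itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; nothing here is grep's own API.  */
enum compression
{
  COMP_NONE,      /* no known magic found */
  COMP_GZIP,      /* 0x1F 0x8B */
  COMP_COMPRESS,  /* 0x1F 0x9D */
  COMP_BZIP2      /* 0x42 0x5A 0x68 */
};

/* Classify the first LEN bytes of BUF by the magic numbers listed in
   the TODO entry.  Buffers shorter than a given magic never match it.  */
enum compression
detect_compression (const unsigned char *buf, size_t len)
{
  static const unsigned char gzip_magic[] = { 0x1F, 0x8B };
  static const unsigned char compress_magic[] = { 0x1F, 0x9D };
  static const unsigned char bzip2_magic[] = { 0x42, 0x5A, 0x68 };

  if (len >= sizeof gzip_magic
      && memcmp (buf, gzip_magic, sizeof gzip_magic) == 0)
    return COMP_GZIP;
  if (len >= sizeof compress_magic
      && memcmp (buf, compress_magic, sizeof compress_magic) == 0)
    return COMP_COMPRESS;
  if (len >= sizeof bzip2_magic
      && memcmp (buf, bzip2_magic, sizeof bzip2_magic) == 0)
    return COMP_BZIP2;
  return COMP_NONE;
}
```

A caller would read the first few bytes of each input file, pass them to
detect_compression, and fall back to plain reading on COMP_NONE.  An
ordinary file that happens to begin with one of these byte sequences is
misclassified by this test alone, so a real implementation would likely
need a fallback when decompression fails.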