-rw-r--r-- | TODO | 99 |
1 files changed, 47 insertions, 52 deletions
@@ -4,58 +4,52 @@
 are permitted in any medium without royalty provided the copyright notice and this notice are preserved.
-Get sane performance with UTF-8 locales.
+===============
+Short term work
+===============
-Improve the test infrastructure.
+See where we are with UTF-8 performance.
-Other small patches which wait for a test case.
+Merge Debian patches 55-bigfile.patch, 69-mbtowc.patch and
+70-man_apostrophe.patch. Go through patches in Savannah.
-Some _minimal_ cleanup of the grep(), grepdir(), recursion (the "main
-loop") and fix --directories=read
+Cleanup of the grep(), grepdir(), recursion (the "main loop") to use fts.
+Fix --directories=read.
 Write better Texinfo documentation for grep. The manual page would be a good place to start, but Info documents are also supposed to contain a tutorial and examples.
-Fix the DFA matcher to never use exponential space. (Fortunately, these
-cases are rare.)
-
-Improve the performance of the regex backtracking matcher. This matcher
-is agonizingly slow, and is responsible for grep sometimes being slower
-than Unix grep when backreferences are used.
+Some test in tests/spencer2.tests should have failed! Need to filter out
+some bugs in dfa.[ch]/regex.[ch].
-Some test in tests/spencer2.tests should have failed!
-Need to filter out some bugs in dfa.[ch]/regex.[ch].
+Multithreading?
-Threads for grep?
-
-GNU grep does 32-bit arithmetic, it needs to move to 64-bit.
+GNU grep does 32-bit arithmetic, it needs to move to 64-bit (i.e.
+size_t/ptrdiff_t).
 Clean up, too many #ifdefs!
-Check some new algorithms for matching; talk to Karl Berry and Nelson.
-Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
-claim that his algorithm is faster than Boyer-More. Worth checking.
-
-Lazy dynamic linking of libpcre, libz, and libbz2?
+Lazy dynamic linking of libpcre.
 Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one binary.
 Is there a possibility of doing even better by automatically checking the magic of binary files ourselves (0x1F 0x8B for gzip, 0x1F
-0x9D for compress, and 0x42 0x5A 0x68 for bzip2)?
+0x9D for compress, and 0x42 0x5A 0x68 for bzip2)? Once what to do with
+libpcre is decided, do the same for libz and libbz2.
-##
+
+==================
+Matching algorithms
+==================
-Check <http://tony.abou-assaleh.net/greps.html>.
-Take a look at these and consider opportunities
-for merging or cloning:
+Check <http://tony.abou-assaleh.net/greps.html>. Take a look at these
+and consider opportunities for merging or cloning:
  -- ja-grep's mlb2 patch (Japanese grep) <ftp://ftp.freebsd.org/pub/FreeBSD/ports/distfiles/grep-2.4.2-mlb2.patch.gz>
  -- lgrep (from lv, a Powerful Multilingual File Viewer / Grep) <http://www.ff.iij4u.or.jp/~nrt/lv/>;
- -- pcregrep (from Perl-Compatible Regular Expressions library)
-    <http://www.pcre.org/>;
  -- cgrep (Context grep) <http://plg.uwaterloo.ca/~ftp/mt/cgrep/> seems like nice work;
  -- sgrep (Struct grep) <http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html>;
@@ -65,25 +59,38 @@ for merging or cloning:
     <http://www.dcc.uchile.cl/~gnavarro/software/>;
  -- ggrep (Grouse grep) <http://www.grouse.com.au/ggrep/>;
  -- grep.py (Python grep) <http://www.vdesmedt.com/~vds2212/grep.html>;
- -- freegrep (a BSD-licensed grep for those who can't stand the GNU GPL)
-    <http://www.vocito.com/downloads/software/grep/>;
+ -- freegrep <http://www.vocito.com/downloads/software/grep/>;
-##
+Check some new algorithms for matching; talk to Karl Berry and Nelson.
+Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
+claim that his algorithm is faster than Boyer-More. Worth checking.
-POSIX Compliance: see p10003.x
+Fix the DFA matcher to never use exponential space. (Fortunately, these
+cases are rare.)
-In general, interesting things to check in POSIX/OpenGroup include:
+
+============================
+Standards: POSIX and Unicode
+============================
-Provide support for the POSIX [= =] and [. .] constructs. This is
-difficult because it requires locale-dependent details of the
-character set and collating sequence, but POSIX does not standardize
-any method for accessing this information!
+For POSIX compliance, see p10003.x. Current support for the POSIX [= =]
+and [. .] constructs is limited. This is difficult because it requires
+locale-dependent details of the character set and collating sequence,
+but POSIX does not standardize any method for accessing this information!
-Moving away from GNU regex API for POSIX regex API.
+For Unicode, interesting things to check include the Unicode Standard
+<http://www.unicode.org/standard/standard.html> and the Unicode Technical
+Standard #18 (<http://www.unicode.org/reports/tr18/> “Unicode Regular
+Expressions”). Talk to Bruno Haible who's mantaining GNU libunistring.
+See also Unicode Standard Annex #15 (<http://www.unicode.org/reports/tr15/>
+“Unicode Normalization Forms”), already implemented by GNU libunistring.
-##
+In particular, --ignore-case needs to be evaluated against the standards.
+We may want to deviate from POSIX if Unicode provides better or clearer
+semantics.
 POSIX and --ignore-case
+-----------------------
 For this issue, interesting things to check in POSIX include the Volume “Base Definitions (XBD)”, Chapter “Regular Expressions” and in
@@ -215,21 +222,9 @@ a composition of the two conversions.
 Any optimization in the implementation of each logic must not change its basic semantic.
-##
-
-In general, interesting things to check in Unicode include:
-
-The <http://www.unicode.org/standard/standard.html> Unicode Standard.
-
-Unicode Technical Standard #18 (<http://www.unicode.org/reports/tr18/>
-“Unicode Regular Expressions”).
-
-Unicode Standard Annex #15 (<http://www.unicode.org/reports/tr15/>
-“Unicode Normalization Forms”).
-
-##
 Unicode and --ignore-case
+-------------------------
 For this issue, interesting things to check in Unicode include: