summaryrefslogtreecommitdiff
path: root/TODO
blob: f9936c1739d06d008d9bc97e4984e686aecdbb38 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
  Copyright (C) 1992, 1997-2002, 2004-2010 Free Software Foundation, Inc.

  Copying and distribution of this file, with or without modification,
  are permitted in any medium without royalty provided the copyright
  notice and this notice are preserved.

Get sane performance with UTF-8 locales.

1) rewrite the configure.in script, perhaps also Makefile.am
2) set up for gnulib-tool --import
3) improve the test infrastructure
4) check in the patches for the sync of dfa.c with GNU awk
5) other small patches which wait for a test case
6) process the Fedora/Red Hat patches
7) some _minimal_ cleanup of the grep(), grepdir(), recursion
   (the "main loop") and fix --directories=read

##

Write better Texinfo documentation for grep.  The manual page would be a
good place to start, but Info documents are also supposed to contain a
tutorial and examples.

Fix the DFA matcher to never use exponential space.  (Fortunately, these
cases are rare.)

Improve the performance of the regex backtracking matcher.  This matcher
is agonizingly slow, and is responsible for grep sometimes being slower
than Unix grep when backreferences are used.

Provide support for the Posix [= =] and [. .] constructs.  This is
difficult because it requires locale-dependent details of the character
set and collating sequence, but Posix does not standardize any method
for accessing this information!

##

Some test in tests/spencer2.tests should have failed !!!
Need to filter out some bugs in dfa.[ch]/regex.[ch].

Threads for grep ?

GNU grep does 32 bits arithmetic, it needs to move to 64.

Clean up, too many #ifdef's !!

Check some new Algorithms for matching, talk to Karl Berry and Nelson.
Sunday's "Quick Search" Algorithm (CACM 33, 8 August 1990 pp. 132-142)
claim that his algo. is faster then Boyer-More ????
Worth Checking.

Check <http://tony.abou-assaleh.net/greps.html>.
Take a look at:
   -- cgrep (Context grep) seems like nice work;
   -- sgrep (Struct grep);
   -- agrep (Approximate grep), from glimpse;
   -- nr-grep (Nondeterministic reverse grep);
   -- ggrep (Grouse grep);
   -- grep.py (Python grep);
   -- lgrep (from lv, a Powerful Multilingual File Viewer / Grep);
   -- freegrep <http://www.vocito.com/downloads/software/grep/>;
   -- ja-grep's mlb2 patch (Japanese grep);
   -- pcregrep (from Perl-Compatible Regular Expressions library).
Can we merge ?

Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one binary.

POSIX Compliance see p10003.x

Moving away from GNU regex API for POSIX regex API.

Better and faster !!

##

Check POSIX:
   -- Volume "Base Definitions (XBD)",
      Chapter "Regular Expressions"
      and in particular
      Section "Regular Expression General Requirements"
      and its paragraph about caseless matching.

Check the Unicode Standard:
   -- Chapter 3 ("Conformance"),
      Section 3.13 ("Default Case Operations")
      and the toCasefold() case conversion operation;
   -- Chapter 4 ("Character Properties"),
      Section 4.2 ("Case -- Normative")
      and the SpecialCasing.txt and CaseFolding.txt
      files from the Unicode Database;
   -- Chapter 5 ("Implementation Guidelines"),
      Section 5.18 ("Case Mappings"),
      Subsection "Caseless Matching".

Check Unicode Technical Standard #18 ("Unicode Regular Expressions").
Check Unicode Standard Annex #15 ("Unicode Normalization Forms").

##

Before every release:
   -- drop dfa.[ch] into a copy of gawk and run "make check";
   -- send pot file to the Translation Project to get fresh po files;
   -- get up-to-date version of ABOUT-NLS;
   -- update NEWS.