Diffstat (limited to 'ghc/runtime/regex/ChangeLog')
-rw-r--r-- | ghc/runtime/regex/ChangeLog | 3041 |
1 files changed, 3041 insertions, 0 deletions
diff --git a/ghc/runtime/regex/ChangeLog b/ghc/runtime/regex/ChangeLog new file mode 100644 index 0000000000..c16096a838 --- /dev/null +++ b/ghc/runtime/regex/ChangeLog @@ -0,0 +1,3041 @@ +Tue Apr 25 10:51:27 1995 Sigbjorn Finne <sof@dcs.gla.ac.uk> + + * Merged in the regex.c and regex.h of gawk-2.15.6, following a + suggestion on gnu.utils.bugs + + * regex.h: Added defines for Perl syntax, RE_PERL_MULTILINE_SYNTAX + and RE_PERL_SINGLELINE_SYNTAX + + * regex.c (regex_compile): Added handling of Perl operators, + nothing exciting - just different syntax for common operators. + +Fri Apr 2 17:31:59 1993 Jim Blandy (jimb@totoro.cs.oberlin.edu) + + * Released version 0.12. + + * regex.c (regerror): If errcode is zero, that's not a valid + error code, according to POSIX, but return "Success." + + * regex.c (regerror): Remember to actually fetch the message + from re_error_msg. + + * regex.c (regex_compile): Don't use the trick for ".*\n" on + ".+\n". Since the latter involves laying an extra choice + point, the backward jump isn't adjusted properly. + +Thu Mar 25 21:35:18 1993 Jim Blandy (jimb@totoro.cs.oberlin.edu) + + * regex.c (regex_compile): In the handle_open and handle_close + sections, clear pending_exact to zero. + +Tue Mar 9 12:03:07 1993 Jim Blandy (jimb@wookumz.gnu.ai.mit.edu) + + * regex.c (re_search_2): In the loop which searches forward + using fastmap, don't forget to cast the character from the + string to an unsigned before using it as an index into the + translate map. + +Thu Jan 14 15:41:46 1993 David J. MacKenzie (djm@kropotkin.gnu.ai.mit.edu) + + * regex.h: Never define const; let the callers do it. + configure.in: Don't define USING_AUTOCONF. + +Wed Jan 6 20:49:29 1993 Jim Blandy (jimb@geech.gnu.ai.mit.edu) + + * regex.c (regerror): Abort if ERRCODE is out of range. + +Sun Dec 20 16:19:10 1992 Jim Blandy (jimb@totoro.cs.oberlin.edu) + + * configure.in: Arrange to #define USING_AUTOCONF. 
+ * regex.h: If USING_AUTOCONF is #defined, don't mess with + `const' at all; autoconf has taken care of it. + +Mon Dec 14 21:40:39 1992 David J. MacKenzie (djm@kropotkin.gnu.ai.mit.edu) + + * regex.h (RE_SYNTAX_AWK): Fix typo. From Arnold Robbins. + +Sun Dec 13 20:35:39 1992 Jim Blandy (jimb@totoro.cs.oberlin.edu) + + * regex.c (compile_range): Fetch the range start and end by + casting the pattern pointer to an `unsigned char *' before + fetching through it. + +Sat Dec 12 09:41:01 1992 Jim Blandy (jimb@totoro.cs.oberlin.edu) + + * regex.c: Undo change of 12/7/92; it's better for Emacs to + #define HAVE_CONFIG_H. + +Fri Dec 11 22:00:34 1992 Jim Meyering (meyering@hal.gnu.ai.mit.edu) + + * regex.c: Define and use isascii-protected ctype.h macros. + +Fri Dec 11 05:10:38 1992 Jim Blandy (jimb@totoro.cs.oberlin.edu) + + * regex.c (re_match_2): Undo Karl's November 10th change; it + keeps the group in :\(.*\) from matching :/ properly. + +Mon Dec 7 19:44:56 1992 Jim Blandy (jimb@wookumz.gnu.ai.mit.edu) + + * regex.c: #include config.h if either HAVE_CONFIG_H or emacs + is #defined. + +Tue Dec 1 13:33:17 1992 David J. MacKenzie (djm@goldman.gnu.ai.mit.edu) + + * regex.c [HAVE_CONFIG_H]: Include config.h. + +Wed Nov 25 23:46:02 1992 David J. MacKenzie (djm@goldman.gnu.ai.mit.edu) + + * regex.c (regcomp): Add parens around bitwise & for clarity. + Initialize preg->allocated to prevent segv. + +Tue Nov 24 09:22:29 1992 David J. MacKenzie (djm@goldman.gnu.ai.mit.edu) + + * regex.c: Use HAVE_STRING_H, not USG. + * configure.in: Check for string.h, not USG. + +Fri Nov 20 06:33:24 1992 Karl Berry (karl@cs.umb.edu) + + * regex.c (SIGN_EXTEND_CHAR) [VMS]: Back out of this change, + since Roland Roberts now says it was a localism. + +Mon Nov 16 07:01:36 1992 Karl Berry (karl@cs.umb.edu) + + * regex.h (const) [!HAVE_CONST]: Test another cpp symbol (from + Autoconf) before zapping const. 
+ +Sun Nov 15 05:36:42 1992 Jim Blandy (jimb@wookumz.gnu.ai.mit.edu) + + * regex.c, regex.h: Changes for VMS from Roland B Roberts + <roberts@nsrl31.nsrl.rochester.edu>. + +Thu Nov 12 11:31:15 1992 Karl Berry (karl@cs.umb.edu) + + * Makefile.in (distfiles): Include INSTALL. + +Tue Nov 10 09:29:23 1992 Karl Berry (karl@cs.umb.edu) + + * regex.c (re_match_2): At maybe_pop_jump, if at end of string + and pattern, just quit the matching loop. + + * regex.c (LETTER_P): Rename to `WORDCHAR_P'. + + * regex.c (AT_STRINGS_{BEG,END}): Take `d' as an arg; change + callers. + + * regex.c (re_match_2) [!emacs]: In wordchar and notwordchar + cases, advance d. + +Wed Nov 4 15:43:58 1992 Karl Berry (karl@hal.gnu.ai.mit.edu) + + * regex.h (const) [!__STDC__]: Don't define if it's already defined. + +Sat Oct 17 19:28:19 1992 Karl Berry (karl@cs.umb.edu) + + * regex.c (bcmp, bcopy, bzero): Only #define if they are not + already #defined. + + * configure.in: Use AC_CONST. + +Thu Oct 15 08:39:06 1992 Karl Berry (karl@cs.umb.edu) + + * regex.h (const) [!const]: Conditionalize. + +Fri Oct 2 13:31:42 1992 Karl Berry (karl@cs.umb.edu) + + * regex.h (RE_SYNTAX_ED): New definition. + +Sun Sep 20 12:53:39 1992 Karl Berry (karl@cs.umb.edu) + + * regex.[ch]: remove traces of `longest_p' -- dumb idea to put + this into the pattern buffer, as it means parallelism loses. + + * Makefile.in (config.status): use sh to run configure --no-create. + + * Makefile.in (realclean): OK, don't remove configure. + +Sat Sep 19 09:05:08 1992 Karl Berry (karl@hayley) + + * regex.c (PUSH_FAILURE_POINT, POP_FAILURE_POINT) [DEBUG]: keep + track of how many failure points we push and pop. + (re_match_2) [DEBUG]: declare variables for that, and print results. + (DEBUG_PRINT4): new macro. + + * regex.h (re_pattern_buffer): new field `longest_p' (to + eliminate backtracking if the user doesn't need it). + * regex.c (re_compile_pattern): initialize it (to 1). 
+ (re_search_2): set it to zero if register information is not needed. + (re_match_2): if it's set, don't backtrack. + + * regex.c (re_search_2): update fastmap only after checking that + the pattern is anchored. + + * regex.c (re_match_2): do more debugging at maybe_pop_jump. + + * regex.c (re_search_2): cast result of TRANSLATE for use in + array subscript. + +Thu Sep 17 19:47:16 1992 Karl Berry (karl@geech.gnu.ai.mit.edu) + + * Version 0.11. + +Wed Sep 16 08:17:10 1992 Karl Berry (karl@hayley) + + * regex.c (INIT_FAIL_STACK): rewrite as statements instead of a + complicated comma expr, to avoid compiler warnings (and also + simplify). + (re_compile_fastmap, re_match_2): change callers. + + * regex.c (POP_FAILURE_POINT): cast pop of regstart and regend + to avoid compiler warnings. + + * regex.h (RE_NEWLINE_ORDINARY): remove this syntax bit, and + remove uses. + * regex.c (at_{beg,end}line_loc_p): go the last mile: remove + the RE_NEWLINE_ORDINARY case which made the ^ in \n^ be an anchor. + +Tue Sep 15 09:55:29 1992 Karl Berry (karl@hayley) + + * regex.c (at_begline_loc_p): new fn. + (at_endline_loc_p): simplify at_endline_op_p. + (regex_compile): in ^/$ cases, call the above. + + * regex.c (POP_FAILURE_POINT): rewrite the fn as a macro again, + as lord's profiling indicates the function is 20% of the time. + (re_match_2): callers changed. + + * configure.in (AC_MEMORY_H): remove, since we never use memcpy et al. + +Mon Sep 14 17:49:27 1992 Karl Berry (karl@hayley) + + * Makefile.in (makeargs): include MFLAGS. + +Sun Sep 13 07:41:45 1992 Karl Berry (karl@hayley) + + * regex.c (regex_compile): in \1..\9 case, make it always + invalid to use \<digit> if there is no preceding <digit>th subexpr. + * regex.h (RE_NO_MISSING_BK_REF): remove this syntax bit. + + * regex.c (regex_compile): remove support for invalid empty groups. + * regex.h (RE_NO_EMPTY_GROUPS): remove this syntax bit. 
+ + * regex.c (FREE_VARIABLES) [!REGEX_MALLOC]: define as alloca (0), + to reclaim memory. + + * regex.h (RE_SYNTAX_POSIX_SED): don't bother with this. + +Sat Sep 12 13:37:21 1992 Karl Berry (karl@hayley) + + * README: incorporate emacs.diff. + + * regex.h (_RE_ARGS) [!__STDC__]: define as empty parens. + + * configure.in: add AC_ALLOCA. + + * Put test files in subdir test, documentation in subdir doc. + Adjust Makefile.in and configure.in accordingly. + +Thu Sep 10 10:29:11 1992 Karl Berry (karl@hayley) + + * regex.h (RE_SYNTAX_{POSIX_,}SED): new definitions. + +Wed Sep 9 06:27:09 1992 Karl Berry (karl@hayley) + + * Version 0.10. + +Tue Sep 8 07:32:30 1992 Karl Berry (karl@hayley) + + * xregex.texinfo: put the day of month into the date. + + * Makefile.in (realclean): remove Texinfo-generated files. + (distclean): remove empty sorted index files. + (clean): remove dvi files, etc. + + * configure.in: test for more Unix variants. + + * fileregex.c: new file. + Makefile.in (fileregex): new target. + + * iregex.c (main): move variable decls to smallest scope. + + * regex.c (FREE_VARIABLES): free reg_{,info_}dummy. + (re_match_2): check that the allocation for those two succeeded. + + * regex.c (FREE_VAR): replace FREE_NONNULL with this. + (FREE_VARIABLES): call it. + (re_match_2) [REGEX_MALLOC]: initialize all our vars to NULL. + + * tregress.c (do_match): generalize simple_match. + (SIMPLE_NONMATCH): new macro. + (SIMPLE_MATCH): change from routine. + + * Makefile.in (regex.texinfo): make file readonly, so we don't + edit it by mistake. + + * many files (re_default_syntax): rename to `re_syntax_options'; + call re_set_syntax instead of assigning to the variable where + possible. + +Mon Sep 7 10:12:16 1992 Karl Berry (karl@hayley) + + * syntax.skel: don't use prototypes. + + * {configure,Makefile}.in: new files. + + * regex.c: include <string.h> `#if USG || STDC_HEADERS'; remove + obsolete test for `POSIX', and test for BSRTING. 
+ Include <strings.h> if we are not USG or STDC_HEADERS. + Do not include <unistd.h>. What did we ever need that for? + + * regex.h (RE_NO_EMPTY_ALTS): remove this. + (RE_SYNTAX_AWK): remove from here, too. + * regex.c (regex_compile): remove the check. + * xregex.texinfo (Alternation Operator): update. + * other.c (test_others): remove tests for this. + + * regex.h (RE_DUP_MAX): undefine if already defined. + + * regex.h: (RE_SYNTAX_POSIX*): redo to allow more operators, and + define new syntaxes with the minimal set. + + * syntax.skel (main): used sscanf instead of scanf. + + * regex.h (RE_SYNTAX_*GREP): new definitions from mike. + + * regex.c (regex_compile): initialize the upper bound of + intervals at the beginning of the interval, not the end. + (From pclink@qld.tne.oz.au.) + + * regex.c (handle_bar): rename to `handle_alt', for consistency. + + * regex.c ({store,insert}_{op1,op2}): new routines (except the last). + ({STORE,INSERT}_JUMP{,2}): macros to replace the old routines, + which took arguments in different orders, and were generally weird. + + * regex.c (PAT_PUSH*): rename to `BUF_PUSH*' -- we're not + appending info to the pattern! + +Sun Sep 6 11:26:49 1992 Karl Berry (karl@hayley) + + * regex.c (regex_compile): delete the variable + `following_left_brace', since we never use it. + + * regex.c (print_compiled_pattern): don't print the fastmap if + it's null. + + * regex.c (re_compile_fastmap): handle + `on_failure_keep_string_jump' like `on_failure_jump'. + + * regex.c (re_match_2): in `charset{,_not' case, cast the bit + count to unsigned, not unsigned char, in case we have a full + 32-byte bit list. + + * tregress.c (simple_match): remove. + (simple_test): rename as `simple_match'. + (simple_compile): print the error string if the compile failed. + + * regex.c (DO_RANGE): rewrite as a function, `compile_range', so + we can debug it. Change pattern characters to unsigned char + *'s, and change the range variable to an unsigned. 
+ (regex_compile): change calls. + +Sat Sep 5 17:40:49 1992 Karl Berry (karl@hayley) + + * regex.h (_RE_ARGS): new macro to put in argument lists (if + ANSI) or omit them (if K&R); don't declare routines twice. + + * many files (obscure_syntax): rename to `re_default_syntax'. + +Fri Sep 4 09:06:53 1992 Karl Berry (karl@hayley) + + * GNUmakefile (extraclean): new target. + (realclean): delete the info files. + +Wed Sep 2 08:14:42 1992 Karl Berry (karl@hayley) + + * regex.h: doc fix. + +Sun Aug 23 06:53:15 1992 Karl Berry (karl@hayley) + + * regex.[ch] (re_comp): no const in the return type (from djm). + +Fri Aug 14 07:25:46 1992 Karl Berry (karl@hayley) + + * regex.c (DO_RANGE): declare variables as unsigned chars, not + signed chars (from jimb). + +Wed Jul 29 18:33:53 1992 Karl Berry (karl@claude.cs.umb.edu) + + * Version 0.9. + + * GNUmakefile (distclean): do not remove regex.texinfo. + (realclean): remove it here. + + * tregress.c (simple_test): initialize buf.buffer. + +Sun Jul 26 08:59:38 1992 Karl Berry (karl@hayley) + + * regex.c (push_dummy_failure): new opcode and corresponding + case in the various routines. Pushed at the end of + alternatives. + + * regex.c (jump_past_next_alt): rename to `jump_past_alt', for + brevity. + (no_pop_jump): rename to `jump'. + + * regex.c (regex_compile) [DEBUG]: terminate printing of pattern + with a newline. + + * NEWS: new file. + + * tregress.c (simple_{compile,match,test}): routines to simplify all + these little tests. + + * tregress.c: test for matching as much as possible. + +Fri Jul 10 06:53:32 1992 Karl Berry (karl@hayley) + + * Version 0.8. + +Wed Jul 8 06:39:31 1992 Karl Berry (karl@hayley) + + * regex.c (SIGN_EXTEND_CHAR): #undef any previous definition, as + ours should always work properly. + +Mon Jul 6 07:10:50 1992 Karl Berry (karl@hayley) + + * iregex.c (main) [DEBUG]: conditionalize the call to + print_compiled_pattern. + + * iregex.c (main): initialize buf.buffer to NULL. 
+ * tregress (test_regress): likewise. + + * regex.c (alloca) [sparc]: #if on HAVE_ALLOCA_H instead. + + * tregress.c (test_regress): didn't have jla's test quite right. + +Sat Jul 4 09:02:12 1992 Karl Berry (karl@hayley) + + * regex.c (re_match_2): only REGEX_ALLOCATE all the register + vectors if the pattern actually has registers. + (match_end): new variable to avoid having to use best_regend[0]. + + * regex.c (IS_IN_FIRST_STRING): rename to FIRST_STRING_P. + + * regex.c: doc fixes. + + * tregess.c (test_regress): new fastmap test forwarded by rms. + + * tregress.c (test_regress): initialize the fastmap field. + + * tregress.c (test_regress): new test from jla that aborted + in re_search_2. + +Fri Jul 3 09:10:05 1992 Karl Berry (karl@hayley) + + * tregress.c (test_regress): add tests for translating charsets, + from kaoru. + + * GNUmakefile (common): add alloca.o. + * alloca.c: new file, copied from bison. + + * other.c (test_others): remove var `buf', since it's no longer used. + + * Below changes from ro@TechFak.Uni-Bielefeld.DE. + + * tregress.c (test_regress): initialize buf.allocated. + + * regex.c (re_compile_fastmap): initialize `succeed_n_p'. + + * GNUmakefile (regex): depend on $(common). + +Wed Jul 1 07:12:46 1992 Karl Berry (karl@hayley) + + * Version 0.7. + + * regex.c: doc fixes. + +Mon Jun 29 08:09:47 1992 Karl Berry (karl@fosse) + + * regex.c (pop_failure_point): change string vars to + `const char *' from `unsigned char *'. + + * regex.c: consolidate debugging stuff. + (print_partial_compiled_pattern): avoid enum clash. + +Mon Jun 29 07:50:27 1992 Karl Berry (karl@hayley) + + * xmalloc.c: new file. + * GNUmakefile (common): add it. + + * iregex.c (print_regs): new routine (from jimb). + (main): call it. + +Sat Jun 27 10:50:59 1992 Jim Blandy (jimb@pogo.cs.oberlin.edu) + + * xregex.c (re_match_2): When we have accepted a match and + restored d from best_regend[0], we need to set dend + appropriately as well. 
+ +Sun Jun 28 08:48:41 1992 Karl Berry (karl@hayley) + + * tregress.c: rename from regress.c. + + * regex.c (print_compiled_pattern): improve charset case to ease + byte-counting. + Also, don't distinguish between Emacs and non-Emacs + {not,}wordchar opcodes. + + * regex.c (print_fastmap): move here. + * test.c: from here. + * regex.c (print_{{partial,}compiled_pattern,double_string}): + rename from ..._printer. Change calls here and in test.c. + + * regex.c: create from xregex.c and regexinc.c for once and for + all, and change the debug fns to be extern, instead of static. + * GNUmakefile: remove traces of xregex.c. + * test.c: put in externs, instead of including regexinc.c. + + * xregex.c: move interactive main program and scanstring to iregex.c. + * iregex.c: new file. + * upcase.c, printchar.c: new files. + + * various doc fixes and other cosmetic changes throughout. + + * regexinc.c (compiled_pattern_printer): change variable name, + for consistency. + (partial_compiled_pattern_printer): print other info about the + compiled pattern, besides just the opcodes. + * xregex.c (regex_compile) [DEBUG]: print the compiled pattern + when we're done. + + * xregex.c (re_compile_fastmap): in the duplicate case, set + `can_be_null' and return. + Also, set `bufp->can_be_null' according to a new variable, + `path_can_be_null'. + Also, rewrite main while loop to not test `p != NULL', since + we never set it that way. + Also, eliminate special `can_be_null' value for the endline case. + (re_search_2): don't test for the special value. + * regex.h (struct re_pattern_buffer): remove the definition. + +Sat Jun 27 15:00:40 1992 Karl Berry (karl@hayley) + + * xregex.c (re_compile_fastmap): remove the `RE_' from + `REG_RE_MATCH_NULL_AT_END'. + Also, assert the fastmap in the pattern buffer is non-null. + Also, reset `succeed_n_p' after we've + paid attention to it, instead of every time through the loop. 
+ Also, in the `anychar' case, only clear fastmap['\n'] if the + syntax says to, and don't return prematurely. + Also, rearrange cases in some semblance of a rational order. + * regex.h (REG_RE_MATCH_NULL_AT_END): remove the `RE_' from the name. + + * other.c: take bug reports from here. + * regress.c: new file for them. + * GNUmakefile (test): add it. + * main.c (main): new possible test. + * test.h (test_type): new value in enum. + +Thu Jun 25 17:37:43 1992 Karl Berry (karl@hayley) + + * xregex.c (scanstring) [test]: new function from jimb to allow some + escapes. + (main) [test]: call it (on the string, not the pattern). + + * xregex.c (main): make return type `int'. + +Wed Jun 24 10:43:03 1992 Karl Berry (karl@hayley) + + * xregex.c (pattern_offset_t): change to `int', for the benefit + of patterns which compile to more than 2^15 bytes. + + * xregex.c (GET_BUFFER_SPACE): remove spurious braces. + + * xregex.texinfo (Using Registers): put in a stub to ``document'' + the new function. + * regex.h (re_set_registers) [!__STDC__]: declare. + * xregex.c (re_set_registers): declare K&R style (also move to a + different place in the file). + +Mon Jun 8 18:03:28 1992 Jim Blandy (jimb@pogo.cs.oberlin.edu) + + * regex.h (RE_NREGS): Doc fix. + + * xregex.c (re_set_registers): New function. + * regex.h (re_set_registers): Declaration for new function. + +Fri Jun 5 06:55:18 1992 Karl Berry (karl@hayley) + + * main.c (main): `return 0' instead of `exit (0)'. (From Paul Eggert) + + * regexinc.c (SIGN_EXTEND_CHAR): cast to unsigned char. + (extract_number, EXTRACT_NUMBER): don't bother to cast here. + +Tue Jun 2 07:37:53 1992 Karl Berry (karl@hayley) + + * Version 0.6. + + * Change copyrights to `1985, 89, ...'. + + * regex.h (REG_RE_MATCH_NULL_AT_END): new macro. + * xregex.c (re_compile_fastmap): initialize `can_be_null' to + `p==pend', instead of in the test at the top of the loop (as + it was, it was always being set). 
+ Also, set `can_be_null'=1 if we would jump to the end of the + pattern in the `on_failure_jump' cases. + (re_search_2): check if `can_be_null' is 1, not nonzero. This + was the original test in rms' regex; why did we change this? + + * xregex.c (re_compile_fastmap): rename `is_a_succeed_n' to + `succeed_n_p'. + +Sat May 30 08:09:08 1992 Karl Berry (karl@hayley) + + * xregex.c (re_compile_pattern): declare `regnum' as `unsigned', + not `regnum_t', for the benefit of those patterns with more + than 255 groups. + + * xregex.c: rename `failure_stack' to `fail_stack', for brevity; + likewise for `match_nothing' to `match_null'. + + * regexinc.c (REGEX_REALLOCATE): take both the new and old + sizes, and copy only the old bytes. + * xregex.c (DOUBLE_FAILURE_STACK): pass both old and new. + * This change from Thorsten Ohl. + +Fri May 29 11:45:22 1992 Karl Berry (karl@hayley) + + * regexinc.c (SIGN_EXTEND_CHAR): define as `(signed char) c' + instead of relying on __CHAR_UNSIGNED__, to work with + compilers other than GCC. From Per Bothner. + + * main.c (main): change return type to `int'. + +Mon May 18 06:37:08 1992 Karl Berry (karl@hayley) + + * regex.h (RE_SYNTAX_AWK): typo in RE_RE_UNMATCHED... + +Fri May 15 10:44:46 1992 Karl Berry (karl@hayley) + + * Version 0.5. + +Sun May 3 13:54:00 1992 Karl Berry (karl@hayley) + + * regex.h (struct re_pattern_buffer): now it's just `regs_allocated'. + (REGS_UNALLOCATED, REGS_REALLOCATE, REGS_FIXED): new constants. + * xregex.c (regexec, re_compile_pattern): set the field appropriately. + (re_match_2): and use it. bufp can't be const any more. + +Fri May 1 15:43:09 1992 Karl Berry (karl@hayley) + + * regexinc.c: unconditionally include <sys/types.h>, first. + + * regex.h (struct re_pattern_buffer): rename + `caller_allocated_regs' to `regs_allocated_p'. + * xregex.c (re_compile_pattern): same change here. + (regexec): and here. + (re_match_2): reallocate registers if necessary. 
+ +Fri Apr 10 07:46:50 1992 Karl Berry (karl@hayley) + + * regex.h (RE_SYNTAX{_POSIX,}_AWK): new definitions from Arnold. + +Sun Mar 15 07:34:30 1992 Karl Berry (karl at hayley) + + * GNUmakefile (dist): versionize regex.{c,h,texinfo}. + +Tue Mar 10 07:05:38 1992 Karl Berry (karl at hayley) + + * Version 0.4. + + * xregex.c (PUSH_FAILURE_POINT): always increment the failure id. + (DEBUG_STATEMENT) [DEBUG]: execute the statement even if `debug'==0. + + * xregex.c (pop_failure_point): if the saved string location is + null, keep the current value. + (re_match_2): at fail, test for a dummy failure point by + checking the restored pattern value, not string value. + (re_match_2): new case, `on_failure_keep_string_jump'. + (regex_compile): output this opcode in the .*\n case. + * regexinc.c (re_opcode_t): define the opcode. + (partial_compiled_pattern_pattern): add the new case. + +Mon Mar 9 09:09:27 1992 Karl Berry (karl at hayley) + + * xregex.c (regex_compile): optimize .*\n to output an + unconditional jump to the ., instead of pushing failure points + each time through the loop. + + * xregex.c (DOUBLE_FAILURE_STACK): compute the maximum size + ourselves (and correctly); change callers. + +Sun Mar 8 17:07:46 1992 Karl Berry (karl at hayley) + + * xregex.c (failure_stack_elt_t): change to `const char *', to + avoid warnings. + + * regex.h (re_set_syntax): declare this. + + * xregex.c (pop_failure_point) [DEBUG]: conditionally pass the + original strings and sizes; change callers. + +Thu Mar 5 16:35:35 1992 Karl Berry (karl at claude.cs.umb.edu) + + * xregex.c (regnum_t): new type for register/group numbers. + (compile_stack_elt_t, regex_compile): use it. + + * xregex.c (regexec): declare len as `int' to match re_search. + + * xregex.c (re_match_2): don't declare p1 twice. + + * xregex.c: change `while (1)' to `for (;;)' to avoid silly + compiler warnings. + + * regex.h [__STDC__]: use #if, not #ifdef. 
+ + * regexinc.c (REGEX_REALLOCATE): cast the result of alloca to + (char *), to avoid warnings. + + * xregex.c (regerror): declare variable as const. + + * xregex.c (re_compile_pattern, re_comp): define as returning a const + char *. + * regex.h (re_compile_pattern, re_comp): likewise. + +Thu Mar 5 15:57:56 1992 Karl Berry (karl@hal) + + * xregex.c (regcomp): declare `syntax' as unsigned. + + * xregex.c (re_match_2): try to avoid compiler warnings about + unsigned comparisons. + + * GNUmakefile (test-xlc): new target. + + * regex.h (reg_errcode_t): remove trailing comma from definition. + * regexinc.c (re_opcode_t): likewise. + +Thu Mar 5 06:56:07 1992 Karl Berry (karl at hayley) + + * GNUmakefile (dist): add version numbers automatically. + (versionfiles): new variable. + (regex.{c,texinfo}): don't add version numbers here. + * regex.h: put in placeholder instead of the version number. + +Fri Feb 28 07:11:33 1992 Karl Berry (karl at hayley) + + * xregex.c (re_error_msg): declare const, since it is. + +Sun Feb 23 05:41:57 1992 Karl Berry (karl at fosse) + + * xregex.c (PAT_PUSH{,_2,_3}, ...): cast args to avoid warnings. + (regex_compile, regexec): return REG_NOERROR, instead + of 0, on success. + (boolean): define as char, and #define false and true. + * regexinc.c (STREQ): cast the result. + +Sun Feb 23 07:45:38 1992 Karl Berry (karl at hayley) + + * GNUmakefile (test-cc, test-hc, test-pcc): new targets. + + * regex.inc (extract_number, extract_number_and_incr) [DEBUG]: + only define if we are debugging. + + * xregex.c [_AIX]: do #pragma alloca first if necessary. + * regexinc.c [_AIX]: remove the #pragma from here. + + * regex.h (reg_syntax_t): declare as unsigned, and redo the enum + as #define's again. Some compilers do stupid things with enums. + +Thu Feb 20 07:19:47 1992 Karl Berry (karl at hayley) + + * Version 0.3. + + * xregex.c, regex.h (newline_anchor_match_p): rename to + `newline_anchor'; dumb idea to change the name. 
+ +Tue Feb 18 07:09:02 1992 Karl Berry (karl at hayley) + + * regexinc.c: go back to original, i.e., don't include + <string.h> or define strchr. + * xregex.c (regexec): don't bother with adding characters after + newlines to the fastmap; instead, just don't use a fastmap. + * xregex.c (regcomp): set the buffer and fastmap fields to zero. + + * xregex.texinfo (GNU r.e. compiling): have to initialize more + than two fields. + + * regex.h (struct re_pattern_buffer): rename `newline_anchor' to + `newline_anchor_match_p', as we're back to two cases. + * xregex.c (regcomp, re_compile_pattern, re_comp): change + accordingly. + (re_match_2): at begline and endline, POSIX is not a special + case anymore; just check newline_anchor_match_p. + +Thu Feb 13 16:29:33 1992 Karl Berry (karl at hayley) + + * xregex.c (*empty_string*): rename to *null_string*, for brevity. + +Wed Feb 12 06:36:22 1992 Karl Berry (karl at hayley) + + * xregex.c (re_compile_fastmap): at endline, don't set fastmap['\n']. + (re_match_2): rewrite the begline/endline cases to take account + of the new field newline_anchor. + +Tue Feb 11 14:34:55 1992 Karl Berry (karl at hayley) + + * regexinc.c [!USG etc.]: include <strings.h> and define strchr + as index. + + * xregex.c (re_search_2): when searching backwards, declare `c' + as a char and use casts when using it as an array subscript. + + * xregex.c (regcomp): if REG_NEWLINE, set + RE_HAT_LISTS_NOT_NEWLINE. Set the `newline_anchor' field + appropriately. + (regex_compile): compile [^...] as matching a \n according to + the syntax bit. + (regexec): if doing REG_NEWLINE stuff, compile a fastmap and add + characters after any \n's to the newline. + * regex.h (RE_HAT_LISTS_NOT_NEWLINE): new syntax bit. + (struct re_pattern_buffer): rename `posix_newline' to + `newline_anchor', define constants for its values. 
+ +Mon Feb 10 07:22:50 1992 Karl Berry (karl at hayley) + + * xregex.c (re_compile_fastmap): combine the code at the top and + bottom of the loop, as it's essentially identical. + +Sun Feb 9 10:02:19 1992 Karl Berry (karl at hayley) + + * xregex.texinfo (POSIX Translate Tables): remove this, as it + doesn't match the spec. + + * xregex.c (re_compile_fastmap): if we finish off a path, go + back to the top (to set can_be_null) instead of returning + immediately. + + * xregex.texinfo: changes from bob. + +Sat Feb 1 07:03:25 1992 Karl Berry (karl at hayley) + + * xregex.c (re_search_2): doc fix (from rms). + +Fri Jan 31 09:52:04 1992 Karl Berry (karl at hayley) + + * xregex.texinfo (GNU Searching): clarify the range arg. + + * xregex.c (re_match_2, at_endline_op_p): add extra parens to + get rid of GCC 2's (silly, IMHO) warning about && within ||. + + * xregex.c (common_op_match_empty_string_p): use + MATCH_NOTHING_UNSET_VALUE, not -1. + +Thu Jan 16 08:43:02 1992 Karl Berry (karl at hayley) + + * xregex.c (SET_REGS_MATCHED): only set the registers from + lowest to highest. + + * regexinc.c (MIN): new macro. + * xregex.c (re_match_2): only check min (num_regs, + regs->num_regs) when we set the returned regs. + + * xregex.c (re_match_2): set registers after the first + num_regs to -1 before we return. + +Tue Jan 14 16:01:42 1992 Karl Berry (karl at hayley) + + * xregex.c (re_match_2): initialize max (RE_NREGS, re_nsub + 1) + registers (from rms). + + * xregex.c, regex.h: don't abbreviate `19xx' to `xx'. + + * regexinc.c [!emacs]: include <sys/types.h> before <unistd.h>. + (from ro@thp.Uni-Koeln.DE). + +Thu Jan 9 07:23:00 1992 Karl Berry (karl at hayley) + + * xregex.c (*unmatchable): rename to `match_empty_string_p'. + (CAN_MATCH_NOTHING): rename to `REG_MATCH_EMPTY_STRING_P'. + + * regexinc.c (malloc, realloc): remove prototypes, as they can + cause clashes (from rms). + +Mon Jan 6 12:43:24 1992 Karl Berry (karl at claude.cs.umb.edu) + + * Version 0.2. 
+ +Sun Jan 5 10:50:38 1992 Karl Berry (karl at hayley) + + * xregex.texinfo: bring more or less up-to-date. + * GNUmakefile (regex.texinfo): generate from regex.h and + xregex.texinfo. + * include.awk: new file. + + * xregex.c: change all calls to the fn extract_number_and_incr + to the macro. + + * xregex.c (re_match_2) [emacs]: in at_dot, use PTR_CHAR_POS + 1, + instead of bf_* and sl_*. Cast d to unsigned char *, to match + the declaration in Emacs' buffer.h. + [emacs19]: in before_dot, at_dot, and after_dot, likewise. + + * regexinc.c: unconditionally include <sys/types.h>. + + * regexinc.c (alloca) [!alloca]: Emacs config files sometimes + define this, so don't define it if it's already defined. + +Sun Jan 5 06:06:53 1992 Karl Berry (karl at fosse) + + * xregex.c (re_comp): fix type conflicts with regex_compile (we + haven't been compiling this). + + * regexinc.c (SIGN_EXTEND_CHAR): use `__CHAR_UNSIGNED__', not + `CHAR_UNSIGNED'. + + * regexinc.c (NULL) [!NULL]: define it (as zero). + + * regexinc.c (extract_number): remove the temporaries. + +Sun Jan 5 07:50:14 1992 Karl Berry (karl at hayley) + + * regex.h (regerror) [!__STDC__]: return a size_t, not a size_t *. + + * xregex.c (PUSH_FAILURE_POINT, ...): declare `destination' as + `char *' instead of `void *', to match alloca declaration. + + * xregex.c (regerror): use `size_t' for the intermediate values + as well as the return type. + + * xregex.c (regexec): cast the result of malloc. + + * xregex.c (regexec): don't initialize `private_preg' in the + declaration, as old C compilers can't do that. + + * xregex.c (main) [test]: declare printchar void. + + * xregex.c (assert) [!DEBUG]: define this to do nothing, and + remove #ifdef DEBUG's from around asserts. + + * xregex.c (re_match_2): remove error message when not debugging. + +Sat Jan 4 09:45:29 1992 Karl Berry (karl at hayley) + + * other.c: test the bizarre duplicate case in re_compile_fastmap + that I just noticed. 
+ + * test.c (general_test): don't test registers beyond the end of + correct_regs, as well as regs. + + * xregex.c (regex_compile): at handle_close, don't assign to + *inner_group_loc if we didn't push a start_memory (because the + group number was too big). In fact, don't push or pop the + inner_group_offset in that case. + + * regex.c: rename to xregex.c, since it's not the whole thing. + * regex.texinfo: likewise. + * GNUmakefile: change to match. + + * regex.c [DEBUG]: only include <stdio.h> if debugging. + + * regexinc.c (SIGN_EXTEND_CHAR) [CHAR_UNSIGNED]: if it's already + defined, don't redefine it. + + * regex.c: define _GNU_SOURCE at the beginning. + * regexinc.c (isblank) [!isblank]: define it. + (isgraph) [!isgraph]: change conditional to this, and remove the + sequent stuff. + + * regex.c (regex_compile): add `blank' character class. + + * regex.c (regex_compile): don't use a uchar variable to loop + through all characters. + + * regex.c (regex_compile): at '[', improve logic for checking + that we have enough space for the charset. + + * regex.h (struct re_pattern_buffer): declare translate as char + * again. We only use it as an array subscript once, I think. + + * regex.c (TRANSLATE): new macro to cast the data character + before subscripting. + (num_internal_regs): rename to `num_regs'. + +Fri Jan 3 07:58:01 1992 Karl Berry (karl at hayley) + + * regex.h (struct re_pattern_buffer): declare `allocated' and + `used' as unsigned long, since these are never negative. + + * regex.c (compile_stack_element): rename to compile_stack_elt_t. + (failure_stack_element): similarly. + + * regexinc.c (TALLOC, RETALLOC): new macros to simplify + allocation of arrays. + + * regex.h (re_*) [__STDC__]: don't declare string args unsigned + char *; that makes them incompatible with string constants. + (struct re_pattern_buffer): declare the pattern and translate + table as unsigned char *. + * regex.c (most routines): use unsigned char vs. char consistently. 
+ + * regex.h (re_compile_pattern): do not declare the length arg as + const. + * regex.c (re_compile_pattern): likewise. + + * regex.c (POINTER_TO_REG): rename to `POINTER_TO_OFFSET'. + + * regex.h (re_registers): declare `start' and `end' as + `regoff_t', instead of `int'. + + * regex.c (regexec): if either of the malloc's for the register + information fail, return failure. + + * regex.h (RE_NREGS): define this again, as 30 (from jla). + (RE_ALLOCATE_REGISTERS): remove this. + (RE_SYNTAX_*): remove it from definitions. + (re_pattern_buffer): remove `return_default_num_regs', add + `caller_allocated_regs'. + * regex.c (re_compile_pattern): clear no_sub and + caller_allocated_regs in the pattern. + (regcomp): set caller_allocated_regs. + (re_match_2): do all register allocation at the end of the + match; implement new semantics. + + * regex.c (MAX_REGNUM): new macro. + (regex_compile): at handle_open and handle_close, if the group + number is too large, don't push the start/stop memory. + +Thu Jan 2 07:56:10 1992 Karl Berry (karl at hayley) + + * regex.c (re_match_2): if the back reference is to a group that + never matched, then goto fail, not really_fail. Also, don't + test if the pattern can match the empty string. Why did we + ever do that? + (really_fail): this label no longer needed. + + * regexinc.c [STDC_HEADERS]: use only this to test if we should + include <stdlib.h>. + + * regex.c (DO_RANGE, regex_compile): translate in all cases + except the single character after a \. + + * regex.h (RE_AWK_CLASS_HACK): rename to + RE_BACKSLASH_ESCAPE_IN_LISTS. + * regex.c (regex_compile): change use. + + * regex.c (re_compile_fastmap): do not translate the characters + again; we already translated them at compilation. (From ylo@ngs.fi.) + + * regex.c (re_match_2): in case for at_dot, invert sense of + comparison and find the character number properly. (From + worley@compass.com.) 
+ (re_match_2) [emacs]: remove the cases for before_dot and + after_dot, since there's no way to specify them, and the code + is wrong (judging from this change). + +Wed Jan 1 09:13:38 1992 Karl Berry (karl at hayley) + + * psx-{interf,basic,extend}.c, other.c: set `t' as the first + thing, so that if we run them in succession, general_test's + kludge to see if we're doing POSIX tests works. + + * test.h (test_type): add `all_test'. + * main.c: add case for `all_test'. + + * regexinc.c (partial_compiled_pattern_printer, + double_string_printer): don't print anything if we're passed null. + + * regex.c (PUSH_FAILURE_POINT): do not scan for the highest and + lowest active registers. + (re_match_2): compute lowest/highest active regs at start_memory and + stop_memory. + (NO_{LOW,HIGH}EST_ACTIVE_REG): new sentinel values. + (pop_failure_point): return the lowest/highest active reg values + popped; change calls. + + * regex.c [DEBUG]: include <assert.h>. + (various routines) [DEBUG]: change conditionals to assertions. + + * regex.c (DEBUG_STATEMENT): new macro. + (PUSH_FAILURE_POINT): use it to increment num_regs_pushed. + (re_match_2) [DEBUG]: only declare num_regs_pushed if DEBUG. + + * regex.c (*can_match_nothing): rename to *unmatchable. + + * regex.c (re_match_2): at stop_memory, adjust argument reading. + + * regex.h (re_pattern_buffer): declare `can_be_null' as a 2-bit + bit field. + + * regex.h (re_pattern_buffer): declare `buffer' unsigned char *; + no, dumb idea. The pattern can have signed number. + + * regex.c (re_match_2): in maybe_pop_jump case, skip over the + right number of args to the group operators, and don't do + anything with endline if posix_newline is not set. + + * regex.c, regexinc.c (all the things we just changed): go back + to putting the inner group count after the start_memory, + because we need it in the on_failure_jump case in re_match_2.
+ But leave it after the stop_memory also, since we need it + there in re_match_2, and we don't have any way of getting back + to the start_memory. + + * regexinc.c (partial_compiled_pattern_printer): adjust argument + reading for start/stop_memory. + * regex.c (re_compile_fastmap, group_can_match_nothing): likewise. + +Tue Dec 31 10:15:08 1991 Karl Berry (karl at hayley) + + * regex.c (bits list routines): remove these. + (re_match_2): get the number of inner groups from the pattern, + instead of keeping track of it at start and stop_memory. + Put the count after the stop_memory, not after the + start_memory. + (compile_stack_element): remove `fixup_inner_group' member, + since we now put it in when we can compute it. + (regex_compile): at handle_open, don't push the inner group + offset, and at handle_close, don't pop it. + + * regex.c (level routines): remove these, and their uses in + regex_compile. This was another manifestation of having to find + $'s that were endlines. + + * regex.c (regexec): this does searching, not matching (a + well-disguised part of the standard). So rewrite to use + `re_search' instead of `re_match'. + * psx-interf.c (test_regexec): add tests to, uh, match. + + * regex.h (RE_TIGHT_ALT): remove this; nobody uses it. + * regex.c: remove the code that was supposed to implement it. + + * other.c (test_others): ^ and $ never match newline characters; + RE_CONTEXT_INVALID_OPS doesn't affect anchors. + + * psx-interf.c (test_regerror): update for new error messages. + + * psx-extend.c: it's now ok to have an alternative be just a $, + so remove all the tests which supposed that was invalid. + +Wed Dec 25 09:00:05 1991 Karl Berry (karl at hayley) + + * regex.c (regex_compile): in handle_open, don't skip over ^ and + $ when checking for an empty group. POSIX has changed the + grammar. + * psx-extend.c (test_posix_extended): thus, move (^$) tests to + valid section. + + * regexinc.c (boolean): move from here to test.h and regex.c. 
+ * test files: declare verbose, omit_register_tests, and + test_should_match as boolean. + + * psx-interf.c (test_posix_c_interface): remove the `c_'. + * main.c: likewise. + + * psx-basic.c (test_posix_basic): ^ ($) is an anchor after + (before) an open (close) group. + + * regex.c (re_match_2): in endline, correct precedence of + posix_newline condition. + +Tue Dec 24 06:45:11 1991 Karl Berry (karl at hayley) + + * test.h: incorporate private-tst.h. + * test files: include test.h, not private-tst.h. + + * test.c (general_test): set posix_newline to zero if we are + doing POSIX tests (unfortunately, it's difficult to call + regcomp in this case, which is what we should really be doing). + + * regex.h (reg_syntax_t): make this an enumeration type which + defines the syntax bits; renames re_syntax_t. + + * regex.c (at_endline_op_p): don't preincrement p; then if it's + not an empty string op, we lose. + + * regex.h (reg_errcode_t): new enumeration type of the error + codes. + * regex.c (regex_compile): return that type. + + * regex.c (regex_compile): in [, initialize + just_had_a_char_class to false; somehow I had changed this to + true. + + * regex.h (RE_NO_CONSECUTIVE_REPEATS): remove this, since we + don't use it, and POSIX doesn't require this behavior anymore. + * regex.c (regex_compile): remove it from here. + + * regex.c (regex_compile): remove the no_op insertions for + verify_and_adjust_endlines, since that doesn't exist anymore. + + * regex.c (regex_compile) [DEBUG]: use printchar to print the + pattern, so unprintable bytes will print properly. + + * regex.c: move re_error_msg back. + * test.c (general_test): print the compile error if the pattern + was invalid. + +Mon Dec 23 08:54:53 1991 Karl Berry (karl at hayley) + + * regexinc.c: move re_error_msg here. + + * regex.c (re_error_msg): the ``message'' for success must be + NULL, to keep the interface to re_compile_pattern the same. + (regerror): if the msg is null, use "Success". 
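The re_error_msg entry above keeps the compile interface returning NULL on success while regerror still reports "Success". A sketch of how such a table might look; the error codes and demo_ names are hypothetical and far shorter than the real list:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes; the real enumeration has more members.  */
typedef enum
{
  DEMO_NOERROR = 0,
  DEMO_NOMATCH,
  DEMO_BADPAT,
  DEMO_NCODES
} demo_errcode_t;

/* The entry for success is NULL, so a compile routine can hand the
   table element straight back and callers can test it against NULL.  */
static const char *const demo_error_msg[DEMO_NCODES] =
{
  NULL,
  "No match",
  "Invalid regular expression"
};

/* Copy the message for CODE into BUF (truncating if needed) and
   return the size required to hold it, including the final null.  */
static size_t
demo_regerror (demo_errcode_t code, char *buf, size_t buf_size)
{
  const char *msg;
  size_t msg_size;

  assert ((int) code >= 0 && (int) code < DEMO_NCODES);
  msg = demo_error_msg[code];
  if (msg == NULL)
    msg = "Success";

  msg_size = strlen (msg) + 1;
  if (buf_size != 0)
    {
      if (msg_size > buf_size)
        {
          memcpy (buf, msg, buf_size - 1);
          buf[buf_size - 1] = '\0';
        }
      else
        memcpy (buf, msg, msg_size);
    }
  return msg_size;
}
```

The returned size lets a caller notice truncation and retry with a larger buffer.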
+ + * rename most test files for consistency. Change Makefile + correspondingly. + + * test.c (most routines): add casts to (unsigned char *) when we + call re_{match,search}{,_2}. + +Sun Dec 22 09:26:06 1991 Karl Berry (karl at hayley) + + * regex.c (re_match_2): declare string args as unsigned char * + again; don't declare non-pointer args const; declare the + pattern buffer const. + (re_match): likewise. + (re_search_2, re_search): likewise, except don't declare the + pattern const, since we make a fastmap. + * regex.h [__STDC__]: change prototypes. + + * regex.c (regex_compile): return an error code, not a string. + (re_err_list): new table to map from error codes to string. + (re_compile_pattern): return an element of re_err_list. + (regcomp): don't test all the strings. + (regerror): just use the list. + (put_in_buffer): remove this. + + * regex.c (equivalent_failure_points): remove this. + + * regex.c (re_match_2): don't copy the string arguments into + non-const pointers. We never alter the data. + + * regex.c (re_match_2): move assignment to `is_a_jump_n' out of + the main loop. Just initialize it right before we do + something with it. + + * regex.[ch] (re_match_2): don't declare the int parameters const. + +Sat Dec 21 08:52:20 1991 Karl Berry (karl at hayley) + + * regex.h (re_syntax_t): new type; declare to be unsigned + (previously we used int, but since we do bit operations on + this, unsigned is better, according to H&S). + (obscure_syntax, re_pattern_buffer): use that type. + * regex.c (re_set_syntax, regex_compile): likewise. + + * regex.h (re_pattern_buffer): new field `posix_newline'. + * regex.c (re_comp, re_compile_pattern): set to zero. + (regcomp): set to REG_NEWLINE. + * regex.h (RE_HAT_LISTS_NOT_NEWLINE): remove this (we can just + check `posix_newline' instead.) + + * regex.c (op_list_type, op_list, add_op): remove these. + (verify_and_adjust_endlines): remove this. + (pattern_offset_list_type, *pattern_offset* routines): and these. 
+ These things all implemented the nonleading/nontrailing position + code, which was very long, had a few remaining problems, and + is no longer needed. So... + + * regexinc.c (STREQ): new macro to abbreviate strcmp(,)==0, for + brevity. Change various places in regex.c to use it. + + * regex{,inc}.c (enum regexpcode): change to a typedef + re_opcode_t, for brevity. + + * regex.h (re_syntax_table) [SYNTAX_TABLE]: remove this; it + should only be in regex.c, I think, since we don't define it + in this case. Maybe it should be conditional on !SYNTAX_TABLE? + + * regexinc.c (partial_compiled_pattern_printer): simplify and + distinguish the emacs/not-emacs (not)wordchar cases. + +Fri Dec 20 08:11:38 1991 Karl Berry (karl at hayley) + + * regexinc.c (regexpcode) [emacs]: only define the Emacs opcodes + if we are ifdef emacs. + + * regex.c (BUF_PUSH*): rename to PAT_PUSH*. + + * regex.c (regex_compile): in $ case, go back to essentially the + original code for deciding endline op vs. normal char. + (at_endline_op_p): new routine. + * regex.h (RE_ANCHORS_ONLY_AT_ENDS, RE_CONTEXT_INVALID_ANCHORS, + RE_REPEATED_ANCHORS_AWAY, RE_NO_ANCHOR_AT_NEWLINE): remove + these. POSIX has simplified the rules for anchors in draft + 11.2. + (RE_NEWLINE_ORDINARY): new syntax bit. + (RE_CONTEXT_INDEP_ANCHORS): change description to be compatible + with POSIX. + * regex.texinfo (Syntax Bits): remove the descriptions. + +Mon Dec 16 08:12:40 1991 Karl Berry (karl at hayley) + + * regex.c (re_match_2): in jump_past_next_alt, unconditionally + goto no_pop. The only register we were finding was one which + enclosed the whole alternative expression, not one around an + individual alternative. So we were never doing what we + thought we were doing, and this way makes (|a) against the + empty string fail. + + * regex.c (regex_compile): remove `highest_ever_regnum', and + don't restore regnum from the stack; just put it into a + temporary to put into the stop_memory. 
Otherwise, groups + aren't numbered consecutively. + + * regex.c (is_in_compile_stack): rename to + `group_in_compile_stack'; remove unnecessary test for the + stack being empty. + + * regex.c (re_match_2): in on_failure_jump, skip no_op's before + checking for the start_memory, in case we were called from + succeed_n. + +Sun Dec 15 16:20:48 1991 Karl Berry (karl at hayley) + + * regex.c (regex_compile): in duplicate case, use + highest_ever_regnum instead of regnum, since the latter is + reverted at stop_memory. + + * regex.c (re_match_2): in on_failure_jump, if the * applied to + a group, save the information for that group and all inner + groups (by making it active), even though we're not inside it + yet. + +Sat Dec 14 09:50:59 1991 Karl Berry (karl at hayley) + + * regex.c (PUSH_FAILURE_ITEM, POP_FAILURE_ITEM): new macros. + Use them instead of copying the stack manipulating a zillion + times. + + * regex.c (PUSH_FAILURE_POINT, pop_failure_point) [DEBUG]: save + and restore a unique identification value for each failure point. + + * regexinc.c (partial_compiled_pattern_printer): don't print an + extra / after duplicate commands. + + * regex.c (regex_compile): in back-reference case, allow a back + reference to register `regnum'. Otherwise, even `\(\)\1' + fails, since regnum is 1 at the back-reference. + + * regex.c (re_match_2): in fail, don't examine the pattern if we + restored to pend. + + * test_private.h: rename to private_tst.h. Change includes. + + * regex.c (extend_bits_list): compute existing size for realloc + in bytes, not blocks. + + * regex.c (re_match_2): in jump_past_next_alt, the for loop was + missing its (empty) statement. Even so, some register tests + still fail, although in a different way than in the previous change. + +Fri Dec 13 15:55:08 1991 Karl Berry (karl at hayley) + + * regex.c (re_match_2): in jump_past_next_alt, unconditionally + goto no_pop, since we weren't properly detecting if the + alternative matched something anyway. 
No, we need to not jump + to keep the register values correct; just change to not look at + register zero and not test RE_NO_EMPTY_ALTS (which is a + compile-time thing). + + * regex.c (SET_REGS_MATCHED): start the loop at 1, since we never + care about register zero until the very end. (I think.) + + * regex.c (PUSH_FAILURE_POINT, pop_failure_point): go back to + pushing and popping the active registers, instead of only doing + the registers before a group: (fooq|fo|o)*qbar against fooqbar + fails, since we restore back into the middle of group 1, yet it + isn't active, because the previous restore clobbered the active flag. + +Thu Dec 12 17:25:36 1991 Karl Berry (karl at hayley) + + * regex.c (PUSH_FAILURE_POINT): do not call + `equivalent_failure_points' after all; it causes the registers + to be ``wrong'' (according to POSIX), and an infinite loop on + `((a*)*)*' against `ab'. + + * regex.c (re_compile_fastmap): don't push `pend' on the failure + stack. + +Tue Dec 10 10:30:03 1991 Karl Berry (karl at hayley) + + * regex.c (PUSH_FAILURE_POINT): if pushing same failure point that + is on the top of the stack, fail. + (equivalent_failure_points): new routine. + + * regex.c (re_match_2): add debug statements for every opcode we + execute. + + * regex.c (regex_compile/handle_close): restore + `fixup_inner_group_count' and `regnum' from the stack. + +Mon Dec 9 13:51:15 1991 Karl Berry (karl at hayley) + + * regex.c (PUSH_FAILURE_POINT): declare `this_reg' as int, so + unsigned arithmetic doesn't happen when we don't want to save + the registers. + +Tue Dec 3 08:11:10 1991 Karl Berry (karl at hayley) + + * regex.c (extend_bits_list): divide size by bits/block. + + * regex.c (init_bits_list): remove redundant assignment to + `bits_list_ptr'. + + * regexinc.c (partial_compiled_pattern_printer): don't do *p++ + twice in the same expr. + + * regex.c (re_match_2): at on_failure_jump, use the correct + pattern positions for getting the stuff following the start_memory.
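The extend_bits_list entries above turn on the difference between a size counted in bits, in blocks, and in bytes. A rough sketch of that arithmetic, assuming one block is an unsigned int; the demo_ names are hypothetical and the actual bits list code is not shown in this log:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption for this sketch: a block is one unsigned int.  */
#define DEMO_BITS_PER_BLOCK (sizeof (unsigned) * CHAR_BIT)

/* Number of blocks needed to hold NBITS bits (rounded up).  */
static size_t
demo_blocks_for_bits (size_t nbits)
{
  return (nbits + DEMO_BITS_PER_BLOCK - 1) / DEMO_BITS_PER_BLOCK;
}

/* Size in bytes, which is what malloc and realloc expect.  */
static size_t
demo_bytes_for_bits (size_t nbits)
{
  return demo_blocks_for_bits (nbits) * sizeof (unsigned);
}

/* Set or clear the bit at POSITION.  */
static void
demo_set_bit (unsigned *blocks, size_t position, int value)
{
  unsigned mask = 1u << (position % DEMO_BITS_PER_BLOCK);

  assert (blocks != NULL);
  if (value)
    blocks[position / DEMO_BITS_PER_BLOCK] |= mask;
  else
    blocks[position / DEMO_BITS_PER_BLOCK] &= ~mask;
}

/* Return 1 if the bit at POSITION is set, else 0.  */
static int
demo_get_bit (const unsigned *blocks, size_t position)
{
  unsigned mask = 1u << (position % DEMO_BITS_PER_BLOCK);

  return (blocks[position / DEMO_BITS_PER_BLOCK] & mask) != 0;
}
```

Keeping the three units in separate helpers makes it harder to pass a block count where a byte count is wanted.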
+ + * regex.c (struct register_info): remove the bits_list for the + inner groups; make that a separate variable. + +Mon Dec 2 10:42:07 1991 Karl Berry (karl at hayley) + + * regex.c (PUSH_FAILURE_POINT): don't pass `failure_stack' as an + arg; change callers. + + * regex.c (PUSH_FAILURE_POINT): print items in order they are + pushed. + (pop_failure_point): likewise. + + * regex.c (main): prompt for the pattern and string. + + * regex.c (FREE_VARIABLES) [!REGEX_MALLOC]: declare as nothing; + remove #ifdefs from around calls. + + * regex.c (extract_number, extract_number_and_incr): declare static. + + * regex.c: remove the canned main program. + * main.c: new file. + * Makefile (COMMON): add main.o. + +Tue Sep 24 06:26:51 1991 Kathy Hargreaves (kathy at fosse) + + * regex.c (re_match_2): Made `pend' and `dend' not register variables. + Only set string2 to string1 if string1 isn't null. + Send address of p, d, regstart, regend, and reg_info to + pop_failure_point. + Put in more debug statements. + + * regex.c [debug]: Added global variable. + (DEBUG_*PRINT*): Only print if `debug' is true. + (DEBUG_DOUBLE_STRING_PRINTER): Changed DEBUG_STRING_PRINTER's + name to this. + Changed some comments. + (PUSH_FAILURE_POINT): Moved and added some debugging statements. + Was saving regstart on the stack twice instead of saving both + regstart and regend; remedied this. + [NUM_REGS_ITEMS]: Changed from 3 to 4, as now save lowest and + highest active registers instead of highest used one. + [NUM_NON_REG_ITEMS]: Changed name of NUM_OTHER_ITEMS to this. + (NUM_FAILURE_ITEMS): Use active registers instead of number 0 + through highest used one. + (re_match_2): Have pop_failure_point put things in the variables. + (pop_failure_point): Have it do what the fail case in re_match_2 + did with the failure stack, instead of throwing away the stuff + popped off. re_match_2 can ignore results when it doesn't + need them. 
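The PUSH_FAILURE_POINT and pop_failure_point entries above describe a stack that grows on demand and hands popped values back to the caller. A much simplified sketch, assuming plain int items; the real failure stack holds pattern and string positions plus register data, and the demo_ names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
  int *stack;      /* Storage for the items.  */
  unsigned size;   /* Number of slots allocated.  */
  unsigned avail;  /* Index of the first free slot.  */
} demo_stack_type;

/* Allocate INITIAL_SIZE slots.  Return 1 on success, 0 on failure;
   do not push onto a stack whose initialization failed.  */
static int
demo_stack_init (demo_stack_type *s, unsigned initial_size)
{
  assert (initial_size > 0);
  s->stack = (int *) malloc (initial_size * sizeof (int));
  s->size = s->stack ? initial_size : 0;
  s->avail = 0;
  return s->stack != NULL;
}

/* Push ITEM, doubling the storage when it is full.  Return 1 on
   success, 0 if memory ran out (the stack is left unchanged).  */
static int
demo_stack_push (demo_stack_type *s, int item)
{
  if (s->avail == s->size)
    {
      int *bigger
        = (int *) realloc (s->stack, (s->size << 1) * sizeof (int));
      if (bigger == NULL)
        return 0;
      s->stack = bigger;
      s->size <<= 1;
    }
  s->stack[s->avail++] = item;
  return 1;
}

/* Pop into *ITEM.  Return 0 without touching *ITEM if the stack is
   empty, so the caller can tell an underfull stack from a value.  */
static int
demo_stack_pop (demo_stack_type *s, int *item)
{
  if (s->avail == 0)
    return 0;
  *item = s->stack[--s->avail];
  return 1;
}
```

Returning the popped value through a pointer lets a caller that does not need it pass a scratch variable and ignore the result.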
+ + +Thu Sep 5 13:23:28 1991 Kathy Hargreaves (kathy at fosse) + + * regex.c (banner): Changed copyright years to be separate. + + * regex.c [CHAR_UNSIGNED]: Put __ at both ends of this name. + [DEBUG, debug_count, *debug_p, DEBUG_PRINT_1, DEBUG_PRINT_2, + DEBUG_COMPILED_PATTERN_PRINTER, DEBUG_STRING_PRINTER]: + defined these for debugging. + (extract_number): Added this (debuggable) routine version of + the macro EXTRACT_NUMBER. Ditto for EXTRACT_NUMBER_AND_INCR. + (re_compile_pattern): Set return_default_num_regs if the + syntax bit RE_ALLOCATE_REGISTERS is set. + [REGEX_MALLOC]: Renamed USE_ALLOCA to this. + (BUF_POP): Got rid of this, as don't ever use it. + (regex_compile): Made the type of `pattern' not be register. + If DEBUG, print the pattern to compile. + (re_match_2): If had a `$' in the pattern before a `^' then + don't record the `^' as an anchor. + Put (enum regexpcode) before references to b, as suggested + [RE_NO_BK_BRACES]: Changed RE_NO_BK_CURLY_BRACES to this. + (remove_pattern_offset): Removed this unused routine. + (PUSH_FAILURE_POINT): Changed to only save active registers. + Put in debugging statements. + (re_compile_fastmap): Made `pattern' not a register variable. + Use routine for extracting numbers instead of macro. + (re_match_2): Made `p', `mcnt' and `mcnt2' not register variables. + Added `num_regs_pushed' for debugging. + Only malloc registers if the syntax bit RE_ALLOCATE_REGISTERS is set. + Put in debug statements. + Put the macro NOTE_INNER_GROUP's code inline, as it was + only called in one place. + For debugging, extract numbers using routines instead of macros. + In case fail: only restore pushed active registers, and added + debugging statements. + (pop_failure_point): Test for underfull stack. + (group_can_match_nothing, common_op_can_match_nothing): For + debugging, extract numbers using routines instead of macros. + (regexec): Changed formal parameters to not be prototypes.
+ Don't initialize `regs' or `private_preg' in their declarations. + +Tue Jul 23 18:38:36 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h [RE_CONTEXT_INDEP_OPS]: Moved the anchor stuff out of + this bit. + [RE_UNMATCHED_RIGHT_PAREN_ORD]: Defined this bit. + [RE_CONTEXT_INVALID_ANCHORS]: Defined this bit. + [RE_CONTEXT_INDEP_ANCHORS]: Defined this bit. + Added RE_CONTEXT_INDEP_ANCHORS to all syntaxes which had + RE_CONTEXT_INDEP_OPS. + Took RE_ANCHORS_ONLY_AT_ENDS out of the POSIX basic syntax. + Added RE_UNMATCHED_RIGHT_PAREN_ORD to the POSIX extended + syntax. + Took RE_REPEATED_ANCHORS_AWAY out of the POSIX extended syntax. + Defined REG_NOERROR (which will probably have to go away again). + Changed the type `off_t' to `regoff_t'. + + * regex.c: Changed some comments. + (regex_compile): Added variable `had_an_endline' to keep track + of if hit a `$' since the beginning of the pattern or the last + alternative (if any). + Changed RE_CONTEXT_INVALID_OPS and RE_CONTEXT_INDEP_OPS to + RE_CONTEXT_INVALID_ANCHORS and RE_CONTEXT_INDEP_ANCHORS where + appropriate. + Put a `no_op' in the pattern if a repeat is only zero or one + times; in this case and if it is many times (whereupon a jump + backwards is pushed instead), keep track of the operator for + verify_and_adjust_endlines. + If RE_UNMATCHED_RIGHT_PAREN is set, make an unmatched + close-group operator match `)'. + Changed all error exits to exit (1). + (remove_pattern_offset): Added this routine, but don't use it. + (verify_and_adjust_endlines): At top of routine, if initialize + routines run out of memory, return true after setting + enough_memory false. + At end of endline, et al. case, don't set *p to no_op. + Repetition operators also set the level and active groups' + match statuses, unless RE_REPEATED_ANCHORS_AWAY is set. + (get_group_match_status): Put a return in front of call to get_bit. + (re_compile_fastmap): Changed is_a_succeed_n to a boolean.
+ If at end of pattern, then if the failure stack isn't empty, + go back to the failure point. + In *jump* case, only pop the stack if what's on top of it is + where we've just jumped to. + (re_search_2): Return -2 instead of val if val is -2. + (group_can_match_nothing, alternative_can_match_nothing, + common_op_can_match_nothing): Now pass in reg_info for the + `duplicate' case. + (re_match_2): Don't skip over the next alternative also if + empty alternatives aren't allowed. + In fail case, if failed to a backwards jump that's part of a + repetition loop, pop the current failure point and use the + next one. + (pop_failure_point): Check that there are as many register items + on the failure stack as the stack says there are. + (common_op_can_match_nothing): Added variables `ret' and + `reg_no' so can set reg_info for the group encountered. + Also break without doing anything if hit a no_op or the other + kinds of `endline's. + If not done already, set reg_info in start_memory case. + Put in no_pop_jump for an optimized succeed_n of zero repetitions. + In succeed_n case, if the number isn't zero, then return false. + Added `duplicate' case. + +Sat Jul 13 11:27:38 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (REG_NOERROR): Added this error code definition. + + * regex.c: Took some redundant parens out of macros. + (enum regexpcode): Added jump_past_next_alt. + Wrapped some macros in `do..while (0)'. + Changed some comments. + (regex_compile): Use `fixup_alt_jump' instead of `fixup_jump'. + Use `maybe_pop_jump' instead of `maybe_pop_failure_jump'. + Use `jump_past_next_alt' instead of `no_pop_jump' when at the + end of an alternative. + (re_match_2): Used REGEX_ALLOCATE for the registers stuff. + In stop_memory case: Add more boolean tests to see if the + group is in a loop. + Added jump_past_next_alt case, which doesn't jump over the + next alternative if the last one didn't match anything.
+ Unfortunately, to make this work with, e.g., `(a+?*|b)*' + against `bb', I also had to pop the alternative's failure + point, which in turn broke backtracking! + In fail case: Detect a dummy failure point by looking at + failure_stack.avail - 2, not stack[-2]. + (pop_failure_point): Only pop if the stack isn't empty; don't + give an error if it is. (Not sure yet this is correct.) + (group_can_match_nothing): Make it return a boolean instead of int. + Make it take an argument indicating the end of where it should look. + If find a group that can match nothing, set the pointer + argument to past the group in the pattern. + Took out cases which can share with alternative_can_match_nothing + and call common_op_can_match_nothing. + Took ++ out of switch, so could call common_op_can_match_nothing. + Wrote lots more for on_failure_jump case to handle alternatives. + Main loop now doesn't look for matching stop_memory, but + rather the argument END; return true if hit the matching + stop_memory; this way can call itself for inner groups. + (alternative_can_match_nothing): Added for alternatives. + (common_op_can_match_nothing): Added for previous two routines' + common operators. + (regerror): Returns a message saying there's no error if gets + sent REG_NOERROR. + +Wed Jul 3 10:43:15 1991 Kathy Hargreaves (kathy at hayley) + + * regex.c: Removed unnecessary enclosing parens from several macros. + Put `do..while (0)' around a few. + Corrected some comments. + (INIT_FAILURE_STACK_SIZE): Deleted in favor of using + INIT_FAILURE_ALLOC. + (INIT_FAILURE_STACK, DOUBLE_FAILURE_STACK, PUSH_PATTERN_OP, + PUSH_FAILURE_POINT): Made routines of the same name (but with all + lowercase letters) into these macros, so could use `alloca' + when USE_ALLOCA is defined. The reason is stated below for + bits lists. Deleted analogous routines. + (re_compile_fastmap): Added variable void *destination for + PUSH_PATTERN_OP. + (re_match_2): Added variable void *destination for REGEX_REALLOCATE. 
+ Used the failure stack macros in place of the routines. + Detected a dummy failure point by inspecting the failure stack's + (avail - 2)th element, not failure_stack.stack[-2]. This bug + arose when used the failure stack macros instead of the routines. + + * regex.c [USE_ALLOCA]: Put this conditional around previous + alloca stuff and defined these to work differently depending + on whether or not USE_ALLOCA is defined: + (REGEX_ALLOCATE): Uses either `alloca' or `malloc'. + (REGEX_REALLOCATE): Uses either `alloca' or `realloc'. + (INIT_BITS_LIST, EXTEND_BITS_LIST, SET_BIT_TO_VALUE): Defined + macro versions of routines with the same name (only with all + lowercase letters) so could use `alloca' in re_match_2. This + is to prevent core leaks when C-g is used in Emacs and to make + things faster and avoid storage fragmentation. These things + have to be macros because the results of `alloca' go away with + the routine by which it's called. + (BITS_BLOCK_SIZE, BITS_BLOCK, BITS_MASK): Moved to above the + above-mentioned macros instead of before the routines defined + below regex_compile. + (set_bit_to_value): Compacted some code. + (reg_info_type): Changed inner_groups field to be bits_list_type + so could be arbitrarily long and thus handle arbitrary nesting. + (NOTE_INNER_GROUP): Put `do...while (0)' around it so could + use as a statement. + Changed code to use bits lists. + Added variable void *destination for REGEX_REALLOCATE (whose call + is several levels in). + Changed variable name of `this_bit' to `this_reg'. + (FREE_VARIABLES): Only define and use if USE_ALLOCA is defined. + (re_match_2): Use REGEX_ALLOCATE instead of malloc. + Instead of setting INNER_GROUPS of reg_info to zero, have to + use INIT_BITS_LIST and return -2 (and free variables if + USE_ALLOCA isn't defined) if it fails. + +Fri Jun 28 13:45:07 1991 Karl Berry (karl at hayley) + + * regex.c (re_match_2): set value of `dend' when we restore `d'. + + * regex.c: remove declaration of alloca.
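The NOTE_INNER_GROUP entry above wraps a multi-statement macro in `do...while (0)' so it can be used as a single statement. A generic illustration of why that matters; DEMO_SWAP_INTS and demo_sort_pair are hypothetical and unrelated to the macros in regex.c:

```c
#include <assert.h>

/* Hypothetical multi-statement macro.  The do...while (0) wrapper
   makes the expansion one statement that still needs the caller's
   trailing semicolon, so it is safe in an unbraced if/else.  */
#define DEMO_SWAP_INTS(a, b)      \
  do                              \
    {                             \
      int demo_swap_tmp = (a);    \
      (a) = (b);                  \
      (b) = demo_swap_tmp;        \
    }                             \
  while (0)

/* Order *LO and *HI.  With a bare { ... } block instead of
   do...while (0), the semicolon after the macro call would end the
   if statement and the else below would not compile.  */
static void
demo_sort_pair (int *lo, int *hi)
{
  if (*lo > *hi)
    DEMO_SWAP_INTS (*lo, *hi);
  else
    assert (*lo <= *hi);
}
```

The same wrapper also keeps the temporary variable local to the expansion.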
+ + * regex.c (MISSING_ISGRAPH): rename to `ISGRAPH_MISSING'. + + * regex.h [_POSIX_SOURCE]: remove these conditionals; always + define POSIX stuff. + * regex.c (_POSIX_SOURCE): change conditionals to use `POSIX' + instead. + +Sat Jun 1 16:56:50 1991 Kathy Hargreaves (kathy at hayley) + + * regex.*: Changed RE_CONTEXTUAL_* to RE_CONTEXT_*, + RE_TIGHT_VBAR to RE_TIGHT_ALT, RE_NEWLINE_OR to + RE_NEWLINE_ALT, and RE_DOT_MATCHES_NEWLINE to RE_DOT_NEWLINE. + +Wed May 29 09:24:11 1991 Karl Berry (karl at hayley) + + * regex.texinfo (POSIX Pattern Buffers): cross-reference the + correct node name (Match-beginning-of-line, not ..._line). + (Syntax Bits): put @code around all syntax bits. + +Sat May 18 16:29:58 1991 Karl Berry (karl at hayley) + + * regex.c (global): add casts to keep broken compilers from + complaining about malloc and realloc calls. + + * regex.c (isgraph) [MISSING_ISGRAPH]: change test to this, + instead of `#ifndef isgraph', since broken compilers can't + have both a macro and a symbol by the same name. + + * regex.c (re_comp, re_exec) [_POSIX_SOURCE]: do not define. + (regcomp, regfree, regexec, regerror) [_POSIX_SOURCE && !emacs]: + only define in this case. + +Mon May 6 17:37:04 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (re_search, re_search_2): Changed BUFFER to not be const. + + * regex.c (re_compile_pattern): `^' is in a leading position if + it precedes a newline. + (various routines): Added or changed header comments. + (double_pattern_offsets_list): Changed name from + `extend_pattern_offsets_list'. + (adjust_pattern_offsets_list): Changed return value from + unsigned to void. + (verify_and_adjust_endlines): Now returns `true' and `false' + instead of 1 and 0. + `$' is in a leading position if it follows a newline. + (set_bit_to_value, get_bit_value): Exit with error if POSITION < 0 + so now calling routines don't have to. 
+ (init_failure_stack, inspect_failure_stack_top, + pop_failure_stack_top, push_pattern_op, double_failure_stack): + Now return value unsigned instead of boolean. + (re_search, re_search_2): Changed BUFP to not be const. + (re_search_2): Added variable const `private_bufp' to send to + re_match_2. + (push_failure_point): Made return value unsigned instead of boolean. + +Sat May 4 15:32:22 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (re_compile_fastmap): Added extern for this. + Changed some comments. + + * regex.c (re_compile_pattern): In case handle_bar: put invalid + pattern test before levels matching stuff. + Changed some comments. + Added optimizing test for detecting an empty alternative that + ends with a trailing '$' at the end of the pattern. + (re_compile_fastmap): Moved failure_stack stuff to before this + so could use it. Made its stack dynamic. + Made it return an int so that it could return -2 if its stack + couldn't be allocated. + Added to header comment (about the return values). + (init_failure_stack): Wrote so both re_match_2 and + re_compile_fastmap could use similar stacks. + (double_failure_stack): Added for above reasons. + (push_pattern_op): Wrote for re_compile_fastmap. + (re_search_2): Now return -2 if re_compile_fastmap does. + (re_match_2): Made regstart and regend type failure_stack_element*. + (push_failure_point): Made pattern_place and string_place type + failure_stack_element*. + Call double_failure_stack now. + Return true instead of 1. + +Wed May 1 12:57:21 1991 Kathy Hargreaves (kathy at hayley) + + * regex.c (remove_intervening_anchors): Avoid erroneously making + ops into no_op's by making them no_op only when they're beglines. + (verify_and_adjust_endlines): Don't make '$' a normal character + if it's before a newline. + Look for the endline op in *p, not p[1]. + (failure_stack_element): Added this declaration. + (failure_stack_type): Added this declaration.
+ (INIT_FAILURE_STACK_SIZE, FAILURE_STACK_EMPTY, + FAILURE_STACK_PTR_EMPTY, REMAINING_AVAIL_SLOTS): Added for + failure stack. + (FAILURE_ITEM_SIZE, PUSH_FAILURE_POINT): Deleted. + (FREE_VARIABLES): Now free failure_stack.stack instead of stackb. + (re_match_2): deleted variables `initial_stack', `stackb', + `stackp', and `stacke' and added `failure_stack' to replace them. + Replaced calls to PUSH_FAILURE_POINT with those to + push_failure_point. + (push_failure_point): Added for re_match_2. + (pop_failure_point): Rewrote to use a failure_stack_type of stack. + (can_match_nothing): Moved definition to below re_match_2. + (bcmp_translate): Moved definition to below re_match_2. + +Mon Apr 29 14:20:54 1991 Kathy Hargreaves (kathy at hayley) + + * regex.c (enum regexpcode): Added codes endline_before_newline + and repeated_endline_before_newline so could detect these + types of endlines in the intermediate stages of a compiled + pattern. + (INIT_FAILURE_ALLOC): Renamed NFAILURES to this and set it to 5. + (BUF_PUSH): Put `do {...} while 0' around this. + (BUF_PUSH_2): Defined this to cut down on expansion of EXTEND_BUFFER. + (regex_compile): Changed some comments. + Now push endline_before_newline if find a `$' before a newline + in the pattern. + If a `$' might turn into an ordinary character, set laststart + to point to it. + In '^' case, if syntax bit RE_TIGHT_VBAR is set, then for `^' + to be in a leading position, it must be first in the pattern. + Don't have to check in one of the else clauses that it's not set. + If RE_CONTEXTUAL_INDEP_OPS isn't set but RE_ANCHORS_ONLY_AT_ENDS + is, make '^' a normal character if it isn't first in the pattern. + Can only detect at the end if a '$' after an alternation op is a + trailing one, so can't immediately detect empty alternatives + if a '$' follows a vbar. + Added a picture of the ``success jumps'' in alternatives. + Have to set bufp->used before calling verify_and_adjust_endlines. 
+ Also do it before returning all error strings. + (remove_intervening_anchors): Now replaces the anchor with + repeated_endline_before_newline if it's an endline_before_newline. + (verify_and_adjust_endlines): Deleted SYNTAX parameter (could + use bufp's) and added GROUP_FORWARD_MATCH_STATUS so could + detect back references referring to empty groups. + Added variable `bend' to point past the end of the pattern buffer. + Added variable `previous_p' so wouldn't have to reinspect the + pattern buffer to see what op we just looked at. + Added endline_before_newline and repeated_endline_before_newline + cases. + When checking if in a trailing position, added case where '$' + has to be at the pattern's end if either of the syntax bits + RE_ANCHORS_ONLY_AT_ENDS or RE_TIGHT_VBAR is set. + Since `endline' can have the intermediate form `endline_in_repeat', + have to change it to `endline' if RE_REPEATED_ANCHORS_AWAY + isn't set. + Now disallow empty alternatives with trailing endlines in them + if RE_NO_EMPTY_ALTS is set. + Now don't make '$' an ordinary character if it precedes a newline. + Don't make it an ordinary character if it's before a newline. + Back references now affect the level matching something only if + they refer to nonempty groups. + (can_match_nothing): Now increment p1 in the switch, which + changes many of the cases, but makes the code more like what + it was derived from. + Adjust the return statement to reflect above. + (struct register_info): Made `can_match_nothing' field an int + instead of a bit so could have -1 in it if never set. + (MAX_FAILURE_ITEMS): Changed name from MAX_NUM_FAILURE_ITEMS. + (FAILURE_ITEM_SIZE): Defined how much space a failure item uses. + (PUSH_FAILURE_POINT): Changed variable `last_used_reg's name + to `highest_used_reg'. + Added variable `num_stack_items' and changed `len's name to + `stack_length'. + Test failure stack limit in terms of number of items in it, not + in terms of its length.
rms' fix tested length against number + of items, which was a misunderstanding. + Use `realloc' instead of `alloca' to extend the failure stack. + Use shifts instead of multiplying by 2. + (FREE_VARIABLES): Free `stackb' instead of `initial_stack', as + it may have been reallocated. + (re_match_2): When mallocing `initial_stack', now multiply + the number of items wanted (what was there before) by + FAILURE_ITEM_SIZE. + (pop_failure_point): Need this procedure form of the macro of + the same name for debugging, so left it in and deleted the + macro. + (recomp): Don't free the pattern buffer's translate field. + +Mon Apr 15 09:47:47 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (RE_DUP_MAX): Moved to outside of #ifdef _POSIX_SOURCE. + * regex.c (#include <sys/types.h>): Removed #ifdef _POSIX_SOURCE + condition. + (malloc, realloc): Made return type void* #ifdef __STDC__. + (enum regexpcode): Added endline_in_repeat for the compiler's + use; this never ends up on the final compiled pattern. + (INIT_PATTERN_OFFSETS_LIST_SIZE): Initial size for + pattern_offsets_list_type. + (pattern_offset_type): Type for pattern offsets. + (pattern_offsets_list_type): Type for keeping a list of + pattern offsets. + (anchor_list_type): Changed to above type. + (PATTERN_OFFSETS_LIST_PTR_FULL): Tests if a pattern offsets + list is full. + (ANCHOR_LIST_PTR_FULL): Changed to above. + (BIT_BLOCK_SIZE): Changed to BITS_BLOCK_SIZE and moved to + above bits list routines below regex_compile. + (op_list_type): Defined to be pattern_offsets_list_type. + (compile_stack_type): Changed offsets to be + pattern_offset_type instead of unsigned. + (pointer): Changed the name of all structure fields from this + to `avail'. + (COMPILE_STACK_FULL): Changed so the stack is full if `avail' + is equal to `size' instead of `size' - 1. + (GET_BUFFER_SPACE): Changed `>=' to `>' in the while statement.
+ (regex_compile): Added variable `enough_memory' so could check + that routine that verifies '$' positions could return an + allocation error. + (group_count): Deleted this variable, as `regnum' already does + this work. + (op_list): Added this variable to keep track of operations + needed for verifying '$' positions. + (anchor_list): Now initialize using routine + `init_pattern_offsets_list'. + Consolidated the three bits_list initializations. + In case '$': Instead of trying to go past constructs which can + follow '$', merely detect the special case where it has to be + at the pattern's end, fix up any fixup jumps if necessary, + record the anchor if necessary and add an `endline' (and + possibly two `no-op's) to the pattern; will call a routine at + the end to verify if it's in a valid position or not. + (init_pattern_offsets_list): Added to initialize pattern + offsets lists. + (extend_anchor_list): Renamed this extend_pattern_offsets_list + and renamed parameters and internal variables appropriately. + (add_pattern_offset): Added this routine which both + record_anchor_position and add_op call. + (adjust_pattern_offsets_list): Add this routine to adjust by + some increment all the pattern offsets a list of such after a + given position. + (record_anchor_position): Now send in offset instead of + calculating it and just call add_pattern_offset. + (adjust_anchor_list): Replaced by above routine. + (remove_intervening_anchors): If the anchor is an `endline' + then replace it with `endline_in_repeat' instead of `no_op'. + (add_op): Added this routine to call in regex_compile + wherever push something relevant to verifying '$' positions. + (verify_and_adjust_endlines): Added routine to (1) verify that + '$'s in a pattern buffer (represented by `endline') were in + valid positions and (2) whether or not they were anchors. + (BITS_BLOCK_SIZE): Renamed BIT_BLOCK_SIZE and moved to right + above bits list routines. 
+ (BITS_BLOCK): Defines which array element of a bits list the + bit corresponding to a given position is in. + (BITS_MASK): Has a 1 where the bit (in a bit list array element) + for a given position is. + +Mon Apr 1 12:09:06 1991 Kathy Hargreaves (kathy at hayley) + + * regex.c (BIT_BLOCK_SIZE): Defined this for using with + bits_list_type, abstracted from level_list_type so could use + for more things than just the level match status. + (regex_compile): Renamed `level_list' variable to + `level_match_status'. + Added variable `group_match_status' of type bits_list_type. + Kept track of whether or not for all groups any of them + matched other than the empty string, so detect if a back + reference in front of a '^' made it nonleading or not. + Do this by setting a match status bit for all active groups + whenever leave a group that matches other than the empty string. + Could detect which groups are active by going through the + stack each time, but or-ing a bits list of active groups with + a bits list of group match status is faster, so make a bits + list of active groups instead. + Have to check that '^' isn't in a leading position before + going to normal_char. + Whenever set level match status of the current level, also set + the match status of all active groups. + Increase the group count and make that group active whenever + open a group. + When close a group, only set the next level down if the + current level matches other than the empty string, and make + the current group inactive. + At a back reference, only set a level's match status if the + group to which the back reference refers matches other than + the empty string. + (init_bits_list): Added to initialize a bits list. + (get_level_value): Deleted this. (Made into + get_level_match_status.) + (extend_bits_list): Added to extend a bits list. (Made this + from deleted routine `extend_level_list'.) + (get_bit): Added to get a bit value from a bits list. 
(Made + this from deleted routine `get_level_value'.) + (set_bit_to_value): Added to set a bit in a bits list. (Made + this from deleted routine `set_level_value'.) + (get_level_match_status): Added this to get the match status + of a given level. (Made from get_level_value.) + (set_this_level, set_next_lower_level): Made all routines + which set bits extend the bits list if necessary, thus they + now return an unsigned value to indicate whether or not the + reallocation failed. + (increase_level): No longer extends the level list. + (make_group_active): Added to mark as active a given group in + an active groups list. + (make_group_inactive): Added to mark as inactive a given group + in an active groups list. + (set_match_status_of_active_groups): Added to set the match + status of all currently active groups. + (get_group_match_status): Added to get a given group's match status. + (no_levels_match_anything): Removed the parameter LEVEL. + (PUSH_FAILURE_POINT): Added rms' bug fix and changed RE_NREGS + to num_internal_regs. + +Sun Mar 31 09:04:30 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (RE_ANCHORS_ONLY_AT_ENDS): Added syntax so could + constrain '^' and '$' to only be anchors if at the beginning + and end of the pattern. + (RE_SYNTAX_POSIX_BASIC): Added the above bit. + + * regex.c (enum regexpcode): Changed `unused' to `no_op'. + (this_and_lower_levels_match_nothing): Deleted forward reference. + (regex_compile): case '^': if the syntax bit RE_ANCHORS_ONLY_AT_ENDS + is set, then '^' is only an anchor if at the beginning of the + pattern; only record anchor position if the syntax bit + RE_REPEATED_ANCHORS_AWAY is set; the '^' is a normal char if + the syntax bit RE_ANCHORS_ONLY_AT_ENDS is set and we're not at + the beginning of the pattern (and neither RE_CONTEXTUAL_INDEP_OPS + nor RE_CONTEXTUAL_INDEP_OPS syntax bits are set). + Only adjust the anchor list if the syntax bit + RE_REPEATED_ANCHORS_AWAY is set.
+ + * regex.c (level_list_type): Use to detect when '^' is + in a leading position. + (regex_compile): Added level_list_type level_list variable in + which we keep track of whether or not a grouping level (in its + current or most recent incarnation) matches anything besides the + empty string. Set the bit for the i-th level when detect it + should match something other than the empty string and the bit + for the (i-1)-th level when leave the i-th group. Clear all + bits for the i-th and higher levels if none of 0--(i - 1)-th's + bits are set when encounter an alternation operator on that + level. If no levels are set when hit a '^', then it is in a + leading position. We keep track of which level we're at by + increasing a variable current_level whenever we encounter an + open-group operator and decreasing it whenever we encounter a + close-group operator. + Have to adjust the anchor list contents whenever insert + something ahead of them (such as on_failure_jump's) in the + pattern. + (adjust_anchor_list): Adjusts the offsets in an anchor list by + a given increment starting at a given start position. + (get_level_value): Returns the bit setting of a given level. + (set_level_value): Sets the bit of a given level to a given value. + (set_this_level): Sets (to 1) the bit of a given level. + (set_next_lower_level): Sets (to 1) the bit of (LEVEL - 1) for a + given LEVEL. + (clear_this_and_higher_levels): Clears the bits for a given + level and any higher levels. + (extend_level_list): Adds sizeof(unsigned) more bits to a level list. + (increase_level): Increases by 1 the value of a given level variable. + (decrease_level): Decreases by 1 the value of a given level variable. + (lower_levels_match_nothing): Checks if any levels lower than + the given one match anything. + (no_levels_match_anything): Checks if any levels match anything. + (re_match_2): At case wordbeg: before looking at d-1, check that + we're not at the string's beginning. 
+ At case wordend: Added some illuminating parentheses. + +Mon Mar 25 13:58:51 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (RE_NO_ANCHOR_AT_NEWLINE): Changed syntax bit name + from RE_ANCHOR_NOT_NEWLINE because an anchor never matches the + newline itself, just the empty string either before or after it. + (RE_REPEATED_ANCHORS_AWAY): Added this syntax bit for ignoring + anchors inside groups which are operated on by repetition + operators. + (RE_DOT_MATCHES_NEWLINE): Added this bit so the match-any-character + operator could match a newline when it's set. + (RE_SYNTAX_POSIX_BASIC): Set RE_DOT_MATCHES_NEWLINE in this. + (RE_SYNTAX_POSIX_EXTENDED): Set RE_DOT_MATCHES_NEWLINE and + RE_REPEATED_ANCHORS_AWAY in this. + (regerror): Changed prototypes to new POSIX spec. + + * regex.c (anchor_list_type): Added so could null out anchors inside + repeated groups. + (ANCHOR_LIST_PTR_FULL): Added for above type. + (compile_stack_element): Changed name from stack_element. + (compile_stack_type): Changed name from compile_stack. + (INIT_COMPILE_STACK_SIZE): Changed name from INIT_STACK_SIZE. + (COMPILE_STACK_EMPTY): Changed name from STACK_EMPTY. + (COMPILE_STACK_FULL): Changed name from STACK_FULL. + (regex_compile): Changed SYNTAX parameter to non-const. + Changed variable name `stack' to `compile_stack'. + If syntax bit RE_REPEATED_ANCHORS_AWAY is set, then naively put + anchors in a list when encounter them and then set them to + `unused' when detect they are within a group operated on by a + repetition operator. Need something more sophisticated than + this, as they should only get set to `unused' if they are in + positions where they would be anchors. Also need a better way to + detect contextually invalid anchors. + Changed some comments. + (is_in_compile_stack): Changed name from `is_in_stack'. + (extend_anchor_list): Added to do anchor stuff. + (record_anchor_position): Added to do anchor stuff. + (remove_intervening_anchors): Added to do anchor stuff.
+ (re_match_2): Now match a newline with the match-any-character + operator if RE_DOT_MATCHES_NEWLINE is set. + Compacted some code. + (regcomp): Added new POSIX newline information to the header + comment. + If REG_NEWLINE cflag is set, then now unset RE_DOT_MATCHES_NEWLINE + in syntax. + (put_in_buffer): Added to do new POSIX regerror spec. Called + by regerror. + (regerror): Changed to take a pattern buffer, error buffer and + its size, and return type `size_t', the size of the full error + message, and the first ERRBUF_SIZE - 1 characters of the full + error message in the error buffer. + +Wed Feb 27 16:38:33 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (#include <sys/types.h>): Removed this as new POSIX + standard has the user include it. + (RE_SYNTAX_POSIX_BASIC and RE_SYNTAX_POSIX_EXTENDED): Removed + RE_HAT_LISTS_NOT_NEWLINE as new POSIX standard has the cflag + REG_NEWLINE now set this. Similarly, added syntax bit + RE_ANCHOR_NOT_NEWLINE as this is now unset by REG_NEWLINE. + (RE_SYNTAX_POSIX_BASIC): Removed syntax bit + RE_NO_CONSECUTIVE_REPEATS as POSIX now allows them. + + * regex.c (#include <sys/types.h>): Added this as new POSIX + standard has the user include it instead of us putting it in + regex.h. + (extern char *re_syntax_table): Made into an extern so the + user could allocate it. + (DO_RANGE): If don't find a range end, now goto invalid_range_end + instead of unmatched_left_bracket. + (regex_compile): Made variable SYNTAX non-const.???? + Reformatted some code. + (re_compile_fastmap): Moved is_a_succeed_n's declaration to + inner braces. + Compacted some code. + (SET_NEWLINE_FLAG): Removed and put inline. + (regcomp): Made variable `syntax' non-const so can unset + RE_ANCHOR_NOT_NEWLINE syntax bit if cflag REG_NEWLINE is set. + If cflag REG_NEWLINE is set, set the RE_HAT_LISTS_NOT_NEWLINE + syntax bit and unset RE_ANCHOR_NOT_NEWLINE one of `syntax'.
+ +Wed Feb 20 16:33:38 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (RE_NO_CONSECUTIVE_REPEATS): Changed name from + RE_NO_CONSEC_REPEATS. + (REG_ENESTING): Deleted this POSIX return value, as the stack + is now unbounded. + (struct re_pattern_buffer): Changed some comments. + (re_compile_pattern): Changed a comment. + Deleted check on stack upper bound and corresponding error. + Now when there's no interval contents and it's the end of the + pattern, go to unmatched_left_curly_brace instead of end_of_pattern. + Removed nesting_too_deep error, as the stack is now unbounded. + (regcomp): Removed REG_ENESTING case, as the stack is now unbounded. + (regerror): Removed REG_ENESTING case, as the stack is now unbounded. + + * regex.c (MAX_STACK_SIZE): Deleted because don't need upper + bound on array indexed with an unsigned number. + +Sun Feb 17 15:50:24 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h: Changed and added some comments. + + * regex.c (init_syntax_once): Made `_' a word character. + (re_compile_pattern): Added a comment. + (re_match_2): Redid header comment. + (regexec): With header comment about PMATCH, corrected and + removed details found in regex.h, adding a reference. + +Fri Feb 15 09:21:31 1991 Kathy Hargreaves (kathy at hayley) + + * regex.c (DO_RANGE): Removed argument parentheses. + Now get untranslated range start and end characters and set + list bits for the translated (if at all) versions of them and + all characters between them. + (re_match_2): Now use regs->num_regs instead of num_regs_wanted + wherever possible. + (regcomp): Now build case-fold translate table using isupper + and tolower facilities so will work on foreign language characters. + +Sat Feb 9 16:40:03 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (RE_HAT_LISTS_NOT_NEWLINE): Changed syntax bit name + from RE_LISTS_NOT_NEWLINE as it only affects nonmatching lists.
+ Changed all references to the match-beginning-of-string + operator to match-beginning-of-line operator, as this is what + it does. + (RE_NO_CONSEC_REPEATS): Added this syntax bit. + (RE_SYNTAX_POSIX_BASIC): Added above bit to this. + (REG_PREMATURE_END): Changed name to REG_EEND. + (REG_EXCESS_NESTING): Changed name to REG_ENESTING. + (REG_TOO_BIG): Changed name to REG_ESIZE. + (REG_INVALID_PREV_RE): Deleted this return POSIX value. + Added and changed some comments. + + * regex.c (re_compile_pattern): Now sets the pattern buffer's + `return_default_num_regs' field. + (typedef struct stack_element, stack_type, INIT_STACK_SIZE, + MAX_STACK_SIZE, STACK_EMPTY, STACK_FULL): Added for regex_compile. + (INIT_BUF_SIZE): Changed value from 28 to 32. + (BUF_PUSH): Changed name from BUFPUSH. + (MAX_BUF_SIZE): Added so could use in many places. + (IS_CHAR_CLASS_STRING): Replaced is_char_class with this. + (regex_compile): Added a stack which could grow dynamically + and which has struct elements. + Go back to initializing `zero_times_ok' and `many_time_ok' to + 0 and |=ing them inside the loop. + Now disallow consecutive repetition operators if the syntax + bit RE_NO_CONSEC_REPEATS is set. + Now detect trailing backslash when the compiler is expecting a + `?' or a `+'. + Changed calls to GET_BUFFER_SPACE which asked for 6 to ask for + 3, as that's all they needed. + Now check for trailing backslash inside lists. + Now disallow an empty alternative right before an end-of-line + operator. + Now get buffer space before leaving space for a fixup jump. + Now check if at pattern end when at open-interval operator. + Added some comments. + Now check if non-interval repetition operators follow an + interval one if the syntax bit RE_NO_CONSEC_REPEATS is set. + Now only check if what precedes an interval repetition + operator isn't a regular expression which matches one + character if the syntax bit RE_NO_CONSEC_REPEATS is set. 
+ Now return "Unmatched [ or [^" instead of "Unmatched [". + (is_in_stack): Added to check if a given register number is in + the stack. + (re_match_2): If initial variable allocations fail, return -2, + instead of -1. + Now set reg's `num_regs' field when allocating regs. + Now before allocating them, free regs->start and end if they + aren't NULL and return -2 if either allocation fails. + Now use regs->num_regs instead of num_regs_wanted to control + regs loops. + Now increment past the newline when matching it with an + end-of-line operator. + (regcomp): Added to the header comment. + Now return REG_ESUBREG if regex_compile returns "Unmatched [ + or [^" instead of doing so if it returns "Unmatched [". + Now return REG_BADRPT if in addition to returning "Missing + preceding regular expression", regex_compile returns "Invalid + preceding regular expression". + Now return new return value names (see regex.h changes). + (regexec): Added to header comment. + Initialize regs structure. + Now match whole string. + Now always free regs.start and regs.end instead of just when + the string matched. + (regerror): Now return "Regex error: Unmatched [ or [^.\n" + instead of "Regex error: Unmatched [.\n". + Now return "Regex error: Preceding regular expression either + missing or not simple.\n" instead of "Regex error: Missing + preceding regular expression.\n". + Removed REG_INVALID_PREV_RE case (it got subsumed into the + REG_BADRPT case). + +Thu Jan 17 09:52:35 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h: Changed a comment. + + * regex.c: Changed and added large header comments. + (re_compile_pattern): Now if detect that `laststart' for an + interval points to a byte code for a regular expression which + matches more than one character, make it an internal error. + (regerror): Return error message, don't print it. + +Tue Jan 15 15:32:49 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (regcomp return codes): Added GNU ones. + Updated some comments.
+ + * regex.c (DO_RANGE): Changed `obscure_syntax' to `syntax'. + (regex_compile): Added `following_left_brace' to keep track of + where pseudo interval following a valid interval starts. + Changed some instances that returned "Invalid regular + expression" to instead return error strings coinciding with + POSIX error codes. + Changed some comments. + Now consider only things between `[:' and `:]' to be possible + character class names. + Now a character class expression can't end a pattern; at + least a `]' must close the list. + Now if the syntax bit RE_NO_BK_CURLY_BRACES is set, then a + valid interval must be followed by yet another to get an error + for preceding an interval (in this case, the second one) with + a regular expression that matches more than one character. + Now if what follows a valid interval begins with a open + interval operator but doesn't begin a valid interval, then set + following_left_bracket to it, put it in C and go to + normal_char label. + Added some comments. + Return "Invalid character class name" instead of "Invalid + character class". + (regerror): Return messages for all POSIX error codes except + REG_ECOLLATE and REG_NEWLINE, along with all GNU error codes. + Added `break's after all cases. + (main): Call re_set_syntax instead of setting `obscure_syntax' + directly. + +Sat Jan 12 13:37:59 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (Copyright): Updated date. + (#include <sys/types.h>): Include unconditionally. + (RE_CANNOT_MATCH_NEWLINE): Deleted this syntax bit. + (RE_SYNTAX_POSIX_BASIC, RE_SYNTAX_POSIX_EXTENDED): Removed + setting the RE_ANCHOR_NOT_NEWLINE syntax bit from these. + Changed and added some comments. + (struct re_pattern_buffer): Changed some flags from chars to bits. + Added field `syntax'; holds which syntax pattern was compiled with. + Added bit flag `return_default_num_regs'. + (externs for GNU and Berkeley UNIX routines): Added `const's to + parameter types to be compatible with POSIX. 
+ (#define const): Added to support old C compilers. + + * regex.c (Copyright): Updated date. + (enum regexpcode): Deleted `newline'. + (regex_compile): Renamed re_compile_pattern to this, added a + syntax parameter so it can set the pattern buffer's `syntax' + field. + Made `pattern', and `size' `const's so could pass to POSIX + interface routines; also made `const' whatever interval + variables had to be to make this work. + Changed references to `obscure_syntax' to new parameter `syntax'. + Deleted putting `newline' in buffer when see `\n'. + Consider invalid character classes which have nothing wrong + except the character class name; if so, return character-class error. + (is_char_class): Added routine for regex_compile. + (re_compile_pattern): added a new one which calls + regex_compile with `obscure_syntax' as the actual parameter + for the formal `syntax'. + Gave this the old routine's header comments. + Made `pattern', and `size' `const's so could use POSIX interface + routine parameters. + (re_search, re_search_2, re_match, re_match_2): Changed + `pbufp' to `bufp'. + (re_search_2, re_match_2): Changed `mstop' to `stop'. + (re_search, re_search_2): Made all parameters except `regs' + `const's so could use POSIX interface routines parameters. + (re_search_2): Added private copies of `const' parameters so + could change their values. + (re_match_2): Made all parameters except `regs' `const's so + could use POSIX interface routines parameters. + Changed `size1' and `size2' parameters to `size1_arg' and + `size2_arg' and so could change; added local `size1' and + `size2' and set to these. + Added some comments. + Deleted `newline' case. + `begline' can also possibly match if `d' contains a newline; + if it does, we have to increment d to point past the newline. + Replaced references to `obscure_syntax' with `bufp->syntax'. + (re_comp, re_exec): Made parameter `s' a `const' so could use POSIX + interface routines parameters. 
+ Now call regex_compile, passing `obscure_syntax' via the + `syntax' parameter. + (re_exec): Made local `len' a `const' so could pass to re_search. + (regcomp): Added header comment. + Added local `syntax' to set and pass to regex_compile rather + than setting global `obscure_syntax' and passing it. + Call regex_compile with its `syntax' parameter rather than + re_compile_pattern. + Return REG_ECTYPE if character-class error. + (regexec): Don't initialize `regs' to anything. + Made `private_preg' a nonpointer so could set to what the + constant `preg' points. + Initialize `private_preg's `return_default_num_regs' field to + zero because want to return `nmatch' registers, not however + many there are subexpressions in the pattern. + Also test if `nmatch' > 0 to see if should pass re_match `regs'. + +Tue Jan 8 15:57:17 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (struct re_pattern_buffer): Reworded comment. + + * regex.c (EXTEND_BUFFER): Also reset beg_interval. + (re_search_2): Return val if val = -2. + (NUM_REG_ITEMS): Listed items in comment. + (NUM_OTHER_ITEMS): Defined this for using in > 1 definition. + (MAX_NUM_FAILURE_ITEMS): Replaced `+ 2' with NUM_OTHER_ITEMS. + (NUM_FAILURE_ITEMS): As with definition above and added to + comment. + (PUSH_FAILURE_POINT): Replaced `* 2's with `<< 1's. + (re_match_2): Test with equality with 1 to see pbufp->bol and + pbufp->eol are set. + +Fri Jan 4 15:07:22 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (struct re_pattern_buffer): Reordered some fields. + Updated some comments. + Added not_bol and not_eol fields. + (extern regcomp, regexec, regerror): Added return types. + (extern regfree): Added `extern'. + + * regex.c (min): Deleted unused macro. + (re_match_2): Compacted some code. + Removed call to macro `min' from `for' loop. + Fixed so unused registers get filled with -1's. + Fail if the pattern buffer's `not_bol' field is set and + encounter a `begline'. 
+ Fail if the pattern buffer's `not_eol' field is set and + encounter a `endline'. + Deleted redundant check for empty stack in fail case. + Don't free pattern buffer's components in re_comp. + (regexec): Initialize variable regs. + Added `private_preg' pattern buffer so could set `not_bol' and + `not_eol' fields and hand to re_match. + Deleted naive attempt to detect anchors. + Set private pattern buffer's `not_bol' and `not_eol' fields + according to eflags value. + `nmatch' must also be > 0 for us to bother allocating + registers to send to re_match and filling pmatch + with their results after the call to re_match. + Send private pattern buffer instead of argument to re_match. + If use the registers, always free them and then set them to NULL. + (regerror): Added this Posix routine. + (regfree): Added this Posix routine. + +Tue Jan 1 15:02:45 1991 Kathy Hargreaves (kathy at hayley) + + * regex.h (RE_NREGS): Deleted this definition, as now the user + can choose how many registers to have. + (REG_NOTBOL, REG_NOTEOL): Defined these Posix eflag bits. + (REG_NOMATCH, REG_BADPAT, REG_ECOLLATE, REG_ECTYPE, + REG_EESCAPE, REG_ESUBREG, REG_EBRACK, REG_EPAREN, REG_EBRACE, + REG_BADBR, REG_ERANGE, REG_ESPACE, REG_BADRPT, REG_ENEWLINE): + Defined these return values for Posix's regcomp and regexec. + Updated some comments. + (struct re_pattern_buffer): Now typedef this as regex_t + instead of the other way around. + (struct re_registers): Added num_regs field. Made start and + end fields pointers to char instead of fixed size arrays. + (regmatch_t): Added this Posix register type. + (regcomp, regexec, regerror, regfree): Added externs for these + Posix routines. + + * regex.c (enum boolean): Typedefed this. + (re_pattern_buffer): Reformatted some comments. + (re_compile_pattern): Updated some comments. 
+ Always push start_memory and its attendant number whenever + encounter a group, not just when its number is less than the + previous maximum number of registers; same for stop_memory. + Get 4 bytes of buffer space instead of 2 when pushing a + set_number_at. + (can_match_nothing): Added this to elaborate on and replace + code in re_match_2. + (reg_info_type): Made can_match_nothing field a bit instead of int. + (MIN): Added for re_match_2. + (re_match_2 macros): Changed all `for' loops which used + RE_NREGS to now use num_internal_regs as upper bounds. + (MAX_NUM_FAILURE_ITEMS): Use num_internal_regs instead of RE_NREGS. + (POP_FAILURE_POINT): Added check for empty stack. + (FREE_VARIABLES): Added this to free (and set to NULL) + variables allocated in re_match_2. + (re_match_2): Rearranged parameters to be in order. + Added variables num_regs_wanted (how many registers the user wants) + and num_internal_regs (how many groups there are). + Allocated initial_stack, regstart, regend, old_regstart, + old_regend, reginfo, best_regstart, and best_regend---all + which used to be fixed size arrays. Free them all and return + -1 if any fail. + Free above variables if starting position pos isn't valid. + Changed all `for' loops which used RE_NREGS to now use + num_internal_regs as upper bounds---except for the loops which + fill regs; then use num_regs_wanted. + Allocate regs if the user has passed it and wants more than 0 + registers filled. + Set regs->start[i] and regs->end[i] to -1 if either + regstart[i] or regend[i] equals -1, not just the first. + Free allocated variables before returning. + Updated some comments. + (regcomp): Return REG_ESPACE, REG_BADPAT, REG_EPAREN when + appropriate. + Free translate array. + (regexec): Added this Posix interface routine. + +Mon Dec 24 14:21:13 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h: If _POSIX_SOURCE is defined then #include <sys/types.h>. + Added syntax bit RE_CANNOT_MATCH_NEWLINE. 
+ Defined Posix cflags: REG_EXTENDED, REG_NEWLINE, REG_ICASE, and + REG_NOSUB. + Added fields re_nsub and no_sub to struct re_pattern_buffer. + Typedefed regex_t to be `struct re_pattern_buffer'. + + * regex.c (CHAR_SET_SIZE): Defined this to be 256 and replaced + incidences of this value with this constant. + (re_compile_pattern): Added switch case for `\n' and put + `newline' into the pattern buffer when encounter this. + Increment the pattern_buffer's `re_nsub' field whenever open a + group. + (re_match_2): Match a newline with `newline'---provided the + syntax bit RE_CANNOT_MATCH_NEWLINE isn't set. + (regcomp): Added this Posix interface routine. + (enum test_type): Added interface_test tag. + (main): Added Posix interface test. + +Tue Dec 18 12:58:12 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h (struct re_pattern_buffer): reformatted so would fit + in texinfo documentation. + +Thu Nov 29 15:49:16 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h (RE_NO_EMPTY_ALTS): Added this bit. + (RE_SYNTAX_POSIX_EXTENDED): Added above bit. + + * regex.c (re_compile_pattern): Disallow empty alternatives only + when RE_NO_EMPTY_ALTS is set, not when RE_CONTEXTUAL_INVALID_OPS is. + Changed RE_NO_BK_CURLY_BRACES to RE_NO_BK_PARENS when testing + for empty groups at label handle_open. + At label handle_bar: disallow empty alternatives if RE_NO_EMPTY_ALTS + is set. + Rewrote some comments. + + (re_compile_fastmap): cleaned up code. + + (re_search_2): Rewrote comment. + + (struct register_info): Added field `inner_groups'; it records + which groups are inside of the current one. + Added field can_match_nothing; it's set if the current group + can match nothing. + Added field ever_match_something; it's set if current group + ever matched something. + + (INNER_GROUPS): Added macro to access inner_groups field of + struct register_info. + + (CAN_MATCH_NOTHING): Added macro to access can_match_nothing + field of struct register_info. 
+ + (EVER_MATCHED_SOMETHING): Added macro to access + ever_matched_something field of struct register_info. + + (NOTE_INNER_GROUP): Defined macro to record that a given group + is inside of all currently active groups. + + (re_match_2): Added variables *p1 and mcnt2 (multipurpose). + Added old_regstart and old_regend arrays to hold previous + register values if they need be restored. + Initialize added fields and variables. + case start_memory: Find out if the group can match nothing. + Save previous register values in old_regstart and old_regend. + Record that current group is inside of all currently active + groups. + If the group is inside a loop and it ever matched anything, + restore its registers to values before the last failed match. + Restore the registers for the inner groups, too. + case duplicate: Can back reference to a group that never + matched if it can match nothing. + +Thu Nov 29 11:12:54 1990 Karl Berry (karl at hayley) + + * regex.c (bcopy, ...): define these if either _POSIX_SOURCE or + STDC_HEADERS is defined; same for including <stdlib.h>. + +Sat Oct 6 16:04:55 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h (struct re_pattern_buffer): Changed field comments. + + * regex.c (re_compile_pattern): Allow a `$' to precede an + alternation operator (`|' or `\|'). + Disallow `^' and/or `$' in empty groups if the syntax bit + RE_NO_EMPTY_GROUPS is set. + Wait until have parsed a valid `\{...\}' interval expression + before testing RE_CONTEXTUAL_INVALID_OPS to see if it's + invalidated by that. + Don't use RE_NO_BK_CURLY_BRACES to test whether or not a validly + parsed interval expression is invalid if it has no preceding re; + rather, use RE_CONTEXTUAL_INVALID_OPS.
+ If an interval parses, but there is no preceding regular + expression, yet the syntax bit RE_CONTEXTUAL_INDEP_OPS is set, + then that interval can match the empty regular expression; if + the bit isn't set, then the characters in the interval + expression are parsed as themselves (sans the backslashes). + In unfetch_interval case: Moved PATFETCH to above the test for + RE_NO_BK_CURLY_BRACES being set, which would force a goto + normal_backslash; the code at both normal_backsl and normal_char + expect a character in `c.' + +Sun Sep 30 11:13:48 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h: Changed some comments to use the terms used in the + documentation. + (RE_CONTEXTUAL_INDEP_OPS): Changed name from `RE_CONTEXT_INDEP_OPS'. + (RE_LISTS_NOT_NEWLINE): Changed name from `RE_HAT_NOT_NEWLINE.' + (RE_ANCHOR_NOT_NEWLINE): Added this syntax bit. + (RE_NO_EMPTY_GROUPS): Added this syntax bit. + (RE_NO_HYPHEN_RANGE_END): Deleted this syntax bit. + (RE_SYNTAX_...): Reformatted. + (RE_SYNTAX_POSIX_BASIC, RE_SYNTAX_EXTENDED): Added syntax bits + RE_ANCHOR_NOT_NEWLINE and RE_NO_EMPTY_GROUPS, and deleted + RE_NO_HYPHEN_RANGE_END. + (RE_SYNTAX_POSIX_EXTENDED): Added syntax bit RE_DOT_NOT_NULL. + + * regex.c (bcopy, bcmp, bzero): Define if _POSIX_SOURCE is defined. + (_POSIX_SOURCE): ifdef this, #include <stdlib.h> + (#ifdef emacs): Changed comment of the #endif for the its #else + clause to be `not emacs', not `emacs.' + (no_pop_jump): Changed name from `jump'. + (pop_failure_jump): Changed name from `finalize_jump.' + (maybe_pop_failure_jump): Changed name from `maybe_finalize_jump'. + (no_pop_jump_n): Changed name from `jump_n.' + (EXTEND_BUFFER): Use shift instead of multiplication to double + buf->allocated. + (DO_RANGE, recompile_pattern): Added macro to set the list bits + for a range. + (re_compile_pattern): Fixed grammar problems in some comments. + Checked that RE_NO_BK_VBAR is set to make `$' valid before a `|' + and not set to make it valid before a `\|'. 
+ Checked that RE_NO_BK_PARENS is set to make `$' valid before a ')' + and not set to make it valid before a `\)'. + Disallow ranges starting with `-', unless the range is the + first item in a list, rather than disallowing ranges which end + with `-'. + Disallow empty groups if the syntax bit RE_NO_EMPTY_GROUPS is set. + Disallow nothing preceding `{' and `\{' if they represent the + open-interval operator and RE_CONTEXTUAL_INVALID_OPS is set. + (register_info_type): typedef-ed this using `struct register_info.' + (SET_REGS_MATCHED): Compacted the code. + (re_match_2): Made it fail if back reference a group which we've + never matched. + Made `^' not match a newline if the syntax bit + RE_ANCHOR_NOT_NEWLINE is set. + (really_fail): Added this label so could force a final fail that + would not try to use the failure stack to recover. + +Sat Aug 25 14:23:01 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h (RE_CONTEXTUAL_OPS): Changed name from RE_CONTEXT_OPS. + (global): Rewrote comments and rebroke some syntax #define lines. + + * regex.c (isgraph): Added definition for sequents. + (global): Now refer to character set lists as ``lists.'' + Rewrote comments containing ``\('' or ``\)'' to now refer to + ``groups.'' + (RE_CONTEXTUAL_OPS): Changed name from RE_CONTEXT_OPS. + + (re_compile_pattern): Expanded header comment. + +Sun Jul 15 14:50:25 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h (RE_CONTEX_INDEP_OPS): the comment's sense got turned + around when we changed how it read; changed it to be correct. + +Sat Jul 14 16:38:06 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h (RE_NO_EMPTY_BK_REF): changed name to + RE_NO_MISSING_BK_REF, as this describes it better. + + * regex.c (re_compile_pattern): changed RE_NO_EMPTY_BK_REF + to RE_NO_MISSING_BK_REF, as above. 
+ +Thu Jul 12 11:45:05 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h (RE_NO_EMPTY_BRACKETS): removed this syntax bit, as + bracket expressions should *never* be empty regardless of the + syntax. Removes this bit from RE_SYNTAX_POSIX_BASIC and + RE_SYNTAX_POSIX_EXTENDED. + + * regex.c (SET_LIST_BIT): in the comment, now refer to character + sets as (non)matching sets, as bracket expressions can now match + other things in addition to characters. + (re_compile_pattern): refer to groups as such instead of `\(...\)' + or somesuch, because groups can now be enclosed in either plain + parens or backslashed ones, depending on the syntax. + In the '[' case, added a boolean just_had_a_char_class to detect + whether or not a character class begins a range (which is invalid). + Restore way of breaking out of a bracket expression to original way. + Add way to detect a range if the last thing in a bracket + expression was a character class. + Took out check for c != ']' at the end of a character class in + the else clause, as it had already been checked in the if part + that also checked the validity of the string. + Set or clear just_had_a_char_class as appropriate. + Added some comments. Changed references to character sets to + ``(non)matching lists.'' + +Sun Jul 1 12:11:29 1990 Karl Berry (karl at hayley) + + * regex.h (BYTEWIDTH): moved back to regex.c. + + * regex.h (re_compile_fastmap): removed declaration; this + shouldn't be advertised. + +Mon May 28 15:27:53 1990 Kathy Hargreaves (kathy at hayley) + + * regex.c (ifndef Sword): Made comments more specific. + (global): include <stdio.h> so can write fatal messages on + standard error. Replaced calls to assert with fprintfs to + stderr and exit (1)'s. + (PREFETCH): Reformatted to make more readable. + (AT_STRINGS_BEG): Defined to test if we're at the beginning of + the virtual concatenation of string1 and string2. 
+ (AT_STRINGS_END): Defined to test if at the end of the virtual + concatenation of string1 and string2. + (AT_WORD_BOUNDARY): Defined to test if are at a word boundary. + (IS_A_LETTER(d)): Defined to test if the contents of the pointer D + is a letter. + (re_match_2): Rewrote the wordbound, notwordbound, wordbeg, wordend, + begbuf, and endbuf cases in terms of the above four new macros. + Called SET_REGS_MATCHED in the matchsyntax, matchnotsyntax, + wordchar, and notwordchar cases. + +Mon May 14 14:49:13 1990 Kathy Hargreaves (kathy at hayley) + + * regex.c (re_search_2): Fixed RANGE to not ever take STARTPOS + outside of virtual concatenation of STRING1 and STRING2. + Updated header comment as to this. + (re_match_2): Clarified comment about MSTOP in header. + +Sat May 12 15:39:00 1990 Kathy Hargreaves (kathy at hayley) + + * regex.c (re_search_2): Checked for out-of-range STARTPOS. + Added comments. + When searching backwards, not only get the character with which + to compare to the fastmap from string2 if the starting position + >= size1, but also if size1 is zero; this is so won't get a + segmentation fault if string1 is null. + Reformatted code at label advance. + +Thu Apr 12 20:26:21 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h: Added #pragma once and #ifdef...endif __REGEXP_LIBRARY. + (RE_EXACTN_VALUE): Added for search.c to use. + Reworded some comments. + + regex.c: Punctuated some comments correctly. + (NULL): Removed this. + (RE_EXACTN_VALUE): Added for search.c to use. + (<ctype.h>): Moved this include to top of file. + (<assert.h>): Added this include. + (struct regexpcode): Assigned 0 to unused and 1 to exactn + because of RE_EXACTN_VALUE. + Added comment. + (various macros): Lined up backslashes near end of line. + (insert_jump): Cleaned up the header comment. + (re_search): Corrected the header comment. + (re_search_2): Cleaned up and completed the header comment. + (re_max_failures): Updated comment. 
+ (struct register_info): Constructed as bits so as to save space + on the stack when pushing register information. + (IS_ACTIVE): Macro for struct register_info. + (MATCHED_SOMETHING): Macro for struct register_info. + (NUM_REG_ITEMS): How many register information items for each + register we have to push on the stack at each failure. + (MAX_NUM_FAILURE_ITEMS): If push all the registers on failure, + this is how many items we push on the stack. + (PUSH_FAILURE_POINT): Now pushes whether or not the register is + currently active, and whether or not it matched something. + Checks that there's enough space allocated to accommodate all the + items we currently want to push. (Before, a test for an empty + stack sufficed because we always pushed and popped the same + number of items). + Replaced ``2'' with MAX_NUM_FAILURE_POINTS when ``2'' refers + to how many things get pushed on the stack each time. + When copy the stack into the newly allocated storage, now only copy + the area in use. + Clarified comment. + (POP_FAILURE_POINT): Defined to use in places where put number + of registers on the stack into a variable before using it to + decrement the stack, so as to not confuse the compiler. + (IS_IN_FIRST_STRING): Defined to check if a pointer points into + the first string. + (SET_REGS_MATCHED): Changed to use the struct register_info + bits; also set the matched-something bit to false if the + register isn't currently active. (This is a redundant setting.) + (re_match_2): Cleaned up and completed the header comment. + Updated the failure stack comment. + Replaced the ``2'' with MAX_NUM_FAILURE_ITEMS in the static + allocation of initial_stack, because now more than two (now up + to MAX_FAILURE_ITEMS) items get pushed on the failure stack each + time. + Ditto for stackb.
+ Trashed regstart_seg1, regend_seg1, best_regstart_seg1, and + best_regend_seg1 because they could have erroneous information + in them, such as when matching ``a'' (in string1) and ``ab'' (in + string2) with ``(a)*ab''; before using IS_IN_FIRST_STRING to see + whether or not the register starts or ends in string1, + regstart[1] pointed past the end of string1, yet regstart_seg1 + was 0! + Added variable reg_info of type struct register_info to keep + track of currently active registers and whether or not they + currently match anything. + Commented best_regs_set. + Trashed reg_active and reg_matched_something and put the + information they held into reg_info; saves space on the stack. + Replaced NULL with '\000'. + In begline case, compacted the code. + Used assert to exit if had an internal error. + In begbuf case, because now force the string we're working on + into string2 if there aren't two strings, now allow d == string2 + if there is no string1 (and the check for that is size1 == 0!); + also now succeeds if there aren't any strings at all. + (main, ifdef canned): Put test type into a variable so could + change it while debugging. + +Sat Mar 24 12:24:13 1990 Kathy Hargreaves (kathy at hayley) + + * regex.c (GET_UNSIGNED_NUMBER): Deleted references to num_fetches. + (re_compile_pattern): Deleted num_fetches because could keep + track of the number of fetches done by saving a pointer into the + pattern. + Added variable beg_interval to be used as a pointer, as above. + Assert that beg_interval points to something when it's used as above. + Initialize succeed_n's to lower_bound because re_compile_fastmap + needs to know it. + (re_compile_fastmap): Deleted unnecessary variable is_a_jump_n. + Added comment. + (re_match_2): Put number of registers on the stack into a + variable before using it to decrement the stack, so as to not + confuse the compiler. + Updated comments. + Used error routine instead of printf and exit.
+ In exactn case, restored longer code from ``original'' regex.c + which doesn't test translate inside a loop. + + * regex.h: Moved #define NULL and the enum regexpcode definition + and to regex.c. Changed some comments. + + regex.c (global): Updated comments about compiling and for the + re_compile_pattern jump routines. + Added #define NULL and the enum regexpcode definition (from + regex.h). + (enum regexpcode): Added set_number_at to reset the n's of + succeed_n's and jump_n's. + (re_set_syntax): Updated its comment. + (re_compile_pattern): Moved its heading comment to after its macros. + Moved its include statement to the top of the file. + Commented or added to comments of its macros. + In start_memory case: Push laststart value before adding + start_memory and its register number to the buffer, as they + might not get added. + Added code to put a set_number_at before each succeed_n and one + after each jump_n; rewrote code in what seemed a more + straightforward manner to put all these things in the pattern so + the succeed_n's would correctly jump to the set_number_at's of + the matching jump_n's, and so the jump_n's would correctly jump + to after the set_number_at's of the matching succeed_n's. + Initialize succeed_n n's to -1. + (insert_op_2): Added this to insert an operation followed by + two integers. + (re_compile_fastmap): Added set_number_at case. + (re_match_2): Moved heading comment to after macros. + Added mention of REGS to heading comment. + No longer turn a succeed_n with n = 0 into an on_failure_jump, + because n needs to be reset each time through a loop. + Check to see if a succeed_n's n is set by its set_number_at. + Added set_number_at case. + Updated some comments. + (main): Added another main to run posix tests, which is compiled + ifdef both test and canned. (Old main is still compiled ifdef + test only). 
+ +Tue Mar 19 09:22:55 1990 Kathy Hargreaves (kathy at hayley) + + * regex.[hc]: Change all instances of the word ``legal'' to + ``valid'' and all instances of ``illegal'' to ``invalid.'' + +Sun Mar 4 12:11:31 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h: Added syntax bit RE_NO_EMPTY_RANGES which is set if + an ending range point has to collate higher or equal to the + starting range point. + Added syntax bit RE_NO_HYPHEN_RANGE_END which is set if a hyphen + can't be an ending range point. + Set the two above bits in RE_SYNTAX_POSIX_BASIC and + RE_SYNTAX_POSIX_EXTENDED. + + regex.c: (re_compile_pattern): Don't allow empty ranges if the + RE_NO_EMPTY_RANGES syntax bit is set. + Don't let a hyphen be a range end if the RE_NO_HYPHEN_RANGE_END + syntax bit is set. + (ESTACK_PUSH_2): renamed this PUSH_FAILURE_POINT and made it + push all the used registers on the stack, as well as the number + of the highest numbered register used, and (as before) the two + failure points. + (re_match_2): Fixed up comments. + Added arrays best_regstart[], best_regstart_seg1[], best_regend[], + and best_regend_seg1[] to keep track of the best match so far + whenever reach the end of the pattern but not the end of the + string, and there are still failure points on the stack with + which to backtrack; if so, do the saving and force a fail. + If reach the end of the pattern but not the end of the string, + but there are no more failure points to try, restore the best + match so far, set the registers and return. + Compacted some code. + In stop_memory case, if the subexpression we've just left is in + a loop, push onto the stack the loop's on_failure_jump failure + point along with the current pointer into the string (d). + In finalize_jump case, in addition to popping the failure + points, pop the saved registers. + In the fail case, restore the registers, as well as the failure + points.
+ +Sun Feb 18 15:08:10 1990 Kathy Hargreaves (kathy at hayley) + + * regex.c: (global): Defined a macro GET_BUFFER_SPACE which + makes sure you have a specified number of buffer bytes + allocated. + Redefined the macro BUFPUSH to use this. + Added comments. + + (re_compile_pattern): Call GET_BUFFER_SPACE before storing or + inserting any jumps. + + (re_match_2): Set d to string1 + pos and dend to end_match_1 + only if string1 isn't null. + Force exit from a loop if it's around empty parentheses. + In stop_memory case, if found some jumps, increment p2 before + extracting address to which to jump. Also, don't need to know + how many more times can jump_n. + In begline case, d must equal string1 or string2, in that order, + only if they are not null. + In maybe_finalize_jump case, skip over start_memorys' and + stop_memorys' register numbers, too. + +Thu Feb 15 15:53:55 1990 Kathy Hargreaves (kathy at hayley) + + * regex.c (BUFPUSH): off by one goof in deciding whether to + EXTEND_BUFFER. + +Wed Jan 24 17:07:46 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h: Moved definition of NULL to here. + Got rid of ``In other words...'' comment. + Added to some comments. + + regex.c: (re_compile_pattern): Tried to bulletproof some code, + i.e., checked if backward references (e.g., p[-1]) were within + the range of pattern. + + (re_compile_fastmap): Fixed a bug in succeed_n part where was + getting the amount to jump instead of how many times to jump. + + (re_search_2): Changed the name of the variable ``total'' to + ``total_size.'' + Condensed some code. + + (re_match_2): Moved the comment about duplicate from above the + start_memory case to above duplicate case. + + (global): Rewrote some comments. + Added commandline arguments to testing. + +Wed Jan 17 11:47:27 1990 Kathy Hargreaves (kathy at hayley) + + * regex.c: (global): Defined a macro STORE_NUMBER which stores a + number into two contiguous bytes. 
Also defined STORE_NUMBER_AND_INCR + which does the same thing and then increments the pointer to the + storage place to point after the number. + Defined a macro EXTRACT_NUMBER which extracts a number from two + contiguous bytes. Also defined EXTRACT_NUMBER_AND_INCR which + does the same thing and then increments the pointer to the + source to point to after where the number was. + +Tue Jan 16 12:09:19 1990 Kathy Hargreaves (kathy at hayley) + + * regex.h: Incorporated rms' changes. + Defined RE_NO_BK_REFS syntax bit which is set when want to + interpret back reference patterns as literals. + Defined RE_NO_EMPTY_BRACKETS syntax bit which is set when want + empty bracket expressions to be illegal. + Defined RE_CONTEXTUAL_ILLEGAL_OPS syntax bit which is set when want + it to be illegal for *, +, ? and { to be first in an re or come + immediately after a | or a (, and for ^ not to appear in a + nonleading position and $ in a nontrailing position (outside of + bracket expressions, that is). + Defined RE_LIMITED_OPS syntax bit which is set when want +, ? + and | to always be literals instead of ops. + Fixed up the Posix syntax. + Changed the syntax bit comments from saying, e.g., ``0 means...'' + to ``If this bit is set, it means...''. + Changed the syntax bit defines to use shifts instead of integers. + + * regex.c: (global): Incorporated rms' changes. + + (re_compile_pattern): Incorporated rms' changes. + Made it illegal for a $ to appear anywhere but inside a bracket + expression or at the end of an re when RE_CONTEXTUAL_ILLEGAL_OPS + is set. Made the same hold for ^ except it has to be at the + beginning of an re instead of the end. + Made the re "[]" illegal if RE_NO_EMPTY_BRACKETS is set. + Made it illegal for | to be first or last in an re, or immediately + follow another | or a (. + Added and embellished some comments. + Allowed \{ to be interpreted as a literal if RE_NO_BK_CURLY_BRACES + is set.
+ Made it illegal for *, +, ?, and { to appear first in an re, or + immediately follow a | or a ( when RE_CONTEXTUAL_ILLEGAL_OPS is set. + Made back references interpreted as literals if RE_NO_BK_REFS is set. + Made recursive intervals either illegal (if RE_NO_BK_CURLY_BRACES + isn't set) or interpreted as literals (if is set), if RE_INTERVALS + is set. + Made it treat +, ? and | as literals if RE_LIMITED_OPS is set. + Cleaned up some code. + +Thu Dec 21 15:31:32 1989 Kathy Hargreaves (kathy at hayley) + + * regex.c: (global): Moved RE_DUP_MAX to regex.h and made it + equal 2^15 - 1 instead of 1000. + Defined NULL to be zero. + Moved the definition of BYTEWIDTH to regex.h. + Made the global variable obscure_syntax nonstatic so the tests in + another file could use it. + + (re_compile_pattern): Defined a maximum length (CHAR_CLASS_MAX_LENGTH) + for character class strings (i.e., what's between the [: and the + :]'s). + Defined a macro SET_LIST_BIT(c) which sets the bit for C in a + character set list. + Took out comments that EXTEND_BUFFER clobbers C. + Made the string "^" match itself, if not RE_CONTEXT_IND_OPS. + Added character classes to bracket expressions. + Change the laststart pointer saved with the start of each + subexpression to point to start_memory instead of after the + following register number. This is because the subexpression + might be in a loop. + Added comments and compacted some code. + Made intervals only work if preceded by an re matching a single + character or a subexpression. + Made back references to nonexistent subexpressions illegal if + using POSIX syntax. + Made intervals work on the last preceding character of a + concatenation of characters, e.g., ab{0,} matches abbb, not abab. + Moved macro PREFETCH to outside the routine. + + (re_compile_fastmap): Added succeed_n to work analogously to + on_failure_jump if n is zero and jump_n to work analogously to + the other backward jumps. 
+ + (re_match_2): Defined macro SET_REGS_MATCHED to set which + current subexpressions had matches within them. + Changed some comments. + Added reg_active and reg_matched_something arrays to keep track + of in which subexpressions currently have matched something. + Defined MATCHING_IN_FIRST_STRING and replaced ``dend == end_match_1'' + with it to make code easier to understand. + Fixed so can apply * and intervals to arbitrarily nested + subexpressions. (Lots of previous bugs here.) + Changed so won't match a newline if syntax bit RE_DOT_NOT_NULL is set. + Made the upcase array nonstatic so the testing file could use it also. + + (main.c): Moved the tests out to another file. + + (tests.c): Moved all the testing stuff here. + +Sat Nov 18 19:30:30 1989 Kathy Hargreaves (kathy at hayley) + + * regex.c: (re_compile_pattern): Defined RE_DUP_MAX, the maximum + number of times an interval can match a pattern. + Added macro GET_UNSIGNED_NUMBER (used to get below): + Added variables lower_bound and upper_bound for upper and lower + bounds of intervals. + Added variable num_fetches so intervals could do backtracking. + Added code to handle '{' and "\{" and intervals. + Added to comments. + + (store_jump_n): (Added) Stores a jump with a number following the + relative address (for intervals). + + (insert_jump_n): (Added) Inserts a jump_n. + + (re_match_2): Defined a macro ESTACK_PUSH_2 for the error stack; + it checks for overflow and reallocates if necessary. + + * regex.h: Added bits (RE_INTERVALS and RE_NO_BK_CURLY_BRACES) + to obscure syntax to indicate whether or not + a syntax handles intervals and recognizes either \{ and + \} or { and } as operators. Also added two syntaxes + RE_SYNTAX_POSIX_BASIC and RE_POSIX_EXTENDED and two command codes + to the enumeration regexpcode; they are succeed_n and jump_n. 
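The two interval spellings named in the entry above (\{ \} and { }) can be tried against a present-day POSIX regex implementation. The following is only a sketch under stated assumptions: it uses the system's <regex.h> (regcomp, regexec, regfree), not the 1989 sources, and the helper name pattern_matches is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of regex.c: compile PATTERN with the
   given POSIX cflags and test it against TEXT.
   Returns 1 on a match, 0 on no match, -1 if PATTERN does not compile.  */
static int
pattern_matches (const char *pattern, int cflags, const char *text)
{
  regex_t re;
  int rc;

  /* REG_NOSUB: only a yes/no answer is wanted, so no registers are filled.  */
  if (regcomp (&re, pattern, cflags | REG_NOSUB) != 0)
    return -1;

  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

Under these assumptions, pattern_matches ("^a\\{2,3\\}$", 0, "aa") exercises the backslashed form in basic syntax, and pattern_matches ("^a{2,3}$", REG_EXTENDED, "aaa") exercises the plain-brace form in extended syntax. An interval binds to the last preceding character only, so "^ab{0,}$" in extended syntax accepts "abbb" but not "abab".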
+ +Sat Nov 18 19:30:30 1989 Kathy Hargreaves (kathy at hayley) + + * regex.c: (re_compile_pattern): Defined INIT_BUFF_SIZE to get rid + of repeated constants in code. Tested with value 1. + Renamed PATPUSH as BUFPUSH, since it pushes things onto the + buffer, not the pattern. Also made this macro extend the buffer + if it's full (so could do the following): + Took out code at top of loop that checks to see if buffer is going + to be full after 10 additions (and reallocates if necessary). + + (insert_jump): Rearranged declaration lines so comments would read + better. + + (re_match_2): Compacted exactn code and added more comments. + + (main): Defined macros TEST_MATCH and MATCH_SELF to do + testing; took out loop so could use these instead. + +Tue Oct 24 20:57:18 1989 Kathy Hargreaves (kathy at hayley) + + * regex.c (re_set_syntax): Gave argument `syntax' a type. + (store_jump, insert_jump): made them void functions. + +Local Variables: +mode: indented-text +left-margin: 8 +version-control: never +End:
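The BUFPUSH behaviour noted in the Nov 18 1989 entry (extend the buffer when it is full, then store) can be sketched roughly as follows. This is not the macro from regex.c: the struct and function names are hypothetical, and the doubling by shift and the starting size of 1 are borrowed from the EXTEND_BUFFER and INIT_BUFF_SIZE notes in this ChangeLog as assumptions.

```c
#include <stdlib.h>

/* Hypothetical growable byte buffer; the names are illustrative and
   not taken from regex.c.  */
struct byte_buf
{
  unsigned char *data;   /* storage, or NULL before the first push */
  size_t used;           /* bytes stored so far */
  size_t allocated;      /* bytes available in DATA */
};

/* Append one byte, doubling the allocation first when the buffer is
   full.  Returns 0 on success, -1 if memory runs out.  Size overflow
   is not handled in this sketch.  */
static int
buf_push (struct byte_buf *b, unsigned char c)
{
  /* Test for "full" before storing, so the store never lands one
     past the end of the allocation.  */
  if (b->used == b->allocated)
    {
      size_t new_size = b->allocated ? b->allocated << 1 : 1;
      unsigned char *p = realloc (b->data, new_size);

      if (p == NULL)
        return -1;
      b->data = p;
      b->allocated = new_size;
    }
  b->data[b->used++] = c;
  return 0;
}
```

Starting from an empty buffer, five calls to buf_push grow the allocation 1, 2, 4, 8, which exercises the extension path on nearly every push. That is presumably why a small initial size makes a useful test.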