path: root/src/dfasearch.c
Commit message | Author | Age | Files | Lines
...
* grep: adjust timing back to kwset when dfaisfast is true | Norihiro Tanaka | 2014-04-30 | 1 | -1/+29
    * src/dfasearch.c (EGexecute): If DFA fails after kwset succeeds, the code doesn't return to kwset until it reaches the end of the buffer or finds a match. Because of this, although some cases speed up, others slow down. Adjust the heuristic for switching to the DFA, so that it is more likely to switch at the right times.
* grep: simplify superset | Norihiro Tanaka | 2014-04-30 | 1 | -47/+59
    * src/dfa.h (dfahint): Remove decl.
    (dfasuperset): New decl.
    * src/dfa.c (dfahint): Remove.
    (dfassbuild): Rename from dfasuperset.
    (dfasuperset): New function. It returns the superset of D.
    * src/dfasearch.c: Use dfasuperset instead of dfahint, and simplify.
* dfa: fix index bug in previous patch, and simplify | Paul Eggert | 2014-04-26 | 1 | -14/+10
    * src/dfa.c, src/dfa.h (dfaisfast): Arg is const pointer.
    * src/dfa.c (dfaisfast): Simplify, since supersets never contain BACKREF.
    * src/dfa.h (dfaisfast): Declare to be pure.
    * src/dfasearch.c (EGexecute): Fix typo that could cause buffer read overrun when !dfafast. Hoist duplicate computation out of an if's then and else parts.
* grep: speed up for a case to repeat failure in DFA after success in kwset | Norihiro Tanaka | 2014-04-26 | 1 | -5/+10
    A DFA is typically much faster if it is unibyte and does not set BACKREF. Skip kwset if the DFA is fast. For example:
      yes abcdabc | head -50000000 >k
      env LC_ALL=C time -p src/grep -i 'abcd.bd' k
    This improved real-time from 4.86 to 1.34 s.
    * src/dfa.c, src/dfa.h (dfaisfast): New function.
    * src/dfasearch.c (EGexecute): Use it.
* dfa: minor improvements to previous patch | Paul Eggert | 2014-04-21 | 1 | -12/+8
    * src/dfa.c (dfamust): Use &=, not if-then.
    * src/dfa.h (struct dfamust):
    * src/dfasearch.c (begline, kwsmusts): Use bool for boolean.
    * src/dfasearch.c (kwsmusts):
    * src/kwsearch.c (Fcompile): Prefer decls after statements.
    * src/dfasearch.c (kwsmusts): Avoid conditional branch.
    * src/kwsearch.c (Fcompile): Unify the two calls to kwsincr.
* grep: speed-up for exact matching with begline and endline constraints. | Norihiro Tanaka | 2014-04-21 | 1 | -5/+24
    dfamust turns on the flag when a state exactly matches the proposed one. However, when the state has begline and/or endline constraints, it turns the flag off. This patch makes it possible to match a state exactly even if the state has begline and/or endline constraints. If an exact string has one of these constraints, the string with eolbyte added to its head and/or foot is passed to kwsincr(). In addition, if it has a begline constraint, start searching from just before the position of the text.
    * src/dfa.c (variable must): New members `begline' and `endline'.
    (dfamust): Take begline and endline constraints into account.
    * src/dfa.h (struct dfamust): New members `begline' and `endline'.
    * src/dfasearch.c (kwsmusts): If an exact string has a begline constraint, start searching from just before the position of the text.
    (EGexecute): Same as above.
    * src/kwsearch.c (Fexecute): Same as above.
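The keyword construction described in this entry can be sketched roughly as follows. This is only an illustration under stated assumptions: `anchor_keyword_sketch` and its parameters are hypothetical names, not code from grep, and the caller is assumed to supply an output buffer with room for `len + 2` bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not grep's code): copy the LEN-byte keyword MUST
   into OUT, adding EOLBYTE in front when the keyword is anchored at the
   start of a line and behind it when anchored at the end of a line.
   OUT must have room for LEN + 2 bytes.  Returns the bytes written.  */
size_t
anchor_keyword_sketch (char const *must, size_t len, bool begline,
                       bool endline, char eolbyte, char *out)
{
  assert (must && out);
  size_t n = 0;
  if (begline)
    out[n++] = eolbyte;         /* line start: preceded by an end-of-line byte */
  memcpy (out + n, must, len);
  n += len;
  if (endline)
    out[n++] = eolbyte;         /* line end: followed by an end-of-line byte */
  return n;
}
```

With a keyword built this way, a search for a begline-anchored string has to begin one byte before the text, so that the end-of-line byte preceding the first line can take part in the match; that appears to be the reason for the "start searching from just before the position of the text" note.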
* grep: cleanup for empty-string fix | Paul Eggert | 2014-04-11 | 1 | -22/+10
    * NEWS: Document it.
    * src/dfasearch.c (GEAcompile):
    * src/kwsearch.c (Fcompile): Use C99-style decls to simplify. Avoid duplicate code.
    * tests/empty-line: Add some more tests like this.
* grep: no match for the empty string included in multiple patterns | Norihiro Tanaka | 2014-04-11 | 1 | -0/+11
    * src/dfasearch.c (GEAcompile): Fix it.
    * src/kwsearch.c (Fcompile): Fix it.
* grep: remove trivial_case_ignore | Paul Eggert | 2014-04-07 | 1 | -6/+0
    This optimization is no longer needed, given the other optimizations recently installed. Derived from a patch by Norihiro Tanaka; see <http://bugs.gnu.org/17019>.
    * bootstrap.conf (gnulib_modules): Remove assert-h.
    * src/dfa.c (CASE_FOLDED_BUFSIZE): Move here from dfa.h. Remove now-unnecessary static assert.
    (case_folded_counterparts): Now static.
    * src/dfa.h (CASE_FOLDED_BUFSIZE, case_folded_counterparts): Remove decls; no longer public.
    * src/dfasearch.c (kwsmusts): Use kwset even if MB_CUR_MAX > 1 and case-insensitive.
    * src/grep.c (MBRTOWC, WCRTOMB): Remove.
    (fgrep_to_grep_pattern): Use mbrtowc, not MBRTOWC.
    (trivial_case_ignore): Remove; this optimization is no longer needed. All uses removed.
* grep: simplify memory allocation in kwset | Paul Eggert | 2014-04-07 | 1 | -7/+3
    * src/kwset.c: Include kwset.h first, to check its prereqs. Include xalloc.h, for xmalloc.
    (kwsalloc): Use xmalloc, not malloc, so that the caller need not worry about memory allocation failure.
    (kwsalloc, kwsincr, kwsprep): Do not worry about obstack_alloc returning NULL, as that's not possible.
    (kwsalloc, kwsincr, kwsprep, bmexec, cwexec, kwsexec, kwsfree): Omit unnecessary conversion between struct kwset * and kwset_t.
    (kwsincr, kwsprep): Return void since memory-allocation failure is not possible now. All uses changed.
    * src/kwset.h: Include <stddef.h>, for size_t, so that this include file doesn't require other files to be included first.
* grep: cleanup DFA superset optimization | Paul Eggert | 2014-04-06 | 1 | -22/+15
    * src/dfa.c (dfa_charclass_index): New function, with body of old charclass_index but with an extra parameter D.
    (charclass_index): Reimplement in terms of dfa_charclass_index.
    (dfahint): Clarify.
    (dfasuperset): Do not assign to 'dfa' static variable. Instead, use a local, and use the new dfa_charclass_index function. This doesn't fix any bugs, but it's clearer. Initialize a few more members, to simplify dfafree. Copy the charclasses with just one memcpy call. Don't assign nonnull to D->superset until it's known to be valid; that's simpler.
    (dfafree, dfaalloc): Simplify based on dfasuperset initializations.
    * src/dfa.h (dfahint): Add comment.
    * src/dfasearch.c (EGexecute): Simplify use of memchr. Simplify by using memrchr. Fix typo that could cause a buffer read overrun.
* grep: optimization with the superset of DFA | Norihiro Tanaka | 2014-04-06 | 1 | -14/+49
    The superset of a DFA is like the DFA, except that for speed ANYCHAR, MBCSET and BACKREF are replaced by (CSET full bits) STAR, and mb_cur_max is 1. For example, for 'a\(b\)c\1':
      original: a b CAT c CAT BACKREF CAT
      superset: a b CAT c CAT CSET STAR CAT
    (The CSET has all bits set.)
    If a string matches a DFA, it matches the DFA's superset. Using the superset to filter can dramatically improve performance, over 200x in some cases. See <http://bugs.gnu.org/16966>.
    * src/dfa.c (struct dfa): New member 'superset'.
    (dfahint, dfasuperset): New functions.
    (dfacomp): Create and analyze the superset.
    (dfafree): Free only non-NULL items.
    (dfaalloc): Initialize superset member.
    (dfaoptimize): If optimization for a UTF-8 locale succeeds, don't use the superset.
    * src/dfa.h (dfahint): New decl.
    * src/dfasearch.c (EGexecute): Use dfahint.
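The token rewrite shown in the example above can be sketched as a pass over a postfix token array. This is a minimal illustration, not the dfa.c implementation: the `enum tok` values and `superset_sketch` are hypothetical stand-ins, and the real token set is far larger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds, chosen only for this sketch.  */
enum tok
{
  TOK_CHAR, TOK_CAT, TOK_STAR,
  TOK_ANYCHAR, TOK_MBCSET, TOK_BACKREF,
  TOK_CSET_FULL                 /* a CSET with all bits set */
};

/* Copy the N postfix tokens of SRC into DST, replacing each ANYCHAR,
   MBCSET or BACKREF by "CSET_FULL STAR".  DST must have room for
   2 * N tokens.  Returns the number of tokens written.  */
size_t
superset_sketch (enum tok const *src, size_t n, enum tok *dst)
{
  assert (src && dst);
  size_t j = 0;
  for (size_t i = 0; i < n; i++)
    switch (src[i])
      {
      case TOK_ANYCHAR:
      case TOK_MBCSET:
      case TOK_BACKREF:
        dst[j++] = TOK_CSET_FULL;   /* matches any single byte ... */
        dst[j++] = TOK_STAR;        /* ... repeated zero or more times */
        break;
      default:
        dst[j++] = src[i];
        break;
      }
  return j;
}
```

Because "any byte, repeated" accepts at least everything the replaced token could accept, every string matched by the original token sequence is also matched by the rewritten one, which is what makes the superset usable as a cheap prefilter.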
* grep: fix performance bug with regex in line-by-line mode | Norihiro Tanaka | 2014-04-05 | 1 | -16/+18
    * src/dfasearch.c (EGexecute): Match line-by-line with regex.
* grep: simplify dfa.c by having it not include mbsupport.h directly | Paul Eggert | 2014-04-05 | 1 | -3/+0
    * src/mbsupport.h: Remove.
    * src/Makefile.am (noinst_HEADERS): Remove mbsupport.h.
    * src/dfa.c, src/grep.c, src/search.h: Don't include mbsupport.h.
    * src/dfa.c: Include wchar.h and wctype.h unconditionally, as this simplifies the use of dfa.c in grep, and it does no harm in gawk.
    (setlocale, static_assert): Remove gawk-specific hacks, as gawk now does these itself.
    (struct dfa, dfambcache, mbs_to_wchar)
    (is_valid_unibyte_character, setbit_wc, using_utf8, FETCH_WC)
    (addtok_wc, add_utf8_anychar, atom, state_index, epsclosure)
    (dfaanalyze, dfastate, prepare_wc_buf, dfaoptimize, dfafree, dfamust):
    * src/dfasearch.c (EGexecute):
    * src/grep.c (main):
    * src/searchutils.c (mbtoupper): Assume MBS_SUPPORT.
* grep: perform the kwset-helping DFA match in narrower range | Norihiro Tanaka | 2014-03-27 | 1 | -1/+7
    When kwsexec gives us the offset of a potential match, we compute line begin/end and then run the DFA matcher to see if there really is a match on that line. When the beginning of the line, BEG, is not on a multibyte character boundary, advance BEG until it is on such a boundary, before running the DFA search.
    * src/dfasearch.c (EGexecute): As above. Add a comment.
    * tests/euc-mb: Add a test case that exercises this code.
    This addresses http://debbugs.gnu.org/17095.
* maint: use to_uchar function rather than explicit casts | Jim Meyering | 2014-02-01 | 1 | -2/+2
    * src/system.h (to_uchar): Define function.
    * src/kwsearch.c (Fexecute): Use to_uchar twice in place of casts.
    * src/dfasearch.c (EGexecute): Likewise.
    * src/main.c (prepend_args): Likewise.
    * src/kwset.c (U): Define in terms of to_uchar.
    * src/dfa.c (match_mb_charset): Use to_uchar, not an explicit cast.
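As a rough illustration of why a conversion function is preferable to a cast here, the sketch below indexes a byte-count table with a `char` that may be negative on platforms where plain `char` is signed. The names `to_uchar_sketch` and `count_byte_sketch` are hypothetical; the helper only mirrors what a `to_uchar`-style function is understood to do.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Convert CH to unsigned char through an ordinary implicit conversion.
   Unlike a cast, this still lets the compiler diagnose a caller that
   passes, say, a pointer by mistake.  */
unsigned char
to_uchar_sketch (char ch)
{
  return ch;
}

/* Count one occurrence of byte C in COUNTS, which must have
   UCHAR_MAX + 1 entries.  Indexing with a plain char would be wrong
   wherever char is signed and C is negative.  */
void
count_byte_sketch (size_t *counts, char c)
{
  assert (counts);
  counts[to_uchar_sketch (c)]++;
}
```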
* maint: move two local variable declarations | Jim Meyering | 2014-01-26 | 1 | -4/+2
    * src/dfasearch.c (kwsmusts): Move one declaration down to the point of definition. Move another into the sole scope where it is used.
* dfasearch: skip kwset optimization when multi-byte+case-insensitive | Norihiro Tanaka | 2014-01-26 | 1 | -36/+15
    Now that DFA searching works with multi-byte locales, the only remaining reason to case-convert the searched input is the kwset optimization. But multi-byte case-conversion is so expensive that it's not worthwhile even to attempt that optimization.
    * src/dfasearch.c (kwsmusts): Skip this function in ignore-case mode when the locale is multi-byte.
    (EGexecute): Now that this code need not handle multi-byte case-ignoring matches, remove the expensive copy/case-conversion code. With no case-converted buffer, there is no longer any need to call mb_case_map_apply, so remove it and associated code.
    (kwsincr_case): Remove function. Now, every use of this function is equivalent to a use of kwsincr. Replace all uses.
    * tests/turkish-eyes: Test all of -E, -F and -G.
* maint: update copyright dates for 2014 | Jim Meyering | 2014-01-01 | 1 | -1/+1
    Do that by running "make update-copyright".
* maint: update all copyright year number ranges | Jim Meyering | 2013-01-04 | 1 | -1/+1
    Run "make update-copyright".
* maint: placate gcc's -Wjump-misses-init warning | Jim Meyering | 2012-10-03 | 1 | -3/+1
    * src/kwsearch.c (Fexecute): Replace a "goto" and "return" with a simple return statement, eliminating the label, since that was the sole use.
    * src/dfasearch.c (EGexecute): Likewise.
* grep -i '^$' in a multi-byte locale could report a false match | Jim Meyering | 2012-08-07 | 1 | -1/+3
    * src/dfasearch.c (EGexecute): Do not match the sentinel "newline" that is appended to each buffer.
    This bug may sound like a big deal (it certainly surprised me), but realize that only the empty-line-matching regular expression '^$' can trigger it, and then only when you add the unnecessary (and arguably superfluous) -i, *and* run the command in a multi-byte locale. Using a multi-byte locale for such a regular expression is also pointless, and hurts performance.
    * NEWS (Bug fixes): Mention it.
    Reported by Alexander Katassonov <katasso@gmx.de>
* grep: fix ptrdiff/size_t clash | Paul Eggert | 2012-07-19 | 1 | -6/+10
    Reported by Jaroslav Škarvada in <http://savannah.gnu.org/bugs/?36883>.
    * src/dfasearch.c (EGexecute): Use size_t, not ptrdiff_t, for lengths. Use regoff_t to store re_match's output, and test it before converting it to size_t.
* grep -i: work also when converting to lower-case inflates byte count | Jim Meyering | 2012-06-16 | 1 | -2/+2
    Commit v2.12-16-g7aa698d addressed the case in which the lower-case representation of an input character occupies fewer bytes than the original. However, even with commit v2.12-20-g074842d, grep -i would still misbehave when converting a character to lower-case increased its byte count. The map-manipulation code assumed that the case conversion could only shrink the byte count. With the consideration that it may also inflate it, the deltas recorded in the map array must be signed, and we must account for the one-to-two-or-more mapping when the original-to-lower-case conversion causes the byte count to increase.
    * src/searchutils.c (mbtolower): When a lower-case character occupies more than one byte, set its remaining map slots to zero. Change the type of the map to be signed, and compute the change in character byte count as new_length - old_length.
    * src/search.h: Include <stdint.h>, for decl of intmax_t.
    (mb_case_map_apply): Adjust for signed increments: each map entry is now signed.
    (mb_len_map_t): Define type. Thanks to Paul Eggert for noticing in review that using a bare "char" as the base type would be wrong on systems for which it is an unsigned type (as with gcc's -funsigned-char).
    * src/kwsearch.c (Fcompile, Fexecute): Likewise.
    * src/dfasearch.c (kwsincr_case, EGexecute): Likewise.
    * tests/turkish-I-without-dot: New test. Thanks to Paolo Bonzini for the tip that in the tr_TR.utf8 locale, mapping "I" to lower case increases the character's byte count.
    * tests/Makefile.am (TESTS): Add it.
    * tests/init.cfg (require_tr_utf8_locale_): New function.
    * NEWS (Bug fixes): Expand the existing entry.
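The idea of translating a lowercase-relative <offset,length> pair back to the original buffer with a map of signed per-byte deltas can be sketched as below. Everything here is an assumption for illustration: `map_apply_sketch` is a hypothetical name, `signed char` stands in for the map element type, and the sign convention (original byte count minus lowercase byte count, stored at the first lowercase byte of each character, zero elsewhere) is chosen for the example and may differ from the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: adjust *OFF and *LEN, which describe a match in
   the lowercased buffer, so that they describe the same characters in
   the original buffer.  MAP has one signed delta per lowercased byte;
   a null MAP means no byte counts changed.  */
void
map_apply_sketch (signed char const *map, size_t *off, size_t *len)
{
  assert (off && len);
  if (!map)
    return;
  intmax_t off_incr = 0;
  intmax_t len_incr = 0;
  for (size_t k = 0; k < *off; k++)
    off_incr += map[k];         /* size changes before the match */
  for (size_t k = *off; k < *off + *len; k++)
    len_incr += map[k];         /* size changes inside the match */
  *off = (size_t) ((intmax_t) *off + off_incr);
  *len = (size_t) ((intmax_t) *len + len_incr);
}
```

For an original "aİb" (4 bytes in UTF-8) lowercased to "aib" (3 bytes), the map under this convention is {0, +1, 0}; a match on "b" at lowercase offset 2 maps back to original offset 3. For the inflating case, "Ib" lowercased to "ıb" in a Turkish locale gives {-1, 0, 0}, and the same match at lowercase offset 2 maps back to offset 1, which is why the deltas have to be signed.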
* grep: fix how -i works with a match containing the Turkish I-with-dot | Jim Meyering | 2012-06-02 | 1 | -4/+9
    Fix a long-standing problem in the way grep's -i interacts with data whose byte count changes when we convert it to lower case. For example, the UTF-8 Turkish I-with-dot (İ) occupies two bytes, but its lower case analog, i, occupies just one byte. The code converts both search string and the haystack data to lower case, and then searches for the modified string in the modified buffer. The trouble arose when using a lowercase buffer <offset,length> pair to manipulate the original (longer) buffer.
    The solution is to change mbtolower to return additional information: a malloc'd mapping vector. With that, the caller maps the lowercase-relative <offset,length> to numbers that refer to the original buffer. This mapping is used only when lengths actually differ, so the cost in general should be small.
    * src/searchutils.c (mbtolower): Add the new map parameter.
    * src/search.h (mb_case_map_apply): New function.
    * src/kwsearch.c (Fexecute): Update mbtolower caller, and upon success, apply the new map.
    * src/dfasearch.c (EGexecute): Likewise.
    * tests/Makefile.am (XFAIL_TESTS): Remove turkish-I from this list; that test is no longer expected to fail.
    * NEWS (Bug fixes): Mention it.
    Reported by Ilya Basin in http://thread.gmane.org/gmane.comp.gnu.grep.bugs/3413 and later by Strahinja Kustudic in http://savannah.gnu.org/bugs/?36567
* maint: spelling fixes | Paul Eggert | 2012-03-01 | 1 | -1/+1
* grep: fix some core dumps with long lines etc. | Paul Eggert | 2012-03-01 | 1 | -7/+24
    These problems mostly occur because the code attempts to stuff sizes into int or into unsigned int; this doesn't work on most 64-bit hosts and the errors can lead to core dumps.
    * NEWS: Document this.
    * src/dfa.c (token): Typedef to ptrdiff_t, since the enum's range could be as small as -128 .. 127 on practical hosts.
    (position.index): Now size_t, not unsigned int.
    (leaf_set.elems): Now size_t *, not unsigned int *.
    (dfa_state.hash, struct mb_char_classes.nchars, .nch_classes)
    (.nranges, .nequivs, .ncoll_elems, struct dfa.cindex, .calloc, .tindex)
    (.talloc, .depth, .nleaves, .nregexps, .nmultibyte_prop, .nmbcsets):
    (.mbcsets_alloc): Now size_t, not int.
    (dfa_state.first_end): Now token, not int.
    (state_num): New type.
    (struct mb_char_classes.cset): Now ptrdiff_t, not int.
    (struct dfa.utf8_anychar_classes): Now token[5], not int[5].
    (struct dfa.sindex, .salloc, .tralloc): Now state_num, not int.
    (struct dfa.trans, .realtrans, .fails): Now state_num **, not int **.
    (struct dfa.newlines): Now state_num *, not int *.
    (prtok): Don't assume 'token' is no wider than int.
    (lexleft, parens, depth): Now size_t, not int.
    (charclass_index, nsubtoks)
    (parse_bracket_exp, addtok, copytoks, closure, insert, merge, delete)
    (state_index, epsclosure, state_separate_contexts)
    (dfaanalyze, dfastate, build_state, realloc_trans_if_necessary)
    (transit_state_singlebyte, match_anychar, match_mb_charset)
    (check_matching_with_multibyte_ops, transit_state_consume_1char)
    (transit_state, dfaexec, free_mbdata, dfaoptimize, dfafree)
    (freelist, enlist, addlists, inboth, dfamust): Don't assume indexes fit in 'int'.
    (lex): Avoid overflow in string-to-{hi,lo} conversions.
    (dfaanalyze): Redo indexing so that it works with size_t values, which cannot go negative.
    * src/dfa.h (dfaexec): Count argument is now size_t *, not int *.
    (dfastate): State numbers are now ptrdiff_t, not int.
    * src/dfasearch.c: Include "intprops.h", for TYPE_MAXIMUM.
    (kwset_exact_matches): Now size_t, not int.
    (EGexecute): Don't assume indexes fit in 'int'. Check for overflow before converting a ptrdiff_t to a regoff_t, as regoff_t is narrower than ptrdiff_t in 64-bit glibc (contra POSIX). Check for memory exhaustion in re_search rather than treating it merely as failure to match; use xalloc_die () to report any error.
    * src/kwset.c (struct trie.accepting): Now size_t, not unsigned int.
    (struct kwset.words): Now ptrdiff_t, not int.
    * src/kwset.h (struct kwsmatch.index): Now size_t, not int.
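The "check for overflow before converting" step mentioned for EGexecute can be sketched as a range test ahead of the narrowing conversion. This is an illustration only: `narrow_off_t` is a hypothetical stand-in (assumed to be `int`) for a narrow offset type such as regoff_t, and `fits_narrow_off_sketch` is not a grep function.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a narrow offset type; assumed to be int in this sketch.  */
typedef int narrow_off_t;

/* Store LEN in *OUT and return true if LEN is representable as a
   non-negative narrow_off_t; otherwise leave *OUT alone and return
   false, so the caller can report the problem instead of silently
   truncating.  */
bool
fits_narrow_off_sketch (ptrdiff_t len, narrow_off_t *out)
{
  assert (out);
  if (len < 0 || len > INT_MAX)
    return false;
  *out = (narrow_off_t) len;
  return true;
}
```

A caller would run this check on a buffer length before handing it to a matcher that takes the narrower type, and treat a false result as an error rather than as "no match".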
* maint: update all copyright year number ranges | Jim Meyering | 2012-01-01 | 1 | -1/+1
    Run "make update-copyright".
* maint: dfa: simplify multi-byte-related conditionals | Jim Meyering | 2011-09-16 | 1 | -2/+2
    * src/dfa.c (setbit_case_fold_c, parse_bracket_exp, lex):
    (addtok_mb, dfaparse): Change each "MBS_SUPPORT && MB_CUR_MAX > 1" test to just "MB_CUR_MAX > 1".
    * src/dfasearch.c (kwsincr_case, EGexecute): Likewise.
    * src/kwsearch.c (Fcompile, Fexecute): Likewise.
    * src/searchutils.c (kwsinit): Likewise.
    * src/dfa.c (parse_bracket_exp): Convert "if (!MBS_SUPPORT || MB_CUR_MAX == 1)" to "if (MB_CUR_MAX == 1)" and do this:
      - assert(!MBS_SUPPORT || MB_CUR_MAX == 1);
      + assert(MB_CUR_MAX == 1);
* maint: convert #if-MBS_SUPPORT (EGexecute) | Jim Meyering | 2011-09-16 | 1 | -5/+4
    * src/dfasearch.c (EGexecute): Remove in-function #if MBS_SUPPORT.
* maint: convert #if-MBS_SUPPORT (kwsincr_case) | Jim Meyering | 2011-09-16 | 1 | -10/+4
    * src/dfasearch.c (kwsincr_case): Remove in-function #if MBS_SUPPORT. Move decl's down.
* dfa: avoid possibility of overflow | Paul Eggert | 2011-06-19 | 1 | -3/+1
    * src/dfa.c (REALLOC_IF_NECESSARY, CALLOC, MALLOC, REALLOC): Use functions from xalloc.h to avoid overflow.
    * src/dfasearch.c (GEAcompile): Use xnrealloc rather than realloc.
    * src/pcresearch.c (Pcompile): Use xnmalloc, not xmalloc.
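The overflow that an n-times-size allocator guards against can be sketched as follows. `checked_nrealloc_sketch` is a hypothetical function written for this illustration; unlike the xalloc.h functions named above, which are understood to terminate the program on failure, this sketch simply returns NULL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: resize P to hold N objects of S bytes each.
   Return NULL, leaving P untouched, if N * S would overflow size_t;
   otherwise behave like realloc (P, N * S).  */
void *
checked_nrealloc_sketch (void *p, size_t n, size_t s)
{
  if (s != 0 && n > SIZE_MAX / s)
    return NULL;                /* N * S is not representable */
  return realloc (p, n * s);
}
```

A plain `realloc (p, n * s)` would wrap around on overflow and quietly allocate a buffer that is too small, so doing the multiplication inside a checked helper removes that failure mode from every call site.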
* maint: update copyright year ranges to include 2011 | Jim Meyering | 2011-01-03 | 1 | -1/+1
    Run "make update-copyright", so "make syntax-check" works in 2011.
* build: avoid compilation failure on the Hurd | Jim Meyering | 2010-09-21 | 1 | -4/+4
    * src/dfasearch.c (dfawarn): Rename enum symbols to use DW_ prefix, so as not to collide with "GNU", which is defined by the Hurd.
    Reported by Matthias Lanzinger in http://savannah.gnu.org/bugs/?31096
* grep: diagnose and exit-2 for bogus REs like [:space:], [:digit:], etc. | Jim Meyering | 2010-09-01 | 1 | -7/+13
    When I make a mistake like this:
      grep '[:lower:]' ...
    be it in a script or on the command line, I want to know about it as soon as possible. I don't want grep to print a mere warning that it is interpreting this suspicious and almost guaranteed-wrong regular expression as a set of just 6 bytes. And I certainly don't want grep to silently do the wrong thing, even if that would be officially standards-conforming. It's obvious that I intended [[:lower:]], and I want my error to be diagnosed in a way that is most likely to get my attention. Thus, with this change, grep now prints a diagnostic and exits with status 2 the moment it encounters an offending [:char_class:] construct.
    This changes the way grep works by default, rather than putting this new behavior on an option. A new option would seldom be used in scripts (not portable), and would probably be used only rarely by those who need it the most. This new functionality provides a valuable safety measure and incurs truly negligible risk.
    For strict POSIX compliance, set POSIXLY_CORRECT in your environment. That disables this new feature.
    Revert the changes from commit 2cd3bcea, "grep: add --warnings={always,never,auto}.", and then do the following:
    * src/dfasearch.c (dfawarn): Call getenv("POSIXLY_CORRECT") here; Remove "warning: " from the diagnostic, now that it's more than a warning, and exit with status 2.
    * NEWS (New features): Describe the new semantics.
    * tests/warn-char-classes: Adjust one test to accommodate this change.
    * doc/grep.texi (Character Classes and Bracket Expressions): Document.
    (Environment Variables): Cross-reference it. Remove reference to obsolete getopt illegal vs. invalid difference.
    Thanks to Paul Eggert for suggestions and an initial prod.
* dfa: warn on [:space:] and similar | Paolo Bonzini | 2010-08-27 | 1 | -0/+7
    * src/dfa.c (parse_bracket_exp): Warn on regular expressions such as [:space:].
    * src/dfa.h (dfawarn): New prototype.
    * src/dfasearch.c (dfawarn): New.
    * NEWS: Document.
* search: Avoid out-of-bounds access. | Bruno Haible | 2010-05-24 | 1 | -1/+1
    * src/dfasearch.c (EGexecute): Avoid access beyond end of buffer that could happen if start != beg - buf.
* maint: restrict scope of two globals to dfasearch.c | Jim Meyering | 2010-04-19 | 1 | -2/+2
    * src/dfasearch.c (patterns, pcount): Declare these file-scoped globals to be static.
* convert all TABs to equivalent spaces in indentation | Jim Meyering | 2010-04-08 | 1 | -167/+167
    Using this file,
      cat > leading-blank.exempt <<\EOF
      (?:^|\/)ChangeLog[^/]*$
      (?:^|\/)(?:GNU)?[Mm]akefile[^/]*$
      \.(?:am|mk)$
      EOF
    run this command to convert all non-conforming leading white space to be all spaces:
      git ls-files \
        | pcregrep -vf leading-blank.exempt \
        | xargs pcregrep -l '^ *\t' \
        | xargs perl -MText::Tabs -ni -le \
          '$m=/^( *\t[ \t]*)(.*)/; print $m ? expand($1) . $2 : $_'
* dfa: move internals from dfa.h to dfa.c | Arnold D. Robbins | 2010-04-07 | 1 | -7/+9
    * src/dfa.h: Move internals into dfa.c.
    * src/dfa.c: The dfa internals are now totally local to this file.
    (dfaalloc, dfamusts, dfabroken): New functions to access features.
    * src/dfasearch.c (dfa): Change this global variable from struct to pointer. Adapt to that change, and use new functions, dfamusts and dfaalloc.
* maint: MBS_SUPPORT: define to 0/1, not undef/1 | Jim Meyering | 2010-04-02 | 1 | -3/+3
    Prepare to remove many of these #ifdefs.
    * src/mbsupport.h (MBS_SUPPORT): Define to 0/1, not undef/1.
    Change each "#ifdef MBS_SUPPORT" to "#if MBS_SUPPORT". Use this:
      perl -pi -e 's/ifdef (MBS_SUPPORT)/if $1/' $(g grep -l ifdef.MBS_SUPPO)
    * src/dfa.c: s/#ifdef MBS_SUPPORT/#if MBS_SUPPORT/
    * src/dfa.h: Likewise.
    * src/dfasearch.c: Likewise.
    * src/kwsearch.c: Likewise.
    * src/main.c: Likewise.
    * src/search.h: Likewise.
    * src/searchutils.c: Likewise.
* cleanup and improvement: parse command line arguments consistently | Jim Meyering | 2010-04-02 | 1 | -1/+1
    * src/main.c: Include c-ctype.h, for this:
    (prepend_args): Use c_isspace, not ISSPACE. This is important so that we parse arguments consistently, and independently of the current locale.
    * bootstrap.conf (gnulib_modules): Add c-ctype.
    * src/system.h: Remove IS* definitions here, too.
    * src/dfasearch.c (WCHAR): Use isalnum, not ISALNUM.
    * src/kwsearch.c (WCHAR): Likewise.
    * src/searchutils.c (kwsinit): Use tolower, not TOLOWER.
* grep -F: fix a multi-byte erroneous-match-in-middle bug | Jim Meyering | 2010-03-28 | 1 | -1/+3
    Just as Perl prints nothing in this case,
      printf '\357\274\241\n' | perl -CIO -lne '/\357/ and print'
    grep should also print nothing when used as follows. However, these would mistakenly match with grep prior to 2.6.2:
      printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\357'
      printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\357\274'
    * src/searchutils.c (is_mb_middle): New parameter: the length of the match, in bytes, as determined by kwsexec. Use this to detect when the nominal match found by kwsexec must be skipped because it is for an incomplete multi-byte character that is a prefix of a character in the input.
    * src/dfasearch.c (EGexecute): Update caller.
    * src/kwsearch.c (Fexecute): Likewise.
    * src/search.h: Update prototype.
    * NEWS (Bug fixes): Mention it.
    Report and analysis by Norihiro Tanaka.
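The check described for is_mb_middle, rejecting a byte-level match that covers only part of a multi-byte character, can be sketched for the UTF-8 case as below. This is a deliberately simplified stand-in rather than grep's approach: it assumes the buffer holds valid UTF-8 and inspects continuation bytes directly, whereas a locale-independent version would need the C library's multi-byte functions. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if B is a UTF-8 continuation byte (bit pattern 10xxxxxx).  */
bool
utf8_is_continuation_sketch (unsigned char b)
{
  return (b & 0xC0) == 0x80;
}

/* Hypothetical sketch: return true if the match BUF[OFF .. OFF+LEN-1]
   within the N-byte, valid UTF-8 buffer BUF begins or ends inside a
   multi-byte character, in which case the caller should skip it.  */
bool
utf8_match_splits_char_sketch (char const *buf, size_t n,
                               size_t off, size_t len)
{
  assert (buf && off <= n && len <= n - off);
  if (off < n && utf8_is_continuation_sketch ((unsigned char) buf[off]))
    return true;                /* match starts in mid-character */
  size_t end = off + len;
  /* If the byte just past the match continues a character, the match
     stopped before that character was complete.  */
  return end < n && utf8_is_continuation_sketch ((unsigned char) buf[end]);
}
```

Applied to the three-byte character from the commands above, a one-byte or two-byte prefix match is flagged as splitting the character, while a match covering all three bytes is accepted.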
* grep: fix printing when -w is used and regex is needed for matching | Norihirio Tanaka | 2010-03-25 | 1 | -2/+1
    * NEWS: Document bugfix.
    * src/dfasearch.c (EGexecute): After assess_pattern_match, len is either invalid or end-beg; jump to success.
    * tests/Makefile.am (TESTS): Add new test.
    * tests/backref-word: New.
* build: avoid warnings: tell gcc and clang that dfaerror never returns | Jim Meyering | 2010-03-23 | 1 | -0/+4
    * src/dfa.h (__attribute__): Define.
    (dfaerror): Declare with the "noreturn" attribute.
    * src/dfasearch.c (dfaerror): Add an unreachable use of abort.
* grep: libify *search.c | Paolo Bonzini | 2010-03-22 | 1 | -3/+2
    * src/Makefile.am (libsearch_a_SOURCES): Add dfasearch.c, kwsearch.c, pcresearch.c.
    * src/esearch.c, src/fsearch.c,
    * src/gsearch.c: Only include search.h.
    * src/dfasearch.c (GEAcompile, EGexecute): Export.
    * src/kwsearch.c (Fcompile, Fexecute): Export.
    * src/pcresearch.c (Pcompile, Pexecute): Export.
    * src/search.h: Add new exported functions.
* grep: prepare for libification of *search.c | Paolo Bonzini | 2010-03-22 | 1 | -6/+0
    * src/dfasearch.c (Ecompile): Remove.
    * src/esearch.c: Place it here...
    * src/gsearch.c: ... and here.
* grep: split search.c | Paolo Bonzini | 2010-03-22 | 1 | -0/+395
    * po/POTFILES.in: Update.
    * src/Makefile.am (grep_SOURCES, egrep_SOURCES, fgrep_SOURCES): Move kwset.c and dfa.c to libsearch.a. Add searchutils.c there too.
    * src/search.h, src/dfasearch.c, src/pcresearch.c, src/kwsearch.c, src/searchutils.c: New files, split out of src/search.c.
    * src/esearch.c, src/fsearch.c: Include the new files instead of search.c.
    * src/gsearch.c: Likewise, plus move Gcompile/Acompile here.