| Commit message (Collapse) | Author | Age | Files | Lines |
| |
See Bug#25366.
* src/character.c (blankp): New function for checking Unicode
horizontal whitespace.
* src/regex.c (ISBLANK): Use 'blankp' for non-ASCII horizontal
whitespace.
(BIT_BLANK): New bit for range table.
(re_wctype_to_bit, execute_charset): Use it.
* test/lisp/subr-tests.el (subr-tests--string-match-p--blank): Add
unit test for [:blank:] character class.
* test/src/regex-tests.el (test): Adapt unit test.
* doc/lispref/searching.texi (Char Classes): Document new Unicode
behavior for [:blank:].
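
As a rough illustration of the new [:blank:] behavior (not the Emacs code itself), the Python sketch below treats space and tab as the ASCII blanks and, for non-ASCII characters, assumes that "Unicode horizontal whitespace" means general category Zs (space separator). That mapping and the name is_blank are assumptions made for this sketch; the commit message does not spell out the exact criterion used by 'blankp'.

```python
import unicodedata


def is_blank(ch):
    """Hypothetical model of a Unicode-aware [:blank:] test.

    ASCII: only space and tab count as blank.
    Non-ASCII: assumed to mean general category Zs (space separator).
    """
    if ord(ch) < 128:
        return ch in (" ", "\t")
    return unicodedata.category(ch) == "Zs"


# U+3000 IDEOGRAPHIC SPACE and U+2003 EM SPACE are both in category Zs.
for ch in (" ", "\t", "\n", "a", "\u3000", "\u2003"):
    print(repr(ch), is_blank(ch))
```

Newline is deliberately not blank here, since it is vertical rather than horizontal whitespace.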
| |
Run admin/update-copyright in the master branch. This fixes files
that were not already fixed in the emacs-25 branch before it was
merged here.
| |
With a single test for all character classes, it is not always trivial to
determine which class is failing. This happens when the failure is caused
by ‘(should (equal (point) (point-max)))’ not being met.
With per-character-class tests, it is immediately obvious which test
causes issues, and tests for all classes are run even if some of them
fail.
* test/src/regex-tests.el (regex-character-classes): Delete and split
into…
(regex-tests-alnum-character-class, regex-tests-alpha-character-class,
regex-tests-ascii-character-class, regex-tests-blank-character-class,
regex-tests-cntrl-character-class, regex-tests-digit-character-class,
regex-tests-graph-character-class, regex-tests-lower-character-class,
regex-tests-multibyte-character-class,
regex-tests-nonascii-character-class,
regex-tests-print-character-class, regex-tests-punct-character-class,
regex-tests-space-character-class,
regex-tests-unibyte-character-class,
regex-tests-upper-character-class, regex-tests-word-character-class,
regex-tests-xdigit-character-class): …new tests.
| |
* test/file-organization.org: Rename from test/file-organisation.org.
| |
[82a487d: Fix reading of regex-resources in regex-tests] attempted to
fix regex-tests failing when run from the source tree (i.e. via make)
by hard-coding the path to the regex-resources directory relative to the
test directory.
This fixed runs from the tree but broke the test when it is run by other
means.
Fix by using ‘load-file-name’ or ‘buffer-file-name’, whichever is set.
* test/src/regex-tests.el (regex-tests--resources-dir): New variable
storing path to the regex-resources directory.
(regex-tests-generic-line): Use aforementioned variable.
| |
This fixes the following warning:
In toplevel form:
src/regex-tests.el:416:1:Warning: Unused lexical variable ‘newline’
* test/src/regex-tests.el (regex-tests-BOOST): Remove unused lexical
variable.
| |
* test/src/regex-tests.el (regex-tests): Remove and split into multiple
test cases.
(regex-tests-glibc-BOOST, regex-tests-glibc-PCRE,
regex-tests-glibc-PTESTS, regex-tests-glibc-TESTS): New test cases split
from ‘regex-tests’.
| |
* test/src/regex-tests.el: Don’t (require 'cl).
(regex-tests-PCRE): s/loop/cl-loop/
| |
* test/src/regex-tests.el (regex-tests-generic-line): Referring to
‘buffer-file-name’ does not work when running the test from the command
line, i.e. via make, which results in (wrong-type-argument stringp nil)
failures. Replace it with a hard-coded path.
(regex-tests-BOOST, regex-tests-PCRE, regex-tests-PTESTS-whitelist,
regex-tests-TESTS-whitelist): ‘regex-tests-generic-line’ now includes
the ‘regex-resources’ path component so the tests don’t need to specify
it explicitly.
| |
* test/src/regex-tests.el (regex-tests): Test executing glibc test
cases.
[mina86@mina86.com: merged test with existing file]
|
|
The regex engine tries to optimise Kleene star by avoiding backtracking
when it can detect that star’s operand cannot match what follows it in
the pattern.
For example, when ‘[[:alpha:]]*1’ is matched against ‘foo’, the engine
first tries the longest match for ‘[[:alpha:]]*’, namely ‘foo’, which is
the entire string. The literal digit one that remains in the pattern
then fails to match the remaining empty string.
Normally, backtracking would then try a shorter match for the
character class (namely ‘fo’, leaving ‘o’ in the string), but since the
engine knows that whatever would be put back into the string cannot
possibly match the literal digit one, no backtracking is attempted.
In the regexes of the form ‘[[:CC:]]*X’, the optimisation can be applied
if the character class CC does not match character X. In the above
example, this holds because digit one is not in alpha character class.
This test is performed by the mutually_exclusive_p function, but it did
not check the class bits of a charset opcode. This resulted in an
assumption that character classes do not match multibyte characters. For
example, it would incorrectly conclude that ‘[[:alpha:]]’ doesn’t match ‘ż’.
This, in turn, led to the aforementioned Kleene star optimisation being
incorrectly applied in patterns such as ‘[[:graph:]]*☠’, which should
match ‘☠’ but doesn’t. This can be tested by executing
    (string-match-p "[[:graph:]]*☠" "☠")
which should return 0 but instead yields nil.
This issue affects any class which matches multibyte characters, i.e.
if ‘[[:cc:]]’ matches a multibyte character X then ‘[[:cc:]]*X’ will
fail to match ‘X’.
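
The failure mode described above can be reproduced with a small toy model. The Python sketch below is not the regex.c implementation: the helper names (is_graph, match_star_then_literal, exclusive_bitmap_only, exclusive_fixed) are hypothetical, ‘[:graph:]’ is approximated through Unicode general categories, and the old check is modelled simply as "any non-ASCII literal is assumed not to be in the class".

```python
import unicodedata


def is_graph(ch):
    # Rough stand-in for [:graph:]: anything that is not a separator (Z*)
    # or a control/unassigned character (C*).
    return unicodedata.category(ch)[0] not in "ZC"


def match_star_then_literal(in_class, literal, text, exclusive):
    """Match `[[:CC:]]*X` at the start of `text`.

    Returns the end index of the match, or None when nothing matches.
    """
    end = 0
    # Greedy step: take the longest run of class members.
    while end < len(text) and in_class(text[end]):
        end += 1
    while True:
        if text.startswith(literal, end):
            return end + len(literal)
        # The optimisation: when class and literal are believed to be
        # mutually exclusive, giving characters back cannot help.
        if exclusive or end == 0:
            return None
        end -= 1  # backtrack by one character


def exclusive_bitmap_only(in_class, literal):
    # Models the old check: class bits are ignored, so a multibyte
    # literal is wrongly assumed to be outside every class.
    return ord(literal) > 127 or not in_class(literal)


def exclusive_fixed(in_class, literal):
    # Models the corrected check: ask the class itself.
    return not in_class(literal)


for check in (exclusive_bitmap_only, exclusive_fixed):
    excl = check(is_graph, "☠")
    print(check.__name__, match_star_then_literal(is_graph, "☠", "☠", excl))
```

With the bitmap-only check the star consumes ‘☠’ and backtracking is suppressed, so the match fails; with the fixed check the engine backtracks and the match succeeds. The ‘[[:alpha:]]*1’ case still benefits from the optimisation, because ‘1’ really is outside the class.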
* src/regex.c (execute_charset): A new function for executing the
charset and charset_not opcodes. It checks the character, taking into
consideration the existing bitmap, range table and class bits.
It also advances the pointer in the regex bytecode past the parsed
opcode.
(CHARSET_LOOKUP_RANGE_TABLE_RAW, CHARSET_LOOKUP_RANGE_TABLE): Remove.
Code now included in execute_charset.
(mutually_exclusive_p, re_match_2_internal): Change to take advantage
of the execute_charset function.
* test/src/regex-tests.el: New file with tests for the character class
matching.
|