| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
This was an off-by-one error caused by my failing to realize that things
had to be done differently at the 255/256 boundary depending on whether
U+00FF matched or did not match the property.
Two properties were affected, [:upper:] and [:punct:]. The bug was that
all code points above the first one > 255 that legitimately matches the
property will match whether or not they should. In the case of
[:upper:], this meant that effectively anything from 256..infinity
matched. For [:punct:], it was anything above U+037D.
|
|
|
|
|
|
| |
Same for [[:upper:]] and \p{Upper}. These were matching instead all of
[[:alpha:]] or \p{Alpha}. What /\p{Lower}/i and /\p{Upper}/i match instead
is \p{Cased}, and so that is what these should match.
|
|
|
|
|
| |
\h and \p{XPosixBlank} contain the same code points, so there is no need
to have both of them.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These will be used in regcomp.c to replace the existing bit-wise
handling of these, enabling subsequent optimizations.
These are compiled-in, and hence affect the memory footprint of every
program, including those that don't use Unicode. The lists that aren't
tiny are therefore currently restricted to only the Latin1 range;
anything needed beyond that will have to be read in at execution time,
just as before.
The design allows for easy conversion from Latin1 to use the full
Unicode range, should it be deemed desirable for some or all of these.
|
|
This will be used to generate compile-time inversion lists in a C hdr
file that can be included in programs for initialization speed
Three simple inversion lists are included in this initial commit
|