path: root/regcharclass.h
Commit message | Author | Age | Files | Lines
* regen/regcharclass.pl: Generate better code for some macros | Karl Williamson | 2012-10-20 | 1 | -13/+13
    This commit revamps the recently added function calculate_mask(). Previously it produced a single mask/compare value for its input and failed if there was none. Now it returns a list of masks/compares when the set can be split into subsets that can each be represented by a mask/compare. If this list taken as a whole yields fewer branches than what we get otherwise, it is better code, and is used. Said another way, what we had before was all or nothing; this improves things even if we can't do it all.
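The splitting idea can be illustrated with a small sketch. The class below and both function names are hypothetical and are not taken from regcharclass.h. No single mask/compare matches exactly these four bytes, because four different bit positions vary among them, but the set splits into two subsets that each fit one mask/compare, so two tests replace four comparisons.

```c
/* Hypothetical class: '0' (0x30), '1' (0x31), 'A' (0x41), 'a' (0x61).
 * Names are illustrative only; this is not the generated code. */

/* One comparison per member: four tests. */
static int in_class_compares(unsigned char c)
{
    return c == 0x30 || c == 0x31 || c == 0x41 || c == 0x61;
}

/* Split into two subsets, each covered by one mask/compare:
 *   {0x30, 0x31} differ only in bit 0 -> (c & 0xFE) == 0x30
 *   {0x41, 0x61} differ only in bit 5 -> (c & 0xDF) == 0x41 */
static int in_class_masks(unsigned char c)
{
    return (c & 0xFE) == 0x30 || (c & 0xDF) == 0x41;
}
```

Both functions accept the same bytes; the second simply needs fewer tests.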
* regen/regcharclass.pl: Change name of generated macro | Karl Williamson | 2012-10-16 | 1 | -2/+2
    This changes the macro isMULTI_CHAR_FOLD() (non-utf8 version) from just generating ASCII-range code points to generating the full Latin1 range. However there are no such non-ASCII values, so the macro expansion is unchanged. By changing the name, it becomes clearer in future commits that we aren't excluding things that we should be considering.
* regen/regcharclass.pl: Generate macros for multi-char fold sequences | Karl Williamson | 2012-10-09 | 1 | -0/+225
    These will be used in future commits.
* regen/regcharclass.pl: improved optree generation | Yves Orton | 2012-10-03 | 1 | -12/+6
    Karl Williamson noticed that we don't always deal with common suffixes in the most efficient way. This change reworks how we convert a trie to an optree so that common suffixes are always grouped together.
* remove test define from regen/regcharclass.pl | Yves Orton | 2012-09-29 | 1 | -16/+0
* improve conditional folding logic in regen/regcharclass.pl | Yves Orton | 2012-09-29 | 1 | -122/+40
* fix perl #115078, ternary folding logic failure | Yves Orton | 2012-09-29 | 1 | -4/+5
* add a new define for testing perl #115078 | Yves Orton | 2012-09-29 | 1 | -0/+19
    We don't currently have any easy way to test regen/regcharclass.pl. Perl #115078 is related to a bug in the _cleanup() routine, which is fixed by the next patch.
* utf8.h: Remove some EBCDIC dependencies | Karl Williamson | 2012-09-13 | 1 | -0/+39
    regen/regcharclass.pl has been enhanced in previous commits so that it generates as good code as these hand-defined macro definitions for various UTF-8 constructs. And, it should be able to generate EBCDIC ones as well. By using its definitions, we can remove the EBCDIC dependencies for them. It is quite possible that the EBCDIC versions were wrong, since they have never been tested. Even if regcharclass.pl has bugs under EBCDIC, it is easier to find and fix those in one place than in all the sundry definitions.
* regen/regcharclass.pl: Add optimization | Karl Williamson | 2012-09-13 | 1 | -39/+46
    On UTF-8 input known to be valid, continuation bytes must be in the range 0x80 .. 0xBF. Therefore, any tests for being within those bounds will always be true, and may be omitted.
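As a rough illustration of this optimization, consider a hypothetical class holding every code point whose UTF-8 encoding starts with the lead byte 0xC2 (U+0080 through U+00BF). UTF-8 continuation bytes always lie in 0x80 .. 0xBF, so on input already known to be valid the bounds test on the second byte cannot fail and can be dropped. The function names are made up for this sketch and do not come from the generated header.

```c
/* Hypothetical class: all code points encoded as 0xC2 followed by one
 * continuation byte (U+0080 .. U+00BF). Illustrative names only. */

/* Input not known to be valid: the continuation byte must be checked. */
static int is_c2_block_checked(const unsigned char *s)
{
    return s[0] == 0xC2 && s[1] >= 0x80 && s[1] <= 0xBF;
}

/* Input known to be valid UTF-8: the second byte is necessarily in
 * 0x80 .. 0xBF, so testing the lead byte alone is enough. */
static int is_c2_block_valid_input(const unsigned char *s)
{
    return s[0] == 0xC2;
}
```

The two functions agree on every well-formed sequence; they differ only on malformed input, which the second variant assumes has already been ruled out.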
* regen/regcharclass.pl: Extend previously added optimization | Karl Williamson | 2012-09-13 | 1 | -4/+4
    A previous commit added an optimization to save a branch in the generated code at the expense of an extra mask when the input class has certain characteristics. This extends that to the case where sub-portions of the class have similar characteristics. The first optimization for the entire class is moved to right before the new loop that checks each range in it.
* regen/regcharclass.pl: Add an optimization | Karl Williamson | 2012-09-13 | 1 | -30/+30
    Branches can be eliminated from the macros that are generated here by using a mask in cases where applicable. This adds checking to see if this optimization is possible, and applies it if so.
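A minimal sketch of the mask idea, using a made-up two-member class rather than anything from the generated header: when the members differ in a single bit, masking that bit off lets one comparison stand in for two.

```c
/* Hypothetical class: 'A' (0x41) and 'a' (0x61), which differ only in
 * bit 5 (0x20). Function names are illustrative, not generated output. */

/* Two comparisons, hence two potential branches. */
static int is_a_branchy(unsigned char c)
{
    return c == 0x41 || c == 0x61;
}

/* One mask plus one comparison: clearing bit 5 maps both members to 0x41. */
static int is_a_masked(unsigned char c)
{
    return (c & 0xDF) == 0x41;
}
```

The masked form trades one extra AND for one fewer comparison, which is the trade-off the commit message describes.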
* Use macro not swash for utf8 quotemeta | Karl Williamson | 2012-09-13 | 1 | -0/+65
    The rules for determining whether an above-Latin1 code point should be quoted are now saved in a macro generated from a trie by regen/regcharclass.pl, and these are now used by pp.c to test these cases. This allows removal of a wrapper subroutine, and there is no longer any need for dynamic loading at run time into a swash. This macro is about as big as I'm comfortable compiling in, but it saves the building of a hash that can grow over time, and removes a subroutine and interpreter variables. Indeed, performance benchmarks show that it is about the same speed as a hash, but it does not require having to load the rules in from disk the first time it is used.
* regen/regcharclass.pl: Generate macros for \X processing | Karl Williamson | 2012-09-13 | 1 | -0/+128
    \X is implemented in regexec.c as a complicated series of property look-ups. It turns out that many of those are for just a few code points, and so can be more efficiently implemented with a macro than a swash. This generates those.
* regen/regcharclass.pl: Handle ranges, \p{} | Karl Williamson | 2012-09-13 | 1 | -33/+3
    Instead of having to list all code points in a class, you can now use \p{} or a range. This changes some classes to use \p{}, so that any changes Unicode makes to the definitions don't have to be made manually here as well.
* /regcharclass.pl, utf8_strings.pl: Add guard to .h | Karl Williamson | 2012-09-13 | 1 | -0/+6
    Future commits will have other headers #include the headers generated by these programs. It is best to guard against the preprocessor trying to process these twice.
* regen/regcharclass.pl: Comment out obsolete code | Karl Williamson | 2012-08-27 | 1 | -163/+0
    Tricky folds have been removed from the code, so the removed #defines are obsolete. I'm leaving this in, so it can conveniently be referred to in case we ever need it again.
* Bump several file copyright dates | Steffen Schwigon | 2012-01-19 | 1 | -1/+1
    Sync copyright dates with actual changes according to git history. [Plus run regen_perly.h to update the SHA-256 checksums, and regen/regcharclass.pl to update regcharclass.h]
* regex: Add comments | Karl Williamson | 2011-05-19 | 1 | -1/+1
* regcharclass: Add tricky fold characters. | Karl Williamson | 2011-03-20 | 1 | -5/+77
    The tricky fold characters need to be expanded to include the ones that map to the same ones as the original set. This isn't because the new ones have a length issue; it's that they get left out of comparisons because of the special regnodes generated for the tricky ones.
* Move regcharclass.pl from Porting/ to regen/ | Nicholas Clark | 2011-01-23 | 1 | -1/+1
* Convert regcharclass.pl to use regen_lib.pl | Nicholas Clark | 2011-01-23 | 1 | -4/+3
    This results in small changes to the formatting of the generated comments in regcharclass.h
* Change Porting/regcharclass.pl so it doesn't depend on unpack 'U0C*' | Yves Orton | 2007-07-19 | 1 | -1/+2
    Includes an updated regcharclass.h without a datestamp in it, so that when it is trivially rebuilt its contents don't change.
    p4raw-id: //depot/perl@31636
* Re: Analysis of problems with mixed encoding case insensitive matches in regex engine. | Yves Orton | 2007-04-27 | 1 | -228/+310
    Message-ID: <9b18b3110704270709y50ef652ci436b3bb29abca275@mail.gmail.com>
    p4raw-id: //depot/perl@31102
* Re: Analysis of problems with mixed encoding case insensitive matches in regex engine. | Yves Orton | 2007-04-26 | 1 | -15/+68
    Message-ID: <9b18b3110704240746u461e4bdcl208ef7d7f9c5ef64@mail.gmail.com>
    p4raw-id: //depot/perl@31081
* Switch to hex format for integer constants in regcharclass.h | Rafael Garcia-Suarez | 2007-04-23 | 1 | -160/+160
    (Yves Orton). Also, avoid trailing spaces.
    p4raw-id: //depot/perl@31037
* Change boilerplate of regcharclass.h | Rafael Garcia-Suarez | 2007-04-23 | 1 | -13/+17
    p4raw-id: //depot/perl@31031
* Add Yves Orton's script to regenerate regcharclass.h | Rafael Garcia-Suarez | 2007-04-23 | 1 | -53/+117
    p4raw-id: //depot/perl@31030
* Change meaning of \v, \V, and add \h, \H to match Perl6, add \R to match PCRE and unicode tr18 | Yves Orton | 2007-04-23 | 1 | -0/+250
    Message-ID: <9b18b3110704221434g43457742p28cab00289f83639@mail.gmail.com>
    p4raw-id: //depot/perl@31026