path: root/embed.fnc
author: Karl Williamson <khw@cpan.org> 2014-09-05 09:09:28 -0600
committer: Karl Williamson <khw@cpan.org> 2014-09-06 21:44:49 -0600
commit: 8f0cd35a38dde9ab975f5ee1a663b81939e17745 (patch)
tree: 1b79e320980b4937f349841c068458ce5d68c529 /embed.fnc
parent: a5454c469023876ca9422440f302f587dba2a438 (diff)
Allow \N{named seq} in qr/[...]/
This commit changes the regex handler to properly match, in many instances, a \N{named sequence} in a bracketed character class.

A named sequence is one which consists of a string of multiple characters but is given one name. Unicode has hundreds of them, like LATIN CAPITAL LETTER A WITH MACRON AND GRAVE. These are encoded by Unicode when there is some user community that thinks of the conglomeration as a single unit, but there was no prior standard that had it so, and it is possible to encode it in Unicode using other means, typically a sequence of a base character followed by some combining marks. (If there had not been such a prior standard, 8859-1, things like LATIN CAPITAL LETTER A WITH GRAVE would have been put into Unicode this way too.) If they did not do it this way, they would run out of available code points much sooner.

Not having these as single characters adds a burden to the programmer who has to deal with them. Hiding this detail as much as possible makes it easier to program. This commit hides it in one more place than previously. It takes advantage of the infrastructure added some releases ago to deal with the fact that a case-insensitive match of some single characters can be 2 or even 3 characters long. "ss" =~ /[ß]/i; is the most prominent example.

We earlier discovered that /[^ß]/ leads to unexpected behavior, and when one of these sequences is used as an endpoint in a range, it is also unclear what is meant. This commit leaves the existing behavior for those cases. That behavior is to use just the first code point in the sequence for regular [...], and to generate a fatal syntax error for (?[...]).
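The behavior described above can be tried with a short script. This is only a sketch, and it assumes a perl that includes this change; on an earlier perl the first match is expected to fail, because only the first code point of the sequence is used. The variable names are illustrative and are not taken from the commit.

```perl
use strict;
use warnings;
use charnames ':full';   # make \N{...} names, including named sequences, available

# The named sequence from the commit message, written out as its two code
# points: U+0100 (A WITH MACRON) followed by U+0300 (COMBINING GRAVE ACCENT).
my $seq = "\x{100}\x{300}";

# A bracketed class containing the named sequence, anchored so that it must
# account for the whole two-character string.
my $whole = $seq =~ /\A[\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}]\z/ ? 1 : 0;

# The pre-existing multi-character fold case cited in the message:
# under /i, a class containing sharp s can match the two characters "ss".
my $fold = "ss" =~ /\A[\N{LATIN SMALL LETTER SHARP S}]\z/i ? 1 : 0;

print "named sequence matched as a unit: $whole\n";
print "\"ss\" matched [sharp s] under /i: $fold\n";
```

Inverted classes and range endpoints are deliberately left out of the sketch, since the message says their behavior is unchanged.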
Diffstat (limited to 'embed.fnc')
-rw-r--r-- embed.fnc 2
1 files changed, 1 insertions, 1 deletions
diff --git a/embed.fnc b/embed.fnc
index d25c78ed47..88adce209b 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2099,7 +2099,7 @@ Es |void |set_ANYOF_arg |NN RExC_state_t* const pRExC_state \
|NULLOK SV* const swash \
|const bool has_user_defined_property
Es |AV* |add_multi_match|NULLOK AV* multi_char_matches \
- |NN SV* multi_fold \
+ |NN SV* multi_string \
|const STRLEN cp_count
Es |regnode*|regclass |NN RExC_state_t *pRExC_state \
|NN I32 *flagp|U32 depth|const bool stop_at_1 \