author: Karl Williamson <khw@cpan.org> 2014-09-05 09:09:28 -0600
committer: Karl Williamson <khw@cpan.org> 2014-09-06 21:44:49 -0600
commit: 8f0cd35a38dde9ab975f5ee1a663b81939e17745
tree: 1b79e320980b4937f349841c068458ce5d68c529 /lib/diagnostics.t
parent: a5454c469023876ca9422440f302f587dba2a438
Allow \N{named seq} in qr/[...]/
This commit changes the regex handler to properly match, in many instances, a \N{named sequence} in a bracketed character class.

A named sequence is a string of multiple characters that has been given a single name. Unicode has hundreds of them, like LATIN CAPITAL LETTER A WITH MACRON AND GRAVE. Unicode encodes these when some user community thinks of the conglomeration as a single unit, but no prior standard treated it as one, and it can be encoded in Unicode by other means, typically a base character followed by some combining marks. (If there had not been such a prior standard, 8859-1, things like LATIN CAPITAL LETTER A WITH GRAVE would have been put into Unicode this way too.) If Unicode did not do it this way, it would run out of available code points much sooner.

Not having these as single characters burdens the programmer who has to deal with them. Hiding this detail as much as possible makes it easier to program. This commit hides it in one more place than previously. It takes advantage of the infrastructure added some releases ago to deal with the fact that the case-insensitive match of some single characters can be 2 or even 3 characters long. "ss" =~ /[ß]/i; is the most prominent example.

We discovered earlier that /[^ß]/ leads to unexpected behavior, and it is also unclear what is meant when one of these sequences is used as an endpoint of a range. This commit leaves existing behavior in place for those cases. That behavior is to use just the first code point in the sequence for regular [...], and to generate a fatal syntax error for (?[...]).
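A minimal sketch of the behavior described above, assuming a perl that includes this change (5.22 or later). The variable names are illustrative only and are not taken from the commit. On an older perl the class is expected to fall back to the first code point of the sequence, so the anchored match would not succeed there.

```perl
use strict;
use warnings;
use charnames ':full';

# One name, two code points: U+0100 followed by U+0300.
my $seq = "\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}";
printf "code points in sequence: %d\n", length $seq;

# The whole sequence used as a single member of a bracketed class.
# With this commit, the class can match both code points together.
my $matched = $seq =~ /^[\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}]$/ ? 1 : 0;
print "class matched whole sequence: ", ($matched ? "yes" : "no"), "\n";
```

Inverted classes and range endpoints are deliberately left out of the sketch, since the commit keeps the old behavior for those cases.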
Diffstat (limited to 'lib/diagnostics.t')
-rw-r--r-- lib/diagnostics.t 2
1 files changed, 1 insertions, 1 deletions
diff --git a/lib/diagnostics.t b/lib/diagnostics.t
index 4ac2ebfe2b..0b35d16c06 100644
--- a/lib/diagnostics.t
+++ b/lib/diagnostics.t
@@ -106,7 +106,7 @@ seek STDERR, 0,0;
$warning = '';
warn "Using just the first character returned by \\N{} in character class in regex; marked by <-- HERE in m/%s/";
like $warning,
- qr/A charnames handler may return a sequence/s,
+ qr/Named Unicode character escapes/s,
'multi-line entries in perldiag.pod match';
# ; at end of entry in perldiag.pod