author | Karl Williamson <khw@cpan.org> | 2014-09-05 09:09:28 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2014-09-06 21:44:49 -0600 |
commit | 8f0cd35a38dde9ab975f5ee1a663b81939e17745 (patch) | |
tree | 1b79e320980b4937f349841c068458ce5d68c529 /lib/diagnostics.t | |
parent | a5454c469023876ca9422440f302f587dba2a438 (diff) | |
Allow \N{named seq} in qr/[...]/
This commit changes the regex handler to properly match, in many
instances, a \N{named sequence} in a bracketed character class.
A named sequence is a string of multiple characters that has been given
a single name. Unicode has hundreds of them, like LATIN
CAPITAL LETTER A WITH MACRON AND GRAVE. These are encoded by Unicode
when there is some user community that thinks of the conglomeration as a
single unit, but there was no prior standard that had it so, and it is
possible to encode it in Unicode using other means, typically a sequence
of a base character followed by some combining marks. (If there had not
been such a prior standard, 8859-1, things like LATIN CAPITAL LETTER A
WITH GRAVE would have been put into Unicode this way too.) If they did
not do it this way, they would run out of available code points much
sooner.
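To make the "multiple characters, one name" point concrete, here is a minimal sketch that expands the named sequence mentioned above and lists its code points. It assumes a Perl recent enough to ship charnames::string_vianame() (5.14 or later); the variable names are illustrative only.

```perl
use strict;
use warnings;
use charnames ':full';   # loads charnames::string_vianame()

# A named sequence expands to a multi-character string, not a single code point.
my $seq = charnames::string_vianame('LATIN CAPITAL LETTER A WITH MACRON AND GRAVE');

# Render each character of the expansion in U+XXXX form.
my @code_points = map { sprintf 'U+%04X', ord } split //, $seq;

print length($seq), " characters: @code_points\n";
# prints "2 characters: U+0100 U+0300"
```

The expansion is a base character (A WITH MACRON) followed by a combining mark (COMBINING GRAVE ACCENT), which is the "sequence of a base character followed by some combining marks" pattern described above.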
Not having these as single characters burdens the programmer who has
to deal with them. Hiding this detail as much as possible makes
it easier to program. This commit hides this in one more place than
previously.
It takes advantage of infrastructure added some releases ago to handle
the fact that some single characters can match 2 or even 3 characters
case-insensitively.
"ss" =~ /[ß]/i;
is the most prominent example.
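A slightly fuller sketch of the same multi-character fold, written with \N{U+DF} so it does not depend on the source file's encoding (using \N{} in a pattern also selects Unicode rules). The variable names are illustrative only.

```perl
use strict;
use warnings;

# \N{U+DF} is LATIN SMALL LETTER SHARP S. Under /i, a bracketed class
# containing it can match the two-character string "ss".
my $sharp_s_class = qr/\A[\N{U+DF}]\z/i;

my $two_chars_match = ('ss'       =~ $sharp_s_class) ? 1 : 0;  # fold to "ss"
my $one_char_match  = ("\N{U+DF}" =~ $sharp_s_class) ? 1 : 0;  # the character itself

print "ss: $two_chars_match, sharp s: $one_char_match\n";
```

The anchors \A and \z make it visible that the single class consumes both characters of "ss", rather than matching somewhere inside the string.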
We earlier discovered that /[^ß]/ leads to unexpected behavior, and
using one of these sequences as an endpoint in a range is also unclear
as to what is meant. This commit leaves existing behavior for those
cases. That behavior is to use just the first code point in the
sequence for regular [...], and to generate a fatal syntax error for
(?[...]).
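The non-negated, non-range case that this commit enables can be sketched as follows. This assumes a Perl that contains this change (5.22 or later); on older versions the class would use only the first code point and warn. The variable names are illustrative only.

```perl
use strict;
use warnings;
use charnames ':full';

# The two-character string that the named sequence stands for.
my $target = "\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}";

# A regular bracketed class with the named sequence as one of its members.
# It is neither negated nor part of a range, so the whole sequence is matched.
my $class = qr/\A[xyz\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}]\z/;

my $matches_sequence = ($target =~ $class) ? 1 : 0;  # both characters consumed
my $matches_single   = ('x'     =~ $class) ? 1 : 0;  # ordinary members still work

print "sequence: $matches_sequence, single: $matches_single\n";
```

Writing the same sequence inside [^...] or as a range endpoint is deliberately not shown as matching here: per the message above, those cases keep the old behavior of using only the first code point.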
Diffstat (limited to 'lib/diagnostics.t')
-rw-r--r-- | lib/diagnostics.t | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/lib/diagnostics.t b/lib/diagnostics.t
index 4ac2ebfe2b..0b35d16c06 100644
--- a/lib/diagnostics.t
+++ b/lib/diagnostics.t
@@ -106,7 +106,7 @@ seek STDERR, 0,0;
 $warning = '';
 warn "Using just the first character returned by \\N{} in character class in regex; marked by <-- HERE in m/%s/";
 like $warning,
-    qr/A charnames handler may return a sequence/s,
+    qr/Named Unicode character escapes/s,
     'multi-line entries in perldiag.pod match';
 
 # ; at end of entry in perldiag.pod