path: root/pod/perlre.pod
author Karl Williamson <public@khwilliamson.com> 2010-09-20 18:57:24 -0600
committer Father Chrysostomos <sprout@cpan.org> 2010-09-22 22:54:23 -0700
commit 9de15fec376a8ff90a38fad0ff322c72c2995765
tree 95729b4e82e14d795b481df7902da07be8c9ab67 /pod/perlre.pod
parent 4c2c679ff9fc18054795b9b7b28e37453e57d146
Add /d, /l, /u (infixed) regex modifiers
This patch adds recognition of these modifiers, with appropriate action for d and l. u does nothing useful yet. This allows for the interpolation of a regex into another one without losing the character set semantics that it was compiled with, as for the first time, the semantics is now specified in the stringification as one of these modifiers. To this end, it allocates an unused bit in the structures. The offsets change so as to not disturb other bits.
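The interpolation behavior described above can be illustrated with a minimal sketch. It is not part of the patch; the variable names are hypothetical, and it assumes a released Perl 5.14 or later, where the default stringification is expected to look like (?^:...) and the unicode_strings one like (?^u:...). An interim build at the time of this commit may stringify differently.

```perl
use strict;
use warnings;

# Compile the same pattern under two character-set regimes.
# (Variable names here are illustrative only.)
my $default_re = qr/caf./;          # neither pragma in scope: "d" semantics

my $unicode_re = do {
    use feature 'unicode_strings';  # lexically scoped; selects "u" semantics
    qr/caf./;
};

# Stringification records the regime as an infixed modifier.
my $default_str = "$default_re";
my $unicode_str = "$unicode_re";
print "$default_str\n";
print "$unicode_str\n";

# Interpolating both into a larger pattern keeps each piece's own modifier,
# because each stringified piece carries it along.
my $combined = qr/$default_re|$unicode_re/;
print "$combined\n";
```

Because each compiled regex carries its own modifier in its string form, the combined pattern does not need to be compiled under the same pragmas as its parts.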
Diffstat (limited to 'pod/perlre.pod')
-rw-r--r-- pod/perlre.pod 65
1 files changed, 48 insertions, 17 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 6e68bcd1db..b9216c156c 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -594,20 +594,15 @@ whitespace formatting, a simple C<#> will suffice. Note that Perl closes
the comment as soon as it sees a C<)>, so there is no way to put a literal
C<)> in the comment.
-=item C<(?pimsx-imsx)>
+=item C<(?dlupimsx-imsx)>
-=item C<(?^pimsx)>
+=item C<(?^lupimsx)>
X<(?)> X<(?^)>
One or more embedded pattern-match modifiers, to be turned on (or
turned off, if preceded by C<->) for the remainder of the pattern or
the remainder of the enclosing pattern group (if any).
-Starting in Perl 5.14, a C<"^"> (caret or circumflex accent) immediately
-after the C<"?"> is a shorthand equivalent to C<-imsx> and compiling the
-regex under C<no locale>. Flags may follow the caret to override it.
-But a minus sign is not legal with it.
-
This is particularly useful for dynamic patterns, such as those read in from a
configuration file, taken from an argument, or specified in a table
somewhere. Consider the case where some patterns want to be case
@@ -634,17 +629,53 @@ These modifiers do not carry over into named subpatterns called in the
enclosing group. In other words, a pattern such as C<((?i)(&NAME))> does not
change the case-sensitivity of the "NAME" pattern.
-Note that the C<p> modifier is special in that it can only be enabled,
-not disabled, and that its presence anywhere in a pattern has a global
-effect. Thus C<(?-p)> and C<(?-p:...)> are meaningless and will warn
-when executed under C<use warnings>.
+Starting in Perl 5.14, a C<"^"> (caret or circumflex accent) immediately
+after the C<"?"> is a shorthand equivalent to C<d-imsx>. Flags (except
+C<"d">) may follow the caret to override it.
+But a minus sign is not legal with it.
+
+Also, starting in Perl 5.14, are modifiers C<"d">, C<"l">, and C<"u">,
+which for 5.14 may not be used as suffix modifiers.
+
+C<"l"> means to use a locale (see L<perllocale>) when pattern matching.
+The locale used will be the one in effect at the time of execution of
+the pattern match. This may not be the same as the compilation-time
+locale, and can differ from one match to another if there is an
+intervening call of the
+L<setlocale() function|perllocale/The setlocale function>.
+This modifier is automatically set if the regular expression is compiled
+within the scope of a C<"use locale"> pragma.
+
+C<"u"> has no effect currently. It is automatically set if the regular
+expression is compiled within the scope of a
+L<C<"use feature 'unicode_strings'">|feature> pragma.
+
+C<"d"> means to use the traditional Perl pattern matching behavior.
+This is dualistic (hence the name C<"d">, which also could stand for
+"default"). When this is in effect, Perl matches utf8-encoded strings
+using Unicode rules, and matches non-utf8-encoded strings using the
+platform's native character set rules.
+See L<perlunicode/The "Unicode Bug">. It is automatically selected by
+default if the regular expression is compiled neither within the scope
+of a C<"use locale"> pragma nor a C<"use feature 'unicode_strings'">
+pragma.
+
+Note that the C<d>, C<l>, C<p>, and C<u> modifiers are special in that
+they can only be enabled, not disabled, and the C<d>, C<l>, and C<u>
+modifiers are mutually exclusive; a maximum of one may appear in the
+construct. Specifying one de-specifies the others. Thus, for example,
+C<(?-p)> and C<(?-d:...)> are meaningless and will warn when compiled
+under C<use warnings>.
+
+Note also that the C<p> modifier is special in that its presence
+anywhere in a pattern has a global effect.
=item C<(?:pattern)>
X<(?:)>
-=item C<(?imsx-imsx:pattern)>
+=item C<(?dluimsx-imsx:pattern)>
-=item C<(?^imsx:pattern)>
+=item C<(?^luimsx:pattern)>
X<(?^:)>
This is for clustering, not capturing; it groups subexpressions like
@@ -660,7 +691,7 @@ but doesn't spit out extra fields. It's also cheaper not to capture
characters if you don't need to.
Any letters between C<?> and C<:> act as flags modifiers as with
-C<(?imsx-imsx)>. For example,
+C<(?dluimsx-imsx)>. For example,
/(?s-i:more.*than).*million/i
@@ -669,8 +700,8 @@ is equivalent to the more verbose
/(?:(?s-i)more.*than).*million/i
Starting in Perl 5.14, a C<"^"> (caret or circumflex accent) immediately
-after the C<"?"> is a shorthand equivalent to C<-imsx> and compiling the
-regex under C<no locale>. Any positive flags may follow the caret, so
+after the C<"?"> is a shorthand equivalent to C<d-imsx>. Any positive
+flags (except C<"d">) may follow the caret, so
(?^x:foo)
@@ -679,7 +710,7 @@ is equivalent to
(?x-ims:foo)
The caret tells Perl that this cluster doesn't inherit the flags of any
-surrounding pattern, but to go back to the system defaults (C<-imsx>),
+surrounding pattern, but to go back to the system defaults (C<d-imsx>),
modified by any flags specified.
The caret allows for simpler stringification of compiled regular