summaryrefslogtreecommitdiff
path: root/pod/perlre.pod
diff options
context:
space:
mode:
Diffstat (limited to 'pod/perlre.pod')
-rw-r--r--pod/perlre.pod65
1 files changed, 48 insertions, 17 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 6e68bcd1db..b9216c156c 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -594,20 +594,15 @@ whitespace formatting, a simple C<#> will suffice. Note that Perl closes
the comment as soon as it sees a C<)>, so there is no way to put a literal
C<)> in the comment.
-=item C<(?pimsx-imsx)>
+=item C<(?dlupimsx-imsx)>
-=item C<(?^pimsx)>
+=item C<(?^lupimsx)>
X<(?)> X<(?^)>
One or more embedded pattern-match modifiers, to be turned on (or
turned off, if preceded by C<->) for the remainder of the pattern or
the remainder of the enclosing pattern group (if any).
-Starting in Perl 5.14, a C<"^"> (caret or circumflex accent) immediately
-after the C<"?"> is a shorthand equivalent to C<-imsx> and compiling the
-regex under C<no locale>. Flags may follow the caret to override it.
-But a minus sign is not legal with it.
-
This is particularly useful for dynamic patterns, such as those read in from a
configuration file, taken from an argument, or specified in a table
somewhere. Consider the case where some patterns want to be case
@@ -634,17 +629,53 @@ These modifiers do not carry over into named subpatterns called in the
enclosing group. In other words, a pattern such as C<((?i)(&NAME))> does not
change the case-sensitivity of the "NAME" pattern.
-Note that the C<p> modifier is special in that it can only be enabled,
-not disabled, and that its presence anywhere in a pattern has a global
-effect. Thus C<(?-p)> and C<(?-p:...)> are meaningless and will warn
-when executed under C<use warnings>.
+Starting in Perl 5.14, a C<"^"> (caret or circumflex accent) immediately
+after the C<"?"> is a shorthand equivalent to C<d-imsx>. Flags (except
+C<"d">) may follow the caret to override it.
+But a minus sign is not legal with it.
+
+Also, starting in Perl 5.14, are modifiers C<"d">, C<"l">, and C<"u">,
+which for 5.14 may not be used as suffix modifiers.
+
+C<"l"> means to use a locale (see L<perllocale>) when pattern matching.
+The locale used will be the one in effect at the time of execution of
+the pattern match. This may not be the same as the compilation-time
+locale, and can differ from one match to another if there is an
+intervening call of the
+L<setlocale() function|perllocale/The setlocale function>.
+This modifier is automatically set if the regular expression is compiled
+within the scope of a C<"use locale"> pragma.
+
+C<"u"> has no effect currently. It is automatically set if the regular
+expression is compiled within the scope of a
+L<C<"use feature 'unicode_strings">|feature> pragma.
+
+C<"d"> means to use the traditional Perl pattern matching behavior.
+This is dualistic (hence the name C<"d">, which also could stand for
+"default"). When this is in effect, Perl matches utf8-encoded strings
+using Unicode rules, and matches non-utf8-encoded strings using the
+platform's native character set rules.
+See L<perlunicode/The "Unicode Bug">. It is automatically selected by
+default if the regular expression is compiled neither within the scope
+of a C<"use locale"> pragma nor a <C<"use feature 'unicode_strings">
+pragma.
+
+Note that the C<d>, C<l>, C<p>, and C<u> modifiers are special in that
+they can only be enabled, not disabled, and the C<d>, C<l>, and C<u>
+modifiers are mutually exclusive; a maximum of one may appear in the
+construct. Specifying one de-specifies the others. Thus, for example,
+C<(?-p)> and C<(?-d:...)> are meaningless and will warn when compiled
+under C<use warnings>.
+
+Note also that the C<p> modifier is special in that its presence
+anywhere in a pattern has a global effect.
=item C<(?:pattern)>
X<(?:)>
-=item C<(?imsx-imsx:pattern)>
+=item C<(?dluimsx-imsx:pattern)>
-=item C<(?^imsx:pattern)>
+=item C<(?^luimsx:pattern)>
X<(?^:)>
This is for clustering, not capturing; it groups subexpressions like
@@ -660,7 +691,7 @@ but doesn't spit out extra fields. It's also cheaper not to capture
characters if you don't need to.
Any letters between C<?> and C<:> act as flags modifiers as with
-C<(?imsx-imsx)>. For example,
+C<(?dluimsx-imsx)>. For example,
/(?s-i:more.*than).*million/i
@@ -669,8 +700,8 @@ is equivalent to the more verbose
/(?:(?s-i)more.*than).*million/i
Starting in Perl 5.14, a C<"^"> (caret or circumflex accent) immediately
-after the C<"?"> is a shorthand equivalent to C<-imsx> and compiling the
-regex under C<no locale>. Any positive flags may follow the caret, so
+after the C<"?"> is a shorthand equivalent to C<d-imsx>. Any positive
+flags (except C<"d">) may follow the caret, so
(?^x:foo)
@@ -679,7 +710,7 @@ is equivalent to
(?x-ims:foo)
The caret tells Perl that this cluster doesn't inherit the flags of any
-surrounding pattern, but to go back to the system defaults (C<-imsx>),
+surrounding pattern, but to go back to the system defaults (C<d-imsx>),
modified by any flags specified.
The caret allows for simpler stringification of compiled regular