author Karl Williamson <khw@khw-desktop.(none)> 2010-06-22 14:29:10 -0600
committer Jesse Vincent <jesse@bestpractical.com> 2010-06-28 22:30:04 -0400
commit d8b950dcbc51bd501c5dc196cc12d87eaf47b60c
tree fd00ef847f27621f035f8c4fd827df582fa1433d /pod/perlrebackslash.pod
parent c27a5cfe2661343fcb3b4f58478604d8b59b20de
Prefer \g1 over \1 in pods
\g was added to avoid ambiguities that \digit causes. This updates the pod documentation to use \g in examples, and to prefer it when explaining the concepts. Some non-symmetrical outlined text dealing with it was also cleaned up.
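The ambiguity this message refers to can be sketched in a few lines of Perl. This is an illustration only, not part of the patch: the variable names are made up, and it relies on the documented rule that an old-style C<\I<N>> with fewer than I<N> capture groups is read as an octal escape.

```perl
use strict;
use warnings;

# Only one capture group exists, so old-style \10 cannot be a backreference.
# Perl reads it as the octal escape for chr(8), the backspace character.
my $old_style_matches = "a\x08" =~ /(a)\10/ ? 1 : 0;

# \g{10} is never an octal escape. With a single capture group the
# pattern refers to a group that does not exist and fails to compile.
my $pattern    = '(a)\g{10}';
my $g_compiles = eval { my $re = qr/$pattern/; 1 } ? 1 : 0;

print "old-style \\10 matched a backspace: $old_style_matches\n";
print "\\g{10} compiled: $g_compiles\n";
```

The old-style form silently changes meaning depending on how many groups the pattern has, while the C<\g> form reports the problem when the pattern is compiled.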
Diffstat (limited to 'pod/perlrebackslash.pod')
-rw-r--r-- pod/perlrebackslash.pod 33
1 files changed, 15 insertions, 18 deletions
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index 5e514ceec6..4f1bed67a5 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -227,10 +227,10 @@ as a character without special meaning by the regex engine, and will match
=head4 Caveat
-Octal escapes potentially clash with backreferences. They both consist
-of a backslash followed by numbers. So Perl has to use heuristics to
-determine whether it is a backreference or an octal escape. Perl uses
-the following rules:
+Octal escapes potentially clash with old-style backreferences (see L</Absolute
+referencing> below). They both consist of a backslash followed by numbers. So
+Perl has to use heuristics to determine whether it is a backreference or an
+octal escape. Perl uses the following rules:
=over 4
@@ -348,7 +348,6 @@ L<perlunicode/Unicode Character Properties>.
Mnemonic: I<p>roperty.
-
=head2 Referencing
If capturing parenthesis are used in a regular expression, we can refer
@@ -361,18 +360,18 @@ absolutely, relatively, and by name.
=head3 Absolute referencing
Either C<\gI<N>> (starting in Perl 5.10.0), or C<\I<N>> (old-style) where I<N>
-is an positive (unsigned) decimal number of any length is an absolute reference
+is a positive (unsigned) decimal number of any length is an absolute reference
to a capturing group.
-I<N> refers to the Nth set of parentheses - or more accurately, whatever has
+I<N> refers to the Nth set of parentheses - so C<\gI<N>> refers to whatever has
been matched by that set of parenthesis. Thus C<\g1> refers to the first
capture group in the regex.
The C<\gI<N>> form can be equivalently written as C<\g{I<N>}>
which avoids ambiguity when building a regex by concatenating shorter
-strings. Otherwise if you had a regex C</$a$b/>, and C<$a> contained C<"\g1">,
-and C<$b> contained C<"37">, you would get C</\g137/> which is probably not
-what you intended.
+strings. Otherwise if you had a regex C<qr/$a$b/>, and C<$a> contained
+C<"\g1">, and C<$b> contained C<"37">, you would get C</\g137/> which is
+probably not what you intended.
In the C<\I<N>> form, I<N> must not begin with a "0", and there must be at
least I<N> capturing groups, or else I<N> will be considered an octal escape
@@ -413,17 +412,15 @@ even if the larger pattern also contains capture groups.
=head3 Named referencing
-Also new in perl 5.10.0 is the use of named capture groups, which can be
-referred to by name. This is done with C<\g{name}>, which is a
-backreference to the capture group with the name I<name>.
+C<\g{I<name>}> (starting in Perl 5.10.0) can be used to back refer to a
+named capture group, dispensing completely with having to think about capture
+buffer positions.
To be compatible with .Net regular expressions, C<\g{name}> may also be
written as C<\k{name}>, C<< \k<name> >> or C<\k'name'>.
-Note that C<\g{}> has the potential to be ambiguous, as it could be a named
-reference, or an absolute or relative reference (if its argument is numeric).
-However, names are not allowed to start with digits, nor are they allowed to
-contain a hyphen, so there is no ambiguity.
+To prevent any ambiguity, I<name> must not start with a digit nor contain a
+hyphen.
=head4 Examples
@@ -582,7 +579,7 @@ Mnemonic: eI<X>tended Unicode character.
"\x{256}" =~ /^\C\C$/; # Match as chr (256) takes 2 octets in UTF-8.
$str =~ s/foo\Kbar/baz/g; # Change any 'bar' following a 'foo' to 'baz'
- $str =~ s/(.)\K\1//g; # Delete duplicated characters.
+ $str =~ s/(.)\K\g1//g; # Delete duplicated characters.
"\n" =~ /^\R$/; # Match, \n is a generic newline.
"\r" =~ /^\R$/; # Match, \r is a generic newline.
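
The forms touched by this patch can be tried with a short script. The following is a minimal sketch, assuming Perl 5.10.0 or later; the variable names and sample strings are hypothetical and do not come from the pod file.

```perl
use strict;
use warnings;

# Building a regex from fragments: unbraced \g1 followed by "37"
# becomes \g137, a reference to a group that does not exist.
my $head        = '(\d)\g1';
my $braced_head = '(\d)\g{1}';
my $tail        = '37';

my $unbraced_compiles = eval { my $re = qr/$head$tail/; 1 } ? 1 : 0;

# With braces the reference stays \g{1}, followed by the literal "37".
my $braced_matches = "5537" =~ qr/\A$braced_head$tail\z/ ? 1 : 0;

# Named referencing: \g{name} and the .Net-compatible \k<name>.
my $g_named_matches = "hello hello" =~ /(?<word>\w+) \g{word}/ ? 1 : 0;
my $k_named_matches = "hello hello" =~ /(?<word>\w+) \k<word>/ ? 1 : 0;

# The \K example from the patch: keep the first character of each
# doubled pair and delete the second.
(my $deduped = "bookkeeper") =~ s/(.)\K\g1//g;

print "unbraced compiles: $unbraced_compiles\n";
print "braced matches: $braced_matches\n";
print "named: $g_named_matches $k_named_matches\n";
print "deduped: $deduped\n";
```

Note that the substitution removes one character per adjacent pair, so a run of three identical characters is reduced to two rather than one.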