path: root/pod/perlreapi.pod
Commit message | Author | Age | Files | Lines
* Add documentation for /n (non-capture) regexp flag. | Matthew Horsfall | 2014-12-30 | 1 | -1/+1
* Change 'semantics' to 'rules' | Karl Williamson | 2014-02-20 | 1 | -1/+1
  The term 'semantics' in documentation when applied to character sets is changed to 'rules' as being a shorter, less jargony synonym in this case. This was discussed several releases ago, but I didn't get around to it.
* perlreapi: use parent in example, not base | Ricardo Signes | 2013-09-12 | 1 | -1/+1
* Use SSize_t/STRLEN in more places in regexp code | Father Chrysostomos | 2013-08-25 | 1 | -2/+2
  As part of getting the regexp engine to handle long strings, this commit changes any variables, parameters and struct members that hold lengths of the string being matched against (or parts thereof) to use SSize_t or STRLEN instead of [IU]32.

  To avoid having to change any logic, I kept the signedness the same.

  I did not change anything that affects the length of the regular expression itself, so regexps are still practically limited to I32_MAX. Changing that would involve changing the size of regnodes, which would be a lot more involved.

  These changes should fix bugs, but are very hard to test. In most cases, I don’t know the regexp engine well enough to come up with test cases that test the paths in question with long strings. In other cases I don’t have a box with enough memory to test the fix.
* improve regexec_flags() API documentation | David Mitchell | 2013-08-13 | 1 | -5/+5
  In the API, rename the 'screamer' arg to be 'sv' instead; update the description of the function's args; improve the documentation of the REXEC_* flags for the 'flags' arg.
* Fix typo in perlreapi(1perl). | Nathan Trapuzzano | 2013-07-09 | 1 | -1/+1
* add strbeg argument to Perl_re_intuit_start() | David Mitchell | 2013-06-02 | 1 | -3/+24
  (Note that this is a change both to the perl API and the regex engine plugin API.)

  Currently, Perl_re_intuit_start() is passed an SV, plus pointers to where in the string to start matching (strpos) and to the end of the string (strend). Unlike Perl_regexec_flags(), it doesn't also have a strbeg arg. Because of this, it guesses strbeg: based on the passed SV (if it's SvPOK()), or just set to strpos otherwise. This latter can happen if, for example, the SV is overloaded. Note also that this latter guess is wrong, and could in theory make /\b.../ fail.

  But just to confuse matters, although Perl_re_intuit_start() itself uses its guesstimate strbeg var, some of the functions it calls use the global value of PL_bostr instead. To make this work, the *callers* of Perl_re_intuit_start() currently set PL_bostr first. This is why \b doesn't actually break.

  The fix to this unholy mess is to simply add a strbeg arg to Perl_re_intuit_start(). It's also the first step to eliminating PL_bostr altogether.
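  A minimal C sketch of why a correct strbeg matters for \b: deciding whether a word boundary falls at strpos means inspecting the character just before it, which is only possible if the function knows where the string really begins. `at_word_boundary` and `is_word` are hypothetical helpers, not Perl's code, and use an ASCII-only stand-in for \w.

```c
#include <assert.h>
#include <ctype.h>

/* ASCII-only stand-in for Perl's \w (hypothetical helper). */
static int is_word(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Does \b hold at pos?  The character before pos can only be inspected if
 * the caller says where the string really starts (strbeg); at strbeg the
 * "previous character" counts as a non-word character. */
int at_word_boundary(const char *strbeg, const char *pos, const char *strend)
{
    assert(strbeg <= pos && pos <= strend);
    int before = pos > strbeg && is_word(pos[-1]);
    int after  = pos < strend && is_word(*pos);
    return before != after;
}
```

  With the string "ab" and pos at "b", passing the true start reports no boundary, while passing pos itself as a guessed strbeg reports a spurious one. An explicit strbeg argument removes the need for that guess.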
* typo fix for reapi pod | David Steinbrunner | 2013-05-25 | 1 | -1/+1
* rework split() special case interaction with regex engine | Yves Orton | 2013-03-27 | 1 | -3/+5
  This patch resolves several issues at once. The parts are sufficiently interconnected that it is hard to break it down into smaller commits. The tickets open for these issues are:

    RT #94490 - split and constant folding
    RT #116086 - split "\x20" doesn't work as documented

  It additionally corrects some issues with cached regexes that were exposed by the split changes (and applied to them).

  It effectively reverts 5255171e6cd0accee6f76ea2980e32b3b5b8e171 and cccd1425414e6518c1fc8b7bcaccfb119320c513.

  Prior to this patch the special RXf_SKIPWHITE behavior of

    split(" ", $thing)

  was only available if Perl could resolve the first argument to split at compile time, which meant that in various arcane situations it did not apply. This manifested as oddities like

    my $delim = $cond ? " " : qr/\s+/;
    split $delim, $string;

  and

    split $cond ? " " : qr/\s+/, $string

  not behaving the same as:

    ($cond ? split(" ", $string) : split(/\s+/, $string))

  which isn't very convenient.

  This patch changes this by adding a new flag to the op_pmflags, PMf_SPLIT, which enables pp_regcomp() to know whether it was called as part of split, which allows RXf_SPLIT to be passed into run-time regex compilation. We also preserve the original flags so pattern caching works properly, by adding a new property to the regexp structure, "compflags", and related macros for accessing it. We preserve the original flags passed into the compilation process, so we can compare them when we are trying to decide if we need to recompile.

  Note that this is essentially the opposite fix from the one applied originally to fix #94490 in 5255171e6cd0accee6f76ea2980e32b3b5b8e171.

  The reverted patch was meant to make:

    split( 0 || " ", $thing ) #1

  consistent with

    my $x=0; split( $x || " ", $thing ) #2

  and not with

    split( " ", $thing ) #3

  This was reverted because it broke C<split("\x{20}", $thing)>, and because one might argue that it is not #1 that does the wrong thing, but rather the behavior of #2 that is wrong. In other words, we might expect that all three should behave the same as #3, and that instead of "fixing" the behavior of #1 to be like #2, we should really fix the behavior of #2 to behave like #3. (Which is what we did.)

  Also, it doesn't make sense to move the special case detection logic further from the regex engine. We really want the regex engine to decide this stuff itself, otherwise split " ", ... wouldn't work properly with an alternate engine. (Imagine we add a special regexp meta pattern that behaves the same as " " does in a split /.../. For instance we might make split /(*SPLITWHITE)/ trigger the same behavior as split " ".)

  The other major change as a result of this patch is that it effectively reverts commit cccd1425414e6518c1fc8b7bcaccfb119320c513, which was intended to get rid of RXf_SPLIT and RXf_SKIPWHITE and free up bits in the regex flags structure. But we don't want to get rid of these vars, and it turns out that RXf_SEEN_LOOKBEHIND is used only in the same situation as the new RXf_MODIFIES_VARS. So I have renamed RXf_SEEN_LOOKBEHIND to RXf_NO_INPLACE_SUBST, and then instead of using two vars we use only the one. This in turn allows RXf_SPLIT and RXf_SKIPWHITE to have their bits back.
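  The special case at stake here is the awk-like behavior of split " " (RXf_SKIPWHITE): leading whitespace is skipped, whereas split /\s+/ produces an empty leading field. The following C sketch only counts the fields each form would yield. `count_split_fields` is hypothetical, uses ASCII isspace() as an approximation of \s, and, like Perl's split with no limit, drops trailing empty fields.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the fields a whitespace split would return.
 * skip_leading != 0 models split " ", $str (leading whitespace ignored);
 * skip_leading == 0 models split /\s+/, $str (leading whitespace gives an
 * empty first field).  Trailing empty fields are dropped, as in Perl. */
size_t count_split_fields(const char *s, int skip_leading)
{
    size_t total = 0;         /* fields seen so far */
    size_t last_nonempty = 0; /* 1-based index of the last non-empty field */
    size_t len = 0;           /* length of the field being scanned */

    assert(s != NULL);
    if (skip_leading)
        while (isspace((unsigned char)*s))
            s++;

    for (;;) {
        if (*s == '\0' || isspace((unsigned char)*s)) {
            total++;                  /* the current field ends here */
            if (len > 0)
                last_nonempty = total;
            len = 0;
            if (*s == '\0')
                break;
            while (isspace((unsigned char)*s))
                s++;                  /* a whole whitespace run is one separator */
        } else {
            len++;
            s++;
        }
    }
    return last_nonempty;             /* anything after it is a trailing empty field */
}
```

  For "  a  b  ", the split " " model yields 2 fields ("a", "b") and the split /\s+/ model yields 3 ("", "a", "b"), which is the difference the flag has to carry into run-time compilation.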
* perlreapi.pod: Consistent spaces after dots | Father Chrysostomos | 2012-10-11 | 1 | -57/+63
* perlreapi.pod: Document RXf_MODIFIES_VARS | Father Chrysostomos | 2012-10-11 | 1 | -0/+6
* perlreapi.pod: Update RXf_SKIPWHITE section | Father Chrysostomos | 2012-10-11 | 1 | -0/+3
* perlreapi.pod: Update RXf_SPLIT section | Father Chrysostomos | 2012-10-11 | 1 | -0/+4
* perlreapi.pod: grammar and other nits | Karl Williamson | 2012-10-09 | 1 | -72/+72
* regcomp.c: min len is chars, not bytes | Karl Williamson | 2012-10-09 | 1 | -5/+7
  The traditionally-called tricky folds occur because, under /i, a 6-byte/3-character sequence can match a 2-byte/1-character sequence. The code here has assumed that the delta quantity is measured in bytes (6-2=4), whereas everywhere else (AFAICT) the measure is assumed to be in characters (3-1=2).
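  To make the bytes-versus-characters distinction concrete, this C sketch uses U+0390 (GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS), chosen here only as an illustration: its full case fold is the three-character sequence U+03B9 U+0308 U+0301. `utf8_char_count` and `length_delta` are hypothetical helpers that assume well-formed UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* U+0390 and its full case fold U+03B9 U+0308 U+0301, encoded as UTF-8. */
const char u0390[]      = "\xCE\x90";
const char u0390_fold[] = "\xCE\xB9\xCC\x88\xCC\x81";

/* Count code points in well-formed UTF-8: every byte that is not a
 * continuation byte (10xxxxxx) starts a new character. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Length difference between two strings, in characters or in bytes. */
size_t length_delta(const char *longer, const char *shorter, int in_chars)
{
    if (in_chars)
        return utf8_char_count(longer) - utf8_char_count(shorter);
    return strlen(longer) - strlen(shorter);
}
```

  Here length_delta(u0390_fold, u0390, 0) is 4 (bytes) and length_delta(u0390_fold, u0390, 1) is 2 (characters); the commit makes the minimum-length calculation use the character measure.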
* perlreapi.pod: Reflow verbatim lines to 79 cols. | Karl Williamson | 2012-09-26 | 1 | -24/+54
* update docs for $`, $&, $' changes | David Mitchell | 2012-09-12 | 1 | -3/+14
  Mention that they're now detected individually, and mention in reapi the new RX_BUFF_IDX_* symbolic constants.
* Don't copy all of the match string buffer | David Mitchell | 2012-09-08 | 1 | -3/+19
  When a pattern matches, and that pattern contains captures (or $`, $&, $' or /p are present), a copy is made of the whole original string, so that $1 et al continue to hold the correct value even if the original string is subsequently modified. This can have severe performance penalties; for example, this code causes a 1 MB buffer to be allocated, copied and freed a million times:

    $&;
    $x = 'x' x 1_000_000;
    1 while $x =~ /(.)/g;

  This commit changes this so that, where possible, only the needed substring of the original string is copied: in the above case, only a 1-byte buffer is copied each time. Also, it now reuses or reallocs the buffer, rather than freeing and mallocing each time.

  Now that PL_sawampersand is a 3-bit flag indicating separately whether $`, $& and $' have been seen, they each contribute only their own individual penalty; which ones have been seen will limit the extent to which we can avoid copying the whole buffer.

  Note that the above code *without* the $& is not currently slow, but only because the copying is artificially disabled to avoid the performance hit. The next but one commit will remove that hack, meaning that it will still be fast, but will now be correct in the presence of a modified original string.

  We achieve this by adding suboffset and subcoffset fields to the existing subbeg and sublen fields of a regex, to indicate how many bytes and characters have been skipped from the logical start of the string till the physical start of the buffer. To avoid copying stuff at the end, we just reduce sublen. For example, in this:

    "abcdefgh" =~ /(c)d/

  subbeg points to a malloced buffer containing "c\0"; sublen == 1, and suboffset == 2 (and so is subcoffset). If $& has been seen, subbeg instead points to a malloced buffer containing "cd\0"; sublen == 2, and suboffset == 2. If in addition $' has been seen, then subbeg points to a malloced buffer containing "cdefgh\0"; sublen == 6, and suboffset == 2.

  The regex engine won't do this by default; there are two new flag bits, REXEC_COPY_SKIP_PRE and REXEC_COPY_SKIP_POST, which, in conjunction with REXEC_COPY_STR, request that the engine skip the start or end of the buffer (it will still copy in the presence of the relevant $`, $&, $', /p).

  Only pp_match has been enhanced to use these extra flags; substitution can't easily benefit, since the usual action of s///g is to copy the whole string the first time round, then perform subsequent matching iterations against the copy, without further copying. So you still need to copy most of the buffer.
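  The subbeg/suboffset/sublen example in this commit can be modelled as a small interval calculation: take the span covered by the captures, then widen it for each of $`, $& and $' that has been seen. The C sketch below is a simplification under those assumptions. The SAW_* flags, `span` and `copy_window` are hypothetical names, not Perl's, and the sketch ignores subcoffset (the character offset) as well as the REXEC_COPY_SKIP_* flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bits recording which match variables have been seen. */
enum { SAW_PREMATCH = 1, SAW_AMPERSAND = 2, SAW_POSTMATCH = 4 };

/* Half-open byte range [start, end) within the subject string. */
typedef struct { size_t start, end; } span;

/* Grow the window [*lo, *hi) so that it also covers [s, e). */
static void widen(size_t *lo, size_t *hi, int *any, size_t s, size_t e)
{
    if (!*any || s < *lo) *lo = s;
    if (!*any || e > *hi) *hi = e;
    *any = 1;
}

/* Return the offset (cf. suboffset) of the smallest contiguous copy that
 * keeps the captures and any seen $`, $&, $' valid, and store its length
 * (cf. sublen) in *len. */
size_t copy_window(size_t str_len, span match, const span *caps, size_t ncaps,
                   unsigned saw, size_t *len)
{
    size_t lo = 0, hi = 0;
    int any = 0;

    assert(match.start <= match.end && match.end <= str_len && len != NULL);
    for (size_t i = 0; i < ncaps; i++)
        widen(&lo, &hi, &any, caps[i].start, caps[i].end);
    if (saw & SAW_AMPERSAND)
        widen(&lo, &hi, &any, match.start, match.end);
    if (saw & SAW_PREMATCH)
        widen(&lo, &hi, &any, 0, match.start);
    if (saw & SAW_POSTMATCH)
        widen(&lo, &hi, &any, match.end, str_len);
    if (!any)
        lo = hi = match.start; /* nothing needs to be copied */
    *len = hi - lo;
    return lo;
}
```

  Fed the match span [2, 4) and capture span [2, 3) of "abcdefgh" =~ /(c)d/, it returns offset 2 with lengths 1, 2 and 6 for the three cases described in the commit message.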
* document args to regexec_flags and API | David Mitchell | 2012-09-08 | 1 | -1/+43
  Document in the API, and clarify in the source code, what the arguments to Perl_regexec_flags are.
  NB: this info is based on code inspection, not any real knowledge on my part.
* Remove RXf_UTF8 from perlreapi | David Mitchell | 2012-06-15 | 1 | -9/+0
  This flag was removed 4 years ago by 8f6ae13c. Update docs to match.
* perlreapi: fix documentation on last(close)?paren | David Mitchell | 2012-06-13 | 1 | -2/+2
  lastparen was described as being the last open paren; it's actually the highest close paren. Also make it clear these correspond to $+ and $^N.
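  The difference between the two fields is easiest to see with nested groups. In /(a(b))/ matched against "ab", group 2 closes before group 1, so $+ (the highest-numbered group, lastparen) is "b" while $^N (the most recently closed group, lastcloseparen) is "ab". The C sketch below models only that bookkeeping; `paren_state` and `on_group_close` are hypothetical, and backtracking, which would require restoring these values, is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields discussed above. */
typedef struct {
    unsigned lastparen;      /* highest-numbered group that has closed: $+ */
    unsigned lastcloseparen; /* group that closed most recently: $^N */
} paren_state;

/* Record that capture group n (1-based) has just closed. */
void on_group_close(paren_state *st, unsigned n)
{
    assert(st != NULL && n > 0);
    if (n > st->lastparen)
        st->lastparen = n;
    st->lastcloseparen = n;
}
```

  Replaying the close events for /(a(b))/ (group 2, then group 1) leaves lastparen at 2 and lastcloseparen at 1.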
* add op_comp field to regexp_engine API | David Mitchell | 2012-06-13 | 1 | -0/+7
  Perl's internal function for compiling regexes that knows about code blocks, Perl_re_op_compile, isn't part of the engine API. However, the way that regcomp.c is dual-lifed as ext/re/re_comp.c with debugging compiled in means that Perl_re_op_compile is also compiled as my_re_op_compile. These days the mechanism to choose whether to call the main functions or the debugging my_* functions when 'use re debug' is in scope is the re engine API jump table. Ergo, to ensure that my_re_op_compile gets called appropriately, this method needs adding to the jump table.

  So, I've added it, but documented as 'for perl internal use only, set to null in your engine'.

  I've also updated current_re_engine() to always return a pointer to a jump table, even if we're using the internal engine (formerly it returned null). This then allows us to use the simple condition (eng->op_comp) to determine whether the current engine supports code blocks.
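  A sketch of the jump-table check described in this commit. `toy_engine` is hypothetical: it borrows the field names comp and op_comp from the regexp_engine API but reduces their signatures to a single pattern argument, and the stub compilers do no real work. Because current_re_engine() is said to always return a table, the capability test reduces to a NULL check on one function pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified engine jump table. */
typedef struct toy_engine {
    int (*comp)(const char *pattern);    /* compile a pattern */
    int (*op_comp)(const char *pattern); /* core-only; NULL in plug-in engines */
} toy_engine;

/* Stub compilers: they only report whether they were given a pattern. */
static int stub_comp(const char *pattern)    { return pattern != NULL; }
static int stub_op_comp(const char *pattern) { return pattern != NULL; }

const toy_engine core_engine   = { stub_comp, stub_op_comp };
const toy_engine plugin_engine = { stub_comp, NULL };

/* The engine pointer is never NULL (every engine, including the core one,
 * has a table), so the capability check is just a test of op_comp. */
int supports_code_blocks(const toy_engine *eng)
{
    assert(eng != NULL);
    return eng->op_comp != NULL;
}
```

  Setting op_comp to NULL, as plug-in engines are told to do, is enough to make the core treat an engine as not supporting code blocks.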
* [RT #36079] Convert ` to '. | jkeenan | 2011-11-22 | 1 | -1/+1
* More typo fixes in pod/perl*.pod files | Keith Thompson | 2011-07-31 | 1 | -1/+1
* perlreapi: nits | Karl Williamson | 2011-05-18 | 1 | -4/+6
* perlreapi: Update as little as possible for 5.14 | Karl Williamson | 2011-04-12 | 1 | -9/+14
  This keeps the docs at parity with earlier Perls.
* Fix bad pod links found by Test::Pod::LinkCheck | Apocalypse | 2011-02-15 | 1 | -1/+1
* Fix typos in pod/* | Peter J. Acklam) (via RT | 2011-01-07 | 1 | -1/+1
  # New Ticket Created by (Peter J. Acklam)
  # Please include the string: [perl #81906]
  # in the subject line of all future correspondence about this issue.
  # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=81906 >
* Change comments, documentation for new (?^...) | Karl Williamson | 2010-09-22 | 1 | -1/+1
  I overlooked these earlier in adding the caret notation.
* Standardize on use of 'capture group' over 'buffer' | Karl Williamson | 2010-06-28 | 1 | -4/+4
  Both terms 'capture group' and 'capture buffer' are used in the documentation. This patch changes most uses of the latter to the former, as they are referenced using "\g".
* Fix POD: C<...->...> => C<< ...-> ... >> | Frank Wiegand | 2009-11-19 | 1 | -1/+1
  Signed-off-by: Abigail <abigail@abigail.be>
* much better swap logic to support reentrancy and fix assert failure | George Greer | 2009-07-26 | 1 | -1/+1
  Commit c74340f9 added backreferences as well as the idea of a ->swap regex pointer to keep track of the match offsets in case of backtracking. The problem is that when Perl re-enters the regex engine to handle utf8::SWASHNEW, the ->swap is not saved/restored/cleared, so any capture from the utf8 (Perl) code could inadvertently modify the regex match data that caused the utf8 swash to get built.

  This change should close out RT #60508.
* Re: [PATCH] POD fixes | Vincent Pit | 2008-02-25 | 1 | -1/+1
  Message-ID: <47BFFDCB.60107@profvince.com>
  p4raw-id: //depot/perl@33366
* Doc nits -- avoid bare "5.10" version numbers without a | Rafael Garcia-Suarez | 2007-11-27 | 1 | -1/+1
  third component. (Suggested by Jarkko)
  p4raw-id: //depot/perl@32523
* Fix a bunch of typos | Rafael Garcia-Suarez | 2007-08-09 | 1 | -14/+16
  p4raw-id: //depot/perl@31694
* Optimize split // | Ævar Arnfjörð Bjarmason | 2007-08-09 | 1 | -0/+10
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80708090049p2cf4810ep5a437ad53f64fa78@mail.gmail.com>
  p4raw-id: //depot/perl@31693
* Small pod fix | Rafael Garcia-Suarez | 2007-06-29 | 1 | -1/+1
  p4raw-id: //depot/perl@31499
* Rename various regex defines so that they have distinct prefixes based on their usage. | Yves Orton | 2007-06-28 | 1 | -14/+14
  RXf_         => flags used in pm_flags argument to regcomp and stored in the regex via rx->extflags
  PREGf_       => flags stored in rx->intflags
  RXapif_      => argument flags for regex named capture api
  RX_BUFF_IDX_ => special indexes to represent $` $' $& used in the numeric capture buffer api

  PREGf is untouched by this change, but RXf_ is split into RXapif and RX_BUFF_IDX_.
  p4raw-id: //depot/perl@31497
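  The RX_BUFF_IDX_ indexes let one numeric fetch path serve $`, $' and $& alongside $1, $2 and so on. The C sketch below assumes the values -2, -1 and 0 for RX_BUFF_IDX_PREMATCH, RX_BUFF_IDX_POSTMATCH and RX_BUFF_IDX_FULLMATCH (check regexp.h before relying on them). The `paren_pair` struct and `fetch_range` are hypothetical simplifications in which entry 0 holds the whole match and -1 marks a group that did not participate.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror RX_BUFF_IDX_PREMATCH/_POSTMATCH/_FULLMATCH. */
enum {
    BUFF_IDX_PREMATCH  = -2, /* $` */
    BUFF_IDX_POSTMATCH = -1, /* $' */
    BUFF_IDX_FULLMATCH = 0   /* $&; a positive n means $n */
};

/* Byte offsets of a match or capture; start == -1 means "unset". */
typedef struct { long start, end; } paren_pair;

/* Resolve idx to a [start, end) range of the subject string.  o[0] is the
 * whole match and o[1..nparens] the capture groups.  Returns 0 if idx is
 * out of range or the group did not participate in the match. */
int fetch_range(const paren_pair *o, int nparens, size_t str_len, int idx,
                size_t *start, size_t *end)
{
    assert(o != NULL && start != NULL && end != NULL);
    if (idx == BUFF_IDX_PREMATCH) {
        *start = 0;
        *end = (size_t)o[0].start;
    } else if (idx == BUFF_IDX_POSTMATCH) {
        *start = (size_t)o[0].end;
        *end = str_len;
    } else if (idx >= 0 && idx <= nparens && o[idx].start != -1) {
        *start = (size_t)o[idx].start;
        *end = (size_t)o[idx].end;
    } else {
        return 0;
    }
    return 1;
}
```

  For "abcdefgh" =~ /(c)d/ (whole match [2, 4), group 1 [2, 3)), the prematch, postmatch and full-match indexes resolve to [0, 2), [4, 8) and [2, 4), and index 1 resolves to [2, 3).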
* Move the RXf_WHITE logic for split " " into the regex engine | Ævar Arnfjörð Bjarmason | 2007-06-28 | 1 | -17/+21
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80706281306i4dbba39em3eeb8da1d67ea27c@mail.gmail.com>
  (with tweaks)
  p4raw-id: //depot/perl@31495
* perlreapi.pod documentation for flags & cleanup | Ævar Arnfjörð Bjarmason | 2007-06-18 | 1 | -70/+95
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80706171109r37c294c4h78a51083c3b851ba@mail.gmail.com>
  p4raw-id: //depot/perl@31411
* SvRX() and SvRXOK() macros | Ævar Arnfjörð Bjarmason | 2007-06-18 | 1 | -14/+4
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80706172033h1908aa0ge15698204e0b79ed@mail.gmail.com>
  p4raw-id: //depot/perl@31409
* Re: [PATCH] Callbacks for named captures (%+ and %-) | Ævar Arnfjörð Bjarmason | 2007-06-06 | 1 | -49/+116
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80706031324y5618d519p460da27a2e7fe712@mail.gmail.com>
  p4raw-id: //depot/perl@31341
* Minor perlreapi.pod cleanup | Ævar Arnfjörð Bjarmason | 2007-05-20 | 1 | -41/+31
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80705160938w13789b63m6d5f4710441ceac@mail.gmail.com>
  p4raw-id: //depot/perl@31244
* FETCH/STORE/LENGTH callbacks for numbered capture variables | Ævar Arnfjörð Bjarmason | 2007-05-03 | 1 | -22/+100
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80705011658g1156e14cw4d2b21a8d772ed41@mail.gmail.com>
  p4raw-id: //depot/perl@31130
* Re: [PATCH] Cleanup of the regexp API | Ævar Arnfjörð Bjarmason | 2007-04-30 | 1 | -35/+53
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80704261922j3db0615wa86ccc4cb65b2713@mail.gmail.com>
  p4raw-id: //depot/perl@31106
* Re: [PATCH (incomplete)] Make regcomp use SV* sv, instead of char* exp, char* xend | Ævar Arnfjörð Bjarmason | 2007-04-23 | 1 | -7/+24
  Message-ID: <51dd1af80704211430m6ad1b4afy49b069faa61e33a9@mail.gmail.com>
  p4raw-id: //depot/perl@31027
* Add the perlreapi man page, by Ævar Arnfjörð Bjarmason | Rafael Garcia-Suarez | 2007-04-12 | 1 | -0/+498
  (largely from perlreguts)
  p4raw-id: //depot/perl@30922