delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	One more spelling correction from Ville Skyttä.	James E Keenan	2018-04-08	1	-1/+1
\| \| \| \| \|	The original patch had a problem during 'git apply', so the committer chopped it up and applied all parts except this in commit f0d9624a416d3eb926048f8054b82304fba159.
*	Change name of regnode for clarity	Karl Williamson	2018-02-16	1	-148/+153
\| \| \| \| \| \| \|	The EXACTFA nodes are in fact not generated by /a, but by /aa. Change the name to EXACTFAA to correspond. I found myself getting confused by this.
*	regcomp.sym: Add ANYOFM regnode	Karl Williamson	2018-01-30	1	-0/+2
\| \| \| \| \|	This uses a mask instead of a bitmap, and is restricted to representing invariant characters under UTF-8 that meet particular bit patterns.
*	expand documentation of $DB::sub	Zefram	2018-01-17	1	-7/+16
\|
*	regcomp.sym: Add regnodes for [[:ascii:]]	Karl Williamson	2017-12-29	1	-0/+3
\| \| \| \|	These will be used in a future commit
*	regcomp.sym: Add nodes for script runs	Karl Williamson	2017-12-24	1	-0/+2
\| \| \| \|	To be used in the implementation thereof.
*	regcomp.sym: Clarify regnode comment	Karl Williamson	2017-12-16	1	-1/+1
\|
*	clarify conditions for calling DB::sub	Zefram	2017-12-05	1	-2/+2
\| \| \| \| \|	The wording was ambiguous about which subroutine's compilation matters. Fixes [perl #131672].
*	Unify GOSTART and GOSUB	Yves Orton	2016-03-06	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GOSTART is a special case of GOSUB; by unifying them we can remove a lot of offset twiddling and other special casing at pretty much no cost. GOSUB has 2 arguments, ARG() and ARG2L(), which are interpreted as a U32 and an I32 respectively. ARG() holds the "parno" we will recurse into. ARG2L() holds a signed offset to the relevant start node for the recursion. Prior to this patch the argument to GOSUB would always be >= 1, and unlike other parts of our logic we would not use 0 to represent "start/end" of pattern, because GOSTART was used for "recurse to beginning of pattern". After this patch we use 0 to represent "start/end", and a lot of complexity "goes away" along with the GOSTART regops.
*	fix perl #126186 make all verbs allow an optional arg	Yves Orton	2015-10-05	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \|	In perl #126186 it was pointed out we had started allowing name arguments for verbs where we did not document them to be supported, albeit in an inconsistent way. The previous patch cleaned up some of the cause of this, but it seems better to just generally allow the existing verbs to all support a mark name argument. So this patch reverses the effect of the previous patch, and makes all verbs, FAIL, ACCEPT, etc, allow an optional argument, and set REGERROR/REGMARK appropriately as well.
*	Start fixing some pod pedantic errors	Karl Williamson	2015-09-03	1	-111/+111
\| \| \| \|	This fixes a bunch of them, but there are many more
*	Add ANYOFD regex node	Karl Williamson	2015-08-24	1	-0/+1
\| \| \| \| \|	This is like an ANYOF node, but just for when /d is in effect. It will be used in future commits
*	perldebguts: Add clarification	Karl Williamson	2015-08-24	1	-1/+2
\|
*	remove deprecated /\C/ RE character class	David Mitchell	2015-06-19	1	-1/+0
\| \| \| \| \| \|	This horrible thing broke encapsulation and was as buggy as a very buggy thing. It's been officially deprecated since 5.20.0 and now it can finally die die die!!!!
*	regcomp.sym: Update \b descriptions	Karl Williamson	2015-03-18	1	-15/+13
\|
*	Add qr/\b{gcb}/	Karl Williamson	2015-02-19	1	-11/+15
\| \| \| \| \| \| \| \| \| \| \|	A function implements seeing if the space between any two characters is a grapheme cluster break. After I wrote this, I realized that an array lookup might be a better implementation, but the deadline for v5.22 was too close to change it. I did see that my gcc optimized it down to an array lookup. This makes the implementation of \X go from being complicated to trivial.
*	Add regex nodes for locale	Karl Williamson	2014-12-29	1	-0/+6
\| \| \| \| \|	These will be used in a future commit to distinguish between /l patterns vs non-/l.
*	Eliminate unused BACK regnode	Aaron Crane	2014-09-29	1	-8/+1
\|
*	Up regex flags limit for (??{})	Karl Williamson	2014-09-29	1	-1/+2
\| \| \| \| \| \| \| \| \|	Previously the regex pattern compilation flags needed for this construct would fit into an 8-bit byte. This conveniently fits into the flags structure element of a regnode. There are changes coming that require more than 8 bits, so in preparation, this commit adds an argument to the node that implements (??{}) (31-bits usable for flags), and moves the storage to that.
*	regcomp.sym: ANYOF nodes have an argument	Karl Williamson	2014-09-29	1	-1/+1
\| \| \| \| \| \|	Plus a bitmap, but they always have an argument besides, contrary to what was specified here. Future commits rely on this, whereas heretofore this error was harmless.
*	Eliminate the duplicative regops BOL and EOL	Yves Orton	2014-09-17	1	-16/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	See also the perl5porters thread titled "Perl MBOLism in regex engine". In the perl 5.000 release (a0d0e21ea6ea90a22318550944fe6cb09ae10cda) the BOL regop was split into two behaviours, MBOL and SBOL, with SBOL and BOL behaving identically. Similarly the EOL regop was split into two behaviours, SEOL and MEOL, with EOL and SEOL behaving identically. This then resulted in various duplicative code related to flags and case statements in various parts of the regex engine. It appears that perhaps BOL and EOL were kept because they are the type ("regkind") for SBOL/MBOL and SEOL/MEOL/EOS. Reworking regcomp.pl to handle aliases for the type data, so that SBOL/MBOL are of type BOL even though BOL == SBOL, seems to cover that case without adding to the confusion. This means two regops, a regstate, and an internal regex flag can be removed (and used for other things), and various logic relating to them can be removed. For the uninitiated, SBOL is /^/ and /\A/ (with or without /m) and MBOL is /^/m. (I consider it a fail that we have no way to say MBOL without the /m modifier.) Similarly SEOL is /$/ and MEOL is /$/m (there is also /\z/, which is EOS, "end of string", with or without /m).
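The anchor distinctions listed in the commit above can be illustrated with a rough analogue in Python's `re` module. This is only a sketch: it is not Perl's engine, Python spells Perl's /\z/ as `\Z`, and the helper name `positions` and the sample strings are assumptions made for the example. The start-anchor sample deliberately has no trailing newline, because Python and Perl differ on whether a multiline `^` matches after a final newline.

```python
import re

def positions(pattern, text, flags=0):
    """Return the offsets at which a zero-width pattern matches in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

start_text = "one\ntwo"    # no trailing newline (see note above)
end_text = "one\ntwo\n"

# SBOL analogue: '^' without the multiline flag, and '\A', match only at offset 0.
sbol = positions(r"^", start_text)
sbol_a = positions(r"\A", start_text)

# MBOL analogue: '^' under the multiline flag also matches after an embedded newline.
mbol = positions(r"^", start_text, re.MULTILINE)

# SEOL analogue: '$' matches at the very end and just before a final newline.
seol = positions(r"$", end_text)

# MEOL analogue: '$' under the multiline flag also matches before every newline.
meol = positions(r"$", end_text, re.MULTILINE)

# EOS analogue: Python's '\Z' (Perl's /\z/) matches only at the very end.
eos = positions(r"\Z", end_text)

print(sbol, sbol_a, mbol, seol, meol, eos)
```

The multiline flag changes only `^` and `$`; the `\A` and `\Z` anchors behave the same with or without it, which mirrors the "with or without /m" remarks in the commit message.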
*	Change 'semantics' to 'rules'	Karl Williamson	2014-02-20	1	-14/+14
\| \| \| \| \| \|	The term 'semantics' in documentation when applied to character sets is changed to 'rules' as being a shorter less-jargony synonym in this case. This was discussed several releases ago, but I didn't get around to it.
*	Revert "Free up bit for regex ANYOF nodes"	Karl Williamson	2014-02-15	1	-190/+137
\| \| \| \| \|	This reverts commit 34fdef848b1687b91892ba55e9e0c3430e0770f6, and adds comments referring to it, in case it is ever needed.
*	Free up bit for regex ANYOF nodes	Karl Williamson	2014-02-15	1	-137/+190
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit frees up a bit by using an extra regnode to pass the information to the regex engine instead of the flag. I originally thought that if this was needed, it should be the ANYOF_ABOVE_LATIN1_ALL bit, as that might speed some things up. But if we need to do this again by adding another node to get another bit, we want one that is mutually exclusive of the first one we did, for otherwise we start having to make 3 nodes instead of two to get the combinations 1 0, 0 1, and 1 1. This combinatorial problem is avoided by using bits that are mutually exclusive, which ABOVE_LATIN1_ALL isn't. But the one freed by this commit, ANYOF_NON_UTF8_NON_ASCII_ALL, is only set under /d matching, and there are other bits that are set only under /l, so if we need to do this again, we should use one of those. I wrote this code when I thought I really needed a bit, but I have since figured out a better way to get the bit needed now. I don't want to lose this code to posterity, so this commit is being made long enough to get the commit number; then it will be reverted, adding comments referring to the commit number, so that it can easily be reconstructed when necessary.
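The combinatorial argument in the commit above can be checked with a small counting sketch. The function name `node_types_needed` is hypothetical and nothing here is taken from the Perl source; it only counts how many distinct node types would be needed if each boolean property were encoded in the node type rather than in a flag bit.

```python
from itertools import product

def node_types_needed(n_properties, mutually_exclusive=False):
    """Count node types needed to encode n boolean properties as node types.

    Every allowed on/off combination (including "none set") needs its own
    node type. If the properties are mutually exclusive, at most one of
    them may be set at a time.
    """
    combos = [
        combo
        for combo in product((0, 1), repeat=n_properties)
        if not mutually_exclusive or sum(combo) <= 1
    ]
    return len(combos)

for n in (1, 2, 3, 4):
    print(n, node_types_needed(n), node_types_needed(n, mutually_exclusive=True))
```

With independent properties the count doubles with each added property, whereas with mutually exclusive properties it grows by one, which is why the message recommends picking bits that are never set together.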
*	pod/perldebguts: Stress ephemeral nature of regnode types	Karl Williamson	2014-02-12	1	-1/+1
\|
*	Use bit instead of node for regex SSC	Karl Williamson	2014-01-22	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The flag bits in regular expression ANYOF nodes are perennially in short supply. However there are still plenty of regex nodes possible. So one solution to needing to pass more information is to create a node that encapsulates what is needed. That is what commit 9aa1e39f96ac28f6ce5d814d9a1eccf1464aba4a did to tell regexec.c that a particular ANYOF node is for the synthetic start class (SSC). However this solution introduces other issues. If you have to express two things, then you need a regnode for A, a regnode for B, a regnode for both A and B, and another regnode for neither A nor B. With three things, you need 8 regnodes to express all possible combinations. This becomes unwieldy to write code for. The number of combinations goes way down if some of them are mutually exclusive. At the time of that commit, I thought that an SSC need not ever warn if matching against an above-Unicode code point. I was wrong, and that has been corrected earlier in the 5.19 series. But it finally came to me how to tell regexec that an ANYOF node is for the SSC without taking up a flag bit and without requiring a regnode type. The 'next_off' field in a regnode tells the engine the offset in the regex program to the node it's supposed to go to after processing this one. Since the SSC stands alone, its 'next_off' field is unused, and we can put anything we want in it. That, however, is not true of other ANYOF regnodes. But it turns out that there are certain values that will never be legitimate in the 'next_off' field in these, and so this commit uses one of those to signal that this ANYOF node is an SSC. Regnodes come in various sizes, and the offset is in terms of how many of the smallest ones there are to the next node to look at. Since ANYOF nodes are large, the offset is always > 1, and so this commit uses 1 to indicate an SSC.
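The 'next_off' trick described above is an instance of using an otherwise impossible field value as a marker. The following is a minimal model of that idea only; the class and function names are hypothetical, and the real regnode layout in the Perl source is not reproduced here. The one fact taken from the commit message is that a genuine ANYOF node always has an offset greater than 1, so 1 is free to mean "this is the SSC".

```python
from dataclasses import dataclass

# Taken from the commit message: offset 1 is never legitimate for an ordinary
# ANYOF node, so it can be reserved as a marker.
SSC_MARKER_OFFSET = 1

@dataclass
class AnyofNode:
    """Toy stand-in for an ANYOF regnode; only the offset field is modelled."""
    next_off: int

def make_anyof(next_off):
    """Build an ordinary ANYOF node, rejecting offsets that cannot occur."""
    if next_off <= SSC_MARKER_OFFSET:
        raise ValueError("an ordinary ANYOF node must have next_off > 1")
    return AnyofNode(next_off=next_off)

def make_ssc():
    """Build the synthetic start class node; it stands alone, so the offset is free."""
    return AnyofNode(next_off=SSC_MARKER_OFFSET)

def is_ssc(node):
    """Recognise the SSC without a flag bit or a separate node type."""
    return node.next_off == SSC_MARKER_OFFSET

print(is_ssc(make_ssc()), is_ssc(make_anyof(12)))
```

The design costs nothing in storage, but it relies on the invariant that ordinary nodes can never carry the marker value, which is why the constructor in this sketch enforces it.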
*	Convert regnode to a flag for [...]	Karl Williamson	2013-12-31	1	-144/+137
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prior to this commit, there were 3 types of ANYOF nodes; now there are two: regular, and one for the synthetic start class (ssc). This commit converted the third type dealing with warning about matching \p{} against non-Unicode code points, into using the spare flag bit for ANYOF nodes. This allows this bit to apply to ssc ANYOF nodes, whereas previously it couldn't. There is a bug in which the warning isn't raised if the match is rejected by the optimizer, because of this inability. This bug will be fixed in a later commit. Another option would have been to create a new node-type which was an ANYOF_SSC_WARN_SUPER node. But this adds extra complications to things; and we have a spare bit that we might as well use. The comments give better possibilities for freeing up 2 bits should they be needed.
*	briefly document DB::lsub	Tony Cook	2013-09-16	1	-0/+4
\|
*	test and briefly document DB::goto	Tony Cook	2013-09-16	1	-0/+7
\|
*	Allow trie use for /iaa matching	Karl Williamson	2013-08-29	1	-0/+3
\| \| \| \| \| \| \| \| \|	This adds code so that tries can be formed under /iaa, which formerly weren't handled. A problem occurs when the string contains the LATIN SMALL LETTER SHARP S when the regex pattern is not UTF-8 encoded. I tried several ways to get this to work easily, but ended up deciding it was too hard, so in this one situation, a new regnode is created to prevent the trie code from even trying to turn it into a trie.
*	Remove newly unnecessary regnode, code	Karl Williamson	2013-08-29	1	-149/+141
\| \| \| \| \|	The previous commit fixed things up so that this work-around regnode doesn't have to exist; nor the work around for the EXACTFU_SS regnode
*	Add new regnode for synthetic start class	Karl Williamson	2012-12-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This creates a regnode specifically for the synthetic start class, which is a type of ANYOF node. The flag bit previously used to denote this is removed. This paves the way for this bit to be freed up, but first the other use of this bit must also be removed, which will be done in the next commit. There are now three ANYOF-type regnodes. This one should be called only in one place in regexec.c. The other special one is ANYOF_WARN_SUPER. A synthetic start class node should not do any warning, so there is no issue of having something need to be both types.
*	Free up regex ANYOF bit.	Karl Williamson	2012-12-28	1	-0/+3
\| \| \| \| \| \| \|	This uses a regnode type, of which we have many available, to free up a bit in the ANYOF regnode flag field, of which we have none, and are trying to have the same bit do double duty. This will enable us to remove some of that double duty in the next commit.
*	Regenerate the regnode table in perldebguts.pod automatically	Father Chrysostomos	2012-12-22	1	-160/+178
\|
*	Fix up runtime regex codeblocks.	David Mitchell	2012-06-13	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous commits in this branch have brought literal code blocks into the New World Order; now do the same for runtime blocks, i.e. those needing "use re 'eval'". The main user-visible changes from this commit are that: * the code is now fully parsed, rather than needing balanced {}'s; i.e. this now works: my $code = q[ (?{ $a = '{' }) ]; use re 'eval'; /$code/ * warnings and errors are now reported as coming from "(eval NNN)" rather than "(re_eval NNN)" (although see the next commit for some fixups to that). Indeed, the string "re_eval" has been expunged from the source and documentation. The big internal difference is that the sv_compile_2op() and sv_compile_2op_is_broken() functions are no longer used, and will be removed shortly. It works by the regex compiler detecting the presence of run-time code blocks, and feeding the whole pattern string back into the parser (where the run-time blocks are now seen as compile-time), then extracting out any compiled code blocks and adding them to the mix. For example, in the following: $c = '(?{"runtime"})d'; use re 'eval'; /a(?{"literal"})\b'$c/ At the point the regex compiler is called, the perl parser will already have compiled the literal code block and presented it to the regex engine. The engine examines the pattern string, sees two '(?{', but only one accounted for by the parser, and so constructs a short string to be evalled: based on the pattern, but with literal code-blocks blanked out, and \ and ' escaped. In the above example, the pattern string is a(?{"literal"})\b'(?{"runtime"})d and we call eval_sv() with an SV containing the text qr'a \\b\'(?{"runtime"})d' The returned qr will contain the new code-block (and associated CV and pad) which can be extracted and added to the list of compiled code blocks of the original pattern.
Note that with this scheme, the requirement for "use re 'eval'" is easily determined, and no longer requires all the pp_regcreset / PL_reginterp_cnt machinery, which will be removed shortly. Two subtleties of this scheme are that normally, \\ isn't collapsed into \ for literal regexes (unlike literal strings), and hints aren't inherited when using eval_sv(). We get round both of these by adding and setting a new flag, PL_reg_state.re_reparsing, which indicates that we are refeeding a pattern into the perl parser.
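The string construction described in this commit (blank out the already-compiled literal code blocks, escape \ and ', wrap the result in qr'...') can be modelled in a few lines. This is a sketch only, not the code in the regex compiler: the function name `build_reparse_source` is hypothetical, the caller is assumed to already know the character spans of the literal blocks, and replacing each block with the same number of spaces is an assumption, since the message says only that the blocks are "blanked out".

```python
def build_reparse_source(pattern, literal_spans):
    """Build the qr'...' text that would be fed back to the parser.

    pattern       -- the full pattern string as seen by the regex compiler
    literal_spans -- (start, end) offsets of code blocks the parser has
                     already compiled (assumed to be known to the caller)
    """
    chars = list(pattern)
    # Blank out literal code blocks; length is preserved (an assumption).
    for start, end in literal_spans:
        chars[start:end] = " " * (end - start)
    # Escape backslashes and single quotes so the text survives qr'...'.
    escaped = []
    for ch in chars:
        if ch in ("\\", "'"):
            escaped.append("\\")
        escaped.append(ch)
    return "qr'" + "".join(escaped) + "'"

# The example pattern from the commit message.
pattern = r"""a(?{"literal"})\b'(?{"runtime"})d"""
literal = '(?{"literal"})'
start = pattern.index(literal)
source = build_reparse_source(pattern, [(start, start + len(literal))])
print(source)
```

Only the run-time block survives in the generated text, so evaluating it compiles exactly the code blocks the parser has not yet seen.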
*	[RT #36079] Convert ` to '.	jkeenan	2011-11-22	1	-8/+8
\|
*	perldebguts: Update regnodes to 5.14	Karl Williamson	2011-03-24	1	-109/+230
\| \| \| \| \|	This hadn't been updated for quite some time. It just takes what is in regcomp.sym, removes some columns, and does some reformatting.
*	Fix bad pod links found by Test::Pod::LinkCheck	Apocalypse	2011-02-15	1	-1/+1
\|
*	perldebguts tweaks	Father Chrysostomos	2011-02-11	1	-18/+18
\|
*	Update: re pragma is lexically scoped since Perl 5.9.5.	Nick Johnston	2010-06-28	1	-1/+2
\|
*	Use POD references to documentation when possible.	Tom Hukins	2010-03-12	1	-1/+1
\|
*	POD markup fix	Gisle Aas	2008-11-12	1	-0/+1
\| \| \|	p4raw-id: //depot/perl@34822
*	Several POD fixes by Jonathan Stowe	Rafael Garcia-Suarez	2007-05-28	1	-4/+4
\| \| \|	p4raw-id: //depot/perl@31294
*	Fix a broken link and a meaningless phrase in perldebguts	Rafael Garcia-Suarez	2007-03-22	1	-3/+2
\| \| \|	p4raw-id: //depot/perl@30686
*	Hack out -DL documentation from perldebguts.pod	Steve Hay	2005-07-12	1	-152/+2
\| \| \| \| \| \|	Now that the perl core uses Newx() rather than New() this chunk of old documentation is more obsolete than ever before. p4raw-id: //depot/perl@25118
*	Some updates for the memory use debugging section:	Jarkko Hietaniemi	2003-08-09	1	-6/+17
\| \| \| \| \|	-DL is obsolete, mention Devel::Size, and Purify and valgrind. p4raw-id: //depot/perl@20579
*	Whitespace tweaks.	Jarkko Hietaniemi	2002-03-20	1	-5/+5
\| \| \|	p4raw-id: //depot/perl@15351
*	consistent commands for perl5db.pl etc.	Richard Foley	2002-02-25	1	-3/+3
\| \| \| \| \|	Message-ID: <16fJgP-1mbVeSC@fwd04.sul.t-online.com> p4raw-id: //depot/perl@14865
*	Re: perldebguts minor tweaks	Joe McMahon	2002-01-25	1	-1/+1
\| \| \| \| \|	Message-ID: <Pine.LNX.4.33.0201251031530.9326-100000@tribal.metalab.unc.edu> p4raw-id: //depot/perl@14415
*	perldebguts minor tweaks	Joe McMahon	2002-01-24	1	-51/+97
\| \| \| \| \|	Message-ID: <Pine.LNX.4.33.0201241646580.14744-100000@tribal.metalab.unc.edu> p4raw-id: //depot/perl@14409