delta/perl.git - github.com: perl/perl5.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	null ptr deref in Perl_cv_forget_slab	David Mitchell	2015-05-05	1	-1/+1
	RT #124385
	Parsing following a syntax error could result in a null ptr dereference. This commit contains a band-aid that returns from Perl_cv_forget_slab() if the cv arg is null; but the real issue is much deeper and needs a more general fix at some point.
	Basically, both the lexer and the parser use the save stack, and after an error, they can get out of sync. In particular:
	1) when handling a double-quoted string, the lexer does an ENTER, saves most of its current state on the save stack, then uses the literal string as the toke source. When it reaches the end of the string, it LEAVEs, restores the lexer state and continues with the main source.
	2) Whenever the parser starts a new block or sub scope, it remembers the current save stack position, and at end of scope, pops the save stack back to that position.
	In something like
	    "@{ sub {]}} }}}"
	the lexer sees a double-quoted string, and saves the current lex state. The parser sees the start of a sub, and saves PL_compcv etc. Then a parse error occurs. The parser goes into error recovery, discarding tokens until it can return to a sane state. The lexer runs out of tokens when toking the string, does a LEAVE, and switches back to toking the main source. This LEAVE restores both the lexer's and the parser's state; in particular the parser gets its old PL_compcv restored, even though the parser hasn't finished compiling the current sub.
	Later, a series of '}' tokens coming through allows the parser to finish the sub. Since PL_error_count > 0, it discards the just-compiled sub and sets PL_compcv to null. Normally the LEAVE_SCOPE done just after this would restore PL_compcv to its old value (e.g. PL_main_cv) but the stack has already been popped, so PL_compcv gets left null, and SEGVs occur.
	The two main ways I can think of fixing this in the long term are
	1) avoid the lexer using the save stack for long-term state storage; in particular, make S_sublex_push() malloc a new parser object rather than saving the current lexer state on the save stack.
	2) At the end of a sublex, if PL_error_count > 0, don't try to restore state and continue, instead just croak.
	N.B. the test that this commit adds to lex.t doesn't actually trigger the SEGV, since the bad code is wrapped in an eval which (for reasons I haven't researched) avoids the SEGV.
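The out-of-sync sequence described above can be replayed with a toy model. This is only a sketch, not the perl source: `compcv`, `save_lexer_state`, `save_compcv`, `leave_to`, `forget_slab` and `replay_desync` are hypothetical stand-ins for PL_compcv, the save stack and Perl_cv_forget_slab().

```c
#include <stddef.h>

/* Toy model only: one save stack shared by a "lexer" and a "parser". */
typedef struct { int id; } CV;
typedef struct { int restores_compcv; CV *old; } SaveEntry;

static CV *compcv;              /* stand-in for PL_compcv */
static SaveEntry stack[16];     /* stand-in for the save stack */
static int save_ix;

static int  enter_scope(void)      { return save_ix; }   /* remember position */
static void save_lexer_state(void) { stack[save_ix++] = (SaveEntry){0, NULL}; }
static void save_compcv(void)      { stack[save_ix++] = (SaveEntry){1, compcv}; }

/* Pop back to a remembered position, restoring whatever was saved. */
static void leave_to(int floor) {
    while (save_ix > floor) {
        SaveEntry e = stack[--save_ix];
        if (e.restores_compcv) compcv = e.old;
    }
}

/* Band-aid in the spirit of the commit: tolerate a null cv. */
static int forget_slab(CV *cv) {
    if (!cv) return 0;          /* nothing to do, and no null dereference */
    return cv->id;
}

/* Replays the sequence from the commit message; returns the final compcv. */
static CV *replay_desync(CV *main_cv, CV *sub_cv) {
    save_ix = 0;
    compcv = main_cv;
    int lexer_floor = enter_scope();    /* lexer ENTERs for the quoted string */
    save_lexer_state();
    int parser_floor = enter_scope();   /* parser starts the sub scope */
    save_compcv();
    compcv = sub_cv;
    leave_to(lexer_floor);              /* lexer LEAVE also pops the parser's entry */
    compcv = NULL;                      /* error path discards the sub */
    leave_to(parser_floor);             /* nothing left to restore */
    return compcv;
}
```

In this model `replay_desync()` ends with a null `compcv`, which is the state the guard in `forget_slab()` has to survive.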
*	op_parent(): only exist under -DPERL_OP_PARENT	David Mitchell	2015-04-19	1	-0/+2
	Make the function Perl_op_parent() only be present in perls built with -DPERL_OP_PARENT. Previously the function was present in all builds, but always returned NULL on non PERL_OP_PARENT builds.
*	op_sibling_splice(): allow NULL parent arg	David Mitchell	2015-04-19	1	-1/+1
	If the splicing doesn't affect the first or last sibling of an op_sibling chain, then we don't need access to the parent op of the siblings (to access/update op_first, op_last, OPf_KIDS etc). So allow a NULL parent arg in that case.
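A rough sketch of why the parent is only needed at the ends of the chain. The `Node` and `Parent` types and `remove_after()` are hypothetical and much simpler than the real op structures; the sketch only removes one node.

```c
#include <stddef.h>

typedef struct Node { struct Node *sibling; } Node;
typedef struct Parent { Node *first, *last; } Parent;

/* Remove the node that follows `start` (or the first node if start is NULL).
 * Returns 0 on success, -1 if the parent would be needed but is NULL,
 * or if there is nothing to remove. */
static int remove_after(Parent *parent, Node *start) {
    if (!start) {
        /* Removing the first sibling changes parent->first. */
        if (!parent || !parent->first) return -1;
        Node *victim = parent->first;
        parent->first = victim->sibling;
        if (parent->last == victim) parent->last = NULL;
        victim->sibling = NULL;
        return 0;
    }
    Node *victim = start->sibling;
    if (!victim) return -1;
    if (!victim->sibling) {
        /* Removing the last sibling changes parent->last. */
        if (!parent) return -1;
        parent->last = start;
    }
    start->sibling = victim->sibling;   /* middle of the chain: no parent access */
    victim->sibling = NULL;
    return 0;
}
```

Removing a middle node never touches `parent`, so passing NULL is harmless there; the two branches that do update `first` or `last` refuse to run without it.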
*	fix some minor compiler warnings	David Mitchell	2015-04-18	1	-1/+1
	S_deb_curcv's first param differed in constness between declaration and definition. GIMME_V can return an I32, so don't assign it to a U8.
*	Change /(?[...]) to have normal operator precedence	Karl Williamson	2015-03-19	1	-0/+1
	This experimental feature now gives the intersection operator ("&") higher precedence than the other binary operators.
*	Fix qr'\N{U+41}' on EBCDIC platforms	Karl Williamson	2015-03-18	1	-4/+6
	Prior to this commit, the regex compiler was relying on the lexer to do the translation from Unicode to native for \N{...} constructs, where it was simpler to do. However, when the pattern is a single-quoted string, it is passed unchanged to the regex compiler, and did not work. Fixing it required some refactoring, though it led to a clean API in a static function. This was spotted by Father Chrysostomos.
*	Remove PL_ prefix for recently added non-globals	Karl Williamson	2015-03-17	1	-10/+10
	PL is reserved for global variables. These are enums and static variable names introduced for handling /\b{...}/. See <20150311150610.GN28599@iabyn.com> and follow up.
*	Perl_multideref_stringify: don't SEGV on null cv	David Mitchell	2015-03-13	1	-1/+1
	This function is called by e.g. "perl -Dt" to display the multideref op:
	    $ perl -Dt -e'$a->{foo}[1]'
	    ...
	    (-e:1) multideref($a->{"foo"}[1])
	On threaded builds, it needs to know the correct pad (and so the correct cv too) so that it can access GVs and const SVs that have been moved to the pad. However with a sort code block (rather than a sort sub), S_deb_curcv() returns null, so multideref_stringify() is called with a null CV. This then SEGVs.
	Although ideally S_deb_curcv() should be fixed, a function like multideref_stringify(), which can be used for debugging, should be robust in unexpected circumstances. So this commit makes it safe (although not particularly useful) with a null CV:
	    $ perl -Dt -e'@a = sort { $a->[$i] <=> $b->[$i] } [0], [1]'
	    ...
	    (-e:1) sort
	    (-e:1) multideref(<NULLGV>->[<NULLGV>])
	    (-e:1) multideref(<NULLGV>->[<NULLGV>])
*	don't test non-null args	David Mitchell	2015-03-11	1	-2/+2
	For lots of core functions: if a function parameter has been declared NN in embed.fnc, don't test for nullness at the start of the function, i.e. eliminate code like
	    if (!foo) ...
	On debugging builds the test is redundant, as the PERL_ARGS_ASSERT_FOO at the start of the function will already have croaked. On optimised builds, it will skip the check (and so be slightly faster), but if actually passed a null arg, will now crash with a null-deref SEGV rather than doing whatever the check used to do (e.g. croak, or silently return and let the caller's code logic go awry). But hopefully this should never happen as such instances will already have been detected on debugging builds.
	It also has the advantage of shutting up recent clangs which spew forth lots of stuff like:
	    sv.c:6308:10: warning: nonnull parameter 'bigstr' will evaluate to 'true' on first encounter [-Wpointer-bool-conversion]
	        if (!bigstr)
	The only exception was in dump.c, where rather than skipping the null test, I instead changed the function def in embed.fnc to allow a null arg, on the basis that dump functions are often used for debugging (where pointers may unexpectedly become NULL) and it's better there to display that this item is null than to SEGV.
	See the p5p thread starting at 20150224112829.GG28599@iabyn.com.
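The pattern can be sketched as follows. `ARGS_ASSERT_NONNULL`, `name_len` and `dump_label` are hypothetical names for illustration and are not the generated PERL_ARGS_ASSERT_* macros themselves; the sketch assumes a `DEBUGGING` define selects the checking build.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Checked on debugging builds, compiled away otherwise. */
#ifdef DEBUGGING
#  define ARGS_ASSERT_NONNULL(p) assert(p)
#else
#  define ARGS_ASSERT_NONNULL(p) ((void)0)
#endif

/* Parameter is promised non-null by its declaration, so the body
 * carries no `if (!name) return 0;` of its own. */
static size_t name_len(const char *name) {
    ARGS_ASSERT_NONNULL(name);
    return strlen(name);
}

/* Dump-style helper: declared as accepting NULL, so it reports the
 * null instead of crashing on it. */
static const char *dump_label(const char *name) {
    return name ? name : "(null)";
}
```

The trade-off is the one the commit message names: a null passed to `name_len()` on a non-debugging build crashes rather than being silently tolerated, while the debugging helper stays forgiving.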
*	grok_atoUV: don't make part of API	Hugo van der Sanden	2015-03-09	1	-1/+1
	.. but keep available to extensions.
*	[perl #123814] replace grok_atou with grok_atoUV	Hugo van der Sanden	2015-03-09	1	-1/+1
	Some questions and loose ends:
	XXX gv.c:S_gv_magicalize - why are we using SSize_t for paren?
	XXX mg.c:Perl_magic_set - need appropriate error handling for $)
	XXX regcomp.c:S_reg - need to check if we do the right thing if parno was not grokked
	Perl_get_debug_opts should probably return something unsigned; not sure if that's something we can change.
*	embed.fnc: Change _warn_problematic_locale() function from public	Karl Williamson	2015-03-09	1	-1/+1
	Spotted by Daniel Dragan
*	[perl #123202] speed up scalar //g against tainted strings	Tony Cook	2015-02-26	1	-0/+1
*	pp_pack.c: Silence compiler warning	Karl Williamson	2015-02-20	1	-1/+1
	This was introduced by 9df874cdaa2f196cc11fbd7b82a85690c243eb9f in changing the name of some static functions. I didn't realize at the time that the function was defined in embed.fnc, as none of the others are, and it was always called with the S_ prefix form. Nor did I notice the compiler warnings. It turns out that the base name of this function is the same as a public function, so I've renamed it to have prefix 'S_my_'.
*	Add \b{sb}	Karl Williamson	2015-02-19	1	-3/+15
*	Add qr/\b{wb}/	Karl Williamson	2015-02-19	1	-0/+14
*	Add qr/\b{gcb}/	Karl Williamson	2015-02-19	1	-0/+1
	A function implements seeing if the space between any two characters is a grapheme cluster break. After I wrote this, I realized that an array lookup might be a better implementation, but the deadline for v5.22 was too close to change it. I did see that my gcc optimized it down to an array lookup. This makes the implementation of \X go from being complicated to trivial.
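To illustrate the function-versus-array-lookup choice mentioned above, here is a sketch over a deliberately reduced set of classes (Other, CR, LF, Extend). The names `GcClass`, `is_break_fn`, `is_break_tbl` and `forms_agree` are hypothetical, and the real grapheme cluster rules cover many more classes than this.

```c
#include <stdbool.h>

typedef enum { GC_OTHER, GC_CR, GC_LF, GC_EXTEND, GC_COUNT } GcClass;

/* Function form: rules are tested in order. */
static bool is_break_fn(GcClass before, GcClass after) {
    if (before == GC_CR && after == GC_LF) return false;  /* keep CR LF together */
    if (before == GC_CR || before == GC_LF) return true;  /* break after a newline */
    if (after == GC_CR || after == GC_LF) return true;    /* break before a newline */
    if (after == GC_EXTEND) return false;                 /* no break before Extend */
    return true;                                          /* otherwise break */
}

/* Array form: the same answers, indexed as [before][after]. */
static const bool break_table[GC_COUNT][GC_COUNT] = {
    /*              OTHER  CR     LF     EXTEND */
    /* OTHER  */ { true,  true,  true,  false },
    /* CR     */ { true,  true,  false, true  },
    /* LF     */ { true,  true,  true,  true  },
    /* EXTEND */ { true,  true,  true,  false },
};

static bool is_break_tbl(GcClass before, GcClass after) {
    return break_table[before][after];
}

/* Returns 1 if both forms agree on every pair of classes. */
static int forms_agree(void) {
    for (int b = 0; b < GC_COUNT; b++)
        for (int a = 0; a < GC_COUNT; a++)
            if (is_break_fn((GcClass)b, (GcClass)a) !=
                is_break_tbl((GcClass)b, (GcClass)a))
                return 0;
    return 1;
}
```

The table form trades a few bytes of static data for a branch-free lookup, which is the shape the commit message says the compiler arrived at anyway.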
*	infnan: revert nan payload/signaling changes	Jarkko Hietaniemi	2015-02-11	1	-8/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	4258cf903c752ec19a3aeee9b93020533d923e1a 91e945c051cfcdf499d5b43aa5ac0a5681cdd595 eb254f2672a985ec3c34810f624f36c18fc35fc7 c9a671b17a9c588469bcef958038daaaaf9cc88b 99fcdd4df47515fb0a62a046e622adec0871754d ba511db061a88439acb528a66c780ab574bb4fb0 0d1cf11425608e9be019f27a3a4575bc71c49e6b c2ea8a88f8537d00ba25ec8feb63ef5dc085ef2b b5a6eedc2f49a90089cca896ee20f41e373fb4c9 30419b527d2c5a06cefe2db9183f59e2697c47fc 29b62199cd4c359dfc6b9d690341de40d105ca5f be181dc9d91c84a2fe03912c993c8259fed92641 4de1bcfe1abdaba0a5da394ddea0cc6fd7e36c7b 6e915616c4ccb4f6cc3122c5d395765db96c0a2d b2e3501558a1017eb529be0915c25d31671e7869 bfaa02d55f4ace1571e6fa9e5b47d5e3ac3cecc6 569f27e562618bdddcf4a9fc71612283a73747e9 4f89311dc8de87ddc9a302c6f2d2c844951bbd28 a307a0b0d83c509cc2adaad8cebb44260294bf36 6640aa2c3b93d7ac78e4e86983fe5948b3ca55f2 b74dc0b3c96390d8bf83d8c3ffc0c2c2d1f0a5d3 c3a8e5a5b4bb89a15de642c023dfd5cbc4678938
*	infnan: store the nan payload error in an optional SV	Jarkko Hietaniemi	2015-02-11	1	-3/+3
*	infnan: API context juggling	Jarkko Hietaniemi	2015-02-08	1	-3/+3
*	infnan: grok_number* setting the infnan NV directly	Jarkko Hietaniemi	2015-02-08	1	-1/+2
*	infnan: add grok_nan and grok_nan_payload	Jarkko Hietaniemi	2015-02-08	1	-0/+2
*	infnan: add nan_payload_set	Jarkko Hietaniemi	2015-02-08	1	-0/+1
*	infnan: add nan_is_signaling	Jarkko Hietaniemi	2015-02-08	1	-0/+1
*	infnan: add nan_signaling_set	Jarkko Hietaniemi	2015-02-08	1	-0/+1
*	infnan: add nan_hibyte	Jarkko Hietaniemi	2015-02-08	1	-0/+1
*	Remove context param from sv_get_backrefs	Father Chrysostomos	2015-01-31	1	-1/+1
	v5.21.7-83-geaab564 added sv_get_backrefs. v5.21.7-90-g8fbcb65 removed the one use of aTHX.
*	infnan: grok_infnan now needs context	Jarkko Hietaniemi	2015-01-28	1	-1/+1
*	Correction to 0563a5d0db	Steve Hay	2015-01-28	1	-4/+8
	Functions marked "b" do not have to be in mathoms.c after all. That was partly based on the belief (from e4524c4c2a) that "b" functions have their proto.h entries skipped, but that is not the case. It is "m" functions (macros) that have their proto.h entries skipped. The confusion may have arisen because all but four current "b" functions have "m" as well anyway.
	However, even functions marked "b" and "m" do not absolutely have to be in mathoms.c. At the time of writing they all are, which is convenient for having the symbols marked EXTERN_C, which is required to unmangle the exported names in a C++ build (see d447858807). Their presence in mathoms.c will also have them skipped from -DNO_MATHOMS builds (as of 075eb5c9b6). But it is possible for "bm" functions to exist elsewhere as long as they are explicitly marked EXTERN_C. In that case they won't be automatically skipped for -DNO_MATHOMS builds either, which may or may not require special attention too.
	Thanks to Craig A. Berry for pointing out that 0563a5d0db wasn't quite correct.
*	rename unop_aux_stringify to multideref_stringify	David Mitchell	2015-01-27	1	-1/+1
	This function returns a string representation of the OP_MULTIDEREF op (as used by the output of perl -Dt). However, the stringification of a UNOP_AUX op is op-specific, and hypothetical future UNOP_AUX-class ops will need their own functions. So the current function name is misleading. It should be safe to rename it, as it has only been in since 5.21.7, and isn't public.
*	Functions marked 'b' in embed.fnc MUST be implemented in mathoms.c	Steve Hay	2015-01-27	1	-2/+2
	This was effectively changed from 'often' to 'must be' by commit 075eb5c9b6.
*	Move inline fcn to #included file	Karl Williamson	2015-01-21	1	-1/+1
	Future commits will want this function to be able to be used in more than one core file.
*	regcomp.c: Move #define, make a function always compiled	Karl Williamson	2015-01-20	1	-1/+1
	This is in preparation for the next commit. The function previously was used only in DEBUGGING builds.
*	Revert "refactor gv_add_by_type"	Matthew Horsfall	2015-01-20	1	-1/+1
	This reverts commit 819b139db33e2022424694e381422766903d4f65. This could be reapplied for 5.23.1, with modifications or additional patches to solve the breakage discussed in RT 123580.
*	Move unlikely executed macro to function	Karl Williamson	2015-01-13	1	-0/+1
	The bulk of this macro is extremely rarely executed, so it makes sense to optimize for space, as it is called from a fair number of places, and move as much as possible to a single function. For whatever it's worth, on my system with my typical compilation options, including -O0, the savings were 19640 bytes in regexec.o, 4528 in utf8.o, at a cost of 1488 in locale.o.
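The general shape of this refactoring, sketched with hypothetical names (`CHECK_RARE`, `report_problem`, `sum_checked`) that are not taken from the commit: the macro keeps only the cheap test inline, and the rarely executed bulk lives once in an out-of-line function.

```c
#include <stddef.h>

static int slow_path_calls;     /* counts how often the rare path runs */

/* Out-of-line cold path: one copy shared by every call site. */
static void report_problem(int code) {
    (void)code;                 /* a real handler would build a message here */
    slow_path_calls++;
}

/* Only the cheap condition is expanded at each call site. */
#define CHECK_RARE(cond, code) \
    do { if (cond) report_problem(code); } while (0)

/* Example caller: the check appears inline, the handling does not. */
static int sum_checked(const int *v, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        CHECK_RARE(v[i] < 0, v[i]);
        total += v[i];
    }
    return total;
}
```

Each expansion now costs a compare and a call instead of a full copy of the handling code, which is where the object-size savings quoted in the message would come from.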
*	regcomp.c: Add 'strict' parameter to S_regclass()	Karl Williamson	2015-01-13	1	-2/+3
	This function has the capability to do strict checking, with the variable 'strict', but it is initialized based on another parameter's value. This commit causes 'strict' to be passed in, so it is independent of other parameters.
*	refactor op.c S_bad_type_*v	Daniel Dragan	2015-01-11	1	-2/+2
	- flags arg of both funcs is unused in all callers. Move the 0 to the funcs. flags arg is from commit ce16c625ec in 2012
	- all bad_type_gv calls are right before the end of the switch, the pushing of 1st 3 args and call asm ops can be merged together, leaving the 1 string constant push as the only unique op between the 7 src code callers of bad_type_gv, this requires reordering the args so the only unique one is the last/right most one, reordering can't be done to bad_type_pv because each following execution point after each bad_type_pv is different, bad_type_pv's caller/s are not a switch statement
	- commit 53e06cf030 probably overlooked the 2 PL_op_desc[type] places, OP_DESC is a fancier superset of PL_op_desc[type], since calling bad_type_pv only happens during a PP syntax error, that is not performance critical, so replace PL_op_desc[type] with OP_DESC and factor out OP to description string lookup, plus custom ops are very rare so this shouldn't impact the error message seen by the user
	VC2003 .text section of perl521.dll: before 0xc9543, after 0xC9523
*	Simplify s/// and tr/// parsing logic	Father Chrysostomos	2015-01-08	1	-1/+2
	These two operators were being translated into subst("","") and tr("","") by the lexer. Then pmruntime in op.c would take apart the resulting list op. Instead of constructing a list op only to take it apart again, feed the replacement part to pmruntime separately. We can achieve this by introducing a new token ('/') that the parser recognizes as introducing a replacement.
	If we had followed this approach to begin with, then bug #123542 would never have happened. (Actually, it seems the parser did know about the replacement part to begin with, but it changed in perl-5.8.0-4047-g131b3ad to fix some overloading problems.)
*	regcomp.c: Rmv no-longer-used macro and function	Karl Williamson	2015-01-06	1	-2/+0
	There are no longer any calls to these, so they can be removed.
*	pad.c: Remove unused context params	Father Chrysostomos	2015-01-06	1	-4/+4
*	refactor gv_add_by_type	Daniel Dragan	2015-01-06	1	-1/+1
	gv_add_by_type was added in commit d5713896ec in 5.11.0.
	Improve gv_add_by_type by making it return the newly created SV, instead of the GV, which the caller must deref both the GV head to get svu and then deref a slice into the GP, even though it already derefed svu and GP right before, to figure out whether to call gv_add_by_type in the first place. The original version of this patch had gv_add_by_type returning a SV ** to ensure lvalue-ness but it was discovered it wasn't needed and not smart.
	- rename gv_add_by_type since it was removed from public api and its proto changed
	- remove null check since it is impossible to pass null through GvAVn(), and unlikely with gv_AVadd, null segvs reliably crash in the rare case of a problem
	- instead of S_gv_init_svtype and gv_add_by_type using a tree of logic/conditional jumps in asm, use a lookup table, GPe (e=enum or entry) enums are identical to offsets into the GP struct, all of them fit under 0xFF, if the CC and CPU arch wants, CC can load the const once into a register, then use the number for the 2nd deref, then use the number again as an arg to gv_add_by_type, the low (&~0xf) or high (<<2) 2 bits in a GPe can be used for something else in the future since GPe is pointer aligned
	- SVt_LAST triggers "panic: sv_upgrade to unknown type", so use that value for entries of a GP which are not SV heads and are invalid to pass as an arg
	- remove the tree of logic in S_gv_init_svtype, replace with a table
	- S_gv_init_svtype is now tail call friendly and very small
	- change the GVn to be rvalues only, assigning to GVn is probably a memory leak
	- fix 1 core GVn as lvalue use
	- GvSVn's unusual former definition is from commit 547f15c3f9 in 2005 and DEFSV as lvalue is gone in core as of commit 414bf5ae08 from 2008 since all the GV*n macros are now rvalues, this goes too
	- PTRPTR2IDX and PTRSIZELOG2 could use better names
	- in pp_rv2av don't declare strings like that, VC linker won't dedup that, and other parts of core also have "an ARRAY", perl521.dll previously had 2 "an ARRAY" and "a HASH" strings in it due to this
	VC 2003 32 perl521.dll: before .text 0xc8813 in machine code bytes, after .text 0xc8623
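The "lookup table instead of a tree of logic" idea can be sketched like this. The `Slots` struct, `Type` enum, `NO_SLOT` marker and `slot_for()` are hypothetical and only loosely modelled on the GP struct and GPe enums described above.

```c
#include <stddef.h>

/* Toy container with one pointer slot per kind of value. */
typedef struct { void *sv; void *io; void *cv; void *hv; void *av; } Slots;

typedef enum { T_SCALAR, T_ARRAY, T_HASH, T_CODE, T_IO, T_OTHER, T_COUNT } Type;

#define NO_SLOT ((size_t)-1)    /* marks types that have no slot */

/* One table lookup replaces a chain of if/else or a switch. */
static const size_t slot_offset[T_COUNT] = {
    [T_SCALAR] = offsetof(Slots, sv),
    [T_ARRAY]  = offsetof(Slots, av),
    [T_HASH]   = offsetof(Slots, hv),
    [T_CODE]   = offsetof(Slots, cv),
    [T_IO]     = offsetof(Slots, io),
    [T_OTHER]  = NO_SLOT,
};

/* Returns the address of the slot for type t, or NULL if t has none. */
static void **slot_for(Slots *s, Type t) {
    size_t off = slot_offset[t];
    if (off == NO_SLOT) return NULL;
    return (void **)((char *)s + off);
}
```

Because the table stores struct offsets directly, the same small constant serves both as the table entry and as the displacement for the final dereference.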
*	utf8.c: Use OP_DESC instead of passing string.	Karl Williamson	2014-12-29	1	-1/+0
	OP_DESC is simpler and more general.
*	add new API function sv_get_backrefs()	Yves Orton	2014-12-25	1	-1/+2
	This encapsulates the logic to extract the backrefs from a weak-referent. Since sv_get_backrefs() can be used for similar purposes as hv_backreferences_p() we no longer need to export the latter, and therefore this patch also reverts ad2f46a793b4ade67d45ac0086ae62f6756c2752. See perl #123473 for related discussion, and https://github.com/Sereal/Sereal/issues/73 for a practical example of why this API is required.
*	add hv_backreferences_p() to embed.fnc	Yves Orton	2014-12-21	1	-1/+1
	Otherwise it cannot be used on Win32. See https://github.com/Sereal/Sereal/issues/73 for an example of where this breaks Win32 perl extensions.
*	make the EXACTF_invlist only when SCF_DO_STCLASS	Hugo van der Sanden	2014-12-11	1	-0/+2
	The data is used only for STCLASS, and it's somewhat expensive to create.
*	stdize_locale not used in POSIX.	Jarkko Hietaniemi	2014-12-09	1	-1/+4
*	Fix OUTSIDE for named subs inside predeclared subs	Father Chrysostomos	2014-12-09	1	-0/+1
	This prints 42 as expected (foo initialises $x and bar reads it via eval):
	    use 5.01;
	    sub foo {
	        state $x = 42;
	        sub bar {
	            eval 'print $x // "u", "\n"';
	        }
	    }
	    foo();
	    bar();
	If you predeclare foo and vivify its glob,
	    use 5.01;
	    foo; # vivifies the glob at compile time
	    sub foo;
	    sub foo {
	        state $x = 42;
	        sub bar {
	            eval 'print $x // "u", "\n"';
	        }
	    }
	    foo();
	    bar();
	then the output is ‘u’, because $x is now undefined.
	What’s happening is that ‘eval’ follows CvOUTSIDE pointers (each sub points to its outer sub), searching each pad to find a lexical $x. In the former case it succeeds. In the latter, bar’s CvOUTSIDE pointer is pointing to the wrong thing, so the search fails and $x is treated as global. You can see it’s global with this example, which prints ‘globular’:
	    use 5.01;
	    foo; # vivifies the glob at compile time
	    sub foo;
	    sub foo {
	        state $x = 42;
	        sub bar {
	            eval 'print $x // "u", "\n"';
	        }
	    }
	    foo();
	    $main::x = "globular";
	    bar();
	When a sub is compiled, a new CV is created at the outset and put in PL_compcv. When the sub finishes compiling, the CV in PL_compcv is installed in the sub’s typeglob (or as a subref in the stash if possible). If there is already a stub in a typeglob, since that stub could be referenced elsewhere, we have to reuse that stub and transfer the contents of PL_compcv to that stub.
	If we have any subs inside it, those will now have CvOUTSIDE pointers pointing to the old PL_compcv that has been eviscerated. So we go through the pad and fix up the outside pointers for any subs found there.
	Named subs don’t get stored in the pad like that, so the CvOUTSIDE fix-up never happens. Hence the bug above.
	The bug does not occur if the glob is not vivified before the sub definition, because a stub declaration will skip creating a real CV if it can. It can’t if there is a typeglob.
	The solution, of course, is to store named subs in the outer sub’s pad. We can skip this if the outer ‘sub’ is an eval or the main program. These two types of CVs obviously don’t reuse existing stubs, since they never get installed in the symbol table. Since named subs have strong outside pointers, we have to store weak refs in the pad, just as we do for formats.
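The lookup that goes wrong can be pictured as a walk along a chain of outer scopes. This is a toy sketch: `Scope`, its `outside` field and `find_lexical()` are hypothetical stand-ins for CVs, CvOUTSIDE and the pad search, not the actual implementation.

```c
#include <stddef.h>
#include <string.h>

typedef struct Scope {
    struct Scope *outside;      /* plays the role of CvOUTSIDE */
    const char *names[4];       /* toy pad: lexical names, NULL-terminated */
} Scope;

/* Walk outwards, searching each pad in turn. Returns the scope that
 * declares `name`, or NULL, in which case a caller would fall back to
 * treating the variable as a global. */
static const Scope *find_lexical(const Scope *s, const char *name) {
    for (; s; s = s->outside)
        for (size_t i = 0; i < 4 && s->names[i]; i++)
            if (strcmp(s->names[i], name) == 0)
                return s;
    return NULL;
}
```

If an inner scope's `outside` pointer is left aimed at an emptied-out scope rather than the one that really holds `$x`, the walk comes back empty, which matches the 'u' output in the second example.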
*	make xs_version_bootcheck() appear only in util.c	David Mitchell	2014-12-09	1	-0/+2
	672cbd159cb9 converted xs_version_bootcheck() into a static function, but forgot to wrap the embed.fnc entry with "#if defined(PERL_IN_UTIL_C)", so it was generating a compiler warning in every file that proto.h was included in.
*	make xs_version_bootcheck a static func since not used as export anymore	Daniel Dragan	2014-12-08	1	-4/+1
	Since commit db6e00bd00 the function has not been used by XS modules. Now remove it from the export table since it has never been public API (the macro is), and it saves its symbol name string and symbol table pointer, and allows for inlining and/or random calling convention optimization by the CC.
*	Add OP_MULTIDEREF	David Mitchell	2014-12-07	1	-1/+3
	This op is an optimisation for any series of one or more array or hash lookups and dereferences, where the key/index is a simple constant or package/lexical variable. If the first-level lookup is of a simple array/hash variable or scalar ref, then that is included in the op too.
	So all of the following are replaced with a single op:
	    $h{foo}
	    $a[$i]
	    $a[5][$k][$i]
	    $r->{$k}
	    local $a[0][$i]
	    exists $a[$i]{$k}
	    delete $h{foo}
	while these aren't:
	    $a[0]       already handled by OP_AELEMFAST
	    $a[$x+1]    not a simple index
	and these are partially replaced:
	    (expr)->[0]{$k}   the bit following (expr) is replaced
	    $h{foo}[$x+1][0]  the first and third lookups are each done with a multideref op, while the $x+1 expression and middle lookup are done by existing add, aelem etc ops.
	Up until now, aggregate dereferencing has been very heavyweight in ops; for example, $r->[0]{$x} is compiled as:
	    gv[r] s
	    rv2sv sKM/DREFAV,1
	    rv2av[t2] sKR/1
	    const[IV 0] s
	    aelem sKM/DREFHV,2
	    rv2hv sKR/1
	    gvsv[x] s
	    helem vK/2
	When executing this, in addition to the actual calls to av_fetch() and hv_fetch(), there is a lot of overhead of pushing SVs on and off the stack, and calling lots of little pp() functions from the runops loop (each with its potential indirect branch miss).
	The multideref op avoids that by running all the code in a loop in a switch statement. It makes use of the new UNOP_AUX type to hold an array of
	    typedef union {
	        PADOFFSET pad_offset;
	        SV        *sv;
	        IV        iv;
	        UV        uv;
	    } UNOP_AUX_item;
	In something like $a[7][$i]{foo}, the GVs or pad offsets for @a and $i are stored as items in the array, along with a pointer to a const SV holding 'foo', and the UV 7 is stored directly. Along with this, some UVs are used to store a sequence of actions (several actions are squeezed into a single UV).
	Then the main body of pp_multideref is a big while loop round a switch, which reads actions and values from the AUX array. The two big branches in the switch are ones that are effectively unrolled (/DREFAV, rv2av, aelem) and (/DREFHV, rv2hv, helem) triplets. The other branches are various entry points that handle retrieving the different types of initial value; for example 'my %h; $h{foo}' needs to get %h from the pad, while '(expr)->{foo}' needs to pop expr off the stack.
	Note that there is a slight complication with /DEREF; in the example above of $r->[0]{$x}, the aelem op is actually
	    aelem sKM/DREFHV,2
	which means that the aelem, after having retrieved a (possibly undef) value from the array, is responsible for autovivifying it into a hash, ready for the next op. Similarly, the rv2sv that retrieves $r from the typeglob is responsible for autovivifying it into an AV. This action of doing the next op's work for it complicates matters somewhat. Within pp_multideref, the autovivification action is instead included as the first step of the current action.
	In terms of benchmarking with Porting/bench.pl, a simple lexical $a[$i][$j] shows a reduction of approx 40% in numbers of instructions executed, while $r->[0][0][0] uses 54% fewer. The speed-up for hash accesses is relatively more modest, since the actual hash lookup (i.e. hv_fetch()) is more expensive than an array lookup. A lexical $h{foo} uses 10% fewer, while $r->{foo}{bar}{baz} uses 34% fewer instructions.
	Overall, bench.pl --tests='/expr::(array|hash)/' ... gives:
	               PRE   POST
	            ------ ------
	        Ir  100.00 145.00
	        Dr  100.00 165.30
	        Dw  100.00 175.74
	      COND  100.00 132.02
	       IND  100.00 171.11
	    COND_m  100.00 127.65
	     IND_m  100.00 203.90
	with cache misses unchanged at 100%.
	In general, the more lookups done, the bigger the proportionate saving.
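The "while loop round a switch, reading actions and values from an aux array" design can be sketched in miniature. This toy is not pp_multideref: `Val`, `AuxItem`, the `ACT_*` codes and `multi_index()` are hypothetical, it handles only nested arrays, and it does no autovivification.

```c
#include <stddef.h>

/* Toy value: either a leaf number or an array of further values. */
typedef struct Val { const struct Val *elems; size_t len; int num; } Val;

/* Toy counterpart of an aux item: one slot, several possible meanings. */
typedef union {
    unsigned      actions;  /* several 2-bit actions packed into one word */
    size_t        index;    /* a constant index, stored directly */
    const size_t *var;      /* where to read a variable index from */
} AuxItem;

enum { ACT_END = 0, ACT_CONST_INDEX = 1, ACT_VAR_INDEX = 2 };

/* Performs a whole chain of lookups in one loop instead of one op each.
 * aux[0] holds the packed actions; the following items are their operands.
 * Returns NULL if any step indexes out of range or into a leaf. */
static const Val *multi_index(const Val *v, const AuxItem *aux) {
    unsigned actions = (aux++)->actions;
    while (v) {
        size_t i;
        switch (actions & 3u) {
        case ACT_CONST_INDEX: i = (aux++)->index; break;
        case ACT_VAR_INDEX:   i = *(aux++)->var;  break;
        default:              return v;           /* ACT_END */
        }
        v = (v->elems && i < v->len) ? &v->elems[i] : NULL;
        actions >>= 2;                            /* move to the next action */
    }
    return NULL;
}
```

A lookup shaped like `$a[1][$i]` would be encoded as the action word `ACT_CONST_INDEX | (ACT_VAR_INDEX << 2)` followed by the constant 1 and a pointer to the variable; the whole chain then runs without returning to an outer dispatch loop between steps.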