| Commit message (Collapse) | Author | Age | Files | Lines |
| |
By "true" I mean that they have prototypes, but no bodies.
So don't declare their prototypes under PERL_NO_INLINE_FUNCTIONS.
After some studying of http://www.greenend.org.uk/rjk/tech/inline.html
it seems like Perl is trying to implement the "simple portable" model.
But the functions listed as failing during porting/extrefs.t in Tru64:
they are neither fish nor fowl. Their prototypes are listed in
proto.h as PERL_STATIC_INLINE (which in Tru64 is "static inline"),
but since the test is built with -DPERL_NO_INLINE_FUNCTIONS,
the function bodies (which would be in inline.h) are not visible.
So they end up being body-less static inline prototypes, which is,
I believe, somewhat of an oxymoron.
The "complicated portable" model might be a more worthwhile longer
term goal: in that, there is no "static inline", and there would be
a new source file, say, inline.c. Now with the "simple portable",
the bodies might end up being compiled multiple times, multiple copies
ending up in different object files, depending on how smart the
compiler/linker is.
Another move could be that maybe there should be no prototypes at all
for inlineables, because having those is kind of beside the point. How
well that would work across different compilers is unknown.
Yet another move, perhaps the simplest one, would be to move these
particular functions away from inline.h. But this would be just
dodging the larger problems discussed above.
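As a rough illustration of the mismatch described above, here is a minimal
sketch of a header written in the "simple portable" style. The names
DEMO_NO_INLINE_FUNCTIONS, DEMO_STATIC_INLINE and demo_twice() are
hypothetical stand-ins, not the identifiers used in proto.h or inline.h.
The only point is that the prototype and the body sit under the same guard,
so a body-less static inline prototype can never be seen.

```c
/* Hypothetical header fragment; none of these names come from the Perl source. */
#define DEMO_STATIC_INLINE static inline

#ifndef DEMO_NO_INLINE_FUNCTIONS
/* The prototype and the body are guarded together, so a translation unit
 * built with -DDEMO_NO_INLINE_FUNCTIONS sees neither of them. */
DEMO_STATIC_INLINE int demo_twice(int x);

DEMO_STATIC_INLINE int demo_twice(int x)
{
    return 2 * x;
}
#endif /* !DEMO_NO_INLINE_FUNCTIONS */
```

With this layout every including file may get its own copy of the body,
which is the duplication cost mentioned above for the "simple portable" model.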
| |
With one ugly cast inside the reg_recode() call.
| |
This only returns TRUE or FALSE; no need for a wider return value.
| |
regpatws() is only used in one place, and it is dangerous to retain as a
named entity. This is because wherever white space is to be skipped,
(?#...) comments are to be as well, so the function that does both things
should be called instead of this one.
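A minimal sketch of a single routine that skips both kinds of ignorable
text, so that no caller can skip one and forget the other. This is not the
regcomp.c code: skip_ignorable(), skip_paren_comment() and the x_mode
parameter are hypothetical, and the sketch assumes that white space and
'#' comments are only ignorable under /x.

```c
#include <ctype.h>
#include <stddef.h>

/* If p starts a complete "(?#...)" comment, return the position just past
 * it; otherwise return p unchanged. */
static const char *skip_paren_comment(const char *p, const char *end)
{
    if (end - p >= 3 && p[0] == '(' && p[1] == '?' && p[2] == '#') {
        const char *q = p + 3;
        while (q < end && *q != ')')
            q++;
        if (q < end)
            return q + 1;
        /* Unterminated comment: leave it for the caller to report. */
    }
    return p;
}

/* Skip every run of ignorable text starting at p: "(?#...)" comments always,
 * plus white space and '#'-to-end-of-line comments when x_mode is nonzero. */
const char *skip_ignorable(const char *p, const char *end, int x_mode)
{
    for (;;) {
        const char *start = p;

        p = skip_paren_comment(p, end);
        if (x_mode) {
            while (p < end && isspace((unsigned char)*p))
                p++;
            if (p < end && *p == '#') {
                while (p < end && *p != '\n')
                    p++;
            }
        }
        if (p == start)     /* nothing more to skip */
            return p;
    }
}
```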
| |
Sometimes we want to move to the next non-ignored character in the
input. The nextchar() function does that (but buggily in UTF-8).
And sometimes we are already at the next character, but if it is one
that should be ignored, we want to move to the first one that isn't.
This commit creates a function to do the second task by extracting the
code in nextchar() to it, and making nextchar() a lightweight wrapper
around it, and hence likely to be optimized out by the compiler.
This is a step in the direction of fixing the UTF-8 problems with
nextchar(), and fixing some other bugs. The new function has added
generality which won't be used until a later commit.
| |
nextchar() advances the parse to the next byte beyond any ignorable
bytes, returning the parse pointer before the advancement.
I find this confusing, as
foo = nextchar();
reads as if foo should point to the next character, instead of the
character where the parse already is. This functionality is hard for
a reader to grok, even if the name weren't misleading, as the place the
variable gets set in the source is far away from the call. It's clearer
to say
foo = current;
nextchar();
This has confused others as well, as in one place several commits have
been required to get it so it works properly, and games have been played
to back up the parse if it turns out it shouldn't have been advanced,
whereas it's better to check first, then advance if it is the right
thing to do. Ready-Fire-Aim is not a best practice.
This commit makes nextchar() return void, and changes the few places
where the en-passant value was used.
The new scheme is still buggy, as nextchar() only advances a single
byte, which may be the wrong thing to do when the pattern is UTF-8
encoded. More work is needed to be in a position to fix this. We have
only gotten away with this so far because apparently no one is using
non-ASCII white space under /x, and our meta characters are all ASCII,
and there are likely other things that reposition things to a character
boundary before problems have arisen.
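The "check first, then advance" shape argued for above can be sketched as
follows. The Parser struct, skip_ignored() and accept_char() are
hypothetical and much simpler than the regcomp.c state; as in the message,
this nextchar() still advances a single byte, so it shares the UTF-8
limitation described there.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef struct {
    const char *cur;   /* current parse position */
    const char *end;   /* one past the last byte of the pattern */
} Parser;

/* Move past ignorable bytes (only white space in this sketch); a no-op if
 * the parse is already at a significant byte. */
static void skip_ignored(Parser *p)
{
    while (p->cur < p->end && isspace((unsigned char)*p->cur))
        p->cur++;
}

/* Advance one byte, then skip anything ignorable. Deliberately returns
 * void, so callers cannot pick up a position "en passant". */
void nextchar(Parser *p)
{
    assert(p->cur < p->end);
    p->cur++;
    skip_ignored(p);
}

/* Check first, record the position explicitly, and only then advance.
 * Returns 1 if 'want' was consumed, 0 if the parse was left untouched. */
int accept_char(Parser *p, char want, const char **where)
{
    if (p->cur >= p->end || *p->cur != want)
        return 0;
    *where = p->cur;    /* foo = current; */
    nextchar(p);        /* nextchar();    */
    return 1;
}
```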
| |
Reorder the body of Perl_sv_backoff slightly to make it more tail-call
friendly, and change its signature from returning an int (always 0) to
void.
sv_backoff has only 1.5 function calls in it: there is a memcpy of a U32 *
for alignment reasons (I won't discuss U32_ALIGNMENT_REQUIRED) inside of
SvOOK_offset, and the explicit Move()/memmove. GCC and clang often inline
memcpy/memmove when the length is a constant and is small. Sometimes
a CC might also do unaligned memory reads if OS/CPU allows it
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20130513/174807.html
so I'll assume memcpy by short constant isn't a func call for discussion.
By moving SvFLAGS modification before the one and only func call, and
changing the return type to void, there is no code to execute after the
Move func call so the CC, if it wants (OS/ABI/CPU, specifically I am
thinking about x86-64) can tailcall jump to memmove. Also var sv can be
stored in a cheaper vol reg since it is not saved around any func calls
(SvFLAGS set was moved) assuming the memcpy by short constant was inlined.
The machine code size of Perl_sv_backoff with VC 2003 -O1 was 0x6d bytes
before and is 0x61 bytes after. The .text section of perl523.dll was
0xD2743 bytes long before and is 0xD2733 bytes long after. VC perl does
not inline memcpys by default.
In commit a0d0e21ea6 "perl 5.000" the return 0 was added. The int ret type
is from day 1 of sv_backoff function existing/day 1 of SV *s
from commit 79072805bf "perl 5.0 alpha 2". str_backoff didn't exist AFAIK,
only str_grow would retake the memory at the start of the block. Since
sv_backoff is usually used in a "&& func()" macro (SvOOK_off), it needed a
non void ret type, a simple ", 0" in the macro fixes that. All CCs optimize
and remove "if(0)" machine instructions so the ", 0" is optimized away in
the perl binary.
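A minimal sketch of the two ideas above: doing all bookkeeping before the
one real call so that a compiler may turn it into a tail call, and using
", 0" so that a void function can still appear on the right of "&&". The
Str struct, STR_OOK, str_backoff() and STR_OOK_off() are hypothetical
simplifications, not the real SV layout or macros. Whether the call is
actually emitted as a jump depends on the compiler and ABI.

```c
#include <stddef.h>
#include <string.h>

#define STR_OOK 0x1u    /* hypothetical "offset in use" flag */

typedef struct {
    char    *start;     /* start of the allocated block */
    char    *cur;       /* current start of the string, >= start */
    size_t   len;       /* bytes in use starting at cur */
    unsigned flags;
} Str;

/* Move the string back to the start of its block. All field updates happen
 * before memmove(), and nothing is returned, so no work remains after the
 * call and the compiler is free to emit it as a tail call. */
void str_backoff(Str *s)
{
    const char *src = s->cur;

    s->flags &= ~STR_OOK;
    s->cur = s->start;
    memmove(s->start, src, s->len);
}

/* "cond && f()" needs a value on the right-hand side; "(f(), 0)" supplies
 * one even though f() returns void. */
#define STR_OOK_off(s) \
    ((void)(((s)->flags & STR_OOK) && (str_backoff(s), 0)))
```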
| |
It was never public, and was only written for the sake of pp_coreargs,
which no longer uses it.
| |
Largely reimplements 839a9f02, 54fa14d7, e8432c63, 40262ff4.
The upside is that now doio.c and pp_sys.c have much less AmigaOS
specific ifdefs. As a downside, the exec code is now forked (pun
only partially accidental.)
The earlier story regarding fork+exec, that the AmigaOS creating
thread doesn't terminate but instead continues running is both true
and false. The more detailed story is that the user-observable
behaviour is as with POSIX/UNIX. The thread that created the new
"task" (to use the AmigaOS terms) does hang around -- but all it
does is to wait for the new task to terminate, and more importantly,
it holds on to the resources like filehandles. If the task were to
immediately terminate, the resources would be reclaimed by the kernel.
| |
Commit 2d3d6e6e7c2d50b1cc47032cf089151823fb20a6 introduced the
'optimizable' variable which if FALSE prevents the [...] node from being
optimized, if otherwise possible, into something simpler. It turns out
that several of the conditions which prevent such optimization can just
clear this flag when they are found, rather than having to test for the
conditions again later when the optimization is actually done.
| |
This initialization is done before the processing of command line
arguments, so that it has to be handled specially. This commit changes
the initialization code to output debugging information if the
environment variable PERL_DEBUG_LOCALE_INIT is set.
I don't see the need to document this outside the source, as anyone who
is using it would be reading the source anyway; it's of highly
specialized use.
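A minimal sketch of why an environment variable is the natural switch here:
it can be read with getenv() before any command-line argument has been
looked at. Only the variable name PERL_DEBUG_LOCALE_INIT comes from the
text above; debug_wanted(), locale_init_sketch() and the messages are
hypothetical, and treating an empty value as "set" is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumption: any value, even an empty one, counts as "set". */
int debug_wanted(const char *env_value)
{
    return env_value != NULL;
}

/* Runs before argument processing, so it consults the environment rather
 * than a command-line switch. Returns the number of debug lines written. */
int locale_init_sketch(FILE *log)
{
    const int debug = debug_wanted(getenv("PERL_DEBUG_LOCALE_INIT"));
    int lines = 0;

    if (debug) {
        fprintf(log, "locale init: begin\n");
        lines++;
    }
    /* ... the real initialization work would happen here ... */
    if (debug) {
        fprintf(log, "locale init: done\n");
        lines++;
    }
    return lines;
}
```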
| |
Unlike UNIXish fork-exec, forking in AmigaOS is more like
starting a thread; the return code is more than a boolean.
| |
This will be used by the next commit.
| |
These two commits:
v5.21.3-759-gff2a62e "Skip no-common-vars optimisation for aliases"
v5.21.4-210-gc997e36 "Make list assignment respect foreach aliasing"
added a run-time mechanism to detect aliased package variables,
by either "*pkg = ...," or "for $pkg (...)", and used that information
to enable the OPpASSIGN_COMMON mechanism at runtime for detecting common
elements in a list assign, e.g.
for $alias ($a, ...) {
($a,$b) = (1,$alias);
}
The previous commit but one changed the OPpASSIGN_COMMON mechanism such
that it no longer uses PL_sawalias. So this var and the mechanism for
setting it can now be removed.
This commit removes:
* the PL_sawalias variable
* the GPf_ALIASED_SV GP flag
* the SAVEt_GP_ALIASED_SV and save_aliased_sv() save type.
| |
This commit almost completely replaces the current mechanism
for detecting and handling common vars in list assignment, e.g.
($a,$b) = ($b,$a);
In general outline: it creates more false positives at compile-time
than before, but also no longer misses some false negatives. In
compensation, it considerably reduces the run-time cost of handling
potential and real commonality.
It does this firstly by splitting the OPpASSIGN_COMMON flag into 3
separate flags:
OPpASSIGN_COMMON_AGG
OPpASSIGN_COMMON_RC1
OPpASSIGN_COMMON_SCALAR
which indicate different classes of commonality that can be handled
in different ways at runtime.
Most importantly, it distinguishes between two basic cases. Firstly,
common scalars (OPpASSIGN_COMMON_SCALAR), e.g.
($x,....) = (....,$x,...)
where $x is modified and then sometime later its value is used again,
but that value has changed in the meantime. In this case, we need
to replace such vars on the RHS with mortal copies before processing the
assign.
The second case is an aggregate on the LHS (OPpASSIGN_COMMON_AGG), e.g.
(...,@a) = (...., $a[0],...)
In this case, the issue is instead that when @a is cleared, it may free
items on the RHS (due to the stack not being ref counted). What is
required here is that rather than making a copy of each RHS element and
storing it in the array as we progress, we make *all* the copies *before*
clearing the array, but mortalise them in case we die in the meantime.
We can further distinguish two scalar cases; sometimes it's possible
to confirm non-commonality at run-time merely by checking that all
the LHS scalars have a reference count of 1. If this is possible,
we set the OPpASSIGN_COMMON_RC1 flag rather than the
OPpASSIGN_COMMON_SCALAR flag.
The major improvement in the run-time performance in the
OPpASSIGN_COMMON_SCALAR case (or OPpASSIGN_COMMON_RC1 if rc>1 scalars are
detected), is to use a mark-and-sweep scan of the two lists using the
SVf_BREAK flag, to determine which elements are common, and only make
mortal copies of those elements. This has a very big effect on run-time
performance; for example in the classic
($a,$b) = ($b,$a);
it would formerly make temp copies of both $a and $b; now it only
copies $a.
In more detail, the mark and sweep mechanism in pp_aassign works by
looping through each LHS and RHS SV pair in parallel. It temporarily marks
each LHS SV with the SVf_BREAK flag, then makes a copy of each RHS element
only if it has the SVf_BREAK flag set. When the scan is finished, the flag
is unset on all LHS elements.
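The mark-and-sweep pass described above can be sketched in a few lines.
This is a toy model, not pp_aassign: Scalar, FLAG_MARK (standing in for
SVf_BREAK), MAX_ELEMS and list_assign() are hypothetical, values are plain
ints, and the "mortal copies" are just stack temporaries.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_MARK 0x1u      /* stands in for SVf_BREAK */
enum { MAX_ELEMS = 16 };    /* arbitrary limit for the sketch */

typedef struct {
    int      value;
    unsigned flags;
} Scalar;

/* Assign rhs[i] to lhs[i] for each i, copying an RHS element first only if
 * it is already marked as an LHS element. Returns the number of copies. */
size_t list_assign(Scalar **lhs, Scalar **rhs, size_t n)
{
    Scalar        temps[MAX_ELEMS];
    const Scalar *src[MAX_ELEMS];
    size_t        copies = 0;
    size_t        i;

    assert(n <= MAX_ELEMS);

    /* Walk the pairs in parallel: mark the LHS element, then copy the RHS
     * element only if it carries the mark. */
    for (i = 0; i < n; i++) {
        lhs[i]->flags |= FLAG_MARK;
        if (rhs[i]->flags & FLAG_MARK) {
            temps[copies] = *rhs[i];
            src[i] = &temps[copies++];
        } else {
            src[i] = rhs[i];
        }
    }

    /* Sweep: clear the temporary mark on every LHS element. */
    for (i = 0; i < n; i++)
        lhs[i]->flags &= ~FLAG_MARK;

    /* Only now perform the assignment, left to right. */
    for (i = 0; i < n; i++)
        lhs[i]->value = src[i]->value;

    return copies;
}
```

For the classic swap, with lhs = ($a,$b) and rhs = ($b,$a), $b is read
before it is marked and only $a is copied, which matches the behaviour
described above.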
One major change in compile-time flagging is that package scalar vars are
now treated as if they could always be aliased. So we don't bother any
more to do the compile-time PL_generation checking on package vars (we
still do it on lexical vars). We also no longer make use of the run-time
PL_sawalias mechanism for detecting aliased package vars (and indeed the
next commit but one will remove that mechanism). This means that more list
assignment expressions which feature package vars will now need to
do a runtime mark-and-sweep (or where appropriate, RC1) test. In
compensation, we no longer need to test for aliasing and set PL_sawalias
in pp_gvsv and pp_gv, nor reset PL_sawalias in every pp_nextstate.
Part of the reasoning behind this is that it's nearly impossible to detect
all possible package var aliasing; for example PL_sawalias would fail to
detect XS code doing GvSV(gv) = sv.
Note that we now scan the two children of the OP_AASSIGN separately,
and in particular we mark lexicals with PL_generation only on the
LHS and test only on the RHS. So something like
($x,$y) = ($default, $default)
will no longer be regarded as having common vars.
In terms of performance, running Porting/perlbench.pl on the new
expr::aassign:: tests in t/perf/benchmarks shows that the biggest slowdown
is around 13% more instruction reads and 20% more conditional branches in
this:
setup => 'my ($v1,$v2,$v3) = 1..3; ($x,$y,$z) = 1..3;',
code => '($x,$y,$z) = ($v1,$v2,$v3)',
where this is now a false positive due to the presence of package
variables.
The biggest speedup is 50% fewer instruction reads and conditional branches
in this:
setup => '@_ = 1..3; my ($x,$y,$z)',
code => '($x,$y,$z) = @_',
because formerly the presence of @_ pessimised things if the LHS wasn't
a my declaration (it's still pessimised, but the runtime's faster now).
Conversely, we pessimise the 'my' variant too now:
setup => '@_ = 1..3;',
code => 'my ($x,$y,$z) = @_',
this gives 5% more instruction reads and 11% more conditional branches now.
But see the next commit, which will cheat for that particular construct.
| |
we need to match the declaration of ax
| |
Instead of #include-ing the C file, compile it normally.
| |
RT #124385
Parsing following a syntax error could result in a null ptr dereference.
This commit contains a band-aid that returns from Perl_cv_forget_slab() if
the cv arg is null; but the real issue is much deeper and needs a more
general fix at some point.
Basically, both the lexer and the parser use the save stack, and after an
error, they can get out of sync.
In particular:
1) when handling a double-quoted string, the lexer does an ENTER, saves
most of its current state on the save stack, then uses the literal string
as the toke source. When it reaches the end of the string, it LEAVEs,
restores the lexer state and continues with the main source.
2) Whenever the parser starts a new block or sub scope, it remembers the
current save stack position, and at end of scope, pops the save stack back
to that position.
In something like
"@{ sub {]}} }}}"
the lexer sees a double-quoted string, and saves the current lex state.
The parser sees the start of a sub, and saves PL_compcv etc. Then a parse
error occurs. The parser goes into error recovery, discarding tokens until
it can return to a sane state. The lexer runs out of tokens when toking
the string, does a LEAVE, and switches back to toking the main source.
This LEAVE restores both the lexer's and the parser's state; in particular
the parser gets its old PL_compcv restored, even though the parser hasn't
finished compiling the current sub. Later, a series of '}' tokens coming
through allows the parser to finish the sub. Since PL_error_count > 0, it
discards the just-compiled sub and sets PL_compcv to null. Normally the
LEAVE_SCOPE done just after this would restore PL_compcv to its old value
(e.g. PL_main_cv) but the stack has already been popped, so PL_compcv gets
left null, and SEGVs occur.
The two main ways I can think of fixing this in the long term are
1) avoid the lexer using the save stack for long-term state storage;
in particular, make S_sublex_push() malloc a new parser object rather
than saving the current lexer state on the save stack.
2) At the end of a sublex, if PL_error_count > 0, don't try to restore
state and continue, instead just croak.
N.B. the test that this commit adds to lex.t doesn't actually trigger the
SEGV, since the bad code is wrapped in an eval which (for reasons I
haven't researched) avoids the SEGV.
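The way a shared save stack lets the lexer's LEAVE undo the parser's saved
state can be shown with a toy model. Everything here is hypothetical
(SaveStack, scope_enter(), save_int(), scope_leave(), and the integer codes
standing in for PL_compcv and the lexer state); it only mirrors the
sequence of events described above.

```c
#include <assert.h>
#include <stddef.h>

enum { SAVE_MAX = 32 };

typedef struct { int *addr; int old; } SaveEntry;
typedef struct { SaveEntry e[SAVE_MAX]; size_t top; } SaveStack;

/* Remember the current stack height; scope_leave() pops back to it. */
size_t scope_enter(const SaveStack *s) { return s->top; }

/* Record a variable's current value so a later pop can restore it. */
void save_int(SaveStack *s, int *addr)
{
    assert(s->top < SAVE_MAX);
    s->e[s->top].addr = addr;
    s->e[s->top].old  = *addr;
    s->top++;
}

/* Pop back to 'mark', restoring every variable saved above it. */
void scope_leave(SaveStack *s, size_t mark)
{
    while (s->top > mark) {
        s->top--;
        *s->e[s->top].addr = s->e[s->top].old;
    }
}

/* Replays the scenario; returns the final "compcv" (0 plays null). */
int demo_desync(void)
{
    SaveStack ss;
    int compcv = 1;        /* 1 = main program, 2 = sub being compiled */
    int lex_state = 10;    /* 10 = main source, 20 = inside the string */
    size_t lex_mark, sub_mark;

    ss.top = 0;

    lex_mark = scope_enter(&ss);   /* lexer: ENTER at start of the string */
    save_int(&ss, &lex_state);
    lex_state = 20;

    sub_mark = scope_enter(&ss);   /* parser: start of the sub scope */
    save_int(&ss, &compcv);
    compcv = 2;

    /* Parse error; the lexer reaches the end of the string and LEAVEs.
     * This also pops the parser's entry, so compcv is restored too early. */
    scope_leave(&ss, lex_mark);

    compcv = 0;                    /* parser discards the failed sub */
    scope_leave(&ss, sub_mark);    /* nothing left to pop: stays 0 */

    (void)lex_state;
    return compcv;
}
```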
| |
Make the function Perl_op_parent() only be present in perls built with
-DPERL_OP_PARENT. Previously the function was present in all builds, but
always returned NULL on non PERL_OP_PARENT builds.
| |
If the splicing doesn't affect the first or last sibling of an op_sibling
chain, then we don't need access to the parent op of the siblings (to
access/update op_first, op_last, OPf_KIDS etc). So allow a NULL parent
arg in that case.
| |
S_deb_curcv's first param differed in constness between declaration and
definition.
GIMME_V can return an I32, so don't assign it to a U8.
| |
This experimental feature now gives the intersection operator ("&") higher
precedence than the other binary operators.
| |
Prior to this commit, the regex compiler was relying on the lexer to do
the translation from Unicode to native for \N{...} constructs, where it
was simpler to do. However, when the pattern is a single-quoted string,
it is passed unchanged to the regex compiler, and this did not work. Fixing
it required some refactoring, though it led to a clean API in a static
function.
This was spotted by Father Chrysostomos.
| |
PL is reserved for global variables. These are enums and static
variable names introduced for handling /\b{...}/
See <20150311150610.GN28599@iabyn.com> and follow up.
| |
This function is called by e.g. "perl -Dt" to display the multideref op:
$ perl -Dt -e'$a->{foo}[1]'
...
(-e:1) multideref($a->{"foo"}[1])
On threaded builds, it needs to know the correct pad (and so the correct
cv too) so that it can access GVs and const SVs that have been moved to
the pad.
However with a sort code block (rather than a sort sub), S_deb_curcv()
returns null, so multideref_stringify() is called with a null CV. This
then SEGVs.
Although ideally S_deb_curcv() should be fixed, a function like
multideref_stringify(), which can be used for debugging, should be robust
in unexpected circumstances. So this commit makes it safe (although not
particularly useful) with a null CV:
$ perl -Dt -e'@a = sort { $a->[$i] <=> $b->[$i] } [0], [1]'
...
(-e:1) sort
(-e:1) multideref(<NULLGV>->[<NULLGV>])
(-e:1) multideref(<NULLGV>->[<NULLGV>])
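A minimal sketch of the defensive style described above: a debugging
stringifier that prints a placeholder instead of dereferencing a missing
pad. The Pad struct, pad_name_or_placeholder() and stringify_deref() are
hypothetical; only the "<NULLGV>" placeholder and the output shape are
taken from the example output.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char *const *names;   /* one name per pad slot */
    size_t             count;
} Pad;

/* Return the name stored in slot ix, or a placeholder when the pad is
 * missing, the index is out of range, or the slot is empty. */
const char *pad_name_or_placeholder(const Pad *pad, size_t ix)
{
    if (pad == NULL || ix >= pad->count || pad->names[ix] == NULL)
        return "<NULLGV>";
    return pad->names[ix];
}

/* Format "multideref(BASE->[INDEX])" into out; safe with a NULL pad.
 * Returns what snprintf() returns. */
int stringify_deref(char *out, size_t outlen,
                    const Pad *pad, size_t base_ix, size_t index_ix)
{
    return snprintf(out, outlen, "multideref(%s->[%s])",
                    pad_name_or_placeholder(pad, base_ix),
                    pad_name_or_placeholder(pad, index_ix));
}
```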
| |
For lots of core functions:
if a function parameter has been declared NN in embed.fnc, don't test for
nullness at the start of the function, i.e. eliminate code like
if (!foo) ...
On debugging builds the test is redundant, as the PERL_ARGS_ASSERT_FOO
at the start of the function will already have croaked.
On optimised builds, it will skip the check (and so be slightly faster),
but if actually passed a null arg, will now crash with a null-deref SEGV
rather than doing whatever the check used to do (e.g. croak, or silently
return and let the caller's code logic go awry). But hopefully this
should never happen as such instances will already have been detected on
debugging builds.
It also has the advantage of shutting up recent clangs which spew forth
lots of stuff like:
sv.c:6308:10: warning: nonnull parameter 'bigstr' will evaluate to
'true' on first encounter [-Wpointer-bool-conversion]
if (!bigstr)
The only exception was in dump.c, where rather than skipping the null
test, I instead changed the function def in embed.fnc to allow a null arg,
on the basis that dump functions are often used for debugging (where
pointers may unexpectedly become NULL) and it's better there to display
that this item is null than to SEGV.
See the p5p thread starting at 20150224112829.GG28599@iabyn.com.
| |
Some questions and loose ends:
XXX gv.c:S_gv_magicalize - why are we using SSize_t for paren?
XXX mg.c:Perl_magic_set - need appropriate error handling for $)
XXX regcomp.c:S_reg - need to check if we do the right thing if parno
was not grokked
Perl_get_debug_opts should probably return something unsigned; not sure
if that's something we can change.
| |
This was introduced by 9df874cdaa2f196cc11fbd7b82a85690c243eb9f
in changing the name of some static functions. I didn't realize at the
time that the function was defined in embed.fnc, as none of the others
are, and it was always called with the S_ prefix form. Nor did I notice
the compiler warnings.
It turns out that the base name of this function is the same as a public
function, so I've renamed it to have prefix 'S_my_'.
| |
A function implements seeing if the space between any two characters is
a grapheme cluster break. After I wrote this, I realized that an array
lookup might be a better implementation, but the deadline for v5.22 was
too close to change it. I did see that my gcc optimized it down to
an array lookup.
This makes the implementation of \X go from being complicated to
trivial.
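To show what "a function" versus "an array lookup" means here, this is a
minimal sketch over a deliberately tiny subset of the grapheme cluster
break classes (CR, LF, Control, Extend, Other). It is not the regexec.c
implementation and it omits most of the real rules; the type and function
names are hypothetical. Both forms give the same answer for every pair in
this subset.

```c
typedef enum {
    GCB_Other, GCB_CR, GCB_LF, GCB_Control, GCB_Extend, GCB_COUNT
} GcbClass;

/* Rule-based form: returns 1 if there is a break between a character of
 * class 'before' and one of class 'after'. */
int gcb_break_by_rules(GcbClass before, GcbClass after)
{
    if (before == GCB_CR && after == GCB_LF)
        return 0;                               /* CR LF stays together */
    if (before == GCB_CR || before == GCB_LF || before == GCB_Control)
        return 1;                               /* break after controls */
    if (after == GCB_CR || after == GCB_LF || after == GCB_Control)
        return 1;                               /* break before controls */
    if (after == GCB_Extend)
        return 0;                               /* never before Extend */
    return 1;                                   /* otherwise break */
}

/* Table form: rows are 'before', columns are 'after', both in enum order
 * (Other, CR, LF, Control, Extend). */
static const unsigned char gcb_table[GCB_COUNT][GCB_COUNT] = {
    /* Other   */ { 1, 1, 1, 1, 0 },
    /* CR      */ { 1, 1, 0, 1, 1 },
    /* LF      */ { 1, 1, 1, 1, 1 },
    /* Control */ { 1, 1, 1, 1, 1 },
    /* Extend  */ { 1, 1, 1, 1, 0 },
};

int gcb_break_by_table(GcbClass before, GcbClass after)
{
    return gcb_table[before][after];
}
```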
| |
4258cf903c752ec19a3aeee9b93020533d923e1a
91e945c051cfcdf499d5b43aa5ac0a5681cdd595
eb254f2672a985ec3c34810f624f36c18fc35fc7
c9a671b17a9c588469bcef958038daaaaf9cc88b
99fcdd4df47515fb0a62a046e622adec0871754d
ba511db061a88439acb528a66c780ab574bb4fb0
0d1cf11425608e9be019f27a3a4575bc71c49e6b
c2ea8a88f8537d00ba25ec8feb63ef5dc085ef2b
b5a6eedc2f49a90089cca896ee20f41e373fb4c9
30419b527d2c5a06cefe2db9183f59e2697c47fc
29b62199cd4c359dfc6b9d690341de40d105ca5f
be181dc9d91c84a2fe03912c993c8259fed92641
4de1bcfe1abdaba0a5da394ddea0cc6fd7e36c7b
6e915616c4ccb4f6cc3122c5d395765db96c0a2d
b2e3501558a1017eb529be0915c25d31671e7869
bfaa02d55f4ace1571e6fa9e5b47d5e3ac3cecc6
569f27e562618bdddcf4a9fc71612283a73747e9
4f89311dc8de87ddc9a302c6f2d2c844951bbd28
a307a0b0d83c509cc2adaad8cebb44260294bf36
6640aa2c3b93d7ac78e4e86983fe5948b3ca55f2
b74dc0b3c96390d8bf83d8c3ffc0c2c2d1f0a5d3
c3a8e5a5b4bb89a15de642c023dfd5cbc4678938
| |
v5.21.7-83-geaab564 added sv_get_backrefs. v5.21.7-90-g8fbcb65
removed the one use of aTHX.
| |
This function returns a string representation of the OP_MULTIDEREF op
(as used by the output of perl -Dt).
However, the stringification of a UNOP_AUX op is op-specific, and
hypothetical future UNOP_AUX-class ops will need their own functions.
So the current function name is misleading.
It should be safe to rename it, as it has only been in since 5.21.7, and isn't
public.
| |
Future commits will want this function to be able to be used in more
than one core file.
| |
This is in preparation for the next commit. The function previously was
used only in DEBUGGING builds.
| |
This reverts commit 819b139db33e2022424694e381422766903d4f65.
This could be reapplied for 5.23.1, with modifications or
additional patches to solve the breakage discussed in RT 123580.
| |
The bulk of this macro is extremely rarely executed, so it makes sense
to optimize for space, as it is called from a fair number of places, and
move as much as possible to a single function.
For whatever it's worth, on my system with my typical compilation
options, including -O0, the savings were 19640 bytes in regexec.o, 4528
in utf8.o, at a cost of 1488 in locale.o.
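The space trade-off above can be sketched like this: only the cheap test is
expanded at each call site, while the rarely executed bulk lives once in an
ordinary function. CHECK_AND_WARN(), report_problem() and warned_count are
hypothetical names, not the macro this commit changed.

```c
#include <stdio.h>

static int warned_count;    /* how many times the slow path has run */

/* Out-of-line slow path: its code exists once in the binary, however many
 * places use the macro. */
void report_problem(const char *what)
{
    warned_count++;
    fprintf(stderr, "warning: %s\n", what);
}

/* Only the condition test and a call are expanded at each use. */
#define CHECK_AND_WARN(cond, what)      \
    do {                                \
        if (cond)                       \
            report_problem(what);       \
    } while (0)
```

The cost is one extra function (the growth seen in locale.o above); the
saving is repeated at every call site of the macro.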