| Commit message | Author | Age | Files | Lines |
| |
My recent commit 5d37acd6b65eb enabled (among other things)
format-arg checking of taint_proper(). This was not a good idea since
taint_proper() adds extra args before it actually calls a printf-style
function. This was masked since on some gcc systems, a NULLOK format arg
disables this check.
| |
Due to the security risks associated with user-supplied formats
being passed to C-level printf() style functions (eg %n),
gcc has a -Wformat-nonliteral warning that complains whenever such a
function is passed a non-literal format string.
This commit silences all such warnings in core and ext/.
The main changes are
1) the 'f' (format) flag in embed.fnc is now handled slightly more
cleverly. Rather than just applying to functions whose last arg is '...'
(and where the format arg is assumed to be the previous arg), it
can now handle non-'...' functions: arg checking is disabled, but format
checking is still done: it works by assuming that an arg called 'fmt',
'pat' or 'f' is the format string (and dies if it fails to find exactly one
such arg).
2) with the new embed.fnc functionality, more functions have been marked
with the 'f' flag. When such a function passes its fmt arg onto an inner
printf-like function, we simply disable the warning for that call using
GCC_DIAG_IGNORE(-Wformat-nonliteral), since we know that the caller must
have already checked it.
3) In quite a few places the format string isn't literal, but it *is*
constant (e.g. PL_warn_uninit_sv). For those cases, again disable the
warning.
4) In pp_formline(), a particular format was one of several different
literal strings depending on circumstances. Rather than assigning this
string to a temporary variable, incorporate the ?: branches directly in
the function call arg. gcc is clever enough to decide the arg is then
always literal.
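Points 1 and 4 can be illustrated with a minimal sketch in plain C, outside
Perl's embed.fnc machinery. The names MY_FORMAT_PRINTF, my_snprintf and
format_count are hypothetical and exist only for this example; the only
things taken from the message above are gcc's format checking and the idea
of putting the ?: branches directly in the call.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: expands to gcc's format attribute when available. */
#if defined(__GNUC__)
#  define MY_FORMAT_PRINTF(fmt_idx, first_arg) \
       __attribute__((format(printf, fmt_idx, first_arg)))
#else
#  define MY_FORMAT_PRINTF(fmt_idx, first_arg)
#endif

/* Argument 3 is the printf format; the arguments to check start at 4. */
static int my_snprintf(char *buf, size_t len, const char *fmt, ...)
    MY_FORMAT_PRINTF(3, 4);

static int my_snprintf(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    /* fmt is non-literal here, but every caller was checked already. */
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}

/* Both ?: branches are literals written directly in the call, so the
 * compiler can still check each of them against the argument list. */
static int format_count(char *buf, size_t len, long n)
{
    return my_snprintf(buf, len, n == 1 ? "%ld item" : "%ld items", n);
}
```

Assigning the chosen string to a temporary const char * first would hide the
literals from the checker, which is the situation point 4 avoids.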
| |
mark this function with
__attribute__format__null_ok__(__strftime__,pTHX_1,0)
so that the compiler can check, and warn about, strftime-style format
args.
Rather than adding new flag(s) to embed.fnc, I just enhanced the f flag
to treat it as strftime-style rather than printf if the function name
matches /strftime/. This was quicker, and we're unlikely to have many
such functions.
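For comparison, a rough sketch of what a strftime-style format attribute
looks like in plain gcc terms. MY_FORMAT_STRFTIME and my_strftime are
hypothetical names for this example; Perl's own macro is the one quoted
above. The trailing 0 tells gcc there are no variadic args to check, only
the format string itself.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: gcc's strftime archetype, no args to check (0). */
#if defined(__GNUC__)
#  define MY_FORMAT_STRFTIME(fmt_idx) \
       __attribute__((format(strftime, fmt_idx, 0)))
#else
#  define MY_FORMAT_STRFTIME(fmt_idx)
#endif

static size_t my_strftime(char *buf, size_t len, const char *fmt,
                          const struct tm *tm) MY_FORMAT_STRFTIME(3);

/* Thin wrapper: callers get their literal formats checked as strftime
 * formats, then the format is handed on unchanged. */
static size_t my_strftime(char *buf, size_t len, const char *fmt,
                          const struct tm *tm)
{
    assert(buf != NULL && fmt != NULL && tm != NULL);
    return strftime(buf, len, fmt, tm);
}
```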
| |
The NULL sv code being removed dates to commit e334a159a5 Perl 1.0 as
the pre-SV str_2ptr and str_2num calls. When SVs were introduced in
commit 79072805bf Perl 5.0 alpha 2, the NULL sv code was copied to the new
SV functions. The functions were bulk marked non-NULL in commit f54cb97a39
during 5.9.3 development. The docs were corrected to say NULLOK support
in commit 53e8571218 during 5.11.0.
See the perldelta part of this patch for the rest of commit body.
| |
This is in preparation for the same code to be used in additional
places. There should be no logic changes.
| |
Since we can only recurse into a given paren (or the entire pattern)
once, we know that the maximum recursion depth is the number of parens
in the pattern (plus one for "whole pattern"). This means we can
preallocate one large bitmap, and then use different chunks of it
for each level. That avoids SAVEFREEPV costs for each bitmap, which
are likely short anyway. (One could imagine an optimization where a
flag somewhere lets us use the RExC_study_chunk_recursed pointer
as a bitmap, so we don't have to allocate at all when we have fewer than
32 parens.)
This removes the "recursed" argument from study_chunk() and replaces
it with a "recursive_depth" argument which counts how deep we
are in the bitmap "stack".
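The preallocation idea can be sketched as follows. This is not the regcomp.c
code: recursed_stack and its helpers are hypothetical names, and the sketch
only shows one allocation being carved into per-depth chunks, with the depth
acting as an index into the "stack".

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    unsigned char *bits;     /* one allocation shared by all levels */
    size_t bytes_per_level;  /* one bit per paren plus "whole pattern" */
    size_t levels;           /* maximum recursion depth */
} recursed_stack;

/* Allocate every level's bitmap at once, zero-filled. Returns 0 on failure. */
static int recursed_stack_init(recursed_stack *rs, size_t nparens)
{
    size_t slots = nparens + 1;          /* parens plus the whole pattern */

    rs->bytes_per_level = (slots + 7) / 8;
    rs->levels = slots;
    rs->bits = calloc(rs->levels, rs->bytes_per_level);
    return rs->bits != NULL;
}

/* The chunk belonging to a given recursion depth. */
static unsigned char *recursed_level(const recursed_stack *rs, size_t depth)
{
    assert(depth < rs->levels);
    return rs->bits + depth * rs->bytes_per_level;
}

static void recursed_set(recursed_stack *rs, size_t depth, size_t paren)
{
    recursed_level(rs, depth)[paren / 8] |= (unsigned char)(1u << (paren % 8));
}

static int recursed_test(const recursed_stack *rs, size_t depth, size_t paren)
{
    return (recursed_level(rs, depth)[paren / 8] >> (paren % 8)) & 1;
}

static void recursed_stack_free(recursed_stack *rs)
{
    free(rs->bits);
    rs->bits = NULL;
}
```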
| |
$ perl -Mutf8 -e 's αaαα'
Substitution replacement not terminated at -e line 1.
What is happening is that the first scan goes past the delimiter at
the end of the pattern. Then a single byte is compared (the previous
character against the first byte of the opening delimiter) to see
whether the parser needs to step back one byte before scanning the
second part.
That means you can do the equivalent of s/foo/|bar|g if / is replaced
with a wide character:
$ perl -l -Mutf8 -e '$_ = "a"; s αaα|b|; print'
b
This commit fixes it by giving toke.c:S_scan_str an extra parameter,
so it can tell the callers that need this (scan_subst and scan_trans)
where to start scanning the replacement.
| |
Commit 1830b3d9c8 introduced a flaw where XopENTRY calls
Perl_custom_op_xop twice to retrieve the same XOP *. This is inefficient
and causes extra machine code. Since I found no CPAN or upstream=blead
usage of Perl_custom_op_xop, and its previous docs say it isn't 100%
public, it is being converted to a macro.
Most usage of Perl_custom_op_xop is to conditionally fetch a member of the
XOP struct, which was previously implemented by XopENTRY. Move the XopENTRY
logic and picking defaults to an expanded version of Perl_custom_op_xop.
The union allows Perl_custom_op_get_field to return its result in 1
register, since the union is similar to a void * or IV, but with the
machine code overhead of casting, if any, being done in the callee
(Perl_custom_op_get_field), not the caller. Perl_custom_op_get_field can
also return the XOP * without looking inside it to implement
Perl_custom_op_xop.
XopENTRYCUSTOM is a wrapper around Perl_custom_op_get_field with
XopENTRY-like usage.
XopENTRY is used by the OP_* macros, which are heavily used (but rarely
called, since custom ops are rare) by Perl lang warnings system. The
vararg warning arguments are usually evaluated no matter if the warning
will be printed to STDERR or not. Since some people like to ignore warnings
or run no strict; and warnings branches are frequent in pp_*, it is
beneficial to make the OP_* macros smaller in machine code. The design
of Perl_custom_op_get_field supports these goals.
This commit does not pass judgement on Ben Morrow's unclear public or
private API designation of Perl_custom_op_xop, and whether
Perl_custom_op_xop should be deprecated and removed from public API. It was
trivial to leave a form of Perl_custom_op_xop in the new design.
XOPe enums are identical to XOPf constants so no conversion has to be
done between the field selector parameter and the field flag to test
in machine code.
ASSUME and NOT_REACHED are being introduced. The closest to the 2
previously was "assert(0)". Perl has not used ASSUME or CC specific
versions of it before. Clang, GCC >= 4.5, and Visual C are supported. For
completeness, ARMCC's __promise was added, but Perl is not known to have
any support for ARMCC by this committer.
This patch is part of perl #115032.
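A simplified sketch of the union-returning accessor described above. The
my_xop types, flags and the MY_XOP_ENTRY macro are hypothetical stand-ins
and not Perl's real XOP layout; the points illustrated are that the selector
enum values equal the "field present" flag bits, that defaults are picked in
the callee, and that the whole struct can be returned through the same
function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "field present" flags. */
enum { MYXOPf_name = 0x01, MYXOPf_op_class = 0x02 };

/* Selector values equal the flag bits, so no conversion is needed. */
typedef enum {
    MYXOPe_whole    = 0,            /* return the struct pointer itself */
    MYXOPe_name     = MYXOPf_name,
    MYXOPe_op_class = MYXOPf_op_class
} my_xop_field;

typedef struct {
    unsigned flags;
    const char *name;
    unsigned op_class;
} my_xop;

/* One return value, whichever field was asked for. */
typedef union {
    const my_xop *xop;
    const char *name;
    unsigned op_class;
} my_xop_value;

static my_xop_value my_xop_get_field(const my_xop *xop, my_xop_field field)
{
    my_xop_value v;
    int present;

    assert(xop != NULL);
    v.xop = NULL;
    present = (xop->flags & (unsigned)field) != 0;
    switch (field) {
    case MYXOPe_whole:
        v.xop = xop;
        break;
    case MYXOPe_name:       /* default chosen here, not in the caller */
        v.name = present ? xop->name : "(unknown custom op)";
        break;
    case MYXOPe_op_class:
        v.op_class = present ? xop->op_class : 0;
        break;
    default:
        break;
    }
    return v;
}

/* XopENTRY-like usage: MY_XOP_ENTRY(xop, name) yields the name field. */
#define MY_XOP_ENTRY(xop, which) \
    (my_xop_get_field((xop), MYXOPe_ ## which).which)
```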
| |
This resolves two RT tickets:
• #115330 is that qx and `` overrides do not support interpolation.
• #119827 is that <<`` does not support readpipe overrides at all.
The obvious fix for #115330 fixes #119827 at the same time.
When quote-like operators are parsed, after the string has been
scanned S_sublex_push is called, which decides which of two paths
to follow:
1) For things not requiring interpolation, the string is passed to
tokeq (originally called q, it handles double backslashes and back-
slashed delimiters) and returned to the parser immediately.
2) For anything that interpolates, the lexer enters a special inter-
polation mode (LEX_INTERPPUSH) and goes through a more complex
sequence over the next few calls (e.g., qq"a.$b.c" is turned into
‘stringify ( "a." . $ b . ".c" )’).
When commit e3f73d4ed (Oct 2006, perl 5.10) added support for overrid-
ing `` and qx with a readpipe sub, it did so by creating an entersub
op in toke.c and making S_sublex_push follow path no. 1, taking the
result of tokeq and inserting it into the already-constructed op tree
for the sub call.
That approach caused interpolation to be skipped when qx or `` is
overridden. Furthermore it didn’t touch <<`` at all.
The easiest solution is to let toke.c follow its normal path and
create a backtick op (instead of trying to half-intercept it), and
to deal with override lookup afterwards in ck_backtick, the same way
require overrides are handled. Since <<`` also turns into a backtick
op, it gets handled too that way.
| |
When I moved the three occurrences of this code in op.c into a static
function, I did not realise at the time that it also occurred three
times in toke.c.
So now it is in a new non-static function in gv.c.
Only two of the instances in toke.c could be changed to use this function,
as the other one is a little different. I couldn’t see a simple
way of factoring its requirements in.
| |
This also moves the indirect dependency on stdbool.h to its
own file, rather than being pulled in for all of perl.c, for
those cases where one may want to test using other definitions
of bool.
| |
where possible
This involved adding hv_fetchhek and hv_storehek macros and changing
S_mro_clean_isarev to accept a hash parameter and expect HVhek_UTF8
instead of SVf_UTF8.
| |
When if/else/unless is the last thing in an lvalue sub, the lvalue
context is not always propagated properly and scope exit tries to
copy things, including arrays, resulting in ‘Bizarre copy of ARRAY’.
This commit fixes the bizarre copy by flagging any leave op that is
part of an lvalue sub’s return sequence, using the OPpLEAVE flag added
for this purpose in the previous commit. Then pp_leave uses that flag
to avoid copying return values, but protects them via the mortals
stack just like pp_leavesublv (actually pp_ctl.c:S_return_lvalues).
For ‘if’ and ‘unless’ without ‘else’, the lvalue context was not being
propagated, resulting in arrays’ getting flattened despite the lvalue
context. op_lvalue_flags in op.c needed to handle AND and OR ops,
which ‘if’ and ‘unless’ compile to, to make this work.
| |
When Perl_sv_2bool_flags() has an overloaded arg, it calls SvTRUE()
on the SV returned from the overload method. This indirectly calls
sv_2bool_flags() again.
Change it so that sv_2bool_flags() just iterates the new overload value
each time.
2 callsites were converted to gotos. A SvTRUE_common was expanded so goto
can be used. This function's machine code size on VC2003 32 bits dropped
by 0x24 bytes after this patch.
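The shape of the change, recursion replaced by iteration over each new
overload result, can be sketched like this. my_value is a hypothetical
stand-in for an SV: bool_result models whatever an overloaded bool method
would return, and is NULL when there is no overloading. The real patch uses
gotos; a loop is used here for brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef struct my_value {
    int truth;                          /* plain truth value */
    const struct my_value *bool_result; /* overload result, or NULL */
} my_value;

/* Instead of calling itself on the overload result, keep replacing v
 * with that result until a non-overloaded value is reached. */
static int my_value_true(const my_value *v)
{
    assert(v != NULL);
    while (v->bool_result != NULL)
        v = v->bool_result;
    return v->truth != 0;
}
```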
| |
This attribute adds an additional way of declaring a prototype for a
sub, making sub foo($$) and sub foo : prototype($$) equivalent. The
intent is to keep the functionality of prototypes while allowing other
modules to use the syntactic space it currently occupies for other
purposes.
The attribute is supported in attributes.xs to allow
attributes::->import to work, but if it's defined inline via something
like sub foo : prototype($$) {}, it will not call out to the
attributes module.
For: RT #119251
| |
This parameter is no longer used, since a few commits ago in this
series.
| |
Until this commit, the regular expression optimizer has essentially
punted on above-Latin1 code points. Under some circumstances, they
would be taken into account, more or less, but often, the generated
synthetic start class would end up matching all above-Latin1 code
points. With the advent of inversion lists, it becomes feasible to
actually fully handle such code points, as inversion lists are a
convenient way to express arbitrary lists of code points and take their
union, intersection, etc. This commit changes the optimizer to use
inversion lists for operating on the code points the synthetic start
class can match.
I don't much understand the overall operation of the optimizer. I'm
told that previous porters found that perturbing it caused unexpected
behaviors. I had promised to get this change in 5.18, but didn't. I'm
trying to get it in early enough into the 5.20 preliminary series that
any problems will surface before 5.20 ships.
This commit doesn't change the macro level logic, but does significantly
change various micro level things. Thus the 'and' and 'or' subroutines
have been rewritten to use inversion lists. I'm pretty confident that
they do what their names suggest. I re-derived the equations for what
these operations should do, getting the same results in some cases, but
extending others where the previous code mostly punted. The derivations
are given in comments in the respective routines.
Some of the code is greatly simplified, as it no longer has to treat
above-Latin1 specially.
It is now feasible for /i matching of above-Latin1 code points to know
explicitly the folds that should be in the synthetic start class. But
more preparatory work needs to be done before putting that into place.
...
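For readers unfamiliar with inversion lists, a minimal membership test is
sketched below. It uses a plain sorted array rather than Perl's SV-based
representation, and invlist_contains is a hypothetical name. Each element
marks a code point where membership flips: even indices start a range that
is in the set, odd indices start a range that is not.

```c
#include <assert.h>
#include <stddef.h>

/* Binary search for how many list elements are <= cp. An odd count
 * means cp lies inside an "in the set" range. */
static int invlist_contains(const unsigned long *list, size_t len,
                            unsigned long cp)
{
    size_t lo = 0, hi = len;

    assert(list != NULL || len == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo & 1) != 0;
}
```

With this layout, { 0x41, 0x5B, 0x100, 0x180 } means A-Z plus U+0100
through U+017F, and the one-element list { 0x100 } means every above-Latin1
code point, which is why such sets are cheap to express.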
| |
This commit adds some functions that are currently unused, but will be
used in a future commit. This commit is essentially to make the
differences smaller in that commit, as 'diff' is getting confused and
not outputting the logical differences. The functions are added in a
block at the beginning of the file to avoid the 'diff' issues. A later
white-space only commit will move them to more appropriate positions.
| |
By changing the order of the parameters to the static function
S_add_data, we can call it with STR_WITH_LEN and avoid a human having to
count characters.
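A small sketch of why the parameter order matters. MY_STR_WITH_LEN is
modelled on Perl's STR_WITH_LEN, and my_data / my_add_data are hypothetical
stand-ins for the real S_add_data. The macro expands to "pointer, length",
so it only fits where the string and its length are adjacent parameters in
that order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expands a string literal into two arguments: the literal and its length. */
#define MY_STR_WITH_LEN(s) ("" s ""), (sizeof(s) - 1)

typedef struct {
    char buf[64];
    size_t used;
} my_data;

/* String and length are adjacent, in that order, so the macro can
 * supply both and nobody has to count characters by hand. */
static size_t my_add_data(my_data *d, const char *s, size_t n)
{
    assert(d != NULL && d->used + n <= sizeof d->buf);
    memcpy(d->buf + d->used, s, n);
    d->used += n;
    return d->used;
}
```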
| |
I found I needed const in a future commit.
| |
This parameter will be used in future commits. This commit is really
only to make the difference listing smaller in those, by committing
separately just the book-keeping parts. This parameter requires also
passing the aTHX_ thread parameter.
| |
The regcomp.c struct RExC_state_t has not been usable fully as a
typedef, requiring the 'struct' at times. This has caused me, and I
presume others, wasted time when we forget to use it under those
circumstances when it should be used, but it's never been a big enough
issue to cause me to spend tuits on it. But, working on something else,
I finally came to the realization of what the problem is. It is because
proto.h is #included before regcomp.h is, and so functions that are
declared in proto.h that have something that is a RExC_state_t as a
parameter don't know that it is a typedef because that is defined in
regcomp.h. A way around this is already used for other similar
structures, and that is to declare them in perl.h which is always read
in before proto.h, leaving the definitions to regcomp.h. Thus proto.h
knows enough to compile.
The structure was already declared in perl.h; just not typedef'd.
Otherwise proto.h would not know about it at all. This patch moves two
regcomp.c related declarations in perl.h to the same section as the
others, and changes the one for RExC_state_t to be a typedef. All the
'struct' uses are removed.
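The header-ordering fix boils down to the following pattern, shown here in
a single file with hypothetical names (my_state, my_state_depth) standing in
for RExC_state_t and the proto.h declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Early header (the perl.h role): typedef only, no definition yet. */
typedef struct my_state my_state;

/* Prototype header (the proto.h role): the typedef name is already
 * usable for pointer parameters, with no 'struct' keyword needed. */
static int my_state_depth(const my_state *st);

/* Later header (the regcomp.h role): the actual definition. */
struct my_state {
    int depth;
};

static int my_state_depth(const my_state *st)
{
    assert(st != NULL);
    return st->depth;
}
```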
| |
The term 'class' is very overloaded in regex code and documentation.
perlrecharclass.pod calls the dot (matching any char) a class, and
calls the [] form "bracketed character classes". There are other
meanings as well. This is the first commit in a short series that
removes some of those overloadings.
One instance of class is the "synthetic start class", generated by the
regex optimizer to be a list of all the code points a successful match
could possibly start with. This is useful in more quickly finding where
to start looking in matching against a target string. Prior to this
commit, the routines that referred to this began with 'cl_', and the
formal parameters were 'cl', which could mean any class. This commit
changes those instances of 'cl' to 'ssc' to indicate this is the only
type of class that is being handled.
| |
The previous commit just extracted out code into a function. This
commit renames a parameter for clarity, combines two parameters to make
the interface cleaner, and adds and moves comments around.
| |
A future commit will use this functionality from a second place. For
now, just cut and paste, and do the minimal ancillary work to get it to
compile and pass.
| |
As part of extending the regular expression optimizer to properly handle
above Latin1 code points, I need an inversion list to contain which code
points the synthetic start class (ssc) matches.
The ssc currently is the same as a locale-aware ANYOF node, which uses
the struct of a regular ANYOF node, plus some extra fields at the end.
This commit creates a new typedef for ssc use, which is the locale-aware
ANYOF node, plus an extra SV* at the end to hold the inversion list.
| |
This is in preparation for it to be called from more than one place, in
a future commit.
| |
If perl was compiled with -DDUMP_FDS, it would define dump_fds
and add it to the API, although even then nothing used it.
dump_fds() itself was buggy, only checking for fds 0 through 32.
| |
All but one of scan_ident()'s callers already passed PL_bufend as
the removed argument; the one deviant was intuit_more(), which was
setting the "end of buffer" argument to the next close-bracket.
This commit modifies intuit_more() to temporarily set PL_bufend and
then restore it.
This was done as groundwork for the following commit, which will add
more uses of PEEKSPACE() to scan_ident() in order to fix some whitespace
and line number bugs, and PEEKSPACE() modifies PL_bufend directly
if it encounters a newline at the end of the buffer -- that last bit
being why changing intuit_more() to modify-and-restore PL_bufend is
safe, since the end of the buffer will always be a ']'.
| |
Based on Yves's random branch work.
This version makes the new random number visible to external modules,
for example, List::Util's XS shuffle() implementation.
I've also added a 64-bit implementation when HAS_QUAD is true, this
should be significantly faster, even on 32-bit CPUs. This is intended to
produce exactly the same sequence as the original implementation.
The original version of this commit retained the "freebsd" name from
Yves's original work for the function and data structure names. I've
removed "freebsd" from most function names so the name isn't an issue
if we choose to replace the implementation.
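As an illustration of what a 64-bit implementation can look like, here is a
sketch that assumes the generator is a drand48-style 48-bit linear
congruential generator. The message above does not say so; the constants are
the standard POSIX drand48 ones, and my_rand_state, my_rand_seed and
my_rand_next are hypothetical names, not the functions added by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MY_DRAND48_MULT UINT64_C(0x5DEECE66D)
#define MY_DRAND48_ADD  UINT64_C(0xB)
#define MY_DRAND48_MASK UINT64_C(0xFFFFFFFFFFFF)   /* low 48 bits */

typedef struct {
    uint64_t x;   /* 48-bit state held in one 64-bit integer */
} my_rand_state;

/* srand48-style seeding: seed in the high 32 bits, 0x330E in the low 16. */
static void my_rand_seed(my_rand_state *st, uint32_t seed)
{
    assert(st != NULL);
    st->x = ((uint64_t)seed << 16) | UINT64_C(0x330E);
}

/* One step of x = (a*x + c) mod 2^48. The multiply may wrap modulo 2^64,
 * which is harmless because only the low 48 bits are kept. */
static double my_rand_next(my_rand_state *st)
{
    assert(st != NULL);
    st->x = (st->x * MY_DRAND48_MULT + MY_DRAND48_ADD) & MY_DRAND48_MASK;
    return (double)st->x / 281474976710656.0;   /* divide by 2^48 */
}
```

With a single 64-bit multiply there is no need to split the state into
16-bit pieces, which is where a speedup on HAS_QUAD builds would come from.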
| |
gv_is_in_main() checks if an unqualified identifier is in the main::
stash.
| |
Namely, gv_magicalize no longer stores the GV into the stash, which
is gv_fetchpvn_flags' job.
| |
This bit is called when a GV already exists, but its name is length-one
and it's in the main:: stash, so it might have multiple kinds of magic,
like $! and %!, or @+ and %+.
| |
This commit takes a chunk of code out of gv_fetchpvn_flags and
turns it into two functions: parse_gv_stash_name and find_default_stash.
| |
This reverts commit 7b6e8075e45ebc684565efbe3ce7b70435f20c79.
It turns out to be problematic, because it causes NULLs on the stack,
which XSUBs may trip on.
My main reason for it was actually to try to resolve some CPAN
failures, but it turns out that other fixes have removed the
need for that.
| |
These functions worked with ints instead of SSize_t,
| |
Now that NULL is used for a nonexistent element, it is easy for XS
code to pass it to av_push(). av_store already accepts NULL, and
av_push already works with it on non-debugging builds, so there is
really no need for this restriction.
| |
warn and die have special code (closest_cop) to find a nulled
nextstate op closest to the warn or die op, to get the line number
from it. This commit extends that capability to caller, so that
if (1) {
foo();
}
sub foo { warn +(caller)[2] }
shows the right line number.
| |
Now that the Unicode data is stored in native character set order, it is
rare to need to work with the Unicode order. Traditionally, the real
work was done in functions that worked with the Unicode order, and
wrapper functions (or macros) were used to translate to/from native.
There are two groups of functions: one that translates from code point
to UTF-8, and the other group goes the opposite direction.
This commit changes the base function that translates from UTF-8 to code
point to output native instead of Unicode. Those extremely rare
instances where Unicode output is needed instead will have to hand-wrap
calls to this function with a translation macro, as now described in the
API pod. Prior to this, it was the other way, the native was wrapped,
and the rare, strict Unicode wasn't. This eliminates a layer of
function call overhead for a common case.
The base function that translates from code point to UTF-8 retains its
Unicode input, as that is more natural to process. However, it is
de-emphasized in the pod, with the functionality description moved to
the pod for a native input wrapper function. And, those wrappers are
now macros in all cases; previously there was function call overhead
sometimes. (Equivalent exported functions are retained, however, for XS
code that uses the Perl_foo() form.)
I had hoped to rebase this commit, squashing it with an earlier commit
in this series, eliminating the use of a temporary function name change,
but the work involved turns out to be large, with no real payoff.
| |
This is in preparation for deprecating these functions, to force any
code that has been using these functions to change.
Since the Unicode tables are now stored in native order, these
functions should only rarely be needed.
However, the functionality of these is needed, and in actuality, on
ASCII platforms, the native functions are #defined to these. So what
this commit does is rename the functions to something else, and create
wrappers with the old names, so that anyone using them will get the
deprecation when it actually goes into effect: we are waiting for CPAN
files distributed with the core to change before doing the deprecation.
According to grep.cpan.me, this should affect fewer than 10 additional
CPAN distributions.
| |
Code should almost never be dealing with non-native code points.
This is in preparation for later deprecation when our CPAN modules have
been converted away from using it.
| |
Now that the tables are stored in native order, there is almost no need
for code to be dealing in Unicode order.
According to grep.cpan.me, there are no uses of this function in CPAN.
| |
Now that all the tables are stored in native format, there is very
little reason to use this function; and those who do need this kind of
functionality should be using the bottom level routine, so as to make it
clear they are doing nonstandard stuff.
According to grep.cpan.me, there are no uses of this function in CPAN.
| |
This is in preparation for the current wrapee becoming deprecated.
| |
These macros are no longer called in the Perl core. This commit turns
them into functions so that they can use gcc's deprecation facility.
I believe these were defective right from the beginning, and I have
struggled to understand what's going on. From the name, it appears
NATIVE_TO_NEED takes a native byte and turns it into UTF-8 if the
appropriate parameter indicates that. But that is impossible to do
correctly from that API, as for variant characters, it needs to return
two bytes. It could only work correctly if ch is an I8 byte, which
isn't native, and hence the name would be wrong.
Similar arguments for ASCII_TO_NEED.
The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
does what I think NATIVE_TO_NEED intended.
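To make the "needs to return two bytes" point concrete, here is a sketch of
an append-style conversion with the same (byte, U8** dest) shape. It is not
the core function: my_append_utf8_from_byte is a hypothetical name, and the
sketch assumes an ASCII/Latin-1 platform, so it ignores the EBCDIC and I8
handling discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Append one byte's UTF-8 encoding at *dest and advance *dest.
 * Bytes below 0x80 are invariant; the rest need two bytes, which is
 * why a single-byte return value cannot express the result. */
static void my_append_utf8_from_byte(unsigned char byte, unsigned char **dest)
{
    assert(dest != NULL && *dest != NULL);
    if (byte < 0x80) {
        *(*dest)++ = byte;
    } else {
        *(*dest)++ = (unsigned char)(0xC0 | (byte >> 6));    /* start byte */
        *(*dest)++ = (unsigned char)(0x80 | (byte & 0x3F));  /* continuation */
    }
}
```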