path: root/pp.c
Commit message [Author, Date, Files, Lines]
* Use the new utf8 to code point functions [Karl Williamson, 2012-03-19, 1 file, -3/+3]
  These functions should be used in preference to the old ones, which can read beyond the end of the input string.
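The message does not name the functions, so the following is only an illustrative C sketch of the property it describes: a decoder that is told where the buffer ends and refuses to read past it. The name `utf8_to_cp_bounded` and its return conventions are invented for this example; they are not Perl's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded decoder (not Perl's API): decode one UTF-8
 * sequence at s without ever reading at or beyond `end`. On success it
 * returns the code point and stores the sequence length in *retlen; on
 * malformed or truncated input it returns 0 with *retlen set to 0.
 * Overlong-form and surrogate checks are omitted to keep it short. */
static uint32_t utf8_to_cp_bounded(const unsigned char *s,
                                   const unsigned char *end,
                                   size_t *retlen)
{
    size_t avail, need, i;
    uint32_t cp;

    *retlen = 0;
    if (s >= end)
        return 0;
    avail = (size_t)(end - s);

    if (s[0] < 0x80) {                 /* ASCII: one byte */
        *retlen = 1;
        return s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {
        need = 2; cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        need = 3; cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        need = 4; cp = s[0] & 0x07;
    } else {
        return 0;                      /* continuation or invalid start byte */
    }

    /* The key check: an unbounded decoder trusts the start byte and
     * would read `need` bytes even if the buffer ends sooner. */
    if (need > avail)
        return 0;

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                  /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *retlen = need;
    return cp;
}
```

Given the two bytes of "é" (0xC3 0xA9) but an `end` one byte in, the sketch reports failure instead of touching the byte beyond the buffer.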
* pp.c: Cast function argument to expected type [Karl Williamson, 2012-02-15, 1 file, -1/+1]
  This was breaking some builds.
* pp_quote_meta: in locale, quote all UTF-8 Latin1 non-ASCII [Karl Williamson, 2012-02-15, 1 file, -1/+5]
  Under locale rules, this commit quotes all non-ASCII Latin1 characters in UTF-8 encoded strings. This makes this function consistent with other functions, such as lc().
* pp.c: white-space only [Karl Williamson, 2012-02-15, 1 file, -6/+6]
* perl #77654: quotemeta quotes non-ASCII consistently [Karl Williamson, 2012-02-15, 1 file, -8/+29]
  As described in the pod changes in this commit, this changes quotemeta() to consistently quote non-ASCII characters when used under unicode_strings. The behavior is changed for these and for UTF-8 encoded strings to align more closely with Unicode's recommendations. The end result is that we *could* at some future point start using characters other than the current 12 as metacharacters.
* pp_quotemeta(): Use more explicit macro [Karl Williamson, 2012-02-15, 1 file, -1/+1]
  Changing the macro to a differently named equivalent stresses that only ASCII characters may escape being quoted; that is, all non-ASCII characters are quoted.
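A minimal C sketch of that rule for a plain byte string: every byte that is not an ASCII word character, including every byte at or above 0x80, gets a backslash. The helper names are hypothetical, and the sketch does not model the more selective treatment of UTF-8 strings that the quotemeta() pod changes describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* ASCII-only word-character test; deliberately not isalnum(), whose
 * answer for bytes >= 0x80 depends on the current C locale. */
static int is_ascii_word(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9') || c == '_';
}

/* Sketch of byte-string quotemeta: backslash every byte that is not an
 * ASCII word character. Returns a malloc'd NUL-terminated string, or
 * NULL on allocation failure; the caller frees it. */
static char *quotemeta_bytes(const unsigned char *s, size_t len)
{
    char *out = malloc(2 * len + 1);  /* worst case: every byte escaped */
    size_t i, j = 0;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (!is_ascii_word(s[i]))
            out[j++] = '\\';
        out[j++] = (char)s[i];
    }
    out[j] = '\0';
    return out;
}
```

Spelling the test out as explicit ASCII ranges plays the same role as the "more explicit macro": it makes plain that no non-ASCII byte can pass as a word character.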
* Make pp_study a no-op, as discussed on p5p [Abhijit Menon-Sen, 2012-02-13, 1 file, -0/+5]
* use locale; fc(""); shouldn't taint. [Brian Fraser, 2012-01-30, 1 file, -1/+1]
  fc() brought to life its own version of #39028: fc(""), like lc("") and friends, shouldn't taint the result.
* pp.c: Can grow scalar by less [Karl Williamson, 2012-01-29, 1 file, -1/+3]
  The maximum expansion when a Latin1 character is folded and converted to UTF-8 is 2 bytes per input byte, not the larger factor needed in the general case.
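A worked C sketch of the bound, assuming full casefolding restricted to Latin-1 input (the helper names are invented for this example and are not pp.c's code). ASCII bytes stay one byte; every other Latin-1 character becomes a two-byte UTF-8 sequence; MICRO SIGN folds to U+03BC, also two bytes; SHARP S folds to "ss", two bytes. So 2 * len output bytes always suffice.

```c
#include <assert.h>
#include <stddef.h>

/* Append the UTF-8 encoding of cp (cp < 0x800 here) to out and
 * return the number of bytes written. */
static size_t put_utf8_2(unsigned cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
}

/* Full-casefold a Latin-1 byte string and emit UTF-8. The caller must
 * supply at least 2 * len bytes: every input byte yields at most two. */
static size_t fold_latin1_to_utf8(const unsigned char *s, size_t len,
                                  unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        unsigned c = s[i];
        if (c >= 'A' && c <= 'Z')
            c += 0x20;
        else if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
            c += 0x20;              /* Latin-1 capitals to small */
        else if (c == 0xB5)
            c = 0x3BC;              /* MICRO SIGN folds to GREEK SMALL MU */
        else if (c == 0xDF) {       /* SHARP S fully folds to "ss" */
            out[n++] = 's';
            out[n++] = 's';
            continue;
        }
        n += put_utf8_2(c, out + n);
    }
    assert(n <= 2 * len);
    return n;
}
```

For the four input bytes "A", ß, É, µ the output is seven bytes ("a", "ss", two for é, two for μ), within the eight-byte bound.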
* Implement the fc keyword and the \F string escape. [Brian Fraser, 2012-01-29, 1 file, -0/+151]
  Along with the simple_casefolding and full_casefolding features.

  fc() stands for foldcase, a sort of pseudo case (like lowercase), which is used to implement Unicode casefolding. It maps a string to a form where all case differences are erased, so it's a locale-independent way of checking if two strings are the same, regardless of case.

  This functionality was, and still is, available through the regular expression engine: /i matches use casefolding internally. The fc keyword merely exposes this for easier access.

  Previously, one could attempt to case-insensitively test two strings for equality by doing

      lc($a) eq lc($b)

  But that might give wrong results, for example in the case of \x{DF}, LATIN SMALL LETTER SHARP S.
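The sharp-s problem shows up in a toy C comparison over Latin-1 bytes. `lc_latin1` and `fc_latin1` are hypothetical helpers covering only ASCII letters and U+00DF, not a full Unicode casefold. Lowercasing leaves ß unchanged, so "straße" and "STRASSE" compare unequal; folding maps ß to "ss", and the two compare equal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy lowercase: ASCII letters only; U+00DF is already lowercase. */
static void lc_latin1(const char *s, char *out)
{
    for (; *s; s++)
        *out++ = (*s >= 'A' && *s <= 'Z') ? (char)(*s + 0x20) : *s;
    *out = '\0';
}

/* Toy full casefold: ASCII letters plus U+00DF -> "ss".
 * The output buffer needs 2 * strlen(s) + 1 bytes. */
static void fc_latin1(const char *s, char *out)
{
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (c == 0xDF) {
            *out++ = 's';
            *out++ = 's';
        } else {
            *out++ = (c >= 'A' && c <= 'Z') ? (char)(c + 0x20) : (char)c;
        }
    }
    *out = '\0';
}

/* Compare two strings after mapping both through f. */
static int eq_under(void (*f)(const char *, char *), const char *a, const char *b)
{
    char fa[64], fb[64];   /* 2 * 31 + 1 bytes fit for inputs under 32 */

    assert(strlen(a) < 32 && strlen(b) < 32);
    f(a, fa);
    f(b, fb);
    return strcmp(fa, fb) == 0;
}
```

In C, "stra\xDF" "e" has to be written as two literals, since "\xDFe" would be read as a single hex escape.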
* [perl #108480] $cow |= number undefines $cow [Father Chrysostomos, 2012-01-17, 1 file, -2/+2]
  If a read-only scalar is passed to one of | & ^ and it decides to do a numeric operation, the numeric flags on the read-only scalar are turned off afterwards if they were not on to begin with.

  This was introduced in commit b20c4ee1f, which did so to stop $x | "0" from coercing the rhs and making it behave differently the second time through.

  What that commit did not take into account was that the read-only flag is set on cow scalars, and the same pp function is used for the assignment forms. So it was turning off the numeric flags after $cow |= 1, leaving $cow undef.

  I made this numeric flag-twiddling apply only to read-only scalars (supposedly), because that seemed the most conservative and acceptable change. I am actually in favour of extending it to all scalars, to make these operators less surprising. For that reason, this commit preserves the current behaviour with cows in the non-assignment case: they don’t get coerced into numbers. Changing them to work the same way as non-cow writable scalars would make things more consistent, but more consistently buggy. I would like to make this non-coercion apply to all scalars in 5.18.

  This commit simply skips the flag-twiddling on the lhs in the assignment case.
* Provide as much diagnostic information as possible in "panic: ..." messages. [Nicholas Clark, 2012-01-16, 1 file, -1/+1]
  The convention is that when the interpreter dies with an internal error, the message starts "panic: ". Historically, many panic messages had been terse fixed strings, which means that the out-of-range values that triggered the panic are lost. Now we try to report these values, as such panics may not be repeatable, and the original error message may be the only diagnostic we get when we try to find the cause.

  We can't report diagnostics when the panic message is generated by something other than croak(), as we don't have *printf-style format strings. Don't attempt to report values in panics related to *printf buffer overflows, as attempting to format the values to strings may repeat or compound the original error.
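The convention above can be sketched in C with a panic routine that formats the offending values into the message using vsnprintf. The function names are hypothetical and not perl's internals. Per the commit's caveat, a panic raised because a printf-style buffer overflowed should use a fixed string rather than a routine like this.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format "panic: " followed by the caller's message into buf. The output
 * is truncated, never overflowed, if it does not fit. Returns the length
 * the full message would have had, or a negative value on error. */
static int vformat_panic(char *buf, size_t size, const char *fmt, va_list ap)
{
    int n = snprintf(buf, size, "panic: ");
    int m;

    if (n < 0 || (size_t)n >= size)
        return n;
    m = vsnprintf(buf + n, size - (size_t)n, fmt, ap);
    return m < 0 ? m : n + m;
}

static int format_panic(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int r;

    va_start(ap, fmt);
    r = vformat_panic(buf, size, fmt, ap);
    va_end(ap);
    return r;
}

/* Hypothetical panic entry point: report the values, then abort. */
void panic_with_values(const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    va_start(ap, fmt);
    vformat_panic(buf, sizeof buf, fmt, ap);
    va_end(ap);
    fprintf(stderr, "%s\n", buf);
    abort();
}
```

A call such as `panic_with_values("index %ld out of range (max %ld)", idx, max)` keeps the out-of-range values in the one diagnostic that may ever be seen.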
* Better fix for perl #107440 [Father Chrysostomos, 2012-01-09, 1 file, -2/+2]
  > > Actually, the simplest solution seem to be to put the av or hv on
  > > the mortals stack in pp_aassign and pp_undef, rather than in
  > > [ah]v_undef/clear.
  >
  > This makes me nervous. The tmps stack is typically cleared only on
  > statement boundaries, so we run the risks of
  >
  > * user-visible delaying of freeing elements;
  > * large tmps stack growth might be possible with
  >   certain types of loop that repeatedly assign to an array without
  >   freeing tmps (eg map? I think I fixed most map/grep tmps leakage
  >   a while back, but there may still be some edge cases).
  >
  > Surely an ENTER/SAVEFREESV/LEAVE inside pp_aassign is just as
  > efficient, without any attendant risks?
  >
  > Also, although pp_aassign and pp_undef are now fixed, the
  > [ah]v_undef/clear functions aren't, and they're part of the public API
  > that can be called independently of pp_aassign etc. Ideally they
  > should be fixed (so they don't crash in mid-loop), and their
  > documentation updated to point out that on return, their AV/HV arg
  > may have been freed.

  This commit takes care of the first part; it changes pp_aassign to use ENTER/SAVEFREESV/LEAVE and adds the same to h_freeentries (called both by hv_undef and hv_clear), av_undef and av_clear. It effectively reverts the C code part of 9f71cfe6ef2.
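The ENTER/SAVEFREESV/LEAVE idea can be shown in isolation with a toy reference-counted container in C. The clearing routine takes an extra reference before running element destructors and drops it afterwards, so a destructor that releases the last outside reference cannot free the container mid-loop. The types and functions are hypothetical. In perl, roughly speaking, the extra reference is released through the save stack when LEAVE runs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Container Container;

/* Toy refcounted container whose elements may run a "destructor"
 * callback while being cleared. */
struct Container {
    int refcnt;
    int freed;                 /* set instead of a real free() so it can be inspected */
    size_t n;                  /* number of remaining elements */
    void (*elem_destructor)(Container *);
};

static void container_dec(Container *c)
{
    if (--c->refcnt == 0)
        c->freed = 1;
}

/* Clear while holding an extra reference: a destructor that drops the
 * last outside reference cannot free the container mid-loop. */
static void container_clear_protected(Container *c)
{
    c->refcnt++;                       /* like taking a ref + SAVEFREESV */
    while (c->n > 0) {
        c->n--;
        if (c->elem_destructor)
            c->elem_destructor(c);
        assert(!c->freed);             /* still alive during the loop */
    }
    container_dec(c);                  /* like LEAVE releasing the saved ref */
}

/* Example destructor: user code that drops the last outside reference. */
static void drop_outside_ref(Container *c)
{
    container_dec(c);
}
```

The container is freed only after the loop finishes, at the point corresponding to LEAVE, rather than at an unpredictable later statement boundary as with the mortals stack.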
* [perl #107440] Save av/hv on mortals stack when clearing [Father Chrysostomos, 2012-01-06, 1 file, -2/+2]
  In pp_undef and pp_aassign, we should put the av or hv that is being cleared on the mortals stack (with an increased refcount), so that destructors fired during the clearing do not free the av or hv.

  I was going to put this in av_undef, etc., but pp_aassign also needs to access the aggregate after clearing it. We still get a crash with that approach.

  Putting the aggregate on the mortals stack in av_undef, av_clear and h_freeentries would work, too, but might cause the aggregate to leak too far. That may cause problems, e.g., if it is %^H, because it may last until the end of the current compilation unit.

  Directly inside a runloop (in a pp function), it should be OK to use the mortals stack, as it *will* be cleared ‘soon’. This seems the least intrusive approach.
* diag_listed_as galore [Father Chrysostomos, 2011-12-28, 1 file, -0/+1]
  In two instances, I actually modified the code to avoid %s for a constant string, as it should be faster that way.
* diag_listed_as for lvalue scalar context errors [Father Chrysostomos, 2011-12-28, 1 file, -0/+2]
* Fix two (er, four) sub:lvalue { &$x } bugs [Father Chrysostomos, 2011-12-26, 1 file, -6/+0]
  The lvalue context that the last statement of an lvalue subroutine provides, when applied to entersub, causes the ops below the entersub to be compiled oddly.

  Compare regular subs and lvalue subs:

      $ ./perl -Ilib -MO=Concise,bar,foo -e 'sub bar { &$x } sub foo:lvalue { &$x }'
      main::bar:
      5  <1> leavesub[1 ref] K/REFC,1 ->(end)
      -     <@> lineseq KP ->5
      1        <;> nextstate(main 1 -e:1) v ->2
      4        <1> entersub[t2] K/TARG ->5
      -           <1> ex-list K ->4
      2              <0> pushmark s ->3
      -              <1> ex-rv2cv vK ->-
      -                 <1> ex-rv2sv sK/1 ->-
      3                    <#> gvsv[*x] s ->4
      main::foo:
      b  <1> leavesublv[1 ref] K/REFC,1 ->(end)
      -     <@> lineseq KP ->b
      6        <;> nextstate(main 2 -e:1) v ->7
      a        <1> entersub[t2] K/LVINTRO,TARG,INARGS ->b
      -           <1> ex-list K ->a
      7              <0> pushmark s ->8
      9              <1> rv2cv vK/NO() ->a
      -                 <1> ex-rv2sv sK/1 ->9
      8                    <#> gvsv[*x] s ->9
      -e syntax OK

  Notice that, in the second case, the rv2cv is not being optimised away. Under strict mode, this allows a sub call on a string, since rv2cv is not subject to strict refs.

  It’s this code in op.c:op_lvalue_flags that is to blame:

      if (kid->op_type != OP_GV) {
          /* Restore RV2CV to check lvalueness */
        restore_2cv:
          if (kid->op_next && kid->op_next != kid) { /* Happens? */
              okid->op_next = kid->op_next;
              kid->op_next = okid;
          }
          else
              okid->op_next = NULL;
          okid->op_type = OP_RV2CV;
          okid->op_targ = 0;
          okid->op_ppaddr = PL_ppaddr[OP_RV2CV];
          okid->op_private |= OPpLVAL_INTRO;
          okid->op_private &= ~1;
          break;
      }

  This code is a little strange. Using rv2cv to check lvalueness causes the problem with strict refs. The lvalue check could just as well go in entersub.

  The way this is currently written (and this is something I missed when supposedly fixing lvalue subs), the rv2cv op will reject a non-lvalue subroutine even when the caller is not called in lvalue context.

  So we actually have two bugs.

  Presumably the check was done in rv2cv to keep entersub fast.

  But the code I quoted above is only part of it. There is also a special block to create an rv2cv op anew to deal with method calls.

  This commit fixes both issues by moving the run-time lvalueness check to entersub. I put it after PUSHSUB for speed in the most common case (when there is no error). PUSHSUB already calls a function (was_lvalue_sub) to determine whether the current sub call is happening in lvalue context. So the check I am adding after it only has to check a couple of flags, instead of calling was_lvalue_sub itself.

  This also fixes a bug I introduced earlier in the 5.15.x series. This is supposed to die (in fact, I made the mistake earlier of changing tests that were checking for this, but so many tests were wrong back then it was an easy mistake to make):

      $ ./perl -Ilib -e 'sub bar {$x} sub foo:lvalue { bar}; foo=3'

  And a fourth bug I discovered when writing tests:

      sub AUTOLOAD :lvalue { warn autoloading; $x }
      sub _102486 { warn "called" }
      &{'_102486'} = 72;
      warn $x
      __END__
      autoloading at - line 1.
      72 at - line 4.

  And it happens even if there is an lvalue sub defined under that name:

      sub AUTOLOAD :lvalue { warn autoloading; $x }
      sub _102486 :lvalue { warn "called" }
      &{'_102486'} = 72;
      warn $x
      __END__
      autoloading at - line 1.
      72 at - line 4.

  Since the sub cannot be seen at compile time, the lvalue check happens in rv2cv, as mentioned above. The autoloading is happening in rv2cv, too, instead of entersub (the code is repeated), but the sub is not checked for definition first. It was put in rv2cv because it had to come before the lvalue check. Putting the latter in entersub lets us delete that repeated autoload code, which is completely wrong anyway.
* Don’t crash when writing to null hash elem [Father Chrysostomos, 2011-12-24, 1 file, -2/+2]
  It’s possible for XS code to create hash entries with null values. pp_helem and pp_slice were not taking that into account. In fact, the core produces such hash entries, but they are rarely visible from Perl. It’s good to check for them anyway.
* Stop tell($glob_copy) from clearing PL_last_in_gv [Father Chrysostomos, 2011-12-17, 1 file, -1/+1]
  This bug is a side effect of rv2gv’s starting to return an incoercible mortal copy of a coercible glob in 5.14:

      $ perl5.12.4 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell'
      0
      $ perl5.14.0 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell'
      -1

  In the first case, tell without arguments is returning the position of the filehandle.

  In the second case, tell with an explicit argument that happens to be a coercible glob (tell has an implicit rv2gv, so tell $fh is actually tell *$fh) sets PL_last_in_gv to a mortal copy thereof, which is freed at the end of the statement, setting PL_last_in_gv to null. So there is no ‘last used’ handle by the time we get to the tell without arguments.

  This commit adds a new rv2gv flag that tells it not to copy the glob.

  By doing it unconditionally on the kidop, this allows tell(*$fh) to work the same way.

  Let’s hope nobody does tell(*{*$fh}), which will unset PL_last_in_gv because the inner * returns a mortal copy.

  This whole area is really icky. PL_last_in_gv should be refcounted, but that would cause handles to leak out of scope, breaking programs that rely on the auto-closing ‘feature’.
* Name anon handles __ANONIO__ [Father Chrysostomos, 2011-12-15, 1 file, -1/+1]
  rather than $__ANONIO__

  That dollar sign *has* to have been a mistake. In ck_fun, the name was set to __ANONIO__, but it seems the change that added it (afd1915d43) did not account for the fact that a little later on the same function checks to make sure it begins with a dollar sign, as it could only be a variable name.

  rv2gv’s use of $__ANONIO__ (added recently by yours truly) was just copying what ck_fun was doing.
* pp.c: Changing case of utf8 strings under locale uses locale for < 255 [Karl Williamson, 2011-12-15, 1 file, -4/+29]
  As proposed on p5p and approved, this changes the functions uc(), lc(), ucfirst(), and lcfirst() to respect locale for code points < 256, and to use Unicode semantics for those above 255. This gives better, but not perfect, results, as noted in the changed pods, and brings these functions into line with how regular expression pattern matching already works.
* Adjust substr offsets when using, not when creating, lvalue [Father Chrysostomos, 2011-12-04, 1 file, -75/+96]
  When substr() occurs in potential lvalue context, the offsets are adjusted to the current string (negative being converted to positive, lengths reaching beyond the end of the string being shortened, etc.) as soon as the special lvalue to be returned is created.

  When that lvalue is assigned to, the original scalar is stringified once more.

  That implementation results in two bugs:

  1) Fetch is called twice in a simple substr() assignment (except in void context, due to the special optimisation of commit 24fcb59fc).

  2) These two calls are not equivalent:

      $SIG{__WARN__} = sub { warn "w ",shift};
      sub myprint { print @_; $_[0] = 1 }
      print substr("", 2);
      myprint substr("", 2);

  The second one dies. The first one only warns. That’s mean.

  The error is also wrong, sometimes, if the original string is going to get longer before the substr lvalue is actually used.

  The behaviour of \substr($str, -1) if $str changes length is completely undocumented. Before 5.10, it was documented as being unreliable and subject to change.

  What this commit does is make the lvalue returned by substr remember the original arguments and only adjust the offsets when the assignment happens.

  This means that the following now prints z, instead of xyz (which is actually what I would expect):

      $str = "a";
      $substr = \substr($str,-1);
      $str = "xyz";
      print $substr;
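The fix can be pictured with a small C model. The lvalue keeps the raw arguments and a pointer to the variable, and turns them into concrete positions only when it is read or assigned. `SubstrLV` and `substr_resolve` are made-up names, and the bounds handling is a simplification of perlfunc's substr rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical deferred-substr lvalue: offset and length are stored
 * exactly as the caller wrote them (possibly negative) and are resolved
 * against the target's length only when the lvalue is used. */
typedef struct {
    const char **target;  /* the variable the lvalue refers to */
    long offset;          /* negative: count back from the end */
    long length;          /* negative: leave that many chars at the end */
    int  has_length;      /* 0 for the two-argument form */
} SubstrLV;

/* Resolve the stored arguments against the target's *current* length.
 * Returns 0 if the offset lies outside the string; otherwise stores the
 * start index and span length. A simplified version of Perl's rules. */
static int substr_resolve(const SubstrLV *lv, size_t *start, size_t *len)
{
    long cur = (long)strlen(*lv->target);
    long s = lv->offset < 0 ? cur + lv->offset : lv->offset;
    long e;

    if (s < 0 || s > cur)
        return 0;
    if (!lv->has_length)
        e = cur;
    else if (lv->length < 0)
        e = cur + lv->length;
    else
        e = s + lv->length;
    if (e > cur)
        e = cur;
    if (e < s)
        e = s;
    *start = (size_t)s;
    *len = (size_t)(e - s);
    return 1;
}
```

With the example above, an lvalue created as offset -1 on "a" and resolved after the string becomes "xyz" selects index 2, the final "z".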
* Optimise substr assignment in void context [Father Chrysostomos, 2011-11-26, 1 file, -8/+15]
  In void context we can optimise

      substr($foo, $bar, $baz) = $replacement;

  to something like

      substr($foo, $bar, $baz, $replacement);

  except that the execution order must be preserved. So what we actually do is

      substr($replacement, $foo, $bar, $baz);

  with a flag to indicate that the replacement comes first. This means we can also optimise assignment to two-argument substr the same way.

  Although optimisations are not supposed to change behaviour, this one does.

  • It stops substr assignment from calling get-magic twice, which means the optimisation makes things less buggy than usual.
  • It causes the uninitialized warning (for an undefined first argument) to mention the substr operator, as it did before the previous commit, rather than the assignment operator.

  I think that sort of detail is minor enough.

  I had to make the warning about clobbering references apply whenever substr does a replacement, and not only when used as an lvalue. So four-argument substr now emits that warning. I would consider that a bug fix, too.

  Also, if the numeric arguments to four-argument substr and the replacement string are undefined, the order of the uninitialized warnings is slightly different, but is consistent regardless of whether the optimisation is in effect.

  I believe this will make 95% of substr assignments run faster. So there is less incentive to use what I consider the less readable form (the four-argument form, which is not self-documenting).

  Since I like naïve benchmarks, here are Before and After:

      $ time ./miniperl -le 'do{$x="hello"; substr ($x,0,0) = 34;0}for 1..1000000'

      real    0m2.391s
      user    0m2.381s
      sys     0m0.005s
      $ time ./miniperl -le 'do{$x="hello"; substr ($x,0,0) = 34;0}for 1..1000000'

      real    0m0.936s
      user    0m0.927s
      sys     0m0.005s
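The optimisation turns an lvalue assignment into a single replacement step. A minimal C sketch of just that step follows. `substr_replace` is a hypothetical name; it assumes the offsets have already been resolved to in-range values and says nothing about how perl rewrites the op tree. Four-argument substitution amounts to splicing a new string in place of a span.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of four-argument substr($str, off, len, repl) on a byte string:
 * returns a new malloc'd string with [off, off+len) replaced by repl.
 * Assumes off <= strlen(str) and off + len <= strlen(str).
 * Returns NULL on allocation failure; the caller frees the result. */
static char *substr_replace(const char *str, size_t off, size_t len,
                            const char *repl)
{
    size_t slen = strlen(str), rlen = strlen(repl);
    char *out;

    assert(off <= slen && len <= slen - off);
    out = malloc(slen - len + rlen + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, str, off);                          /* prefix */
    memcpy(out + off, repl, rlen);                  /* replacement */
    memcpy(out + off + rlen, str + off + len,
           slen - off - len + 1);                   /* suffix plus NUL */
    return out;
}
```

With the benchmark's values, replacing the empty span at offset 0 of "hello" with "34" yields "34hello".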
* Don’t coerce $x immediately in foo(substr $x...) [Father Chrysostomos, 2011-11-26, 1 file, -16/+7]
  This program:

      #!perl -l
      sub myprint { print @_ }
      print substr *foo, 1;
      myprint substr *foo, 1;

  produces:

      main::foo
      Can't coerce GLOB to string in substr at - line 4.

  Ouch!

  I would expect \substr simply to give me a scalar that peeks into the original string, but without modifying the original until the return value of \substr is actually assigned to.

  But it turns out that it coerces the original into a string immediately, unless it’s GMAGICAL. I find the exception for magical variables rather befuddling. I can only imagine it was for efficiency (since the stringified form will be overwritten when magic_setsubstr calls SvGETMAGIC), but that doesn’t make sense, as the original variable can itself be modified between the return of the special lvalue and the assignment to that lvalue.

  Since magic_setsubstr itself coerces the variable into a string upon assignment to the lvalue, we can just remove the coercion code from pp_substr.

  But that causes double uninitialized warnings in cases like substr($undef, 0,0) = "lrep".

  That happens because pp_substr is still stringifying the variable (but without modifying it). It has to do that, as it looks at the length of the original string and accordingly adjusts the offsets stored in the lvalue if they are negative or if they extend beyond the end of the string.

  So this commit takes the simple route of avoiding the warning in pp_substr by only stringifying a variable that is SvOK if called in lvalue context. Hence, assignment to substr($tied...) will continue to call FETCH twice, but that is not a new bug.

  The ideal solution would be for the offsets to be translated in mg.c, rather than in pp_substr. But that would be a more involved change (including most of this commit, which is therefore not wasted) with potential backward-compatibility issues with negative numbers.

  A side effect is that the ‘Attempt to use reference as lvalue in substr’ warning now occurs during the assignment to the substr lvalue, rather than in substr itself. This means it occurs even for tied variables, so things are now more consistent.

  The example at the beginning could still croak if the glob were replaced with a null string, so this commit only partially alleviates the pain.
* Call FETCH once when chomping a tied ref [Father Chrysostomos, 2011-11-24, 1 file, -1/+1]
* pp.c: Remove useless read-only check from S_do_chomp [Father Chrysostomos, 2011-11-24, 1 file, -1/+1]
  After sv_force_normal_flags, the scalar will no longer be read-only, except in those cases where sv_force_normal_flags croaks. So this check will never be true when SvFAKE was true.
* amagic_deref_call does not necessitate SPAGAIN [Father Chrysostomos, 2011-11-22, 1 file, -4/+0]
  As amagic_deref_call pushes a new stack, PL_stack_sp will always have the same value before and after, so SPAGAIN is unnecessary.
* [perl #80628] __SUB__ [Father Chrysostomos, 2011-11-22, 1 file, -0/+19]
  After much alternation, altercation and alteration, __SUB__ is finally here.
* Mention implicit $_ in y///r uninit warning [Father Chrysostomos, 2011-11-19, 1 file, -2/+4]
  This brings it into conformity with y without the /r.
* expunge gratuitous Unicode punctuation in comments [Zefram, 2011-11-16, 1 file, -1/+1]
* pp.c: Make sure variable is initialized [Karl Williamson, 2011-11-12, 1 file, -0/+1]
  A compiler generated a warning about this. It is the degenerate case with an empty input, so it isn't really a problem, but this silences the warning.
* pp.c: Call subroutine instead of repeat code [Karl Williamson, 2011-11-11, 1 file, -49/+34]
  Now that there is a function that can convert a latin1 character to title or upper case without going out to swashes, we can call it instead of repeating the code. There is the additional overhead of a function call, but this could be avoided if it comes down to it by making it in-line.
* pp.c: Remove macro no-longer called [Karl Williamson, 2011-11-11, 1 file, -10/+2]
* pp.c: Call subroutine instead of repeat code [Karl Williamson, 2011-11-11, 1 file, -38/+2]
  Now that there is a function that can convert a latin1 character to title or upper case without going out to swashes, we can call it instead of repeating the code. There is the additional overhead of a function call, but this could be avoided if it comes down to it by making it in-line. And this only happens when upper-casing y with diaeresis and the micro sign.
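Why only y with diaeresis and the micro sign need this path can be seen from a hypothetical C helper (`latin1_to_upper` is an invented name, not the function the commit calls). Every other lowercase Latin-1 letter uppercases within Latin-1 by subtracting 0x20, while ÿ maps to U+0178 and µ to U+039C, both outside Latin-1. Sharp s is special as well, since its uppercase is the two-character "SS".

```c
#include <assert.h>
#include <stdint.h>

/* Uppercase a Latin-1 code point without consulting Unicode tables.
 * Returns the single code point result, or 0 when the result is more
 * than one character (only U+00DF, whose uppercase is "SS"). */
static uint32_t latin1_to_upper(uint8_t c)
{
    if (c >= 'a' && c <= 'z')
        return c - 0x20u;
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
        return c - 0x20u;          /* à..þ, skipping ÷ */
    switch (c) {
    case 0xFF: return 0x0178;      /* ÿ -> Ÿ, outside Latin-1 */
    case 0xB5: return 0x039C;      /* µ -> GREEK CAPITAL LETTER MU */
    case 0xDF: return 0;           /* ß -> "SS": caller must handle */
    default:   return c;           /* no case change */
    }
}
```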
* pp.c: White-space only [Karl Williamson, 2011-11-11, 1 file, -7/+6]
  This outdents and reflows comments as a result of the removal of a surrounding block.
* pp.c: Call subroutine instead of repeat code [Karl Williamson, 2011-11-11, 1 file, -76/+1]
  Now that toLOWER_utf8() and toTITLE_utf8() have the intelligence to skip going out to swashes for Latin1 code points, it's not so critical to bypass calling them for these (for speed). It simplifies things not to have the intelligence repeated. There is the additional overhead of two function calls (minus the branches saved), but these could be avoided if it comes down to it by making them in-line.
* pp.c: White-space only [Karl Williamson, 2011-11-11, 1 file, -28/+28]
  This outdents and reflows comments as a result of the removal of a surrounding block.
* pp.c: Call subroutine instead of repeat code [Karl Williamson, 2011-11-11, 1 file, -24/+8]
  Now that toUPPER_utf8() has the intelligence to skip going out to swashes for Latin1 code points, it's not so critical to bypass calling it for these (for speed). It simplifies things not to have the intelligence repeated. There is the additional overhead of two function calls (minus the branches saved), but these could be avoided if it comes down to it by making them in-line.
* pp.c: Add compiler hint [Karl Williamson, 2011-11-11, 1 file, -1/+1]
  Almost always the input to uc() will be one of the other 253 Latin1 characters rather than one of the three that get here.
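The log does not show which hint was added; a branch-prediction hint is the likely reading. A C sketch of the idea follows. It uses GCC/Clang's `__builtin_expect` behind a hypothetical `MY_UNLIKELY` macro (not perl's own macro) to mark the three special bytes as the rare branch.

```c
#include <assert.h>

/* Hypothetical branch-hint macro; __builtin_expect is a GCC/Clang
 * extension, so other compilers get the plain condition. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define MY_UNLIKELY(cond) (cond)
#endif

/* Uppercase a Latin-1 byte, returning -1 for the three bytes whose
 * uppercase is not a single Latin-1 character (µ, ß, ÿ). The hint asks
 * the compiler to lay out the common 253-byte case as the fall-through. */
static int latin1_upper_fast(unsigned char c)
{
    if (MY_UNLIKELY(c == 0xB5 || c == 0xDF || c == 0xFF))
        return -1;                    /* rare: caller takes the slow path */
    if ((c >= 'a' && c <= 'z') || (c >= 0xE0 && c <= 0xFE && c != 0xF7))
        return c - 0x20;
    return c;
}
```

The hint changes only code layout and branch prediction; the function returns the same results with or without it.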
* pp.c: White-space only [Karl Williamson, 2011-11-11, 1 file, -24/+23]
  This outdents and reflows comments as a result of the removal of a surrounding block.
* pp.c: Call subroutine instead of repeat code [Karl Williamson, 2011-11-11, 1 file, -19/+0]
  Now that toLOWER_utf8() has the intelligence to skip going out to swashes for Latin1 code points, it's not so critical to bypass calling it for these (for speed). It simplifies things not to have the intelligence repeated. There is the additional overhead of two function calls (minus the branches saved), but these could be avoided if it comes down to it by making them in-line.
* [perl #96326] *{$io} should not be semi-defined [Father Chrysostomos, 2011-11-06, 1 file, -1/+1]
  gv_efullname4 produces undef if the GV points to no stash, instead of using __ANON__, as it does when the stash has no name.

  Instead of going through hoops to try and work around it elsewhere, fix gv_efullname4.

  This means that

      $x = *$io;
      $x .= "whate’er";

  no longer produces an uninitialized warning. (The warning was rather strange, as defined() returned true.)

  This commit also gives the glob the name $__ANONIO__ (yes, with a dollar sign). It may seem a little strange, but there is precedent in other autovivified globs, such as those open() produces when it cannot determine the variable name (e.g., open $t->{fh}).
* pp.c: White space only [Karl Williamson, 2011-10-17, 1 file, -16/+15]
  This outdents a block to the same level as the surrounding text, and reflows the comments to take advantage of the extra space and use fewer lines.
* pp.c: Remove disabled code for context sensitive lc [Karl Williamson, 2011-10-17, 1 file, -70/+4]
  This code was always #ifdef'd out. It would have been used to convert to a Greek final sigma from a non-final one, depending on context.

  The problem is that we can't know algorithmically if a final sigma is in order or not. I excerpt this quote, which I find persuasive, from correspondence from Father Chrysostomos, who knows Greek:

  "I cannot see how any algorithm can know to get it right.

  "The letter σ (or Σ in capitals) represents the number 200 in Greek numerals. Those are not just ancient Greek numerals, but are used on a regular basis even in modern Greek. In many printed books ς is used in place of ϛ, which represents the number 6. So if casefolding should change ͵ΑΣʹ to ͵αςʹ, or if an output layer changes ͵ασʹ similarly, it will be changing the number (from 1200 to 1006). You can’t get around it by checking for the Greek numeral sign (ʹ), as sometimes the tonos (΄), oxeia (´), or even the ASCII straight quote is used. And often in lists or chapter titles a dot is used instead of numeral sign.

  "Also, σ is commonly used at the ends of abbreviations. Changing ‘βλέπε σ. 16’ (‘see page 16’) to ‘βλέπε ς. 16’ is not acceptable.

  "So, no, I don’t think a programming language should be fiddling with σ versus ς. (A word processor is another matter.)"
* do not return useless value from void-context substr [Chip Salzenberg, 2011-10-10, 1 file, -9/+14]
* Resolve XS AUTOLOAD-prototype conflict [Father Chrysostomos, 2011-10-09, 1 file, -1/+3]
  Did you know that a subroutine’s prototype can be modified with s///? Don’t look:

      *AUTOLOAD = *Internals'SvREFCNT;
      my $f = "Just another ";
      eval{main->$f};
      print prototype AUTOLOAD;
      $f =~ s/Just another /Perl hacker,\n/;
      print prototype AUTOLOAD;

  You did look, didn’t you? You must admit that’s creepy.

  The problem goes back to this:

      commit adb5a9ae91a0bed93d396bb0abda99831f9e2e6f
      Author: Doug MacEachern <dougm@covalent.net>
      Date:   Sat Jan 6 01:30:05 2001 -0800

          [patch] xsub AUTOLOAD fix/optimization
          Message-ID: <Pine.LNX.4.10.10101060924280.24460-100000@mojo.covalent.net>

          Allow AUTOLOAD to be an xsub and allow such xsubs to avoid use of $AUTOLOAD.

          p4raw-id: //depot/perl@8362

  which includes this:

      +    if (CvXSUB(cv)) {
      +        /* rather than lookup/init $AUTOLOAD here
      +         * only to have the XSUB do another lookup for $AUTOLOAD
      +         * and split that value on the last '::',
      +         * pass along the same data via some unused fields in the CV
      +         */
      +        CvSTASH(cv) = stash;
      +        SvPVX(cv) = (char *)name; /* cast to loose constness warning */
      +        SvCUR(cv) = len;
      +        return gv;
      +    }

  That ‘unused’ field is not unused. It’s where the prototype is stored. So, not only is it clobbering the prototype, it’s also leaking it by assigning over the top of SvPVX. Furthermore, it’s blindly assigning someone else’s string, which could be freed before it’s even used.

  Since it has been documented for a long time that SvPVX contains the name of the AUTOLOADed sub, and since the use of SvPVX for prototypes is documented nowhere, we have to preserve the former. So this commit makes the prototype and the sub name share the same buffer, in a manner resembling that which CvFILE used before I changed it with bad4ae38.

  There are two new internal macros, CvPROTO and CvPROTOLEN, for retrieving the prototype.
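The log says only that the name and prototype now share one buffer, read back through the new CvPROTO and CvPROTOLEN macros. A rough C model of that idea follows. It assumes a "name\0prototype\0" layout with the name's length kept separately; that layout is an assumption, not taken from the commit. The sketch owns a single allocation holding both strings, so changing the name neither overwrites nor leaks the prototype and never borrows a caller's string. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One owned allocation holding "name\0prototype\0"; name_len plays the
 * role SvCUR plays for the name, and the prototype follows its NUL. */
typedef struct {
    char  *buf;
    size_t name_len;
    int    has_proto;
} NameAndProto;

static const char *nap_name(const NameAndProto *n)
{
    return n->buf;
}

static const char *nap_proto(const NameAndProto *n)
{
    return n->has_proto ? n->buf + n->name_len + 1 : NULL;
}

/* Copy name and (optional) prototype into a fresh buffer, then free
 * the old one. proto may point into the old buffer: it is copied first. */
static int nap_store(NameAndProto *n, const char *name, const char *proto)
{
    size_t nl = strlen(name);
    size_t pl = proto ? strlen(proto) : 0;
    char *b = malloc(nl + 1 + pl + 1);

    if (b == NULL)
        return -1;
    memcpy(b, name, nl + 1);                 /* name plus its NUL */
    if (proto)
        memcpy(b + nl + 1, proto, pl + 1);   /* prototype plus its NUL */
    else
        b[nl + 1] = '\0';
    free(n->buf);                            /* no leak of the old buffer */
    n->buf = b;
    n->name_len = nl;
    n->has_proto = proto != NULL;
    return 0;
}

/* Change only the name; the prototype is carried over, not clobbered. */
static int nap_set_name(NameAndProto *n, const char *name)
{
    return nap_store(n, name, nap_proto(n));
}
```

Renaming from "Just another " to "Perl hacker,\n" leaves the stored prototype intact, which is the behaviour the s/// example above should have had.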
* gv.c, op.c, pp.c: Stash-injected prototypes and prototype() are UTF-8 clean. [Brian Fraser, 2011-10-06, 1 file, -1/+1]
  This makes

      perl -E '$::{example} = "\x{30cb}"; say prototype example;'

  store and fetch the correctly flagged prototype. With this, all TODO tests in gv.t pass; the next commit will deal with making the parsing of prototypes nul-clean.
* pp.c: Got pp_gelem nul-clean. [Brian Fraser, 2011-10-06, 1 file, -11/+12]
* pp.c: Make warnings utf8-clean [Brian Fraser, 2011-10-06, 1 file, -3/+5]
* pp.c: pp_substr for UTF-8 globs. [Brian Fraser, 2011-10-06, 1 file, -2/+2]
  Since typeglobs may have the UTF8 flag set now, we need to avoid testing SvCUR on a potential glob, as that would trip an assertion.