path: root/embed.h
Commit message | Author | Age | Files | Lines
* Remove yyerror_sv | Father Chrysostomos | 2012-03-22 | 1 | -1/+0
    This was added in the previous commit, but was unnecessary, as it is not used anywhere and is not part of the public API.
* toke.c: yyerror cleanup. | Brian Fraser | 2012-03-22 | 1 | -1/+4
* utf8.c: Add valid_utf8_to_uvuni() and valid_utf8_to_uvchr() | Karl Williamson | 2012-03-19 | 1 | -0/+2
    These functions are like utf8_to_uvuni() and utf8_to_uvchr(), but their name implies that the input UTF-8 has been validated. They are not currently documented, as it's best for XS writers to call the functions that do validation.
* utf8.c: Add utf8_to_uvchr_buf() and utf8_to_uvuni_buf() | Karl Williamson | 2012-03-19 | 1 | -0/+2
    The existing functions (utf8_to_uvchr and utf8_to_uvuni) have a deficiency in that they could read beyond the end of the input string if given malformed input. This commit creates two new functions which behave as the old ones did, but have an extra parameter each, which gives the upper limit to the string, so no read beyond it is done.
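The idea of passing an upper limit can be illustrated with a small standalone decoder. This is only a sketch of the technique, not Perl's implementation: the name `decode_utf8_bounded` and its return convention are hypothetical, and it deliberately skips checks a real decoder needs (overlong forms, surrogates, range limits).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the Perl API): decode one UTF-8
 * sequence from the half-open range [s, end).  On success, returns the
 * code point and stores the number of bytes consumed in *retlen.  On
 * malformed or truncated input, returns 0 and sets *retlen to 0, so a
 * caller can tell a failure apart from a decoded NUL (retlen == 1). */
static uint32_t decode_utf8_bounded(const uint8_t *s, const uint8_t *end,
                                    size_t *retlen)
{
    assert(s != NULL && end != NULL && retlen != NULL);
    *retlen = 0;
    if (s >= end)
        return 0;                       /* empty input */

    uint8_t lead = s[0];
    size_t need;                        /* total bytes in this sequence */
    uint32_t cp;
    if (lead < 0x80)                { need = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { need = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { need = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { need = 4; cp = lead & 0x07; }
    else
        return 0;                       /* stray continuation or bad lead byte */

    /* The bounds check: never touch bytes at or beyond 'end'. */
    if ((size_t)(end - s) < need)
        return 0;

    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *retlen = need;
    return cp;
}
```

With the two bytes 0xC3 0xA9 and `end` pointing past both, this yields U+00E9 and a length of 2; with `end` pointing after only the first byte, it reports failure instead of reading the second byte.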
* rework how the trie logic handles the newer EXACT nodetypes | Yves Orton | 2012-03-03 | 1 | -1/+3
    This cleans up, simplifies, and extends how the trie logic interacts with the new node types. This change ultimately makes EXACTFU, EXACTFU_SS, and EXACTFU_NO_TRIE (renamed to EXACTFU_TRICKYFOLD) work properly with the trie engine regardless of whether the string is utf8 or latin1.

    This patch depends on the following:

        EXACT              => utf8 or "binary" text
        EXACTFU            => either pre-folded utf8, or latin1 that has to be folded as though it was utf8
        EXACTFU_SS         => special case of EXACTFU to handle \xDF/ss (affects latin1 treatment)
        EXACTFU_TRICKYFOLD => special case of EXACTFU to handle tricky non-latin1 fold rules
        EXACTF             => "old style fold logic" untriable nodetype
        EXACTFA            => (currently) untriable nodetype
        EXACTFL            => (currently) untriable nodetype

    See the comments in regcomp.sym for these fold types.

    This patch involves a number of distinct, but related parts. Starting from compilation:

    * Simplify how we detect a triable sequence given the new nodetypes; this also probably fixed some "bugs" in how we detected certain sequences, like /||foo|bar/.
    * Simplify how we read EXACTFU nodes under utf8 by removing the now redundant folding logic (EXACTFU nodes under utf8 are prefolded). Also extend this logic to handle latin1 patterns properly (in conjunction with other changes).
    * Part of the problems associated with EXACTFU_SS and EXACTFU_TRICKYFOLD have to do with how the trie logic interacts with the minlen logic. This change handles both by pessimising the minlen when encountering these nodetypes. One observation is that the minlen logic is basically broken, and works only because it conflates bytes and codepoints in such a way that we more or less always get a value small enough that things work out anyway. Fixing that properly is the job of another patch.
    * Part of the problem of doing folding under unicode rules is that there are a lot of foldings possible, some with strange rules. This means that the bitmap logic does not work correctly in all cases, as we currently do not have any way to populate it properly. So this patch disables the bitmap entirely when folding is involved until that is fixed.

    The end result of this is: we can TRIE/AHOCORASICK any sequence of EXACT, or EXACTFU (ish) nodes, regardless of utf8 or not, but we disable the bitmap when folding.

    A note for follow up relating to this patch is that the way EXACTFU_XXX nodes are currently dealt with we won't build the "maximal" trie because of their presence, instead creating a "jumptrie" consisting of either a leading EXACTFU node followed by a EXACTFU_XXX node, or vice versa. We should eventually address that.
* In perl.c, change S_open_script() to return rsfp. | Nicholas Clark | 2012-02-27 | 1 | -1/+1
    Previously it was being passed &rsfp as a parameter, because it was returning another value, fdscript. However, the return value has been ignored since commit cc69b689ee7c2745 removed suidperl in January 2009.
* perl #77654: quotemeta quotes non-ASCII consistently | Karl Williamson | 2012-02-15 | 1 | -0/+3
    As described in the pod changes in this commit, this changes quotemeta() to consistently quote non-ASCII characters when used under unicode_strings. The behavior is changed for these and UTF-8 encoded strings to more closely align with Unicode's recommendations. The end result is that we *could* at some future point start using other characters as metacharacters than the 12 we do now.
* Add is_utf8_char_buf() | Karl Williamson | 2012-02-11 | 1 | -0/+1
    This function is to replace is_utf8_char(), and requires an extra parameter to ensure that it doesn't read beyond the end of the buffer. Convert is_utf8_char() and the only place in the Perl core to use the new one, assuming in each that there is enough space.

    Thanks to Jarkko Hietaniemi for suggesting this function name.
* add wrap_op_checker() API function | Zefram | 2012-02-11 | 1 | -0/+1
    This function provides a convenient and thread-safe way for modules to hook op checking.
* regcomp.c: Add ability to have compiled-in inversion lists | Karl Williamson | 2012-02-09 | 1 | -0/+2
    This adds a routine that will take a C array and quickly create an inversion list that points to that array. Thus the array had better be exactly the internal form that is required for an inversion list. To make sure that this doesn't get out of sync, a new field in the list's header is created that is a combination of version-number/inversion-list-type.
* regcomp.c: Add ability to take union of a complement | Karl Williamson | 2012-02-09 | 1 | -1/+1
    Previous commits have added the ability to the inversion list intersection routine to take the complement of one of its inputs. Likewise, for unions, this will be a frequent paradigm, and it is cheaper to do the complement of an input in the routine than to construct a new temporary that is the desired complement, and throw it away.
* regcomp.c: _invlist_subtract() becomes a macro | Karl Williamson | 2012-02-09 | 1 | -1/+0
    This function is no longer necessary, as it is just a call to the newly created _invlist_intersection_maybe_complement_2nd() with the correct parameters.
* regcomp.c: Add ability to take intersection of complement | Karl Williamson | 2012-02-09 | 1 | -1/+1
    It turns out that it is a common paradigm to want to take the intersection of an inversion list with the complement of another inversion list. In fact, this is how to subtract the second inversion list from the first, as what remains in the first after the subtraction is everything in it that is not in the second. It also turns out that it adds very few cycles to an intersection to complement one (or both, should we choose to) of the operands. By adding this capability, we don't have to create a copy of the inverted operand beforehand, just to throw it away.
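The subtraction-as-intersection idea can be sketched on a simplified inversion list: a strictly increasing array of code points in which the element at each even index starts a range that is in the set and the element at each odd index starts a range that is not. The function name `invlist_subtract_sketch` and this plain-array layout are assumptions for illustration; Perl's real inversion lists carry a header and live in an SV.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: compute a AND (NOT b), i.e. "a minus b", for two
 * simplified inversion lists.  The complement of b is never built; it is
 * applied on the fly by negating b's membership state during the merge.
 * 'out' must have room for a_len + b_len elements.  Returns the number
 * of elements written. */
static size_t invlist_subtract_sketch(const uint32_t *a, size_t a_len,
                                      const uint32_t *b, size_t b_len,
                                      uint32_t *out)
{
    assert(out != NULL);
    size_t i = 0, j = 0, n = 0;
    int in_result = 0;              /* is the current position in the result? */

    while (i < a_len || j < b_len) {
        /* Pick the next boundary from either list. */
        uint32_t pos;
        if (j >= b_len || (i < a_len && a[i] <= b[j]))
            pos = a[i];
        else
            pos = b[j];

        /* Consume that boundary from whichever lists contain it. */
        if (i < a_len && a[i] == pos) i++;
        if (j < b_len && b[j] == pos) j++;

        /* After an odd number of boundaries we are inside the set. */
        int in_a = (i % 2) == 1;
        int in_b = (j % 2) == 1;
        int now  = in_a && !in_b;   /* intersection with the complement of b */

        if (now != in_result) {     /* membership toggles: emit a boundary */
            out[n++] = pos;
            in_result = now;
        }
    }
    return n;
}
```

Subtracting E..G (`{0x45, 0x48}`) from A..Z (`{0x41, 0x5B}`) gives `{0x41, 0x45, 0x48, 0x5B}`, that is A..D and H..Z, without allocating a temporary for the complement.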
* Move amagic hint checking to new function | Father Chrysostomos | 2012-01-24 | 1 | -0/+1
    so that stringification will be able to use it, too.
* [rt.cpan.org #74289] Don’t make *CORE::foo read-only | Father Chrysostomos | 2012-01-23 | 1 | -0/+1
    newATTRSUB requires the sub name to be passed to it wrapped up in a const op. Commit 8756617677dbd allowed it to accept a GV that way, since S_maybe_add_coresub (in gv.c) needed to pass it an existing GV not in the symbol table yet (to simplify code elsewhere).

    This had the inadvertent side-effect of making the GV read-only, since that’s what the check function for const ops does.

    Even if we were to call this a feature, it wouldn’t make sense as implemented, as GVs for non-ampable (&-able) subs like *CORE::chdir were not being made read-only.

    This commit adds a new flag to newATTRSUB, to allow a GV to be passed as the o parameter, instead of an op. While this may look as though it’s undoing the simplification in commit 8756617677dbd by adding more code, the new code is still conceptually simpler and more straightforward.

    Since newATTRSUB is in the API, I had to add a new _flags variant. (How did newATTRSUB get into the API to begin with?)

    In adding a test, I also discovered that ‘used once’ warnings were applying to these subs, which is obviously wrong. Commit 8756617677dbd caused that, too, as it was relying on the side-effect of newATTRSUB doing a GV lookup. This fixes that, too, by turning on the multi flag in S_maybe_add_coresub.
* regcomp.c: Refactor join_exact() to eliminate extra passes | Karl Williamson | 2012-01-19 | 1 | -1/+1
    The strings in every EXACTFish node are examined for certain problematic sequences and code points. Prior to this patch, this was done in several passes, but this refactors the routine to do it in a single pass.
* regexec.c: Allow for returning shared swash | Karl Williamson | 2012-01-13 | 1 | -0/+1
    This changes the function that returns the swash associated with a bracketed character class so that it returns the original swash and not a copy. The function is renamed and made accessible only from within regexec.c, and a new wrapper function with the original name is created that just calls the other one and returns a copy of the swash.

    Thus, all access from outside regexec.c will use a copy which if overwritten will not harm others; while the option exists from within regexec.c to use a shared version.
* regcomp.c: Add _invlist_contents() to compactly dump inversion list | Karl Williamson | 2012-01-13 | 1 | -0/+1
    This will be used in future commits for debug traces.
* utf8.c: Add ability to pass inversion list to _core_swash_init() | Karl Williamson | 2012-01-13 | 1 | -1/+1
    Add a new parameter to _core_swash_init() that is an inversion list to add to the swash, along with a boolean to indicate if this inversion list is derived from a user-defined property. This capability will prove useful in future commits.
* utf8.c: Add flag to swash_init() to not croak on error | Karl Williamson | 2012-01-13 | 1 | -1/+1
    This adds the capability, to be used in future commits, for swash_init() to return NULL instead of croaking if it can't find a property, so that the caller can choose how to handle the situation.
* regcomp.c: Add _invlist_populate_swatch() | Karl Williamson | 2012-01-13 | 1 | -0/+1
    This function will be used in future commits.
* regcomp.c: Add invlist_search() | Karl Williamson | 2012-01-13 | 1 | -0/+1
    This function does a binary search on an inversion list. It will be used in future commits.
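A binary search over an inversion list can be sketched as follows. This is an illustration on a plain sorted array, not the code in regcomp.c: the names `invlist_find` and `invlist_contains` are hypothetical, and the layout assumed is that even-indexed elements start ranges that are in the set while odd-indexed elements start ranges that are not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: return the index of the last element that is
 * <= cp, or -1 if cp is below the first element. */
static ptrdiff_t invlist_find(const uint32_t *list, size_t len, uint32_t cp)
{
    assert(list != NULL || len == 0);
    size_t lo = 0, hi = len;        /* count of elements <= cp lies in [lo, hi] */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;           /* list[mid] is a candidate; look right */
        else
            hi = mid;               /* list[mid] is too large; look left */
    }
    return (ptrdiff_t)lo - 1;
}

/* A code point is in the set when the range it falls in starts at an
 * even index. */
static int invlist_contains(const uint32_t *list, size_t len, uint32_t cp)
{
    ptrdiff_t i = invlist_find(list, len, cp);
    return i >= 0 && (i % 2) == 0;
}
```

For the list `{0x41, 0x5B, 0x61, 0x7B}` (A..Z and a..z), `invlist_contains` is true for 'A', 'Z', and 'z', and false for '0', '[', and '{'.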
* utf8.c: New function to retrieve non-copy of swash | Karl Williamson | 2012-01-13 | 1 | -0/+3
    Currently, swash_init returns a copy of the swash it finds. The core portions of the swash are read-only, and the non-read-only portions are derived from them. When the value for a code point is looked up, the results for it and adjacent code points are stored in a new element, so that the lookup never has to be performed again. But since a copy is returned, those results are stored only in the copy, and any other uses of the same logical swash don't have access to them, so the lookups have to be performed for each logical use.

    Here's an example. If you have 2 occurrences of /\p{Upper}/ in your program, there are 2 different swashes created, both initialized identically. As you start matching against code points, say "A" =~ /\p{Upper}/, the swashes diverge, as the results for each match are saved in the one applicable to that match. If you match "A" in each swash, it has to be looked up in each swash, and an (identical) element will be saved for it in each swash. This is wasteful of both time and memory.

    This patch renames the function and returns the original and not a copy, thus eliminating the overhead for swashes accessed through the new interface. The old function name is serviced by a new function which merely wraps the new name result with a copy, thus preserving the interface for existing calls.

    Thus, in the example above, there is only one swash, and matching "A" against it results in only one new element, and so the second use will find that, and not have to go out looking again. In a program with lots of regular expressions, the savings in time and memory can be quite large.

    The new name is restricted to use only in regcomp.c and utf8.c (unless XS code cheats the preprocessor), where we will code so as to not destroy the original's data. Otherwise, a change to that would change the definition of a Unicode property everywhere in the program.

    Note that there are no current callers of the new interface; these will be added in future commits.
* utf8.c: Change name of static function | Karl Williamson | 2012-01-13 | 1 | -1/+1
    This function has always confused me, as it doesn't return a swash, but a swatch.
* need backwards-compatible to_utf8_foo() | Karl Williamson | 2012-01-08 | 1 | -4/+4
    These 4 functions have been replaced by variants to_utf8_foo_flags(), but for XS code that called the old ones in the Perl_to_utf8_foo() forms, backwards compatibility versions need to be created. For calls of just the to_utf8_foo() forms, macros have been used to automatically call the new forms without the performance penalty of going through the compatibility functions.
* [perl #29070] Add vstring set-magic | Father Chrysostomos | 2011-12-23 | 1 | -0/+1
    Some operators, like pp_complement, assign their argument to TARG (which copies vstring magic), modify it in place, and then call set-magic. That’s supposed to work, but vstring magic was remaining as it was, such that ~v7 would still be treated as "v7" by vstring-aware code, even though the resulting string is not "\7".

    This commit adds vstring set-magic that checks to see whether the pv still matches the vstring. It cannot simply free the vstring magic, as that would prevent $x=v0 from working.
* Stop tell($glob_copy) from clearing PL_last_in_gv | Father Chrysostomos | 2011-12-17 | 1 | -0/+1
    This bug is a side effect of rv2gv’s starting to return an incoercible mortal copy of a coercible glob in 5.14:

        $ perl5.12.4 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell'
        0
        $ perl5.14.0 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell'
        -1

    In the first case, tell without arguments is returning the position of the filehandle.

    In the second case, tell with an explicit argument that happens to be a coercible glob (tell has an implicit rv2gv, so tell $fh is actually tell *$fh) sets PL_last_in_gv to a mortal copy thereof, which is freed at the end of the statement, setting PL_last_in_gv to null. So there is no ‘last used’ handle by the time we get to the tell without arguments.

    This commit adds a new rv2gv flag that tells it not to copy the glob. By doing it unconditionally on the kidop, this allows tell(*$fh) to work the same way.

    Let’s hope nobody does tell(*{*$fh}), which will unset PL_last_in_gv because the inner * returns a mortal copy.

    This whole area is really icky. PL_last_in_gv should be refcounted, but that would cause handles to leak out of scope, breaking programs that rely on the auto-closing ‘feature’.
* utf8.c: Allow Changed behavior of utf8 under locale | Karl Williamson | 2011-12-15 | 1 | -4/+5
    This changes the 4 case changing functions to take extra parameters to specify if the utf8 string is to be processed under locale rules when the code points are < 256. The current functions are changed to macros that call the new versions so that current behavior is unchanged. An additional, static, function is created that makes sure that the 255/256 boundary is not crossed during the case change.
* Adjust substr offsets when using, not when creating, lvalue | Father Chrysostomos | 2011-12-04 | 1 | -0/+3
    When substr() occurs in potential lvalue context, the offsets are adjusted to the current string (negative being converted to positive, lengths reaching beyond the end of the string being shortened, etc.) as soon as the special lvalue to be returned is created.

    When that lvalue is assigned to, the original scalar is stringified once more.

    That implementation results in two bugs:

    1) Fetch is called twice in a simple substr() assignment (except in void context, due to the special optimisation of commit 24fcb59fc).

    2) These two calls are not equivalent:

        $SIG{__WARN__} = sub { warn "w ",shift};
        sub myprint { print @_; $_[0] = 1 }
        print substr("", 2);
        myprint substr("", 2);

    The second one dies. The first one only warns. That’s mean. The error is also wrong, sometimes, if the original string is going to get longer before the substr lvalue is actually used.

    The behaviour of \substr($str, -1) if $str changes length is completely undocumented. Before 5.10, it was documented as being unreliable and subject to change.

    What this commit does is make the lvalue returned by substr remember the original arguments and only adjust the offsets when the assignment happens.

    This means that the following now prints z, instead of xyz (which is actually what I would expect):

        $str = "a";
        $substr = \substr($str,-1);
        $str = "xyz";
        print $$substr;
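The "remember the arguments, resolve them later" approach can be sketched in isolation. The struct and function below are hypothetical and much simpler than Perl's substr lvalue: negative lengths and offsets that fall before the start of the string are treated as errors rather than handled as Perl does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a deferred substr(): the caller's raw
 * arguments are stored untouched when the lvalue is created. */
struct lazy_substr {
    long off;       /* may be negative: counts back from the end */
    long len;       /* requested length, used only if has_len is set */
    int  has_len;
};

/* Resolve the stored arguments against the string's length at the time
 * of use.  Returns 1 and fills *start and *count on success, or 0 if
 * the start falls outside the string. */
static int resolve_substr(const struct lazy_substr *ls, size_t cur_len,
                          size_t *start, size_t *count)
{
    assert(ls != NULL && start != NULL && count != NULL);
    long n = (long)cur_len;
    long s = ls->off < 0 ? n + ls->off : ls->off;
    if (s < 0 || s > n)
        return 0;                   /* start is outside the string */

    long avail = n - s;             /* characters from start to end */
    long c = (!ls->has_len || ls->len > avail) ? avail : ls->len;
    if (c < 0)
        return 0;                   /* negative lengths are not modelled */

    *start = (size_t)s;
    *count = (size_t)c;
    return 1;
}
```

With stored arguments `{-1, 0, 0}` (offset -1, no length), resolving against a one-character string gives start 0 and count 1; resolving the same stored arguments after the string has grown to three characters gives start 2 and count 1, which corresponds to the "z" in the example above.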
* Break the -v code out from Perl_moreswitches() into S_minus_v(). | Nicholas Clark | 2011-12-04 | 1 | -0/+1
* Refactor S_usage() to take 0 parameters and exit() directly. | Nicholas Clark | 2011-12-04 | 1 | -1/+1
    This simplifies the code, as it's only called from one spot, in Perl_moreswitches().
* Make sitecustomize relocatableinc aware | Carl Hayter | 2011-12-03 | 1 | -0/+1
    When -Dusesitecustomize is used with -Duserelocatableinc, SITELIB_EXP/sitecustomize.pl is not found due to SITELIB_EXP having a '.../..' relocation path.

    This patch refactors the path relocation code from S_incpush() into S_mayberelocate() so that it can be used in both S_incpush() and in usesitecustomize's use of SITELIB_EXP.
* Make assignment over glob copies much faster | Father Chrysostomos | 2011-11-24 | 1 | -1/+1
    sv_force_normal is passed the SV_COW_DROP_PV flag if the scalar is about to be written over. That flag is not currently used. We can speed up assignment over fake GVs a lot by taking advantage of the flag.

    Before and after:

        $ time ./perl -e '$x = *foo, undef $x for 1..2000000'
        real 0m4.264s
        user 0m4.248s
        sys  0m0.007s

        $ time ./perl -e '$x = *foo, undef $x for 1..2000000'
        real 0m1.820s
        user 0m1.812s
        sys  0m0.005s
* Put sub redef warnings in one place | Father Chrysostomos | 2011-11-21 | 1 | -0/+3
    The logic surrounding subroutine redefinition warnings (to warn or not to warn?) was in three places. Over time, they drifted apart, to the point that newXS was following completely different rules. It was only warning for redefinition of functions in the autouse namespace. Recent commits have brought it into conformity with the other redefinition warnings.

    Obviously it’s about time we put it in one function.
* Make const redef warnings default in newXS | Father Chrysostomos | 2011-11-21 | 1 | -1/+1
    There is no reason why constant redefinition warnings should be default warnings for sub foo(){1}, but not for newCONSTSUB (which calls newXS, which triggers the warning).

    To make this work properly, I also had to import sv.c’s ‘are these const subs from the same SV originally?’ logic. Constants created with XS can have NULL for the SV (they return an empty list or &PL_sv_undef), which means sv.c’s logic will stop *this=\&that from warning if both this and that are such XS-created constants. newCONSTSUB needed to be consistent with that. It required tweaking a test I added a few commits ago, which arguably shouldn’t have warned the way it was written.

    As of this commit (and before it, too, come to think of it), newXS_len_flags’s calling convention is quite awful and would need to be thoroughly re-thunk before being made into an API, or probably simply never made into an API.
* Add newXS_len_flags | Father Chrysostomos | 2011-11-20 | 1 | -0/+1
    It accepts a length as well as a pv for the name. Since newXS_flags is marked with M in embed.fnc and is undocumented, technically policy allows me to change it, but there are files throughout cpan/ that use newXS_flags. So it seemed safer to add a new function.
* Add len flag to newCONSTSUB_flags | Father Chrysostomos | 2011-11-20 | 1 | -1/+1
    This function was added after 5.14.0, so it is not too late to change it. It is currently unused.
* Mention the variable name in the new length warnings | Father Chrysostomos | 2011-11-18 | 1 | -1/+3
* Throw a helpful warning when someone tries length(@array) or length(%hash) | Matthew Horsfall (alh) | 2011-11-18 | 1 | -0/+1
* [perl #70151] eval localises %^H at runtime | Father Chrysostomos | 2011-11-17 | 1 | -1/+1
    It doesn’t any more. Now the hints are localised in a separate inner scope surrounding the call to yyparse.

    This meant moving hint-handling code from pp_require and pp_entereval into S_doeval.

    Some tests in t/comp/hints.t were testing for the buggy behaviour, so they have been adjusted.

    Basically, this fixes

        sub import { eval "strict->import" }

    which should work the same way as

        sub import { strict->import }

    but was not working because %^H and $^H were being localised to the eval at its run time, not just its compilation. So the values assigned to %^H and $^H at the eval’s run time would simply be lost.
* embed.fnc: Make _to_upper_title_latin1() avail to pp.c | Karl Williamson | 2011-11-11 | 1 | -1/+3
    If something like this were to be made more generally available, it would be better to have two in-line functions, to_upper_latin1() and to_title_latin1() that just call this underlying one with the correct final parameter.
* utf8.c: Faster latin1 folding | Karl Williamson | 2011-11-08 | 1 | -0/+1
    This adds a function similar to the ones for the other three case changing operations that works on latin1 characters only, and avoids having to go out to swashes. It changes to_uni_fold() and to_utf8_fold() to call it on the appropriate input.
* utf8.c: Faster latin1 upper/title casing | Karl Williamson | 2011-11-08 | 1 | -0/+1
    This creates a new function to handle upper/title casing code points in the latin1 range, and avoids using a swash to compute the case. This is because the correct values are compiled-in.

    And it calls this function when appropriate for both title and upper casing, in both utf8 and uni forms.

    Unlike the similar function for lower casing, it may make sense for this function to be called from outside utf8.c, but inside the core, so it is not static, but its name begins with an underscore.
* utf8.c: Refactor to_uni_lower() | Karl Williamson | 2011-11-08 | 1 | -0/+1
    The portion that deals with Latin1 range characters is refactored into a separate (static) function, so that it can be called from more than one place.
* Warn for $[ ‘version’ checks | Father Chrysostomos | 2011-11-01 | 1 | -0/+1
    Following Michael Schwern’s suggestion, here is a warning for those hapless folks who use $[ for version checks.

    It applies whenever $[ is used in one of: < > <= >=
* simplify op_dump() / -Dx sequencing | David Mitchell | 2011-10-17 | 1 | -2/+0
    Currently, whenever we dump an op tree, we first call sequence(), which walks the tree, creating address => sequence# mappings in PL_op_sequence. Then when individual ops or op-next fields are displayed, the sequence is looked up.

    Instead, do away with the initial walk, and just map addresses on request. This simplifies the code.

    As a deliberate side-effect, it no longer assigns a seq# of zero to null ops. This makes it easier to work out what's going on when you call op_dump() during a debugging session with partially constructed op-trees. It also removes the ambiguity in "====> 0" as to whether op_next is NULL or just points to an op_null.
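Mapping addresses to sequence numbers on request, rather than in an up-front walk, can be sketched with a small fixed-size table. Everything here is hypothetical (the `seq_map` type, the `seq_num` function, the capacity, the linear scan); Perl itself keeps the mapping in PL_op_sequence. The sketch reserves 0 for a NULL pointer, so that a displayed 0 can only mean "no op".

```c
#include <assert.h>
#include <stddef.h>

#define SEQ_MAX 64                  /* arbitrary capacity for the sketch */

/* Hypothetical table of addresses, in order of first request. */
struct seq_map {
    const void *addr[SEQ_MAX];
    size_t count;
};

/* Return the sequence number for p.  The first time an address is seen
 * it receives the next unused number, starting from 1; later requests
 * return the same number.  NULL always maps to 0. */
static size_t seq_num(struct seq_map *m, const void *p)
{
    assert(m != NULL);
    if (p == NULL)
        return 0;

    for (size_t i = 0; i < m->count; i++)
        if (m->addr[i] == p)
            return i + 1;           /* already numbered */

    assert(m->count < SEQ_MAX);     /* the sketch does not grow the table */
    m->addr[m->count++] = p;
    return m->count;
}
```

Because numbers are handed out in order of first request, no initial tree walk is needed, and a partially built tree can still be dumped: every address that is actually displayed gets a stable, non-zero number.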
* whichsig nul-cleanup. | Brian Fraser | 2011-10-06 | 1 | -1/+3
    This adds _sv, _pvn, and _pv versions of whichsig() in mg.c, which get both kill "NAME" and %SIG lookup nul-clean.
* Oust cv_ckproto_len | Father Chrysostomos | 2011-10-06 | 1 | -1/+0
    It is no longer used in core (having been superseded by cv_ckproto_len_flags), is unused on CPAN, and is not part of the API.

    The cv_ckproto ‘public’ macro is modified to use the _flags version. I put ‘public’ in quotes because, even before this commit, cv_ckproto was using a non-exported function, and hence could never have worked on a strict linker (or whatever you call it).
* toke.c, op.c, sv.c: Prototype parsing and checking are nul-and-UTF8 clean. | Brian Fraser | 2011-10-06 | 1 | -0/+1
    This means that

        eval "sub foo ($;\0whoops) { say @_ }"

    will correctly include \0whoops in the CV's prototype (while complaining about illegal characters), and that

        use utf8;
        BEGIN { $::{"foo"} = "\$\0L\351on" }
        BEGIN { eval "sub foo (\$\0L\x{c3}\x{a9}on) {};"; }

    will not warn about a mismatched prototype.
* universal.c: sv_does() UTF8 cleanup. | Brian Fraser | 2011-10-06 | 1 | -0/+4
    This adds _sv, _pv, and _pvn forms to sv_does, and changes it to use sv_ref() instead of sv_reftype().