delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	Label UTF8 cleanup	Brian Fraser	2012-03-25	1	-4/+4
	This meant changing LABEL's definition in perly.y, so most of this commit is actually from the regenerated files.
*	op.c: Warnings cleanup.	Brian Fraser	2012-03-23	1	-8/+29
*	Remove yyerror_sv	Father Chrysostomos	2012-03-22	1	-5/+0
	This was added in the previous commit, but was unnecessary, as it is not used anywhere and is not part of the public API.
*	toke.c: yyerror cleanup.	Brian Fraser	2012-03-22	1	-1/+16
*	Deprecate utf8_to_uvchr() and utf8_to_uvuni()	Karl Williamson	2012-03-19	1	-0/+2
	These functions can read beyond the end of their input strings if presented with malformed UTF-8 input. Perl core code has been converted to use other functions instead of these.
*	utf8.c: Add valid_utf8_to_uvuni() and valid_utf8_to_uvchr()	Karl Williamson	2012-03-19	1	-0/+10
	These functions are like utf8_to_uvuni() and utf8_to_uvchr(), but their names imply that the input UTF-8 has been validated. They are not currently documented, as it's best for XS writers to call the functions that do validation.
*	utf8.c: Add utf8_to_uvchr_buf() and utf8_to_uvuni_buf()	Karl Williamson	2012-03-19	1	-0/+12
	The existing functions (utf8_to_uvchr and utf8_to_uvuni) have a deficiency in that they could read beyond the end of the input string if given malformed input. This commit creates two new functions that behave as the old ones did, but each takes an extra parameter giving the upper limit of the string, so no read beyond it is done.
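The idea of passing an upper limit can be illustrated with a minimal sketch. This is not Perl's utf8_to_uvchr_buf(): the function name decode_utf8_bounded, its signature, and its error convention (return 0 and set the length to 0) are assumptions made for illustration, and it does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Perl API): decode one UTF-8 sequence that
 * starts at s, never reading at or past `end`.
 * On success returns the code point and stores the sequence length in *len.
 * On malformed or truncated input returns 0 and stores 0 in *len. */
static uint32_t decode_utf8_bounded(const unsigned char *s,
                                    const unsigned char *end,
                                    size_t *len)
{
    assert(s != NULL && end != NULL && len != NULL);

    *len = 0;
    if (s >= end)
        return 0;                       /* empty input */

    unsigned char first = s[0];
    size_t need;                        /* bytes the start byte promises */
    uint32_t cp;

    if (first < 0x80)                { need = 1; cp = first; }
    else if ((first & 0xE0) == 0xC0) { need = 2; cp = first & 0x1F; }
    else if ((first & 0xF0) == 0xE0) { need = 3; cp = first & 0x0F; }
    else if ((first & 0xF8) == 0xF0) { need = 4; cp = first & 0x07; }
    else
        return 0;                       /* stray continuation or invalid start byte */

    /* The check that an interface without an end pointer cannot make. */
    if ((size_t)(end - s) < need)
        return 0;

    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    *len = need;
    return cp;
}
```

With the two bytes 0xC3 0xA9 and an end pointer two past the start, the sketch yields U+00E9 with length 2; with the end pointer only one past the start, it reports failure instead of reading the second byte.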
*	rework how the trie logic handles the newer EXACT nodetypes	Yves Orton	2012-03-03	1	-6/+8
	This cleans up, simplifies and extends how the trie logic interacts with the new node types. This change ultimately makes EXACTFU, EXACTFU_SS and EXACTFU_NO_TRIE (renamed to EXACTFU_TRICKYFOLD) work properly with the trie engine regardless of whether the string is utf8 or latin1.
	This patch depends on the following:
	    EXACT => utf8 or "binary" text
	    EXACTFU => either pre-folded utf8, or latin1 that has to be folded as though it was utf8
	    EXACTFU_SS => special case of EXACTFU to handle \xDF/ss (affects latin1 treatment)
	    EXACTFU_TRICKYFOLD => special case of EXACTFU to handle tricky non-latin1 fold rules
	    EXACTF => "old style fold logic" untriable nodetype
	    EXACTFA => (currently) untriable nodetype
	    EXACTFL => (currently) untriable nodetype
	See the comments in regcomp.sym for these fold types.
	This patch involves a number of distinct, but related parts. Starting from compilation:
	* Simplify how we detect a triable sequence given the new nodetypes. This also probably fixed some "bugs" in how we detected certain sequences, like /||foo|bar/.
	* Simplify how we read EXACTFU nodes under utf8 by removing the now redundant folding logic (EXACTFU nodes under utf8 are prefolded). Also extend this logic to handle latin1 patterns properly (in conjunction with other changes).
	* Part of the problems associated with EXACTFU_SS and EXACTFU_TRICKYFOLD have to do with how the trie logic interacts with the minlen logic. This change handles both by pessimising the minlen when encountering these nodetypes. One observation is that the minlen logic is basically broken, and works only because it conflates bytes and codepoints in such a way that we more or less always get a value small enough that things work out anyway. Fixing that properly is the job of another patch.
	* Part of the problem of doing folding under unicode rules is that there are a lot of foldings possible, some with strange rules. This means that the bitmap logic does not work correctly in all cases, as we currently do not have any way to populate it properly. So this patch disables the bitmap entirely when folding is involved until that is fixed.
	The end result of this is: we can TRIE/AHOCORASICK any sequence of EXACT, or EXACTFU (ish) nodes, regardless of utf8 or not, but we disable the bitmap when folding.
	A note for follow up relating to this patch: the way EXACTFU_XXX nodes are currently dealt with, we won't build the "maximal" trie because of their presence, instead creating a "jumptrie" consisting of either a leading EXACTFU node followed by an EXACTFU_XXX node, or vice versa. We should eventually address that.
*	In perl.c, change S_open_script() to return rsfp.	Nicholas Clark	2012-02-27	1	-4/+3
	Previously it was being passed &rsfp as a parameter, because it was returning another value, fdscript. However, the return value has been ignored since commit cc69b689ee7c2745 removed suidperl in January 2009.
*	perl #77654: quotemeta quotes non-ASCII consistently	Karl Williamson	2012-02-15	1	-0/+6
	As described in the pod changes in this commit, this changes quotemeta() to consistently quote non-ASCII characters when used under unicode_strings. The behavior is changed for these and UTF-8 encoded strings to more closely align with Unicode's recommendations. The end result is that we could at some future point start using other characters as metacharacters than the 12 we do now.
*	Deprecate is_utf8_char()	Karl Williamson	2012-02-11	1	-0/+1
	This function assumes that there is enough space in the buffer to read however many bytes are indicated by the first byte in the alleged UTF-8 encoded string. This may not be true, and so it can read beyond the buffer end. is_utf8_char_buf() should be used instead.
*	Add is_utf8_char_buf()	Karl Williamson	2012-02-11	1	-0/+6
	This function replaces is_utf8_char() and requires an extra parameter to ensure that it doesn't read beyond the end of the buffer. is_utf8_char() and the only place in the Perl core that calls it are converted to use the new function, assuming in each case that there is enough space. Thanks to Jarkko Hietaniemi for suggesting this function name.
*	add wrap_op_checker() API function	Zefram	2012-02-11	1	-0/+6
	This function provides a convenient and thread-safe way for modules to hook op checking.
*	regcomp.c: Add ability to have compiled-in inversion lists	Karl Williamson	2012-02-09	1	-0/+12
	This adds a routine that will take a C array and quickly create an inversion list that points to that array. Thus the array had better be exactly the internal form that is required for an inversion list. To make sure that this doesn't get out of sync, a new field in the list's header is created that is a combination of version-number/inversion-list-type.
*	regcomp.c: Add ability to take union of a complement	Karl Williamson	2012-02-09	1	-3/+7
	Previous commits have added the ability to the inversion list intersection routine to take the complement of one of its inputs. Likewise, for unions, this will be a frequent paradigm, and it is cheaper to do the complement of an input in the routine than to construct a new temporary that is the desired complement, and throw it away.
*	regcomp.c: _invlist_subtract() becomes a macro	Karl Williamson	2012-02-09	1	-4/+2
	This function is no longer necessary, as it is just a call to the newly created _invlist_intersection_maybe_complement_2nd() with the correct parameters.
*	regcomp.c: Add ability to take intersection of complement	Karl Williamson	2012-02-09	1	-4/+8
	It turns out that it is a common paradigm to want to take the intersection of an inversion list with the complement of another inversion list. In fact, this is how to subtract the second inversion list from the first, as what remains in the first after the subtraction is everything in it that is not in the second. It also turns out that it adds very few cycles to an intersection to complement one (or both, should we choose to) of the operands. By adding this capability, we don't have to create a copy of the inverted operand beforehand, just to throw it away.
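A small sketch may help show why complementing an operand during the intersection is nearly free. It assumes a simplified inversion list, a sorted array of code points in which even-indexed elements start ranges inside the set and odd-indexed elements start ranges outside it; this is not Perl's internal layout, and the function name sketch_intersect is hypothetical. Complementing the second operand only flips its starting state in the sweep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified inversion list (an assumption for this sketch): a sorted array
 * where the element at an even index starts a range that is in the set and
 * the element at an odd index starts a range that is not.
 * Example: {0x41, 0x5B} is the set 'A'..'Z'. */

/* Hypothetical helper: writes a AND b into out, or a AND NOT b when
 * complement_b is nonzero. `out` must have room for na + nb elements.
 * Returns the number of elements written. */
static size_t sketch_intersect(const uint32_t *a, size_t na,
                               const uint32_t *b, size_t nb,
                               int complement_b,
                               uint32_t *out)
{
    assert(out != NULL);

    size_t i = 0, j = 0, n = 0;
    int in_a = 0;
    int in_b = complement_b ? 1 : 0;  /* the complement just flips b's start state */
    int in_out = 0;

    while (i < na || j < nb) {
        /* The next boundary is the smaller of the two list heads. */
        uint32_t cp;
        if (j >= nb || (i < na && a[i] <= b[j]))
            cp = a[i];
        else
            cp = b[j];

        /* Toggle membership for whichever lists have a boundary here. */
        if (i < na && a[i] == cp) { in_a = !in_a; i++; }
        if (j < nb && b[j] == cp) { in_b = !in_b; j++; }

        /* Emit a boundary only when the combined membership changes. */
        int now = in_a && in_b;
        if (now != in_out) {
            out[n++] = cp;
            in_out = now;
        }
    }
    return n;
}
```

Subtracting the uppercase vowels from 'A'..'Z' is then a single call with complement_b set, and no temporary complemented list is built.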
*	regcomp.c: Chg invlist_union() to accept NULL first param	Karl Williamson	2012-02-09	1	-2/+1
	It is common in a loop to keep adding inversion lists to a current running total. But the first time through, the current union list needs to be initialized from NULL. This puts that code in the function instead of the callers each having to do it.
*	really allow pad_findmy's retval to be ignored	Zefram	2012-02-01	1	-2/+0
	402642c6301a1dbc64ea3acc8beee35078afee26 only changed pad_findmy_pvn. pad_findmy_pv and pad_findmy_sv need the same treatment.
*	Allow pad_findmy’s retval to be ignored	Father Chrysostomos	2012-02-01	1	-1/+0
*	Make SvPVbyte return bytes for non-PVs	Father Chrysostomos	2012-01-31	1	-1/+1
	Instead of just doing SvPV on something that is not a PV, SvPVbyte should actually do what it is advertised as doing.
*	[perl #108994] Stop SvPVutf8 from coercing SVs	Father Chrysostomos	2012-01-31	1	-1/+1
	It shouldn’t destroy globs or references passed to it, or try to coerce them if they are read-only or incoercible. I added tests for SvPVbyte at the same time, even though it was not exhibiting the same problems, as sv_utf8_downgrade doesn’t try to coerce anything. (SvPVbyte has its own set of bugs, which I hope to fix in forthcoming commits.)
*	Move amagic hint checking to new function	Father Chrysostomos	2012-01-24	1	-0/+1
	so that stringification will be able to use it, too.
*	[rt.cpan.org #74289] Don’t make *CORE::foo read-only	Father Chrysostomos	2012-01-23	1	-0/+1
	newATTRSUB requires the sub name to be passed to it wrapped up in a const op. Commit 8756617677dbd allowed it to accept a GV that way, since S_maybe_add_coresub (in gv.c) needed to pass it an existing GV not in the symbol table yet (to simplify code elsewhere).
	This had the inadvertent side-effect of making the GV read-only, since that’s what the check function for const ops does. Even if we were to call this a feature, it wouldn’t make sense as implemented, as GVs for non-ampable (&-able) subs like *CORE::chdir were not being made read-only.
	This commit adds a new flag to newATTRSUB, to allow a GV to be passed as the o parameter, instead of an op. While this may look as though it’s undoing the simplification in commit 8756617677dbd by adding more code, the new code is still conceptually simpler and more straightforward.
	Since newATTRSUB is in the API, I had to add a new _flags variant. (How did newATTRSUB get into the API to begin with?)
	In adding a test, I also discovered that ‘used once’ warnings were applying to these subs, which is obviously wrong. Commit 8756617677dbd caused that, too, as it was relying on the side-effect of newATTRSUB doing a GV lookup. This fixes that, too, by turning on the multi flag in S_maybe_add_coresub.
*	regcomp.c: Change variable meaning and hence name	Karl Williamson	2012-01-19	1	-2/+2
	I think it is clearer to note that what happens here is that the node can match fewer characters than what it would normally be thought to, and hence the returned value should be subtracted; it also means that the absolute value need not be taken.
*	regcomp.c: Refactor join_exact() to eliminate extra passes	Karl Williamson	2012-01-19	1	-3/+4
	The strings in every EXACTFish node are examined for certain problematic sequences and code points. Prior to this patch, this was done in several passes, but this refactors the routine to do it in a single pass.
*	regcomp.c: Change param to join_exact()	Karl Williamson	2012-01-19	1	-2/+2
	This changes a parameter of this function so that, instead of updating a running total, it returns the actual value computed by the function; the calling code is changed to compensate.
*	regexec.c: Allow for returning shared swash	Karl Williamson	2012-01-13	1	-0/+6
	This changes the function that returns the swash associated with a bracketed character class so that it returns the original swash and not a copy. The function is renamed and made accessible only from within regexec.c, and a new wrapper function with the original name is created that just calls the other one and returns a copy of the swash. Thus, all access from outside regexec.c will use a copy which if overwritten will not harm others; while the option exists from within regexec.c to use a shared version.
*	regcomp.c: Add _invlist_contents() to compactly dump inversion list	Karl Williamson	2012-01-13	1	-0/+6
	This will be used in future commits for debug traces.
*	utf8.c: Add ability to pass inversion list to _core_swash_init()	Karl Williamson	2012-01-13	1	-1/+1
	Add a new parameter to _core_swash_init() that is an inversion list to add to the swash, along with a boolean to indicate if this inversion list is derived from a user-defined property. This capability will prove useful in future commits.
*	utf8.c: Add flag to swash_init() to not croak on error	Karl Williamson	2012-01-13	1	-1/+1
	This adds the capability, to be used in future commits, for swash_init() to return NULL instead of croaking if it can't find a property, so that the caller can choose how to handle the situation.
*	regcomp.c: Add _invlist_populate_swatch()	Karl Williamson	2012-01-13	1	-0/+6
	This function will be used in future commits.
*	regcomp.c: Add invlist_search()	Karl Williamson	2012-01-13	1	-0/+6
	This function does a binary search on an inversion list. It will be used in future commits.
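As a rough sketch of what a binary search over an inversion list looks like, assume a simplified list: a sorted array where even-indexed elements start ranges in the set and odd-indexed elements start ranges outside it. The names invlist_search_sketch and invlist_contains_sketch are hypothetical, and the real invlist_search() may differ in signature and return convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index of the last element <= cp,
 * or -1 if cp is smaller than every element (or the list is empty). */
static ptrdiff_t invlist_search_sketch(const uint32_t *list, size_t n,
                                       uint32_t cp)
{
    assert(list != NULL || n == 0);

    size_t lo = 0, hi = n;              /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;               /* answer is at mid or to its right */
        else
            hi = mid;                   /* answer is to the left of mid */
    }
    return (ptrdiff_t)lo - 1;           /* lo counts the elements <= cp */
}

/* cp is in the set when the range it falls in starts at an even index. */
static int invlist_contains_sketch(const uint32_t *list, size_t n,
                                   uint32_t cp)
{
    ptrdiff_t i = invlist_search_sketch(list, n, cp);
    return i >= 0 && (i % 2) == 0;
}
```

For the list {0x41, 0x5B, 0x61, 0x7B} (ASCII letters), 'Z' lands on index 0 and is in the set, while '[' lands on index 1 and is not.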
*	utf8.c: New function to retrieve non-copy of swash	Karl Williamson	2012-01-13	1	-0/+9
	Currently, swash_init returns a copy of the swash it finds. The core portions of the swash are read-only, and the non-read-only portions are derived from them. When the value for a code point is looked up, the results for it and adjacent code points are stored in a new element, so that the lookup never has to be performed again. But since a copy is returned, those results are stored only in the copy, and any other uses of the same logical swash don't have access to them, so the lookups have to be performed for each logical use.
	Here's an example. If you have 2 occurrences of /\p{Upper}/ in your program, there are 2 different swashes created, both initialized identically. As you start matching against code points, say "A" =~ /\p{Upper}/, the swashes diverge, as the results for each match are saved in the one applicable to that match. If you match "A" in each swash, it has to be looked up in each swash, and an (identical) element will be saved for it in each swash. This is wasteful of both time and memory.
	This patch renames the function and returns the original and not a copy, thus eliminating the overhead for swashes accessed through the new interface. The old function name is serviced by a new function which merely wraps the new name result with a copy, thus preserving the interface for existing calls.
	Thus, in the example above, there is only one swash, and matching "A" against it results in only one new element, and so the second use will find that, and not have to go out looking again. In a program with lots of regular expressions, the savings in time and memory can be quite large.
	The new name is restricted to use only in regcomp.c and utf8.c (unless XS code cheats the preprocessor), where we will code so as to not destroy the original's data. Otherwise, a change to that would change the definition of a Unicode property everywhere in the program.
	Note that there are no current callers of the new interface; these will be added in future commits.
*	embed.fnc: swash_init() return value should not be ignored	Karl Williamson	2012-01-13	1	-0/+1
	Otherwise there can be memory leaks.
*	utf8.c: Change name of static function	Karl Williamson	2012-01-13	1	-2/+2
	This function has always confused me, as it doesn't return a swash, but a swatch.
*	need backwards-compatile to_utf8_foo()	Karl Williamson	2012-01-08	1	-0/+8
	These 4 functions have been replaced by variants to_utf8_foo_flags(), but for XS code that called the old ones in the Perl_to_utf8_foo() forms, backwards compatibility versions need to be created. For calls of just the to_utf8_foo() forms, macros have been used to automatically call the new forms without the performance penalty of going through the compatibility functions.
*	Eliminate ‘negative’ features	Father Chrysostomos	2011-12-24	1	-1/+1
	Now that we have hints in $^H to indicate the default feature bundle, there is no need for entries in %^H that turn features off by their presence.
*	[perl #29070] Add vstring set-magic	Father Chrysostomos	2011-12-23	1	-0/+6
	Some operators, like pp_complement, assign their argument to TARG (which copies vstring magic), modify it in place, and then call set-magic. That’s supposed to work, but vstring magic was remaining as it was, such that ~v7 would still be treated as "v7" by vstring-aware code, even though the resulting string is not "\7".
	This commit adds vstring set-magic that checks to see whether the pv still matches the vstring. It cannot simply free the vstring magic, as that would prevent $x=v0 from working.
*	Stop tell($glob_copy) from clearing PL_last_in_gv	Father Chrysostomos	2011-12-17	1	-0/+6
	This bug is a side effect of rv2gv’s starting to return an incoercible mortal copy of a coercible glob in 5.14:
	    $ perl5.12.4 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell'
	    0
	    $ perl5.14.0 -le 'open FH, "t/test.pl"; $fh=*FH; tell $fh; print tell'
	    -1
	In the first case, tell without arguments is returning the position of the filehandle.
	In the second case, tell with an explicit argument that happens to be a coercible glob (tell has an implicit rv2gv, so tell $fh is actually tell *$fh) sets PL_last_in_gv to a mortal copy thereof, which is freed at the end of the statement, setting PL_last_in_gv to null. So there is no ‘last used’ handle by the time we get to the tell without arguments.
	This commit adds a new rv2gv flag that tells it not to copy the glob. By doing it unconditionally on the kidop, this allows tell($fh) to work the same way.
	Let’s hope nobody does tell(*{*$fh}), which will unset PL_last_in_gv because the inner * returns a mortal copy.
	This whole area is really icky. PL_last_in_gv should be refcounted, but that would cause handles to leak out of scope, breaking programs that rely on the auto-closing ‘feature’.
*	Disable $[ under 5.16	Father Chrysostomos	2011-12-15	1	-1/+1
	This adds the array_base feature to feature.pm.
	Perl_feature_is_enabled has been modified to use PL_curcop, rather than PL_hintgv, so it can work with run-time hints as well. (PL_curcop holds the current state op at run time, and &PL_compiling at compile time, so it works for both.) The hints in $^H are not stored in the same place at compile time and run time, so the FEATURE_IS_ENABLED macro has been modified to check first whether PL_curcop == &PL_compiling.
	Since array_base is on by default with no hint for it in %^H, it is a ‘negative’ feature, whose entry in %^H turns it off. feature.pm has been modified to support such negative features. The new FEATURE_IS_ENABLED_d can check whether such default features are enabled.
	This does make things less efficient, as every version declaration now loads feature.pm to disable all features (including turning off array_base, which entails adding an entry to %^H) before loading the new bundle. I have plans to make this more efficient.
*	utf8.c: Change prototypes of two functions	Karl Williamson	2011-12-15	1	-2/+2
	_to_uni_fold_flags() and _to_fold_latin1() now have their flags parameter be a boolean. The name 'flags' is retained in case the usage ever expands instead of calling it by the name of the only use this currently has. This is as a result of confusion between this and _to_utf8_fold_flags() which does have more than one flag possibility.
*	utf8.c: Allow Changed behavior of utf8 under locale	Karl Williamson	2011-12-15	1	-13/+33
	This changes the 4 case changing functions to take extra parameters to specify if the utf8 string is to be processed under locale rules when the code points are < 256. The current functions are changed to macros that call the new versions so that current behavior is unchanged. An additional, static, function is created that makes sure that the 255/256 boundary is not crossed during the case change.
*	Adjust substr offsets when using, not when creating, lvalue	Father Chrysostomos	2011-12-04	1	-0/+8
	When substr() occurs in potential lvalue context, the offsets are adjusted to the current string (negative being converted to positive, lengths reaching beyond the end of the string being shortened, etc.) as soon as the special lvalue to be returned is created.
	When that lvalue is assigned to, the original scalar is stringified once more.
	That implementation results in two bugs:
	1) Fetch is called twice in a simple substr() assignment (except in void context, due to the special optimisation of commit 24fcb59fc).
	2) These two calls are not equivalent:
	    $SIG{__WARN__} = sub { warn "w ",shift};
	    sub myprint { print @_; $_[0] = 1 }
	    print substr("", 2);
	    myprint substr("", 2);
	The second one dies. The first one only warns. That’s mean. The error is also wrong, sometimes, if the original string is going to get longer before the substr lvalue is actually used.
	The behaviour of \substr($str, -1) if $str changes length is completely undocumented. Before 5.10, it was documented as being unreliable and subject to change.
	What this commit does is make the lvalue returned by substr remember the original arguments and only adjust the offsets when the assignment happens.
	This means that the following now prints z, instead of xyz (which is actually what I would expect):
	    $str = "a";
	    $substr = \substr($str,-1);
	    $str = "xyz";
	    print $$substr;
*	Break the -v code out from Perl_moreswitches() into S_minus_v().	Nicholas Clark	2011-12-04	1	-0/+3
*	Refactor S_usage() to take 0 parameters and exit directly().	Nicholas Clark	2011-12-04	1	-4/+2
	This simplifies the code, as it's only called from one spot, in Perl_moreswitches().
*	Make sitecustomize relocatableinc aware	Carl Hayter	2011-12-03	1	-0/+5
	When -Dusesitecustomize is used with -Duserelocatableinc, SITELIB_EXP/sitecustomize.pl is not found due to SITELIB_EXP having a '.../..' relocation path. This patch refactors the path relocation code from S_incpush() into S_mayberelocate() so that it can be used in both S_incpush() and in usesitecustomize's use of SITELIB_EXP.
*	‘Inline’ S_sv_unglob	Father Chrysostomos	2011-11-24	1	-1/+1
	S_sv_unglob is only called in one place, so inline it (but cheat, to preserve blame history).
*	Make assignment over glob copies much faster	Father Chrysostomos	2011-11-24	1	-1/+1
	sv_force_normal is passed the SV_COW_DROP_PV flag if the scalar is about to be written over. That flag is not currently used. We can speed up assignment over fake GVs a lot by taking advantage of the flag.
	Before and after:
	    $ time ./perl -e '$x = *foo, undef $x for 1..2000000'
	    real 0m4.264s
	    user 0m4.248s
	    sys 0m0.007s
	    $ time ./perl -e '$x = *foo, undef $x for 1..2000000'
	    real 0m1.820s
	    user 0m1.812s
	    sys 0m0.005s
*	Put sub redef warnings in one place	Father Chrysostomos	2011-11-21	1	-0/+8
	The logic surrounding subroutine redefinition warnings (to warn or not to warn?) was in three places. Over time, they drifted apart, to the point that newXS was following completely different rules. It was only warning for redefinition of functions in the autouse namespace. Recent commits have brought it into conformity with the other redefinition warnings.
	Obviously it’s about time we put it in one function.