summaryrefslogtreecommitdiff
path: root/intrpvar.h
Commit message (Collapse)AuthorAgeFilesLines
* Use an array for some inversion listsKarl Williamson2012-12-221-6/+2
| | | | | Previous commits have placed some inversion list pointers into arrays. This commit extends that to another group of inversion lists
* Use an array for some inversion listsKarl Williamson2012-12-221-29/+2
| | | | | An earlier commit placed some inversion list pointers into an array. This commit extends that to another group of inversion lists.
* Use array for some inversion listsKarl Williamson2012-12-221-8/+1
| | | | | | This patch creates an array pointing to the inversion lists that cover the Latin-1 ranges for Posix character classes, and uses it instead of the individual variables previously referred to.
* intrpvar.h: Place some swash pointers in an arrayKarl Williamson2012-12-221-9/+1
|
* intrpvar.h: #include handy.hKarl Williamson2012-12-221-0/+2
| | | | This will allow some mnemonics to be used in future commits
* regexec.c: More efficient Korean \X processingKarl Williamson2012-12-161-1/+0
| | | | | | This refactors the code slightly that checks for Korean precomposed syllables in \X. It eliminates the PL_variable formerly used to keep track of things.
* Zap PL_glob_indexFather Chrysostomos2012-12-091-2/+0
| | | | As of the previous commit, nothing is using it.
* Add functions for getting ctype ALNUMCKarl Williamson2012-12-091-0/+1
| | | | | | | We think this is meant to stand for C's alphanumeric, that is what is matched by POSIX [:alnum:]. There were not functions and a dedicated swash available for accessing it. Future commits will want to use these.
* intrpvar.h: Add commentKarl Williamson2012-12-091-1/+1
|
* intrpvar.h: Use #define instead of hard-coded numberKarl Williamson2012-12-091-1/+1
| | | | The number 12 is mysterious as to why we are using it otherwise.
* Disable PL_sawampersandFather Chrysostomos2012-11-271-0/+2
| | | | | | | | | | | | | | | | | | | PL_sawampersand actually causes bugs (e.g., perl #4289), because the behaviour changes. eval '$&' after a match will produce different results depending on whether $& was seen before the match. Using copy-on-write for the pre-match copy (preceding patches do that) alleviates the slowdown caused by mentioning $&. The copy doesn’t happen unless the string is modified after the match. It’s now a post- match copy. So we no longer need to do things differently depending on whether $& has been seen. PL_sawampersand is now #defined to be equal to what it would be if every program began with $',$&,$`. I left the PL_sawampersand code in place, in case this commit proves immature. Running Configure with -Accflags=PERL_SAWAMPERSAND will reënable the PL_sawampersand mechanism.
* Remove 3 unused interpreter variablesKarl Williamson2012-11-261-3/+0
| | | | | | | | | | | | These variables have been unused in the Perl core since commit 4c88d5e0740d796bf5064336d280bba72897f385. The variables are undocumented. The only real use of any of these I found in CPAN is at https://metacpan.org/source/ABERGMAN/Devel-GC-Helper-0.25/Helper.xs#L1 The uses there appear to be in a list of known Perl variables. Since the module was published, more than a few new variables have been added, making this code obsolete anyway.
* Hash Function Change - Murmur hash and true per process hash seedYves Orton2012-11-171-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch does the following: *) Introduces multiple new hash functions to choose from at build time. This includes Murmur-32, SDBM, DJB2, SipHash, SuperFast, and One-at-a-time. Currently this is handled by muning hv.h. Configure support hopefully to follow. *) Changes the default hash to Murmur hash which is faster than the old default One-at-a-time. *) Rips out the old HvREHASH mechanism and replaces it with a per-process random hash seed. *) Changes the old PL_hash_seed from an interpreter value to a global variable. This means it does not have to be copied during interpreter setup or cloning. *) Changes the format of the PERL_HASH_SEED variable to a hex string so that hash seeds longer than fit in an integer are possible. *) Changes the return of Hash::Util::hash_seed() from a number to a string. This is to accomodate hash functions which have more bits than can be fit in an integer. *) Adds new functions to Hash::Util to improve introspection of hashes -) hash_value() - returns an integer hash value for a given string. -) bucket_info() - returns basic hash bucket utilization info -) bucket_stats() - returns more hash bucket utilization info -) bucket_array() - which keys are in which buckets in a hash More details on the new hash functions can be found below: Murmur Hash: (v3) from google, see http://code.google.com/p/smhasher/wiki/MurmurHash3 Superfast Hash: From Paul Hsieh. http://www.azillionmonkeys.com/qed/hash.html DJB2: a hash function from Daniel Bernstein http://www.cse.yorku.ca/~oz/hash.html SDBM: a hash function sdbm. http://www.cse.yorku.ca/~oz/hash.html SipHash: by Jean-Philippe Aumasson and Daniel J. Bernstein. https://www.131002.net/siphash/ They have all be converted into Perl's ugly macro format. I have not done any rigorous testing to make sure this conversion is correct. They seem to function as expected however. All of them use the random hash seed. You can force the use of a given function by defining one of PERL_HASH_FUNC_MURMUR PERL_HASH_FUNC_SUPERFAST PERL_HASH_FUNC_DJB2 PERL_HASH_FUNC_SDBM PERL_HASH_FUNC_ONE_AT_A_TIME Setting the environment variable PERL_HASH_SEED_DEBUG to 1 will make perl output the current seed (changed to hex) and the hash function it has been built with. Setting the environment variable PERL_HASH_SEED to a hex value will cause that value to be used at the seed. Any missing bits of the seed will be set to 0. The bits are filled in from left to right, not the traditional right to left so setting it to FE results in a seed value of "FE000000" not "000000FE". Note that we do the hash seed initialization in perl_construct(). Doing it via perl_alloc() (via init_tls) causes problems under threaded builds as the buffers used for reentrant srand48 functions are not allocated. See also the p5p mail "Hash improvements blocker: portable random code that doesnt depend on a functional interpreter", Message-ID: <CANgJU+X+wNayjsNOpKRqYHnEy_+B9UH_2irRA5O3ZmcYGAAZFQ@mail.gmail.com>
* Validate above-Latin1 characters in \N{} aliasesKarl Williamson2012-11-111-0/+2
| | | | | | | | | | This completes the process of allowing users to define their own aliases for \N{} in any language they choose. Names have some validation applied so that they can't, for example, begin with something that is a digit in some Unicode script. Tests and documentation are included in this patch. The loop in toke.c that does the validation for user-supplied translators is revamped, and the messages that are output when there is an error are fixed to work with UTF-8.
* Used pad name lists for pad idsFather Chrysostomos2012-10-161-1/+0
| | | | | | | | | | | | | | | | | I added pad IDs so that a pad could record which pad it closes over, to avoid problems with closures closing over the wrong pad, resulting in crashes or bizarre copies. These pad IDs were shared between clones of the same pad. In commit 9ef8d56, for efficiency I made clones of the same closure share the same pad name list. It has just occurred to be that each padlist containing the same pad name list also has the same pad ID, so we can just use the pad name list itself as the ID. This makes padlists 32 bits smaller and eliminates PL_pad_generation from the interpreter struct.
* PATCH: [perl #89774] multi-char fold + its fold in char classKarl Williamson2012-10-141-0/+1
| | | | | | | | | | | | | | | | | | The design for handling characters that fold to multiple characters when the former are encountered in a bracketed character class is defective. The ticket reads, "If a bracketed character class includes a character that has a multi-char fold, and it also includes the first character of that fold, the multi-char fold will never be matched; just the first character of the fold.". Thus, in the class /[\0-\xff]/i, \xDF will never be matched, because its fold is 'ss', the first character of which, 's', is also in the class. The reason the design is defective is that it doesn't allow for backtracking and trying the other options. This commit solves this by effectively rewriting the above to be / (?: \xdf | [\0-\xde\xe0-\xff] ) /xi. And so the backtracking gets handled automatcially by the regex engine.
* Eliminate the vestigial comment "magical thingies" from intrpvar.hNicholas Clark2012-09-201-1/+0
| | | | | | | | | | | | | | The original comment "magical thingies" was added to perl.c by commit 8ebc5c0145d2e355 in Jan 1997. It is above code which frees 4 interpreter- global SVs in perl_destruct. The comment "magical thingies" was in intrpvar.h since the file was created by commit 49f531dad558d800 on 29 Nov 1997. At that time, it was followed by a block of 13 relevant interpreter global variables. However, by commit d4cce5f1785350c2 (30 Nov 1997) all bar two were now in other places, mostly in thrdvar.h. With the abolition of PL_formfeed, the comment now annotates just one "magical" thingy, PL_basetime, which isn't even one of the SVs freed at the analogous location in perl.c. Hence the comment adds no value.
* Get rid of PL_formfeed.Enache Adrian2012-09-201-2/+0
| | | | | | | | | | | | | | | | $^L is neither a magical variable, nor a normal one (like $;) but it's just a little bit special :) This patch removes PL_formfeed - IMHO, an extra gv_fetchpv per page when using formats isn't going to cause a sensible speed regression. I suppose that removing the intrpvar.h hunk from the patch is enough to keep binary compatibility - unless someone used PL_formfeed from an XS module. [with regen.pl run as noted by the author, and an additional change to perl.c to remove the reference to PL_formfeed added soon after this patch was sent]
* Use macro not swash for utf8 quotemetaKarl Williamson2012-09-131-1/+0
| | | | | | | | | | | | | | The rules for matching whether an above-Latin1 code point are now saved in a macro generated from a trie by regen/regcharclass.pl, and these are now used by pp.c to test these cases. This allows removal of a wrapper subroutine, and also there is no need for dynamic loading at run-time into a swash. This macro is about as big as I'm comfortable compiling in, but it saves the building of a hash that can grow over time, and removes a subroutine and interpreter variables. Indeed, performance benchmarks show that it is about the same speed as a hash, but it does not require having to load the rules in from disk the first time it is used.
* regexec.c: Use new macros instead of swashesKarl Williamson2012-09-131-7/+0
| | | | | | | | | | A previous commit has caused macros to be generated that will match Unicode code points of interest to the \X algorithm. This patch uses them. This speeds up modern Korean processing by 15%. Together with recent previous commits, the throughput of modern Korean under \X has more than doubled, and is now comparable to other languages (which have increased themselved by 35%)
* PL_sawampersand: use 3 bit flags rather than boolDavid Mitchell2012-09-081-1/+1
| | | | | | | | Set a separate flag for each of $`, $& and $'. It still works fine in boolean context. This will allow us to have more refined control over what parts of a match string to copy (we currently copy the whole string).
* Refactor \X regex handling to avoid a typical case table lookupKarl Williamson2012-08-281-1/+1
| | | | | | | | | Prior to this commit 98.4% of Unicode code points that went through \X had to be looked up to see if they begin a grapheme cluster; then looked up again to find that they didn't require special handling. This commit refactors things so only one look-up is required for those 98.4%. It changes the table generated by mktables to accomplish this, and hence the name of it, and references to it are changed to correspond.
* Prepare for Unicode 6.2Karl Williamson2012-08-261-1/+2
| | | | | | | | | | | This changes code to be able to handle Unicode 6.2, while continuing to handle all prevrious releases. The major change was a new definition of \X, which adds a property to its calculation. Unfortunately \X is hard-coded into regexec.c, and so has to revised whenever there is a change of this magnitude in Unicode, which fortunately isn't all that often. I refactored the code in mktables to make it easier next time there is a change like this one.
* Comment out unused functionKarl Williamson2012-08-251-1/+0
| | | | | | In looking at \X handling, I noticed that this function which is intended for use in it, actually isn't used. This function may someday be useful, so I'm leaving the source in.
* Use new types for comppad and comppad_nameFather Chrysostomos2012-08-211-2/+2
| | | | | | I know that a few times I’ve looked at perl source files to find out what type to use in ‘<type> foo = PL_whatever’. So I am changing intrpvar.h as well as the api docs.
* Fix format closure bug with redefined outer subFather Chrysostomos2012-08-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CVs close over their outer CVs. So, when you write: my $x = 52; sub foo { sub bar { sub baz { $x } } } baz’s CvOUTSIDE pointer points to bar, bar’s CvOUTSIDE points to foo, and foo’s to the main cv. When the inner reference to $x is looked up, the CvOUTSIDE chain is followed, and each sub’s pad is looked at to see if it has an $x. (This happens at compile time.) It can happen that bar is undefined and then redefined: undef &bar; eval 'sub bar { my $x = 34 }'; After this, baz will still refer to the main cv’s $x (52), but, if baz had ‘eval '$x'’ instead of just $x, it would see the new bar’s $x. (It’s not really a new bar, as its refaddr is the same, but it has a new body.) This particular case is harmless, and is obscure enough that we could define it any way we want, and it could still be considered correct. The real problem happens when CVs are cloned. When a CV is cloned, its name pad already contains the offsets into the parent pad where the values are to be found. If the outer CV has been undefined and redefined, those pad offsets can be com- pletely bogus. Normally, a CV cannot be cloned except when its outer CV is running. And the outer CV cannot have been undefined without also throwing away the op that would have cloned the prototype. But formats can be cloned when the outer CV is not running. So it is possible for cloned formats to close over bogus entries in a new parent pad. In this example, \$x gives us an array ref. It shows ARRAY(0xbaff1ed) instead of SCALAR(0xdeafbee): sub foo { my $x; format = @ ($x,warn \$x)[0] . } undef &foo; eval 'sub foo { my @x; write }'; foo __END__ And if the offset that the format’s pad closes over is beyond the end of the parent’s new pad, we can even get a crash, as in this case: eval 'sub foo {' . '{my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l,$m,$n,$o,$p,$q,$r,$s,$t,$u)}'x999 . q| my $x; format = @ ($x,warn \$x)[0] . } |; undef &foo; eval 'sub foo { my @x; my $x = 34; write }'; foo(); __END__ So now, instead of using CvROOT to identify clones of CvOUTSIDE(format), we use the padlist ID instead. Padlists don’t actually have an ID, so we give them one. Any time a sub is cloned, the new padlist gets the same ID as the old. The format needs to remember what its outer sub’s padlist ID was, so we put that in the padlist struct, too.
* regcomp.c: Fix multi-char fold bugKarl Williamson2012-08-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Input text to be matched under /i is placed in EXACTFish nodes. The current limit on such text is 255 bytes per node. Even if we raised that limit, it will always be finite. If the input text is longer than this, it is split across 2 or more nodes. A problem occurs when that split occurs within a potential multi-character fold. For example, if the final character that fits in a node is 'f', and the next character is 'i', it should be matchable by LATIN SMALL LIGATURE FI, but because Perl isn't structured to find multi-char folds that cross node boundaries, we will miss this it. The solution presented here isn't optimum. What we do is try to prevent all EXACTFish nodes from ending in a character that could be at the beginning or middle of a multi-char fold. That prevents the problem. But in actuality, the problem only occurs if the input text is actually a multi-char fold, which happens much less frequently. For example, we try to not end a full node with an 'f', but the problem doesn't actually occur unless the adjacent following node begins with an 'i' (or one of the other characters that 'f' participates in). That is, this patch splits when it doesn't need to. At the point of execution for this patch, we only know that the final character that fits in the node is that 'f'. The next character remains unparsed, and could be in any number of forms, a literal 'i', or a hex, octal, or named character constant, or it may need to be decoded (from 'use encoding'). So look-ahead is not really viable. So finding if a real multi-character fold is involved would have to be done later in the process, when we have full knowledge of the nodes, at the places where join_exact() is now called, and would require inserting a new node(s) in the middle of existing ones. This solution seems reasonable instead. It does not yet address named character constants (\N{}) which currently bypass the code added here.
* Eliminate PL_OP_SLAB_ALLOCFather Chrysostomos2012-07-121-11/+0
| | | | | | | | | | | | This commit eliminates the old slab allocator. It had bugs in it, in that ops would not be cleaned up properly after syntax errors. So why not fix it? Well, the new slab allocator *is* the old one fixed. Now that this is gone, we don’t have to worry as much about ops leak- ing when errors occur, because it won’t happen any more. Recent commits eliminated the only reason to hang on to it: PERL_DEBUG_READONLY_OPS required it.
* PERL_DEBUG_READONLY_OPS with the new allocatorFather Chrysostomos2012-07-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | I want to eliminate the old slab allocator (PL_OP_SLAB_ALLOC), but this useful debugging tool needs to be rewritten for the new one first. This is slightly better than under PL_OP_SLAB_ALLOC, in that CVs cre- ated after the main CV starts running will get read-only ops, too. It is when a CV finishes compiling and relinquishes ownership of the slab that the slab is made read-only, because at that point it should not be used again for allocation. BEGIN blocks are exempt, as they are processed before the Slab_to_ro call in newATTRSUB. The Slab_to_ro call must come at the very end, after LEAVE_SCOPE, because otherwise the ops freed via the stack (the SAVEFREEOP calls near the top of newATTRSUB) will make the slab writa- ble again. At that point, the BEGIN block has already been run and its slab freed. Maybe slabs belonging to BEGIN blocks can be made read-only later. Under PERL_DEBUG_READONLY_OPS, op slabs have two extra fields to record the size and readonliness of each slab. (Only the first slab in a CV’s slab chain uses the readonly flag, since it is conceptually simpler to treat them all as one unit.) Without recording this infor- mation manually, things become unbearably slow, the tests taking hours and hours instead of minutes.
* handy.h: Fix isBLANK_uni and isBLANK_utf8Karl Williamson2012-06-291-0/+1
| | | | | | | | | | These macros have never worked outside the Latin1 range, so this extends them to work. There are no tests I could find for things in handy.h, except that many of them are called all over the place during the normal course of events. This commit adds a new file for such testing, containing for now only with a few tests for the isBLANK's
* eliminate PL_reginterp_cntDavid Mitchell2012-06-131-2/+0
| | | | | | This used to be the mechanism to determine whether "use re 'eval'" needed to be in scope; but now that we make a clear distinction between literal and runtime code blocks, it's no longer needed.
* [perl #78742] Store CopSTASH in a pad under threadsFather Chrysostomos2012-06-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this commit, a pointer to the cop’s stash was stored in cop->cop_stash under non-threaded perls, and the name and name length were stored in cop->cop_stashpv and cop->cop_stashlen under ithreads. Consequently, eval "__PACKAGE__" would end up returning the wrong package name under threads if the current package had been assigned over. This commit changes the way cops store their stash under threads. Now it is an offset (cop->cop_stashoff) into the new PL_stashpad array (just a mallocked block), which holds pointers to all stashes that have code compiled in them. I didn’t use the lexical pads, because CopSTASH(cop) won’t work unless PL_curpad is holding the right pad. And things start to get very hairy in pp_caller, since the correct pad isn’t anywhere easily accessible on the context stack (oldcomppad actually referring to the current comppad). The approach I’ve followed uses far less code, too. In addition to fixing the bug, this also saves memory. Instead of allocating a separate PV for every single statement (to hold the stash name), now all lines of code in a package can share the same stashpad slot. So, on a 32-bit OS X, that’s 16 bytes less memory per COP for short package names. Since stashoff is the same size as stashpv, there is no difference there. Each package now needs just 4 bytes in the stashpad for storing a pointer. For speed’s sake PL_stashpadix stores the index of the last-used stashpad offset. So only when switching packages is there a linear search through the stashpad.
* Excise PL_amagic_generationFather Chrysostomos2012-05-231-2/+0
| | | | | | | | | The core is not using it any more. Every CPAN module that increments it also does newXS, which triggers mro_method_changed_in, which is sufficient; so nothing will break. So, to keep those modules compiling, PL_amagic_generation is now an alias to PL_na outside the core.
* Remove gete?[ug]id cachingÆvar Arnfjörð Bjarmason2012-02-181-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we cache the UID/GID and effective UID/GID similarly to how we used to cache getpid() before v5.14.0-251-g0e21945. Remove this magical behavior in favor of always calling getpid(), getgid() etc. This resolves RT #96208. A minimal testcase for this is the following by Leon Timmermans attached to RT #96208: eval { require 'syscall.ph'; 1 } or eval { require 'sys/syscall.ph'; 1 } or die $@; if (syscall(&SYS_setuid, $ARGV[0] + 0 || 1000) >= 0 or die "$!") { printf "\$< = %d, getuid = %d\n", $<, syscall(&SYS_getuid); } I.e. if we call the sete?[ug]id() functions unbeknownst to perl the $<, $>, $( and $) variables won't be updated. This results in the same sort of issues we had with $$ before v5.14.0-251-g0e21945, and getppid() before my v5.15.7-407-gd7c042c patch. I'm completely eliminating the PL_egid, PL_euid, PL_gid and PL_uid variables as part of this patch, this will break some CPAN modules, but it'll be really easy before the v5.16.0 final to reinstate them. I'd like to remove them to see what breaks, and how easy it is to fix it. These variables are not part of the public API, and the modules using them could either use the Perl_gete?[ug]id() functions or are working around the bug I'm fixing with this commit. The new PL_delaymagic_(egid|euid|gid|uid) variables I'm adding are *only* intended to be used internally in the interpreter to facilitate the delaymagic in Perl_pp_sassign. There's probably some way not to export these to programs that embed perl, but I haven't found out how to do that.
* perl #77654: quotemeta quotes non-ASCII consistentlyKarl Williamson2012-02-151-0/+1
| | | | | | | | | | As described in the pod changes in this commit, this changes quotemeta() to consistenly quote non-ASCII characters when used under unicode_strings. The behavior is changed for these and UTF-8 encoded strings to more closely align with Unicode's recommendations. The end result is that we *could* at some future point start using other characters as metacharacters than the 12 we do now.
* Further eliminate POSIX-emulation under LinuxThreadsÆvar Arnfjörð Bjarmason2012-02-151-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under POSIX threads the getpid() and getppid() functions return the same values across multiple threads, i.e. threads don't have their own PID's. This is not the case under the obsolete LinuxThreads where each thread has a different PID, so getpid() and getppid() will return different values across threads. Ever since the first perl 5.0 we've returned POSIX-consistent semantics for $$, until v5.14.0-251-g0e21945 when the getpid() cache was removed. In 5.8.1 Rafael added further explicit POSIX emulation in perl-5.8.0-133-g4d76a34 [1] by explicitly caching getppid(), so that multiple threads would always return the same value. I don't think all this effort to emulate POSIX sematics is worth it. I think $$ and getppid() are OS-level functions that should always return the same as their C equivalents. I shouldn't have to use a module like Linux::Pid to get the OS version of the return values. This is pretty much a complete non-issue in practice these days, LinuxThreads was a Linux 2.4 thread implementation that nobody maintains anymore[2], all modern Linux distros use NPTL threads which don't suffer from this discrepancy. Debian GNU/kFreeBSD does use LinuxThreads in the 6.0 release, but they too will be moving away from it in future releases, and really, nobody uses Debian GNU/kFreeBSD anyway. This caching makes it unnecessarily tedious to fork an embedded Perl interpreter. When someone that constructs an embedded perl interpreter and forks their application, the fork(2) system call isn't going to run Perl_pp_fork(), and thus the return value of $$ and getppid() doesn't reflect the current process. See [3] for a bug in uWSGI related to this, and Perl::AfterFork on the CPAN for XS code that you need to run after forking a PerlInterpreter unbeknownst to perl. We've already been failing the tests in t/op/getpid.t on these Linux systems that nobody apparently uses, the Debian GNU/kFreeBSD users did notice and filed #96270, this patch fixes that failure by changing the tests to test for different behavior under LinuxThreads, I've tested that this works on my Debian GNU/kFreeBSD 6.0.4 virtual machine. If this change is found to be unacceptable (i.e. we want to continue to emulate POSIX thread semantics for the sake of LinuxThreads) we also need to revert v5.14.0-251-g0e21945, because currently we're only emulating POSIX semantics for getppid(), not getpid(). But I don't think we should do that, both v5.14.0-251-g0e21945 and this commit are awesome. This commit includes a change to embedvar.h made by "make regen_headers". 1. http://www.nntp.perl.org/group/perl.perl5.porters/2002/08/msg64603.html 2. http://pauillac.inria.fr/~xleroy/linuxthreads/ 3. http://projects.unbit.it/uwsgi/ticket/85
* intrpvar.h: Rmv no longer used PL_ variableKarl Williamson2012-02-111-2/+0
| | | | | Commit 24caacbccae7b938deecdcc3f13dd66c9c6a684e removed all uses of this variable, but failed to remove it.
* regcomp.c: /[[:lower:]]/i should match the same as /\p{Lower}/iKarl Williamson2012-02-111-0/+2
| | | | | | Same for [[:upper:]] and \p{Upper}. These were matching instead all of [[:alpha:]] or \p{Alpha}. What /\p{Lower}/i and /\p{Upper}/i match instead is \p{Cased}, and so that is what these should match.
* Add compile-time inversion lists for POSIX classesKarl Williamson2012-02-091-0/+45
| | | | | | | | | | | | | | These will be used in regcomp.c to replace the existing bit-wise handling of these, enabling subsequent optimizations. These are compiled-in, and hence affect the memory footprint of every program, including those that don't use Unicode. The lists that aren't tiny are therefore currently restricted to only the Latin1 range; anything needed beyond that will have to be read in at execution time, just as before. The design allows for easy conversion from Latin1 to use the full Unicode range, should it be deemed desirable for some or all of these.
* regcomp.c: Use compile-time invlistsKarl Williamson2012-02-091-1/+6
| | | | | | This creates three simple compile-time inversion lists from the data that has been generated in a previous commit, and uses two of them. Three PL_ variables are used to store them.
* intrpvar.h: clarification in commentKarl Williamson2012-01-131-1/+1
|
* Re-order intrpvar.h to avoid false warnings about holes.Nicholas Clark2011-11-111-1/+1
| | | | | | | | | | Under the default configuration options for ithreads on x86_64 *nix, PERL_IMPLICIT_CONTEXT is defined. The variables specific to this are at the end of the interpreter struct, and their size is not an integer multiple of its alignment constraint. Hence there will always be a "hole". Move the "hole" so that it is beyond the end of the structure. This avoids the Linux tool "pahole", used for finding wasted space, from a false positive report of a hole that can't be avoided.
* Re-order intrpvar.h to avoid holes in the interpreter struct.Nicholas Clark2011-11-111-8/+6
| | | | | | | | Because commit 45d91b83242e0418 needed to change a buffer size in a per-thread variable, it created a hole in the ithreads interpreter struct, as structure members after the buffer must be word aligned. Re-order various structure members to avoid the hole.
* intrpvar.h: Increase size of variable that stores UTF8 bytesKarl Williamson2011-11-091-2/+2
| | | | | | A Perl utf8 string can occupy 13 bytes. This only accepted up to 11, causing a swash assertion failure for very large code points matching Unicode properties.
* Fix CORE::globFather Chrysostomos2011-10-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | This commit makes CORE::glob bypassing glob overrides. A side effect of the fix is that, with the default glob implementa- tion, undefining *CORE::GLOBAL::glob no longer results in an ‘unde- fined subroutine’ error. Another side effect is that compilation of a glob op no longer assumes that the loading of File::Glob will create the *CORE::GLOB::glob type- glob. ‘++$INC{"File/Glob.pm"}; sub File::Glob::csh_glob; eval '<*>';’ used to crash. This is accomplished using a mechanism similar to lock() and threads::shared. There is a new PL_globhook interpreter varia- ble that pp_glob calls when there is no override present. Thus, File::Glob (which is supposed to be transparent, as it *is* the built-in implementation) no longer interferes with the user mechanism for overriding glob. This removes one tier from the five or so hacks that constitute glob’s implementation, and which work together to make it one of the buggiest and most inconsistent areas of Perl.
* Remove part of intrpvar.h commentFather Chrysostomos2011-10-241-3/+1
| | | | This second sentence is no longer true as of 87b9e160.
* utf8.c: Add function to retrieve new _Perl_IDStart propKarl Williamson2011-10-011-0/+1
|
* Don't use swash to find cntrlsKarl Williamson2011-10-011-1/+0
| | | | | | | | | Unicode stability policy guarantees that no code points will ever be added to the control characters beyond those already in it. All such characters are in the Latin1 range, and so the Perl core already knows which ones those are, and so there is no need to go out to disk and create a swash for these.
* No need for swashes for properties that are ASCII-onlyKarl Williamson2011-10-011-3/+0
| | | | | | These three properties are restricted to being true only for ASCII characters. That information is compiled into Perl, so no need to create swashes for them.
* No need for swashes for computing if ASCIIKarl Williamson2011-10-011-1/+0
| | | | | This information is trivially computed via the macro, no need to go out to disk and store a swash for this.