path: root/t
Commit message | Author | Age | Files | Lines
* test for error set on socket() failure | Tony Cook | 2019-02-20 | 1 | -0/+12
|
* re/user_prop_race_thr.t: reduce timeout | David Mitchell | 2019-02-19 | 1 | -1/+1
| | | | | | | | | | This new test script has a test that's supposed to exercise an up-to-10s wait-and-retry loop when loading properties. It has a 500s timeout built in, in case that fails. On my system it has been intermittently failing (I'm not sure whether that is due to something I'm doing or to a problem with the test or with regcomp.c), which effectively hangs the test run. So decrease the timeout to 25 secs.
* PATCH: [perl #133767] Assertion failure | Karl Williamson | 2019-02-16 | 1 | -0/+1
| | | | | | | | | | | The problem here is that a syntax error occurs and hence certain things don't get done, but processing continues, as the error isn't checked for until after the return of the function that found it. The failing assertion is checking that those certain things actually did get done. There appear to be good reasons to defer the raising of the error until then, so the simplest way to fix this is to generalize the code so that the failing assertion doesn't happen.
* Add .t for testing user-defined \p{} races | Karl Williamson | 2019-02-14 | 1 | -0/+117
|
* t/re/regexp_unicode_prop.t: Make sure sub called only once | Karl Williamson | 2019-02-14 | 1 | -5/+18
| | | | | | | User-defined properties are supposed to be called just once for /i and once for non-/i. This adds tests for that. It turns out that this was broken in blead.
* t/re/regexp_unicode_prop.t: Add tests | Karl Williamson | 2019-02-14 | 1 | -1/+76
| | | | | Add some tests. These test various error conditions that haven't been tested before.
* t/re/regexp_unicode_prop.t: Test that can have nested pkgs | Karl Williamson | 2019-02-14 | 1 | -2/+2
| | | | That is, in \p{user-defined}
* t/re/regexp_unicode_prop.t: Add some stress | Karl Williamson | 2019-02-14 | 1 | -5/+8
| | | | | This adds some trailing spaces and comments in expansion of \p{user-defined}/ to verify things work.
* t/op/taint.t: Add test | Karl Williamson | 2019-02-14 | 1 | -1/+17
|
* Move \p{user-defined} to core from utf8_heavy.pl | Karl Williamson | 2019-02-14 | 2 | -6/+6
| | | | | | | | | | | | | | This large commit moves the handling of user-defined properties to C code. This should speed it up, but the main reason to do this is to stop using swashes in this case, leaving only tr/// using them. Once that too is converted, all swash handling can be ripped out of perl. Doing this in perl has caused some nasty interactions that will now be fixed automatically. The change is not entirely transparent, however (besides speed and the possibility of removing these interactions). perldelta in this commit details these.
* t/porting/manifest.t add line number | Nicolas R | 2019-02-14 | 1 | -3/+3
| | | | | Improve t/porting/manifest.t output on errors to show the line number.
* Net::Ping 501_ping_icmpv6.t: disable sudo test | Nicolas R | 2019-02-14 | 1 | -0/+1
| | | | | This is similar to the changes made in 7bfdd8260c: we do not want to use 'sudo' during the tests.
* Update Net::Ping to upstream version 2.71 | Nicolas R | 2019-02-14 | 2 | -3/+3
|
| This retains blead customizations:
|   * 1a58b39af8 remove of 'use vars'
|   * 7bfdd8260c 500_ping_icmp.t: remove sudo code
|
| These changes are not required anymore, they are merged upstream
|   * 0fc44d0a18 avoid stderr noise in tests
* (perl #133660) add test for goto &sub in overload leaking | Tony Cook | 2019-02-13 | 1 | -1/+21
| | | | The bug in this case was fixed in db9848c8d.
* t/loc_tools.pl: C.UTF-8 is a likely locale | Karl Williamson | 2019-02-06 | 1 | -0/+1
| | | | | When looking for locales on a system, try this one, which seems to be becoming widely available.
* Add Turkish locale handling to /i pattern matching | Karl Williamson | 2019-02-05 | 1 | -1/+1
| | | | | | Previous commits in this series have changed uc(), lc(), fc(), etc. to know how to handle Turkish UTF-8 locales. This commit extends this to /i regular expression pattern matching.
* t/op/lc.t: Add tests for Turkish locales | Karl Williamson | 2019-02-05 | 1 | -9/+70
| | | | But since these aren't recognized yet, they will be skipped
* Add .t to test Turkic locale folding | Karl Williamson | 2019-02-05 | 1 | -0/+35
| | | | | | | This just calls fold_grind.pl with a particular option. But, as of this commit, Turkish locales aren't recognized specially, so this test just always skips.
* t/re/fold_grind.pl: Enhance to deal with Turkic rules | Karl Williamson | 2019-02-05 | 1 | -53/+181
| | | | | | | | | | The CaseFolding.txt file has special locale-dependent rules. This commit changed fold_grind to notice them, and to generate tests for the situation we aren't in, which are expected to fail. Since, as of this commit, the Turkic locale is not recognized, this commit has the effect of generating tests for the Turkic locale, running them, and making sure they fail when appropriate.
* t/loc_tools.pl: Add functions to find Turkic UTF-8 locales | Karl Williamson | 2019-02-05 | 1 | -2/+37
| | | | | | These will be used by later commits. But right now Perl doesn't know how to determine if a locale is Turkic, so these functions return no locale, until later in this commit series
* regcomp.c: Under /l any < 256 char can match any other | Karl Williamson | 2019-02-05 | 1 | -1/+1
| | | | | | | | | | | | The code knew this, but it was adding the ASCII alphabetics to the list of things that matched in UTF-8 locales. This is unnecessary, as we've long had the infrastructure elsewhere to handle all potential mappings from a Latin1 code point to other Latin1, so we can just rely on it. And it created complexities for future commits in this series. The MICRO SIGN is the exception, as it folds to non-Latin1 in UTF-8 locales, and this is the place where the structure exists to handle that.
* t/loc_tools.pl: Add fcn to return all UTF-8 locales | Karl Williamson | 2019-02-04 | 1 | -3/+21
| | | | This will be needed in future commits
* t/op/lc.t: Add 'use strict' | Karl Williamson | 2019-02-04 | 1 | -2/+5
|
* t/re/fold_grind.pl: White-space only | Karl Williamson | 2019-02-04 | 1 | -1/+5
| | | | Just align some logical or clauses for readability.
* regcomp.c: Fix recent optimization of [...] bug | Karl Williamson | 2019-02-04 | 1 | -0/+1
| | | | | | This bug was introduced in b2296192536090829ba6d2cb367456f4e346dcc6 in 5.29.7. Using /il should not result in looking for a [:posix:] class that matches the code points given.
* (perl #133782) set magic when changing $^R | Tony Cook | 2019-01-30 | 1 | -1/+9
| | | | | | | | | The regexp engine sets and restores $^R in a few places, but didn't mg_set() (SvSETMAGIC()) it at all. Calls to length() on $^R, both within regexp code blocks and on a successful match could add utf8 length magic to $^R, and modifying $^R without mg_set() could leave now invalid length magic.
* Split t/re/fold_grind.t into multiple test files | Karl Williamson | 2019-01-29 | 7 | -36/+178
| | | | | | | | | | | | | | | This has been a goal for a long time, but I thought it would be a lot of work, but now have realized that there was a fairly easy simplistic approach. The core file is renamed fold_grind.pl. It formerly had an outer loop which iterated over the possible character set regex pattern modifiers, /a, /l, etc that were tested. Now that loop is just a block and new wrapper files have been created, one per modifier. They just pass a global to the core file that gives which modifier this test file is to use. Hence each file corresponds to one iteration of the old outer loop, splitting the tests up into 6 smaller tests that can run in parallel.
* don't leak temp files | Tony Cook | 2019-01-28 | 1 | -1/+1
| | | | | | | | The test I added allocated more temp files, but didn't arrange for backup files to be cleaned up. Modified the cleanup to clean up every generated temp and backup file, even if more are allocated in the future with mkfiles().
* (perl #130367) separate error for push etc on hash/glob | Tony Cook | 2019-01-22 | 1 | -0/+34
|
* Test file in Encode is no longer customised | Chris 'BinGOs' Williams | 2019-01-21 | 1 | -1/+0
|
* PATCH: [perl #133756] Failure to match properly | Karl Williamson | 2019-01-10 | 1 | -0/+27
|
| This was caused by a counting error.
|
| An EXACTFish regnode has a finite length it can hold for the string being matched. If that length is exceeded, a 2nd node is used for the next segment of the string, for as many regnodes as are needed.
|
| A problem occurs if a regnode ends with one of the 22 characters in Unicode 11 that occur in non-final positions of a multi-character fold. The design of the pattern matching engine doesn't allow matches across regnodes. Consider, for example, a node that ends in the letter 'f' where the next node begins with the letter 'i'. That sequence should match, under /i, the ligature "fi" (U+FB01). But it wouldn't, because the pattern splits them across nodes.
|
| The solution I adopted was to forbid a node to end with one of those 22 characters if there is another string node that follows it. This is not foolproof: for example, if the entire node consisted of only these characters, one would have to split it at some position. (In that case, we just take as much of the string as will fit.) But for real life applications, it is good enough.
|
| What happens if a node ends with one of the 22 is that the node is shortened so that those are instead placed at the beginning of the following node. When the code encounters this situation, it backs off until it finds a character that isn't a non-final fold one, and closes the node with that one.
|
| A /i node is filled with the fold of the input, for several reasons. The most obvious is that it saves time: you can skip folding the pattern at runtime. But there are reasons based on the design of the optimizer as well, which I won't go into here, but which are documented in regcomp.c.
|
| When we back out the final characters in a node, we also have to back out the corresponding unfolded characters in the input, so that those can be (folded) into the following node. Since the number of characters in the fold may not be the same as unfolded, there is not an easily discernible correspondence between the input and the folded output. That means that generally, what has to be done is that the input is reparsed from the beginning of the node, but the permitted length has been shortened (we know precisely how much to shorten it to) so that it will end with something other than the 22. But the code saves the previous input character's position (for other reasons), so if we only have to back up one character, we can just use that and not have to reparse.
|
| This bug was that the code thought a two character backup was really a one character one, and did not reparse the node, creating an off-by-one error, and a character was simply omitted in the pattern (that should have started the following node). And the input had two of the 22 characters adjacent to each other in just the right positions that the node was split.
|
| The bisect showed that when the node size was changed the bug went away, at least for this particular input string. But a different, longer string would have triggered the bug, and this commit fixes that.
|
| This bug is actually very unlikely to occur in most real world applications. That is because other changes in the regex compiler have caused nodes to be split so that things that don't participate in folds at all are separated out into EXACT nodes. (The reason for that is that it gives the optimizer things to grab on to under /i that it wouldn't otherwise have known about.) That means that anything like this string would never cause the bug to happen, because blanks and commas, etc. would be in separate nodes, and so no node would ever get large enough to fill the 238 available byte slots in a node (235 on EBCDIC). Only a long string without punctuation would trigger it. I have artificially constructed such a string in the tests added by this commit.
|
| One of the 22 characters is 't', so long strings of DNA "ACTG" could trigger this bug. I find it somewhat amusing that this is something like a DNA transcription error, which occurs in nature at very low rates, but selection, it is believed, will make sure the error rate is above zero.
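The back-off rule described in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in regcomp.c: it works on an already-folded string, counts characters rather than bytes, and uses a hypothetical two-character subset (`f` and `t`, both mentioned above) in place of the real list of 22. The names `NON_FINAL_FOLD_CHARS` and `split_into_nodes` are invented for the example.

```python
# Hypothetical subset of the 22 characters that can appear in a non-final
# position of a multi-character fold. Only 'f' and 't' are taken from the
# commit message; the real list is longer.
NON_FINAL_FOLD_CHARS = frozenset("ft")


def split_into_nodes(folded, max_len, non_final=NON_FINAL_FOLD_CHARS):
    """Split `folded` into chunks of at most `max_len` characters.

    A chunk that is followed by another chunk is not allowed to end with a
    character from `non_final`; such trailing characters are pushed to the
    start of the next chunk instead.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    nodes = []
    pos = 0
    total = len(folded)
    while pos < total:
        end = min(pos + max_len, total)
        if end < total:
            # Another node follows, so back off past any trailing
            # non-final fold characters.
            cut = end
            while cut > pos and folded[cut - 1] in non_final:
                cut -= 1
            # If the whole chunk consists of such characters, give up and
            # take as much as fits, as the commit message describes.
            if cut > pos:
                end = cut
        nodes.append(folded[pos:end])
        pos = end
    return nodes
```

With a maximum length of 4, `split_into_nodes("abcfixyz", 4)` gives `["abc", "fixy", "z"]`, so the `f` and `i` stay in the same node. Two adjacent characters from the set, as in `"abftxyz"`, require backing off by two positions, which is the case the original code miscounted.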
* (perl #132158) abort compilation if we see an error compiling a form | Tony Cook | 2019-01-03 | 1 | -0/+9
|
* First "eof" should return true | Hauke D | 2019-01-02 | 1 | -1/+0
| | | | | When no file has previously been opened, "eof" should return true. This behavior was broken by 32e653230c7ccc (see also [#60978]).
* [perl #133721] TODO test for eof with no ${^LAST_FH} | Tony Cook | 2019-01-02 | 1 | -1/+9
|
* [perl #133524] report line number for Prototype not terminated | Tony Cook | 2019-01-02 | 1 | -0/+6
| | | | | Previously COPLINE was updated (to the end of the file) before reporting the error, which wasn't useful.
* Correct spelling error. | James E Keenan | 2018-12-29 | 1 | -1/+1
|
* Change length-1 ASCII fold pairs to ANYOFM regnodes | Karl Williamson | 2018-12-26 | 1 | -5/+5
| | | | | | | | | | A node that matches only 'A' and 'a', for example, can be turned into an ANYOFM node, which is faster to execute. This is done after joining of adjacent EXACTFish nodes, as longer nodes are better than shorter ones, including because they lessen the number of bugs with multi-char folds not matching because of node boundaries. But if a length 1 node remains, ANYOFM is better.
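Why a mask-based node suits a pair like 'A' and 'a' can be shown with a small Python sketch. It assumes that an ANYOFM-style node matches a byte `b` when `(b & mask) == value`; that reading is an assumption here, since the commit message does not spell out the node layout. The helper names `fold_pair_mask` and `mask_matches` are hypothetical.

```python
def fold_pair_mask(first, second):
    """Return (mask, value) so that exactly `first` and `second` satisfy
    (byte & mask) == value, assuming the two differ in a single bit."""
    a, b = ord(first), ord(second)
    if a > 0xFF or b > 0xFF:
        raise ValueError("only single-byte characters are handled")
    diff = a ^ b
    # diff must be a non-zero power of two, i.e. exactly one differing bit.
    if diff == 0 or diff & (diff - 1):
        raise ValueError("characters must differ in exactly one bit")
    mask = 0xFF & ~diff
    return mask, a & mask


def mask_matches(byte, mask, value):
    """One AND and one comparison decide the match."""
    return (byte & mask) == value
```

For 'A' (0x41) and 'a' (0x61) the differing bit is 0x20, so the mask is 0xDF and the value is 0x41. A single AND and compare then accepts both letters and nothing else, which is consistent with the claim that such a node is faster to execute.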
* Add new regnode: ANYOFH, without a bitmap | Karl Williamson | 2018-12-26 | 1 | -232/+232
| | | | | | | | | | | This commit adds a regnode for the case where nothing in the bitmap matches. This allows the bitmap to be omitted, saving 32 bytes of otherwise wasted space per node. Many non-Latin Unicode properties have this characteristic. Further, since this node applies only to code points above 255, which are representable only in UTF-8, we can trivially fail a match where the target string isn't in UTF-8. Time savings also accrue from skipping the bitmap look-up. When swashes are removed, even more time will be saved.
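The early-exit this message describes can be illustrated with a Python sketch. It is not the regnode implementation: the class name `BitmaplessClass`, the range-list representation, and the `target_is_utf8` flag are all assumptions made for the example.

```python
from bisect import bisect_right


class BitmaplessClass:
    """A character class that matches only code points above 255.

    Because nothing at or below 255 can match, no 256-bit (32-byte)
    bitmap is stored; only inclusive ranges are kept.
    """

    def __init__(self, ranges):
        ordered = sorted(ranges)
        for low, high in ordered:
            if low <= 255 or high < low:
                raise ValueError("ranges must lie entirely above 255")
        self._starts = [low for low, _ in ordered]
        self._ends = [high for _, high in ordered]

    def matches(self, code_point, target_is_utf8):
        # A non-UTF-8 target cannot contain code points above 255,
        # so the match fails without looking at the ranges.
        if not target_is_utf8:
            return False
        index = bisect_right(self._starts, code_point) - 1
        return index >= 0 and code_point <= self._ends[index]
```

As a usage example, `BitmaplessClass([(0x370, 0x3FF)])` covers the Greek and Coptic block: it matches U+03B1 in a UTF-8 target and rejects everything in a non-UTF-8 target without consulting the ranges.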
* Revamp qr/[...]/ optimizations | Karl Williamson | 2018-12-26 | 3 | -10/+525
| | | | | | | | | | | | | | | | This commit extensively changes the optimizations for ANYOF regnodes that represent bracketed character classes. The removal of the regex compilation pass now makes these feasible and desirable. Compilation now tries hard to optimize an ANYOF node into something smaller and/or faster when feasible. Now, qr/[X]/ for any single character or POSIX class X, and any modifiers like /d, /i, etc, should be the same as qr/X/ for the same modifiers, unless it would require the pattern to be upgraded from non-UTF-8 to UTF-8, unless not doing so could introduce bugs. These changes fix some issues with multi-character /i folding.
* regcomp.c: Simplify handling of EXACTFish nodes with 's' at edge | Karl Williamson | 2018-12-26 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | Commit 8a100c918ec81926c0536594df8ee1fcccb171da created node types for handling an 's' at the leading edge, at the trailing edge, and at both edges, for nodes under /di that contain nothing else that would prevent them from being EXACTFU nodes. If two of these get joined, it could create an 'ss' sequence which can't be an EXACTFU node, for U+DF would match them unconditionally. Instead, under /di it should match if and only if the target string is UTF-8 encoded. I realized later that having three types becomes harder to deal with when adding yet more node types, so this commit turns the three into just one node type, indicating that at least one edge of the node is an 's'. It also simplifies the parsing of the pattern and determining which node to use.
* t/re/anyof.t: Sort tests; remove dups | Karl Williamson | 2018-12-26 | 1 | -113/+101
| | | | | This makes it easier to add new tests without duplicating, as witnessed by the duplicate ones this commit removes.
* re/anyof.t: Extract code into a function | Karl Williamson | 2018-12-25 | 1 | -43/+51
| | | | | This is in preparation for a future commit where it will be used in more than one place.
* t/re/anyof.t: Add capability to utf8::upgrade() | Karl Williamson | 2018-12-25 | 1 | -1/+7
| | | | | | ANYOF nodes can generate different things depending on the UTF-8ness of the pattern. This adds the capability of conveniently specifying in a test that the pattern should be upgraded.
* t/re/anyof.t: Add 'strict', 'warnings' pragmas | Karl Williamson | 2018-12-25 | 1 | -0/+2
|
* Thoroughly test paragraph mode | James E Keenan | 2018-12-19 | 1 | -0/+504
| | | | For: RT # 133722
* regcomp.c: Tighten embedded patterns in regex sets | Karl Williamson | 2018-12-16 | 1 | -2/+2
| | | | | | | In the (?[ ... ]) regex sets feature, one can embed another compiled regex set pattern. Such compiled patterns always have a flag of '^', which we weren't looking for prior to this commit. That meant that uncompiled patterns would be mistaken for compiled ones.
* Correct previous perldelta entry, and add a test | Karl Williamson | 2018-12-16 | 1 | -1/+6
| | | | | | | | | The text of perl5294delta was wrong about a change. This commit changes that text, and adds an entry to the latest perldelta with the correction. A test has been added to verify the way things work. The wrong language led to this blog post, and my comment in it: https://www.effectiveperlprogramming.com/2018/12/perl-v5-30-lets-you-match-more-with-the-general-quantifier/
* t/io/eintr.t: Skip some tests on pre-16 Darwin. | Abigail | 2018-12-15 | 1 | -55/+62
| | | | | | | | | | | | | | The tests where we write a string larger than the pipe size to a pipe hang on 15.6.0, while they seem to work on Darwin 17.7.0. So we will skip these tests on Darwin, if the major version is less than 16. (We may adjust this if we have more reports on which versions between 15.6.0 and 17.7.0 succeed/fail). Note that the tests hang even if we send a string of 512 characters, which is much, much smaller than the actual size of the string in the test.
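The skip condition stated above (Darwin, major version below 16) amounts to a small version check. The Python sketch below is only an illustration of that rule; the function name `should_skip_large_pipe_tests` and its two string parameters are hypothetical, and the test script itself is written in Perl.

```python
def should_skip_large_pipe_tests(os_name, os_release):
    """Return True when the pipe-overflow tests should be skipped.

    `os_name` is an OS identifier such as "darwin" and `os_release` is a
    dotted release string such as "15.6.0".
    """
    if os_name.lower() != "darwin":
        return False
    major_text = os_release.split(".", 1)[0]
    try:
        major = int(major_text)
    except ValueError:
        # Unparseable release string: run the tests rather than guess.
        return False
    return major < 16
```

Under this rule 15.6.0 is skipped, while 16.x and 17.7.0 run the tests. The cut-off of 16 is the commit's stated working threshold, which the message says may be adjusted as more reports come in.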
* regcomp.c: Allow more EXACTFish nodes to be trieable | Karl Williamson | 2018-12-07 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | The previous two commits fixed bugs where it would be possible during optimization to join two EXACTFish nodes together, and the result would not work properly with LATIN SMALL LETTER SHARP S. But by doing so, the commits prevented all non-UTF-8 EXACTFU nodes that begin or end with [Ss] from being trieable. This commit changes things so that the only ones that are non-trieable are the ones that, when joined, have the sequence [Ss][Ss] in them. To do so, I created three new node types that indicate if the node begins with [Ss] or ends with them, or both. These preclude having to examine the node contents at joining to determine this. And since there are plenty of node types available, it seemed the best choice. But other options would be available should we run out of nodes. Examining the first and final characters of a node is not expensive, for example.
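The idea of recording the edge information up front, so that joining does not need to inspect node contents, can be sketched in Python. This is a simplified illustration, not the regcomp.c data structure: the `Node` class, `make_node`, and `join_creates_ss` are hypothetical names, and the sketch only covers an [Ss][Ss] sequence formed at the boundary between two nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A string node with its edge flags computed once, at creation."""
    text: str
    begins_with_s: bool
    ends_with_s: bool


def make_node(text):
    # Slicing (rather than indexing) keeps this safe for empty strings.
    return Node(
        text=text,
        begins_with_s=text[:1] in ("s", "S"),
        ends_with_s=text[-1:] in ("s", "S"),
    )


def join_creates_ss(left, right):
    """True when joining `left` and `right` would put [Ss][Ss] at the seam.

    Only the precomputed flags are consulted, mirroring how a node type
    can carry this information without re-reading the node contents.
    """
    return left.ends_with_s and right.begins_with_s
```

For example, joining `make_node("mas")` and `make_node("se")` is flagged, while joining `make_node("mas")` and `make_node("t")` is not. The commit encodes the same facts in the node type itself, which is a design choice that trades a few extra node types for a cheaper check at join time.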
* t/harness: Catch incorrect serial directory specification | Karl Williamson | 2018-12-07 | 1 | -0/+6
| | | | | The previous commit fixed a bug. This commit detects if someone creates a new instance of that bug.