path: root/testsuite/driver/runtests.py
Commit message | Author | Age | Files | Lines
* testsuite: Use colors more consistently | Ben Gamari | 2019-12-05 | 1 | -8/+1
* testsuite: Make performance metric summary more readable | Ben Gamari | 2019-12-05 | 1 | -15/+35
    Along with some refactoring.
* testsuite: Factor out terminal coloring | Ben Gamari | 2019-12-05 | 1 | -0/+3
* CI: Always dump performance metrics. | David Eichmann | 2019-10-22 | 1 | -3/+19
* testsuite: More type checking fixes | Ben Gamari | 2019-07-18 | 1 | -2/+2
* testsuite: A major revamp of the driver | Ben Gamari | 2019-06-25 | 1 | -11/+20
    This tries to put the testsuite driver into a slightly more maintainable condition:
    * Add type annotations where easily done
    * Use pathlib.Path instead of str paths
    * Make it pass the mypy typechecker
* gitlab-ci: Lint testsuite for framework failures | Ben Gamari | 2019-06-14 | 1 | -1/+1
    This introduces a new lint job checking for framework failures and listing broken tests.
* Update terminal title while running test-suite | Oleg Grenrus | 2019-05-14 | 1 | -1/+18
    A useful progress indicator even with `make test VERBOSE=1`, and when you are doing something else but have the terminal title visible.
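How a driver might set the terminal title can be sketched as below. This is a minimal illustration, not the code from the commit: it assumes an xterm-compatible terminal that honours the OSC 0 escape sequence, and the names `terminal_title_sequence` and `update_title` are hypothetical.

```python
import sys

def terminal_title_sequence(title: str) -> str:
    """Build the xterm OSC 0 escape sequence that sets the window title."""
    # ESC ] 0 ; <title> BEL  (assumes an xterm-compatible terminal)
    return "\033]0;" + title + "\007"

def update_title(done: int, total: int, stream=sys.stderr) -> None:
    """Hypothetical helper: show test progress in the terminal title."""
    # Only emit escape codes when the stream is an interactive terminal,
    # so that log files are not polluted with control characters.
    if stream.isatty():
        stream.write(terminal_title_sequence(f"GHC testsuite: {done}/{total}"))
        stream.flush()
```

Calling `update_title(120, 6500)` from the test loop would then refresh the title after each test without touching the normal output.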
* Exit with exit code 1 when tests unexpectedly pass | Matthew Pickering | 2019-02-23 | 1 | -0/+1
    This was causing GitLab to not report builds as failing. It also highlighted a problem with the LLVM tests where some of the external interpreter tests are failing.
* Fix test runner crash when not in a git repo | David Eichmann | 2019-02-21 | 1 | -6/+5
    Respect `inside_git_repo()` when checking performance stats.
* Fix and Reapply "Performance tests: recover a baseline from ancestor commits and CI results." | David Eichmann | 2019-02-16 | 1 | -8/+27
* Revert "Performance tests: recover a baseline from ancestor commits and CI results." | Ben Gamari | 2019-01-31 | 1 | -28/+8
    Unfortunately this has broken all future commits due to spurious(?) performance changes which I have been unable to work around.

    This reverts commit cc2261d42f6a954d88e355aaad41f001f65c95da.
* Performance tests: recover a baseline from ancestor commits and CI results. | David Eichmann | 2019-01-30 | 1 | -8/+28
    gitlab-ci: push performance metrics as git notes to the "GHC Performance Notes" repository.
* Revert "Batch merge" | Ben Gamari | 2019-01-30 | 1 | -28/+8
    This reverts commit 76c8fd674435a652c75a96c85abbf26f1f221876.
* Batch merge | Ben Gamari | 2019-01-30 | 1 | -8/+28
* testsuite: Add predicate for CPU feature availability | Ben Gamari | 2019-01-27 | 1 | -0/+7
    Previously testing code-generation for ISA extensions was nearly impossible since we had no ability to determine whether the host supports the needed extension. Here we fix this by introducing a simple /proc/cpuinfo-based testsuite predicate. We really ought to
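A /proc/cpuinfo-based predicate of this kind could look roughly like the sketch below. It is an illustration only: the function names are hypothetical, and it assumes the Linux convention that feature names appear on a `flags` line (x86) or a `Features` line (ARM).

```python
from pathlib import Path

def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Extract the feature names from the text of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels report "flags", ARM kernels report "Features"
        if sep and key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()

def have_cpu_feature(feature: str, cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """Hypothetical predicate: does the host CPU advertise `feature`?"""
    try:
        text = Path(cpuinfo_path).read_text()
    except OSError:
        # No /proc/cpuinfo (e.g. not Linux): conservatively answer False
        return False
    return feature in parse_cpu_flags(text)
```

A test could then be guarded with something like `have_cpu_feature('avx2')`, so that it is skipped on hosts without the extension.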
* testsuite: Print which ways we are going to run | Ben Gamari | 2018-12-12 | 1 | -1/+4
* Do not save performance test results if worktree is dirty. | David Eichmann | 2018-12-11 | 1 | -2/+7
    Reviewers: bgamari, tdammers
    Reviewed By: bgamari, tdammers
    Subscribers: rwbarton, carter
    GHC Trac Issues: #15924
    Differential Revision: https://phabricator.haskell.org/D5368
* testsuite: Don't use git status to determine whether we are inside a repo | Ben Gamari | 2018-12-01 | 1 | -3/+3
    Git status is extremely expensive for this task. We instead use `git rev-parse HEAD` and throw away the output to ensure we don't spam the user.
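The check described here can be sketched as follows. This is not the driver's actual code: the signature (in particular the `cwd` and `git` parameters) is an assumption made so the example is self-contained.

```python
import subprocess

def inside_git_repo(cwd=None, git="git") -> bool:
    """Return True if `git rev-parse HEAD` succeeds in `cwd`."""
    try:
        result = subprocess.run(
            [git, "rev-parse", "HEAD"],
            cwd=cwd,
            # Discard all output so the user is not spammed
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # git is not installed or could not be executed
        return False
    return result.returncode == 0
```

Unlike `git status`, `git rev-parse HEAD` does not need to scan the working tree, which is why it is much cheaper for a yes/no answer.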
* Skip all performance tests if not in a git repo. | David Eichmann | 2018-11-30 | 1 | -6/+26
    Reviewers: bgamari, tdammers, osa1
    Reviewed By: tdammers
    Subscribers: osa1, tdammers, rwbarton, carter
    GHC Trac Issues: #15923
    Differential Revision: https://phabricator.haskell.org/D5367
* testsuite: Save performance metrics in git notes. | David Eichmann | 2018-11-07 | 1 | -11/+60
    This patch makes the following improvements:
    - Automatically records test metrics (per test environment) so that the programmer need not supply nor update expected values in *.T files.
    - On expected metric changes, the programmer need only indicate the direction of change in the git commit message.
    - Provides a simple python tool "perf_notes.py" to compare metrics over time.

    Issues:
    - Using just the previous commit allows performance to drift with each commit.
      - Currently we allow drift as we have a preference for minimizing false positives.
      - Some possible alternatives include:
        - Use metrics from a fixed commit per test: the last commit that allowed a change in performance (else the oldest metric)
        - Or use some sort of aggregate since the last commit that allowed a change in performance (else all available metrics)
      - These alternatives may result in a performance issue (with the test driver) having to heavily search git commits/notes.
    - Run locally, performance tests will trivially pass unless the tests were run locally on the previous commit. This is often not the case e.g. after pulling recent changes.

    Previously, *.T files contained statements such as:

    ```
    stats_num_field('peak_megabytes_allocated', (2, 1))
    compiler_stats_num_field('bytes allocated', [(wordsize(64), 165890392, 10)])
    ```

    This required the programmer to give the expected values and a tolerance deviation (percentage). With this patch, the above statements are replaced with:

    ```
    collect_stats('peak_megabytes_allocated', 5)
    collect_compiler_stats('bytes allocated', 10)
    ```

    So the programmer need only enter which metrics to test and a tolerance deviation. No expected value is required. CircleCI will then run the tests per test environment, record the metrics to a git note for that commit, and push them to the git.haskell.org ghc repo.

    Metrics will be compared to the previous commit. If they differ by more than the tolerance deviation from the *.T file, then the corresponding test will fail. This can be overridden by adding to the git commit message e.g.

    ```
    # Metric (In|De)crease <metric(s)> <options>: <tests>
    Metric Increase ['bytes allocated', 'peak_megabytes_allocated'] \
        (test_env='linux_x86', way='default'): Test012, Test345
    Metric Decrease 'bytes allocated': Test678
    Metric Increase: Test711
    ```

    This will allow the noted changes (letting the test pass). Note that by omitting metrics or options, the change will apply to all possible metrics/options (i.e. in the above, an increase for all metrics in all test environments is allowed for Test711).

    phabricator will use the message in the description

    Reviewers: bgamari, hvr
    Reviewed By: bgamari
    Subscribers: rwbarton, carter
    GHC Trac Issues: #12758
    Differential Revision: https://phabricator.haskell.org/D5059
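The tolerance comparison described in this message can be illustrated with a small sketch. It assumes a symmetric percentage window around the baseline value taken from the previous commit; the function name `classify_metric` and its return values are hypothetical, and the driver's real comparison may differ in detail.

```python
def classify_metric(baseline: float, actual: float, tolerance_pct: float) -> str:
    """Compare a measured metric against a baseline with a percentage tolerance.

    Returns "pass" when `actual` lies within the tolerance window,
    otherwise the direction of the change ("increase" or "decrease").
    """
    lower = baseline * (1 - tolerance_pct / 100)
    upper = baseline * (1 + tolerance_pct / 100)
    if actual > upper:
        return "increase"
    if actual < lower:
        return "decrease"
    return "pass"
```

Under this reading, a test declared with `collect_stats('peak_megabytes_allocated', 5)` would pass while the metric stays within 5% of the baseline, and a result of "increase" or "decrease" would need a matching `Metric Increase` or `Metric Decrease` line in the commit message.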
* Simplify testsuite driver, part 2 | Krzysztof Gogolewski | 2018-08-12 | 1 | -4/+4
    Summary:
    - Avoid import *; this helps tools such as pyflakes. The last occurrence in runtests.py is not easy to remove as it's used by .T files.
    - Use False/True instead of 0/1.

    Test Plan: validate
    Reviewers: bgamari, thomie, simonmar
    Reviewed By: thomie
    Subscribers: rwbarton, carter
    Differential Revision: https://phabricator.haskell.org/D5062
* Simplify testsuite driver | Krzysztof Gogolewski | 2018-08-11 | 1 | -1/+1
    Summary:
    - remove clean_cmd
    - framework_failures was undefined
    - times_file was not used
    - if_verbose_dump was called only when verbose >= 1; remove the check
    - simplify normalise_whitespace

    Test Plan: validate
    Reviewers: bgamari, thomie
    Reviewed By: thomie
    Subscribers: rwbarton, carter
    Differential Revision: https://phabricator.haskell.org/D5061
* Testsuite driver: fix encoding issue when calling ghc-pkg | Krzysztof Gogolewski | 2018-08-06 | 1 | -1/+1
    Summary: In Python 3, subprocess.communicate() returns a pair of bytes, which need to be decoded. In runtests.py, we were just calling str() instead, which converts b'x' to "b'x'". As a result, the loop that was checking pkginfo for lines starting with 'library-dirs' couldn't work.

    Reviewers: bgamari, thomie, Phyx
    Reviewed By: thomie
    Subscribers: Phyx, rwbarton, carter
    Differential Revision: https://phabricator.haskell.org/D5046
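The pitfall is easy to reproduce. The sketch below is not the code from runtests.py: `library_dirs` is a hypothetical helper, and the UTF-8 encoding of the ghc-pkg output is an assumption.

```python
def library_dirs(pkginfo: bytes) -> list:
    """Collect the values of `library-dirs:` lines from ghc-pkg style output."""
    # Decode the bytes; str(pkginfo) would instead yield the repr "b'...'"
    text = pkginfo.decode("utf-8")
    dirs = []
    for line in text.splitlines():
        if line.startswith("library-dirs:"):
            dirs.extend(line[len("library-dirs:"):].split())
    return dirs

# The bug in a nutshell: str() on bytes gives the repr, not the text.
broken = str(b"library-dirs: /usr/lib/ghc")
assert not broken.startswith("library-dirs")  # it starts with "b'" instead
```

Because every line of the `str()` result is buried inside a single `b'...'` literal, a `startswith('library-dirs')` check can never match, which is the failure the commit describes.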
* Fix typos | Krzysztof Gogolewski | 2018-08-05 | 1 | -1/+1
* Remove dead code in testsuite driver | Krzysztof Gogolewski | 2018-07-27 | 1 | -4/+0
    Test Plan: validate
    Reviewers: bgamari, O7 GHC - Testsuite
    Reviewed By: bgamari
    Subscribers: rwbarton, thomie, carter
    Differential Revision: https://phabricator.haskell.org/D4972
* #15387 Fix setting testsuite verbose to zero | Antti Siponen | 2018-07-16 | 1 | -1/+1
* testsuite: Print summary even if interrupted | Ben Gamari | 2018-06-14 | 1 | -15/+18
    Fixes #15265.

    Reviewers: osa1
    Reviewed By: osa1
    Subscribers: rwbarton, thomie, carter
    GHC Trac Issues: #15265
    Differential Revision: https://phabricator.haskell.org/D4841
* Windows: fix all failing tests. | Tamar Christina | 2018-01-02 | 1 | -10/+23
    This makes the testsuite pass clean on Windows again. It also fixes the `libstdc++-6.dll` error harbormaster was showing.

    I'm marking some tests as isolated tests to reduce their flakiness (mostly concurrency tests) when the test system is under heavy load.

    Updates process submodule.

    Test Plan: ./validate
    Reviewers: hvr, bgamari, erikd, simonmar
    Reviewed By: bgamari
    Subscribers: rwbarton, thomie, carter
    Differential Revision: https://phabricator.haskell.org/D4277
* testsuite: Exit with non-zero exit code when tests fail | Ben Gamari | 2017-12-18 | 1 | -1/+8
* Only look for locales of the form LL.VV | Gabor Greif | 2017-12-11 | 1 | -1/+1
    Because in recent RHEL7 locales like `bokmål` suddenly pop up, which screw up the reading-in of ASCII strings a line later. This additional criterion reliably eliminates those unicode characters.
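One way to apply such a criterion is sketched below. The regular expression is an assumption about what "of the form LL.VV" means (an ASCII language part, a dot, and an ASCII codeset part); it is not taken from the commit, and `usable_locales` is a hypothetical name.

```python
import re

# Assumed shape: ASCII language/territory, a dot, then an ASCII codeset,
# e.g. "en_US.utf8" or "C.UTF-8". Names such as "bokmål" do not match.
LOCALE_PATTERN = re.compile(r"[A-Za-z_]+\.[A-Za-z0-9-]+")

def usable_locales(names):
    """Keep only locale names of the assumed LL.VV form."""
    return [name for name in names if LOCALE_PATTERN.fullmatch(name)]
```

Filtering the output of `locale -a` this way drops aliases that contain non-ASCII characters before the list is processed any further.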
* testsuite: Fix validation of ways | Ben Gamari | 2017-09-05 | 1 | -9/+18
    Test Plan: Run `make test WAY=prof`
    Reviewers: angerman, austin
    Subscribers: rwbarton, thomie
    GHC Trac Issues: #14181
    Differential Revision: https://phabricator.haskell.org/D3917
* testsuite: Don't pass allow_abbrev | Ben Gamari | 2017-07-28 | 1 | -2/+1
    This is only supported by Python 3.5 and later, which is too new for us to rely on.

    Reviewers: austin
    Subscribers: rwbarton, thomie, RyanGlScott
    GHC Trac Issues: #14050
    Differential Revision: https://phabricator.haskell.org/D3803
* testsuite: Produce JUnit output | Ben Gamari | 2017-07-28 | 1 | -0/+5
    Test Plan: Validate, try ingesting into Jenkins.
    Reviewers: austin
    Subscribers: rwbarton, thomie
    GHC Trac Issues: #13716
    Differential Revision: https://phabricator.haskell.org/D3796
* Switched out optparse for argparse in runtests.py | Jared Weakly | 2017-07-28 | 1 | -77/+57
    Tangentially related to my prior work on trac ticket #12758.

    Signed-off-by: Jared Weakly <jweakly@pdx.edu>
    Reviewers: austin, bgamari
    Subscribers: rwbarton, thomie
    Differential Revision: https://phabricator.haskell.org/D3792
* testsuite: Move echoing commands in make invocations to VERBOSE=5 | Reid Barton | 2017-03-02 | 1 | -2/+2
    D2894 added a new verbosity level VERBOSE=4 to strip -s/--silent flags from make invocations in test commands. This will probably cause the test to fail of course, but is useful for seeing what a test that's already failing is doing.

    However there was already an undocumented meaning of VERBOSE=4, added in commit cfeededf, that causes the results of performance tests to be printed unconditionally (even when they are within the expected range). nomeata's ghc builder uses these figures to collect historical data on performance test figures. The new meaning of VERBOSE=4 added in D2894 means that any test that uses make now fails on the builder.

    This commit moves the new behavior of D2894 to the level VERBOSE=5 so that nomeata's ghc builder again produces useful results on failing tests. It also adds documentation for both settings.

    Test Plan: did some manual testing
    Reviewers: austin, bgamari, Phyx, nomeata
    Reviewed By: bgamari, Phyx
    Subscribers: nomeata, thomie, Phyx
    Differential Revision: https://phabricator.haskell.org/D3141
* testsuite: Remove old python version tests | Reid Barton | 2017-02-23 | 1 | -38/+4
    Test Plan: harbormaster
    Reviewers: austin, bgamari
    Reviewed By: bgamari
    Subscribers: thomie
    Differential Revision: https://phabricator.haskell.org/D3140
* Revert "Suppress duplicate .T files" | Gabor Greif | 2016-12-22 | 1 | -1/+1
    This reverts commit 9a29b65bda8aed4c5fdbff25866ddf2dd1583210.

    It turns out that while not harmful, that commit is unnecessary, and a `make clean` resolved it.

    See: https://phabricator.haskell.org/rGHC9a29b65bda8aed4c5fdbff25866ddf2dd1583210
* Suppress duplicate .T files | Gabor Greif | 2016-12-21 | 1 | -1/+1
    As per http://stackoverflow.com/questions/7961363/removing-duplicates-in-lists use the set() function to zap duplicates from the obtained list of .T files.

    I am using

        $ python3 --version
        Python 3.5.1

    and strangely findTFiles() returns some .T files twice:

    -- BEFORE
        Found 376 .T files...
        ...
        ====> Scanning ../../libraries/array/tests/all.T
        ====> Scanning ../../libraries/array/tests/all.T
        *** framework failure for T2120(duplicate) There are multiple tests with this name
        *** framework failure for largeArray(duplicate) There are multiple tests with this name
        *** framework failure for array001(duplicate) There are multiple tests with this name
        *** framework failure for T9220(duplicate) There are multiple tests with this name
        *** framework failure for T229(duplicate) There are multiple tests with this name
        ...

    -- AFTER
        Found 365 .T files...
        ...
        ====> Scanning ../../libraries/array/tests/all.T
        ...

    Even more strangely 'find' begs to differ:

        $ find libraries testsuite/tests -name "*.T" | sort | uniq | wc -l
        368
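For reference, the `set()` approach mentioned in this message amounts to something like the sketch below. The helper name `dedupe_t_files` is hypothetical, and the sort is an addition made here so the result has a stable order.

```python
def dedupe_t_files(paths):
    """Drop duplicate .T file paths; sort so the order is deterministic."""
    # set() removes duplicates but has no defined order, hence sorted()
    return sorted(set(paths))
```

For example, `dedupe_t_files(["a/all.T", "b/all.T", "a/all.T"])` keeps one entry per path.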
* testsuite: Use python3 by default | Ben Gamari | 2016-11-29 | 1 | -4/+3
    Summary: It turns out that Phyx's fix for #12554 (D2684) still fails with mingw-w64 python 2.7. However, Python 3 (both msys2 and mingw-w64) work fine. Given that supporting Python 2 has already become rather tiresome (as @thomie warned it would), let's just move to python3 by default.

    Test Plan: Validate
    Reviewers: austin, Phyx
    Reviewed By: Phyx
    Subscribers: Phyx, thomie
    Differential Revision: https://phabricator.haskell.org/D2766
    GHC Trac Issues: #12554
* Fix testsuite threading, timeout, encoding and performance issues on Windows | Tamar Christina | 2016-11-29 | 1 | -19/+28
    In a land far far away, a project called Cygwin was born. Cygwin used newlib as its standard C library implementation. But Cygwin wanted to emulate POSIX systems as closely as possible, so it implemented `execv` using the Windows function `spawnve`. Specifically

    ```
    spawnve (_P_OVERLAY, path, argv, cur_environ ())
    ```

    `_P_OVERLAY` is crucial, as it makes the function behave *sort of* like execv on Linux: the child process replaces the original process. There is one major difference, because of the difference in process models on Windows: the original process signals the caller that it's done. This is why the file is still locked. Control was returned because the parent process was destroyed, but the child is still running.

    I think it's just pure dumb luck that the older runtimes are slow enough to give the process time to terminate before we tried deleting the file. This explains why you do have sporadic failures even on older runtimes like 2.5.0, of a test or two (like T7307).

    So this patch fixes a couple of things. I leverage the existing `timeout.exe` to implement a workaround for this issue.

    a) The old timeout used to start the process and then assign it to the job. This is slightly faulty, since child processes are only assigned to a job if their parent was assigned at the time they started. So this was a race condition. I now create the process suspended, assign it to the job and then resume it, which means all child processes are now running under the same job.

    b) To prevent dangling child processes, I mark the job with `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE`, so when the last process in the job is done, it ensures all processes under the job are killed.

    c) I change the way we wait for results. Instead of waiting for the parent process to terminate, I wait for the job itself to terminate. There's a slight subtlety there: we can't wait on the job itself. Instead we have to create an I/O Completion port and wait for signals on it. See https://blogs.msdn.microsoft.com/oldnewthing/20130405-00/?p=4743

    This fixes the issues on all runtimes for me and makes T7307 pass consistently.

    The threading was also simplified by hiding all the locking in a single semaphore and a completion class. Furthermore, some additional error reporting was added.

    For encoding, the testsuite no longer passes a file handle to the subprocess, since on Windows sh.exe seems to acquire a lock on the file that is not released in a timely fashion. I suspect this is because cygwin seems to emulate console handles by creating file handles and using those for std handles. So when we give it an existing file handle it just locks the file. I think what's happening is that it's not releasing the handle until all shared cygwin processes are dead, which explains why it worked in single threaded mode.

    So now we instead pass a pipe. Any bytes written to stdin or read out of stdout/stderr are handled in binary mode and we do not interpret the data. The reason for this is that we have encoding tests in GHC which pass invalid utf-8. If we try to handle the data as text then python will throw an exception instead of a test comparison failing.

    Also I have fixed the ability to override `PYTHON` when calling `make tests`. This now works the same as with `.\validate`.

    Finally, after cleaning up the locks I was able to make the abort behavior work correctly as I believe it was intended: when you press Ctrl+C and send an interrupt signal, the testsuite finishes the active tests and then gracefully exits, showing you a report of the progress it did make. So using Ctrl+C will not just *die* as it did before.

    These changes lift the restriction on which python version you use (msys/mingw) or which runtime or python 3 or python 2. All combinations should now be supported.

    Test Plan:
        PATH=/usr/local/bin:/mingw64/bin:$APPDATA/cabal/bin:$PATH && PYTHON=/usr/bin/python THREADS=9 make test
        THREADS=9 make test
        PATH=/usr/local/bin:/mingw64/bin:$APPDATA/cabal/bin:$PATH && PYTHON=/usr/bin/python ./validate --quiet --testsuite-only

    Reviewers: erikd, RyanGlScott, bgamari, austin
    Subscribers: jrtc27, mpickering, thomie, #ghc_windows_task_force
    Differential Revision: https://phabricator.haskell.org/D2684
    GHC Trac Issues: #12725, #12554, #12661, #12004
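The binary-mode pipe handling described above can be illustrated with a small sketch. It only shows the general technique of passing pipes and comparing raw bytes; it does not reproduce the job-object or timeout logic, and `run_binary` is a hypothetical helper rather than a function from the driver.

```python
import subprocess
import sys

def run_binary(cmd, stdin_data=b""):
    """Run `cmd` with pipes for all std handles and return raw bytes.

    No decoding is attempted, so output containing invalid UTF-8
    reaches the caller unchanged instead of raising an exception.
    """
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate(stdin_data)
    return proc.returncode, out, err

if __name__ == "__main__":
    # A child that emits bytes which are not valid UTF-8
    child = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(b'\\xff\\xfe')"]
    code, out, err = run_binary(child)
    print(code, out)
```

Comparing `out` with the expected bytes keeps an encoding test a plain comparison failure rather than a `UnicodeDecodeError` inside the driver.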
* testsuite: Simplify kernel32 glue logic | Ben Gamari | 2016-11-02 | 1 | -5/+6
    On Windows the testsuite driver calls kernel32 to set the current terminal codepage. The previous implementation of this was significantly more complex than necessary, and was wrong in the case of MSYS2, which requires that we explicitly load the library using the name of its DLL, including its file extension.

    Test Plan: Validate on Windows
    Reviewers: austin, RyanGlScott, Phyx
    Reviewed By: RyanGlScott, Phyx
    Subscribers: thomie
    Differential Revision: https://phabricator.haskell.org/D2641
    GHC Trac Issues: #12661
* testsuite/driver: Allow threading on Windows | Ben Gamari | 2016-10-17 | 1 | -3/+0
    It seems that threading now works fine. The only caveat here is that it makes some race conditions more likely (e.g. #12554), although these also appear to affect single-threaded runs.

    Test Plan: Validate on Windows
    Reviewers: austin, Phyx
    Subscribers: thomie
    Differential Revision: https://phabricator.haskell.org/D2600
    GHC Trac Issues: #10510
* testsuite/driver: More Unicode awareness | Ben Gamari | 2016-10-17 | 1 | -2/+7
    Explicitly specify utf8 encoding in a few spots which were failing on Windows with Python 3.

    Test Plan: Validate
    Reviewers: austin, thomie
    Differential Revision: https://phabricator.haskell.org/D2602
    GHC Trac Issues: #9184
* Testsuite: framework failure improvements (#11165) | Thomas Miedema | 2016-06-28 | 1 | -7/+13
    * add framework failures to unexpected results list
    * report errors in .T files as framework failures (show in summary)
    * don't report missing tests when framework failures in .T files
* Testsuite: cleanup printing of summary | Thomas Miedema | 2016-06-28 | 1 | -2/+2
    Just use a simple list of tuples, instead of a nested map.

    -90 lines of code.
* Testsuite: never pick up .T files in .run directories | Thomas Miedema | 2016-06-27 | 1 | -1/+1
    And use os.walk instead of calling os.listdir many times.

    The testsuite driver should be able to handle backward slashes on Windows now.
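A walk that skips `.run` directories could be written along these lines. This is a sketch under assumptions, not the driver's own `findTFiles`: the name `find_t_files` and the sorted return value are choices made here for illustration.

```python
import os

def find_t_files(root):
    """Collect all *.T files under `root`, ignoring <testdir>.run directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into .run directories
        dirnames[:] = [d for d in dirnames if not d.endswith(".run")]
        for name in filenames:
            if name.endswith(".T"):
                # os.path.join uses the platform separator, so backslash
                # paths on Windows need no special handling
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Assigning to `dirnames[:]` rather than rebinding the name is what makes the pruning take effect, since `os.walk` consults that same list when choosing which subdirectories to visit.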
* Testsuite: write "\n" instead of "\r\n" when using mingw Python | Thomas Miedema | 2016-06-18 | 1 | -1/+1
    Mingw style Python uses '\r\n' by default for newlines. This is annoying, because it means that when a GHC developer on Windows uses mingw Python to `make accept` a test, every single line of the .stderr file is touched. This makes it difficult to spot the real changes, and it leads to unnecessary git history bloat.

    Prevent this from happening by using io.open instead of open. See `Note [Universal newlines]`

    Reviewed by: Phyx
    Differential Revision: https://phabricator.haskell.org/D2342
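The effect of `io.open` with an explicit newline setting can be sketched as below. The exact arguments the driver passes are not shown in this log, so `newline="\n"` and `encoding="utf8"` are assumptions, and `write_expected_output` is a hypothetical helper.

```python
import io

def write_expected_output(path, text):
    """Write `text` with Unix line endings regardless of platform."""
    # newline="\n" disables translation of "\n" to os.linesep on write,
    # so mingw Python does not turn every line ending into "\r\n"
    with io.open(path, "w", encoding="utf8", newline="\n") as f:
        f.write(text)
```

With this in place, accepting new test output on Windows leaves unchanged lines byte-for-byte identical, so the resulting diff shows only real changes.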
* Testsuite: run tests in <testdir>.run instead of /tmp | Thomas Miedema | 2016-06-18 | 1 | -2/+48
    As discussed in Phab:D1187, this approach makes it a bit easier to inspect the test directory while working on a new test.

    The only tests that needed changes are the ones that refer to files in ancestor directories. Those files are now copied directly into the test directory.

    validate still runs the tests in a temporary directory in /tmp, see `Note [Running tests in /tmp]` in testsuite/driver/runtests.py.

    Update submodule hpc.

    Reviewed by: simonmar
    Differential Revision: https://phabricator.haskell.org/D2333
    GHC Trac Issues: #11980
* Testsuite driver: always quote opts.testdir | Thomas Miedema | 2016-06-07 | 1 | -1/+1
    This makes sure the testsuite keeps working when testdir contains backward slashes.

    Differential Revision: https://phabricator.haskell.org/D2314