path: root/tests
* cmdline: port to argparse (Georg Brandl, 2021-01-22; 1 file, -22/+27)
* conftest: disallow error tokens in examplefiles (Georg Brandl, 2021-01-21; 9 files, -23/+31)
    Error tokens are OK in small snippets that demonstrate error cases.
    Also recode all examplefiles to UTF-8.
* Rename "tests/lexers" to "tests/snippets" and update the contributionGeorg Brandl2021-01-20320-2/+2
| | | | docs to point to both snippets and examplefiles.
* tests: code style fixups (Georg Brandl, 2021-01-20; 14 files, -119/+129)
* Also add auto-updatable output-based tests to examplefiles (#1689) (Oleh Prypin, 2021-01-20; 958 files, -2947/+1089467)
    Co-authored-by: Georg Brandl <georg@python.org>
* Replace tests that assert on token output with auto-updatable samples (#1649) (Oleh Prypin, 2021-01-18; 362 files, -7329/+7726)
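    A snippet file pairs an input fragment with its expected token stream.
    Sketched roughly from the scheme this change introduces (exact token
    names and column spacing may differ), a file under tests/snippets/
    looks like:

        ---input---
        x = 1

        ---tokens---
        'x'           Name
        ' '           Text
        '='           Operator
        ' '           Text
        '1'           Literal.Number.Integer
        '\n'          Text

    The token section is not written by hand; it is regenerated with
    pytest --update-goldens whenever lexer output legitimately changes.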
* Matlab class properties (#1466) (Dan, 2021-01-18; 1 file, -34/+137)
    * WIP: Add failing test for a matlab class with properties.
    * Add some missing keywords.
    * Add leading \s* matchers to things above the command form regex, as
      it tends to swallow keywords otherwise.
    * Add support for the special 'properties' block syntax.
    * Fix apparent infinite loop when given garbage input.
    * Use includes to clean up some of my copypasta.
    * Fix negative lookahead when there's more than one space between
      operators.
    * Use Whitespace, not Text, for spaces; combine adjacent whitespace.
    * Add support for declarative property constraints.
* Run pyupgrade across codebase to modernize syntax and patterns (#1622) (Jon Dufresne, 2021-01-17; 67 files, -85/+17)
    pyupgrade is a tool to automatically upgrade syntax for newer versions
    of the Python language. The project has been Python 3 only since
    35544e2fc6eed0ce4a27ec7285aac71ff0ddc473, allowing for several cleanups:
    - Remove the unnecessary "-*- coding: utf-8 -*-" cookie. Python 3 reads
      all source files as UTF-8 by default.
    - Replace IOError/EnvironmentError with OSError. Python 3 unified these
      exceptions; the old names are aliases only.
    - Use the shorter Python 3 super() syntax.
    - Remove the "utf8" argument from encode/decode. In Python 3, this
      value is the default.
    - Remove "r" from open() calls. In Python 3, this value is the default.
    - Remove the u prefix from Unicode strings. In Python 3, all strings
      are Unicode.
    - Replace io.open() with the builtin open(). In Python 3, these
      functions are functionally equivalent.
    Co-authored-by: Matthäus G. Chajdas <Anteru@users.noreply.github.com>
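    As a hedged before/after sketch of these cleanups (read_source is a
    hypothetical helper, not actual repo code):

        # Before (Python 2 era):
        #     # -*- coding: utf-8 -*-
        #     import io
        #     def read_source(path):
        #         try:
        #             with io.open(path, 'r') as f:
        #                 return f.read()
        #         except EnvironmentError:
        #             return u''

        # After (Python 3 only):
        def read_source(path):
            try:
                with open(path) as f:   # io.open() and mode 'r' are redundant
                    return f.read()
            except OSError:             # EnvironmentError is an alias of OSError
                return ''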
* Fix for lexing Python raw f-strings with backslashes (#1683) (Jeppe Dakin, 2021-01-17; 1 file, -0/+48)
    * introduce and apply rfstringescape
    * add unit test for raw f-strings
    * add further tests
    * fix comment
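    The construct at issue: raw f-strings combine interpolation with raw
    backslash handling, so the lexer must not treat the backslashes as
    string escapes. A small illustrative example (not from the test file):

        import re

        sep = r"\."
        # rf"...": {sep} is interpolated, while \d+ reaches the regex
        # engine as a literal backslash-d, not as a string escape.
        pattern = rf"\d+{sep}\d+"
        print(re.findall(pattern, "pi is 3.14"))   # ['3.14']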
* Added `pygmentize -C` option to guess a lexer from content (Georg Brandl, 2021-01-17; 1 file, -0/+7)
* Do not guess MIME or SQL without reason (Georg Brandl, 2021-01-17; 1 file, -1/+0)
    Constant returns from analyse_text are not useful.
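    For context: guess_lexer() calls every lexer's analyse_text() and
    picks the highest score between 0.0 and 1.0, so a lexer that returns a
    constant regardless of content only distorts that ranking. A quick
    illustration of the consumer side:

        from pygments.lexers import guess_lexer

        # The shebang is real content-based evidence, so the Python lexer
        # scores highest here.
        print(guess_lexer('#!/usr/bin/env python\nprint("hi")\n').name)  # Python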
* fix coding style in test_analyzer_lexer (Georg Brandl, 2021-01-11; 1 file, -22/+31)
* Detect malformed closing tags as errors (#1656) (Catatonic, 2021-01-06; 1 file, -0/+33)
* Fix #1582 -- invalid comment in Matlab example (Matthäus G. Chajdas, 2021-01-06; 1 file, -1/+1)
* Markdown lexer improvements (#1623) (Leistungsabfall, 2021-01-06; 1 file, -30/+106)
    * improve fenced code recognition for the markdown lexer
    * improve inline code detection
    * improve detection of some Markdown keywords
    * remove Markdown recognition of code indented by 4 spaces, as
      reliable detection is not possible with regex
* support indented entries in IniLexer (#1624) (Leistungsabfall, 2021-01-04; 1 file, -0/+81)
* Update Crystal lexer (#1650) (Oleh Prypin, 2021-01-04; 2 files, -16/+237)
    * crystal: drop all classes from builtins; these aren't normally
      highlighted ("normally" meaning all other highlighter tools)
    * crystal: fix percent-strings, drop Ruby-specific arbitrary
      delimiters. Ruby supports strings such as `%*text*` where `*` can be
      anything, but Crystal never had anything like that. It does,
      however, keep `%|text|`, so add a case for that.
    * crystal: update keywords and builtins
    * crystal: fix string literals and escape sequences. Update the list
      of escapes and support Unicode escape sequences. Also remove the
      Ruby-specific `:@foo` symbol syntax; Crystal doesn't have it.
    * crystal: uppercase identifiers aren't always constants. Make
      `FOO::Bar` be highlighted like `Foo::Bar` would be, rather than
      like `FOO`.
    * crystal: annotations can be namespaced. Highlight the entire inside
      part of `@[Foo::Bar]`, not just the `Foo` part. (These used to be
      named 'attributes', but the official name is 'annotations' now, so
      change that as well.)
    * fixup! crystal: fix percent-strings, drop Ruby-specific arbitrary
      delimiters
* Fix Coq-related bug #678 (#1648) (Maximilian Wuttke, 2021-01-04; 1 file, -0/+33)
    * Unicode support for Coq: catch-all lexing for `Name.Builtin.Pseudo`,
      as in the Lean lexer. This fixes #678.
    * Coq lexer: improve `analyse_text`
    * Add a test for Coq
* Merge branch 'master' of https://github.com/felixhao28/pygments into felixhao28-master (Matthäus G. Chajdas, 2021-01-04; 1 file, -2/+24)
  * Update tests/test_javascript.py (Felix Hao, 2021-01-04; 1 file, -1/+1)
      Co-authored-by: Mestery <48163546+Mesteery@users.noreply.github.com>
  * add test_function_definition (Yiyang Hao, 2020-08-24; 1 file, -1/+23)
* Bump copyright year (Matthäus G. Chajdas, 2021-01-03; 57 files, -57/+57)
* Merge github.com:mathiasertl/pygments (Georg Brandl, 2020-12-28; 1 file, -4/+25)
    Fixes #1645.
  * add tests to illustrate problem discussed in PR #1645 (Mathias Ertl, 2020-12-26; 1 file, -0/+25)
  * consider trailing whitespace part of the prompt, making copy/paste more straightforward (Mathias Ertl, 2020-12-25; 1 file, -5/+3)
* do_insertions: do not emit empty tokens (Georg Brandl, 2020-12-28; 3 files, -19/+4)
* Restore timing stats in test_examplefiles, and cut down the USD file (Georg Brandl, 2020-12-25; 2 files, -56/+16)
* all: weed out more backtracking string regexes (Georg Brandl, 2020-12-25; 1 file, -3/+1)
* Fix backtracking string regexes in JavascriptLexer and TypescriptLexer (Georg Brandl, 2020-12-17; 1 file, -1/+3)
    Fixes #1637.
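    The general shape of these fixes (a sketch, not the exact regexes from
    the commit): make the alternatives of a repeated group disjoint, so a
    failing match cannot explore exponentially many decompositions.

        import re
        import time

        # Unterminated string literal built from backslash pairs: the
        # worst case for the old pattern.
        text = '"' + '\\\\' * 20   # 40 backslashes, no closing quote

        # Overlapping: [^"] can also consume a backslash, so every pair
        # can be split two ways and a failing match backtracks
        # exponentially.
        slow = re.compile(r'"(\\\\|\\"|[^"])*"')

        # Disjoint: [^"\\] excludes the backslash, so each character has
        # exactly one parse and matching stays linear on malformed input.
        fast = re.compile(r'"(\\\\|\\[^\\]|[^"\\])*"')

        for name, rx in (('fast', fast), ('slow', slow)):
            t0 = time.perf_counter()
            rx.match(text)
            print(name, time.perf_counter() - t0)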
* Increase timeout (Matthäus G. Chajdas, 2020-12-05; 1 file, -4/+4)
    This should fix the tests failing on PyPy. Eventually we'll need a
    more robust solution for this.
* Unclosed script/style tag handling. Fixes #1614 (#1615) (Nick Gerner, 2020-12-05; 1 file, -0/+129)
    Explicitly handle unclosed <script> and <style> tags, which previously
    resulted in O(n^2) work lexing one Error token per character, up to
    the end of the line or end of file (whichever comes first). Now we try
    lexing the rest of the line as Javascript/CSS if there's no closing
    script/style tag. We recover on the next line in the root state if
    there is a newline; otherwise we just keep parsing as Javascript/CSS.
    This is similar to how the error handling in lexer.py works, except we
    get Javascript or CSS tokens instead of Error tokens, and we get to
    the end of the line much faster since we don't apply an O(n) regex for
    every character in the line.
    Also added a new test suite for the html lexer (there wasn't one,
    except for coverage in test_examplefiles.py), including a trivial
    happy-path case and several cases around <script> and <style>
    fragments, with regression coverage that fails on the old logic.
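    To see the behavior, feed the lexer an unclosed <script> fragment: the
    old code emitted one Error token per character of the remainder, while
    the new code produces Javascript tokens (illustrative, not the actual
    test code):

        from pygments.lexers import HtmlLexer

        code = '<script>\nvar x = 1;\n'
        for index, token, value in HtmlLexer().get_tokens_unprocessed(code):
            print(index, token, repr(value))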
* testing turtle prefix names where reference starts with number (#1590) (elf Pavlik, 2020-12-05; 2 files, -6/+48)
    * testing turtle prefix names where reference starts with number
    * remove case insensitive flag from Turtle lexer
    * use same end-of-string regex as in SPARQL and ShExC
    * make example.ttl valid turtle
* Fix a catastrophic backtracking bug in JavaLexer (#1594) (Kurt McKee, 2020-11-09; 1 file, -1/+24)
    * JavaLexer: Demonstrate a catastrophic backtracking bug
    * JavaLexer: Fix a catastrophic backtracking bug
    Closes #1586.
* Fix Mason regex (Matthäus G. Chajdas, 2020-11-08; 1 file, -2/+1)
    Previously, the tag was cut off.
* Fix Mason regex (Matthäus G. Chajdas, 2020-11-08; 1 file, -2/+65)
    Previously, something like <%class>text</%class> would not get matched
    correctly. This was due to the capturing group capturing the wrong
    part of the tag: instead of "class", it would capture the part after
    "class" and before ">". With this commit, the capturing group
    correctly matches the start/end tag. This commit also adds a unit test
    to verify this.
* fix closing tag for unnamed blocks in MasonLexer (#1592) (Carlos Henrique Guardão Gandarez, 2020-11-08; 1 file, -1/+16)
* test_templates: simplify and rename module (Georg Brandl, 2020-10-30; 1 file, -15/+3)
* added documentation (Sean McElwain, 2020-10-30; 1 file, -0/+4)
* added test to track djangojavascript lexer fix (Sean McElwain, 2020-10-30; 1 file, -0/+37)
* Fix test (Matthäus G. Chajdas, 2020-10-28; 1 file, -1/+1)
* Remove margin: 0 from <pre> styling (Matthäus G. Chajdas, 2020-10-28; 33 files, -33/+33)
    This seems to break some themes which were not expecting Pygments to
    change margins, and it doesn't look like it makes a difference for
    standalone Pygments.
* MySQL: Tokenize quoted schema object names, and escape characters, uniquely (#1555) (Kurt McKee, 2020-10-27; 2 files, -9/+40)
    Changes in this patch:
    * Name.Quoted and Name.Quoted.Escape are introduced as non-standard
      tokens.
    * HTML and LaTeX formatters were confirmed to provide default
      formatting if they encounter these two non-standard tokens. They
      also add style classes based on the token name, like "n-Quoted"
      (HTML) or "nQuoted" (LaTeX), so that users can add custom styles for
      these.
    * Removed "\`" and "\\" as schema object name escapes. These are
      relics of the previous regular expression for backtick-quoted names
      and are not treated as escape sequences. The behavior was confirmed
      in the MySQL documentation as well as by running queries in MySQL
      Workbench.
    * Prevent "123abc" from being treated as an integer followed by a
      schema object name. MySQL allows leading numbers in schema object
      names as long as 0-9 are not the only characters in the name.
    * Add ~10 more unit tests to validate behavior.
    Closes #1551.
    Also remove an end-of-line regex match that triggered a lint warning,
    and add tests that confirm correct behavior. No tests failed before or
    after removing the '$' match in the regex, but now regexlint isn't
    complaining. Removing the '$' match probably depends on the fact that
    Pygments adds a newline at the end of the input text, so there is
    always something after a bare integer literal.
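    On the "non-standard tokens" point: Pygments token types are created
    on attribute access and form a hierarchy, so Name.Quoted needs no
    registration, and a formatter with no explicit style for it falls back
    to the nearest styled ancestor. A quick demonstration:

        from pygments.token import Name

        Quoted = Name.Quoted           # created on first access
        Escape = Name.Quoted.Escape
        print(Quoted in Name)          # True: hierarchy containment
        print(Escape)                  # Token.Name.Quoted.Escape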
* Speed up JSON and reduce HTML formatter consumption (#1569) (Kurt McKee, 2020-10-26; 1 file, -56/+180)
    * Update the JSON-LD keyword list to match JSON-LD 1.1: update the
      JSON-LD URL to HTTPS, update the list of JSON-LD keywords, make the
      JSON-LD parser less dependent on the JSON lexer implementation, and
      add unit tests for the JSON-LD lexer.
    * Add unit tests for the JSON parser, covering valid literals, valid
      string escapes, and that object keys are tokenized differently from
      string values.
    * Rewrite the JSON lexer (related to #1425). The JSON parser is
      rewritten, the JSON bare object parser no longer requires additional
      code, and get_tokens_unprocessed() returns as much as it can to
      reduce yields (for example, side-by-side punctuation is not returned
      separately). The unit tests were updated, and unit tests were added
      based on Hypothesis test results.
    * Reduce HTML formatter memory consumption by ~33% and speed it up
      (related to #1425). Tested on a 118MB JSON file: memory consumption
      tops out at ~3GB before this patch and drops to ~2GB with it. These
      were the command lines used:
          python -m pygments -l json -f html -o .\new-code-classes.html .\jc-output.txt
          python -m pygments -l json -f html -O "noclasses" -o .\new-code-styles.html .\jc-output.txt
    * Add an LRU cache to the HTML formatter's HTML-escaping and
      line-splitting. For a 118MB JSON input file, this reduces memory
      consumption by ~500MB and reduces formatting time by ~15 seconds.
    * JSON: Add a catastrophic backtracking test back to the test suite.
    * JSON: Update the comment that documents the internal queue.
    * JSON: Document in comments that ints/floats/constants are not
      validated.
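    The LRU-cache idea in miniature (a sketch, not the formatter's actual
    code): highlighted output repeats the same short token strings over
    and over, so caching their escaped form avoids re-escaping and
    re-allocating them.

        from functools import lru_cache
        from html import escape

        @lru_cache(maxsize=None)
        def escape_cached(value):
            # Identical token strings ('{', ':', '"id"', ...) hit the cache.
            return escape(value)

        tokens = ['{', '"id"', ':', '42', ',', '"id"', ':', '42', '}']
        print(''.join(escape_cached(t) for t in tokens))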
* Speculative fix for #1579 (#1583) (Matthäus G. Chajdas, 2020-10-24; 33 files, -102/+102)
    This removes the top/bottom padding changes and keeps only left/right
    padding, in the hope that this does not break all Sphinx themes.
* TNTLexer: Don't crash on unexpected EOL (#1570) (Ken, 2020-10-14; 1 file, -0/+204)
    * TNTLexer: Don't crash on unexpected EOL. Catch IndexErrors in each
      line and error the rest of the line, leaving whatever tokens were
      found.
    * Write and pass tests for Typographic Number Theory.
      pygments/lexers/tnt.py:
      * Fix indentation on import
      * Fix: TNTLexer.cur is a class-level reference if not initialized in
        get_tokens_unprocessed, so init it in __init__ too
      * Fix: fantasy markers are not allowed as components of other
        formulas, so have a dedicated check for them in the body of
        get_tokens_unprocessed which disables the normal formula handling
        if present
      * Clarify the TNTLexer.lineno docstring
      * Attempt to discard tokens before an IndexError
      tests/test_tnt.py:
      * Test every method, and test both +ve and -ve matches for most
      * The lexer fixture is test-level, to reinitialize cur clean each
        time
      * Don't test the actual get_tokens_unprocessed method (besides for
        fantasy markers) because the text testing is left to examplefiles
      AUTHORS: add myself to the credits :)
    * Add a TNT test just to make sure there are no crashes
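    A minimal sketch of the recovery strategy (not the TNT lexer's actual
    code; toy_parse is a made-up per-line parser): the TNT lexer indexes
    into each line by hand, so running past the end raises IndexError;
    catch it, keep the tokens already found, and mark the rest of the line
    as Error.

        from pygments.token import Error, Text

        def lex_line(line, parse):
            tokens = []   # list of (token_type, value) pairs
            try:
                parse(line, tokens)   # may raise IndexError past end of line
            except IndexError:
                consumed = sum(len(value) for _, value in tokens)
                tokens.append((Error, line[consumed:]))
            return tokens

        # Toy parser: reads one character past '=', so 'a=' overruns.
        def toy_parse(line, tokens):
            i = 0
            while i < len(line):
                if line[i] == '=':
                    tokens.append((Text, line[i] + line[i + 1]))
                    i += 2
                else:
                    tokens.append((Text, line[i]))
                    i += 1

        print(lex_line('a=', toy_parse))
        # [(Token.Text, 'a'), (Token.Error, '=')]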
* Fix LatexEmbeddedLexer (#1517) (Eric Wieser, 2020-09-30; 1 file, -1/+50)
    * Fix LatexEmbeddedLexer to not call the nested tokenizer piecewise
    * Reuse the existing do_insertions function
    * Add a test for the `LatexEmbeddedLexer`
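    How do_insertions() fits here (a hedged sketch of the API, not the
    LatexEmbeddedLexer itself): lex the embedded code as one contiguous
    string, so multi-line constructs survive, then splice host-language
    tokens back in at recorded offsets.

        from pygments.lexer import do_insertions
        from pygments.lexers import PythonLexer
        from pygments.token import Generic

        code = 'x = 1\ny = 2\n'
        # Each entry: (offset into code, tokens to splice in there).
        insertions = [(0, [(0, Generic.Prompt, '>>> ')]),
                      (6, [(0, Generic.Prompt, '>>> ')])]
        for index, token, value in do_insertions(
                insertions, PythonLexer().get_tokens_unprocessed(code)):
            print(index, token, repr(value))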
* Add analyze_text to make "make check" happy (#1549) (Matthäus G. Chajdas, 2020-09-23; 1 file, -0/+60)
    This also fixes a few small bugs:
    * Slash uses *.sla as the file ending, not *.sl
    * IDL has endelse, not elseelse
    Improve various analyse_text methods:
    * Make Perl less confident in the presence of :=
    * Improve the brainfuck check so it does not parse the whole input
    * Improve Unicon by matching \self and /self
    * Fix Ezhil not matching against the input text
    * Simplify Modula2::analyse_text
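    The common shape of these analyse_text fixes (FancyLangLexer and its
    marker regex are hypothetical): derive the 0.0-1.0 confidence from
    actual content, bound the work to a prefix of the input, and keep weak
    hints weak.

        import re

        class FancyLangLexer:
            # Pygments calls analyse_text as a plain function of the text.
            def analyse_text(text):
                head = text[:1000]                       # don't scan huge inputs
                if re.search(r'^\s*%fancylang\b', head):
                    return 0.9                           # strong, specific marker
                if ':=' in head:
                    return 0.05                          # weak hint stays weak
                return 0.0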
* python traceback lexer: fix custom exceptions without message (Georg Brandl, 2020-09-19; 1 file, -0/+4)
    Fixes #1548.
* fix regression in JSON lexer, bump to 2.7.1 (tag: 2.7.1) (Georg Brandl, 2020-09-17; 1 file, -7/+12)
    Fixes #1544.
* all: remove "u" string prefix (#1536) (Georg Brandl, 2020-09-08; 35 files, -1804/+1795)
    * all: remove the "u" string prefix
    * util: remove unirange. Since Python 3.3, all builds are
      wide-unicode compatible.
    * unistring: remove support for narrow-unicode builds, which stopped
      being relevant with Python 3.3