path: root/tests
Commit message | Author | Age | Files | Lines
* Fix backtracking string regexes in JavascriptLexer and TypescriptLexer.Georg Brandl2020-12-171-1/+3
| | | | fixes #1637
* Increase timeout.Matthäus G. Chajdas2020-12-051-4/+4
| | | | | This should fix the tests failing on PyPy. Eventually we'll need a more robust solution for this.
* Unclosed script/style tag handling Fixes #1614 (#1615)Nick Gerner2020-12-051-0/+129
| Explicitly handle unclosed <script> and <style> tags which previously would result in O(n^2) work to lex as Error tokens per character up to the end of the line or end of file (whichever comes first).
|
| Now we try lexing the rest of the line as Javascript/CSS if there's no closing script/style tag. We recover on the next line in the root state if there is a newline, otherwise just keep parsing as Javascript/CSS.
|
| This is similar to how the error handling in lexer.py works except we get Javascript or CSS tokens instead of Error tokens. And we get to the end of the line much faster since we don't apply an O(n) regex for every character in the line.
|
| I added a new test suite for html lexer (there wasn't one except for coverage in test_examplefiles.py) including a trivial happy-path case and several cases around <script> and <style> fragments, including regression coverage that fails on the old logic.
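The recovery rule described in this commit can be sketched in a few lines. This is only an illustration using the standard `re` module: `script_body` and `CLOSE_TAG` are hypothetical names, not part of the HtmlLexer, and the real lexer expresses the same idea through its state machine rather than a helper function.

```python
import re

# Hypothetical helper, not the HtmlLexer's actual code.
CLOSE_TAG = re.compile(r"</script\s*>", re.IGNORECASE)


def script_body(text, pos):
    """Return (body, resume_pos) for script content starting at pos.

    If a closing tag exists, the body runs up to it. If not, only the
    rest of the current line is treated as script, and lexing resumes
    at the start of the next line.
    """
    match = CLOSE_TAG.search(text, pos)
    if match:
        return text[pos:match.start()], match.end()
    end_of_line = text.find("\n", pos)
    if end_of_line == -1:
        # No newline either: everything that remains is script content.
        return text[pos:], len(text)
    return text[pos:end_of_line], end_of_line + 1
```

The single search for the closing tag replaces the per-character retry that made the old behaviour quadratic.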
* testing turtle prefix names where reference starts with number (#1590)elf Pavlik2020-12-052-6/+48
| * testing turtle prefix names where reference starts with number
| * remove case insensitive flag from Turtle lexer
| * use same end-of-string regex as in SPARQL and ShExC
| * make example.ttl valid turtle
* Fix a catastrophic backtracking bug in JavaLexer (#1594)Kurt McKee2020-11-091-1/+24
| * JavaLexer: Demonstrate a catastrophic backtracking bug
| * JavaLexer: Fix a catastrophic backtracking bug
|
| Closes #1586
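The log does not show the offending JavaLexer rule, so the following is a generic sketch of what catastrophic backtracking looks like and how such rules are usually repaired. Both patterns are hypothetical and are not taken from the lexer.

```python
import re

# Hypothetical patterns for illustration; these are NOT JavaLexer rules.
# Nested quantifiers: "(\w+\s*)+" can split a run of word characters in
# exponentially many ways before a failing match is finally rejected.
AMBIGUOUS = re.compile(r"^(\w+\s*)+;$")

# Same language, but every character can be consumed in only one way,
# so a failing match is rejected in roughly linear time.
UNAMBIGUOUS = re.compile(r"^\w+(?:\s+\w+)*\s*;$")


def accepts(pattern, text):
    """Return True if the pattern matches the whole statement."""
    return pattern.match(text) is not None
```

Running `AMBIGUOUS` against a long run of word characters with no trailing `;` is the case to avoid; the rewritten pattern handles the same input without trouble.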
* Fix Mason regex.Matthäus G. Chajdas2020-11-081-2/+1
| | | | Previously, the tag was cut off.
* Fix Mason regex.Matthäus G. Chajdas2020-11-081-2/+65
| Previously, something like:
|
|     <%class>text</%class>
|
| would not get matched correctly. This was due to the capturing group capturing the wrong part of the tag -- instead of class, it would capture the part after class before >. With this commit, the capturing group correctly matches the start/end tag. This commit also adds a unit test to verify this.
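A small sketch of the idea behind this fix, assuming a simplified block syntax: the first capturing group should hold the tag name itself so that the closing tag can be tied to it. The pattern and the `parse_block` helper are hypothetical and are not the MasonLexer's actual rule.

```python
import re

# Hypothetical pattern, not the MasonLexer's rule. Group 1 is the tag
# name, group 2 is whatever follows the name before ">", group 3 is the
# block body. The backreference \1 forces the closing tag to match.
BLOCK = re.compile(r"(?s)<%(\w+)\b([^>]*)>(.*?)</%\1>")


def parse_block(text):
    """Return (name, attrs, body) for a block at the start of text."""
    match = BLOCK.match(text)
    if match is None:
        return None
    return match.groups()
```

With the name in its own group, `<%class>text</%class>` yields `class` as the tag name rather than the text between the name and `>`.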
* fix closing tag for unnamed blocks on MasonLexer (#1592)Carlos Henrique Guardão Gandarez2020-11-081-1/+16
|
* test_templates: simplify and rename moduleGeorg Brandl2020-10-301-15/+3
|
* added documentationSean McElwain2020-10-301-0/+4
|
* added test to track djangojavascript lexer fixSean McElwain2020-10-301-0/+37
|
* Fix test.Matthäus G. Chajdas2020-10-281-1/+1
|
* Remove margin: 0 from <pre> styling.Matthäus G. Chajdas2020-10-2833-33/+33
| | | | | | This seems to break some themes which were not expecting Pygments to change margins, and it doesn't look like it makes a difference for standalone Pygments.
* MySQL: Tokenize quoted schema object names, and escape characters, uniquely (#1555)Kurt McKee2020-10-272-9/+40
| * MySQL: Tokenize quoted schema object names, and escape characters, uniquely
|
|   Changes in this patch:
|
|   * Name.Quoted and Name.Quoted.Escape are introduced as non-standard tokens
|   * HTML and LaTeX formatters were confirmed to provide default formatting if they encounter these two non-standard tokens. They also add style classes based on the token name, like "n-Quoted" (HTML) or "nQuoted" (LaTeX) so that users can add custom styles for these.
|   * Removed "\`" and "\\" as schema object name escapes. These are relics of the previous regular expression for backtick-quoted names and are not treated as escape sequences. The behavior was confirmed in the MySQL documentation as well as by running queries in MySQL Workbench.
|   * Prevent "123abc" from being treated as an integer followed by a schema object name. MySQL allows leading numbers in schema object names as long as 0-9 are not the only characters in the schema object name.
|   * Add ~10 more unit tests to validate behavior.
|
|   Closes #1551
|
| * Remove an end-of-line regex match that triggered a lint warning
|
|   Also, add tests that confirm correct behavior. No tests failed before or after removing the '$' match in the regex, but now regexlint isn't complaining.
|
|   Removing the '$' matching probably depends on the fact that Pygments adds a newline at the end of the input text, so there is always something after a bare integer literal.
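The "123abc" rule can be illustrated with a negative lookahead. The two patterns below are assumptions made for this sketch; the MySqlLexer's real rules may be written differently and cover more characters.

```python
import re

# Hypothetical token rules, not the MySqlLexer's actual patterns.
# An integer only counts when no identifier character follows it, so
# "123abc" falls through to the schema object name rule as one token.
INTEGER = re.compile(r"[0-9]+(?![0-9A-Za-z_$])")
NAME = re.compile(r"[0-9A-Za-z_$]+")


def first_token(text):
    """Return (kind, lexeme) for the token at the start of text."""
    match = INTEGER.match(text)
    if match:
        return "integer", match.group()
    match = NAME.match(text)
    if match:
        return "name", match.group()
    return None
```

Because the lookahead also succeeds at the very end of the input, this sketch does not need a `$` alternative for a trailing bare integer.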
* Speed up JSON and reduce HTML formatter consumption (#1569)Kurt McKee2020-10-261-56/+180
| * Update the JSON-LD keyword list to match JSON-LD 1.1
|
|   Changes in this patch:
|
|   * Update the JSON-LD URL to HTTPS
|   * Update the list of JSON-LD keywords
|   * Make the JSON-LD parser less dependent on the JSON lexer implementation
|   * Add unit tests for the JSON-LD lexer
|
| * Add unit tests for the JSON parser
|
|   This includes:
|
|   * Testing valid literals
|   * Testing valid string escapes
|   * Testing that object keys are tokenized differently from string values
|
| * Rewrite the JSON lexer
|
|   Related to #1425
|
|   Included in this change:
|
|   * The JSON parser is rewritten
|   * The JSON bare object parser no longer requires additional code
|   * `get_tokens_unprocessed()` returns as much as it can to reduce yields (for example, side-by-side punctuation is not returned separately)
|   * The unit tests were updated
|
| * Add unit tests based on Hypothesis test results
|
| * Reduce HTML formatter memory consumption by ~33% and speed it up
|
|   Related to #1425
|
|   Tested on a 118MB JSON file. Memory consumption tops out at ~3GB before this patch and drops to only ~2GB with this patch. These were the command lines used:
|
|       python -m pygments -l json -f html -o .\new-code-classes.html .\jc-output.txt
|       python -m pygments -l json -f html -O "noclasses" -o .\new-code-styles.html .\jc-output.txt
|
| * Add an LRU cache to the HTML formatter's HTML-escaping and line-splitting
|
|   For a 118MB JSON input file, this reduces memory consumption by ~500MB and reduces formatting time by ~15 seconds.
|
| * JSON: Add a catastrophic backtracking test back to the test suite
| * JSON: Update the comment that documents the internal queue
| * JSON: Document in comments that ints/floats/constants are not validated
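The LRU cache mentioned above pays off because highlighted output repeats the same short token strings many times. A minimal sketch with the standard library, where `escape_html` and the cache size are assumptions chosen for illustration and not the HTML formatter's own helper:

```python
import functools
import html


# Hypothetical helper, not the HtmlFormatter's code. The maxsize value
# is an arbitrary choice for this sketch.
@functools.lru_cache(maxsize=100)
def escape_html(text):
    """Escape text for HTML, remembering recently seen inputs."""
    return html.escape(text, quote=True)
```

`escape_html.cache_info()` reports hits and misses, which is a quick way to check whether a cache of a given size is actually being reused on real input.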
* Speculative fix for #1579. (#1583)Matthäus G. Chajdas2020-10-2433-102/+102
| | | | This removes the top/bottom padding changes, and only keeps left/right padding, in the hope that this does not break all Sphinx themes.
* TNTLexer: Don't crash on unexpected EOL. (#1570)Ken2020-10-141-0/+204
| * TNTLexer: Don't crash on unexpected EOL
|
|   Catch IndexErrors in each line and error the rest of the line, leaving whatever tokens were found.
|
| * Write and pass tests for Typographic Number Theory
|
|   pygments/lexers/tnt.py:
|   * Fix indentation on import
|   * Fix: TNTLexer.cur is class-level reference if not initialized in get_tokens_unprocessed, so init it in __init__ too
|   * Fix: Fantasy markers are not allowed as components of other formulas, so have a dedicated check for them in the body of get_tokens_unprocessed which disables the normal formula handling if present
|   * Clarify TNTLexer.lineno docstring
|   * Attempt to discard tokens before an IndexError
|
|   +tests/test_tnt.py:
|   * Test every method, and test both +ve and -ve matches for most
|   * Lexer fixture is test-level to reinitialize cur clean each time
|   * Don't test actual get_tokens_unprocessed method (besides for fantasy markers) because the text testing is left to examplefiles
|
|   AUTHORS:
|   + Add myself to credits :)
|
| * Add a TNT test just to make sure no crashes
* Fix LatexEmbeddedLexer (#1517)Eric Wieser2020-09-301-1/+50
| * Fix LatexEmbeddedLexer to not call the nested tokenizer piecewise
| * Reuse the existing do_insertions function
| * Add a test for the `LatexEmbeddedLexer`
* Add analyze_text to make make check happy. (#1549)Matthäus G. Chajdas2020-09-231-0/+60
| * Add analyze_text to make make check happy.
|
|   This also fixes a few small bugs:
|
|   * Slash uses *.sla as the file ending, not *.sl
|   * IDL has endelse, not elseelse
|
| * Improve various analyse_text methods.
| * Improve various analyse_text methods.
| * Make Perl less confident in presence of :=.
| * Improve brainfuck check to not parse the whole input.
| * Improve Unicon by matching \self, /self
| * Fix Ezhil not matching against the input text
| * Simplify Modula2::analyse_text.
* python traceback lexer: fix custom exceptions without messageGeorg Brandl2020-09-191-0/+4
| | | | fixes #1548
* fix regression in JSON lexer, bump to 2.7.1 (tag: 2.7.1) Georg Brandl2020-09-171-7/+12
| | | | Fixes #1544
* all: remove "u" string prefix (#1536)Georg Brandl2020-09-0835-1804/+1795
| * all: remove "u" string prefix
| * util: remove unirange
|
|   Since Python 3.3, all builds are wide unicode compatible.
|
| * unistring: remove support for narrow-unicode builds
|
|   which stopped being relevant with Python 3.3
* Update Fennel keywords to catch up to version 0.6.0.Phil Hagelberg2020-09-071-44/+90
| | | | | | Remove support for single-quoted strings. Update fennelview example to latest version of library.
* Overhaul Javascript numeric literals (#1534)Kurt McKee2020-09-062-70/+154
| * Rename the "Javascript" tests to reflect that they are for CoffeeScript
|
|   This change also modifies the module docstring to reflect the file's purpose.
|
| * Overhaul the Javascript numeric literal parsing
|
|   Fixes #307
|
|   This patch contains the following changes:
|
|   * Adds 50+ unit tests for Javascript numeric literals
|   * Forces ASCII numbers for float literals (so, now reject `.୪`)
|   * Adds support for Javascript's BigInt notation (`100n`)
|   * Adds support for leading-zero-only octal notation (`0777`)
|   * Adds support for scientific notation with no significand (`1e10`)
|
|   Numeric literal parsing is based on information at:
|
|   * https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Grammar_and_types
|   * https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures
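The literal forms listed in this commit can be told apart with a short ordered rule table. The patterns and labels below are assumptions written for this sketch and are not the JavascriptLexer's actual rules. Note that Python's `\d` also matches non-ASCII digits in `str` patterns, which is why the sketch spells out `[0-9]`.

```python
import re

# Hypothetical rules, tried in order from most to least specific.
# "[0-9]" is used instead of "\d" so that only ASCII digits are accepted.
NUMBER_RULES = [
    ("bigint", re.compile(r"[0-9]+n")),
    ("hex", re.compile(r"0[xX][0-9a-fA-F]+")),
    ("octal", re.compile(r"0[0-7]+")),
    ("float", re.compile(
        r"(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
        r"|[0-9]+[eE][+-]?[0-9]+"
    )),
    ("integer", re.compile(r"[0-9]+")),
]


def classify(literal):
    """Return the label of the first rule matching the whole literal."""
    for label, pattern in NUMBER_RULES:
        if pattern.fullmatch(literal):
            return label
    return None
```

Ordering matters here: `0777` must reach the octal rule before the plain integer rule, and `100n` must be claimed as a BigInt before anything else consumes the digits.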
* Overhaul the MySQL lexer (#1527)Kurt McKee2020-09-062-0/+381
| * Overhaul the MySQL lexer
|
|   Fixes #975, #1063, #1453
|
|   Changes include:
|
|   Documentation
|   -------------
|   * Note in the lexer docstring that Oracle MySQL is the target syntax. MariaDB syntax is not a target (though there is significant overlap).
|
|   Unit tests
|   ----------
|   * Add 140 unit tests for MySQL.
|
|   Literals
|   --------
|   * Hexadecimal/binary/date/time/timestamp literals are supported.
|   * Integer mantissas are supported for scientific notation.
|   * In-string escapes are now tokenized properly.
|   * Support the "unknown" constant.
|
|   Comments
|   --------
|   * Optimizer hints are now supported, and keywords are recognized and tokenized as preprocessor instructions.
|   * Remove nested multi-line comment support, which is no longer supported in MySQL.
|
|   Variables
|   ---------
|   * Support the '@' prefix for variable names.
|   * Lift restrictions on characters in unquoted variable names. (MySQL does not impose a restriction on lead characters.)
|   * Support single/double/backtick-quoted variable names, including escapes.
|   * Support the '@@' prefix for system variable names.
|   * Support '?' as a variable so people can demonstrate prepared statements.
|
|   Keywords
|   --------
|   * Keyword / data type / function are now in a separate, auto-updating file.
|   * Support 25 additional data types (including spatial and JSON types).
|   * Support 460 additional MySQL keywords.
|   * Support 372 MySQL functions. Explicit function support resolves a bug that causes non-function items to be treated as functions simply because they have a trailing opening parenthesis.
|   * Support exceptions for the 'SET' keyword, which is both a datatype and a keyword depending on context.
|
|   Schema object names
|   -------------------
|   * Support Unicode in MySQL schema object names.
|   * Support parsing of backtick-quoted schema object name escapes. (Escapes do not produce a distinct token type at this time.)
|
|   Operators
|   ---------
|   * Remove non-operator characters from the list of operators.
|   * Remove non-punctuation characters from the list of punctuation.
|
| * Cleanup items based on feedback
| * Remove an unnecessary optional newline lookahead for single-line comments
* all: fixup remaining regexlint warningsGeorg Brandl2020-09-061-1/+0
|
* Add lexer for PsySH console for PHP (#1438)Ben Ramsey2020-09-041-0/+47
| | | | | This lexer is based on the PythonConsoleLexer and provides the ability to highlight console input and output for PsySH, a developer console and REPL for PHP. See https://psysh.org.
* more explicitly define escape sequences in JsonLexer (fix #1065) (#1528)Nick Gerner2020-08-311-0/+28
| * more explicitly define escape sequences in JsonLexer (fix #1065)
| * adding test coverage for #1065
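For reference, the JSON grammar permits only a fixed set of escapes inside strings: `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`, and `\u` followed by exactly four hex digits. A minimal sketch of checking for anything else, where `VALID_ESCAPE` and `invalid_escapes` are hypothetical names and not the JsonLexer's rules:

```python
import re

# Hypothetical rule covering exactly the escapes the JSON grammar allows.
VALID_ESCAPE = re.compile(r'\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})')


def invalid_escapes(string_body):
    """Return the backslash sequences in string_body that JSON rejects."""
    bad = []
    pos = 0
    while True:
        pos = string_body.find("\\", pos)
        if pos == -1:
            return bad
        match = VALID_ESCAPE.match(string_body, pos)
        if match:
            pos = match.end()
        else:
            # Record the backslash and the character after it, then move on.
            bad.append(string_body[pos:pos + 2])
            pos += 2
```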
* Fix cmake header (#1491)Thomas Aglassinger2020-08-231-0/+29
| * Fixed guessing of CMake by header.
|   * Version number can have multiple digits.
|   * Tabs are handled as white space.
|   * Trailing comments are ignored.
| * Cleaned up regex to detect CMake header.
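The three header fixes (multi-digit versions, tabs as whitespace, trailing comments) can be shown with one pattern. This is a sketch under assumptions: `CMAKE_HEADER` and `looks_like_cmake` are hypothetical and are not the CMakeLexer's detection code.

```python
import re

# Hypothetical detection pattern, not the CMakeLexer's actual regex.
# The command name is matched case-insensitively; spaces and tabs are
# both accepted as whitespace; a trailing "# comment" is allowed.
CMAKE_HEADER = re.compile(
    r"^[ \t]*(?i:cmake_minimum_required)[ \t]*\([ \t]*VERSION[ \t]+"
    r"[0-9]+(?:\.[0-9]+)*[ \t]*\)[ \t]*(?:#.*)?$",
    re.MULTILINE,
)


def looks_like_cmake(text):
    """Return True if some line is a cmake_minimum_required call."""
    return CMAKE_HEADER.search(text) is not None
```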
* Add lexer for Pointless (#1494)Avery N. Nortonsmith2020-08-231-0/+30
| * add lexer for pointless
| * lexer docstring formatting
| * added link to languages doc
| * update authors
| * update version
| * added double string
| * added upval keyword
| * simplify ptls example code
| * rename doubleString -> multiString
* Update copyright year (fixes #1514.)Matthäus G. Chajdas2020-08-2248-48/+48
|
* Merge pull request #1500 from pygments/improve-linenos-handlingMatthäus G. Chajdas2020-08-2264-832/+704
|\ | | | | Improve HTML formatter output.
| * Improve HTML formatter output. (branch: improve-linenos-handling) Matthäus G. Chajdas2020-07-3164-832/+704
| | With the previous changes, we started to emit one <pre> per line for line numbers. This breaks for instance the Sphinx-RTD-Theme, which expects the line numbers to be formatted the same way as the normal content.
| |
| | This commit makes the following changes:
| |
| | * Emit a single <pre> inside the linenos div
| | * Wrap individual lines into <span> as needed
| | * Update all tests
| | * Don't yield empty <span> elements when no style is specified
| |
| | This also makes the .html test files look correct when looked at with a browser, as there is no extra whitespace in them which needs stripping.
* | Added BARE schema lexer (#1488)Martijn Braam2020-08-221-0/+43
| |
* | Manually merge PR#1497.Matthäus G. Chajdas2020-08-221-0/+18
| | | | | | | | | | This is a manual merge as we don't want to pull in the documentation change as part of this fix for a cleaner history.
* | Add a PromQL lexer (#1506)Pablo SEMINARIO2020-08-192-0/+318
| | | | | | Including tests and an example.promql file.
* | Update for Csound 6.15.0 (#1509)Nate Whetsell2020-08-171-1/+1
|/ | | | | * Update for Csound 6.15.0 * Update comment
* Improve Markdown lexer (#1495)Leistungsabfall2020-07-211-0/+525
| * Add support for Setext-style headings in Markdown
| * Improve inline code detection in Markdown
| * Add support for indented code blocks in Markdown
| * Improve italics & bold detection in Markdown
| * Simplify italics & bold regexes in Markdown
| * Add warning about possible unrecognized internal tags in Markdown
| * Improve strikethrough detection in Markdown
| * Small bugfix in Markdown
| * Small bugfix in Markdown
| * Small refactoring in Markdown
* Fix Solarized line number colors (#1477)Paweł Fertyk2020-07-0467-37/+1064
| * Add font and background colors to Style
| * Move all styles to get_style_defs, add tests
| * Remove hardcoded styles, add special lineno style
| * Add styles for special line numbers in tables
| * Update noclasses documentation
| * Refactor linenos elements and styles, add tests
| * Update AUTHORS
| * Fix multiple CSS prefixes, add tests
* Add support for PowerShell Remoting sessions (#1398)Geert Smelt2020-06-301-1/+29
| * Add support for PowerShell Remoting sessions
| * Add test case for PowerShell Remoting sessions
| * Make whitespace after prompt optional
| * Fix test case containing backslashes
| * Add test case for local PowerShell sessions
* Add Arrow lexer (#1481)Ken2020-06-211-0/+60
| * Add Arrow lexer
| * Pass tests: raw string for regex
| * Make requested changes
* Improve SystemVerilog class/endclass lexer rules (#1471)Chris Drake2020-06-061-0/+93
| The class looks like:
|
|     class class_identifier [#(param_decls)] [extends class_identifier #(params)];
|     ...
|     endclass [: class_identifier]
|
| Using the same Java convention of Keyword.Declaration and Name.Class.
|
| Add a test_systemverilog_classes unit test to test_hdl.
* add Singularity lexer (#1285)Georg Brandl2020-06-011-0/+45
| | | Co-authored-by: Bryton Hall <email@bryton.io>
* SystemVerilog keyword/operator improvements (#1464)Chris Drake2020-06-011-0/+249
| * Move SystemVerilog type keywords
|
|   Put them next to the generic keywords list.
|
| * Change a couple SystemVerilog keywords to operators
|
|   The 'inside' and 'dist' keywords are described as operators in the SystemVerilog standard, below unary increment/decrement, and above concatenation in precedence. See 1800-2017 tables 11-1 and 11-2 for a list of operators.
|
|   This matches the description of the Pygments Operator.Word token: "For any operator that is a word (e.g. not)."
|
| * Add a SystemVerilog operators unit test
|
|   Copy/paste the contents of 1800-2017 Table 11-2, and see what the SV lexer chops it up into.
|
|   I made lots of comments for potential improvements. Some operators, such as '[' and '.' are being labeled as punctuation. Also, multi-character operators such as '<<<=' are being split up into multiple, single-character tokens, eg '<' '<' '<' '='.
* Add GDScript lexer (#1457)Paweł Fertyk2020-06-013-2/+258
| * Added GDScript lexer
| * Fix regular expressions in GDScript lexer
| * Update GDScript lexer with the current version from Godot docs
| * Add tests for GDScript lexer
| * Update authors
| * Add an example file for GDScript
| * Implement analyze_text for GAP and GDScript
| * Fix example file name in tests
| * Update license
|
| Co-authored-by: Daniel J. Ramirez <djrmuv@gmail.com>
* Refactor SystemVerilog unit testsChris Drake2020-05-261-250/+252
| Most of the contents of these two unit tests are static. Move things around so the entire test fits on a single page, for better readability/maintainability.
|
| Name the code part <TEST_NAME>_TEXT, and the tokens part <TEST_NAME>_TOKENS. Choosing "text" b/c it's the parameter name to the lexer.get_tokens(text) method.
* Update `Inform6Lexer` to Inform 6.34 (#1461)David Corbett2020-05-261-11/+19
|
* Add lexer for Devicetree language (#1434)Maxime Chretien2020-05-261-0/+164
| * Add lexer for Devicetree language
|
|   Signed-off-by: Maxime Chretien <maxime.chretien@bootlin.com>
|
| * Devicetree lexer: fix random input test error
|
|   Signed-off-by: Maxime Chretien <maxime.chretien@bootlin.com>
|
| * Devicetree lexer: fix example file reference
|
|   Signed-off-by: Maxime Chretien <maxime.chretien@bootlin.com>
|
| * Devicetree lexer: Reduce example file size
|
|   Also add some missing language elements
|
|   Signed-off-by: Maxime Chretien <maxime.chretien@bootlin.com>
* Update SystemVerilog literal constants (#1460)Chris Drake2020-05-261-8/+144
| The original implementation was missing some of the more arcane features such as underbars, the character 's' for signed/unsigned, support for spaces before/after the base specifier, capital letter base specifiers (ie 'B 'D 'H), and the 4-state 'xXzZ?' characters.
|
| For regular integers, the 'l' and 'L' suffixes are not valid. That is, unlike C, in Verilog '42L' is not a valid int literal.
|
| Create a new test that exercises most of the interesting kinds of SystemVerilog numbers. This fixes a couple minor issues with what type of number the lexer returns. For example, Numbers like '42' used to return Integer.Hex, but now return Integer.Decimal.
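The features listed in this commit (underscores, the `s` signedness marker, spaces around the base specifier, capital base letters, and the 4-state characters) can be sketched with one pattern builder. Everything below is an assumption made for illustration: the labels, `based`, and `classify` are hypothetical, the grammar is deliberately looser than the standard (for example in how decimal literals may use x/z digits), and none of it is the SystemVerilogLexer's actual rule set.

```python
import re

# Optional size prefix, which may be separated from the quote by spaces.
SIZE = r"(?:[1-9][0-9_]*\s*)?"


def based(letters, digits):
    """Build a pattern for [size] ' [s] base [spaces] digits."""
    return re.compile(rf"{SIZE}'[sS]?[{letters}]\s*[{digits}][{digits}_]*")


# Hypothetical rules, tried in order; labels are informal, not token names.
RULES = [
    ("bin", based("bB", "01xXzZ?")),
    ("oct", based("oO", "0-7xXzZ?")),
    ("hex", based("hH", "0-9a-fA-FxXzZ?")),
    ("decimal", based("dD", "0-9xXzZ?")),
    ("decimal", re.compile(r"[0-9][0-9_]*")),  # plain integer, no suffix
]


def classify(literal):
    """Return the label of the first rule matching the whole literal."""
    for label, pattern in RULES:
        if pattern.fullmatch(literal):
            return label
    return None
```

A plain `42` reaches only the last rule, and `42L` matches nothing, which mirrors the two behaviours the commit message calls out.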
* Add support for .tid files (TiddlyWiki5) (#1390)Max2020-05-241-0/+72
| * add support for .tid files (TiddlyWiki5)
| * add lexers/_mapping.py
| * markup.py: change versionadded of TiddlyWiki5Lexer to 2.7
| * markup.py, TiddlyWiki5Lexer: use non-greedy matcher for table headers, footers, captions and classes
| * markup.py, TiddlyWiki5Lexer: make timestamps of type Number.Integer