Commit log — path: root/pygments
* Remove the alias for the RawTokenLexer. [raw-alias] (Georg Brandl, 2020-12-24; 2 files, -2/+2)
  RawTokenLexer was broken until 2.7.4, so it seems pretty much unused, and it led to tracebacks when the "raw" alias was used from some markup that allows specifying a language alias. We'll still keep the class for special usage as intended.
* fix oversight (Georg Brandl, 2020-12-19; 1 file, -1/+1)
* fix inefficient regexes for guessing lexers (Georg Brandl, 2020-12-19; 2 files, -3/+1)
* Limit recursion with nesting Ruby heredocs (Georg Brandl, 2020-12-17; 2 files, -4/+10)
  Fixes #1638
* Fix backtracking string regexes in JavascriptLexer and TypescriptLexer. (Georg Brandl, 2020-12-17; 1 file, -5/+24)
  Fixes #1637
* fixes #1625: infinite loop in SML lexer (Georg Brandl, 2020-12-10; 1 file, -6/+6)
  The reason was a lookahead-only pattern which was included in the state that the lookahead was transitioning to.
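  Why a lookahead-only pattern loops: a lookahead matches without consuming input, so a state that can re-apply the same pattern never advances. A minimal stdlib sketch of the zero-width property (not the SML lexer's actual patterns):

      import re

      # A lookahead-only pattern matches without consuming any input.
      m = re.match(r'(?=fun)', 'fun f x = x')
      assert m is not None
      assert m.end() == 0  # the position did not advance

      # A RegexLexer state that includes such a pattern and transitions
      # back into itself would therefore re-match at the same offset on
      # every iteration: an infinite loop.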
* Prepare 2.7.3 release. [2.7.3] (Matthäus G. Chajdas, 2020-12-06; 1 file, -1/+1)
* Unclosed script/style tag handling. Fixes #1614 (#1615) (Nick Gerner, 2020-12-05; 1 file, -0/+12)
  Explicitly handle unclosed <script> and <style> tags, which previously would result in O(n^2) work to lex as Error tokens per character up to the end of the line or end of file (whichever comes first). Now we try lexing the rest of the line as Javascript/CSS if there's no closing script/style tag. We recover on the next line in the root state if there is a newline, otherwise just keep parsing as Javascript/CSS.
  This is similar to how the error handling in lexer.py works, except we get Javascript or CSS tokens instead of Error tokens, and we get to the end of the line much faster since we don't apply an O(n) regex for every character in the line.
  I added a new test suite for the HTML lexer (there wasn't one except for coverage in test_examplefiles.py), including a trivial happy-path case and several cases around <script> and <style> fragments, including regression coverage that fails on the old logic.
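  The recovery strategy described above can be sketched with a hypothetical helper (illustrative only, not the HTML lexer's real code): look for a closing tag; if there is none, treat the rest of the current line as script content and resume normal lexing at the newline.

      import re

      def split_script(text):
          """Return (script_body, remainder) for text following '<script>'."""
          m = re.search(r'(?i)</script\s*>', text)
          if m:
              # normal case: everything up to the closing tag is Javascript
              return text[:m.start()], text[m.start():]
          eol = text.find('\n')
          if eol == -1:
              return text, ''            # no newline: lex everything as JS
          return text[:eol], text[eol:]  # recover in the root state next line

      assert split_script('var x = 1;</script><p>') == ('var x = 1;', '</script><p>')
      assert split_script('var x = 1;\n<p>hi</p>') == ('var x = 1;', '\n<p>hi</p>')

  The point is that each branch is a single linear scan, instead of one failing O(n) regex per character.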
* testing turtle prefix names where reference starts with number (#1590) (elf Pavlik, 2020-12-05; 1 file, -11/+51)
  - testing turtle prefix names where reference starts with number
  - remove case insensitive flag from Turtle lexer
  - use same end-of-string regex as in SPARQL and ShExC
  - make example.ttl valid turtle
* Update mapfiles and CHANGES. (Matthäus G. Chajdas, 2020-12-05; 1 file, -1/+1)
* Update jvm.py (#1587) (Boris Kheyfets, 2020-12-05; 1 file, -1/+2)
  Added support for Kotlin scripts.
* ImgFormatter: Use the start position based on the length of text (#1611) (strawberry beach sandals, 2020-11-28; 1 file, -8/+20)
* llvm lexer: add poison keyword (#1612) (Nuno Lopes, 2020-11-28; 1 file, -1/+1)
* fix ecl doc reference (#1609) (Carlos Henrique Guardão Gandarez, 2020-11-25; 1 file, -1/+1)
* lean: Add missing keywords (Eric Wieser, 2020-11-19; 1 file, -0/+2)
* JuttleLexer: Fix duplicate 'juttle' occurrence in lexer aliases. (Sumanth V Rao, 2020-11-19; 2 files, -2/+2)
  The output from pygments.lexers.get_all_lexers() contains 'juttle' twice in the aliases section for the Juttle lexer entry. This could be reproduced using:

      >>> from pygments.lexers import get_all_lexers
      >>> lexers = get_all_lexers()
      >>> {alias[0]: alias[1] for alias in lexers}.get('Juttle')
      ('juttle', 'juttle')

  This patch fixes the duplicate entry and generates the associated _mapping.py file. Fixes: #1604
* Rust: update builtins/macros/keywords for 1.47 (Georg Brandl, 2020-11-19; 1 file, -27/+36)
* minor variable name fixup (Georg Brandl, 2020-11-19; 1 file, -5/+5)
* Rust lexer: changing rust macro type (K. Lux, 2020-11-19; 1 file, -1/+1)
  Rust macros seem to fit more into the "magic function" category than into the "builtin" one.
* Rust lexer: bug fix with regex lexer and '!' + r'\b' (K. Lux, 2020-11-19; 1 file, -1/+1)
  Rust macros end with a '!'. The word boundary (regex '\b') for such expressions is located before the '!' (e.g. "print\b!(...)"). The regex here used the suffix option, which added an r'\b' after each regex (e.g. r'print!\b'); therefore, the supplied regular expressions didn't match the Rust macros. To fix this problem, the suffix is removed. As every macro ends with an '!' (which implicitly includes a word boundary before it), it's not necessary anyway.
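  The effect described above is easy to check with Python's re module (an illustrative sketch using 'print' as an example macro name, not the lexer's actual word list):

      import re

      # With the old suffix, \b lands between '!' and '(' -- both are
      # non-word characters, so there is no boundary and no match.
      assert re.match(r'print!\b', 'print!(...)') is None

      # Without the suffix the match succeeds; the boundary before '!'
      # is implicit because a macro name always ends in a word character.
      assert re.match(r'print\b!', 'print!(...)') is not None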
* Rust lexer: move keywords from funcs_macros to types (K. Lux, 2020-11-19; 1 file, -2/+1)
  'drop', 'Some', 'None', 'Ok' and 'Err' are types, not macros.
* Add Javascript 'async', 'await' keywords (#1605) (Chris Nevers, 2020-11-17; 1 file, -1/+1)
* shell: improve docstrings for the "session" type lexers (Georg Brandl, 2020-11-11; 1 file, -5/+10)
  Fixes #1599
* json: deprecate BareJsonObjectLexer (Georg Brandl, 2020-11-11; 2 files, -6/+10)
  Fixes #1600
* Fix a catastrophic backtracking bug in JavaLexer (#1594) (Kurt McKee, 2020-11-09; 1 file, -1/+8)
  - JavaLexer: Demonstrate a catastrophic backtracking bug
  - JavaLexer: Fix a catastrophic backtracking bug
  Closes #1586
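  For context, "catastrophic backtracking" comes from nested quantifiers that give the engine exponentially many ways to split a failing match. A generic stdlib sketch (not the JavaLexer's actual regex; the input is kept short so the demo still returns quickly):

      import re

      evil = re.compile(r'(a+)+$')   # nested quantifiers: ~2^n splits on failure
      safe = re.compile(r'a+$')      # matches the same strings in linear time

      text = 'a' * 18 + 'b'          # forces the engine to exhaust every split
      assert evil.match(text) is None
      assert safe.match(text) is None
      assert safe.match('a' * 18) is not None

  Lengthening the run of 'a' characters makes the `evil` pattern's failure time explode while `safe` stays flat, which is why such patterns are rewritten rather than tolerated.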
* Fix Mason regex. (Matthäus G. Chajdas, 2020-11-08; 1 file, -3/+2)
  Previously, the tag was cut off.
* Fix Mason regex. (Matthäus G. Chajdas, 2020-11-08; 1 file, -1/+1)
  Previously, something like <%class>text</%class> would not get matched correctly. This was due to the capturing group capturing the wrong part of the tag: instead of 'class', it would capture the part after 'class' and before '>'. With this commit, the capturing group correctly matches the start/end tag. This commit also adds a unit test to verify this.
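  The corrected idea can be sketched with a toy pattern (illustrative only, not the MasonLexer's actual regex): the capturing group must wrap the tag name itself, so that group(1) yields 'class' for both the opening and closing tag.

      import re

      # group 1 captures the tag name, for open and close tags alike
      tag = re.compile(r'</?%(\w+)>')
      assert tag.match('<%class>').group(1) == 'class'
      assert tag.match('</%class>').group(1) == 'class'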
* fix closing tag for unnamed blocks on MasonLexer (#1592) (Carlos Henrique Guardão Gandarez, 2020-11-08; 1 file, -1/+1)
* removed '{* ... *}' as a django comment (Sean McElwain, 2020-10-30; 1 file, -1/+1)
* Remove margin: 0 from <pre> styling. (Matthäus G. Chajdas, 2020-10-28; 1 file, -1/+1)
  This seems to break some themes which were not expecting Pygments to change margins, and it doesn't look like it makes a difference for standalone Pygments.
* MySQL: Tokenize quoted schema object names, and escape characters, uniquely (#1555) (Kurt McKee, 2020-10-27; 1 file, -9/+9)
  Changes in this patch:
  - Name.Quoted and Name.Quoted.Escape are introduced as non-standard tokens.
  - HTML and LaTeX formatters were confirmed to provide default formatting if they encounter these two non-standard tokens. They also add style classes based on the token name, like "n-Quoted" (HTML) or "nQuoted" (LaTeX), so that users can add custom styles for these.
  - Removed "\`" and "\\" as schema object name escapes. These are relics of the previous regular expression for backtick-quoted names and are not treated as escape sequences. The behavior was confirmed in the MySQL documentation as well as by running queries in MySQL Workbench.
  - Prevent "123abc" from being treated as an integer followed by a schema object name. MySQL allows leading numbers in schema object names as long as 0-9 are not the only characters in the schema object name.
  - Add ~10 more unit tests to validate behavior. Closes #1551.
  - Remove an end-of-line regex match that triggered a lint warning, and add tests that confirm correct behavior. No tests failed before or after removing the '$' match in the regex, but now regexlint isn't complaining. Removing the '$' match probably depends on the fact that Pygments adds a newline at the end of the input text, so there is always something after a bare integer literal.
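  Two of the points above can be sketched with plain re patterns (illustrative only, not the MySQL lexer's exact regexes): a backtick-quoted name whose only escape is a doubled backtick, and an integer pattern that refuses to match when an identifier character follows the digits.

      import re

      # backtick-quoted name; the only escape is a doubled backtick
      name = re.compile(r'`(?:[^`]|``)+`')
      assert name.match('`my``table`').group() == '`my``table`'

      # only treat a digit run as an integer when no identifier
      # character follows, so "123abc" lexes as one name instead
      integer = re.compile(r'\d+(?![0-9a-zA-Z$_])')
      assert integer.match('123 ') is not None
      assert integer.match('123abc') is None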
* Add 'some' Ada reserved word (#1581) (Léo Germond, 2020-10-27; 1 file, -3/+3)
  The 'some' Ada reserved word is available since Ada 2012; it is used in the same context as the 'any' keyword.
  For a list of keywords (with this inclusion, all are covered if I'm not mistaken), see RM 2.9 - Reserved Words:
  https://www.adaic.org/resources/add_content/standards/12rm/html/RM-2-9.html
  For a usage example, see RM 4.5.8 - Quantified Expressions:
  https://www.adaic.org/resources/add_content/standards/12rm/html/RM-4-5-8.html
* Speed up JSON and reduce HTML formatter consumption (#1569) (Kurt McKee, 2020-10-26; 2 files, -102/+243)
  - Update the JSON-LD keyword list to match JSON-LD 1.1: update the JSON-LD URL to HTTPS, update the list of JSON-LD keywords, make the JSON-LD parser less dependent on the JSON lexer implementation, and add unit tests for the JSON-LD lexer.
  - Add unit tests for the JSON parser. This includes testing valid literals, testing valid string escapes, and testing that object keys are tokenized differently from string values.
  - Rewrite the JSON lexer (related to #1425). The JSON parser is rewritten, the JSON bare object parser no longer requires additional code, and `get_tokens_unprocessed()` returns as much as it can to reduce yields (for example, side-by-side punctuation is not returned separately). The unit tests were updated, and unit tests based on Hypothesis test results were added.
  - Reduce HTML formatter memory consumption by ~33% and speed it up (related to #1425). Tested on a 118MB JSON file: memory consumption tops out at ~3GB before this patch and drops to only ~2GB with it. These were the command lines used:

        python -m pygments -l json -f html -o .\new-code-classes.html .\jc-output.txt
        python -m pygments -l json -f html -O "noclasses" -o .\new-code-styles.html .\jc-output.txt

  - Add an LRU cache to the HTML formatter's HTML-escaping and line-splitting. For a 118MB JSON input file, this reduces memory consumption by ~500MB and reduces formatting time by ~15 seconds.
  - JSON: Add a catastrophic backtracking test back to the test suite, update the comment that documents the internal queue, and document in comments that ints/floats/constants are not validated.
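  The LRU-cache idea works because token text in a large JSON file repeats constantly, so each distinct string only needs to be escaped once. A minimal sketch with a hypothetical helper (the formatter's real escaping code differs):

      import html
      from functools import lru_cache

      @lru_cache(maxsize=None)
      def escape_html(text):
          # repeated token strings hit the cache instead of re-escaping
          return html.escape(text)

      assert escape_html('<&>') == '&lt;&amp;&gt;'
      escape_html('<&>')  # second call for the same string is a cache hit
      assert escape_html.cache_info().hits == 1
      assert escape_html.cache_info().currsize == 1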
* Prepare 2.7.2 release. [2.7.2] (Matthäus G. Chajdas, 2020-10-24; 1 file, -1/+1)
  Update CHANGES, bump version.
* Speculative fix for #1579. (#1583) (Matthäus G. Chajdas, 2020-10-24; 1 file, -2/+2)
  This removes the top/bottom padding changes and only keeps left/right padding, in the hope that this does not break all Sphinx themes.
* TNTLexer: Don't crash on unexpected EOL. (#1570) (Ken, 2020-10-14; 1 file, -54/+67)
  Catch IndexErrors in each line and error the rest of the line, leaving whatever tokens were found.
  Write and pass tests for Typographic Number Theory.
  pygments/lexers/tnt.py:
  - Fix indentation on import.
  - Fix: TNTLexer.cur is a class-level reference if not initialized in get_tokens_unprocessed, so init it in __init__ too.
  - Fix: fantasy markers are not allowed as components of other formulas, so have a dedicated check for them in the body of get_tokens_unprocessed which disables the normal formula handling if present.
  - Clarify the TNTLexer.lineno docstring.
  - Attempt to discard tokens before an IndexError.
  tests/test_tnt.py:
  - Test every method, and test both +ve and -ve matches for most.
  - The lexer fixture is test-level, to reinitialize cur clean each time.
  - Don't test the actual get_tokens_unprocessed method (besides for fantasy markers) because the text testing is left to examplefiles.
  AUTHORS: add myself to credits :)
  Also add a TNT test just to make sure there are no crashes.
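  The "catch IndexError per line" strategy can be sketched with a toy character lexer (hypothetical, not the TNTLexer's code): a token that needs a following character raises IndexError at end of line; we keep the tokens found so far and mark the remainder as an error.

      def lex_line(line):
          """Toy lexer: 'v' must be followed by one more character."""
          tokens, i = [], 0
          try:
              while i < len(line):
                  if line[i] == 'v':
                      tokens.append(('Variable', line[i] + line[i + 1]))
                      i += 2
                  else:
                      tokens.append(('Text', line[i]))
                      i += 1
          except IndexError:
              # unexpected end of line: keep what we have, error the rest
              tokens.append(('Error', line[i:]))
          return tokens

      assert lex_line('a=v1') == [('Text', 'a'), ('Text', '='), ('Variable', 'v1')]
      assert lex_line('a=v') == [('Text', 'a'), ('Text', '='), ('Error', 'v')]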
* llvm lexer: add freeze instruction and bfloat type (#1565) (Nuno Lopes, 2020-10-06; 1 file, -11/+12)
* Fix spelling mistakes - fixes #1562. (Matthäus G. Chajdas, 2020-10-04; 1 file, -2/+2)
* Add missing tokens to SPARQL lexer (#1559) (Lucas Werkmeister, 2020-10-02; 1 file, -5/+5)
  @belett noticed that VALUES was missing [1]; I found the other ones by running the following snippet on the SPARQL 1.1 Query Language spec:

      new Set(Array.from(document.querySelectorAll('.grammarTable'))
          .reduce((text, elem) => text + elem.textContent)
          .match(/'[a-z0-9-_ ]*'/ig))

  I don't know why a few keywords were missing; the docstring linked to the SPARQL 1.0 Query Language spec (also fixed here), but the lexer already contained other tokens which were only added in SPARQL 1.1, such as the aggregate functions (MIN, MAX etc.), which have already been in Pygments since the initial commit of the current history (6ded9db394).
  [1]: https://phabricator.wikimedia.org/T264175
* Fix LatexEmbeddedLexer (#1517) (Eric Wieser, 2020-09-30; 1 file, -9/+42)
  - Fix LatexEmbeddedLexer to not call the nested tokenizer piecewise
  - Reuse the existing do_insertions function
  - Add a test for the `LatexEmbeddedLexer`
* Add analyze_text to make `make check` happy. (#1549) (Matthäus G. Chajdas, 2020-09-23; 26 files, -6/+314)
  This also fixes a few small bugs:
  - Slash uses *.sla as the file ending, not *.sl
  - IDL has endelse, not elseelse
  It also improves various analyse_text methods:
  - Make Perl less confident in the presence of :=.
  - Improve the brainfuck check to not parse the whole input.
  - Improve Unicon by matching \self, /self.
  - Fix Ezhil not matching against the input text.
  - Simplify Modula2::analyse_text.
* Grammar correction 'an generator' -> 'a generator' (zkneupper, 2020-09-23; 1 file, -1/+1)
* image formatter: find ttc fonts on Mac (Georg Brandl, 2020-09-20; 1 file, -1/+2)
  Fixes #1223
* python traceback lexer: fix custom exceptions without message (Georg Brandl, 2020-09-19; 1 file, -1/+1)
  Fixes #1548
* fix regression in JSON lexer, bump to 2.7.1 [2.7.1] (Georg Brandl, 2020-09-17; 2 files, -3/+3)
  Fixes #1544
* Preparing 2.7.0 release. [2.7.0] (Matthäus G. Chajdas, 2020-09-12; 1 file, -1/+1)
* all: remove "u" string prefix (#1536) (Georg Brandl, 2020-09-08; 37 files, -1932/+1828)
  - all: remove "u" string prefix
  - util: remove unirange. Since Python 3.3, all builds are wide-unicode compatible.
  - unistring: remove support for narrow-unicode builds, which stopped being relevant with Python 3.3.
* Fix a Windows/PyPy3 test failure (#1533) (Kurt McKee, 2020-09-07; 1 file, -1/+8)
  PyPy3 on Windows has a test failure in `test_cmdline:test_outfile()` when trying to unlink the temporary output file. The root cause is that `cmdline:inner_main()` does not explicitly close the file that it opens, and PyPy3 isn't auto-closing the file when `inner_main()` returns. This prevents the file from being unlinked, and the test case fails.
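  The general fix pattern is to close files explicitly rather than rely on garbage collection, which PyPy may delay (and Windows refuses to unlink a file that still has an open handle). A minimal stdlib sketch, not the cmdline module's actual code:

      import os
      import tempfile

      fd, path = tempfile.mkstemp()
      os.close(fd)  # mkstemp hands back an already-open descriptor

      with open(path, 'w') as outfile:  # the with-block guarantees close()
          outfile.write('highlighted output')

      os.unlink(path)  # safe on every platform once the handle is closed
      assert not os.path.exists(path)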
* fennel: fixup string regex (Georg Brandl, 2020-09-07; 1 file, -1/+1)
* Avoid catastrophic backtracking. (Phil Hagelberg, 2020-09-07; 1 file, -1/+1)
  As advised in https://github.com/pygments/pygments/pull/1535/files/f581f2892154e8e4ed673ab940abf8af43ebe66b#r484028618