| Commit message |
|
RawTokenLexer was broken until 2.7.4, so it seems pretty much unused,
and it led to tracebacks when the "raw" alias was used from some
markup that allows specifying a language alias.
We'll still keep the class for special usage as intended.
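As a rough sketch of what this means for callers (the alias removal and the exception type are assumptions based on this description, so treat it as illustrative rather than authoritative):

# Sketch only; assumes a Pygments version containing this change.
from pygments.lexers.special import RawTokenLexer   # the class is still importable
from pygments.lexers import get_lexer_by_name
from pygments.util import ClassNotFound

try:
    get_lexer_by_name("raw")        # the "raw" alias no longer resolves
except ClassNotFound:
    lexer = RawTokenLexer()         # deliberate, special-purpose use still works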
|
fixes #1638
|
fixes #1637
|
The reason was a lookahead-only pattern that was included in the state
the lookahead was transitioning to.
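A hypothetical RegexLexer fragment illustrating that anti-pattern (these are made-up rules, not the ones that were fixed): a zero-width lookahead rule that is also part of the state it pushes can fire again there without consuming any input.

# Hypothetical illustration only: defining the class is harmless, but lexing
# input that triggers the lookahead would loop without making progress,
# because the zero-width rule never advances the position.
from pygments.lexer import RegexLexer
from pygments.token import Text, Keyword

class StuckLexer(RegexLexer):
    tokens = {
        'root': [
            (r'(?=end)', Text, 'block'),   # lookahead only: consumes nothing
            (r'.', Text),
        ],
        'block': [
            (r'(?=end)', Text, 'block'),   # the same zero-width rule in the target state
            (r'end', Keyword, '#pop'),
        ],
    }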
|
Mention the Mason fix and use past tense (mostly) in CHANGES. Also
mention the changes to CSS, as these might affect various themes.
|
This should fix the tests failing on PyPy. Eventually we'll need a more
robust solution for this.
|
Explicitly handle unclosed <script> and <style> tags, which previously
resulted in O(n^2) work, lexing one Error token per character up to the
end of the line or end of file (whichever comes first).
Now we try lexing the rest of the line as JavaScript/CSS if there is no
closing script/style tag. We recover on the next line in the root state
if there is a newline, otherwise we just keep parsing as JavaScript/CSS.
This is similar to how the error handling in lexer.py works, except we
get JavaScript or CSS tokens instead of Error tokens, and we reach the
end of the line much faster since we don't apply an O(n) regex for every
character in the line.
I added a new test suite for the HTML lexer (there wasn't one except for
coverage in test_examplefiles.py), with a trivial happy-path case,
several cases around <script> and <style> fragments, and regression
coverage that fails with the old logic.
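A quick way to observe the new behaviour (assuming a Pygments build that contains this fix; the markup below is just an illustrative unclosed <script> tag):

# With the old logic this produced one Error token per character of the tail;
# with the fix, the rest of the line is lexed as JavaScript instead.
from pygments.lexers import HtmlLexer
from pygments.token import Error

markup = '<script>var x = 1;\n<p>back to HTML</p>\n'
tokens = list(HtmlLexer().get_tokens(markup))
print(sum(1 for ttype, value in tokens if ttype is Error))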
|
* testing turtle prefix names where reference starts with number
* remove case insensitive flag from Turtle lexer
* use same end-of-string regex as in SPARQL and ShExC
* make example.ttl valid turtle
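A small check of the first bullet (the snippet below is an illustrative prefixed name with a digit-leading local part, not taken from example.ttl):

# Inspect how a prefixed name whose reference starts with a number is tokenized.
from pygments.lexers import TurtleLexer

ttl = '@prefix ex: <http://example.org/> .\nex:123abc a ex:Thing .\n'
for ttype, value in TurtleLexer().get_tokens(ttl):
    if value.strip():
        print(ttype, repr(value))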
|
Added support for Kotlin scripts.
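A minimal check of the new support (this assumes the change registers the *.kts extension for the Kotlin lexer; the file name is a placeholder):

# Look up the lexer for a Kotlin script file by its extension.
from pygments.lexers import get_lexer_for_filename

print(get_lexer_for_filename('build.main.kts').name)   # expected: Kotlin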
|
The output from pygments.lexers.get_all_lexers() contains 'juttle'
twice in the aliases section for the Juttle lexer entry.
This could be reproduced using:
>>> from pygments.lexers import get_all_lexers
>>> lexers = get_all_lexers()
>>> {alias[0]: alias[1] for alias in lexers}.get('Juttle')
('juttle', 'juttle')
This patch fixes the duplicate entry and generates the associated
_mapping.py file.
Fixes: #1604
|
Rust macros seem to fit more into the "magic function" category than into the "builtin" one.
|
Rust macros end with a '!'. The word boundary (regex '\b') for such expressions is located before the '!' (e.g. "print\b!(...)"). The regex here used the suffix option, which appended r'\b' to each pattern (e.g. r'print!\b'). Therefore, the supplied regular expressions didn't match the Rust macros.
To fix this problem, the suffix is removed. Since every macro ends with a '!' (which implicitly has a word boundary before it), the suffix isn't necessary anyway.
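The boundary behaviour can be reproduced with plain re, independently of the lexer (the patterns below are simplified stand-ins, not the actual rules from the Rust lexer):

# '!' and '(' are both non-word characters, so a trailing \b after the '!'
# can never match at a macro invocation site.
import re

print(re.search(r'println!\b', 'println!("hi")'))   # None: the old suffix behaviour
print(re.search(r'println!', 'println!("hi")'))     # matches once the suffix is gone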
|
'drop', 'Some', 'None', 'Ok' and 'Err' are types, not macros.
|
fixes #1599
|
fixes #1600
|
* JavaLexer: Demonstrate a catastrophic backtracking bug
* JavaLexer: Fix a catastrophic backtracking bug
Closes #1586
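For context, a textbook illustration of catastrophic backtracking (this is not the pattern from the JavaLexer, just the general failure mode the fix addresses):

# Nested quantifiers plus an input that almost matches force the regex engine
# to try exponentially many ways to split the 'a's, so the runtime explodes
# as the count grows. Keep the count small if you run this.
import re
import time

start = time.time()
re.match(r'(a+)+$', 'a' * 22 + 'b')
print(f'{time.time() - start:.2f}s')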
|
Previously, the tag was cut off.
|
Previously, something like:
<%class>text</%class> would not get matched correctly. This was because
the capturing group captured the wrong part of the tag: instead of
"class", it captured the part between "class" and ">". With this commit,
the capturing group correctly matches the start/end tag. This commit
also adds a unit test to verify this.
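A simplified sketch of what the capturing group has to do (a hypothetical pattern, not the exact rule from the Mason lexer):

# The group must enclose the tag name itself so start and end tags match.
import re

pattern = re.compile(r'</?%(\w+)>')
print(pattern.findall('<%class>text</%class>'))   # ['class', 'class']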
|
This seems to break some themes which were not expecting Pygments to
change margins, and it doesn't look like it makes a difference for
standalone Pygments.
|
(#1555)
* MySQL: Tokenize quoted schema object names, and escape characters, uniquely
Changes in this patch:
* Name.Quoted and Name.Quoted.Escape are introduced as non-standard tokens
* HTML and LaTeX formatters were confirmed to provide default formatting
if they encounter these two non-standard tokens. They also add style
classes based on the token name, like "n-Quoted" (HTML) or "nQuoted"
(LaTeX) so that users can add custom styles for these.
* Removed "\`" and "\\" as schema object name escapes. These are relics
of the previous regular expression for backtick-quoted names and are
not treated as escape sequences. The behavior was confirmed in the
MySQL documentation as well as by running queries in MySQL Workbench.
* Prevent "123abc" from being treated as an integer followed by a schema
object name. MySQL allows leading numbers in schema object names as long
as 0-9 are not the only characters in the schema object name.
* Add ~10 more unit tests to validate behavior.
Closes #1551
* Remove an end-of-line regex match that triggered a lint warning
Also, add tests that confirm correct behavior. No tests failed before
or after removing the '$' match in the regex, but now regexlint isn't
complaining.
Removing the '$' matching probably depends on the fact that Pygments
adds a newline at the end of the input text, so there is always something
after a bare integer literal.
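A quick way to inspect the two behaviours described above (this assumes the MySqlLexer from this change; the statement is an illustrative sample):

# Backtick-quoted names should come back as Name.Quoted tokens, and 123abc
# should be lexed as a single schema object name, not as integer + name.
from pygments.lexers import MySqlLexer

sql = 'SELECT `weird``name`, 123abc FROM t;\n'
for ttype, value in MySqlLexer().get_tokens(sql):
    if value.strip():
        print(ttype, repr(value))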
|
The "some" Ada reserved word is available since Ada 2012; it is used in
the same context as the "any" keyword.
See RM 2.9 - Reserved Words
(https://www.adaic.org/resources/add_content/standards/12rm/html/RM-2-9.html)
for the list of keywords (with this inclusion, all of them are covered,
if I'm not mistaken), and see RM 4.5.8 - Quantified Expressions
(https://www.adaic.org/resources/add_content/standards/12rm/html/RM-4-5-8.html)
for a usage example.
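A small check that "some" is lexed as a keyword (this assumes the AdaLexer from this change; the quantified expression below is an illustrative sample, not taken from the RM):

# Collect the keyword tokens from an Ada 2012 quantified expression.
from pygments.lexers import AdaLexer
from pygments.token import Keyword

ada = 'pragma Assert (for some X of Table => X = 0);\n'
print([value for ttype, value in AdaLexer().get_tokens(ada) if ttype in Keyword])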
|
* Update the JSON-LD keyword list to match JSON-LD 1.1
Changes in this patch:
* Update the JSON-LD URL to HTTPS
* Update the list of JSON-LD keywords
* Make the JSON-LD parser less dependent on the JSON lexer implementation
* Add unit tests for the JSON-LD lexer
* Add unit tests for the JSON parser
This includes:
* Testing valid literals
* Testing valid string escapes
* Testing that object keys are tokenized differently from string values
* Rewrite the JSON lexer
Related to #1425
Included in this change:
* The JSON parser is rewritten
* The JSON bare object parser no longer requires additional code
* `get_tokens_unprocessed()` returns as much as it can to reduce yields
(for example, side-by-side punctuation is not returned separately)
* The unit tests were updated
* Add unit tests based on Hypothesis test results
* Reduce HTML formatter memory consumption by ~33% and speed it up
Related to #1425
Tested on a 118MB JSON file. Memory consumption tops out at ~3GB before
this patch and drops to only ~2GB with this patch. These were the command
lines used:
python -m pygments -l json -f html -o .\new-code-classes.html .\jc-output.txt
python -m pygments -l json -f html -O "noclasses" -o .\new-code-styles.html .\jc-output.txt
* Add an LRU cache to the HTML formatter's HTML-escaping and line-splitting
For a 118MB JSON input file, this reduces memory consumption by ~500MB
and reduces formatting time by ~15 seconds.
* JSON: Add a catastrophic backtracking test back to the test suite
* JSON: Update the comment that documents the internal queue
* JSON: Document in comments that ints/floats/constants are not validated
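The code path measured above boils down to the following (the 118MB file is the author's test input; the snippet below just uses a tiny inline document):

# Highlight JSON as HTML, the same lexer/formatter combination used in the
# memory measurements above.
from pygments import highlight
from pygments.lexers import JsonLexer
from pygments.formatters import HtmlFormatter

data = '{"key": "value", "list": [1, 2.5, true, null]}'
print(highlight(data, JsonLexer(), HtmlFormatter()))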
|
Update CHANGES, bump version.
|
This removes the top/bottom padding changes, and only keeps left/right
padding, in the hope that this does not break all Sphinx themes.
|
* TNTLexer: Don't crash on unexpected EOL
Catch IndexErrors in each line and error the rest of the line,
leaving whatever tokens were found.
* Write and pass tests for Typographic Number Theory
pygments/lexers/tnt.py:
* Fix indentation on import
* Fix: TNTLexer.cur is a class-level reference if it is not initialized
in get_tokens_unprocessed, so initialize it in __init__ too
* Fix: Fantasy markers are not allowed as components of other formulas,
so have a dedicated check for them in the body of get_tokens_unprocessed
which disables the normal formula handling if present
* Clarify TNTLexer.lineno docstring
* Attempt to discard tokens before an IndexError
+tests/test_tnt.py:
* Test every method, and test both positive and negative matches for most
* Lexer fixture is test-level to reinitialize cur clean each time
* Don't test actual get_tokens_unprocessed method (besides for fantasy markers)
because the text testing is left to examplefiles
AUTHORS:
+ Add myself to credits :)
* Add a TNT test just to make sure no crashes
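In the spirit of the last bullet, a smoke test that just lexes a small formula without crashing (the TNT string is an illustrative sample, not taken from the test suite):

# Lexing should complete and yield tokens rather than raise an IndexError.
from pygments.lexers import TNTLexer

print(list(TNTLexer().get_tokens('Aa:~Sa=0\n')))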
|
Co-authored-by: Matthäus G. Chajdas <Anteru@users.noreply.github.com>