Commit message
 
fixes #1637
 
This should fix the tests failing on PyPy. Eventually we'll need a more
robust solution for this.
 
Explicitly handle unclosed <script> and <style> tags, which previously
caused O(n^2) work: each character up to the end of the line or end of
file (whichever comes first) was lexed as an Error token.
Now we try lexing the rest of the line as JavaScript/CSS if there is no
closing script/style tag. We recover in the root state on the next line
if there is a newline; otherwise we just keep parsing as JavaScript/CSS.
This is similar to the error handling in lexer.py, except we get
JavaScript or CSS tokens instead of Error tokens, and we reach the end
of the line much faster since we don't apply an O(n) regex for every
character in the line.
I added a new test suite for the HTML lexer (there wasn't one except for
coverage in test_examplefiles.py) including a trivial happy-path case
and several cases around <script> and <style> fragments, including
regression coverage that fails on the old logic.
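A minimal way to observe the new behavior (a sketch using only the
public API; the exact token stream is version-dependent):

    from pygments.lexers import HtmlLexer

    # Unclosed <script> tag: the rest of the line should now lex as
    # JavaScript tokens, with recovery in the root state on the next
    # line, instead of one Error token per character.
    text = '<script>var x = 1;\nplain text\n'
    for token, value in HtmlLexer().get_tokens(text):
        print(token, repr(value))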
 
* Test Turtle prefixed names where the reference starts with a number
* Remove the case-insensitive flag from the Turtle lexer
* Use the same end-of-string regex as in SPARQL and ShExC
* Make example.ttl valid Turtle
 
* JavaLexer: Demonstrate a catastrophic backtracking bug
* JavaLexer: Fix a catastrophic backtracking bug
Closes #1586
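For background, a generic illustration of catastrophic backtracking
(not the JavaLexer's actual pattern): nested quantifiers force the
regex engine to try exponentially many splits of the input before
failing.

    import re
    import time

    # Classic vulnerable shape: (a+)+ followed by something that
    # cannot match.
    pattern = re.compile(r'(a+)+$')

    for n in (14, 18, 22):
        s = 'a' * n + '!'
        start = time.perf_counter()
        pattern.match(s)  # fails, but only after roughly 2**n attempts
        print(n, round(time.perf_counter() - start, 3))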
 
Previously, the tag was cut off.
 
Previously, something like <%class>text</%class> would not get matched
correctly. This was because the capturing group captured the wrong part
of the tag: instead of 'class', it captured the part after 'class' and
before '>'. With this commit, the capturing group correctly matches the
start/end tag. This commit also adds a unit test to verify this.
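An illustrative reduction of the bug (these are hypothetical patterns
showing the shape of the mistake, not the lexer's actual regexes):

    import re

    s = '<%class>text</%class>'

    # Buggy shape: the group sits after the literal tag name, so it
    # captures the (empty) run before '>' rather than the name itself.
    buggy = re.compile(r'<%[a-z]+(.*?)>')
    print(buggy.match(s).group(1))    # '' -- not 'class'

    # Fixed shape: the group wraps the tag name directly.
    fixed = re.compile(r'<%(/?[a-z]+)>')
    print(fixed.match(s).group(1))    # 'class'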
 
This seems to break some themes which were not expecting Pygments to
change margins, and it doesn't look like it makes a difference for
standalone Pygments.
 
(#1555)
* MySQL: Tokenize quoted schema object names, and escape characters, uniquely
Changes in this patch:
* Name.Quoted and Name.Quoted.Escape are introduced as non-standard tokens
* HTML and LaTeX formatters were confirmed to provide default formatting
if they encounter these two non-standard tokens. They also add style
classes based on the token name, like "n-Quoted" (HTML) or "nQuoted"
(LaTeX) so that users can add custom styles for these.
* Removed "\`" and "\\" as schema object name escapes. These are relics
of the previous regular expression for backtick-quoted names and are
not treated as escape sequences. The behavior was confirmed in the
MySQL documentation as well as by running queries in MySQL Workbench.
* Prevent "123abc" from being treated as an integer followed by a schema
object name. MySQL allows leading numbers in schema object names as long
as 0-9 are not the only characters in the schema object name.
* Add ~10 more unit tests to validate behavior.
Closes #1551
* Remove an end-of-line regex match that triggered a lint warning
Also, add tests that confirm correct behavior. No tests failed before
or after removing the '$' match in the regex, but now regexlint isn't
complaining.
Removing the '$' matching probably depends on the fact that Pygments
adds a newline at the end of the input text, so there is always something
after a bare integer literal.
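One way to see the new token types in action (hedged: the exact output
depends on the Pygments version in use):

    from pygments.lexers import MySqlLexer

    # A backtick-quoted schema object name with a doubled-backtick escape.
    code = 'SELECT `weird``name` FROM t;'
    for token, value in MySqlLexer().get_tokens(code):
        print(token, repr(value))
    # Per this patch, expect Name.Quoted for the quoted name and
    # Name.Quoted.Escape for the doubled backtick.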
 
* Update the JSON-LD keyword list to match JSON-LD 1.1
Changes in this patch:
* Update the JSON-LD URL to HTTPS
* Update the list of JSON-LD keywords
* Make the JSON-LD parser less dependent on the JSON lexer implementation
* Add unit tests for the JSON-LD lexer
* Add unit tests for the JSON parser
This includes:
* Testing valid literals
* Testing valid string escapes
* Testing that object keys are tokenized differently from string values
* Rewrite the JSON lexer
Related to #1425
Included in this change:
* The JSON parser is rewritten
* The JSON bare object parser no longer requires additional code
* `get_tokens_unprocessed()` returns as much as it can to reduce yields
(for example, side-by-side punctuation is not returned separately)
* The unit tests were updated
* Add unit tests based on Hypothesis test results
* Reduce HTML formatter memory consumption by ~33% and speed it up
Related to #1425
Tested on a 118MB JSON file. Memory consumption tops out at ~3GB before
this patch and drops to only ~2GB with this patch. These were the command
lines used:
    python -m pygments -l json -f html -o .\new-code-classes.html .\jc-output.txt
    python -m pygments -l json -f html -O "noclasses" -o .\new-code-styles.html .\jc-output.txt
* Add an LRU cache to the HTML formatter's HTML-escaping and line-splitting
For a 118MB JSON input file, this reduces memory consumption by ~500MB
and reduces formatting time by ~15 seconds.
* JSON: Add a catastrophic backtracking test back to the test suite
* JSON: Update the comment that documents the internal queue
* JSON: Document in comments that ints/floats/constants are not validated
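The caching idea behind the last two items, as a minimal sketch (using
the stdlib functools.lru_cache and html.escape; this is not the
formatter's actual code):

    from functools import lru_cache
    from html import escape

    @lru_cache(maxsize=2**16)
    def escape_html_chunk(chunk: str) -> str:
        # Token streams repeat the same short strings constantly
        # (keywords, punctuation), so caching the escaped form avoids
        # re-escaping them on every occurrence.
        return escape(chunk)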
 
This removes the top/bottom padding changes, and only keeps left/right
padding, in the hope that this does not break all Sphinx themes.
 
* TNTLexer: Don't crash on unexpected EOL
Catch IndexErrors in each line and emit the rest of the line as an
Error token, keeping whatever tokens were already found.
* Write and pass tests for Typographic Number Theory
pygments/lexers/tnt.py:
* Fix indentation on import
* Fix: TNTLexer.cur is a class-level reference if not initialized in
get_tokens_unprocessed, so initialize it in __init__ too
* Fix: fantasy markers are not allowed as components of other formulas,
so add a dedicated check for them in the body of get_tokens_unprocessed
which disables the normal formula handling when they are present
* Clarify the TNTLexer.lineno docstring
* Attempt to discard tokens before an IndexError
+tests/test_tnt.py:
* Test every method, and test both positive and negative matches for most
* The lexer fixture is test-level so that cur is reinitialized cleanly
each time
* Don't test the actual get_tokens_unprocessed method (except for
fantasy markers), because full-text testing is left to examplefiles
AUTHORS:
+ Add myself to credits :)
* Add a TNT test just to make sure nothing crashes
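The EOL-recovery pattern, in generic form (a hedged sketch of the
described behavior, not TNTLexer's actual code; lex_word is a
hypothetical helper):

    from pygments.token import Error

    def lex_line(line, start, lex_word):
        """Tokenize one line; on unexpected EOL, keep the tokens found
        so far and emit the remainder as a single Error token."""
        tokens, pos = [], 0
        try:
            while pos < len(line):
                tokentype, length = lex_word(line, pos)  # may raise IndexError
                tokens.append((start + pos, tokentype, line[pos:pos + length]))
                pos += length
        except IndexError:
            tokens.append((start + pos, Error, line[pos:]))
        return tokens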
 
* Fix LatexEmbeddedLexer to not call the nested tokenizer piecewise
* Reuse the existing do_insertions function
* Add a test for the `LatexEmbeddedLexer`
 
* Add analyse_text to make "make check" happy.
This also fixes a few small bugs:
* Slash uses *.sla as the file ending, not *.sl
* IDL has endelse, not elseelse
* Improve various analyse_text methods.
* Make Perl less confident in the presence of :=.
* Improve the brainfuck check so it does not parse the whole input.
* Improve Unicon by matching \self and /self
* Fix Ezhil not matching against the input text
* Simplify Modula2::analyse_text.
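For reference, the shape of an analyse_text hook (a hedged sketch with
a hypothetical lexer; the hook is implicitly static and returns a
confidence in [0.0, 1.0] that guess_lexer() compares across lexers):

    from pygments.lexer import RegexLexer
    from pygments.token import Text

    class MyLexer(RegexLexer):
        """Hypothetical lexer showing the analyse_text hook."""
        name = 'MyLang'
        tokens = {'root': [(r'.+\n?', Text)]}

        def analyse_text(text):
            # Strong signal: a dedicated shebang. Ambiguous signals
            # (like := for Perl above) should return low values.
            return 0.9 if text.lstrip().startswith('#!mylang') else 0.0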
 
fixes #1548
 
Fixes #1544
 
* all: remove "u" string prefix
* util: remove unirange
Since Python 3.3, all builds are wide-Unicode compatible.
* unistring: remove support for narrow-Unicode builds,
which stopped being relevant with Python 3.3
 
Remove support for single-quoted strings.
Update fennelview example to latest version of library.
 
* Rename the "Javascript" tests to reflect that they are for CoffeeScript
This change also modifies the module docstring to reflect the file's purpose.
* Overhaul the Javascript numeric literal parsing
Fixes #307
This patch contains the following changes:
* Adds 50+ unit tests for Javascript numeric literals
* Forces ASCII numbers for float literals (so, now reject `.୪`)
* Adds support for Javascript's BigInt notation (`100n`)
* Adds support for leading-zero-only octal notation (`0777`)
* Adds support for scientific notation with no significand (`1e10`)
Numeric literal parsing is based on information at:
* https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Grammar_and_types
* https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures
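To see the effect on the literals mentioned above (hedged: the exact
Number subtypes may vary across versions, but all should fall under
Number):

    from pygments.lexers import JavascriptLexer
    from pygments.token import Number

    lexer = JavascriptLexer()
    for literal in ('100n', '0777', '1e10', '0x1A'):
        token, value = list(lexer.get_tokens(literal))[0]
        print(literal, '->', token)
        assert token in Number  # BigInt, octal, and exponent forms all lex as numbers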
 
* Overhaul the MySQL lexer
Fixes #975, #1063, #1453
Changes include:
Documentation
-------------
* Note in the lexer docstring that Oracle MySQL is the target syntax.
MariaDB syntax is not a target (though there is significant overlap).
Unit tests
----------
* Add 140 unit tests for MySQL.
Literals
--------
* Hexadecimal/binary/date/time/timestamp literals are supported.
* Integer mantissas are supported for scientific notation.
* In-string escapes are now tokenized properly.
* Support the "unknown" constant.
Comments
--------
* Optimizer hints are now supported, and keywords are
recognized and tokenized as preprocessor instructions.
* Remove nested multi-line comment support, which is no
longer supported in MySQL.
Variables
---------
* Support the '@' prefix for variable names.
* Lift restrictions on characters in unquoted variable names.
(MySQL does not impose a restriction on lead characters.)
* Support single/double/backtick-quoted variable names, including escapes.
* Support the '@@' prefix for system variable names.
* Support '?' as a variable so people can demonstrate prepared statements.
Keywords
--------
* Keyword / data type / function lists are now in a separate, auto-updating file.
* Support 25 additional data types (including spatial and JSON types).
* Support 460 additional MySQL keywords.
* Support 372 MySQL functions.
Explicit function support resolves a bug that caused non-function
items to be treated as functions simply because they had a trailing
opening parenthesis.
* Support exceptions for the 'SET' keyword, which is both a datatype and
a keyword depending on context.
Schema object names
-------------------
* Support Unicode in MySQL schema object names.
* Support parsing of backtick-quoted schema object name escapes.
(Escapes do not produce a distinct token type at this time.)
Operators
---------
* Remove non-operator characters from the list of operators.
* Remove non-punctuation characters from the list of punctuation.
* Clean up items based on feedback
* Remove an unnecessary optional newline lookahead for single-line comments
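A few of the newly supported constructs, run through the lexer
(illustrative only; the exact token stream depends on the Pygments
version):

    from pygments.lexers import MySqlLexer

    code = (
        "SELECT /*+ MAX_EXECUTION_TIME(1000) */ @@version;\n"  # optimizer hint, system variable
        "SELECT x'FFFF', @user_var, ? FROM `t`;\n"             # hex literal, '@' variable, '?' placeholder
    )
    for token, value in MySqlLexer().get_tokens(code):
        if value.strip():
            print(token, repr(value))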
 
This lexer is based on the PythonConsoleLexer and provides the ability
to highlight console input and output for PsySH, a developer console and
REPL for PHP. See https://psysh.org.
 
* More explicitly define escape sequences in JsonLexer (fix #1065)
* Add test coverage for #1065
 
* Fixed guessing of CMake by header.
* Version number can have multiple digits.
* Tabs are handled as white space.
* Trailing comments are ignored.
* Cleaned up regex to detect CMake header.
 
* Add lexer for Pointless
* lexer docstring formatting
* added link to languages doc
* update authors
* update version
* added double string
* added upval keyword
* simplify ptls example code
* rename doubleString -> multiString
 
Improve HTML formatter output.
 
With the previous changes, we started to emit one <pre> per line for
line numbers. This breaks, for instance, the Sphinx RTD theme, which
expects the line numbers to be formatted the same way as the normal
content. This commit makes the following changes:
* Emit a single <pre> inside the linenos div
* Wrap individual lines in <span> elements as needed
* Update all tests
* Don't yield empty <span> elements when no style is specified
This also makes the .html test files look correct when viewed in a
browser, as there is no extra whitespace in them that needs stripping.
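To inspect the resulting markup (hedged: linenos='table' is the
table-based line-number mode; after this change, expect one <pre> per
column rather than one per line):

    from pygments import highlight
    from pygments.lexers import PythonLexer
    from pygments.formatters import HtmlFormatter

    html = highlight('x = 1\ny = 2\n', PythonLexer(),
                     HtmlFormatter(linenos='table'))
    print(html.count('<pre>'))  # 2: one for the numbers, one for the code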
 
This is a manual merge as we don't want to pull in the documentation
change as part of this fix for a cleaner history.
 
Add a PromQL lexer, including tests and an example.promql file.
 
* Update for Csound 6.15.0
* Update comment
 
* Add support for Setext-style headings in Markdown
* Improve inline code detection in Markdown
* Add support for indented code blocks in Markdown
* Improve italics & bold detection in Markdown
* Simplify italics & bold regexes in Markdown
* Add warning about possible unrecognized internal tags in Markdown
* Improve strikethrough detection in Markdown
* Small bugfix in Markdown
* Small bugfix in Markdown
* Small refactoring in Markdown
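For example, Setext-style headings (a title underlined with = or -)
should now lex as headings (hedged: the exact token is
version-dependent, but heading tokens live under Generic):

    from pygments.lexers import MarkdownLexer

    text = 'Title\n=====\n\nsome *body* text\n'
    for token, value in MarkdownLexer().get_tokens(text):
        print(token, repr(value))
    # Expect a Generic.Heading-style token covering both the title
    # line and its ===== underline.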
 
* Add font and background colors to Style
* Move all styles to get_style_defs, add tests
* Remove hardcoded styles, add special lineno style
* Add styles for special line numbers in tables
* Update noclasses documentation
* Refactor linenos elements and styles, add tests
* Update AUTHORS
* Fix multiple CSS prefixes, add tests
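A hedged sketch of a custom style using the new hooks (the line-number
attribute names are per this patch; background_color predates it):

    from pygments.style import Style
    from pygments.token import Comment, Keyword

    class MyStyle(Style):
        background_color = '#f8f8f8'
        # New with this work: dedicated line-number styling on Style.
        line_number_color = '#888888'
        line_number_background_color = '#eeeeee'

        styles = {
            Keyword: 'bold #004461',
            Comment: 'italic #888888',
        }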
 
* Add support for PowerShell Remoting sessions
* Add test case for PowerShell Remoting sessions
* Make whitespace after prompt optional
* Fix test case containing backslashes
* Add test case for local PowerShell sessions
 
* Add Arrow lexer
* Pass tests: raw string for regex
* Make requested changes
 
The class declaration looks like:

    class class_identifier [#(param_decls)] [extends class_identifier #(params)];
    ...
    endclass [: class_identifier]

Following the same convention as the Java lexer, use Keyword.Declaration
and Name.Class. Add a test_systemverilog_classes unit test to test_hdl.
 
Co-authored-by: Bryton Hall <email@bryton.io>
 
* Move SystemVerilog type keywords
Put them next to the generic keywords list.
* Change a couple of SystemVerilog keywords to operators
The 'inside' and 'dist' keywords are described as operators in the
SystemVerilog standard, below unary increment/decrement and above
concatenation in precedence.
See IEEE 1800-2017 tables 11-1 and 11-2 for a list of operators.
This matches the description of the Pygments Operator.Word token:
"For any operator that is a word (e.g. not)."
* Add a SystemVerilog operators unit test
Copy/paste the contents of 1800-2017 Table 11-2,
and see what the SV lexer chops it up into.
I added lots of comments for potential improvements.
Some operators, such as '[' and '.', are being labeled as punctuation.
Also, multi-character operators such as '<<<=' are being split up into
multiple single-character tokens, e.g. '<' '<' '<' '='.
 
* Added GDScript lexer
* Fix regular expressions in GDScript lexer
* Update GDScript lexer with the current version from Godot docs
* Add tests for GDScript lexer
* Update authors
* Add an example file for GDScript
* Implement analyze_text for GAP and GDScript
* Fix example file name in tests
* Update license
Co-authored-by: Daniel J. Ramirez <djrmuv@gmail.com>
 
Most of the contents of these two unit tests are static.
Move things around so the entire test fits on a single page,
for better readability/maintainability.
Name the code part <TEST_NAME>_TEXT,
and the tokens part <TEST_NAME>_TOKENS.
"Text" was chosen because it's the parameter name of the
lexer.get_tokens(text) method.
 
* Add lexer for Devicetree language
Signed-off-by: Maxime Chretien <maxime.chretien@bootlin.com>
* Devicetree lexer: fix random input test error
Signed-off-by: Maxime Chretien <maxime.chretien@bootlin.com>
* Devicetree lexer: fix example file reference
Signed-off-by: Maxime Chretien <maxime.chretien@bootlin.com>
* Devicetree lexer: Reduce example file size
Also add some missing language elements
Signed-off-by: Maxime Chretien <maxime.chretien@bootlin.com>
 
The original implementation was missing some of the more arcane
features such as underscores, the character 's' for signed/unsigned,
support for spaces before/after the base specifier, capital-letter
base specifiers (i.e. 'B 'D 'H), and the 4-state 'xXzZ?' characters.
For regular integers, the 'l' and 'L' suffixes are not valid.
That is, unlike C, in Verilog '42L' is not a valid integer literal.
Create a new test that exercises most of the interesting kinds of
SystemVerilog numbers.
This fixes a couple of minor issues with the type of number token the
lexer returns. For example, numbers like '42' used to return
Integer.Hex but now return Integer.Decimal.
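A few of the literal shapes now exercised (illustrative only; the
exact Number subtypes may vary by version):

    from pygments.lexers import SystemVerilogLexer

    lexer = SystemVerilogLexer()
    for lit in ("42", "4'b1010", "16'hDEAD_BEEF", "8'sb1010_1010"):
        token, value = list(lexer.get_tokens(lit))[0]
        print(lit, '->', token)
    # '42' should now lex as a decimal integer rather than Integer.Hex.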
 
* add support for .tid files (TiddlyWiki5)
* add lexers/_mapping.py
* markup.py: change versionadded of TiddlyWiki5Lexer to 2.7
* markup.py, TiddlyWiki5Lexer: use non-greedy matcher for table headers, footers, captions and classes
* markup.py, TiddlyWiki5Lexer: make timestamps of type Number.Integer