delta/python-markdown.git/markdown/preprocessors.py, branch 3.3.2

Refactor HTML Parser (#803)

2020-09-22T14:42:17+00:00

The HTML parser has been completely replaced. The new HTML parser is built on Python's html.parser.HTMLParser, which alleviates various bugs and simplifies maintenance of the code.
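
As a rough illustration of building on html.parser.HTMLParser, the sketch below subclasses it to record which tags open at the top level of a document. The class name `BlockTagCollector` and its attributes are hypothetical and are not taken from the actual implementation. It is only a minimal sketch and assumes well-formed input with no void elements such as `<br>`.

```python
from html.parser import HTMLParser


class BlockTagCollector(HTMLParser):
    """Hypothetical helper: record tags that open at nesting depth zero."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.depth = 0
        self.top_level_tags = []

    def handle_starttag(self, tag, attrs):
        # Only tags opened outside any other element count as top level.
        if self.depth == 0:
            self.top_level_tags.append(tag)
        self.depth += 1

    def handle_endtag(self, tag):
        # Assumes every start tag has a matching end tag (no void elements).
        if self.depth > 0:
            self.depth -= 1


collector = BlockTagCollector()
collector.feed("<div><p>raw</p></div>\n\n<section>more</section>")
collector.close()
print(collector.top_level_tags)  # ['div', 'section']
```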

The md_in_html extension has been rebuilt on the new HTML Parser, which drastically simplifies it. Note that raw HTML elements with a markdown attribute defined are now converted to ElementTree Elements and are rendered by the serializer. Various bugs have been fixed.

Link reference parsing, abbreviation reference parsing and footnote reference parsing have all been moved from preprocessors to blockprocessors, which allows them to be nested within other block level elements. Specifically, this change was necessary to maintain the current behavior in the rebuilt md_in_html extension. A few random edge-case bugs (see the included tests) were resolved in the process.

Closes #595, closes #780, closes #830 and closes #1012.

Drop support for Python 2.7 (#865)

2019-10-24T13:36:04+00:00

* Python syntax upgraded using `pyupgrade --py3-plus`
* Travis no longer uses `sudo`. See https://blog.travis-ci.com/2018-11-19-required-linux-infrastructure-migration

See #760 for Python Version Support Timeline and related discussion.

Move isBlockLevel to class. (#693)

2018-07-31T18:12:49+00:00

Allows users and/or extensions to alter the list of block level 
elements. The old implementation remains with a DeprecationWarning. 
Fixes #575.
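
The idea can be sketched as follows, assuming the list of block level elements is stored per instance and consulted by a method. `MiniMarkdown` is a hypothetical stand-in, and the attribute and method names are chosen for illustration only. This is not the library's code.

```python
class MiniMarkdown:
    """Hypothetical stand-in for a class that owns its block level list."""

    def __init__(self):
        # A per-instance list, so one instance can be altered without
        # affecting any other instance.
        self.block_level_elements = ["div", "p", "pre", "table", "ul", "ol"]

    def is_block_level(self, tag):
        # Non-string tags (for example, None) are never block level.
        if isinstance(tag, str):
            return tag.lower().rstrip("/") in self.block_level_elements
        return False


md = MiniMarkdown()
# An extension or user can register an extra block level element.
md.block_level_elements.append("my-widget")
print(md.is_block_level("my-widget"))  # True
```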

Consistent copyright headers.

2018-07-27T18:43:09+00:00

Fixes #435.

All Markdown instances are now 'md'. (#691)

2018-07-27T14:55:41+00:00

Previously, instances of the Markdown class were represented as any one 
of 'md', 'md_instance', or 'markdown'. This inconsistency made it 
difficult when developing extensions, or just maintaining the existing 
code. Now, all instances are consistently represented as 'md'.

The old attributes on class instances still exist, but raise a 
DeprecationWarning when accessed. Also on classes where the instance was 
optional, the attribute always exists now and is simply None if no 
instance was provided (previously the attribute wouldn't exist).
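
One way such a deprecated alias could be written is with a property that warns on access. The class `MiniProcessor` below is hypothetical and only sketches the pattern described above. The actual mechanism used in the code base may differ.

```python
import warnings


class MiniProcessor:
    """Hypothetical processor holding an optional Markdown instance."""

    def __init__(self, md=None):
        # The attribute always exists; it is None when no instance is given.
        self.md = md

    @property
    def markdown(self):
        # Old name kept for backwards compatibility, with a warning.
        warnings.warn(
            "The 'markdown' attribute is deprecated; use 'md' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.md


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    proc = MiniProcessor()
    value = proc.markdown  # triggers the DeprecationWarning
```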

Replace homegrown OrderedDict with purpose-built Registry. (#688)

2018-07-27T14:23:55+00:00

All processors and patterns now get "registered" to a Registry.
Each item is given a name (string) and a priority. The name is for
later reference and the priority can be either an integer or float
and is used to sort. Priority is sorted from highest to lowest. A 
Registry instance is a list-like iterable with the items auto-sorted 
by priority. If two items have the same priority, then they are 
listed in the order they were "registered". Registering a new 
item with the same name as an already registered item replaces
the old item with the new item (however, the new item is sorted by
its newly assigned priority). To remove an item, "deregister" it by 
name or index.
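
The semantics described above can be sketched with a small toy class. `MiniRegistry` is hypothetical and written only to illustrate the rules (highest priority first, ties in registration order, same name replaces, deregister by name or index). It is not the Registry shipped with the library.

```python
class MiniRegistry:
    """Toy registry: items sorted by priority, highest first."""

    def __init__(self):
        self._entries = []  # (name, priority, item) tuples

    def register(self, item, name, priority):
        # Registering an existing name replaces the old item.
        self._entries = [e for e in self._entries if e[0] != name]
        self._entries.append((name, priority, item))
        # list.sort is stable, so equal priorities keep registration order.
        self._entries.sort(key=lambda e: e[1], reverse=True)

    def deregister(self, key):
        # Accept either an index (int) or a name (str).
        if isinstance(key, int):
            del self._entries[key]
        else:
            names = [e[0] for e in self._entries]
            del self._entries[names.index(key)]  # ValueError if unknown

    def __iter__(self):
        return (item for _, _, item in self._entries)

    def __len__(self):
        return len(self._entries)


reg = MiniRegistry()
reg.register("A", "a", 10)
reg.register("B", "b", 30)
reg.register("C", "c", 10)
initial = list(reg)          # ['B', 'A', 'C']
reg.register("A2", "a", 40)  # replaces "a" and re-sorts it
replaced = list(reg)         # ['A2', 'B', 'C']
reg.deregister("b")          # remove by name
reg.deregister(0)            # remove by index
remaining = list(reg)        # ['C']
```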

A backwards compatible shim is included so that existing simple
extensions should continue to work. DeprecationWarnings will 
be raised for any code which calls the old API.

Fixes #418.

Correct spelling mistakes.

2018-01-13T16:42:50+00:00

Removed deprecated safe_mode.

2018-01-12T00:04:49+00:00

Fix raw html reference issue (#585)

2018-01-04T20:07:45+00:00

Preserve the line which a reference was on to prevent raw HTML indexing issue. Fixes #584.
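
A minimal sketch of the idea, assuming a simplified single-line reference syntax: when a reference definition is stripped, an empty line is left in its place so that the line count, and therefore any later line-based indexing, stays unchanged. The regular expression and the function name `strip_references` are assumptions made for illustration and do not reproduce the actual preprocessor.

```python
import re

# Simplified pattern: "[id]: url" with up to three leading spaces.
REFERENCE_RE = re.compile(r"^ {0,3}\[([^\]]+)\]:\s*(\S+)\s*$")


def strip_references(lines):
    """Collect references, leaving a blank line where each one was."""
    references = {}
    kept = []
    for line in lines:
        match = REFERENCE_RE.match(line)
        if match:
            references[match.group(1).lower()] = match.group(2)
            kept.append("")  # preserve the line the reference was on
        else:
            kept.append(line)
    return kept, references


source = ["<div>", "[foo]: http://example.com", "</div>"]
kept, references = strip_references(source)
print(kept)  # ['<div>', '', '</div>']
```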

Prevent raw HTML parsing issue in abbr and footnotes

Preserve the abbreviation line when stripping and preserve a line for each footnote block. Footnotes should also accumulate the extraneous padding.

Test extra lines at the end of references

Strip the gathered extraneous whitespace

When processing footnotes, we don't actually care to process the extra whitespace at the end of a footnote, but we need it to calculate how many lines to preserve.

Fix HTML parse with empty lines (#537)

2017-01-24T15:36:37+00:00

If both open and close were not found in the first block, additional blocks
were evaluated without context of previous blocks.  The algorithm needs
to evaluate a buffer with the left bracket present.  So feed in all
items and get the right bracket, then adjust the data_index to be
relative to the last block. Fixes #452.