author     Waylan Limberg <waylan.limberg@icloud.com>  2020-09-22 10:42:17 -0400
committer  GitHub <noreply@github.com>  2020-09-22 10:42:17 -0400
commit     b701c34ebd7b2d0eb319517b9a275ddf0c89608d
tree       b79839201a337d38276f345595947b0a15a7567b
parent     90e750b1f4fa8d150d7b5a4709858c786f2794dd
Refactor HTML Parser (#803)
The HTML parser has been completely replaced. The new HTML parser is built on Python's html.parser.HTMLParser, which alleviates various bugs and simplifies maintenance of the code. The md_in_html extension has been rebuilt on the new HTML parser, which drastically simplifies it. Note that raw HTML elements with a markdown attribute defined are now converted to ElementTree Elements and are rendered by the serializer. Various bugs have been fixed. Link reference parsing, abbreviation reference parsing and footnote reference parsing have all been moved from preprocessors to blockprocessors, which allows them to be nested within other block-level elements. Specifically, this change was necessary to maintain the current behavior in the rebuilt md_in_html extension. A few random edge-case bugs (see the included tests) were resolved in the process. Closes #595, closes #780, closes #830 and closes #1012.
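The core idea behind building the raw HTML extraction on `html.parser.HTMLParser` can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the `markdown/htmlparser.py` code added by this commit: it treats a hypothetical small set of tags as block-level when they start at the beginning of a line, stashes each such block whole (including nested elements with the same tag name), and leaves a placeholder in the Markdown text. The class name `BlockExtractor` and the `<<HTML_BLOCK_n>>` placeholder format are assumptions for illustration. Comments and entity references are passed through; declarations and processing instructions are dropped, and the many edge cases the real parser handles are ignored.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately small set of block-level tags for this sketch.
BLOCK_TAGS = {"div", "p", "section", "article", "table", "pre"}
PLACEHOLDER = "<<HTML_BLOCK_{}>>"  # hypothetical placeholder format


class BlockExtractor(HTMLParser):
    """Split text into Markdown and stashed raw HTML blocks (sketch only)."""

    def __init__(self):
        # Re-emit entity references instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.cleandoc = []   # Markdown text plus placeholders
        self.stash = []      # raw HTML blocks removed from the text
        self.raw_parts = []  # pieces of the raw block currently being collected
        self.open_tag = None # name of the outermost open block-level tag
        self.depth = 0       # how many elements with that name are open

    def _emit(self, text):
        # Route text into the open raw block if there is one, else the output.
        target = self.raw_parts if self.open_tag is not None else self.cleandoc
        target.append(text)

    def _stash(self):
        # Move the collected raw block into the stash, leaving a placeholder.
        self.stash.append("".join(self.raw_parts))
        self.cleandoc.append(PLACEHOLDER.format(len(self.stash) - 1))
        self.raw_parts, self.open_tag, self.depth = [], None, 0

    def _at_line_start(self):
        doc = "".join(self.cleandoc)
        return not doc or doc.endswith("\n")

    def handle_starttag(self, tag, attrs):
        if self.open_tag is None and tag in BLOCK_TAGS and self._at_line_start():
            self.open_tag = tag  # a raw block starts here
        if tag == self.open_tag:
            self.depth += 1
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit(f"</{tag}>")
        if tag == self.open_tag:
            self.depth -= 1
            if self.depth == 0:
                self._stash()

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def close(self):
        super().close()
        if self.open_tag is not None:
            self._stash()  # unclosed block at end of input


parser = BlockExtractor()
parser.feed('Intro.\n\n<div>\n<div>*raw*</div>\n</div>\n\nOutro with <b>inline</b> HTML.\n')
parser.close()
print("".join(parser.cleandoc))
print(parser.stash)
```

Counting only elements with the same name as the outermost tag is what lets nested `<div>`s stay inside one stashed block; any other tag inside the block is simply copied along with it.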
-rw-r--r-- .spell-dict | 1
-rw-r--r-- docs/change_log/release-3.3.md | 15
-rw-r--r-- docs/extensions/md_in_html.md | 262
-rw-r--r-- markdown/blockprocessors.py | 30
-rw-r--r-- markdown/extensions/abbr.py | 49
-rw-r--r-- markdown/extensions/footnotes.py | 164
-rw-r--r-- markdown/extensions/md_in_html.py | 303
-rw-r--r-- markdown/htmlparser.py | 202
-rw-r--r-- markdown/postprocessors.py | 5
-rw-r--r-- markdown/preprocessors.py | 300
-rw-r--r-- tests/basic/inline-html-advanced.html | 12
-rw-r--r-- tests/basic/inline-html-advanced.txt | 14
-rw-r--r-- tests/basic/inline-html-comments.html | 11
-rw-r--r-- tests/basic/inline-html-comments.txt | 13
-rw-r--r-- tests/basic/inline-html-simple.html | 61
-rw-r--r-- tests/basic/inline-html-simple.txt | 72
-rw-r--r-- tests/extensions/extra/abbr.html | 4
-rw-r--r-- tests/extensions/extra/abbr.txt | 13
-rw-r--r-- tests/extensions/extra/raw-html.html | 11
-rw-r--r-- tests/extensions/github_flavored.html | 1
-rw-r--r-- tests/misc/ampersand.html | 2
-rw-r--r-- tests/misc/ampersand.txt | 5
-rw-r--r-- tests/misc/block_html5.html | 16
-rw-r--r-- tests/misc/block_html5.txt | 14
-rw-r--r-- tests/misc/block_html_attr.html | 27
-rw-r--r-- tests/misc/block_html_attr.txt | 24
-rw-r--r-- tests/misc/block_html_simple.html | 10
-rw-r--r-- tests/misc/block_html_simple.txt | 9
-rw-r--r-- tests/misc/comments.html | 9
-rw-r--r-- tests/misc/comments.txt | 10
-rw-r--r-- tests/misc/div.html | 10
-rw-r--r-- tests/misc/div.txt | 11
-rw-r--r-- tests/misc/html-comments.html | 2
-rw-r--r-- tests/misc/html-comments.txt | 2
-rw-r--r-- tests/misc/html.html | 29
-rw-r--r-- tests/misc/html.txt | 40
-rw-r--r-- tests/misc/markup-inside-p.html | 21
-rw-r--r-- tests/misc/markup-inside-p.txt | 21
-rw-r--r-- tests/misc/mismatched-tags.html | 14
-rw-r--r-- tests/misc/mismatched-tags.txt | 9
-rw-r--r-- tests/misc/more_comments.html | 8
-rw-r--r-- tests/misc/more_comments.txt | 11
-rw-r--r-- tests/misc/multi-line-tags.html | 13
-rw-r--r-- tests/misc/multi-line-tags.txt | 13
-rw-r--r-- tests/misc/multiline-comments.html | 37
-rw-r--r-- tests/misc/multiline-comments.txt | 38
-rw-r--r-- tests/misc/php.html | 11
-rw-r--r-- tests/misc/php.txt | 13
-rw-r--r-- tests/misc/pre.html | 13
-rw-r--r-- tests/misc/pre.txt | 14
-rw-r--r-- tests/misc/raw_whitespace.html | 8
-rw-r--r-- tests/misc/raw_whitespace.txt | 10
-rw-r--r-- tests/test_syntax/blocks/test_html_blocks.py | 1319
-rw-r--r-- tests/test_syntax/extensions/test_abbr.py | 242
-rw-r--r-- tests/test_syntax/extensions/test_footnotes.py | 243
-rw-r--r-- tests/test_syntax/extensions/test_md_in_html.py | 764
-rw-r--r-- tests/test_syntax/inline/test_links.py | 182
57 files changed, 3538 insertions, 1229 deletions
diff --git a/.spell-dict b/.spell-dict
index eed0f67..fbe4865 100644
--- a/.spell-dict
+++ b/.spell-dict
@@ -131,6 +131,7 @@ Treeprocessor
Treeprocessors
tuple
tuples
+unclosed
unescape
unescaping
unittest
diff --git a/docs/change_log/release-3.3.md b/docs/change_log/release-3.3.md
index 010f526..ab7a7c6 100644
--- a/docs/change_log/release-3.3.md
+++ b/docs/change_log/release-3.3.md
@@ -66,6 +66,21 @@ The following new features have been included in the 3.3 release:
Any random HTML attribute can be defined and set on the `<code>` tag of fenced code
blocks when the `attr_list` extension is enabled (#816).
+* The HTML parser has been completely replaced. The new HTML parser is built on Python's
+ [html.parser.HTMLParser](https://docs.python.org/3/library/html.parser.html), which
+ alleviates various bugs and simplifies maintenance of the code (#803, #830).
+
+* The [Markdown in HTML](../extensions/md_in_html.md) extension has been rebuilt on the
+ new HTML Parser, which drastically simplifies it. Note that raw HTML elements with a
+ `markdown` attribute defined are now converted to ElementTree Elements and are rendered
+ by the serializer. Various bugs have been fixed (#803, #595, #780, and #1012).
+
+* Link reference parsing, abbreviation reference parsing and footnote reference parsing
+ have all been moved from `preprocessors` to `blockprocessors`, which allows them to be
+ nested within other block level elements. Specifically, this change was necessary to
+ maintain the current behavior in the rebuilt Markdown in HTML extension. A few random
+ edge-case bugs (see the included tests) were resolved in the process (#803).
+
## Bug fixes
The following bug fixes are included in the 3.3 release:
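The changelog entry above says link reference parsing now happens in a block processor. The `ReferenceProcessor` added to `markdown/blockprocessors.py` in this commit does this with a single multiline regular expression. The sketch below applies that same pattern outside the block parser to show what one call does with one block; the helper name `extract_reference` is hypothetical, and a plain dict stands in for `md.references`. Text before and after the definition is returned as separate blocks, mirroring what `run()` pushes back onto `blocks`.

```python
import re

# Same pattern as ReferenceProcessor.RE in the blockprocessors.py diff of this commit.
REF_RE = re.compile(
    r'^[ ]{0,3}\[([^\]]*)\]:[ ]*\n?[ ]*([^\s]+)[ ]*\n?[ ]*((["\'])(.*)\4|\((.*)\))?[ ]*$', re.MULTILINE
)


def extract_reference(block, references):
    """Hypothetical helper mirroring the logic of ReferenceProcessor.run for one block.

    Stores the first link reference found in `references` (a dict standing in
    for md.references) and returns the text before and after it as separate
    blocks. A block without a definition is returned unchanged.
    """
    m = REF_RE.search(block)
    if m is None:
        return [block]
    ref_id = m.group(1).strip().lower()        # ids are matched case-insensitively
    link = m.group(2).lstrip('<').rstrip('>')  # drop optional angle brackets
    title = m.group(5) or m.group(6)           # quoted or parenthesized title
    references[ref_id] = (link, title)
    remaining = []
    if block[:m.start()].strip():
        remaining.append(block[:m.start()].rstrip('\n'))
    if block[m.end():].strip():
        remaining.append(block[m.end():].lstrip('\n'))
    return remaining


refs = {}
print(extract_reference('Some text.\n[Example]: <https://example.com> "A Title"\nMore text.', refs))
# ['Some text.', 'More text.']
print(refs)
# {'example': ('https://example.com', 'A Title')}
```

Because the pattern is searched with `re.MULTILINE` rather than matched at the start of the input, a definition can sit in the middle of a block, which is what allows references to be nested inside other block-level elements.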
diff --git a/docs/extensions/md_in_html.md b/docs/extensions/md_in_html.md
index b57197b..ba4424b 100644
--- a/docs/extensions/md_in_html.md
+++ b/docs/extensions/md_in_html.md
@@ -4,122 +4,234 @@ title: Markdown in HTML Extension
## Summary
-An extensions that parses Markdown inside of HTML tags.
+An extension that parses Markdown inside of HTML tags.
-## Usage
+## Syntax
-From the Python interpreter:
+By default, Markdown ignores any content within a raw HTML block-level element. With the `md_in_html` extension
+enabled, the content of a raw HTML block-level element can be parsed as Markdown by including a `markdown` attribute
+on the opening tag. The `markdown` attribute will be stripped from the output, while all other attributes will be
+preserved.
-```pycon
->>> import markdown
->>> html = markdown.markdown(text, extensions=['md_in_html'])
-```
+The `markdown` attribute can be assigned one of three values: [`"1"`](#1), [`"block"`](#block), or [`"span"`](#span).
-Unlike the other Extra features, this feature is built into the markdown core and
-is turned on when `markdown.extensions.extra` or `markdown.extensions.md_in_html`
-is enabled.
+!!! note
-The content of any raw HTML block element can be Markdown-formatted simply by
-adding a `markdown` attribute to the opening tag. The markdown attribute will be
-stripped from the output, but all other attributes will be preserved.
+ The expressions "block-level" and "span-level" as used in this document refer to an element's designation
+ according to the HTML specification, whereas the `"span"` and `"block"` values assigned to the `markdown`
+ attribute refer to the Markdown parser's behavior.
-If the markdown value is set to `1` (recommended) or any value other than `span`
-or `block`, the default behavior will be executed: `p`,`h[1-6]`,`li`,`dd`,`dt`,
-`td`,`th`,`legend`, and `address` elements skip block parsing while others do not.
-If the default is overridden by a value of `span`, *block parsing will be skipped*
-regardless of tag. If the default is overridden by a value of `block`,
-*block parsing will occur* regardless of tag.
+### `markdown="1"` { #1 }
-#### Simple Example:
+When the `markdown` attribute is set to `"1"`, then the parser will use the default behavior for that specific tag.
-```md
-This is *true* markdown text.
+The following tags have the `block` behavior by default: `address`, `article`, `aside`, `blockquote`, `body`,
+`colgroup`, `details`, `div`, `dl`, `fieldset`, `figcaption`, `figure`, `footer`, `form`, `iframe`, `header`, `hr`,
+`main`, `menu`, `nav`, `map`, `noscript`, `object`, `ol`, `section`, `table`, `tbody`, `thead`, `tfoot`, `tr`, and
+`ul`.
+For example, the following:
+
+```
<div markdown="1">
-This is *true* markdown text.
+This is a *Markdown* Paragraph.
</div>
```
-#### Result:
+... is rendered as:
-```html
-<p>This is <em>true</em> markdown text.</p>
+``` html
<div>
-<p>This is <em>true</em> markdown text.</p>
+<p>This is a <em>Markdown</em> Paragraph.</p>
</div>
```
-### Nested Markdown Inside HTML Blocks
+The following tags have the `span` behavior by default: `address`, `dd`, `dt`, `h[1-6]`, `legend`, `li`, `p`, `td`,
+and `th`.
-Nested elements are more sensitive and must be used cautiously. To avoid
-unexpected results:
+For example, the following:
-* Only nest elements within block mode elements.
-* Follow the closing tag of inner elements with a blank line.
-* Only have one level of nesting.
+```
+<p markdown="1">
+This is not a *Markdown* Paragraph.
+</p>
+```
-#### Complex Example:
+... is rendered as:
-```md
-<div markdown="1" name="Example">
+``` html
+<p>
+This is not a <em>Markdown</em> Paragraph.
+</p>
+```
-The text of the `Example` element.
+### `markdown="block"` { #block }
-<div markdown="1" name="DefaultBlockMode">
-This text gets wrapped in `p` tags.
-</div>
+When the `markdown` attribute is set to `"block"`, then the parser will force the `block` behavior on the contents of
+the element so long as it is one of the `block` or `span` tags.
-The tail of the `DefaultBlockMode` subelement.
+The content of a `block` element is parsed into block-level content. In other words, the text is rendered as
+paragraphs, headers, lists, blockquotes, etc. Any inline syntax within those elements is processed as well.
-<p markdown="1" name="DefaultSpanMode">
-This text *is not* wrapped in additional `p` tags.
-</p>
+For example, the following:
-The tail of the `DefaultSpanMode` subelement.
+```
+<section markdown="block">
+# A header.
-<div markdown="span" name="SpanModeOverride">
-This `div` block is not wrapped in paragraph tags.
-Note: Subelements are not required to have tail text.
-</div>
+A *Markdown* paragraph.
-<p markdown="block" name="BlockModeOverride">
-This `p` block *is* foolishly wrapped in further paragraph tags.
-</p>
+* A list item.
+* A second list item.
-The tail of the `BlockModeOverride` subelement.
+</section>
+```
+
+... is rendered as:
+
+``` html
+<section>
+<h1>A header.</h1>
+<p>A <em>Markdown</em> paragraph.</p>
+<ul>
+<li>A list item.</li>
+<li>A second list item.</li>
+</ul>
+</section>
+```
+
+!!! warning
+
+ Forcing elements to be parsed as `block` elements when they are not by default could result in invalid HTML.
+ For example, one could force a `<p>` element to be nested within another `<p>` element. In most cases it is
+ recommended to use the default behavior of `markdown="1"`. Explicitly setting `markdown="block"` should be
+ reserved for advanced users who understand the HTML specification and how browsers parse and render HTML.
+
+### `markdown="span"` { #span }
+
+When the `markdown` attribute is set to `"span"`, then the parser will force the `span` behavior on the contents
+of the element so long as it is one of the `block` or `span` tags.
+
+The content of a `span` element is not parsed into block-level content. In other words, the content will not be
+rendered as paragraphs, headers, etc. Only inline syntax will be rendered, such as links, strong, emphasis, etc.
+
+For example, the following:
-<div name="RawHtml">
-Raw HTML blocks may also be nested.
+```
+<div markdown="span">
+# *Not* a header
</div>
+```
+... is rendered as:
+
+``` html
+<div>
+# <em>Not</em> a header
</div>
+```
+
+### Ignored Elements
+
+The following tags are always ignored, regardless of any `markdown` attribute: `canvas`, `math`, `option`, `pre`,
+`script`, `style`, and `textarea`. All other raw HTML tags are treated as span-level tags and are not affected by this
+extension.
+
+### Nesting
-This text is after the markdown in HTML.
+When nesting multiple levels of raw HTML elements, a `markdown` attribute must be defined for each block-level
+element. For any block-level element which does not have a `markdown` attribute, everything inside that element is
+ignored, including child elements with `markdown` attributes.
+
+For example, the following:
+
+```
+<article id="my-article" markdown="1">
+# Article Title
+
+A Markdown paragraph.
+
+<section id="section-1" markdown="1">
+## Section 1 Title
+
+<p>Custom raw **HTML** which gets ignored.</p>
+
+</section>
+
+<section id="section-2" markdown="1">
+## Section 2 Title
+
+<p markdown="1">**Markdown** content.</p>
+
+</section>
+
+</article>
```
-#### Complex Result:
+... is rendered as:
```html
-<div name="Example">
-<p>The text of the <code>Example</code> element.</p>
-<div name="DefaultBlockMode">
-<p>This text gets wrapped in <code>p</code> tags.</p>
+<article id="my-article">
+<h1>Article Title</h1>
+<p>A Markdown paragraph.</p>
+<section id="section-1">
+<h2>Section 1 Title</h2>
+<p>Custom raw **HTML** which gets ignored.</p>
+</section>
+<section id="section-2">
+<h2>Section 2 Title</h2>
+<p><strong>Markdown</strong> content.</p>
+</section>
+</article>
+```
+
+When the value of an element's `markdown` attribute is more permissive than its parent's, then the parent's stricter
+behavior is enforced. For example, a `block` element nested within a `span` element will be parsed using the `span`
+behavior. However, if the value of an element's `markdown` attribute is the same as, or more restrictive than, its
+parent's, then the child element's behavior is observed. For example, a `block` element may contain either `block`
+elements or `span` elements as children and each element will be parsed using the specified behavior.
+
+### Tag Normalization
+
+While Markdown does not alter raw HTML by default, this extension parses the content of raw HTML elements and therefore performs some normalization of the tags of block-level elements. For example, the following raw HTML:
+
+```
+<div markdown="1">
+<p markdown="1">A Markdown paragraph with *no* closing tag.
+<p>A raw paragraph with *no* closing tag.
</div>
-<p>The tail of the <code>DefaultBlockMode</code> subelement.</p>
-<p name="DefaultSpanMode">
-This text <em>is not</em> wrapped in additional <code>p</code> tags.</p>
-<p>The tail of the <code>DefaultSpanMode</code> subelement.</p>
-<div name="SpanModeOverride">
-This <code>div</code> block is not wrapped in paragraph tags.
-Note: Subelements are not required to have tail text.</div>
-<p name="BlockModeOverride">
-<p>This <code>p</code> block <em>is</em> foolishly wrapped in further paragraph tags.</p>
+```
+
+... is rendered as:
+
+``` html
+<div>
+<p>A Markdown paragraph with <em>no</em> closing tag.
+</p>
+<p>A raw paragraph with *no* closing tag.
</p>
-<p>The tail of the <code>BlockModeOverride</code> subelement.</p>
-<div name="RawHtml">
-Raw HTML blocks may also be nested.
</div>
+```
-</div>
-<p>This text is after the markdown in HTML.</p>
+Notice that the parser properly recognizes that an unclosed `<p>` tag ends when another `<p>` tag begins or when the
+parent element ends. In both cases, a closing `</p>` was added to the end of the element, regardless of whether a
+`markdown` attribute was assigned to the element.
+
+To avoid any normalization, an element must not be a descendant of any block-level element which has a `markdown`
+attribute defined.
+
+!!! warning
+
+ The normalization behavior is only documented here so that document authors are not surprised when their carefully
+ crafted raw HTML is altered by Markdown. This extension should not be relied on to normalize and generate valid
+ HTML. For the best results, always include valid raw HTML (with both opening and closing tags) in your Markdown
+ documents.
+
+## Usage
+
+From the Python interpreter:
+
+``` pycon
+>>> import markdown
+>>> html = markdown.markdown(text, extensions=['md_in_html'])
```
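The attribute rules documented above reduce to one lookup per element. As a sketch, the function below restates the logic of the `get_state` method added to `markdown/extensions/md_in_html.py` in this commit as a free function; the name `resolve_state` and its signature are hypothetical. The tag lists are copied from that diff, where `address` appears in both the span and block lists; because the block check runs first, it resolves to `block`.

```python
# Tag lists copied from the md_in_html.py diff in this commit.
SPAN_TAGS = {'address', 'dd', 'dt', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
             'legend', 'li', 'p', 'td', 'th'}
BLOCK_TAGS = {
    'address', 'article', 'aside', 'blockquote', 'body', 'colgroup', 'details',
    'div', 'dl', 'fieldset', 'figcaption', 'figure', 'footer', 'form', 'iframe',
    'header', 'hr', 'main', 'menu', 'nav', 'map', 'noscript', 'object', 'ol',
    'section', 'table', 'tbody', 'thead', 'tfoot', 'tr', 'ul',
}
RAW_TAGS = {'canvas', 'math', 'option', 'pre', 'script', 'style', 'textarea'}
BLOCK_LEVEL_TAGS = SPAN_TAGS | BLOCK_TAGS | RAW_TAGS


def resolve_state(tag, md_attr='0', parent_state=None):
    """Return 'block', 'span', 'off', or None for one element (hypothetical helper).

    md_attr is the element's `markdown` attribute value ('0' when absent);
    parent_state is the resolved state of the enclosing element, or None.
    """
    if md_attr == 'markdown':
        # A bare `<tag markdown>` is treated like `markdown="1"`.
        md_attr = '1'
    if parent_state == 'off' or (parent_state == 'span' and md_attr != '0'):
        # A stricter parent state overrides the child's own attribute.
        md_attr = parent_state
    if (md_attr == '1' and tag in BLOCK_TAGS) or \
            (md_attr == 'block' and tag in SPAN_TAGS | BLOCK_TAGS):
        return 'block'
    if (md_attr == '1' and tag in SPAN_TAGS) or \
            (md_attr == 'span' and tag in SPAN_TAGS | BLOCK_TAGS):
        return 'span'
    if tag in BLOCK_LEVEL_TAGS:
        return 'off'   # content is left untouched
    return None        # span-level tags are not handled by this extension


print(resolve_state('div', '1'))                           # block
print(resolve_state('p', '1'))                             # span
print(resolve_state('div', 'block', parent_state='span'))  # span
```

The last call shows the nesting rule from the documentation: the child asks for `block`, but its `span` parent is stricter, so `span` wins.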
diff --git a/markdown/blockprocessors.py b/markdown/blockprocessors.py
index e81f83c..742f174 100644
--- a/markdown/blockprocessors.py
+++ b/markdown/blockprocessors.py
@@ -51,6 +51,7 @@ def build_block_parser(md, **kwargs):
parser.blockprocessors.register(OListProcessor(parser), 'olist', 40)
parser.blockprocessors.register(UListProcessor(parser), 'ulist', 30)
parser.blockprocessors.register(BlockQuoteProcessor(parser), 'quote', 20)
+ parser.blockprocessors.register(ReferenceProcessor(parser), 'reference', 15)
parser.blockprocessors.register(ParagraphProcessor(parser), 'paragraph', 10)
return parser
@@ -554,6 +555,35 @@ class EmptyBlockProcessor(BlockProcessor):
)
+class ReferenceProcessor(BlockProcessor):
+ """ Process link references. """
+ RE = re.compile(
+ r'^[ ]{0,3}\[([^\]]*)\]:[ ]*\n?[ ]*([^\s]+)[ ]*\n?[ ]*((["\'])(.*)\4|\((.*)\))?[ ]*$', re.MULTILINE
+ )
+
+ def test(self, parent, block):
+ return True
+
+ def run(self, parent, blocks):
+ block = blocks.pop(0)
+ m = self.RE.search(block)
+ if m:
+ id = m.group(1).strip().lower()
+ link = m.group(2).lstrip('<').rstrip('>')
+ title = m.group(5) or m.group(6)
+ self.parser.md.references[id] = (link, title)
+ if block[m.end():].strip():
+ # Add any content after match back to blocks as separate block
+ blocks.insert(0, block[m.end():].lstrip('\n'))
+ if block[:m.start()].strip():
+ # Add any content before match back to blocks as separate block
+ blocks.insert(0, block[:m.start()].rstrip('\n'))
+ return True
+ # No match. Restore block.
+ blocks.insert(0, block)
+ return False
+
+
class ParagraphProcessor(BlockProcessor):
""" Process Paragraph blocks. """
diff --git a/markdown/extensions/abbr.py b/markdown/extensions/abbr.py
index b53f2c4..9879314 100644
--- a/markdown/extensions/abbr.py
+++ b/markdown/extensions/abbr.py
@@ -17,48 +17,53 @@ License: [BSD](https://opensource.org/licenses/bsd-license.php)
'''
from . import Extension
-from ..preprocessors import Preprocessor
+from ..blockprocessors import BlockProcessor
from ..inlinepatterns import InlineProcessor
from ..util import AtomicString
import re
import xml.etree.ElementTree as etree
-# Global Vars
-ABBR_REF_RE = re.compile(r'[*]\[(?P<abbr>[^\]]*)\][ ]?:\s*(?P<title>.*)')
-
class AbbrExtension(Extension):
""" Abbreviation Extension for Python-Markdown. """
def extendMarkdown(self, md):
""" Insert AbbrPreprocessor before ReferencePreprocessor. """
- md.preprocessors.register(AbbrPreprocessor(md), 'abbr', 12)
+ md.parser.blockprocessors.register(AbbrPreprocessor(md.parser), 'abbr', 16)
-class AbbrPreprocessor(Preprocessor):
+class AbbrPreprocessor(BlockProcessor):
""" Abbreviation Preprocessor - parse text for abbr references. """
- def run(self, lines):
+ RE = re.compile(r'^[*]\[(?P<abbr>[^\]]*)\][ ]?:[ ]*\n?[ ]*(?P<title>.*)$', re.MULTILINE)
+
+ def test(self, parent, block):
+ return True
+
+ def run(self, parent, blocks):
'''
Find and remove all Abbreviation references from the text.
Each reference is set as a new AbbrPattern in the markdown instance.
'''
- new_text = []
- for line in lines:
- m = ABBR_REF_RE.match(line)
- if m:
- abbr = m.group('abbr').strip()
- title = m.group('title').strip()
- self.md.inlinePatterns.register(
- AbbrInlineProcessor(self._generate_pattern(abbr), title), 'abbr-%s' % abbr, 2
- )
- # Preserve the line to prevent raw HTML indexing issue.
- # https://github.com/Python-Markdown/markdown/issues/584
- new_text.append('')
- else:
- new_text.append(line)
- return new_text
+ block = blocks.pop(0)
+ m = self.RE.search(block)
+ if m:
+ abbr = m.group('abbr').strip()
+ title = m.group('title').strip()
+ self.parser.md.inlinePatterns.register(
+ AbbrInlineProcessor(self._generate_pattern(abbr), title), 'abbr-%s' % abbr, 2
+ )
+ if block[m.end():].strip():
+ # Add any content after match back to blocks as separate block
+ blocks.insert(0, block[m.end():].lstrip('\n'))
+ if block[:m.start()].strip():
+ # Add any content before match back to blocks as separate block
+ blocks.insert(0, block[:m.start()].rstrip('\n'))
+ return True
+ # No match. Restore block.
+ blocks.insert(0, block)
+ return False
def _generate_pattern(self, text):
'''
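The rewritten `AbbrPreprocessor` now works on a whole block rather than line by line, and its pattern allows the title to start on the line after the colon. A short sketch using the same pattern, with a hypothetical `find_abbreviation` helper that returns the abbreviation, its title, and any surrounding text (the part `run()` pushes back onto `blocks`):

```python
import re

# Same pattern as AbbrPreprocessor.RE in the abbr.py diff above.
ABBR_RE = re.compile(r'^[*]\[(?P<abbr>[^\]]*)\][ ]?:[ ]*\n?[ ]*(?P<title>.*)$', re.MULTILINE)


def find_abbreviation(block):
    """Hypothetical helper: return (abbr, title, remaining_blocks), or None if absent."""
    m = ABBR_RE.search(block)
    if m is None:
        return None
    remaining = []
    if block[:m.start()].strip():
        remaining.append(block[:m.start()].rstrip('\n'))
    if block[m.end():].strip():
        remaining.append(block[m.end():].lstrip('\n'))
    return m.group('abbr').strip(), m.group('title').strip(), remaining


print(find_abbreviation('The W3C sets standards.\n*[W3C]:\n    World Wide Web Consortium'))
# ('W3C', 'World Wide Web Consortium', ['The W3C sets standards.'])
```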
diff --git a/markdown/extensions/footnotes.py b/markdown/extensions/footnotes.py
index beab919..f6f4c85 100644
--- a/markdown/extensions/footnotes.py
+++ b/markdown/extensions/footnotes.py
@@ -14,7 +14,7 @@ License: [BSD](https://opensource.org/licenses/bsd-license.php)
"""
from . import Extension
-from ..preprocessors import Preprocessor
+from ..blockprocessors import BlockProcessor
from ..inlinepatterns import InlineProcessor
from ..treeprocessors import Treeprocessor
from ..postprocessors import Postprocessor
@@ -26,8 +26,6 @@ import xml.etree.ElementTree as etree
FN_BACKLINK_TEXT = util.STX + "zz1337820767766393qq" + util.ETX
NBSP_PLACEHOLDER = util.STX + "qq3936677670287331zz" + util.ETX
-DEF_RE = re.compile(r'[ ]{0,3}\[\^([^\]]*)\]:\s*(.*)')
-TABBED_RE = re.compile(r'((\t)|( ))(.*)')
RE_REF_ID = re.compile(r'(fnref)(\d+)')
@@ -72,8 +70,8 @@ class FootnoteExtension(Extension):
md.registerExtension(self)
self.parser = md.parser
self.md = md
- # Insert a preprocessor before ReferencePreprocessor
- md.preprocessors.register(FootnotePreprocessor(self), 'footnote', 15)
+ # Insert a blockprocessor before ReferenceProcessor
+ md.parser.blockprocessors.register(FootnoteBlockProcessor(self), 'footnote', 17)
# Insert an inline pattern before ImageReferencePattern
FOOTNOTE_RE = r'\[\^([^\]]*)\]' # blah blah [^1] blah
@@ -202,106 +200,92 @@ class FootnoteExtension(Extension):
return div
-class FootnotePreprocessor(Preprocessor):
+class FootnoteBlockProcessor(BlockProcessor):
""" Find all footnote references and store for later use. """
+ RE = re.compile(r'^[ ]{0,3}\[\^([^\]]*)\]:[ ]*(.*)$', re.MULTILINE)
+
def __init__(self, footnotes):
+ super().__init__(footnotes.parser)
self.footnotes = footnotes
- def run(self, lines):
- """
- Loop through lines and find, set, and remove footnote definitions.
-
- Keywords:
-
- * lines: A list of lines of text
-
- Return: A list of lines of text with footnote definitions removed.
-
- """
- newlines = []
- i = 0
- while True:
- m = DEF_RE.match(lines[i])
- if m:
- fn, _i = self.detectTabbed(lines[i+1:])
- fn.insert(0, m.group(2))
- i += _i-1 # skip past footnote
- footnote = "\n".join(fn)
- self.footnotes.setFootnote(m.group(1), footnote.rstrip())
- # Preserve a line for each block to prevent raw HTML indexing issue.
- # https://github.com/Python-Markdown/markdown/issues/584
- num_blocks = (len(footnote.split('\n\n')) * 2)
- newlines.extend([''] * (num_blocks))
+ def test(self, parent, block):
+ return True
+
+ def run(self, parent, blocks):
+ """ Find, set, and remove footnote definitions. """
+ block = blocks.pop(0)
+ m = self.RE.search(block)
+ if m:
+ id = m.group(1)
+ fn_blocks = [m.group(2)]
+
+ # Handle rest of block
+ therest = block[m.end():].lstrip('\n')
+ m2 = self.RE.search(therest)
+ if m2:
+ # Another footnote exists in the rest of this block.
+ # Any content before match is continuation of this footnote, which may be lazily indented.
+ before = therest[:m2.start()].rstrip('\n')
+ fn_blocks[0] = '\n'.join([fn_blocks[0], self.detab(before)]).lstrip('\n')
+ # Add back to blocks everything from beginning of match forward for next iteration.
+ blocks.insert(0, therest[m2.start():])
else:
- newlines.append(lines[i])
- if len(lines) > i+1:
- i += 1
- else:
- break
- return newlines
+ # All remaining lines of block are continuation of this footnote, which may be lazily indented.
+ fn_blocks[0] = '\n'.join([fn_blocks[0], self.detab(therest)]).strip('\n')
- def detectTabbed(self, lines):
- """ Find indented text and remove indent before further proccesing.
+ # Check for child elements in remaining blocks.
+ fn_blocks.extend(self.detectTabbed(blocks))
- Keyword arguments:
+ footnote = "\n\n".join(fn_blocks)
+ self.footnotes.setFootnote(id, footnote.rstrip())
- * lines: an array of strings
+ if block[:m.start()].strip():
+ # Add any content before match back to blocks as separate block
+ blocks.insert(0, block[:m.start()].rstrip('\n'))
+ return True
+ # No match. Restore block.
+ blocks.insert(0, block)
+ return False
- Returns: a list of post processed items and the index of last line.
+ def detectTabbed(self, blocks):
+ """ Find indented text and remove indent before further processing.
+ Returns: a list of blocks with indentation removed.
"""
- items = []
- blank_line = False # have we encountered a blank line yet?
- i = 0 # to keep track of where we are
-
- def detab(line):
- match = TABBED_RE.match(line)
- if match:
- return match.group(4)
-
- for line in lines:
- if line.strip(): # Non-blank line
- detabbed_line = detab(line)
- if detabbed_line:
- items.append(detabbed_line)
- i += 1
- continue
- elif not blank_line and not DEF_RE.match(line):
- # not tabbed but still part of first par.
- items.append(line)
- i += 1
- continue
- else:
- return items, i+1
-
- else: # Blank line: _maybe_ we are done.
- blank_line = True
- i += 1 # advance
-
- # Find the next non-blank line
- for j in range(i, len(lines)):
- if lines[j].strip():
- next_line = lines[j]
- break
- else:
- # Include extreaneous padding to prevent raw HTML
- # parsing issue: https://github.com/Python-Markdown/markdown/issues/584
- items.append("")
- i += 1
+ fn_blocks = []
+ while blocks:
+ if blocks[0].startswith(' '*4):
+ block = blocks.pop(0)
+ # Check for new footnotes within this block and split at new footnote.
+ m = self.RE.search(block)
+ if m:
+ # Another footnote exists in this block.
+ # Any content before match is continuation of this footnote, which may be lazily indented.
+ before = block[:m.start()].rstrip('\n')
+ fn_blocks.append(self.detab(before))
+ # Add back to blocks everything from beginning of match forward for next iteration.
+ blocks.insert(0, block[m.start():])
+ # End of this footnote.
+ break
else:
- break # There is no more text; we are done.
+ # Entire block is part of this footnote.
+ fn_blocks.append(self.detab(block))
+ else:
+ # End of this footnote.
+ break
+ return fn_blocks
- # Check if the next non-blank line is tabbed
- if detab(next_line): # Yes, more work to do.
- items.append("")
- continue
- else:
- break # No, we are done.
- else:
- i += 1
+ def detab(self, block):
+ """ Remove one level of indent from a block.
- return items, i
+ Preserve lazily indented blocks by only removing indent from indented lines.
+ """
+ lines = block.split('\n')
+ for i, line in enumerate(lines):
+ if line.startswith(' '*4):
+ lines[i] = line[4:]
+ return '\n'.join(lines)
class FootnoteInlineProcessor(InlineProcessor):
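The continuation handling in `FootnoteBlockProcessor` comes down to two steps: keep consuming following blocks that start with four spaces, and strip one level of indentation only from lines that have it, so lazily indented lines survive. Below is a standalone sketch of the `detab` and `detectTabbed` methods from the diff above as free functions; the name `collect_indented` (playing the role of `detectTabbed`) is hypothetical.

```python
import re

# Same pattern as FootnoteBlockProcessor.RE in the footnotes.py diff above.
FN_RE = re.compile(r'^[ ]{0,3}\[\^([^\]]*)\]:[ ]*(.*)$', re.MULTILINE)


def detab(block):
    """Remove one level (four spaces) of indent, only from lines that have it."""
    lines = block.split('\n')
    for i, line in enumerate(lines):
        if line.startswith(' ' * 4):
            lines[i] = line[4:]
    return '\n'.join(lines)


def collect_indented(blocks):
    """Pop indented blocks that continue a footnote; mutates `blocks` in place.

    Stops at the first block that is not indented. If an indented block
    contains a new footnote definition, the block is split there and the
    remainder is pushed back onto `blocks`.
    """
    fn_blocks = []
    while blocks and blocks[0].startswith(' ' * 4):
        block = blocks.pop(0)
        m = FN_RE.search(block)
        if m:
            # Text before the new definition still belongs to this footnote.
            fn_blocks.append(detab(block[:m.start()].rstrip('\n')))
            blocks.insert(0, block[m.start():])
            break
        fn_blocks.append(detab(block))
    return fn_blocks


blocks = ['    Second paragraph.\nlazy line', '    Third paragraph.', 'Not indented.']
print(collect_indented(blocks))  # ['Second paragraph.\nlazy line', 'Third paragraph.']
print(blocks)                    # ['Not indented.']
```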
diff --git a/markdown/extensions/md_in_html.py b/markdown/extensions/md_in_html.py
index 500c166..3518d05 100644
--- a/markdown/extensions/md_in_html.py
+++ b/markdown/extensions/md_in_html.py
@@ -16,68 +16,251 @@ License: [BSD](https://opensource.org/licenses/bsd-license.php)
from . import Extension
from ..blockprocessors import BlockProcessor
+from ..preprocessors import Preprocessor
from .. import util
-import re
+from ..htmlparser import HTMLExtractor
import xml.etree.ElementTree as etree
+# Block-level tags in which the content only gets span level parsing
+span_tags = ['address', 'dd', 'dt', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'legend', 'li', 'p', 'td', 'th']
+
+# Block-level tags in which the content gets parsed as blocks
+block_tags = [
+ 'address', 'article', 'aside', 'blockquote', 'body', 'colgroup', 'details', 'div', 'dl', 'fieldset',
+ 'figcaption', 'figure', 'footer', 'form', 'iframe', 'header', 'hr', 'main', 'menu', 'nav', 'map',
+ 'noscript', 'object', 'ol', 'section', 'table', 'tbody', 'thead', 'tfoot', 'tr', 'ul'
+]
+
+# Block-level tags which never get their content parsed.
+raw_tags = ['canvas', 'math', 'option', 'pre', 'script', 'style', 'textarea']
+
+block_level_tags = span_tags + block_tags + raw_tags
+
+
+class HTMLExtractorExtra(HTMLExtractor):
+ """
+ Override HTMLExtractor and create etree Elements for any elements which should have content parsed as Markdown.
+ """
+
+ def reset(self):
+ """Reset this instance. Loses all unprocessed data."""
+ self.mdstack = [] # When markdown=1, stack contains a list of tags
+ self.treebuilder = etree.TreeBuilder()
+ self.mdstate = [] # one of 'block', 'span', 'off', or None
+ super().reset()
+
+ def close(self):
+ """Handle any buffered data."""
+ super().close()
+ # Handle any unclosed tags.
+ if self.mdstack:
+ # Close the outermost parent. handle_endtag will close all unclosed children.
+ self.handle_endtag(self.mdstack[0])
+
+ def get_element(self):
+ """ Return element from treebuilder and reset treebuilder for later use. """
+ element = self.treebuilder.close()
+ self.treebuilder = etree.TreeBuilder()
+ return element
+
+ def get_state(self, tag, attrs):
+ """ Return state from tag and `markdown` attr. One of 'block', 'span', or 'off'. """
+ md_attr = attrs.get('markdown', '0')
+ if md_attr == 'markdown':
+ # `<tag markdown>` is the same as `<tag markdown='1'>`.
+ md_attr = '1'
+ parent_state = self.mdstate[-1] if self.mdstate else None
+ if parent_state == 'off' or (parent_state == 'span' and md_attr != '0'):
+ # Only use the parent state if it is more restrictive than the markdown attribute.
+ md_attr = parent_state
+ if ((md_attr == '1' and tag in block_tags) or
+ (md_attr == 'block' and tag in span_tags + block_tags)):
+ return 'block'
+ elif ((md_attr == '1' and tag in span_tags) or
+ (md_attr == 'span' and tag in span_tags + block_tags)):
+ return 'span'
+ elif tag in block_level_tags:
+ return 'off'
+ else: # pragma: no cover
+ return None
+
+ def handle_starttag(self, tag, attrs):
+ if tag in block_level_tags:
+ # Valueless attr (ex: `<tag checked>`) results in `[('checked', None)]`.
+ # Convert to `{'checked': 'checked'}`.
+ attrs = {key: value if value is not None else key for key, value in attrs}
+ state = self.get_state(tag, attrs)
+
+ if self.inraw or (state in [None, 'off'] and not self.mdstack):
+ # fall back to default behavior
+ attrs.pop('markdown', None)
+ super().handle_starttag(tag, attrs)
+ else:
+ if 'p' in self.mdstack and tag in block_level_tags:
+ # Close unclosed 'p' tag
+ self.handle_endtag('p')
+ self.mdstate.append(state)
+ self.mdstack.append(tag)
+ attrs['markdown'] = state
+ self.treebuilder.start(tag, attrs)
+ else:
+ # Span level tag
+ if self.inraw:
+ super().handle_starttag(tag, attrs)
+ else:
+ text = self.get_starttag_text()
+ self.handle_data(text)
+
+ def handle_endtag(self, tag):
+ if tag in block_level_tags:
+ if self.inraw:
+ super().handle_endtag(tag)
+ elif tag in self.mdstack:
+ # Close element and any unclosed children
+ while self.mdstack:
+ item = self.mdstack.pop()
+ self.mdstate.pop()
+ self.treebuilder.end(item)
+ if item == tag:
+ break
+ if not self.mdstack:
+ # Last item in stack is closed. Stash it
+ element = self.get_element()
+ self.cleandoc.append(self.md.htmlStash.store(element))
+ self.cleandoc.append('\n\n')
+ self.state = []
+ else:
+ # Treat orphan closing tag as a span level tag.
+ text = self.get_endtag_text(tag)
+ self.handle_data(text)
+ else:
+ # Span level tag
+ if self.inraw:
+ super().handle_endtag(tag)
+ else:
+ text = self.get_endtag_text(tag)
+ self.handle_data(text)
+
+ def handle_data(self, data):
+ if self.inraw or not self.mdstack:
+ super().handle_data(data)
+ else:
+ self.treebuilder.data(data)
+
+ def handle_empty_tag(self, data, is_block):
+ if self.inraw or not self.mdstack:
+ super().handle_empty_tag(data, is_block)
+ else:
+ if self.at_line_start() and is_block:
+ self.handle_data('\n' + self.md.htmlStash.store(data) + '\n\n')
+ else:
+ self.handle_data(data)
+
+
+class HtmlBlockPreprocessor(Preprocessor):
+ """Remove html blocks from the text and store them for later retrieval."""
+
+ def run(self, lines):
+ source = '\n'.join(lines)
+ parser = HTMLExtractorExtra(self.md)
+ parser.feed(source)
+ parser.close()
+ return ''.join(parser.cleandoc).split('\n')
+
+
class MarkdownInHtmlProcessor(BlockProcessor):
- """Process Markdown Inside HTML Blocks."""
+ """Process Markdown Inside HTML Blocks which have been stored in the HtmlStash."""
+
def test(self, parent, block):
- return block == util.TAG_PLACEHOLDER % \
- str(self.parser.blockprocessors.tag_counter + 1)
-
- def _process_nests(self, element, block):
- """Process the element's child elements in self.run."""
- # Build list of indexes of each nest within the parent element.
- nest_index = [] # a list of tuples: (left index, right index)
- i = self.parser.blockprocessors.tag_counter + 1
- while len(self._tag_data) > i and self._tag_data[i]['left_index']:
- left_child_index = self._tag_data[i]['left_index']
- right_child_index = self._tag_data[i]['right_index']
- nest_index.append((left_child_index - 1, right_child_index))
- i += 1
-
- # Create each nest subelement.
- for i, (left_index, right_index) in enumerate(nest_index[:-1]):
- self.run(element, block[left_index:right_index],
- block[right_index:nest_index[i + 1][0]], True)
- self.run(element, block[nest_index[-1][0]:nest_index[-1][1]], # last
- block[nest_index[-1][1]:], True) # nest
-
- def run(self, parent, blocks, tail=None, nest=False):
- self._tag_data = self.parser.md.htmlStash.tag_data
-
- self.parser.blockprocessors.tag_counter += 1
- tag = self._tag_data[self.parser.blockprocessors.tag_counter]
-
- # Create Element
- markdown_value = tag['attrs'].pop('markdown')
- element = etree.SubElement(parent, tag['tag'], tag['attrs'])
-
- # Slice Off Block
- if nest:
- self.parser.parseBlocks(parent, tail) # Process Tail
- block = blocks[1:]
- else: # includes nests since a third level of nesting isn't supported
- block = blocks[tag['left_index'] + 1: tag['right_index']]
- del blocks[:tag['right_index']]
-
- # Process Text
- if (self.parser.blockprocessors.contain_span_tags.match( # Span Mode
- tag['tag']) and markdown_value != 'block') or \
- markdown_value == 'span':
- element.text = '\n'.join(block)
- else: # Block Mode
- i = self.parser.blockprocessors.tag_counter + 1
- if len(self._tag_data) > i and self._tag_data[i]['left_index']:
- first_subelement_index = self._tag_data[i]['left_index'] - 1
- self.parser.parseBlocks(
- element, block[:first_subelement_index])
- if not nest:
- block = self._process_nests(element, block)
- else:
- self.parser.parseBlocks(element, block)
+ # Always return True. `run` will return `False` if not a valid match.
+ return True
+
+ def parse_element_content(self, element):
+ """
+ Recursively parse the text content of an etree Element as Markdown.
+
+ Any block level elements generated from the Markdown will be inserted as children of the element in place
+ of the text content. All `markdown` attributes are removed. For any elements in which Markdown parsing has
+ been disabled, the text content of it and its children is wrapped in an `AtomicString`.
+ """
+
+ md_attr = element.attrib.pop('markdown', 'off')
+
+ if md_attr == 'block':
+ # Parse content as block level
+ # The order in which the different parts are parsed (text, children, tails) is important here as the
+ # order of elements needs to be preserved. We can't insert items at a later point in the current
+ # iteration, as we don't want to do raw processing on elements created from parsing Markdown text (for
+ # example). Therefore, the order of operations is children, tails, text.
+
+ # Recursively parse existing children from raw HTML
+ for child in list(element):
+ self.parse_element_content(child)
+
+ # Parse Markdown text in tail of children. Do this separately to avoid raw HTML parsing.
+ # Save the position of each item to be inserted later in reverse.
+ tails = []
+ for pos, child in enumerate(element):
+ if child.tail:
+ block = child.tail.rstrip('\n')
+ child.tail = ''
+ # Use a dummy placeholder element.
+ dummy = etree.Element('div')
+ self.parser.parseBlocks(dummy, block.split('\n\n'))
+ children = list(dummy)
+ children.reverse()
+ tails.append((pos + 1, children))
+
+ # Insert the elements created from the tails in reverse.
+ tails.reverse()
+ for pos, tail in tails:
+ for item in tail:
+ element.insert(pos, item)
+
+ # Parse Markdown text content. Do this last to avoid raw HTML parsing.
+ if element.text:
+ block = element.text.rstrip('\n')
+ element.text = ''
+ # Use a dummy placeholder element as the content needs to get inserted before existing children.
+ dummy = etree.Element('div')
+ self.parser.parseBlocks(dummy, block.split('\n\n'))
+ children = list(dummy)
+ children.reverse()
+ for child in children:
+ element.insert(0, child)
+
+ elif md_attr == 'span':
+ # Span level parsing will be handled by inlineprocessors.
+ # Walk children here to remove any `markdown` attributes.
+ for child in list(element):
+ self.parse_element_content(child)
+
+ else:
+ # Disable inline parsing for everything else
+ element.text = util.AtomicString(element.text)
+ for child in list(element):
+ self.parse_element_content(child)
+ if child.tail:
+ child.tail = util.AtomicString(child.tail)
+
+ def run(self, parent, blocks):
+ m = util.HTML_PLACEHOLDER_RE.match(blocks[0])
+ if m:
+ index = int(m.group(1))
+ element = self.parser.md.htmlStash.rawHtmlBlocks[index]
+ if isinstance(element, etree.Element):
+ # We have a matched element. Process it.
+ blocks.pop(0)
+ self.parse_element_content(element)
+ parent.append(element)
+ # Clean up stash. Replace element with empty string to avoid confusing the postprocessor.
+ self.parser.md.htmlStash.rawHtmlBlocks.pop(index)
+ self.parser.md.htmlStash.rawHtmlBlocks.insert(index, '')
+ # Confirm the match to the blockparser.
+ return True
+ # No match found.
+ return False
class MarkdownInHtmlExtension(Extension):
@@ -86,14 +269,12 @@ class MarkdownInHtmlExtension(Extension):
def extendMarkdown(self, md):
""" Register extension instances. """
- # Turn on processing of markdown text within raw html
- md.preprocessors['html_block'].markdown_in_raw = True
+ # Replace raw HTML preprocessor
+ md.preprocessors.register(HtmlBlockPreprocessor(md), 'html_block', 20)
+ # Add blockprocessor which handles the placeholders for etree elements
md.parser.blockprocessors.register(
MarkdownInHtmlProcessor(md.parser), 'markdown_block', 105
)
- md.parser.blockprocessors.tag_counter = -1
- md.parser.blockprocessors.contain_span_tags = re.compile(
- r'^(p|h[1-6]|li|dd|dt|td|th|legend|address)$', re.IGNORECASE)
def makeExtension(**kwargs): # pragma: no cover
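The tail handling in `parse_element_content` above parses each child's tail into a throwaway `div`, records the index where the results belong, and then inserts them from the last position backwards so that earlier insertion indexes stay valid. A minimal standalone sketch of that ordering trick with `xml.etree.ElementTree`, assuming a hypothetical `parse_blocks` stand-in for `BlockParser.parseBlocks` that simply wraps each blank-line-separated chunk in a `<p>`:

```python
import xml.etree.ElementTree as etree


def parse_blocks(parent, blocks):
    """Hypothetical stand-in for BlockParser.parseBlocks: one <p> per block."""
    for block in blocks:
        if block.strip():
            etree.SubElement(parent, 'p').text = block.strip()


def expand_tails(element):
    """Replace each child's tail text with parsed elements, preserving order."""
    tails = []
    for pos, child in enumerate(element):
        if child.tail:
            block = child.tail.rstrip('\n')
            child.tail = ''
            # Parse into a dummy parent so nothing is inserted mid-iteration.
            dummy = etree.Element('div')
            parse_blocks(dummy, block.split('\n\n'))
            children = list(dummy)
            children.reverse()
            tails.append((pos + 1, children))
    # Insert from the highest position down so lower indexes stay valid;
    # each reversed list lands in its original order at the same index.
    tails.reverse()
    for pos, children in tails:
        for item in children:
            element.insert(pos, item)


root = etree.fromstring('<div><hr/>one\n\ntwo<hr/>three</div>')
expand_tails(root)
print(etree.tostring(root, encoding='unicode'))
# <div><hr /><p>one</p><p>two</p><hr /><p>three</p></div>
```

Inserting in forward order instead would shift every later position after the first insertion, which is why the diff collects `(pos + 1, children)` pairs first and applies them in reverse.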
diff --git a/markdown/htmlparser.py b/markdown/htmlparser.py
new file mode 100644
index 0000000..f83ddea
--- /dev/null
+++ b/markdown/htmlparser.py
@@ -0,0 +1,202 @@
+"""
+Python Markdown
+
+A Python implementation of John Gruber's Markdown.
+
+Documentation: https://python-markdown.github.io/
+GitHub: https://github.com/Python-Markdown/markdown/
+PyPI: https://pypi.org/project/Markdown/
+
+Started by Manfred Stienstra (http://www.dwerg.net/).
+Maintained for a few years by Yuri Takhteyev (http://www.freewisdom.org).
+Currently maintained by Waylan Limberg (https://github.com/waylan),
+Dmitry Shachnev (https://github.com/mitya57) and Isaac Muse (https://github.com/facelessuser).
+
+Copyright 2007-2020 The Python Markdown Project (v. 1.7 and later)
+Copyright 2004, 2005, 2006 Yuri Takhteyev (v. 0.2-1.6b)
+Copyright 2004 Manfred Stienstra (the original version)
+
+License: BSD (see LICENSE.md for details).
+"""
+
+import re
+import importlib.util
+import sys
+
+
+# Import a copy of the html.parser lib as `htmlparser` so we can monkeypatch it.
+# Users can still do `from html import parser` and get the default behavior.
+spec = importlib.util.find_spec('html.parser')
+htmlparser = importlib.util.module_from_spec(spec)
+spec.loader.exec_module(htmlparser)
+sys.modules['htmlparser'] = htmlparser
+
+# Monkeypatch HTMLParser to only accept `?>` to close Processing Instructions.
+htmlparser.piclose = re.compile(r'\?>')
+# Monkeypatch HTMLParser to only recognize entity references with a closing semicolon.
+htmlparser.entityref = re.compile(r'&([a-zA-Z][-.a-zA-Z0-9]*);')
+# Monkeypatch HTMLParser to no longer support partial entities. We always feed a complete block,
+# so the 'incomplete' functionality is unnecessary. As the entityref regex is run right before incomplete,
+# and the two regexes are identical, incomplete will simply never match and we avoid the logic within.
+htmlparser.incomplete = htmlparser.entityref
+
+# Match a blank line at the start of a block of text (two newlines).
+# The newlines may be preceded by additional whitespace.
+blank_line_re = re.compile(r'^([ ]*\n){2}')
+
+
+class HTMLExtractor(htmlparser.HTMLParser):
+ """
+ Extract raw HTML from text.
+
+ The raw HTML is stored in the `htmlStash` of the Markdown instance passed
+ to `md` and the remaining text is stored in `cleandoc` as a list of strings.
+ """
+
+ def __init__(self, md, *args, **kwargs):
+ if 'convert_charrefs' not in kwargs:
+ kwargs['convert_charrefs'] = False
+ # This calls self.reset
+ super().__init__(*args, **kwargs)
+ self.md = md
+
+ def reset(self):
+ """Reset this instance. Loses all unprocessed data."""
+ self.inraw = False
+ self.intail = False
+ self.stack = [] # When inraw==True, stack contains a list of tags
+ self._cache = []
+ self.cleandoc = []
+ super().reset()
+
+ def close(self):
+ """Handle any buffered data."""
+ super().close()
+ # Handle any unclosed tags.
+ if len(self._cache):
+ self.cleandoc.append(self.md.htmlStash.store(''.join(self._cache)))
+ self._cache = []
+
+ @property
+ def line_offset(self):
+ """Returns char index in self.rawdata for the start of the current line. """
+ if self.lineno > 1:
+ return re.match(r'([^\n]*\n){{{}}}'.format(self.lineno-1), self.rawdata).end()
+ return 0
+
+ def at_line_start(self):
+ """
+ Returns True if current position is at start of line.
+
+ Allows for up to three blank spaces at start of line.
+ """
+ if self.offset == 0:
+ return True
+ if self.offset > 3:
+ return False
+ # Confirm up to first 3 chars are whitespace
+ return self.rawdata[self.line_offset:self.line_offset + self.offset].strip() == ''
+
+ def get_endtag_text(self, tag):
+ """
+ Returns the text of the end tag.
+
+ If it fails to extract the actual text from the raw data, it builds a closing tag with `tag`.
+ """
+ # Attempt to extract actual tag from raw source text
+ start = self.line_offset + self.offset
+ m = htmlparser.endendtag.search(self.rawdata, start)
+ if m:
+ return self.rawdata[start:m.end()]
+ else: # pragma: no cover
+ # Failed to extract from raw data. Assume well formed and lowercase.
+ return '</{}>'.format(tag)
+
+ def handle_starttag(self, tag, attrs):
+ if self.md.is_block_level(tag) and (self.intail or (self.at_line_start() and not self.inraw)):
+ # Started a new raw block. Prepare stack.
+ self.inraw = True
+ self.cleandoc.append('\n')
+
+ text = self.get_starttag_text()
+ if self.inraw:
+ self.stack.append(tag)
+ self._cache.append(text)
+ else:
+ self.cleandoc.append(text)
+
+ def handle_endtag(self, tag):
+ text = self.get_endtag_text(tag)
+
+ if self.inraw:
+ self._cache.append(text)
+ if tag in self.stack:
+ # Remove tag from stack
+ while self.stack:
+ if self.stack.pop() == tag:
+ break
+ if len(self.stack) == 0:
+ # End of raw block.
+ if blank_line_re.match(self.rawdata[self.line_offset + self.offset + len(text):]):
+ # Preserve blank line at end of raw block.
+ self._cache.append('\n')
+ else:
+ # More content exists after endtag.
+ self.intail = True
+ # Reset stack.
+ self.inraw = False
+ self.cleandoc.append(self.md.htmlStash.store(''.join(self._cache)))
+ # Insert blank line between this and next line.
+ self.cleandoc.append('\n\n')
+ self._cache = []
+ else:
+ self.cleandoc.append(text)
+
+ def handle_data(self, data):
+ if self.intail and '\n' in data:
+ self.intail = False
+ if self.inraw:
+ self._cache.append(data)
+ else:
+ self.cleandoc.append(data)
+
+ def handle_empty_tag(self, data, is_block):
+ """ Handle empty tags (`<data>`). """
+ if self.inraw or self.intail:
+ # Append this to the existing raw block
+ self._cache.append(data)
+ elif self.at_line_start() and is_block:
+ # Handle this as a standalone raw block
+ if blank_line_re.match(self.rawdata[self.line_offset + self.offset + len(data):]):
+ # Preserve blank line after tag in raw block.
+ data += '\n'
+ else:
+ # More content exists after tag.
+ self.intail = True
+ self.cleandoc.append(self.md.htmlStash.store(data))
+ # Insert blank line between this and next line.
+ self.cleandoc.append('\n\n')
+ else:
+ self.cleandoc.append(data)
+
+ def handle_startendtag(self, tag, attrs):
+ self.handle_empty_tag(self.get_starttag_text(), is_block=self.md.is_block_level(tag))
+
+ def handle_charref(self, name):
+ self.handle_empty_tag('&#{};'.format(name), is_block=False)
+
+ def handle_entityref(self, name):
+ self.handle_empty_tag('&{};'.format(name), is_block=False)
+
+ def handle_comment(self, data):
+ self.handle_empty_tag('<!--{}-->'.format(data), is_block=True)
+
+ def handle_decl(self, data):
+ self.handle_empty_tag('<!{}>'.format(data), is_block=True)
+
+ def handle_pi(self, data):
+ self.handle_empty_tag('<?{}?>'.format(data), is_block=True)
+
+ def unknown_decl(self, data):
+ end = ']]>' if data.startswith('CDATA[') else ']>'
+ self.handle_empty_tag('<![{}{}'.format(data, end), is_block=True)
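`htmlparser.py` executes a private copy of `html.parser` before patching its module-level regexes, so code elsewhere that does `from html import parser` keeps the stock behavior. Patching the module attributes takes effect because `HTMLParser.goahead` looks up `entityref` and `incomplete` as module globals at call time. A minimal sketch of the isolation, using the same `importlib.util` calls as the diff (the helper name `load_private_copy` is hypothetical):

```python
import html.parser
import importlib.util
import re


def load_private_copy(name):
    """Execute a fresh, independent copy of the module `name` (hypothetical helper)."""
    spec = importlib.util.find_spec(name)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


htmlparser = load_private_copy('html.parser')
# Patch only the private copy: entity references must end with ';'.
htmlparser.entityref = re.compile(r'&([a-zA-Z][-.a-zA-Z0-9]*);')

print(htmlparser.entityref.match('&amp;') is not None)    # True
print(htmlparser.entityref.match('&T ') is None)          # True
print(html.parser.entityref is not htmlparser.entityref)  # True: stdlib copy untouched
```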
diff --git a/markdown/postprocessors.py b/markdown/postprocessors.py
index 95b85cd..cd32687 100644
--- a/markdown/postprocessors.py
+++ b/markdown/postprocessors.py
@@ -71,9 +71,8 @@ class RawHtmlPostprocessor(Postprocessor):
for i in range(self.md.htmlStash.html_counter):
html = self.md.htmlStash.rawHtmlBlocks[i]
if self.isblocklevel(html):
- replacements["<p>%s</p>" %
- (self.md.htmlStash.get_placeholder(i))] = \
- html + "\n"
+ replacements["<p>{}</p>".format(
+ self.md.htmlStash.get_placeholder(i))] = html
replacements[self.md.htmlStash.get_placeholder(i)] = html
if replacements:
diff --git a/markdown/preprocessors.py b/markdown/preprocessors.py
index f12a02a..e1023c5 100644
--- a/markdown/preprocessors.py
+++ b/markdown/preprocessors.py
@@ -26,6 +26,7 @@ complicated.
"""
from . import util
+from .htmlparser import HTMLExtractor
import re
@@ -34,7 +35,6 @@ def build_preprocessors(md, **kwargs):
preprocessors = util.Registry()
preprocessors.register(NormalizeWhitespace(md), 'normalize_whitespace', 30)
preprocessors.register(HtmlBlockPreprocessor(md), 'html_block', 20)
- preprocessors.register(ReferencePreprocessor(md), 'reference', 10)
return preprocessors
@@ -74,297 +74,9 @@ class NormalizeWhitespace(Preprocessor):
class HtmlBlockPreprocessor(Preprocessor):
"""Remove html blocks from the text and store them for later retrieval."""
- right_tag_patterns = ["</%s>", "%s>"]
- attrs_pattern = r"""
- \s+(?P<attr>[^>"'/= ]+)=(?P<q>['"])(?P<value>.*?)(?P=q) # attr="value"
- | # OR
- \s+(?P<attr1>[^>"'/= ]+)=(?P<value1>[^> ]+) # attr=value
- | # OR
- \s+(?P<attr2>[^>"'/= ]+) # attr
- """
- left_tag_pattern = r'^\<(?P<tag>[^> ]+)(?P<attrs>(%s)*)\s*\/?\>?' % \
- attrs_pattern
- attrs_re = re.compile(attrs_pattern, re.VERBOSE)
- left_tag_re = re.compile(left_tag_pattern, re.VERBOSE)
- markdown_in_raw = False
-
- def _get_left_tag(self, block):
- m = self.left_tag_re.match(block)
- if m:
- tag = m.group('tag')
- raw_attrs = m.group('attrs')
- attrs = {}
- if raw_attrs:
- for ma in self.attrs_re.finditer(raw_attrs):
- if ma.group('attr'):
- if ma.group('value'):
- attrs[ma.group('attr').strip()] = ma.group('value')
- else:
- attrs[ma.group('attr').strip()] = ""
- elif ma.group('attr1'):
- if ma.group('value1'):
- attrs[ma.group('attr1').strip()] = ma.group(
- 'value1'
- )
- else:
- attrs[ma.group('attr1').strip()] = ""
- elif ma.group('attr2'):
- attrs[ma.group('attr2').strip()] = ""
- return tag, len(m.group(0)), attrs
- else:
- tag = block[1:].split(">", 1)[0].lower()
- return tag, len(tag)+2, {}
-
- def _recursive_tagfind(self, ltag, rtag, start_index, block):
- while 1:
- i = block.find(rtag, start_index)
- if i == -1:
- return -1
- j = block.find(ltag, start_index)
- # if no ltag, or rtag found before another ltag, return index
- if (j > i or j == -1):
- return i + len(rtag)
- # another ltag found before rtag, use end of ltag as starting
- # point and search again
- j = block.find('>', j)
- start_index = self._recursive_tagfind(ltag, rtag, j + 1, block)
- if start_index == -1:
- # HTML potentially malformed- ltag has no corresponding
- # rtag
- return -1
-
- def _get_right_tag(self, left_tag, left_index, block):
- for p in self.right_tag_patterns:
- tag = p % left_tag
- i = self._recursive_tagfind(
- "<%s" % left_tag, tag, left_index, block
- )
- if i > 2:
- return tag.lstrip("<").rstrip(">"), i
- return block.rstrip()[-left_index:-1].lower(), len(block)
-
- def _equal_tags(self, left_tag, right_tag):
- if left_tag[0] in ['?', '@', '%']: # handle PHP, etc.
- return True
- if ("/" + left_tag) == right_tag:
- return True
- if (right_tag == "--" and left_tag == "--"):
- return True
- elif left_tag == right_tag[1:] and right_tag[0] == "/":
- return True
- else:
- return False
-
- def _is_oneliner(self, tag):
- return (tag in ['hr', 'hr/'])
-
- def _stringindex_to_listindex(self, stringindex, items):
- """
- Same effect as concatenating the strings in items,
- finding the character to which stringindex refers in that string,
- and returning the index of the item in which that character resides.
- """
- items.append('dummy')
- i, count = 0, 0
- while count <= stringindex:
- count += len(items[i])
- i += 1
- return i - 1
-
- def _nested_markdown_in_html(self, items):
- """Find and process html child elements of the given element block."""
- for i, item in enumerate(items):
- if self.left_tag_re.match(item):
- left_tag, left_index, attrs = \
- self._get_left_tag(''.join(items[i:]))
- right_tag, data_index = self._get_right_tag(
- left_tag, left_index, ''.join(items[i:]))
- right_listindex = \
- self._stringindex_to_listindex(data_index, items[i:]) + i
- if 'markdown' in attrs.keys():
- items[i] = items[i][left_index:] # remove opening tag
- placeholder = self.md.htmlStash.store_tag(
- left_tag, attrs, i + 1, right_listindex + 1)
- items.insert(i, placeholder)
- if len(items) - right_listindex <= 1: # last nest, no tail
- right_listindex -= 1
- items[right_listindex] = items[right_listindex][
- :-len(right_tag) - 2] # remove closing tag
- else: # raw html
- if len(items) - right_listindex <= 1: # last element
- right_listindex -= 1
- if right_listindex <= i:
- right_listindex = i + 1
- placeholder = self.md.htmlStash.store('\n\n'.join(
- items[i:right_listindex]))
- del items[i:right_listindex]
- items.insert(i, placeholder)
- return items
-
- def run(self, lines):
- text = "\n".join(lines)
- new_blocks = []
- text = text.rsplit("\n\n")
- items = []
- left_tag = ''
- right_tag = ''
- in_tag = False # flag
-
- while text:
- block = text[0]
- if block.startswith("\n"):
- block = block[1:]
- text = text[1:]
-
- if block.startswith("\n"):
- block = block[1:]
-
- if not in_tag:
- if block.startswith("<") and len(block.strip()) > 1:
-
- if block[1:4] == "!--":
- # is a comment block
- left_tag, left_index, attrs = "--", 2, {}
- else:
- left_tag, left_index, attrs = self._get_left_tag(block)
- right_tag, data_index = self._get_right_tag(left_tag,
- left_index,
- block)
- # keep checking conditions below and maybe just append
-
- if data_index < len(block) and (self.md.is_block_level(left_tag) or left_tag == '--'):
- text.insert(0, block[data_index:])
- block = block[:data_index]
-
- if not (self.md.is_block_level(left_tag) or block[1] in ["!", "?", "@", "%"]):
- new_blocks.append(block)
- continue
-
- if self._is_oneliner(left_tag):
- new_blocks.append(block.strip())
- continue
-
- if block.rstrip().endswith(">") \
- and self._equal_tags(left_tag, right_tag):
- if self.markdown_in_raw and 'markdown' in attrs.keys():
- block = block[left_index:-len(right_tag) - 2]
- new_blocks.append(self.md.htmlStash.
- store_tag(left_tag, attrs, 0, 2))
- new_blocks.extend([block])
- else:
- new_blocks.append(
- self.md.htmlStash.store(block.strip()))
- continue
- else:
- # if is block level tag and is not complete
- if (not self._equal_tags(left_tag, right_tag)) and \
- (self.md.is_block_level(left_tag) or left_tag == "--"):
- items.append(block.strip())
- in_tag = True
- else:
- new_blocks.append(
- self.md.htmlStash.store(block.strip())
- )
- continue
-
- else:
- new_blocks.append(block)
-
- else:
- items.append(block)
-
- # Need to evaluate all items so we can calculate relative to the left index.
- right_tag, data_index = self._get_right_tag(left_tag, left_index, ''.join(items))
- # Adjust data_index: relative to items -> relative to last block
- prev_block_length = 0
- for item in items[:-1]:
- prev_block_length += len(item)
- data_index -= prev_block_length
-
- if self._equal_tags(left_tag, right_tag):
- # if find closing tag
-
- if data_index < len(block):
- # we have more text after right_tag
- items[-1] = block[:data_index]
- text.insert(0, block[data_index:])
-
- in_tag = False
- if self.markdown_in_raw and 'markdown' in attrs.keys():
- items[0] = items[0][left_index:]
- items[-1] = items[-1][:-len(right_tag) - 2]
- if items[len(items) - 1]: # not a newline/empty string
- right_index = len(items) + 3
- else:
- right_index = len(items) + 2
- new_blocks.append(self.md.htmlStash.store_tag(
- left_tag, attrs, 0, right_index))
- placeholderslen = len(self.md.htmlStash.tag_data)
- new_blocks.extend(
- self._nested_markdown_in_html(items))
- nests = len(self.md.htmlStash.tag_data) - \
- placeholderslen
- self.md.htmlStash.tag_data[-1 - nests][
- 'right_index'] += nests - 2
- else:
- new_blocks.append(
- self.md.htmlStash.store('\n\n'.join(items)))
- items = []
-
- if items:
- if self.markdown_in_raw and 'markdown' in attrs.keys():
- items[0] = items[0][left_index:]
- items[-1] = items[-1][:-len(right_tag) - 2]
- if items[len(items) - 1]: # not a newline/empty string
- right_index = len(items) + 3
- else:
- right_index = len(items) + 2
- new_blocks.append(
- self.md.htmlStash.store_tag(
- left_tag, attrs, 0, right_index))
- placeholderslen = len(self.md.htmlStash.tag_data)
- new_blocks.extend(self._nested_markdown_in_html(items))
- nests = len(self.md.htmlStash.tag_data) - placeholderslen
- self.md.htmlStash.tag_data[-1 - nests][
- 'right_index'] += nests - 2
- else:
- new_blocks.append(
- self.md.htmlStash.store('\n\n'.join(items)))
- new_blocks.append('\n')
-
- new_text = "\n\n".join(new_blocks)
- return new_text.split("\n")
-
-
-class ReferencePreprocessor(Preprocessor):
- """ Remove reference definitions from text and store for later use. """
-
- TITLE = r'[ ]*(\"(.*)\"|\'(.*)\'|\((.*)\))[ ]*'
- RE = re.compile(
- r'^[ ]{0,3}\[([^\]]*)\]:\s*([^ ]*)[ ]*(%s)?$' % TITLE, re.DOTALL
- )
- TITLE_RE = re.compile(r'^%s$' % TITLE)
-
def run(self, lines):
- new_text = []
- while lines:
- line = lines.pop(0)
- m = self.RE.match(line)
- if m:
- id = m.group(1).strip().lower()
- link = m.group(2).lstrip('<').rstrip('>')
- t = m.group(5) or m.group(6) or m.group(7)
- if not t:
- # Check next line for title
- tm = self.TITLE_RE.match(lines[0])
- if tm:
- lines.pop(0)
- t = tm.group(2) or tm.group(3) or tm.group(4)
- self.md.references[id] = (link, t)
- # Preserve the line to prevent raw HTML indexing issue.
- # https://github.com/Python-Markdown/markdown/issues/584
- new_text.append('')
- else:
- new_text.append(line)
-
- return new_text # + "\n"
+ source = '\n'.join(lines)
+ parser = HTMLExtractor(self.md)
+ parser.feed(source)
+ parser.close()
+ return ''.join(parser.cleandoc).split('\n')
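The new `HtmlBlockPreprocessor.run` just feeds the whole document to `HTMLExtractor` and splits `cleandoc` back into lines; the parser callbacks do the real work of stashing each top-level raw HTML block and leaving a placeholder in its place. A deliberately simplified sketch of that callback flow on the stock `html.parser.HTMLParser`, assuming a small hypothetical set of block tags and a hypothetical `STASH:n` placeholder in place of Markdown's `htmlStash` (no tail, comment, or empty-tag handling):

```python
from html.parser import HTMLParser

# Assumed subset of block-level tags; void tags such as <hr> are left out.
BLOCK_TAGS = {'div', 'p', 'pre', 'table', 'ul', 'ol', 'blockquote'}


class MiniExtractor(HTMLParser):
    """Hypothetical, simplified HTMLExtractor: stash top-level block HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.stash = []     # raw HTML blocks (stands in for htmlStash)
        self.cleandoc = []  # remaining text with placeholders
        self.stack = []     # open tags while inside a raw block
        self.cache = []     # pieces of the raw block being collected

    def handle_starttag(self, tag, attrs):
        text = self.get_starttag_text()
        # self.offset is still the column of '<' while this callback runs.
        if self.stack or (tag in BLOCK_TAGS and self.offset == 0):
            self.stack.append(tag)
            self.cache.append(text)
        else:
            self.cleandoc.append(text)

    def handle_endtag(self, tag):
        text = '</{}>'.format(tag)
        if not self.stack:
            self.cleandoc.append(text)
            return
        self.cache.append(text)
        if tag in self.stack:
            # Close the tag and any unclosed children.
            while self.stack.pop() != tag:
                pass
        if not self.stack:
            # Outermost block closed: stash it and leave a placeholder.
            self.stash.append(''.join(self.cache))
            self.cleandoc.append('STASH:{}'.format(len(self.stash) - 1))
            self.cache = []

    def handle_data(self, data):
        (self.cache if self.stack else self.cleandoc).append(data)


parser = MiniExtractor()
parser.feed('Some *text*\n\n<div>\n<p>raw</p>\n</div>\n\nMore text\n')
parser.close()
print(repr(''.join(parser.cleandoc)))  # 'Some *text*\n\nSTASH:0\n\nMore text\n'
print(parser.stash)                    # ['<div>\n<p>raw</p>\n</div>']
```

Because `HTMLParser` tracks nesting for us, the hand-written tag matching removed from the old preprocessor (`_recursive_tagfind`, `_equal_tags`, and friends) is no longer needed.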
diff --git a/tests/basic/inline-html-advanced.html b/tests/basic/inline-html-advanced.html
deleted file mode 100644
index af1dec1..0000000
--- a/tests/basic/inline-html-advanced.html
+++ /dev/null
@@ -1,12 +0,0 @@
-<p>Simple block on one line:</p>
-<div>foo</div>
-
-<p>And nested without indentation:</p>
-<div>
-<div>
-<div>
-foo
-</div>
-</div>
-<div>bar</div>
-</div> \ No newline at end of file
diff --git a/tests/basic/inline-html-advanced.txt b/tests/basic/inline-html-advanced.txt
deleted file mode 100644
index 9d71ddc..0000000
--- a/tests/basic/inline-html-advanced.txt
+++ /dev/null
@@ -1,14 +0,0 @@
-Simple block on one line:
-
-<div>foo</div>
-
-And nested without indentation:
-
-<div>
-<div>
-<div>
-foo
-</div>
-</div>
-<div>bar</div>
-</div>
diff --git a/tests/basic/inline-html-comments.html b/tests/basic/inline-html-comments.html
deleted file mode 100644
index 0d4cad9..0000000
--- a/tests/basic/inline-html-comments.html
+++ /dev/null
@@ -1,11 +0,0 @@
-<p>Paragraph one.</p>
-<!-- This is a simple comment -->
-
-<!--
- This is another comment.
--->
-
-<p>Paragraph two.</p>
-<!-- one comment block -- -- with two comments -->
-
-<p>The end.</p> \ No newline at end of file
diff --git a/tests/basic/inline-html-comments.txt b/tests/basic/inline-html-comments.txt
deleted file mode 100644
index 41d830d..0000000
--- a/tests/basic/inline-html-comments.txt
+++ /dev/null
@@ -1,13 +0,0 @@
-Paragraph one.
-
-<!-- This is a simple comment -->
-
-<!--
- This is another comment.
--->
-
-Paragraph two.
-
-<!-- one comment block -- -- with two comments -->
-
-The end.
diff --git a/tests/basic/inline-html-simple.html b/tests/basic/inline-html-simple.html
deleted file mode 100644
index 0f2633c..0000000
--- a/tests/basic/inline-html-simple.html
+++ /dev/null
@@ -1,61 +0,0 @@
-<p>Here's a simple block:</p>
-<div>
- foo
-</div>
-
-<p>This should be a code block, though:</p>
-<pre><code>&lt;div&gt;
- foo
-&lt;/div&gt;
-</code></pre>
-<p>As should this:</p>
-<pre><code>&lt;div&gt;foo&lt;/div&gt;
-</code></pre>
-<p>Now, nested:</p>
-<div>
- <div>
- <div>
- foo
- </div>
- </div>
-</div>
-
-<p>This should just be an HTML comment:</p>
-<!-- Comment -->
-
-<p>Multiline:</p>
-<!--
-Blah
-Blah
--->
-
-<p>Code block:</p>
-<pre><code>&lt;!-- Comment --&gt;
-</code></pre>
-<p>Just plain comment, with trailing spaces on the line:</p>
-<!-- foo -->
-
-<p>Code:</p>
-<pre><code>&lt;hr /&gt;
-</code></pre>
-<p>Hr's:</p>
-<hr>
-
-<hr/>
-
-<hr />
-
-<hr>
-
-<hr/>
-
-<hr />
-
-<hr class="foo" id="bar" />
-
-<hr class="foo" id="bar"/>
-
-<hr class="foo" id="bar" >
-
-<p><some <a href="http://example.com">weird</a> stuff></p>
-<p><some>&gt; &lt;<unbalanced>&gt; &lt;<brackets></p> \ No newline at end of file
diff --git a/tests/basic/inline-html-simple.txt b/tests/basic/inline-html-simple.txt
deleted file mode 100644
index 359aca4..0000000
--- a/tests/basic/inline-html-simple.txt
+++ /dev/null
@@ -1,72 +0,0 @@
-Here's a simple block:
-
-<div>
- foo
-</div>
-
-This should be a code block, though:
-
- <div>
- foo
- </div>
-
-As should this:
-
- <div>foo</div>
-
-Now, nested:
-
-<div>
- <div>
- <div>
- foo
- </div>
- </div>
-</div>
-
-This should just be an HTML comment:
-
-<!-- Comment -->
-
-Multiline:
-
-<!--
-Blah
-Blah
--->
-
-Code block:
-
- <!-- Comment -->
-
-Just plain comment, with trailing spaces on the line:
-
-<!-- foo -->
-
-Code:
-
- <hr />
-
-Hr's:
-
-<hr>
-
-<hr/>
-
-<hr />
-
-<hr>
-
-<hr/>
-
-<hr />
-
-<hr class="foo" id="bar" />
-
-<hr class="foo" id="bar"/>
-
-<hr class="foo" id="bar" >
-
-<some [weird](http://example.com) stuff>
-
-<some>> <<unbalanced>> <<brackets> \ No newline at end of file
diff --git a/tests/extensions/extra/abbr.html b/tests/extensions/extra/abbr.html
deleted file mode 100644
index 456524e..0000000
--- a/tests/extensions/extra/abbr.html
+++ /dev/null
@@ -1,4 +0,0 @@
-<p>An <abbr title="Abbreviation">ABBR</abbr>: "<abbr title="Reference">REF</abbr>".
-ref and REFERENCE should be ignored.</p>
-<p>The <abbr title="Hyper Text Markup Language">HTML</abbr> specification
-is maintained by the <abbr title="World Wide Web Consortium">W3C</abbr>.</p> \ No newline at end of file
diff --git a/tests/extensions/extra/abbr.txt b/tests/extensions/extra/abbr.txt
deleted file mode 100644
index 991bf15..0000000
--- a/tests/extensions/extra/abbr.txt
+++ /dev/null
@@ -1,13 +0,0 @@
-An ABBR: "REF".
-ref and REFERENCE should be ignored.
-
-*[REF]: Reference
-*[ABBR]: This gets overriden by the next one.
-*[ABBR]: Abbreviation
-
-The HTML specification
-is maintained by the W3C.
-
-*[HTML]: Hyper Text Markup Language
-*[W3C]: World Wide Web Consortium
-
diff --git a/tests/extensions/extra/raw-html.html b/tests/extensions/extra/raw-html.html
index ac367d7..ef94cb3 100644
--- a/tests/extensions/extra/raw-html.html
+++ b/tests/extensions/extra/raw-html.html
@@ -14,11 +14,13 @@
</div>
<p>The tail of the <code>DefaultBlockMode</code> subelement.</p>
<p name="DefaultSpanMode">
-This text <em>is not</em> wrapped in additional <code>p</code> tags.</p>
+This text <em>is not</em> wrapped in additional <code>p</code> tags.
+</p>
<p>The tail of the <code>DefaultSpanMode</code> subelement.</p>
<div name="SpanModeOverride">
This <code>div</code> block is not wrapped in paragraph tags.
-Note: Subelements are not required to have tail text.</div>
+Note: Subelements are not required to have tail text.
+</div>
<p name="BlockModeOverride">
<p>This <code>p</code> block <em>is</em> foolishly wrapped in further paragraph tags.</p>
</p>
@@ -26,7 +28,6 @@ Note: Subelements are not required to have tail text.</div>
<div name="RawHtml">
Raw html blocks may also be nested.
</div>
-
</div>
<p>This text is after the markdown in html.</p>
<div name="issue308">
@@ -38,14 +39,12 @@ Raw html blocks may also be nested.
<div name="RawHtml">
Raw html blocks may also be nested.
</div>
-
<p>Markdown is <em>still</em> active here.</p>
</div>
<p>Markdown is <em>active again</em> here.</p>
<div>
<p>foo bar</p>
-<p><em>bar</em>
-</p>
+<p><em>bar</em></p>
</div>
<div name="issue584">
<div>
diff --git a/tests/extensions/github_flavored.html b/tests/extensions/github_flavored.html
index b39165a..98dc82a 100644
--- a/tests/extensions/github_flavored.html
+++ b/tests/extensions/github_flavored.html
@@ -30,7 +30,6 @@
+ CONTEXT_DIFF_LINE_PATTERN,
+```
</code></pre>
-
<p>Test support for foo+bar lexer names.</p>
<pre><code class="language-html+jinja">&lt;title&gt;{% block title %}{% endblock %}&lt;/title&gt;
&lt;ul&gt;
diff --git a/tests/misc/ampersand.html b/tests/misc/ampersand.html
deleted file mode 100644
index 94ed80c..0000000
--- a/tests/misc/ampersand.html
+++ /dev/null
@@ -1,2 +0,0 @@
-<p>&amp;</p>
-<p>AT&amp;T</p> \ No newline at end of file
diff --git a/tests/misc/ampersand.txt b/tests/misc/ampersand.txt
deleted file mode 100644
index 367d32c..0000000
--- a/tests/misc/ampersand.txt
+++ /dev/null
@@ -1,5 +0,0 @@
-&
-
-AT&T
-
-
diff --git a/tests/misc/block_html5.html b/tests/misc/block_html5.html
deleted file mode 100644
index b7a2fd3..0000000
--- a/tests/misc/block_html5.html
+++ /dev/null
@@ -1,16 +0,0 @@
-<section>
- <header>
- <hgroup>
- <h1>Hello :-)</h1>
- </hgroup>
- </header>
- <figure>
- <img src="image.png" alt="" />
- <figcaption>Caption</figcaption>
- </figure>
- <footer>
- <p>Some footer</p>
- </footer>
-</section>
-
-<figure></figure> \ No newline at end of file
diff --git a/tests/misc/block_html5.txt b/tests/misc/block_html5.txt
deleted file mode 100644
index 2b24cad..0000000
--- a/tests/misc/block_html5.txt
+++ /dev/null
@@ -1,14 +0,0 @@
-<section>
- <header>
- <hgroup>
- <h1>Hello :-)</h1>
- </hgroup>
- </header>
- <figure>
- <img src="image.png" alt="" />
- <figcaption>Caption</figcaption>
- </figure>
- <footer>
- <p>Some footer</p>
- </footer>
-</section><figure></figure>
diff --git a/tests/misc/block_html_attr.html b/tests/misc/block_html_attr.html
deleted file mode 100644
index d1c9efc..0000000
--- a/tests/misc/block_html_attr.html
+++ /dev/null
@@ -1,27 +0,0 @@
-<blockquote>
-Raw HTML processing should not confuse this with the blockquote below
-</blockquote>
-
-<div id="current-content">
- <div id="primarycontent" class="hfeed">
- <div id="post-">
- <div class="page-head">
- <h2>Header2</h2>
- </div>
- <div class="entry-content">
- <h3>Header3</h3>
- <p>Paragraph</p>
- <h3>Header3</h3>
- <p>Paragraph</p>
- <blockquote>
- <p>Paragraph</p>
- </blockquote>
- <p>Paragraph</p>
- <p><a href="/somelink">linktext</a></p>
- </div>
- </div><!-- #post-ID -->
- <!-- add contact form here -->
- </div><!-- #primarycontent -->
-</div>
-
-<!-- #current-content --> \ No newline at end of file
diff --git a/tests/misc/block_html_attr.txt b/tests/misc/block_html_attr.txt
deleted file mode 100644
index b2603cc..0000000
--- a/tests/misc/block_html_attr.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-<blockquote>
-Raw HTML processing should not confuse this with the blockquote below
-</blockquote>
-<div id="current-content">
- <div id="primarycontent" class="hfeed">
- <div id="post-">
- <div class="page-head">
- <h2>Header2</h2>
- </div>
- <div class="entry-content">
- <h3>Header3</h3>
- <p>Paragraph</p>
- <h3>Header3</h3>
- <p>Paragraph</p>
- <blockquote>
- <p>Paragraph</p>
- </blockquote>
- <p>Paragraph</p>
- <p><a href="/somelink">linktext</a></p>
- </div>
- </div><!-- #post-ID -->
- <!-- add contact form here -->
- </div><!-- #primarycontent -->
-</div><!-- #current-content -->
diff --git a/tests/misc/block_html_simple.html b/tests/misc/block_html_simple.html
deleted file mode 100644
index dce68bc..0000000
--- a/tests/misc/block_html_simple.html
+++ /dev/null
@@ -1,10 +0,0 @@
-<p>foo</p>
-
-<ul>
-<li>
-<p>bar</p>
-</li>
-<li>
-<p>baz</p>
-</li>
-</ul> \ No newline at end of file
diff --git a/tests/misc/block_html_simple.txt b/tests/misc/block_html_simple.txt
deleted file mode 100644
index d108c50..0000000
--- a/tests/misc/block_html_simple.txt
+++ /dev/null
@@ -1,9 +0,0 @@
-<p>foo</p>
-<ul>
-<li>
-<p>bar</p>
-</li>
-<li>
-<p>baz</p>
-</li>
-</ul>
diff --git a/tests/misc/comments.html b/tests/misc/comments.html
deleted file mode 100644
index 2240ab9..0000000
--- a/tests/misc/comments.html
+++ /dev/null
@@ -1,9 +0,0 @@
-<p>X&lt;0</p>
-<p>X&gt;0</p>
-<!-- A comment -->
-
-<div>as if</div>
-
-<!-- comment -->
-
-<p><strong>no blank line</strong></p> \ No newline at end of file
diff --git a/tests/misc/comments.txt b/tests/misc/comments.txt
deleted file mode 100644
index d9186f0..0000000
--- a/tests/misc/comments.txt
+++ /dev/null
@@ -1,10 +0,0 @@
-X<0
-
-X>0
-
-<!-- A comment -->
-
-<div>as if</div>
-
-<!-- comment -->
-__no blank line__
diff --git a/tests/misc/div.html b/tests/misc/div.html
deleted file mode 100644
index cb6a759..0000000
--- a/tests/misc/div.html
+++ /dev/null
@@ -1,10 +0,0 @@
-<div id="sidebar">
-
- _foo_
-
-</div>
-
-<p>And now in uppercase:</p>
-<DIV>
-foo
-</DIV> \ No newline at end of file
diff --git a/tests/misc/div.txt b/tests/misc/div.txt
deleted file mode 100644
index 4ff972e..0000000
--- a/tests/misc/div.txt
+++ /dev/null
@@ -1,11 +0,0 @@
-<div id="sidebar">
-
- _foo_
-
-</div>
-
-And now in uppercase:
-
-<DIV>
-foo
-</DIV>
diff --git a/tests/misc/html-comments.html b/tests/misc/html-comments.html
deleted file mode 100644
index 7b36246..0000000
--- a/tests/misc/html-comments.html
+++ /dev/null
@@ -1,2 +0,0 @@
-<p>Here is HTML <!-- **comment** -->
-and once more <p><!--comment--></p></p> \ No newline at end of file
diff --git a/tests/misc/html-comments.txt b/tests/misc/html-comments.txt
deleted file mode 100644
index cac4da5..0000000
--- a/tests/misc/html-comments.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-Here is HTML <!-- **comment** -->
-and once more <p><!--comment--></p>
diff --git a/tests/misc/html.html b/tests/misc/html.html
deleted file mode 100644
index 293e6cc..0000000
--- a/tests/misc/html.html
+++ /dev/null
@@ -1,29 +0,0 @@
-<h1>Block level html</h1>
-
-<p>Some inline <b>stuff<b>. </p>
-<p>Now some <arbitrary>arbitrary tags</arbitrary>.</p>
-<div>More block level html.</div>
-
-<div class="foo bar" title="with 'quoted' text." valueless_attr weirdness="<i>foo</i>">
-Html with various attributes.
-</div>
-
-<div>
- <div>
- Div with a blank line
-
- in the middle.
- </div>
- <div>
- This gets treated as HTML.
- </div>
-</div>
-
-<p>And of course <script>blah</script>.</p>
-<p><a href="&lt;script&gt;stuff&lt;/script&gt;">this <script>link</a></p>
-<p>Some funky <x\]> inline stuff with markdown escaping syntax.</p>
-<p><img scr="foo.png" title="Only one inline element on a line." /></p>
-<p>And now a line with only an opening bracket:</p>
-<p>&lt;</p>
-<p>And one with other stuff but no closing bracket:</p>
-<p>&lt; foo</p> \ No newline at end of file
diff --git a/tests/misc/html.txt b/tests/misc/html.txt
deleted file mode 100644
index 8f18fa7..0000000
--- a/tests/misc/html.txt
+++ /dev/null
@@ -1,40 +0,0 @@
-
-<h1>Block level html</h1>
-
-Some inline <b>stuff<b>.
-
-Now some <arbitrary>arbitrary tags</arbitrary>.
-
-<div>More block level html.</div>
-
-<div class="foo bar" title="with 'quoted' text." valueless_attr weirdness="<i>foo</i>">
-Html with various attributes.
-</div>
-
-<div>
- <div>
- Div with a blank line
-
- in the middle.
- </div>
- <div>
- This gets treated as HTML.
- </div>
-</div>
-
-And of course <script>blah</script>.
-
-[this <script>link](<script>stuff</script>)
-
-Some funky <x\]> inline stuff with markdown escaping syntax.
-
-<img scr="foo.png" title="Only one inline element on a line." />
-
-And now a line with only an opening bracket:
-
-<
-
-And one with other stuff but no closing bracket:
-
-< foo
-
diff --git a/tests/misc/markup-inside-p.html b/tests/misc/markup-inside-p.html
deleted file mode 100644
index 1b6b420..0000000
--- a/tests/misc/markup-inside-p.html
+++ /dev/null
@@ -1,21 +0,0 @@
-<p>
-
-_foo_
-
-</p>
-
-<p>
-_foo_
-</p>
-
-<p>_foo_</p>
-
-<p>
-
-_foo_
-</p>
-
-<p>
-_foo_
-
-</p> \ No newline at end of file
diff --git a/tests/misc/markup-inside-p.txt b/tests/misc/markup-inside-p.txt
deleted file mode 100644
index ab7dd0f..0000000
--- a/tests/misc/markup-inside-p.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-<p>
-
-_foo_
-
-</p>
-
-<p>
-_foo_
-</p>
-
-<p>_foo_</p>
-
-<p>
-
-_foo_
-</p>
-
-<p>
-_foo_
-
-</p>
diff --git a/tests/misc/mismatched-tags.html b/tests/misc/mismatched-tags.html
deleted file mode 100644
index 06bd57f..0000000
--- a/tests/misc/mismatched-tags.html
+++ /dev/null
@@ -1,14 +0,0 @@
-<p>Some text</p>
-
-<div>some more text</div>
-
-<p>and a bit more</p>
-<p>And this output</p>
-
-<p><em>Compatible with PHP Markdown Extra 1.2.2 and Markdown.pl1.0.2b8:</em></p>
-<!-- comment -->
-
-<p><div>text</div><br /></p>
-
-<p><br /></p>
-<p>Should be in p</p> \ No newline at end of file
diff --git a/tests/misc/mismatched-tags.txt b/tests/misc/mismatched-tags.txt
deleted file mode 100644
index 8e6a52f..0000000
--- a/tests/misc/mismatched-tags.txt
+++ /dev/null
@@ -1,9 +0,0 @@
-<p>Some text</p><div>some more text</div>
-
-and a bit more
-
-<p>And this output</p> *Compatible with PHP Markdown Extra 1.2.2 and Markdown.pl1.0.2b8:*
-
-<!-- comment --><p><div>text</div><br /></p><br />
-
-Should be in p
diff --git a/tests/misc/more_comments.html b/tests/misc/more_comments.html
deleted file mode 100644
index 5ca6731..0000000
--- a/tests/misc/more_comments.html
+++ /dev/null
@@ -1,8 +0,0 @@
-<!asd@asdfd.com>
-
-<p>Foo</p>
-<p><asd!@asdfd.com></p>
-<p>Bar</p>
-<!--asd@asdfd.com>
-
-Still in unclosed comment \ No newline at end of file
diff --git a/tests/misc/more_comments.txt b/tests/misc/more_comments.txt
deleted file mode 100644
index ddc5bd3..0000000
--- a/tests/misc/more_comments.txt
+++ /dev/null
@@ -1,11 +0,0 @@
-<!asd@asdfd.com>
-
-Foo
-
-<asd!@asdfd.com>
-
-Bar
-
-<!--asd@asdfd.com>
-
-Still in unclosed comment
diff --git a/tests/misc/multi-line-tags.html b/tests/misc/multi-line-tags.html
deleted file mode 100644
index 69899aa..0000000
--- a/tests/misc/multi-line-tags.html
+++ /dev/null
@@ -1,13 +0,0 @@
-<div>
-
-asdf asdfasd
-
-</div>
-
-<div>
-
-foo bar
-
-</div>
-
-<p>No blank line.</p> \ No newline at end of file
diff --git a/tests/misc/multi-line-tags.txt b/tests/misc/multi-line-tags.txt
deleted file mode 100644
index 9056473..0000000
--- a/tests/misc/multi-line-tags.txt
+++ /dev/null
@@ -1,13 +0,0 @@
-
-<div>
-
-asdf asdfasd
-
-</div>
-
-<div>
-
-foo bar
-
-</div>
-No blank line.
diff --git a/tests/misc/multiline-comments.html b/tests/misc/multiline-comments.html
deleted file mode 100644
index 4bdd5d0..0000000
--- a/tests/misc/multiline-comments.html
+++ /dev/null
@@ -1,37 +0,0 @@
-<!--
-
-foo
-
--->
-
-<p>
-
-foo
-
-</p>
-
-<div>
-
-foo
-
-</div>
-
-<!-- foo
-
--->
-
-<!-- <tag>
-
--->
-
-<!--
-
-foo -->
-
-<!--
-
-<tag> -->
-
-<!-- unclosed comment
-
-__Still__ a comment (browsers see it that way) \ No newline at end of file
diff --git a/tests/misc/multiline-comments.txt b/tests/misc/multiline-comments.txt
deleted file mode 100644
index eb567dd..0000000
--- a/tests/misc/multiline-comments.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-<!--
-
-foo
-
--->
-
-<p>
-
-foo
-
-</p>
-
-
-<div>
-
-foo
-
-</div>
-
-<!-- foo
-
--->
-
-<!-- <tag>
-
--->
-
-<!--
-
-foo -->
-
-<!--
-
-<tag> -->
-
-<!-- unclosed comment
-
-__Still__ a comment (browsers see it that way)
diff --git a/tests/misc/php.html b/tests/misc/php.html
deleted file mode 100644
index 8cd4ed5..0000000
--- a/tests/misc/php.html
+++ /dev/null
@@ -1,11 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<p><b>This should have a p tag</b></p>
-<!--This is a comment -->
-
-<div>This shouldn't</div>
-
-<?php echo "block_level";?>
-
-<p>&lt;?php echo "not_block_level";?&gt;</p> \ No newline at end of file
diff --git a/tests/misc/php.txt b/tests/misc/php.txt
deleted file mode 100644
index ca5be45..0000000
--- a/tests/misc/php.txt
+++ /dev/null
@@ -1,13 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<b>This should have a p tag</b>
-
-<!--This is a comment -->
-
-<div>This shouldn't</div>
-
-<?php echo "block_level";?>
-
- <?php echo "not_block_level";?>
-
diff --git a/tests/misc/pre.html b/tests/misc/pre.html
deleted file mode 100644
index a44ae12..0000000
--- a/tests/misc/pre.html
+++ /dev/null
@@ -1,13 +0,0 @@
-<pre>
-
-aaa
-
-bbb
-</pre>
-
-<pre>
-* and this is pre-formatted content
-* and it should be printed just like this
-* and not formatted as a list
-
-</pre> \ No newline at end of file
diff --git a/tests/misc/pre.txt b/tests/misc/pre.txt
deleted file mode 100644
index 31243b5..0000000
--- a/tests/misc/pre.txt
+++ /dev/null
@@ -1,14 +0,0 @@
-<pre>
-
-aaa
-
-bbb
-</pre>
-
-<pre>
-* and this is pre-formatted content
-* and it should be printed just like this
-* and not formatted as a list
-
-</pre>
-
diff --git a/tests/misc/raw_whitespace.html b/tests/misc/raw_whitespace.html
deleted file mode 100644
index 7a6f131..0000000
--- a/tests/misc/raw_whitespace.html
+++ /dev/null
@@ -1,8 +0,0 @@
-<p>Preserve whitespace in raw html</p>
-<pre>
-class Foo():
- bar = 'bar'
-
- def baz(self):
- print self.bar
-</pre> \ No newline at end of file
diff --git a/tests/misc/raw_whitespace.txt b/tests/misc/raw_whitespace.txt
deleted file mode 100644
index bbc7cec..0000000
--- a/tests/misc/raw_whitespace.txt
+++ /dev/null
@@ -1,10 +0,0 @@
-Preserve whitespace in raw html
-
-<pre>
-class Foo():
- bar = 'bar'
-
- def baz(self):
- print self.bar
-</pre>
-
diff --git a/tests/test_syntax/blocks/test_html_blocks.py b/tests/test_syntax/blocks/test_html_blocks.py
new file mode 100644
index 0000000..0a2092d
--- /dev/null
+++ b/tests/test_syntax/blocks/test_html_blocks.py
@@ -0,0 +1,1319 @@
+# -*- coding: utf-8 -*-
+"""
+Python Markdown
+
+A Python implementation of John Gruber's Markdown.
+
+Documentation: https://python-markdown.github.io/
+GitHub: https://github.com/Python-Markdown/markdown/
+PyPI: https://pypi.org/project/Markdown/
+
+Started by Manfred Stienstra (http://www.dwerg.net/).
+Maintained for a few years by Yuri Takhteyev (http://www.freewisdom.org).
+Currently maintained by Waylan Limberg (https://github.com/waylan),
+Dmitry Shachnev (https://github.com/mitya57) and Isaac Muse (https://github.com/facelessuser).
+
+Copyright 2007-2018 The Python Markdown Project (v. 1.7 and later)
+Copyright 2004, 2005, 2006 Yuri Takhteyev (v. 0.2-1.6b)
+Copyright 2004 Manfred Stienstra (the original version)
+
+License: BSD (see LICENSE.md for details).
+"""
+
+from markdown.test_tools import TestCase
+
+
+class TestHTMLBlocks(TestCase):
+
+ def test_raw_paragraph(self):
+ self.assertMarkdownRenders(
+ '<p>A raw paragraph.</p>',
+ '<p>A raw paragraph.</p>'
+ )
+
+ def test_raw_skip_inline_markdown(self):
+ self.assertMarkdownRenders(
+ '<p>A *raw* paragraph.</p>',
+ '<p>A *raw* paragraph.</p>'
+ )
+
+ def test_raw_indent_one_space(self):
+ self.assertMarkdownRenders(
+ ' <p>A *raw* paragraph.</p>',
+ '<p>A *raw* paragraph.</p>'
+ )
+
+ def test_raw_indent_two_spaces(self):
+ self.assertMarkdownRenders(
+ ' <p>A *raw* paragraph.</p>',
+ '<p>A *raw* paragraph.</p>'
+ )
+
+ def test_raw_indent_three_spaces(self):
+ self.assertMarkdownRenders(
+ ' <p>A *raw* paragraph.</p>',
+ '<p>A *raw* paragraph.</p>'
+ )
+
+ def test_raw_indent_four_spaces(self):
+ self.assertMarkdownRenders(
+ ' <p>code block</p>',
+ self.dedent(
+ """
+ <pre><code>&lt;p&gt;code block&lt;/p&gt;
+ </code></pre>
+ """
+ )
+ )
+
+ def test_raw_span(self):
+ self.assertMarkdownRenders(
+ '<span>*inline*</span>',
+ '<p><span><em>inline</em></span></p>'
+ )
+
+ def test_code_span(self):
+ self.assertMarkdownRenders(
+ '`<p>code span</p>`',
+ '<p><code>&lt;p&gt;code span&lt;/p&gt;</code></p>'
+ )
+
+ def test_code_span_open_gt(self):
+ self.assertMarkdownRenders(
+ '*bar* `<` *foo*',
+ '<p><em>bar</em> <code>&lt;</code> <em>foo</em></p>'
+ )
+
+ def test_raw_empty(self):
+ self.assertMarkdownRenders(
+ '<p></p>',
+ '<p></p>'
+ )
+
+ def test_raw_empty_space(self):
+ self.assertMarkdownRenders(
+ '<p> </p>',
+ '<p> </p>'
+ )
+
+ def test_raw_empty_newline(self):
+ self.assertMarkdownRenders(
+ '<p>\n</p>',
+ '<p>\n</p>'
+ )
+
+ def test_raw_empty_blank_line(self):
+ self.assertMarkdownRenders(
+ '<p>\n\n</p>',
+ '<p>\n\n</p>'
+ )
+
+ def test_raw_uppercase(self):
+ self.assertMarkdownRenders(
+ '<DIV>*foo*</DIV>',
+ '<DIV>*foo*</DIV>'
+ )
+
+ def test_raw_uppercase_multiline(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <DIV>
+ *foo*
+ </DIV>
+ """
+ ),
+ self.dedent(
+ """
+ <DIV>
+ *foo*
+ </DIV>
+ """
+ )
+ )
+
+ def test_multiple_raw_single_line(self):
+ self.assertMarkdownRenders(
+ '<p>*foo*</p><div>*bar*</div>',
+ self.dedent(
+ """
+ <p>*foo*</p>
+ <div>*bar*</div>
+ """
+ )
+ )
+
+ def test_multiple_raw_single_line_with_pi(self):
+ self.assertMarkdownRenders(
+ "<p>*foo*</p><?php echo '>'; ?>",
+ self.dedent(
+ """
+ <p>*foo*</p>
+ <?php echo '>'; ?>
+ """
+ )
+ )
+
+ def test_multiline_raw(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <p>
+ A raw paragraph
+ with multiple lines.
+ </p>
+ """
+ ),
+ self.dedent(
+ """
+ <p>
+ A raw paragraph
+ with multiple lines.
+ </p>
+ """
+ )
+ )
+
+ def test_blank_lines_in_raw(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <p>
+
+ A raw paragraph...
+
+ with many blank lines.
+
+ </p>
+ """
+ ),
+ self.dedent(
+ """
+ <p>
+
+ A raw paragraph...
+
+ with many blank lines.
+
+ </p>
+ """
+ )
+ )
+
+ def test_raw_surrounded_by_Markdown(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ Some *Markdown* text.
+
+ <p>*Raw* HTML.</p>
+
+ More *Markdown* text.
+ """
+ ),
+ self.dedent(
+ """
+ <p>Some <em>Markdown</em> text.</p>
+ <p>*Raw* HTML.</p>
+
+ <p>More <em>Markdown</em> text.</p>
+ """
+ )
+ )
+
+ def test_raw_surrounded_by_text_without_blank_lines(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ Some *Markdown* text.
+ <p>*Raw* HTML.</p>
+ More *Markdown* text.
+ """
+ ),
+ self.dedent(
+ """
+ <p>Some <em>Markdown</em> text.</p>
+ <p>*Raw* HTML.</p>
+ <p>More <em>Markdown</em> text.</p>
+ """
+ )
+ )
+
+ def test_multiline_markdown_with_code_span(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ A paragraph with a block-level
+ `<p>code span</p>`, which is
+ at the start of a line.
+ """
+ ),
+ self.dedent(
+ """
+ <p>A paragraph with a block-level
+ <code>&lt;p&gt;code span&lt;/p&gt;</code>, which is
+ at the start of a line.</p>
+ """
+ )
+ )
+
+ def test_raw_block_preceded_by_markdown_code_span_with_unclosed_block_tag(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ A paragraph with a block-level code span: `<div>`.
+
+ <p>*not markdown*</p>
+
+ This is *markdown*
+ """
+ ),
+ self.dedent(
+ """
+ <p>A paragraph with a block-level code span: <code>&lt;div&gt;</code>.</p>
+ <p>*not markdown*</p>
+
+ <p>This is <em>markdown</em></p>
+ """
+ )
+ )
+
+ def test_raw_one_line_followed_by_text(self):
+ self.assertMarkdownRenders(
+ '<p>*foo*</p>*bar*',
+ self.dedent(
+ """
+ <p>*foo*</p>
+ <p><em>bar</em></p>
+ """
+ )
+ )
+
+ def test_raw_one_line_followed_by_span(self):
+ self.assertMarkdownRenders(
+ "<p>*foo*</p><span>*bar*</span>",
+ self.dedent(
+ """
+ <p>*foo*</p>
+ <p><span><em>bar</em></span></p>
+ """
+ )
+ )
+
+ def test_raw_with_markdown_blocks(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div>
+ Not a Markdown paragraph.
+
+ * Not a list item.
+ * Another non-list item.
+
+ Another non-Markdown paragraph.
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ Not a Markdown paragraph.
+
+ * Not a list item.
+ * Another non-list item.
+
+ Another non-Markdown paragraph.
+ </div>
+ """
+ )
+ )
+
+ def test_adjacent_raw_blocks(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <p>A raw paragraph.</p>
+ <p>A second raw paragraph.</p>
+ """
+ ),
+ self.dedent(
+ """
+ <p>A raw paragraph.</p>
+ <p>A second raw paragraph.</p>
+ """
+ )
+ )
+
+ def test_adjacent_raw_blocks_with_blank_lines(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <p>A raw paragraph.</p>
+
+ <p>A second raw paragraph.</p>
+ """
+ ),
+ self.dedent(
+ """
+ <p>A raw paragraph.</p>
+
+ <p>A second raw paragraph.</p>
+ """
+ )
+ )
+
+ def test_nested_raw_one_line(self):
+ self.assertMarkdownRenders(
+ '<div><p>*foo*</p></div>',
+ '<div><p>*foo*</p></div>'
+ )
+
+ def test_nested_raw_block(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div>
+ <p>A raw paragraph.</p>
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p>A raw paragraph.</p>
+ </div>
+ """
+ )
+ )
+
+ def test_nested_indented_raw_block(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div>
+ <p>A raw paragraph.</p>
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p>A raw paragraph.</p>
+ </div>
+ """
+ )
+ )
+
+ def test_nested_raw_blocks(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div>
+ <p>A raw paragraph.</p>
+ <p>A second raw paragraph.</p>
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p>A raw paragraph.</p>
+ <p>A second raw paragraph.</p>
+ </div>
+ """
+ )
+ )
+
+ def test_nested_raw_blocks_with_blank_lines(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div>
+
+ <p>A raw paragraph.</p>
+
+ <p>A second raw paragraph.</p>
+
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+
+ <p>A raw paragraph.</p>
+
+ <p>A second raw paragraph.</p>
+
+ </div>
+ """
+ )
+ )
+
+ def test_nested_inline_one_line(self):
+ self.assertMarkdownRenders(
+ '<p><em>foo</em><br></p>',
+ '<p><em>foo</em><br></p>'
+ )
+
+ def test_raw_nested_inline(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div>
+ <p>
+ <span>*text*</span>
+ </p>
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p>
+ <span>*text*</span>
+ </p>
+ </div>
+ """
+ )
+ )
+
+ def test_raw_nested_inline_with_blank_lines(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div>
+
+ <p>
+
+ <span>*text*</span>
+
+ </p>
+
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+
+ <p>
+
+ <span>*text*</span>
+
+ </p>
+
+ </div>
+ """
+ )
+ )
+
+ def test_raw_html5(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <section>
+ <header>
+ <hgroup>
+ <h1>Hello :-)</h1>
+ </hgroup>
+ </header>
+ <figure>
+ <img src="image.png" alt="" />
+ <figcaption>Caption</figcaption>
+ </figure>
+ <footer>
+ <p>Some footer</p>
+ </footer>
+ </section>
+ """
+ ),
+ self.dedent(
+ """
+ <section>
+ <header>
+ <hgroup>
+ <h1>Hello :-)</h1>
+ </hgroup>
+ </header>
+ <figure>
+ <img src="image.png" alt="" />
+ <figcaption>Caption</figcaption>
+ </figure>
+ <footer>
+ <p>Some footer</p>
+ </footer>
+ </section>
+ """
+ )
+ )
+
+ def test_raw_pre_tag(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ Preserve whitespace in raw html
+
+ <pre>
+ class Foo():
+ bar = 'bar'
+
+ @property
+ def baz(self):
+ return self.bar
+ </pre>
+ """
+ ),
+ self.dedent(
+ """
+ <p>Preserve whitespace in raw html</p>
+ <pre>
+ class Foo():
+ bar = 'bar'
+
+ @property
+ def baz(self):
+ return self.bar
+ </pre>
+ """
+ )
+ )
+
+ def test_raw_pre_tag_nested_escaped_html(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <pre>
+ &lt;p&gt;foo&lt;/p&gt;
+ </pre>
+ """
+ ),
+ self.dedent(
+ """
+ <pre>
+ &lt;p&gt;foo&lt;/p&gt;
+ </pre>
+ """
+ )
+ )
+
+ def test_raw_p_no_end_tag(self):
+ self.assertMarkdownRenders(
+ '<p>*text*',
+ '<p>*text*'
+ )
+
+ def test_raw_multiple_p_no_end_tag(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <p>*text*'
+
+ <p>more *text*
+ """
+ ),
+ self.dedent(
+ """
+ <p>*text*'
+
+ <p>more *text*
+ """
+ )
+ )
+
+ def test_raw_p_no_end_tag_followed_by_blank_line(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <p>*raw text*'
+
+ Still part of *raw* text.
+ """
+ ),
+ self.dedent(
+ """
+ <p>*raw text*'
+
+ Still part of *raw* text.
+ """
+ )
+ )
+
+ def test_raw_nested_p_no_end_tag(self):
+ self.assertMarkdownRenders(
+ '<div><p>*text*</div>',
+ '<div><p>*text*</div>'
+ )
+
+ def test_raw_open_bracket_only(self):
+ self.assertMarkdownRenders(
+ '<',
+ '<p>&lt;</p>'
+ )
+
+ def test_raw_open_bracket_followed_by_space(self):
+ self.assertMarkdownRenders(
+ '< foo',
+ '<p>&lt; foo</p>'
+ )
+
+ def test_raw_missing_close_bracket(self):
+ self.assertMarkdownRenders(
+ '<foo',
+ '<p>&lt;foo</p>'
+ )
+
+ def test_raw_attributes(self):
+ self.assertMarkdownRenders(
+ '<p id="foo", class="bar baz", style="margin: 15px; line-height: 1.5; text-align: center;">text</p>',
+ '<p id="foo", class="bar baz", style="margin: 15px; line-height: 1.5; text-align: center;">text</p>'
+ )
+
+ def test_raw_attributes_nested(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div id="foo, class="bar", style="background: #ffe7e8; border: 2px solid #e66465;">
+ <p id="baz", style="margin: 15px; line-height: 1.5; text-align: center;">
+ <img scr="../foo.jpg" title="with 'quoted' text." valueless_attr weirdness="<i>foo</i>" />
+ </p>
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div id="foo, class="bar", style="background: #ffe7e8; border: 2px solid #e66465;">
+ <p id="baz", style="margin: 15px; line-height: 1.5; text-align: center;">
+ <img scr="../foo.jpg" title="with 'quoted' text." valueless_attr weirdness="<i>foo</i>" />
+ </p>
+ </div>
+ """
+ )
+ )
+
+ def test_raw_comment_one_line(self):
+ self.assertMarkdownRenders(
+ '<!-- *foo* -->',
+ '<!-- *foo* -->'
+ )
+
+ def test_raw_comment_one_line_with_tag(self):
+ self.assertMarkdownRenders(
+ '<!-- <tag> -->',
+ '<!-- <tag> -->'
+ )
+
+ def test_comment_in_code_span(self):
+ self.assertMarkdownRenders(
+ '`<!-- *foo* -->`',
+ '<p><code>&lt;!-- *foo* --&gt;</code></p>'
+ )
+
+ def test_raw_comment_one_line_followed_by_text(self):
+ self.assertMarkdownRenders(
+ '<!-- *foo* -->*bar*',
+ self.dedent(
+ """
+ <!-- *foo* -->
+ <p><em>bar</em></p>
+ """
+ )
+ )
+
+ def test_raw_comment_one_line_followed_by_html(self):
+ self.assertMarkdownRenders(
+ '<!-- *foo* --><p>*bar*</p>',
+ self.dedent(
+ """
+ <!-- *foo* -->
+ <p>*bar*</p>
+ """
+ )
+ )
+
+ # Note: Trailing (insignificant) whitespace is not preserved, which does not match the
+ # reference implementation. However, it is not a change in behavior for Python-Markdown.
+ def test_raw_comment_trailing_whitespace(self):
+ self.assertMarkdownRenders(
+ '<!-- *foo* --> ',
+ '<!-- *foo* -->'
+ )
+
+ # Note: this is a change in behavior for Python-Markdown, which does *not* match the reference
+ # implementation. However, it does match the HTML5 spec. Declarations must start with either
+ # `<!DOCTYPE` or `<![`. Anything else that starts with `<!` is a comment. According to the
+ # HTML5 spec, a comment without the hyphens is a "bogus comment", but a comment nonetheless.
+ # See https://www.w3.org/TR/html52/syntax.html#markup-declaration-open-state.
+ # If we wanted to change this behavior, we could override `HTMLParser.parse_bogus_comment()`.
+ def test_bogus_comment(self):
+ self.assertMarkdownRenders(
+ '<!*foo*>',
+ '<!--*foo*-->'
+ )
+
+ def test_raw_multiline_comment(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!--
+ *foo*
+ -->
+ """
+ ),
+ self.dedent(
+ """
+ <!--
+ *foo*
+ -->
+ """
+ )
+ )
+
+ def test_raw_multiline_comment_with_tag(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!--
+ <tag>
+ -->
+ """
+ ),
+ self.dedent(
+ """
+ <!--
+ <tag>
+ -->
+ """
+ )
+ )
+
+ def test_raw_multiline_comment_first_line(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!-- *foo*
+ -->
+ """
+ ),
+ self.dedent(
+ """
+ <!-- *foo*
+ -->
+ """
+ )
+ )
+
+ def test_raw_multiline_comment_last_line(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!--
+ *foo* -->
+ """
+ ),
+ self.dedent(
+ """
+ <!--
+ *foo* -->
+ """
+ )
+ )
+
+ def test_raw_comment_with_blank_lines(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!--
+
+ *foo*
+
+ -->
+ """
+ ),
+ self.dedent(
+ """
+ <!--
+
+ *foo*
+
+ -->
+ """
+ )
+ )
+
+ def test_raw_comment_with_blank_lines_with_tag(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!--
+
+ <tag>
+
+ -->
+ """
+ ),
+ self.dedent(
+ """
+ <!--
+
+ <tag>
+
+ -->
+ """
+ )
+ )
+
+ def test_raw_comment_with_blank_lines_first_line(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!-- *foo*
+
+ -->
+ """
+ ),
+ self.dedent(
+ """
+ <!-- *foo*
+
+ -->
+ """
+ )
+ )
+
+ def test_raw_comment_with_blank_lines_last_line(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!--
+
+ *foo* -->
+ """
+ ),
+ self.dedent(
+ """
+ <!--
+
+ *foo* -->
+ """
+ )
+ )
+
+ def test_raw_comment_indented(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!--
+
+ *foo*
+
+ -->
+ """
+ ),
+ self.dedent(
+ """
+ <!--
+
+ *foo*
+
+ -->
+ """
+ )
+ )
+
+ def test_raw_comment_indented_with_tag(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!--
+
+ <tag>
+
+ -->
+ """
+ ),
+ self.dedent(
+ """
+ <!--
+
+ <tag>
+
+ -->
+ """
+ )
+ )
+
+ def test_raw_comment_nested(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div>
+ <!-- *foo* -->
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <!-- *foo* -->
+ </div>
+ """
+ )
+ )
+
+ def test_comment_in_code_block(self):
+ self.assertMarkdownRenders(
+ ' <!-- *foo* -->',
+ self.dedent(
+ """
+ <pre><code>&lt;!-- *foo* --&gt;
+ </code></pre>
+ """
+ )
+ )
+
+ # Note: This is a change in behavior. Previously, Python-Markdown interpreted this in the same manner
+ # as browsers and all text after the opening comment tag was considered to be in a comment. However,
+ # that did not match the reference implementation. The new behavior does.
+ def test_unclosed_comment(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!-- unclosed comment
+
+ *not* a comment
+ """
+ ),
+ self.dedent(
+ """
+ <p>&lt;!-- unclosed comment</p>
+ <p><em>not</em> a comment</p>
+ """
+ )
+ )
+
+ def test_raw_processing_instruction_one_line(self):
+ self.assertMarkdownRenders(
+ "<?php echo '>'; ?>",
+ "<?php echo '>'; ?>"
+ )
+
+ # This is a change in behavior and does not match the reference implementation.
+ # We have no way to determine whether trailing text is on the same line, so it becomes a new paragraph. TODO: reevaluate!
+ def test_raw_processing_instruction_one_line_followed_by_text(self):
+ self.assertMarkdownRenders(
+ "<?php echo '>'; ?>*bar*",
+ self.dedent(
+ """
+ <?php echo '>'; ?>
+ <p><em>bar</em></p>
+ """
+ )
+ )
+
+ def test_raw_multiline_processing_instruction(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <?php
+ echo '>';
+ ?>
+ """
+ ),
+ self.dedent(
+ """
+ <?php
+ echo '>';
+ ?>
+ """
+ )
+ )
+
+ def test_raw_processing_instruction_with_blank_lines(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <?php
+
+ echo '>';
+
+ ?>
+ """
+ ),
+ self.dedent(
+ """
+ <?php
+
+ echo '>';
+
+ ?>
+ """
+ )
+ )
+
+ def test_raw_processing_instruction_indented(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <?php
+
+ echo '>';
+
+ ?>
+ """
+ ),
+ self.dedent(
+ """
+ <?php
+
+ echo '>';
+
+ ?>
+ """
+ )
+ )
+
+ def test_raw_declaration_one_line(self):
+ self.assertMarkdownRenders(
+ '<!DOCTYPE html>',
+ '<!DOCTYPE html>'
+ )
+
+ # This is a change in behavior and does not match the reference implementation.
+ # We have no way to determine whether trailing text is on the same line, so it becomes a new paragraph. TODO: reevaluate!
+ def test_raw_declaration_one_line_followed_by_text(self):
+ self.assertMarkdownRenders(
+ '<!DOCTYPE html>*bar*',
+ self.dedent(
+ """
+ <!DOCTYPE html>
+ <p><em>bar</em></p>
+ """
+ )
+ )
+
+ def test_raw_multiline_declaration(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <!DOCTYPE html PUBLIC
+ "-//W3C//DTD XHTML 1.1//EN"
+ "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
+ """
+ ),
+ self.dedent(
+ """
+ <!DOCTYPE html PUBLIC
+ "-//W3C//DTD XHTML 1.1//EN"
+ "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
+ """
+ )
+ )
+
+ def test_raw_cdata_one_line(self):
+ self.assertMarkdownRenders(
+ '<![CDATA[ document.write(">"); ]]>',
+ '<![CDATA[ document.write(">"); ]]>'
+ )
+
+ # Note: this is a change in behavior. Neither the previous output nor this one matches the reference implementation.
+ def test_raw_cdata_one_line_followed_by_text(self):
+ self.assertMarkdownRenders(
+ '<![CDATA[ document.write(">"); ]]>*bar*',
+ self.dedent(
+ """
+ <![CDATA[ document.write(">"); ]]>
+ <p><em>bar</em></p>
+ """
+ )
+ )
+
+ def test_raw_multiline_cdata(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <![CDATA[
+ document.write(">");
+ ]]>
+ """
+ ),
+ self.dedent(
+ """
+ <![CDATA[
+ document.write(">");
+ ]]>
+ """
+ )
+ )
+
+ def test_raw_cdata_with_blank_lines(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <![CDATA[
+
+ document.write(">");
+
+ ]]>
+ """
+ ),
+ self.dedent(
+ """
+ <![CDATA[
+
+ document.write(">");
+
+ ]]>
+ """
+ )
+ )
+
+ def test_raw_cdata_indented(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <![CDATA[
+
+ document.write(">");
+
+ ]]>
+ """
+ ),
+ self.dedent(
+ """
+ <![CDATA[
+
+ document.write(">");
+
+ ]]>
+ """
+ )
+ )
+
+ def test_charref(self):
+ self.assertMarkdownRenders(
+ '&sect;',
+ '<p>&sect;</p>'
+ )
+
+ def test_nested_charref(self):
+ self.assertMarkdownRenders(
+ '<p>&sect;</p>',
+ '<p>&sect;</p>'
+ )
+
+ def test_entityref(self):
+ self.assertMarkdownRenders(
+ '&#167;',
+ '<p>&#167;</p>'
+ )
+
+ def test_nested_entityref(self):
+ self.assertMarkdownRenders(
+ '<p>&#167;</p>',
+ '<p>&#167;</p>'
+ )
+
+ def test_ampersand(self):
+ self.assertMarkdownRenders(
+ 'AT&T & AT&amp;T',
+ '<p>AT&amp;T &amp; AT&amp;T</p>'
+ )
+
+ def test_startendtag(self):
+ self.assertMarkdownRenders(
+ '<hr>',
+ '<hr>'
+ )
+
+ def test_startendtag_with_attrs(self):
+ self.assertMarkdownRenders(
+ '<hr id="foo" class="bar">',
+ '<hr id="foo" class="bar">'
+ )
+
+ def test_startendtag_with_space(self):
+ self.assertMarkdownRenders(
+ '<hr >',
+ '<hr >'
+ )
+
+ def test_closed_startendtag(self):
+ self.assertMarkdownRenders(
+ '<hr />',
+ '<hr />'
+ )
+
+ def test_closed_startendtag_without_space(self):
+ self.assertMarkdownRenders(
+ '<hr/>',
+ '<hr/>'
+ )
+
+ def test_closed_startendtag_with_attrs(self):
+ self.assertMarkdownRenders(
+ '<hr id="foo" class="bar" />',
+ '<hr id="foo" class="bar" />'
+ )
+
+ def test_nested_startendtag(self):
+ self.assertMarkdownRenders(
+ '<div><hr></div>',
+ '<div><hr></div>'
+ )
+
+ def test_nested_closed_startendtag(self):
+ self.assertMarkdownRenders(
+ '<div><hr /></div>',
+ '<div><hr /></div>'
+ )
+
+ def test_auto_links_dont_break_parser(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <https://example.com>
+
+ <email@example.com>
+ """
+ ),
+ '<p><a href="https://example.com">https://example.com</a></p>\n'
+ '<p><a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#101;&#109;'
+ '&#97;&#105;&#108;&#64;&#101;&#120;&#97;&#109;&#112;&#108;&#101;'
+ '&#46;&#99;&#111;&#109;">&#101;&#109;&#97;&#105;&#108;&#64;&#101;'
+ '&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;</a></p>'
+ )
+
+ def test_text_links_ignored(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ https://example.com
+
+ email@example.com
+ """
+ ),
+ self.dedent(
+ """
+ <p>https://example.com</p>
+ <p>email@example.com</p>
+ """
+ ),
+ )
+
+ def test_invalid_tags(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <some [weird](http://example.com) stuff>
+
+ <some>> <<unbalanced>> <<brackets>
+ """
+ ),
+ self.dedent(
+ """
+ <p><some <a href="http://example.com">weird</a> stuff></p>
+ <p><some>&gt; &lt;<unbalanced>&gt; &lt;<brackets></p>
+ """
+ )
+ )
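The `&sect;` and `&#167;` cases above map onto two different callbacks in Python's `html.parser.HTMLParser`, which the commit message says the new parser is built on: a named reference such as `&sect;` is passed to `handle_entityref`, while a numeric one such as `&#167;` is passed to `handle_charref`. The sketch below records which callback fires for each input; `RefRecorder` and `classify` are illustrative names, not part of Python-Markdown. It passes `convert_charrefs=False`, because with the default of `True` the parser decodes references into the text and neither callback is called.

```python
from html.parser import HTMLParser


class RefRecorder(HTMLParser):
    """Record which HTMLParser callback handles each reference."""

    def __init__(self):
        # With convert_charrefs=True (the default) references are decoded
        # into handle_data text and the two callbacks below never run.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_entityref(self, name):
        # Named references, e.g. "&sect;" -> name "sect".
        self.events.append(('entityref', name))

    def handle_charref(self, name):
        # Numeric references, e.g. "&#167;" -> name "167".
        self.events.append(('charref', name))


def classify(text):
    """Feed text to a fresh parser and return the recorded callbacks."""
    parser = RefRecorder()
    parser.feed(text)
    parser.close()
    return parser.events


print(classify('&sect;'))  # [('entityref', 'sect')]
print(classify('&#167;'))  # [('charref', '167')]
```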
diff --git a/tests/test_syntax/extensions/test_abbr.py b/tests/test_syntax/extensions/test_abbr.py
new file mode 100644
index 0000000..64388c2
--- /dev/null
+++ b/tests/test_syntax/extensions/test_abbr.py
@@ -0,0 +1,242 @@
+# -*- coding: utf-8 -*-
+"""
+Python Markdown
+
+A Python implementation of John Gruber's Markdown.
+
+Documentation: https://python-markdown.github.io/
+GitHub: https://github.com/Python-Markdown/markdown/
+PyPI: https://pypi.org/project/Markdown/
+
+Started by Manfred Stienstra (http://www.dwerg.net/).
+Maintained for a few years by Yuri Takhteyev (http://www.freewisdom.org).
+Currently maintained by Waylan Limberg (https://github.com/waylan),
+Dmitry Shachnev (https://github.com/mitya57) and Isaac Muse (https://github.com/facelessuser).
+
+Copyright 2007-2018 The Python Markdown Project (v. 1.7 and later)
+Copyright 2004, 2005, 2006 Yuri Takhteyev (v. 0.2-1.6b)
+Copyright 2004 Manfred Stienstra (the original version)
+
+License: BSD (see LICENSE.md for details).
+"""
+
+from markdown.test_tools import TestCase
+
+
+class TestAbbr(TestCase):
+
+ default_kwargs = {'extensions': ['abbr']}
+
+ def test_abbr_upper(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ ABBR
+
+ *[ABBR]: Abbreviation
+ """
+ ),
+ self.dedent(
+ """
+ <p><abbr title="Abbreviation">ABBR</abbr></p>
+ """
+ )
+ )
+
+ def test_abbr_lower(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ abbr
+
+ *[abbr]: Abbreviation
+ """
+ ),
+ self.dedent(
+ """
+ <p><abbr title="Abbreviation">abbr</abbr></p>
+ """
+ )
+ )
+
+ def test_abbr_multiple(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ The HTML specification
+ is maintained by the W3C.
+
+ *[HTML]: Hyper Text Markup Language
+ *[W3C]: World Wide Web Consortium
+ """
+ ),
+ self.dedent(
+ """
+ <p>The <abbr title="Hyper Text Markup Language">HTML</abbr> specification
+ is maintained by the <abbr title="World Wide Web Consortium">W3C</abbr>.</p>
+ """
+ )
+ )
+
+ def test_abbr_override(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ ABBR
+
+ *[ABBR]: Ignored
+ *[ABBR]: The override
+ """
+ ),
+ self.dedent(
+ """
+ <p><abbr title="The override">ABBR</abbr></p>
+ """
+ )
+ )
+
+ def test_abbr_no_blank_lines(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ ABBR
+ *[ABBR]: Abbreviation
+ ABBR
+ """
+ ),
+ self.dedent(
+ """
+ <p><abbr title="Abbreviation">ABBR</abbr></p>
+ <p><abbr title="Abbreviation">ABBR</abbr></p>
+ """
+ )
+ )
+
+ def test_abbr_no_space(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ ABBR
+
+ *[ABBR]:Abbreviation
+ """
+ ),
+ self.dedent(
+ """
+ <p><abbr title="Abbreviation">ABBR</abbr></p>
+ """
+ )
+ )
+
+ def test_abbr_extra_space(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ ABBR
+
+ *[ABBR] : Abbreviation
+ """
+ ),
+ self.dedent(
+ """
+ <p><abbr title="Abbreviation">ABBR</abbr></p>
+ """
+ )
+ )
+
+ def test_abbr_line_break(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ ABBR
+
+ *[ABBR]:
+ Abbreviation
+ """
+ ),
+ self.dedent(
+ """
+ <p><abbr title="Abbreviation">ABBR</abbr></p>
+ """
+ )
+ )
+
+ def test_abbr_ignore_unmatched_case(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ ABBR abbr
+
+ *[ABBR]: Abbreviation
+ """
+ ),
+ self.dedent(
+ """
+ <p><abbr title="Abbreviation">ABBR</abbr> abbr</p>
+ """
+ )
+ )
+
+ def test_abbr_partial_word(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ ABBR ABBREVIATION
+
+ *[ABBR]: Abbreviation
+ """
+ ),
+ self.dedent(
+ """
+ <p><abbr title="Abbreviation">ABBR</abbr> ABBREVIATION</p>
+ """
+ )
+ )
+
+ def test_abbr_unused(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ foo bar
+
+ *[ABBR]: Abbreviation
+ """
+ ),
+ self.dedent(
+ """
+ <p>foo bar</p>
+ """
+ )
+ )
+
+ def test_abbr_double_quoted(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ ABBR
+
+ *[ABBR]: "Abbreviation"
+ """
+ ),
+ self.dedent(
+ """
+ <p><abbr title="&quot;Abbreviation&quot;">ABBR</abbr></p>
+ """
+ )
+ )
+
+ def test_abbr_single_quoted(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ ABBR
+
+ *[ABBR]: 'Abbreviation'
+ """
+ ),
+ self.dedent(
+ """
+ <p><abbr title="'Abbreviation'">ABBR</abbr></p>
+ """
+ )
+ )
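The abbreviation tests above pin down a few rules: a definition is `*[ABBR]: Title` with optional whitespace around the colon, a later definition of the same abbreviation overrides an earlier one, matching is case-sensitive and whole-word only, and quotes around the title are kept literally (then escaped in the attribute). The following is a rough sketch of those rules in plain Python, assuming one definition per line. `DEF_RE`, `escape_attr`, `collect_abbrs` and `apply_abbrs` are hypothetical names; this is not how `markdown.extensions.abbr` is implemented, and it does not handle the wrapped definition in `test_abbr_line_break`.

```python
import re

# Hypothetical pattern for "*[ABBR]: Title"; whitespace around the colon is
# optional. An illustration only, not the extension's own regex.
DEF_RE = re.compile(r'^\*\[(?P<abbr>[^\]]+)\]\s*:\s*(?P<title>.*)$')


def escape_attr(value):
    """Escape a value for use inside a double-quoted HTML attribute."""
    return (value.replace('&', '&amp;').replace('<', '&lt;')
                 .replace('>', '&gt;').replace('"', '&quot;'))


def collect_abbrs(lines):
    """Split definition lines from body lines; a later definition wins."""
    abbrs, body = {}, []
    for line in lines:
        match = DEF_RE.match(line)
        if match:
            abbrs[match.group('abbr')] = match.group('title').strip()
        else:
            body.append(line)
    return abbrs, body


def apply_abbrs(text, abbrs):
    """Wrap whole-word, case-sensitive matches in <abbr> in a single pass."""
    if not abbrs:
        return text  # an empty alternation would match everywhere
    # Longest first, so a longer abbreviation wins over its own prefix.
    names = sorted(abbrs, key=len, reverse=True)
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, names)) + r')\b')
    return pattern.sub(
        lambda m: '<abbr title="%s">%s</abbr>' % (
            escape_attr(abbrs[m.group(1)]), m.group(1)),
        text,
    )


abbrs, body = collect_abbrs(
    ['ABBR abbr ABBREVIATION', '', '*[ABBR]: Ignored', '*[ABBR] :"The override"'])
print(apply_abbrs(body[0], abbrs))
# <abbr title="&quot;The override&quot;">ABBR</abbr> abbr ABBREVIATION
```

Doing the substitution in one pass with a single alternation avoids re-matching an abbreviation that happens to appear inside a title inserted earlier.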
diff --git a/tests/test_syntax/extensions/test_footnotes.py b/tests/test_syntax/extensions/test_footnotes.py
index 7785a2b..1a3a2b0 100644
--- a/tests/test_syntax/extensions/test_footnotes.py
+++ b/tests/test_syntax/extensions/test_footnotes.py
@@ -24,6 +24,247 @@ from markdown.test_tools import TestCase
class TestFootnotes(TestCase):
+ default_kwargs = {'extensions': ['footnotes']}
+ maxDiff = None
+
+ def test_basic_footnote(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ paragraph[^1]
+
+ [^1]: A Footnote
+ """
+ ),
+ '<p>paragraph<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>\n'
+ '<div class="footnote">\n'
+ '<hr />\n'
+ '<ol>\n'
+ '<li id="fn:1">\n'
+ '<p>A Footnote&#160;<a class="footnote-backref" href="#fnref:1"'
+ ' title="Jump back to footnote 1 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '</ol>\n'
+ '</div>'
+ )
+
+ def test_multiple_footnotes(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ foo[^1]
+
+ bar[^2]
+
+ [^1]: Footnote 1
+ [^2]: Footnote 2
+ """
+ ),
+ '<p>foo<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>\n'
+ '<p>bar<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup></p>\n'
+ '<div class="footnote">\n'
+ '<hr />\n'
+ '<ol>\n'
+ '<li id="fn:1">\n'
+ '<p>Footnote 1&#160;<a class="footnote-backref" href="#fnref:1"'
+ ' title="Jump back to footnote 1 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '<li id="fn:2">\n'
+ '<p>Footnote 2&#160;<a class="footnote-backref" href="#fnref:2"'
+ ' title="Jump back to footnote 2 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '</ol>\n'
+ '</div>'
+ )
+
+ def test_multiple_footnotes_multiline(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ foo[^1]
+
+ bar[^2]
+
+ [^1]: Footnote 1
+ line 2
+ [^2]: Footnote 2
+ """
+ ),
+ '<p>foo<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>\n'
+ '<p>bar<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup></p>\n'
+ '<div class="footnote">\n'
+ '<hr />\n'
+ '<ol>\n'
+ '<li id="fn:1">\n'
+ '<p>Footnote 1\nline 2&#160;<a class="footnote-backref" href="#fnref:1"'
+ ' title="Jump back to footnote 1 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '<li id="fn:2">\n'
+ '<p>Footnote 2&#160;<a class="footnote-backref" href="#fnref:2"'
+ ' title="Jump back to footnote 2 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '</ol>\n'
+ '</div>'
+ )
+
+ def test_footnote_multi_line(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ paragraph[^1]
+ [^1]: A Footnote
+ line 2
+ """
+ ),
+ '<p>paragraph<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>\n'
+ '<div class="footnote">\n'
+ '<hr />\n'
+ '<ol>\n'
+ '<li id="fn:1">\n'
+ '<p>A Footnote\nline 2&#160;<a class="footnote-backref" href="#fnref:1"'
+ ' title="Jump back to footnote 1 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '</ol>\n'
+ '</div>'
+ )
+
+ def test_footnote_multi_line_lazy_indent(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ paragraph[^1]
+ [^1]: A Footnote
+ line 2
+ """
+ ),
+ '<p>paragraph<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>\n'
+ '<div class="footnote">\n'
+ '<hr />\n'
+ '<ol>\n'
+ '<li id="fn:1">\n'
+ '<p>A Footnote\nline 2&#160;<a class="footnote-backref" href="#fnref:1"'
+ ' title="Jump back to footnote 1 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '</ol>\n'
+ '</div>'
+ )
+
+ def test_footnote_multi_line_complex(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ paragraph[^1]
+
+ [^1]:
+
+ A Footnote
+ line 2
+
+ * list item
+
+ > blockquote
+ """
+ ),
+ '<p>paragraph<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>\n'
+ '<div class="footnote">\n'
+ '<hr />\n'
+ '<ol>\n'
+ '<li id="fn:1">\n'
+ '<p>A Footnote\nline 2</p>\n'
+ '<ul>\n<li>list item</li>\n</ul>\n'
+ '<blockquote>\n<p>blockquote</p>\n</blockquote>\n'
+ '<p><a class="footnote-backref" href="#fnref:1"'
+ ' title="Jump back to footnote 1 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '</ol>\n'
+ '</div>'
+ )
+
+ def test_footnote_multiple_complex(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ foo[^1]
+
+ bar[^2]
+
+ [^1]:
+
+ A Footnote
+ line 2
+
+ * list item
+
+ > blockquote
+
+ [^2]: Second footnote
+
+ paragraph 2
+ """
+ ),
+ '<p>foo<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>\n'
+ '<p>bar<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup></p>\n'
+ '<div class="footnote">\n'
+ '<hr />\n'
+ '<ol>\n'
+ '<li id="fn:1">\n'
+ '<p>A Footnote\nline 2</p>\n'
+ '<ul>\n<li>list item</li>\n</ul>\n'
+ '<blockquote>\n<p>blockquote</p>\n</blockquote>\n'
+ '<p><a class="footnote-backref" href="#fnref:1"'
+ ' title="Jump back to footnote 1 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '<li id="fn:2">\n'
+ '<p>Second footnote</p>\n'
+ '<p>paragraph 2&#160;<a class="footnote-backref" href="#fnref:2"'
+ ' title="Jump back to footnote 2 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '</ol>\n'
+ '</div>'
+ )
+
+ def test_footnote_multiple_complex_no_blank_line_between(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ foo[^1]
+
+ bar[^2]
+
+ [^1]:
+
+ A Footnote
+ line 2
+
+ * list item
+
+ > blockquote
+ [^2]: Second footnote
+
+ paragraph 2
+ """
+ ),
+ '<p>foo<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>\n'
+ '<p>bar<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup></p>\n'
+ '<div class="footnote">\n'
+ '<hr />\n'
+ '<ol>\n'
+ '<li id="fn:1">\n'
+ '<p>A Footnote\nline 2</p>\n'
+ '<ul>\n<li>list item</li>\n</ul>\n'
+ '<blockquote>\n<p>blockquote</p>\n</blockquote>\n'
+ '<p><a class="footnote-backref" href="#fnref:1"'
+ ' title="Jump back to footnote 1 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '<li id="fn:2">\n'
+ '<p>Second footnote</p>\n'
+ '<p>paragraph 2&#160;<a class="footnote-backref" href="#fnref:2"'
+ ' title="Jump back to footnote 2 in the text">&#8617;</a></p>\n'
+ '</li>\n'
+ '</ol>\n'
+ '</div>'
+ )
+
def test_backlink_text(self):
"""Test backlink configuration."""
@@ -39,7 +280,6 @@ class TestFootnotes(TestCase):
'</li>\n'
'</ol>\n'
'</div>',
- extensions=['footnotes'],
extension_configs={'footnotes': {'BACKLINK_TEXT': 'back'}}
)
@@ -58,6 +298,5 @@ class TestFootnotes(TestCase):
'</li>\n'
'</ol>\n'
'</div>',
- extensions=['footnotes'],
extension_configs={'footnotes': {'SEPARATOR': '-'}}
)
diff --git a/tests/test_syntax/extensions/test_md_in_html.py b/tests/test_syntax/extensions/test_md_in_html.py
new file mode 100644
index 0000000..b68412c
--- /dev/null
+++ b/tests/test_syntax/extensions/test_md_in_html.py
@@ -0,0 +1,764 @@
+# -*- coding: utf-8 -*-
+"""
+Python Markdown
+
+A Python implementation of John Gruber's Markdown.
+
+Documentation: https://python-markdown.github.io/
+GitHub: https://github.com/Python-Markdown/markdown/
+PyPI: https://pypi.org/project/Markdown/
+
+Started by Manfred Stienstra (http://www.dwerg.net/).
+Maintained for a few years by Yuri Takhteyev (http://www.freewisdom.org).
+Currently maintained by Waylan Limberg (https://github.com/waylan),
+Dmitry Shachnev (https://github.com/mitya57) and Isaac Muse (https://github.com/facelessuser).
+
+Copyright 2007-2018 The Python Markdown Project (v. 1.7 and later)
+Copyright 2004, 2005, 2006 Yuri Takhteyev (v. 0.2-1.6b)
+Copyright 2004 Manfred Stienstra (the original version)
+
+License: BSD (see LICENSE.md for details).
+"""
+
+from unittest import TestSuite
+from markdown.test_tools import TestCase
+from ..blocks.test_html_blocks import TestHTMLBlocks
+
+
+class TestDefaultwMdInHTML(TestHTMLBlocks):
+ """ Ensure the md_in_html extension does not break the default behavior. """
+
+ default_kwargs = {'extensions': ['md_in_html']}
+
+
+class TestMdInHTML(TestCase):
+
+ default_kwargs = {'extensions': ['md_in_html']}
+
+ def test_md1_paragraph(self):
+ self.assertMarkdownRenders(
+ '<p markdown="1">*foo*</p>',
+ '<p><em>foo</em></p>'
+ )
+
+ def test_md1_p_linebreaks(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <p markdown="1">
+ *foo*
+ </p>
+ """
+ ),
+ self.dedent(
+ """
+ <p>
+ <em>foo</em>
+ </p>
+ """
+ )
+ )
+
+ def test_md1_p_blank_lines(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <p markdown="1">
+
+ *foo*
+
+ </p>
+ """
+ ),
+ self.dedent(
+ """
+ <p>
+
+ <em>foo</em>
+
+ </p>
+ """
+ )
+ )
+
+ def test_md1_div(self):
+ self.assertMarkdownRenders(
+ '<div markdown="1">*foo*</div>',
+ self.dedent(
+ """
+ <div>
+ <p><em>foo</em></p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_div_linebreaks(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+ *foo*
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p><em>foo</em></p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_div_blank_lines(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+
+ *foo*
+
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p><em>foo</em></p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_div_multi(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+
+ *foo*
+
+ __bar__
+
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p><em>foo</em></p>
+ <p><strong>bar</strong></p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_div_nested(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+
+ <div markdown="1">
+ *foo*
+ </div>
+
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <div>
+ <p><em>foo</em></p>
+ </div>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_div_multi_nest(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+
+ <div markdown="1">
+ <p markdown="1">*foo*</p>
+ </div>
+
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <div>
+ <p><em>foo</em></p>
+ </div>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_mix(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+ A _Markdown_ paragraph before a raw child.
+
+ <p markdown="1">A *raw* child.</p>
+
+ A _Markdown_ tail to the raw child.
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p>A <em>Markdown</em> paragraph before a raw child.</p>
+ <p>A <em>raw</em> child.</p>
+ <p>A <em>Markdown</em> tail to the raw child.</p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_deep_mix(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+
+ A _Markdown_ paragraph before a raw child.
+
+ A second Markdown paragraph
+ with two lines.
+
+ <div markdown="1">
+
+ A *raw* child.
+
+ <p markdown="1">*foo*</p>
+
+ Raw child tail.
+
+ </div>
+
+ A _Markdown_ tail to the raw child.
+
+ A second tail item
+ with two lines.
+
+ <p markdown="1">More raw.</p>
+
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p>A <em>Markdown</em> paragraph before a raw child.</p>
+ <p>A second Markdown paragraph
+ with two lines.</p>
+ <div>
+ <p>A <em>raw</em> child.</p>
+ <p><em>foo</em></p>
+ <p>Raw child tail.</p>
+ </div>
+ <p>A <em>Markdown</em> tail to the raw child.</p>
+ <p>A second tail item
+ with two lines.</p>
+ <p>More raw.</p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_div_raw_inline(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+
+ <em>foo</em>
+
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p><em>foo</em></p>
+ </div>
+ """
+ )
+ )
+
+ def test_no_md1_paragraph(self):
+ self.assertMarkdownRenders(
+ '<p>*foo*</p>',
+ '<p>*foo*</p>'
+ )
+
+ def test_no_md1_nest(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+ A _Markdown_ paragraph before a raw child.
+
+ <p>A *raw* child.</p>
+
+ A _Markdown_ tail to the raw child.
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p>A <em>Markdown</em> paragraph before a raw child.</p>
+ <p>A *raw* child.</p>
+ <p>A <em>Markdown</em> tail to the raw child.</p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_nested_empty(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+ A _Markdown_ paragraph before a raw empty tag.
+
+ <img src="image.png" alt="An image" />
+
+ A _Markdown_ tail to the raw empty tag.
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p>A <em>Markdown</em> paragraph before a raw empty tag.</p>
+ <p><img src="image.png" alt="An image" /></p>
+ <p>A <em>Markdown</em> tail to the raw empty tag.</p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_nested_empty_block(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+ A _Markdown_ paragraph before a raw empty tag.
+
+ <hr />
+
+ A _Markdown_ tail to the raw empty tag.
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p>A <em>Markdown</em> paragraph before a raw empty tag.</p>
+ <hr />
+ <p>A <em>Markdown</em> tail to the raw empty tag.</p>
+ </div>
+ """
+ )
+ )
+
+ def test_md_span_paragraph(self):
+ self.assertMarkdownRenders(
+ '<p markdown="span">*foo*</p>',
+ '<p><em>foo</em></p>'
+ )
+
+ def test_md_block_paragraph(self):
+ self.assertMarkdownRenders(
+ '<p markdown="block">*foo*</p>',
+ self.dedent(
+ """
+ <p>
+ <p><em>foo</em></p>
+ </p>
+ """
+ )
+ )
+
+ def test_md_span_div(self):
+ self.assertMarkdownRenders(
+ '<div markdown="span">*foo*</div>',
+ '<div><em>foo</em></div>'
+ )
+
+ def test_md_block_div(self):
+ self.assertMarkdownRenders(
+ '<div markdown="block">*foo*</div>',
+ self.dedent(
+ """
+ <div>
+ <p><em>foo</em></p>
+ </div>
+ """
+ )
+ )
+
+ def test_md_span_nested_in_block(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="block">
+ <div markdown="span">*foo*</div>
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <div><em>foo</em></div>
+ </div>
+ """
+ )
+ )
+
+ def test_md_block_nested_in_span(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="span">
+ <div markdown="block">*foo*</div>
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <div><em>foo</em></div>
+ </div>
+ """
+ )
+ )
+
+ def test_md_block_after_span_nested_in_block(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="block">
+ <div markdown="span">*foo*</div>
+ <div markdown="block">*bar*</div>
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <div><em>foo</em></div>
+ <div>
+ <p><em>bar</em></p>
+ </div>
+ </div>
+ """
+ )
+ )
+
+ def test_nomd_nested_in_md1(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+ *foo*
+ <div>
+ *foo*
+ <p>*bar*</p>
+ *baz*
+ </div>
+ *bar*
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p><em>foo</em></p>
+ <div>
+ *foo*
+ <p>*bar*</p>
+ *baz*
+ </div>
+ <p><em>bar</em></p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_nested_in_nomd(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div>
+ <div markdown="1">*foo*</div>
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <div markdown="1">*foo*</div>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_single_quotes(self):
+ self.assertMarkdownRenders(
+ "<p markdown='1'>*foo*</p>",
+ '<p><em>foo</em></p>'
+ )
+
+ def test_md1_no_quotes(self):
+ self.assertMarkdownRenders(
+ '<p markdown=1>*foo*</p>',
+ '<p><em>foo</em></p>'
+ )
+
+ def test_md_no_value(self):
+ self.assertMarkdownRenders(
+ '<p markdown>*foo*</p>',
+ '<p><em>foo</em></p>'
+ )
+
+ def test_md1_preserve_attrs(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1" id="parent">
+
+ <div markdown="1" class="foo">
+ <p markdown="1" class="bar baz">*foo*</p>
+ </div>
+
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div id="parent">
+ <div class="foo">
+ <p class="bar baz"><em>foo</em></p>
+ </div>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_unclosed_div(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+
+ _foo_
+
+ <div class="unclosed">
+
+ _bar_
+
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p><em>foo</em></p>
+ <div class="unclosed">
+
+ _bar_
+
+ </div>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_orphan_endtag(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+
+ _foo_
+
+ </p>
+
+ _bar_
+
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p><em>foo</em></p>
+ </p>
+ <p><em>bar</em></p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_unclosed_p(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <p markdown="1">_foo_
+ <p markdown="1">_bar_
+ """
+ ),
+ self.dedent(
+ """
+ <p><em>foo</em>
+ </p>
+ <p><em>bar</em>
+
+ </p>
+ """
+ )
+ )
+
+ def test_md1_nested_unclosed_p(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+ <p markdown="1">_foo_
+ <p markdown="1">_bar_
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p><em>foo</em>
+ </p>
+ <p><em>bar</em>
+ </p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_nested_comment(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+ A *Markdown* paragraph.
+ <!-- foobar -->
+ A *Markdown* paragraph.
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <p>A <em>Markdown</em> paragraph.</p>
+ <!-- foobar -->
+ <p>A <em>Markdown</em> paragraph.</p>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_nested_link_ref(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+ [link]: http://example.com
+ <div markdown="1">
+ [link][link]
+ </div>
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <div>
+ <p><a href="http://example.com">link</a></p>
+ </div>
+ </div>
+ """
+ )
+ )
+
+ def test_md1_nested_abbr_ref(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+ *[abbr]: Abbreviation
+ <div markdown="1">
+ abbr
+ </div>
+ </div>
+ """
+ ),
+ self.dedent(
+ """
+ <div>
+ <div>
+ <p><abbr title="Abbreviation">abbr</abbr></p>
+ </div>
+ </div>
+ """
+ ),
+ extensions=['md_in_html', 'abbr']
+ )
+
+ def test_md1_nested_footnote_ref(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ <div markdown="1">
+ [^1]: The footnote.
+ <div markdown="1">
+ Paragraph with a footnote.[^1]
+ </div>
+ </div>
+ """
+ ),
+ '<div>\n'
+ '<div>\n'
+ '<p>Paragraph with a footnote.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>\n'
+ '</div>\n'
+ '</div>\n'
+ '<div class="footnote">\n'
+ '<hr />\n'
+ '<ol>\n'
+ '<li id="fn:1">\n'
+ '<p>The footnote.&#160;'
+ '<a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">&#8617;</a>'
+ '</p>\n'
+ '</li>\n'
+ '</ol>\n'
+ '</div>',
+ extensions=['md_in_html', 'footnotes']
+ )
+
+
+def load_tests(loader, tests, pattern):
+ ''' Ensure TestHTMLBlocks doesn't get run twice by excluding it here. '''
+ suite = TestSuite()
+ for test_class in [TestDefaultwMdInHTML, TestMdInHTML]:
+ tests = loader.loadTestsFromTestCase(test_class)
+ suite.addTests(tests)
+ return suite
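Taken together, the `markdown` attribute tests above describe how an element's parsing mode is chosen: `markdown="1"` (or a bare `markdown`) means inline-only parsing on a `<p>` but block parsing on a `<div>`, `block` and `span` force a mode, block parsing cannot be re-enabled inside a span-mode element, and anything inside raw HTML (or an element without the attribute) is left alone. The function below restates those observations as a lookup. `effective_mode` and `SPAN_BY_DEFAULT` are hypothetical; only `p` is listed as span-by-default because it is the only such tag these tests exercise (the real extension likely covers more), a bare attribute is assumed to arrive as an empty string, and values the tests do not cover are rejected rather than guessed at.

```python
# Hypothetical: the only tag these tests show defaulting to span mode.
SPAN_BY_DEFAULT = {'p'}


def effective_mode(tag, value, parent_mode):
    """Return 'block', 'span' or 'raw' for one element.

    tag         -- element name, e.g. 'div'
    value       -- markdown attribute value ('' if bare, None if absent)
    parent_mode -- mode of the enclosing element ('block' at top level)
    """
    if parent_mode == 'raw' or value is None:
        # Inside raw HTML, or no markdown attribute: content is untouched.
        return 'raw'
    if value not in ('1', '', 'block', 'span'):
        raise ValueError('value not covered by this sketch: %r' % value)
    if parent_mode == 'span':
        # Block parsing cannot be switched back on inside a span element.
        return 'span'
    if value in ('block', 'span'):
        return value
    # markdown="1" or a bare markdown attribute: decided by the tag.
    return 'span' if tag in SPAN_BY_DEFAULT else 'block'


print(effective_mode('p', '1', 'block'))    # span
print(effective_mode('div', '1', 'block'))  # block
```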
diff --git a/tests/test_syntax/inline/test_links.py b/tests/test_syntax/inline/test_links.py
index be4237d..7a3e1c3 100644
--- a/tests/test_syntax/inline/test_links.py
+++ b/tests/test_syntax/inline/test_links.py
@@ -22,7 +22,7 @@ License: BSD (see LICENSE.md for details).
from markdown.test_tools import TestCase
-class TestAdvancedLinks(TestCase):
+class TestInlineLinks(TestCase):
def test_nested_square_brackets(self):
self.assertMarkdownRenders(
@@ -134,6 +134,186 @@ class TestAdvancedLinks(TestCase):
'<p><a href="http://example.com/?a=1&#x26;b=2">title</a></p>'
)
+
+class TestReferenceLinks(TestCase):
+
+ def test_ref_link(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]: http://example.com
+ """
+ ),
+ """<p><a href="http://example.com">Text</a></p>"""
+ )
+
+ def test_ref_link_angle_brackets(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]: <http://example.com>
+ """
+ ),
+ """<p><a href="http://example.com">Text</a></p>"""
+ )
+
+ def test_ref_link_no_space(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]:http://example.com
+ """
+ ),
+ """<p><a href="http://example.com">Text</a></p>"""
+ )
+
+ def test_ref_link_angle_brackets_no_space(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]:<http://example.com>
+ """
+ ),
+ """<p><a href="http://example.com">Text</a></p>"""
+ )
+
+ def test_ref_link_angle_brackets_title(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]: <http://example.com> "title"
+ """
+ ),
+ """<p><a href="http://example.com" title="title">Text</a></p>"""
+ )
+
+ def test_ref_link_title(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]: http://example.com "title"
+ """
+ ),
+ """<p><a href="http://example.com" title="title">Text</a></p>"""
+ )
+
+ def test_ref_link_angle_brackets_title_no_space(self):
+ # TODO: Maybe reevaluate this?
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]: <http://example.com>"title"
+ """
+ ),
+ """<p><a href="http://example.com&gt;&quot;title&quot;">Text</a></p>"""
+ )
+
+ def test_ref_link_title_no_space(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]: http://example.com"title"
+ """
+ ),
+ """<p><a href="http://example.com&quot;title&quot;">Text</a></p>"""
+ )
+
+ def test_ref_link_single_quoted_title(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]: http://example.com 'title'
+ """
+ ),
+ """<p><a href="http://example.com" title="title">Text</a></p>"""
+ )
+
+ def test_ref_link_title_nested_quote(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]: http://example.com "title'"
+ """
+ ),
+ """<p><a href="http://example.com" title="title'">Text</a></p>"""
+ )
+
+ def test_ref_link_single_quoted_title_nested_quote(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]: http://example.com 'title"'
+ """
+ ),
+ """<p><a href="http://example.com" title="title&quot;">Text</a></p>"""
+ )
+
+ def test_ref_link_override(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]: http://example.com 'ignore'
+ [Text]: https://example.com 'override'
+ """
+ ),
+ """<p><a href="https://example.com" title="override">Text</a></p>"""
+ )
+
+ def test_ref_link_title_no_blank_lines(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+ [Text]: http://example.com "title"
+ [Text]
+ """
+ ),
+ self.dedent(
+ """
+ <p><a href="http://example.com" title="title">Text</a></p>
+ <p><a href="http://example.com" title="title">Text</a></p>
+ """
+ )
+ )
+
+ def test_ref_link_multi_line(self):
+ self.assertMarkdownRenders(
+ self.dedent(
+ """
+ [Text]
+
+ [Text]:
+ http://example.com
+ "title"
+ """
+ ),
+ """<p><a href="http://example.com" title="title">Text</a></p>"""
+ )
+
def test_reference_newlines(self):
"""Test reference id whitespace cleanup."""