|
title: Library Reference
prev_title: Installation
prev_url: install.html
next_title: Command Line
next_url: cli.html

Using Markdown as a Python Library
==================================

First and foremost, Python-Markdown is intended to be a Python library module
used by various projects to convert Markdown syntax into HTML.

The Basics
----------

To use markdown as a module:

    import markdown
    html = markdown.markdown(your_text_string)

The Details
-----------

Python-Markdown provides two public functions (`markdown.markdown` and
`markdown.markdownFromFile`), both of which wrap the public class
`markdown.Markdown`. If you're processing one document at a time, the
functions will serve your needs. However, if you need to process
multiple documents, it may be advantageous to create a single instance
of the `markdown.Markdown` class and pass multiple documents through it.

### `markdown.markdown(text [, **kwargs])`

The following options are available on the `markdown.markdown` function:

* __`text`__ (required): The source text string.

    Note that Python-Markdown expects **Unicode** as input (although
    a simple ASCII string may work) and returns output as Unicode.
    Do not pass encoded strings to it! If your input is encoded (e.g. as
    UTF-8), it is your responsibility to decode it. For example:

        import codecs
        import markdown

        input_file = codecs.open("some_file.txt", mode="r", encoding="utf-8")
        text = input_file.read()
        html = markdown.markdown(text)

    If you want to write the output to disk, you must encode it yourself:

        output_file = codecs.open("some_file.html", "w",
                                  encoding="utf-8",
                                  errors="xmlcharrefreplace"
        )
        output_file.write(html)

* __`extensions`__: A list of extensions.

    Python-Markdown provides an API for third parties to write extensions to
    the parser, adding their own additions or changes to the syntax. A few
    commonly used extensions are shipped with the markdown library. See
    the [extension documentation](extensions/index.html) for a list of
    available extensions.

    The list of extensions may contain instances of extensions or strings of
    extension names. If an extension name is provided as a string, the
    extension must be importable as a Python module either within the
    `markdown.extensions` package or on your PYTHONPATH with a name starting
    with `mdx_`, followed by the name of the extension. Thus,
    `extensions=['extra']` will first look for the module
    `markdown.extensions.extra`, then a module named `mdx_extra`.

* __`extension_configs`__: A dictionary of configuration settings for extensions.

    The dictionary must be of the following format:

        extension_configs = {
            'extension_name_1': [
                ('option_1', 'value_1'),
                ('option_2', 'value_2')
            ],
            'extension_name_2': [
                ('option_1', 'value_1')
            ]
        }

    See the documentation specific to the extension you are using for help in
    specifying configuration settings for that extension.

* __`output_format`__: Format of output.

    Supported formats are:

    * `"xhtml1"`: Outputs XHTML 1.x. **Default**.
    * `"xhtml5"`: Outputs XHTML style tags of HTML 5.
    * `"xhtml"`: Outputs latest supported version of XHTML (currently XHTML 1.1).
    * `"html4"`: Outputs HTML 4.
    * `"html5"`: Outputs HTML style tags of HTML 5.
    * `"html"`: Outputs latest supported version of HTML (currently HTML 4).

    The more specific formats (`"xhtml1"`, `"html5"`, and `"html4"`) are
    recommended, because the meaning of `"xhtml"` and `"html"` may change in
    the future. The values can be either lowercase or uppercase.

* __`safe_mode`__: Disallow raw html.

    If you are using Markdown on a web system which will transform text
    provided by untrusted users, you may want to use the "safe_mode"
    option, which ensures that the user's HTML tags are either replaced,
    removed or escaped. (They can still create links using Markdown syntax.)

    The following values are accepted:

    * `False` (Default): Raw HTML is passed through unaltered.

    * `replace`: Replace all HTML blocks with the text assigned to
      `html_replacement_text`. To maintain backward compatibility, setting
      `safe_mode=True` will have the same effect as `safe_mode='replace'`.

        To replace raw HTML with something other than the default, do:

            md = markdown.Markdown(safe_mode='replace',
                    html_replacement_text='--RAW HTML NOT ALLOWED--')

    * `remove`: All raw HTML will be completely stripped from the text with
      no warning to the author.

    * `escape`: All raw HTML will be escaped and included in the document.

        For example, the following source:

            Foo <b>bar</b>.

        Will result in the following HTML:

            <p>Foo &lt;b&gt;bar&lt;/b&gt;.</p>

    Note that "safe_mode" does not alter the `enable_attributes` option, which
    could allow someone to inject javascript (e.g., `{@onclick=alert(1)}`). You
    may also want to set `enable_attributes=False` when using "safe_mode".

* __`html_replacement_text`__: Text used when safe_mode is set to `replace`.
  Defaults to `[HTML_REMOVED]`.

* __`tab_length`__: Length of tabs in the source. Default: 4

* __`enable_attributes`__: Enable the conversion of attributes. Default: True

* __`smart_emphasis`__: Treat `_connected_words_` intelligently. Default: True

* __`lazy_ol`__: Ignore number of first item of ordered lists. Default: True

    Given the following list:

        4. Apples
        5. Oranges
        6. Pears

    By default markdown will ignore the fact that the first line started
    with item number "4" and the HTML list will start with a number "1".
    If `lazy_ol` is set to `False`, then markdown will output the following
    HTML:

        <ol start="4">
          <li>Apples</li>
          <li>Oranges</li>
          <li>Pears</li>
        </ol>

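The lookup order described for string names in the `extensions` option can be sketched with the standard library alone. This is only an illustration of the order stated above, not the library's own loader; the helper names `candidate_modules` and `find_extension_module` are hypothetical.

```python
import importlib


def candidate_modules(ext_name):
    """Return the module names to try, in the order the text describes."""
    return ["markdown.extensions." + ext_name, "mdx_" + ext_name]


def find_extension_module(ext_name):
    """Return the first candidate name that imports cleanly, or None."""
    for module_name in candidate_modules(ext_name):
        try:
            importlib.import_module(module_name)
        except ImportError:
            # Not importable under this name; fall through to the next one.
            continue
        return module_name
    return None


print(candidate_modules("extra"))
# ['markdown.extensions.extra', 'mdx_extra']
```

A third-party extension named `foo` would therefore need to live in a module called `mdx_foo` somewhere on your PYTHONPATH to be found by name.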
### `markdown.markdownFromFile(**kwargs)`

With a few exceptions, `markdown.markdownFromFile` accepts the same options as
`markdown.markdown`. It does **not** accept a `text` (or Unicode) string.
Instead, it accepts the following options:

* __`input`__ (required): The source text file.

    `input` may be set to one of three options:

    * a string which contains a path to a readable file on the file system,
    * a readable file-like object,
    * or `None` (default) which will read from `stdin`.

* __`output`__: The target which output is written to.

    `output` may be set to one of three options:

    * a string which contains a path to a writable file on the file system,
    * a writable file-like object,
    * or `None` (default) which will write to `stdout`.

* __`encoding`__: The encoding of the source text file. Defaults to
  "utf-8". The same encoding will always be used for input and output.
  The 'xmlcharrefreplace' error handler is used when encoding the output.

    **Note:** This is the only place that decoding and encoding of Unicode
    takes place in Python-Markdown. If this rather naive solution does not
    meet your specific needs, it is suggested that you write your own code
    to handle your encoding/decoding needs.

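If you do need to handle encoding yourself, for instance because input and output should use different encodings, a wrapper along the following lines may be enough. This is a minimal sketch, not part of Python-Markdown: `convert_file` and its parameters are hypothetical names, and `convert` is assumed to be any callable that maps a Unicode string to a Unicode string.

```python
import io


def convert_file(src_path, dst_path, convert,
                 input_encoding="utf-8", output_encoding="utf-8"):
    """Decode src_path, pass the Unicode text to convert, encode the result."""
    with io.open(src_path, mode="r", encoding=input_encoding) as src:
        text = src.read()

    html = convert(text)

    # Characters the output encoding cannot represent are written as
    # numeric character references (e.g. "&#233;") instead of raising.
    with io.open(dst_path, mode="w", encoding=output_encoding,
                 errors="xmlcharrefreplace") as dst:
        dst.write(html)
```

Passing `markdown.markdown` as `convert` would read a file in one encoding and write the HTML in another, e.g. `convert_file("in.txt", "out.html", markdown.markdown, input_encoding="latin-1", output_encoding="ascii")`.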
### `markdown.Markdown([**kwargs])`

The same options are available when initializing the `markdown.Markdown` class
as on the `markdown.markdown` function, except that the class does **not**
accept a source text string on initialization. Rather, the source text string
must be passed to one of two instance methods:

* `Markdown.convert(source)`

    The `source` text must meet the same requirements as the `text` argument
    of the `markdown.markdown` function.

    You should also use this method if you want to process multiple strings
    without creating a new instance of the class for each string.

        md = markdown.Markdown()
        html1 = md.convert(text1)
        html2 = md.convert(text2)

    Note that depending on which options and/or extensions are being used,
    the parser may need its state reset between each call to `convert`.

        html1 = md.convert(text1)
        md.reset()
        html2 = md.convert(text2)

    You can also chain calls to `reset` together:

        html3 = md.reset().convert(text3)

* `Markdown.convertFile(**kwargs)`

    The arguments of this method are identical to the arguments of the same
    name on the `markdown.markdownFromFile` function (`input`, `output`, and
    `encoding`). As with the `convert` method, this method should be used to
    process multiple files without creating a new instance of the class for
    each document. State may need to be `reset` between each call to
    `convertFile` as is the case with `convert`.

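To see why state may need resetting, and why `reset` can be chained, consider the toy class below. It is a hypothetical stand-in written for illustration only and does not reflect how `markdown.Markdown` is implemented; the only behaviour borrowed from the text above is that `reset` returns the instance so that `convert` can be called on the result.

```python
class CountingConverter(object):
    """Hypothetical converter that accumulates per-document state."""

    def __init__(self):
        self.footnotes = []

    def convert(self, source):
        # Treat every word starting with "^" as a footnote marker and
        # remember it; without a reset, markers pile up across documents.
        self.footnotes.extend(w for w in source.split() if w.startswith("^"))
        return "%s [%d footnotes]" % (source, len(self.footnotes))

    def reset(self):
        self.footnotes = []
        return self  # returning self is what makes chaining possible


conv = CountingConverter()
first = conv.convert("a ^1")           # 'a ^1 [1 footnotes]'
leaked = conv.convert("b ^2")          # 'b ^2 [2 footnotes]' (stale state)
clean = conv.reset().convert("c ^3")   # 'c ^3 [1 footnotes]'
```

The second call reports two footnotes because the first document's marker was never cleared, which is the kind of carry-over a call to `reset` is meant to prevent.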
|