Setup::
>>> import lxml.html
We'll define a link translation function:
>>> base_href = 'http://old/base/path.html'
>>> try: import urlparse
... except ImportError: import urllib.parse as urlparse
>>> def relocate_href(link):
...     link = urlparse.urljoin(base_href, link)
...     if link.startswith('http://old'):
...         return 'https://new' + link[len('http://old'):]
...     else:
...         return link
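To make the effect of this function concrete, here is a minimal standalone sketch of the same translation applied to a few sample links. It assumes Python 3 only (``urllib.parse``); the sample links and the ``BASE_HREF`` and ``samples`` names are illustrative and not part of the original test.

```python
from urllib.parse import urljoin

# Same base URL as in the setup above.
BASE_HREF = 'http://old/base/path.html'


def relocate_href(link):
    # Resolve relative links against the base URL first.
    link = urljoin(BASE_HREF, link)
    # Move anything under http://old to https://new, keeping the path.
    if link.startswith('http://old'):
        return 'https://new' + link[len('http://old'):]
    return link


# Hypothetical sample links, chosen to cover the common cases.
samples = [
    '/absolute.html',          # host-relative
    'sibling.html',            # relative to the base document's directory
    '../up.html',              # parent directory
    '#anchor',                 # fragment only
    'http://other/page.html',  # foreign host, left untouched
]

for sample in samples:
    print(sample, '->', relocate_href(sample))
```

Relative links are resolved before the prefix check, so they are relocated too. Links to other hosts pass through unchanged.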
Now for content. First, to make it easier on us, we want the HTML we get
from these functions to be compared in normalized form; importing
``usedoctest`` below takes care of that.
Some basics::
>>> from lxml.html import usedoctest, tostring
>>> from lxml.html import rewrite_links
>>> print(rewrite_links(
... 'link', relocate_href))
link
>>> print(rewrite_links(
... '', relocate_href))
>>> print(rewrite_links(
... '', relocate_href))
>>> print(rewrite_links('''\
...
...
... x\
... ''', relocate_href))
x
Links in CSS are also handled::
>>> print(rewrite_links('''
... ''', relocate_href))
>>> print(rewrite_links('''
... ''', relocate_href))
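As a rough illustration of what rewriting links in CSS involves, the sketch below finds ``url(...)`` references with a simplified regular expression and passes each one through a link function. This is not lxml's implementation, which may differ in what it recognizes. The names ``CSS_URL`` and ``rewrite_css_links`` and the sample stylesheet are hypothetical.

```python
import re
from urllib.parse import urljoin

BASE_HREF = 'http://old/base/path.html'


def relocate_href(link):
    # Same translation as in the setup: resolve, then swap the prefix.
    link = urljoin(BASE_HREF, link)
    if link.startswith('http://old'):
        return 'https://new' + link[len('http://old'):]
    return link


# Simplified assumption: a CSS link is url(...) with optional matching quotes.
CSS_URL = re.compile(r'''url\(\s*(['"]?)(.*?)\1\s*\)''')


def rewrite_css_links(css, link_repl):
    """Return css with every url(...) target passed through link_repl."""
    def replace(match):
        quote, link = match.group(1), match.group(2)
        # Keep the original quoting style around the rewritten link.
        return 'url(%s%s%s)' % (quote, link_repl(link), quote)
    return CSS_URL.sub(replace, css)


css = 'body { background: url(/bg.png) } a { background: url("../img/a.gif") }'
print(rewrite_css_links(css, relocate_href))
```

A regular expression is enough for a sketch, but it ignores comments, escapes, and ``@import`` rules written without ``url()``.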
Links in style attributes are also rewritten::
>>> print(rewrite_links('''
...
### Test disabled to support Py2.6 and earlier
#If the document contains invalid links, you may choose to "discard" or "ignore"
#them by passing the respective option into the ``handle_failures`` argument::
#
# >>> html = lxml.html.fromstring ('''\
# ...