From 37e4159cb49d2f7c8fdafa0268adca5a1e2017e4 Mon Sep 17 00:00:00 2001
From: Leonard Richardson
Date: Sat, 28 Jul 2018 16:58:23 -0400
Subject: Correctly handle invalid HTML numeric character entities like &#147;
 which reference code points that are not Unicode code points. Note that this
 is only fixed when Beautiful Soup is used with the html.parser parser --
 html5lib already worked and I couldn't fix it with lxml. [bug=1782933]

---
 NEWS.txt | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'NEWS.txt')

diff --git a/NEWS.txt b/NEWS.txt
index 1aa0a42..acdcc04 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -12,6 +12,12 @@
 * Fixed a problem where the html.parser tree builder interpreted
   a string like "&foo " as the character entity "&foo;" [bug=1728706]
 
+* Correctly handle invalid HTML numeric character entities like &#147;
+  which reference code points that are not Unicode code points. Note
+  that this is only fixed when Beautiful Soup is used with the
+  html.parser parser -- html5lib already worked and I couldn't fix it
+  with lxml. [bug=1782933]
+
 * Improved the warning given when no parser is specified. [bug=1780571]
 
 * Fixed code that was causing deprecation warnings in recent Python 3
-- 
cgit v1.2.1
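
For illustration only, here is a minimal sketch of one way to decode a numeric character reference whose value falls in a problematic range. It is not taken from this commit, whose diff touches only NEWS.txt. The function name `decode_charref` is hypothetical, and the windows-1252 fallback for values 128-159 is an assumption based on how browsers commonly treat such references.

```python
REPLACEMENT = "\N{REPLACEMENT CHARACTER}"


def decode_charref(ref: str) -> str:
    """Decode the body of a numeric character reference, e.g. '147' or 'x93'.

    Hypothetical helper: `ref` is the text between '&#' and ';'.
    """
    # A leading 'x' or 'X' marks a hexadecimal reference.
    if ref[:1] in ("x", "X"):
        code_point = int(ref[1:], 16)
    else:
        code_point = int(ref)

    # Assumption: values 128-159 are meant as windows-1252 bytes, so 147
    # becomes a left curly quote rather than a C1 control character.
    if 0x80 <= code_point <= 0x9F:
        try:
            return bytes([code_point]).decode("windows-1252")
        except UnicodeDecodeError:
            # A few bytes in this range (such as 0x81) have no mapping.
            return REPLACEMENT

    # Values above U+10FFFF and surrogates cannot be turned into text.
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return REPLACEMENT

    return chr(code_point)
```

Under these assumptions, `decode_charref("147")` yields the left double quotation mark (U+201C), and an out-of-range value such as `decode_charref("1114112")` yields the replacement character instead of raising an error.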