<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/cpython-git.git/Modules/unicodedata.c, branch fix-namedexpr-comment</title>
<subtitle>github.com: python/cpython.git
</subtitle>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/cpython-git.git/'/>
<entry>
<title>bpo-37752: Delete redundant Py_CHARMASK in normalizestring() (GH-15095)</title>
<updated>2019-09-10T16:04:08+00:00</updated>
<author>
<name>Jordon Xu</name>
<email>46997731+qigangxu@users.noreply.github.com</email>
</author>
<published>2019-09-10T16:04:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/cpython-git.git/commit/?id=2ec70102066fe5534f1a62e8f496d2005e1697db'/>
<id>2ec70102066fe5534f1a62e8f496d2005e1697db</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>bpo-38043: Use `bool` for boolean flags on is_normalized_quickcheck. (GH-15711)</title>
<updated>2019-09-09T09:16:31+00:00</updated>
<author>
<name>Greg Price</name>
<email>gnprice@gmail.com</email>
</author>
<published>2019-09-09T09:16:31+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/cpython-git.git/commit/?id=7669cb8b21c7c9cef758609c44017c09d1ce4658'/>
<id>7669cb8b21c7c9cef758609c44017c09d1ce4658</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558)</title>
<updated>2019-09-04T02:45:44+00:00</updated>
<author>
<name>Greg Price</name>
<email>gnprice@gmail.com</email>
</author>
<published>2019-09-04T02:45:44+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/cpython-git.git/commit/?id=2f09413947d1ce0043de62ed2346f9a2b4e5880b'/>
<id>2f09413947d1ce0043de62ed2346f9a2b4e5880b</id>
<content type='text'>
The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalize(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX #15.
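
As a minimal sketch of that contract (the helper name `is_normalized_slow`
is hypothetical, not part of the module, and `unicodedata.is_normalized`
needs Python 3.8 or later), the two spellings should always agree:

```python
import unicodedata


def is_normalized_slow(form, s):
    # Reference definition: normalize the whole string, then compare.
    return s == unicodedata.normalize(form, s)


# U+F900 has a canonical decomposition (to U+8C48), so it is not in NFD.
not_nfd = "\uf900" * 1000
# Plain ASCII text is unchanged by every normalization form.
already_nfd = "abc" * 1000

for sample in (not_nfd, already_nfd):
    fast = unicodedata.is_normalized("NFD", sample)
    slow = is_normalized_slow("NFD", sample)
    print(fast, slow)
```

Only the cost differs: the slow spelling always builds the normalized copy.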

However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.

Implement the standard's algorithm.  This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.
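
For illustration only, here is a rough Python sketch of the UAX #15
quick-check loop, restricted to NFD (where the answer is never MAYBE).
It is not the C code in this patch: the name `nfd_quick_check` is
hypothetical, and normalizing one character at a time is an assumed
stand-in for the NFD_QC property table.

```python
import unicodedata


def nfd_quick_check(s):
    """Return "YES" or "NO" for NFD, stopping at the first offending character."""
    last_ccc = 0
    for ch in s:
        ccc = unicodedata.combining(ch)
        # Combining marks out of canonical order: the string is not normalized.
        if ccc != 0 and last_ccc > ccc:
            return "NO"
        # Stand-in for NFD_QC=No: a character that changes under NFD on its
        # own can never appear in an NFD string.
        if unicodedata.normalize("NFD", ch) != ch:
            return "NO"
        last_ccc = ccc
    return "YES"


print(nfd_quick_check("\uf900" * 500000))  # NO: the first character fails
print(nfd_quick_check("a\u0301\u0338"))    # NO: ccc 230 comes before ccc 1
print(nfd_quick_check("a\u0338\u0301"))    # YES: marks in canonical order
```

The early return on the first offending character is what lets a NO
answer arrive without normalizing the whole string.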

In a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:

  $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
      -- 'unicodedata.is_normalized("NFD", s)'
  50 loops, best of 5: 4.39 msec per loop

With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:

  $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
      -- 'unicodedata.is_normalized("NFD", s)'
  5000000 loops, best of 5: 58.2 nsec per loop

This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.

With this, that case is actually faster than in master!

  $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
      -- 'unicodedata.normalize("NFD", s)'
  500 loops, best of 5: 561 usec per loop

  $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
      -- 'unicodedata.normalize("NFD", s)'
  500 loops, best of 5: 512 usec per loop
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalize(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX #15.

However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.

Implement the standard's algorithm.  This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.

In a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:

  $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
      -- 'unicodedata.is_normalized("NFD", s)'
  50 loops, best of 5: 4.39 msec per loop

With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:

  $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
      -- 'unicodedata.is_normalized("NFD", s)'
  5000000 loops, best of 5: 58.2 nsec per loop

This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.

With this, that case is actually faster than in master!

  $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
      -- 'unicodedata.normalize("NFD", s)'
  500 loops, best of 5: 561 usec per loop

  $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
      -- 'unicodedata.normalize("NFD", s)'
  500 loops, best of 5: 512 usec per loop
</pre>
</div>
</content>
</entry>
<entry>
<title>bpo-36974: tp_print -&gt; tp_vectorcall_offset and tp_reserved -&gt; tp_as_async (GH-13464)</title>
<updated>2019-05-31T02:13:39+00:00</updated>
<author>
<name>Jeroen Demeyer</name>
<email>J.Demeyer@UGent.be</email>
</author>
<published>2019-05-31T02:13:39+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/cpython-git.git/commit/?id=530f506ac91338b55cf2be71b1cdf50cb077512f'/>
<id>530f506ac91338b55cf2be71b1cdf50cb077512f</id>
<content type='text'>
Automatically replace
tp_print -&gt; tp_vectorcall_offset
tp_compare -&gt; tp_as_async
tp_reserved -&gt; tp_as_async
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Automatically replace
tp_print -&gt; tp_vectorcall_offset
tp_compare -&gt; tp_as_async
tp_reserved -&gt; tp_as_async
</pre>
</div>
</content>
</entry>
<entry>
<title>bpo-36642: make unicodedata const (GH-12855)</title>
<updated>2019-04-16T23:40:34+00:00</updated>
<author>
<name>Inada Naoki</name>
<email>songofacandy@gmail.com</email>
</author>
<published>2019-04-16T23:40:34+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/cpython-git.git/commit/?id=6fec905de5c139017f36b212e54cac46959808fe'/>
<id>6fec905de5c139017f36b212e54cac46959808fe</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>closes bpo-32285: Add unicodedata.is_normalized. (GH-4806)</title>
<updated>2018-11-04T23:58:24+00:00</updated>
<author>
<name>Max Bélanger</name>
<email>aeromax@gmail.com</email>
</author>
<published>2018-11-04T23:58:24+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/cpython-git.git/commit/?id=2810dd7be9876236f74ac80716d113572c9098dd'/>
<id>2810dd7be9876236f74ac80716d113572c9098dd</id>
<content type='text'>

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>

</pre>
</div>
</content>
</entry>
<entry>
<title>bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (GH-1958)</title>
<updated>2018-06-15T12:03:14+00:00</updated>
<author>
<name>Wonsup Yoon</name>
<email>pusnow@me.com</email>
</author>
<published>2018-06-15T12:03:14+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/cpython-git.git/commit/?id=d134809cd3764c6a634eab7bb8995e3e2eff14d5'/>
<id>d134809cd3764c6a634eab7bb8995e3e2eff14d5</id>
<content type='text'>
The Hangul composition check used the wrong boundaries: the second
character must be in [0x1161, 0x1176), not [0x1161, 0x1176], and the third
character in (0x11A7, 0x11C3), not [0x11A7, 0x11C3].</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The Hangul composition check used the wrong boundaries: the second
character must be in [0x1161, 0x1176), not [0x1161, 0x1176], and the third
character in (0x11A7, 0x11C3), not [0x11A7, 0x11C3].</pre>
</div>
</content>
</entry>
<entry>
<title>update to Unicode 11.0.0 (closes bpo-33778) (GH-7439)</title>
<updated>2018-06-07T03:14:28+00:00</updated>
<author>
<name>Benjamin Peterson</name>
<email>benjamin@python.org</email>
</author>
<published>2018-06-07T03:14:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/cpython-git.git/commit/?id=7c69c1c0fba8c1c8ff3969bce4c1135736a4cc58'/>
<id>7c69c1c0fba8c1c8ff3969bce4c1135736a4cc58</id>
<content type='text'>
Also, standardize indentation of generated tables.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Also, standardize indentation of generated tables.</pre>
</div>
</content>
</entry>
<entry>
<title>Fix miscellaneous typos (#4275)</title>
<updated>2017-11-05T13:37:50+00:00</updated>
<author>
<name>luzpaz</name>
<email>luzpaz@users.noreply.github.com</email>
</author>
<published>2017-11-05T13:37:50+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/cpython-git.git/commit/?id=a5293b4ff2c1b5446947b4986f98ecf5d52432d4'/>
<id>a5293b4ff2c1b5446947b4986f98ecf5d52432d4</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>bpo-30736: upgrade to Unicode 10.0 (#2344)</title>
<updated>2017-06-23T05:31:08+00:00</updated>
<author>
<name>Benjamin Peterson</name>
<email>benjamin@python.org</email>
</author>
<published>2017-06-23T05:31:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/cpython-git.git/commit/?id=279a96206f3118a482d10826a1e32b272db4505d'/>
<id>279a96206f3118a482d10826a1e32b272db4505d</id>
<content type='text'>
Straightforward. While we're at it, though, strip trailing whitespace from generated tables.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Straightforward. While we're at it, though, strip trailing whitespace from generated tables.</pre>
</div>
</content>
</entry>
</feed>
