Fallback Collator
=================
The zope.i18n.interfaces.locales.ICollator interface defines an API
for collating text. Why is this important? Simply sorting unicode
strings by code point doesn't provide an ordering that users in a
given locale will find useful. Various languages have text sorting
conventions that don't agree with the ordering of unicode code
points. (This is even true for English. :)
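
To make the problem concrete, here is a small sketch of what a plain
code-point sort does to a list of English names. It is written for
Python 3, so it uses plain string literals rather than the u''
literals found elsewhere in this document. Every upper-case letter has
a lower code point than every lower-case letter, so the capitalized
names all come first:

```python
names = ['Sam', 'sally', 'Abe', 'alice', 'Terry', 'tim']

# sorted() without a key compares strings code point by code point.
by_code_point = sorted(names)
print(by_code_point)
# ['Abe', 'Sam', 'Terry', 'alice', 'sally', 'tim']
```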

Text collation is a fairly involved process. Systems that need it
will likely use something like ICU
(http://www-306.ibm.com/software/globalization/icu,
http://pyicu.osafoundation.org/). We don't want to introduce a
dependency on ICU at this time, so we are providing a fallback
collator that:

- Provides an implementation of the ICollator interface that can be
  used for development, and

- Provides a small amount of value, at least for English speakers. :)

Application code should obtain a collator by adapting a locale to
ICollator. Here we just call the collator factory with None. The
fallback collator doesn't actually use the locale, although
application code should certainly *not* count on this.

>>> import zope.i18n.locales.fallbackcollator
>>> collator = zope.i18n.locales.fallbackcollator.FallbackCollator(None)

Now we can pass the collator's key method to sorting functions to
order strings in a slightly friendlier way:

>>> sorted([u'Sam', u'sally', u'Abe', u'alice', u'Terry', u'tim'],
...        key=collator.key)
[u'Abe', u'alice', u'sally', u'Sam', u'Terry', u'tim']

The collator has a very simple algorithm. It normalizes the string
and then returns a tuple containing the lower-cased normalized string
followed by the normalized string itself. The first element makes the
comparison case-insensitive; the second breaks ties between strings
that differ only in case. We can see this by calling the key method,
which converts unicode strings to collation keys:

>>> collator.key(u'Sam')
(u'sam', u'Sam')
>>> collator.key(u'\xc6\xf8a\u030a')
(u'\xe6\xf8\xe5', u'\xc6\xf8\xe5')
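
The algorithm described above can be sketched in a few lines of
Python 3. This is an illustration, not the FallbackCollator source:
the function name fallback_key is hypothetical, and the choice of
NFKC as the normalization form is an assumption, since the text here
only says that strings are normalized. Under that assumption the
sketch yields the same keys as the examples above:

```python
import unicodedata


def fallback_key(text):
    # Assumption: NFKC normalization; the document does not name the form.
    normalized = unicodedata.normalize('NFKC', text)
    # Lower-cased form first (case-insensitive order), then the
    # normalized string itself as a tie-breaker.
    return (normalized.lower(), normalized)


names = ['Sam', 'sally', 'Abe', 'alice', 'Terry', 'tim']
ordered = sorted(names, key=fallback_key)
print(fallback_key('Sam'))   # ('sam', 'Sam')
print(ordered)               # ['Abe', 'alice', 'sally', 'Sam', 'Terry', 'tim']
```

Normalizing first means that 'a' followed by a combining ring
(U+030A) and the precomposed character U+00E5 produce the same key.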
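
There is also a cmp method for comparing two strings. It returns -1,
0, or 1 depending on whether the first string sorts before, equal to,
or after the second:

>>> collator.cmp(u'Terry', u'sally')
1
>>> collator.cmp(u'sally', u'Terry')
-1
>>> collator.cmp(u'terry', u'Terry')
1
>>> collator.cmp(u'terry', u'terry')
0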
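
Python 3 has no built-in cmp function, so a comparison like this has
to be written out by hand. The sketch below assumes that the
comparison is simply a three-way comparison of the two collation
keys, which is consistent with the results above but is not taken
from the FallbackCollator source. The names fallback_key and
fallback_cmp are hypothetical, and NFKC is again an assumed
normalization form:

```python
import functools
import unicodedata


def fallback_key(text):
    # Assumption: NFKC normalization; the document does not name the form.
    normalized = unicodedata.normalize('NFKC', text)
    return (normalized.lower(), normalized)


def fallback_cmp(a, b):
    # Three-way comparison of the collation keys: -1, 0, or 1.
    key_a, key_b = fallback_key(a), fallback_key(b)
    return (key_a > key_b) - (key_a < key_b)


print(fallback_cmp('Terry', 'sally'))   # 1
print(fallback_cmp('terry', 'Terry'))   # 1 (tie broken by the original case)

# A cmp-style function can still drive sorted() via functools.cmp_to_key.
names = ['Sam', 'sally', 'Abe', 'alice', 'Terry', 'tim']
by_cmp = sorted(names, key=functools.cmp_to_key(fallback_cmp))
print(by_cmp)
```

In new code, passing the key function to sorted() directly is usually
simpler than wrapping a comparison function.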