Fallback Collator
=================

The zope.i18n.interfaces.locales.ICollator interface defines an API
for collating text.  Why is this important?  Simply sorting unicode
strings doesn't provide an ordering that users in a given locale will
find useful.  Various languages have text sorting conventions that
don't agree with the ordering of unicode code points. (This is even
true for English. :)
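
As a small illustration of the point about English, plain code point
sorting places every uppercase letter before every lowercase one.  The
snippet below uses only Python's built-in ``sorted`` and is not part of
zope.i18n:

```python
# Sorting by code point: 'S' (U+0053) sorts before 'a' (U+0061),
# so capitalized names are grouped ahead of lowercase ones.
names = ["Sam", "sally", "Abe", "alice", "Terry", "tim"]
by_code_point = sorted(names)
print(by_code_point)
# ['Abe', 'Sam', 'Terry', 'alice', 'sally', 'tim']
```

A reader scanning this list for "alice" would not expect to find it
after "Terry".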

Text collation is a fairly involved process.  Systems that need this
will likely use something like ICU
(http://www-306.ibm.com/software/globalization/icu,
http://pyicu.osafoundation.org/).  We don't want to introduce a
dependency on ICU at this time, so we are providing a fallback
collator that:

- Provides an implementation of the ICollator interface that can be
  used for development, and

- Provides a small amount of value, at least for English speakers. :)

Application code should obtain a collator by adapting a locale to
ICollator.  Here we just call the collator factory with None. The
fallback collator doesn't actually use the locale, although
application code should certainly *not* count on this.

    >>> import zope.i18n.locales.fallbackcollator
    >>> collator = zope.i18n.locales.fallbackcollator.FallbackCollator(None)

Now we can pass the collator's key method to sorting functions to
order strings in a slightly friendlier way:

    >>> sorted([u"Sam", u"sally", u"Abe", u"alice", u"Terry", u"tim"],
    ...        key=collator.key)
    [u'Abe', u'alice', u'sally', u'Sam', u'Terry', u'tim']


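The collator has a very simple algorithm.  It normalizes a string and
then returns a two-element tuple: the lower-cased normalized string,
followed by the normalized string itself.  The lower-cased element
gives a case-insensitive primary ordering, and the second element
breaks ties between strings that differ only in case.  We can see this
by calling the key method, which converts unicode strings to collation
keys: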

    >>> collator.key(u"Sam")
    (u'sam', u'Sam')

    >>> collator.key(u"\xc6\xf8a\u030a")
    (u'\xe6\xf8\xe5', u'\xc6\xf8\xe5')
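
The algorithm described above can be sketched in a few lines.  This is
a minimal illustration, not the package's code: the name ``sketch_key``
is hypothetical, and the choice of NFKC normalization is an assumption
(the text does not say which normalization form is used), picked
because it reproduces the keys shown above.

```python
import unicodedata


def sketch_key(text):
    """Hypothetical stand-in for a fallback collation key."""
    # Assumption: NFKC normalization. It composes 'a' + U+030A
    # (combining ring above) into the single character U+00E5.
    normalized = unicodedata.normalize("NFKC", text)
    # Lower-cased form first for case-insensitive ordering,
    # normalized form second to break ties.
    return (normalized.lower(), normalized)


names = ["Sam", "sally", "Abe", "alice", "Terry", "tim"]
print(sketch_key("Sam"))
print(sorted(names, key=sketch_key))
```

Because tuples compare element by element, the second element only
matters when two strings are equal after lower-casing.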

There is also a cmp function for comparing strings:

    >>> collator.cmp(u"Terry", u"sally")
    1


    >>> collator.cmp(u"sally", u"Terry")
    -1

    >>> collator.cmp(u"terry", u"Terry")
    1

    >>> collator.cmp(u"terry", u"terry")
    0
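
The results above are consistent with a comparison that simply
compares collation keys.  The sketch below works under that
assumption; ``sketch_key`` and ``sketch_cmp`` are hypothetical names,
and the NFKC normalization form is likewise assumed rather than taken
from the text.

```python
import functools
import unicodedata


def sketch_key(text):
    """Hypothetical collation key: (lower-cased, normalized)."""
    # Assumption: NFKC is the normalization form.
    normalized = unicodedata.normalize("NFKC", text)
    return (normalized.lower(), normalized)


def sketch_cmp(text1, text2):
    """Return -1, 0, or 1 by comparing the two collation keys."""
    key1, key2 = sketch_key(text1), sketch_key(text2)
    # Booleans subtract as integers, giving a three-way result.
    return (key1 > key2) - (key1 < key2)


print(sketch_cmp("Terry", "sally"))
print(sketch_cmp("terry", "Terry"))

# A cmp-style function can still drive sorted() via cmp_to_key.
names = ["Sam", "sally", "Abe", "alice", "Terry", "tim"]
print(sorted(names, key=functools.cmp_to_key(sketch_cmp)))
```

Note that "terry" compares greater than "Terry" because the keys tie on
the lower-cased element and the lowercase "t" has the higher code point.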