Collations with nondeterministic comparison

This adds a flag "deterministic" to collations. If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal. That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.

This functionality is only supported with the ICU provider. At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.

The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).

This patch makes changes in three areas:

- CREATE COLLATION DDL changes and system catalog changes to support
  this new flag.

- Many executor nodes and auxiliary code are extended to track
  collations. Previously, this code would just throw away collation
  information, because the eventually-called user-defined functions
  didn't use it since they only cared about equality, which didn't
  need collation information.

- String data type functions that do equality comparisons and hashing
  are changed to take the (non-)deterministic flag into account. For
  comparison, this just means skipping various shortcuts and tie
  breakers that use byte-wise comparison. For hashing, we first need
  to convert the input string to a canonical "sort key" using the ICU
  analogue of strxfrm().

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
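
As an illustration of the flag described in the commit message, here is a minimal sketch of how a nondeterministic collation might be created and used. It is not part of this commit's diff. It assumes a server built with ICU support; the collation name case_insensitive, the locale string 'und-u-ks-level2', and the table name ci_demo are choices made for this example.

```sql
-- Sketch only: requires a PostgreSQL build with ICU support.
-- The collation name and locale string are assumptions for this example.
CREATE COLLATION case_insensitive (
    provider = icu,
    locale = 'und-u-ks-level2',   -- compare at secondary strength, ignoring case
    deterministic = false
);

-- Equality comparison: with a nondeterministic collation, strings that
-- differ byte-wise can still compare as equal.
SELECT 'abc'::text = 'ABC'::text COLLATE case_insensitive AS same;

-- Grouping: strings that compare as equal must land in the same group,
-- which is why hashing has to work from a canonical sort key rather
-- than from the raw bytes.
CREATE TEMP TABLE ci_demo (val text COLLATE case_insensitive);
INSERT INTO ci_demo VALUES ('abc'), ('ABC'), ('xyz');
SELECT count(*) FROM ci_demo GROUP BY val;
```

On an ICU-enabled build the first query should report the two strings as equal, and the grouping query should treat 'abc' and 'ABC' as one group.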
author: Peter Eisentraut <peter@eisentraut.org> 2019-03-22 12:09:32 +0100
committer: Peter Eisentraut <peter@eisentraut.org> 2019-03-22 12:12:43 +0100
commit: 5e1963fb764e9cc092e0f7b58b28985c311431d9 (patch)
tree: 544492f24e3d48d00bd2a19c11663f84f1e18ce4 /src/test/regress/expected/subselect.out
parent: 2ab6d28d233af17987ea323e3235b2bda89b4f2e (diff)
download: postgresql-5e1963fb764e9cc092e0f7b58b28985c311431d9.tar.gz
1 files changed, 19 insertions, 0 deletions
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index fe5fc64480..4a54104182 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -746,6 +746,25 @@ select * from outer_7597 where (f1, f2) not in (select * from inner_7597);
 (2 rows)
 
 --
+-- Similar test case using text that verifies that collation
+-- information is passed through by execTuplesEqual() in nodeSubplan.c
+-- (otherwise it would error in texteq())
+--
+create temp table outer_text (f1 text, f2 text);
+insert into outer_text values ('a', 'a');
+insert into outer_text values ('b', 'a');
+insert into outer_text values ('a', null);
+insert into outer_text values ('b', null);
+create temp table inner_text (c1 text, c2 text);
+insert into inner_text values ('a', null);
+select * from outer_text where (f1, f2) not in (select * from inner_text);
+ f1 | f2 
+----+----
+ b  | a
+ b  | 
+(2 rows)
+
+--
 -- Test case for premature memory release during hashing of subplan output
 --
 select '1'::text in (select '1'::name union all select '1'::name);