MDEV-27266 Improve UCA collation performance for utf8mb3 and utf8mb4

Adding two levels of optimization:

1. For every byte pair [00..FF][00..FF] which:
  a. consists of two ASCII characters or makes a well-formed two-byte character
  b. whose total weight string fits into 4 weights
     (concatenated weight string in case of two ASCII characters,
     or a single weight string in case of a two-byte character)
  c. whose weight is context independent (i.e. does not depend on contractions
     or previous context pairs)
  store weights in a separate array of MY_UCA_2BYTES_ITEM,
  so during scanner_next() we can scan two bytes at a time.
  Byte pairs that do not match the conditions a-c are marked in this array
  as not applicable for optimization and scanned as before.

2. For every byte pair which is applicable for optimization in #1,
   and which produces only one or two weights, store
   weights in one more array of MY_UCA_WEIGHT2.
   So at the beginning of strnncoll*() we can skip equal prefixes
   using an even more efficient loop. This loop consumes two bytes at a time.
   The loop scans while the two bytes on both sides produce weight strings
   of equal length (i.e. one weight on both sides, or two weights on both sides).
   This allows efficient comparison of:
   - Context independent sequences consisting of two ASCII characters
   - Context independent 2-byte characters
   - Contractions consisting of two ASCII characters, e.g. Czech "ch".
   - Some tricky cases: "ss" vs "SHARP S"
     ("ss" produces two weights, 0xC39F also produces two weights)
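
The prefix-skipping loop from level 2 can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names toy_pair_entry, toy_pairs, toy_table_init() and toy_skip_equal_prefix() are hypothetical, the real layouts of MY_UCA_2BYTES_ITEM and MY_UCA_WEIGHT2 are not shown on this page, and the weights are made up (case-insensitive 'a'..'z' mapped to 1..26) rather than taken from UCA data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-pair entry; num_weights == 0 means "not eligible". */
typedef struct
{
  uint8_t  num_weights;   /* 0, 1 or 2 */
  uint16_t weight[2];     /* unused slots stay 0 */
} toy_pair_entry;

/* One entry for every byte pair [00..FF][00..FF]. */
static toy_pair_entry toy_pairs[256 * 256];

/* Made-up weights: letters compare case-insensitively, others get 0. */
static uint16_t toy_letter_weight(unsigned c)
{
  if (c >= 'a' && c <= 'z') return (uint16_t) (c - 'a' + 1);
  if (c >= 'A' && c <= 'Z') return (uint16_t) (c - 'A' + 1);
  return 0;
}

static void toy_table_init(void)
{
  unsigned hi, lo;
  toy_pair_entry *e;
  memset(toy_pairs, 0, sizeof(toy_pairs));
  /* Two ASCII letters: concatenate their weights (two weights in total). */
  for (hi = 0; hi < 128; hi++)
  {
    for (lo = 0; lo < 128; lo++)
    {
      uint16_t w1= toy_letter_weight(hi), w2= toy_letter_weight(lo);
      if (w1 && w2)
      {
        e= &toy_pairs[(hi << 8) | lo];
        e->num_weights= 2;
        e->weight[0]= w1;
        e->weight[1]= w2;
      }
    }
  }
  /* 0xC39F (SHARP S) also yields two weights, here equal to those of "ss". */
  e= &toy_pairs[(0xC3u << 8) | 0x9Fu];
  e->num_weights= 2;
  e->weight[0]= e->weight[1]= toy_letter_weight('s');
}

/*
  Return the number of leading bytes that can be skipped because both
  sides produce equal weights. Consumes two bytes per step and stops at
  the first pair that is not eligible, has a different weight count, or
  has different weights. A general scanner would take over from there.
*/
static size_t toy_skip_equal_prefix(const unsigned char *a, size_t alen,
                                    const unsigned char *b, size_t blen)
{
  size_t i= 0;
  size_t n= alen < blen ? alen : blen;
  while (i + 2 <= n)
  {
    const toy_pair_entry *ea= &toy_pairs[((size_t) a[i] << 8) | a[i + 1]];
    const toy_pair_entry *eb= &toy_pairs[((size_t) b[i] << 8) | b[i + 1]];
    if (ea->num_weights == 0 || ea->num_weights != eb->num_weights)
      break;
    if (ea->weight[0] != eb->weight[0] || ea->weight[1] != eb->weight[1])
      break;
    i+= 2;
  }
  return i;
}
```

With this toy table, "ss" and the two bytes 0xC3 0x9F are skipped as an equal prefix because both map to the same two weights, while any pair containing a non-letter byte falls out of the fast loop immediately.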
author: Alexander Barkov <bar@mariadb.com> 2022-02-25 13:54:59 +0400
committer: Oleksandr Byelkin <sanja@mariadb.com> 2022-08-10 15:04:50 +0200
commit: d8f172c11ce0483689335e61b72456621d8b544b (patch)
tree: 3b5dbfc8e7783e97db6390ebc335a78f0c5ed75f /unittest
parent: a0858b2cffb8152e3dd8dcdfea5ef07ef787989b (diff)
download: mariadb-git-d8f172c11ce0483689335e61b72456621d8b544b.tar.gz
1 files changed, 1 insertions, 1 deletions
diff --git a/unittest/strings/strings-t.c b/unittest/strings/strings-t.c
index 7532244b0a2..9636634fb8e 100644
--- a/unittest/strings/strings-t.c
+++ b/unittest/strings/strings-t.c
@@ -1341,7 +1341,7 @@ strnncollsp_char_one(CHARSET_INFO *cs, const STRNNCOLLSP_CHAR_PARAM *p)
   str2hex(ahex, sizeof(ahex), p->a.str, p->a.length);
   str2hex(bhex, sizeof(bhex), p->b.str, p->b.length);
   diag("%-25s %-12s %-12s %3d %7d %7d%s",
-       cs->cs_name.str, ahex, bhex, (int) p->nchars, p->res, res,
+       cs->coll_name.str, ahex, bhex, (int) p->nchars, p->res, res,
        eqres(res, p->res) ? "" : " FAILED");
   if (!eqres(res, p->res))
   {