path: root/strings/ctype-mb.c
Commit message (Author, Date, Files, Lines)
* A cleanup for MDEV-30695 Refactor case folding data types in Asian collations (Alexander Barkov, 2023-03-03, 1 file, -3/+3)
|     Adding "const" qualifiers to casefold_info_st::page
* MDEV-30695 Refactor case folding data types in Asian collations (Alexander Barkov, 2023-02-21, 1 file, -4/+4)
|     This is a non-functional change and should not change the server behavior.
|
|     Casefolding information is now stored in items of a new data type
|     MY_CASEFOLD_CHARACTER:
|
|       typedef struct casefold_info_char_t
|       {
|         uint32 toupper;
|         uint32 tolower;
|       } MY_CASEFOLD_CHARACTER;
|
|     Before this change, casefolding tables for Asian collations were stored in:
|
|       typedef struct unicase_info_char_st
|       {
|         uint32 toupper;
|         uint32 tolower;
|         uint32 sort;
|       } MY_UNICASE_CHARACTER;
|
|     The "sort" member was not used in the code handling Asian collations;
|     it only wasted space (it is only used by the Unicode _general_ci and
|     _general_mysql500_ci collations).
|
|     Unicode collations (at least UCA and _bin) should also be refactored
|     later, but under terms of a separate task.
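The space saving described in this commit can be sketched in a few lines of C. This is only an illustration: the two struct layouts are copied from the commit message (with `uint32` spelled as `uint32_t`), while `casefold_bytes_saved()` is a hypothetical helper that does not exist in the server.

```c
#include <stddef.h>
#include <stdint.h>

/* Old layout, as quoted in the commit message. */
typedef struct unicase_info_char_st
{
  uint32_t toupper;
  uint32_t tolower;
  uint32_t sort;     /* unused by the Asian collations */
} MY_UNICASE_CHARACTER;

/* New layout, as quoted in the commit message. */
typedef struct casefold_info_char_t
{
  uint32_t toupper;
  uint32_t tolower;
} MY_CASEFOLD_CHARACTER;

/* Hypothetical helper: bytes saved by a casefolding table of n entries. */
static size_t casefold_bytes_saved(size_t n)
{
  return n * (sizeof(MY_UNICASE_CHARACTER) - sizeof(MY_CASEFOLD_CHARACTER));
}
```

On common platforms, where neither struct needs padding, each entry shrinks from 12 to 8 bytes, so a 256-entry page would save 1024 bytes.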
* MDEV-30661 UPPER() returns an empty string for U+0251 in uca1400 collations for utf8 (Alexander Barkov, 2023-02-17, 1 file, -4/+4)
|     String length growth during upper/lower conversion in Unicode
|     collations depends only on the underlying MY_UNICASE_INFO used in
|     the collation.
|
|     Maintaining the separate members CHARSET_INFO::caseup_multiply and
|     CHARSET_INFO::casedn_multiply duplicated this information and caused
|     bugs like this one (when MY_UNICASE_INFO and case??_multiply went out
|     of sync because of incomplete CHARSET_INFO initialization).
|
|     Fix:
|
|     Changing CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply
|     from members to virtual functions. The virtual functions in Unicode
|     collations calculate case conversion growth factors from the
|     MY_UNICASE_INFO. This guarantees that the growth factors are always
|     in sync with the MY_UNICASE_INFO.
* Merge 10.5 into 10.6 (Marko Mäkelä, 2021-11-09, 1 file, -3/+3)
|\
| * Merge 10.4 into 10.5 (Marko Mäkelä, 2021-11-09, 1 file, -3/+3)
| |\
| | * Merge 10.2 into 10.3 (Marko Mäkelä, 2021-11-09, 1 file, -2/+2)
| | |\
| | | * MDEV-24335 Unexpected question mark in the end of a TINYTEXT column (Alexander Barkov, 2021-11-02, 1 file, -2/+2)
| | | |     my_copy_fix_mb() passed MIN(src_length,dst_length) to
| | | |     my_append_fix_badly_formed_tail(). This could break a multi-byte
| | | |     character in the middle, which put a question mark into the
| | | |     destination.
| | | |
| | | |     Fixing the code to pass the true src_length to
| | | |     my_append_fix_badly_formed_tail().
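The failure mode in MDEV-24335 can be reproduced with a toy copier. The sketch below is not the server code: `utf8_len()` and `toy_copy_fix()` are hypothetical stand-ins that assume UTF-8 input and only mimic the idea of "copy whole characters, replace a broken tail with a question mark".

```c
#include <stddef.h>

/* Byte length of a UTF-8 character, judged from its lead byte only
   (toy check: continuation bytes are not validated). */
static size_t utf8_len(unsigned char c)
{
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 0;                           /* invalid lead byte */
}

/* Hypothetical copier: a character cut off by the end of src is treated
   as badly formed and becomes '?'; a whole character that does not fit
   into dst simply stops the copy. Returns the number of bytes written. */
static size_t toy_copy_fix(char *dst, size_t dst_len,
                           const char *src, size_t src_len)
{
  size_t in = 0, out = 0;
  while (in < src_len && out < dst_len)
  {
    size_t n = utf8_len((unsigned char) src[in]);
    if (n == 0 || in + n > src_len)
    {
      dst[out++] = '?';               /* looks broken in the source */
      in++;
      continue;
    }
    if (out + n > dst_len)
      break;                          /* well-formed, but no room left */
    for (size_t i = 0; i < n; i++)
      dst[out++] = src[in++];
  }
  return out;
}
```

With the 4-byte source `"ab\xC3\xA9"` and a 3-byte destination, passing the truncated length 3 (the MIN of both lengths) makes the final two-byte character look cut off, so the result is `ab?`. Passing the true source length 4 lets the copier see a complete character that does not fit, and the result is `ab`.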
* | | | MDEV-26669 Add MY_COLLATION_HANDLER functions min_str() and max_str() [bb-10.6-bar-MDEV-26669] (Alexander Barkov, 2021-09-27, 1 file, -0/+41)
|/ / /
* | | Merge branch '10.4' into 10.5 (Sujatha, 2020-09-29, 1 file, -1/+3)
|\ \ \
| |/ /
| * | Merge branch '10.2' into 10.3 (Sujatha, 2020-09-28, 1 file, -2/+4)
| |\ \
| | |/
| | * MDEV-22387: Do not violate __attribute__((nonnull)) (Marko Mäkelä, 2020-09-23, 1 file, -2/+4)
| | |     Passing a null pointer to a nonnull argument is not only undefined
| | |     behaviour, but it also grants the compiler the permission to optimize
| | |     away further checks whether the pointer is null. GCC -O2 at least
| | |     starting with version 8 may do that, potentially causing SIGSEGV.
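One common way to stay within the nonnull contract is to skip the call when there is nothing to copy. The sketch below is an assumption about the general pattern, not the actual patch: `copy_if_any()` is a hypothetical wrapper around `memcpy()`, whose pointer arguments must not be null in standard C even when the length is zero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: never hands a possibly-null pointer to memcpy().
   When n is 0 the call is skipped, so the compiler cannot infer from the
   memcpy() call that src or dst is non-null. */
static void *copy_if_any(void *dst, const void *src, size_t n)
{
  if (n)
    memcpy(dst, src, n);
  return dst;
}
```

A caller that may hold an empty string represented as `(NULL, 0)` can then write `copy_if_any(buf, str, len)` and keep any later `if (str == NULL)` check meaningful.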
* | | MDEV-21581 Helper functions and methods for CHARSET_INFO (Alexander Barkov, 2020-01-28, 1 file, -33/+26)
|/ /
* | Merge 10.2 into 10.3 (Marko Mäkelä, 2019-05-14, 1 file, -1/+1)
|\ \
| |/
| * Merge 10.1 into 10.2 (Marko Mäkelä, 2019-05-13, 1 file, -1/+1)
| |\
| | * Merge branch '5.5' into 10.1 (Vicențiu Ciorbaru, 2019-05-11, 1 file, -1/+1)
| | |\
| | | * Update FSF Address (Vicențiu Ciorbaru, 2019-05-11, 1 file, -1/+1)
| | | |     * Update wrong zip-code
* | | | Merge 10.2 into 10.3 (Marko Mäkelä, 2018-08-03, 1 file, -86/+14)
|\ \ \ \
| |/ / /
| * | | Merge 10.1 into 10.2 (Marko Mäkelä, 2018-08-02, 1 file, -86/+14)
| |\ \ \
| | |/ /
| | * | Merge branch '10.0' into bb-10.1-merge-sanja (Oleksandr Byelkin, 2018-07-25, 1 file, -86/+14)
| | |\ \
| | | * | Simplify caseup() and casedn() in charsets (Alexander Barkov, 2018-07-19, 1 file, -86/+14)
| | | | |     After the MDEV-13118 fix there's no code in the server that wants
| | | | |     caseup/casedn to change the argument in place for simple charsets.
| | | | |     Let's remove this logic and always return the result in a new string
| | | | |     for all charsets, both simple and complex.
| | | | |
| | | | |     1. Removing the optimization that *some* character sets used in
| | | | |        casedn() and caseup(), which allowed (and required) to change
| | | | |        the case in-place, overwriting the string passed as the "src"
| | | | |        argument. Now all CHARSET_INFO's work in the same way: none of
| | | | |        them change the source string in-place, all of them now convert
| | | | |        case from the source string to the destination string, leaving
| | | | |        the source string untouched.
| | | | |
| | | | |     2. Adding "const" qualifier to the "char *src" parameter to
| | | | |        caseup() and casedn().
| | | | |
| | | | |     3. Removing duplicate implementations in ctype-mb.c. Now both
| | | | |        caseup() and casedn() implementations for all CJK character sets
| | | | |        use internally the same function my_casefold_mb() (the former
| | | | |        my_casefold_mb_varlen()).
| | | | |
| | | | |     4. Removing the "unused" attribute from parameters of some
| | | | |        my_case{up|dn}_xxx() implementations, as the affected parameters
| | | | |        are now *used* in the code. Previously these parameters were
| | | | |        used only in DBUG_ASSERT().
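The calling convention described in items 1 and 2 (a `const` source, a separate destination, no in-place modification) can be illustrated with an ASCII-only toy. `toy_caseup()` is a hypothetical function for this sketch and does not reproduce the real `caseup()` handler signature.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical ASCII-only upper-casing: reads from a const source and
   writes to a separate destination buffer, leaving src untouched.
   Returns the number of bytes written. */
static size_t toy_caseup(const char *src, size_t srclen,
                         char *dst, size_t dstlen)
{
  size_t n = srclen < dstlen ? srclen : dstlen;
  for (size_t i = 0; i < n; i++)
    dst[i] = (char) toupper((unsigned char) src[i]);
  return n;
}
```

Because `src` is `const`, a caller can pass a string literal or a shared buffer, for example `toy_caseup("abc", 3, out, sizeof(out))`, without the function being able to overwrite it.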
* | | | | Misc. typos (luz.paz, 2018-04-05, 1 file, -1/+1)
|/ / / /
| | | |     Found via `codespell -i 3 -w --skip="./debian/po" -I ../mariadb-server-word-whitelist.txt ./cmake/ ./debian/ ./Docs/ ./include/ ./man/ ./plugin/ ./strings/`
* | | | Fix and reenable Windows compiler warning C4800 (size_t conversion). (Vladislav Vaintroub, 2018-01-26, 1 file, -2/+2)
| | | |
* | | | MDEV-14350 Index use with collation utf8mb4_unicode_nopad_ci on LIKE pattern with wrong results (Alexander Barkov, 2017-12-08, 1 file, -5/+5)
| | | |
* | | | MDEV-13384 - misc Windows warnings fixed (Vladislav Vaintroub, 2017-09-28, 1 file, -1/+1)
| | | |
* | | | MDEV-7769 MY_CHARSET_INFO refactoring# On branch 10.2 (Alexander Barkov, 2016-10-10, 1 file, -22/+0)
| | | |     Part 3 (final): removing MY_CHARSET_HANDLER::well_formed_len().
* | | | MDEV-9711 NO PAD collations (Alexander Barkov, 2016-09-06, 1 file, -20/+50)
| | | |     Based on the patch from Daniil Medvedev (a Google Summer of Code task)
* | | | MDEV-6353 my_ismbchar() and my_mbcharlen() refactoring (Alexander Barkov, 2016-05-17, 1 file, -1/+1)
| | | |
* | | | Removing my_strnncoll_mb_bin() and my_strnncollsp_mb_bin(), as they are not used any more. (Alexander Barkov, 2016-03-25, 1 file, -87/+0)
| | | |     We now use function templates from strcoll.ic instead.
* | | | Fixing compilation warnings introduced in: (Alexander Barkov, 2016-03-23, 1 file, -2/+3)
| | | |     > commit e09299511e83f11f7476f7ea6c81ee12b00d7050
| | | |     > Author: Alexander Barkov <bar@mariadb.org>
| | | |     > Date: Wed Mar 16 10:55:12 2016 +0400
| | | |     >
| | | |     > MDEV-9665 Remove cs->cset->ismbchar()
| | | |     > Using a more powerful cs->cset->charlen() instead.
* | | | MDEV-9665 Remove cs->cset->ismbchar() (Alexander Barkov, 2016-03-16, 1 file, -3/+2)
|/ / /
| | |     Using a more powerful cs->cset->charlen() instead.
* | | Adding MY_CHARSET_HANDLER::native_to_mb(). (Alexander Barkov, 2015-08-14, 1 file, -19/+2)
| | |     This is a pre-requisite patch for:
| | |     - MDEV-8433 Make field<'broken-string' use indexes
| | |     - MDEV-8625 Bad result set with ignorable characters when using a prefix key
| | |     - MDEV-8626 Bad result set with contractions when using a prefix key
* | | MDEV-8215 Asian MB3 charsets: compare broken bytes as "greater than any non-broken character" (Alexander Barkov, 2015-07-03, 1 file, -16/+0)
| | |
* | | MDEV-6566 Different INSERT behaviour on bad bytes with and without character set conversion (Alexander Barkov, 2015-03-13, 1 file, -14/+84)
| | |
* | | A preparatory patch for MDEV-6566. (Alexander Barkov, 2015-03-02, 1 file, -0/+23)
| | |     Adding a new virtual function MY_CHARSET_HANDLER::copy_abort().
| | |     Moving character set specific code into the corresponding
| | |     implementations (for simple, multi-byte and mbmaxlen>1 character sets).
* | | cleanup: s/const CHARSET_INFO/CHARSET_INFO/ (Sergei Golubchik, 2014-12-04, 1 file, -1/+1)
|/ /
| |     as CHARSET_INFO is already const, using const on it is redundant
| |     and results in compiler warnings (on Windows)
* | MDEV-7086 main.ctype_cp932 fails in buildbot on a valgrind build (Alexander Barkov, 2014-11-18, 1 file, -2/+2)
| |     Removing a redundant and wrong condition which could access beyond
| |     the pattern string range.
* | 5.5.40+ merge (Sergei Golubchik, 2014-10-09, 1 file, -2/+2)
|\ \
| |/
| * mysql-5.5.40 (Sergei Golubchik, 2014-10-06, 1 file, -2/+2)
| |\
| | * Bug #11755818 : LIKE DOESN'T MATCH WHEN CP932_BIN/SJIS_BIN COLLATIONS ARE USED. (mithun, 2014-08-12, 1 file, -2/+2)
| | |     ISSUE :
| | |     -------
| | |     Code points of HALF WIDTH KATAKANA in SJIS/CP932 range from A1 to DF.
| | |     In function my_wildcmp_mb_bin_impl, while comparing such single-byte
| | |     code points, there is code which compares a signed character with an
| | |     unsigned character. Because of this, comparisons of two identical
| | |     code points representing a HALF WIDTH KATAKANA character always fail.
| | |
| | |     Solution:
| | |     ---------
| | |     A code point of HALF WIDTH KATAKANA needs at least 8 bits.
| | |     Promoting the variable from uchar to int will fix the issue.
| | |
| | |     mysql-test/t/ctype_cp932.test:
| | |       Tests which have conditions
| | |       LIKE 'STRING PATTERN WITH HALF WIDTH KATAKANA'.
| | |     strings/ctype-mb.c:
| | |       A code point of HALF WIDTH KATAKANA needs at least 8 bits.
| | |       Promoting the variable from uchar to int will fix the issue.
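The signed/unsigned mismatch described above can be shown in isolation. The sketch below is a simplified model, not the body of `my_wildcmp_mb_bin_impl`: the two hypothetical functions only differ in the type of the temporary `cmp`, and `signed char` is used explicitly so the behaviour does not depend on whether plain `char` is signed on the build platform.

```c
/* Model of the buggy comparison: the pattern byte is stored in an
   unsigned char (0xB1 becomes 177), while the subject byte is read as a
   signed char (0xB1 becomes -79). After integer promotion the two values
   differ, so identical bytes never match. */
static int match_with_uchar(const signed char *str, const signed char *wild)
{
  unsigned char cmp = (unsigned char) *wild;
  return *str == cmp;
}

/* Model of the fix: the pattern byte is stored in an int, so it keeps
   the same signed value as the subject byte and the comparison succeeds. */
static int match_with_int(const signed char *str, const signed char *wild)
{
  int cmp = *wild;
  return *str == cmp;
}
```

For a byte such as 0xB1 (a half-width katakana code point in SJIS/CP932), `match_with_uchar()` reports a mismatch against itself while `match_with_int()` reports a match. Bytes below 0x80 behave identically in both versions, which is why the bug only showed up for this character range.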
| | * Bug #17760379 COLLATIONS WITH CONTRACTIONS BUFFER-OVERFLOW THEMSELVES IN THE FOOT (Venkata Sidagam, 2014-01-11, 1 file, -1/+1)
| | |     Description: A typo in create_tailoring() causes the
| | |     "contraction_flags" to be written into cs->contractions in the wrong
| | |     place. This causes two problems:
| | |     (1) Anyone relying on `contraction_flags` to decide "could this
| | |         character be part of a contraction" is 100% broken.
| | |     (2) Anyone relying on `contractions` to determine the weight of a
| | |         contraction is mostly broken.
| | |
| | |     Analysis: When we are preparing the contraction in create_tailoring(),
| | |     we are corrupting the cs->contractions memory location which is
| | |     supposed to store the weights (8k) + contraction information
| | |     (256 bytes). We started storing the contraction information after
| | |     the 4k location. This is because of a logic flaw in the code.
| | |
| | |     Fix: When we create the contractions, we need to calculate the
| | |     contraction flags address as (char*) (cs->contractions + 0x40*0x40)
| | |     instead of ((char*) cs->contractions) + 0x40*0x40. This moves the
| | |     address 8k bytes past "cs->contractions" and stores the contraction
| | |     information from there. Similarly, when we are calculating it for
| | |     LIKE range queries, we need to calculate it from the 8k bytes
| | |     onwards; this can be done by changing the logic to
| | |     (const char*) (cs->contractions + 0x40*0x40). And for ucs2 charsets
| | |     we need to modify my_cs_can_be_contraction_head() and
| | |     my_cs_can_be_contraction_tail() to point to the 8k+ locations.
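The 4k versus 8k difference comes purely from where the cast sits relative to the pointer addition. The sketch below assumes that `cs->contractions` points to 16-bit weights, which is consistent with the "weights (8k)" figure for 0x40*0x40 entries but is an assumption here. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Cast first, then add: the offset is counted in bytes, so the result
   lies 0x40*0x40 = 4096 bytes past the start (inside the weights). */
static size_t flags_offset_cast_first(const uint16_t *contractions)
{
  const char *p = ((const char *) contractions) + 0x40 * 0x40;
  return (size_t) (p - (const char *) contractions);
}

/* Add first, then cast: the offset is counted in 16-bit elements, so the
   result lies 0x40*0x40*2 = 8192 bytes past the start (after the weights). */
static size_t flags_offset_add_first(const uint16_t *contractions)
{
  const char *p = (const char *) (contractions + 0x40 * 0x40);
  return (size_t) (p - (const char *) contractions);
}
```

Given a buffer large enough for 0x40*0x40 weights plus the flag bytes, the first function returns 4096 and the second returns 8192, matching the "4k location" and "8k bytes" mentioned in the analysis.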
* | | MDEV-6255 DUPLICATE KEY Errors on SELECT .. GROUP BY that uses temporary and filesort. (Michael Widenius, 2014-09-11, 1 file, -3/+5)
| | |     The problem was that my_hash_sort didn't properly delete end-space
| | |     characters, so strings that should compare identically were seen as
| | |     different strings. (Space was handled correctly, but not NBSP.)
| | |     This caused duplicate key errors when a heap table was converted to
| | |     Aria as part of overflow in group by.
| | |
| | |     Fixed by removing all characters that compare as end space when
| | |     creating a hash.
| | |
| | |     Other things:
| | |     - Fixed that --sorted_results also works for errors in mysqltest.
| | |     - Speed up hash by not comparing strings that have different hashes.
| | |     - Speed up many my_hash_sort functions by using registers to
| | |       calculate hash instead of pointers. This was previously done for
| | |       some functions, but not for all.
| | |     - Made a macro of the hash function, to simplify code and to be able
| | |       to experiment with new hash functions.
| | |
| | |     client/mysqltest.cc:
| | |       Fixed that --sorted_results also works for error messages.
| | |     mysql-test/r/ctype_partitions.result:
| | |       New test to ensure that partitions on hash works
| | |     mysql-test/suite/multi_source/gtid.result:
| | |       Updated result
| | |     mysql-test/suite/multi_source/gtid.test:
| | |       Test that --sorted_result works for error messages
| | |     mysql-test/suite/multi_source/gtid_ignore_duplicates.result:
| | |       Updated result
| | |     mysql-test/suite/multi_source/gtid_ignore_duplicates.test:
| | |       Updated result
| | |     mysql-test/suite/multi_source/load_data.result:
| | |       Updated result
| | |     mysql-test/suite/multi_source/load_data.test:
| | |       Updated result
| | |     mysql-test/t/ctype_partitions.test:
| | |       New test to ensure that partitions on hash works
| | |     storage/heap/hp_write.c:
| | |       Speed up hash by not comparing strings that have different hashes.
| | |     storage/maria/ma_check.c:
| | |       Extra debug
| | |     strings/ctype-bin.c:
| | |       Use macro for hash function
| | |     strings/ctype-latin1.c:
| | |       Use macro for hash function
| | |       Use registers to calculate hash (speedup)
| | |     strings/ctype-mb.c:
| | |       Use macro for hash function
| | |       Use registers to calculate hash (speedup)
| | |     strings/ctype-simple.c:
| | |       Use macro for hash function
| | |       Use same variable names as in other my_hash_sort functions.
| | |       Update my_hash_sort_simple() to properly remove end space (patch by Bar)
| | |     strings/ctype-uca.c:
| | |       Ignore duplicated space inside strings and end space in
| | |       my_hash_sort_uca(). This fixed MDEV-6255
| | |       Use macro for hash function
| | |       Use registers to calculate hash (speedup)
| | |     strings/ctype-ucs2.c:
| | |       Use macro for hash function
| | |       Use registers to calculate hash (speedup)
| | |     strings/ctype-utf8.c:
| | |       Use macro for hash function
| | |       Use registers to calculate hash (speedup)
| | |     strings/strings_def.h:
| | |       Made a macro of the hash function, to simplify code and to be able
| | |       to experiment with new hash functions.
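The idea of "removing all characters that compare as end space when creating a hash" can be sketched for a single-byte charset as follows. Everything here is an assumption made for illustration: `toy_hash_sort()` is a hypothetical function, the `sort_order` table is supplied by the caller, and the multiply-by-33 hash is a stand-in rather than the server's hash macro.

```c
#include <stddef.h>

/* Hypothetical hash for a single-byte charset. sort_order maps each byte
   to its collation weight. Trailing bytes whose weight equals the weight
   of ' ' are dropped before hashing, so keys that compare equal under
   pad-space rules also hash equally. */
static unsigned long toy_hash_sort(const unsigned char *sort_order,
                                   const unsigned char *key, size_t len)
{
  unsigned long h = 5381;

  /* Strip everything that sorts like a space, not only 0x20 itself. */
  while (len && sort_order[key[len - 1]] == sort_order[' '])
    len--;

  /* Hash the weights, not the raw bytes. */
  for (size_t i = 0; i < len; i++)
    h = h * 33 + sort_order[key[i]];
  return h;
}
```

With a table in which 0xA0 (NBSP in latin1-style encodings) has the same weight as a space, the keys `"ab"` and `"ab\xA0 "` produce the same hash. A version that only stripped literal 0x20 bytes would stop at the NBSP and hash them differently, which is the kind of mismatch the commit describes.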
* | | minor cleanup (Sergei Golubchik, 2014-03-01, 1 file, -5/+3)
| | |
* | | 10.0-base merge (Sergei Golubchik, 2014-02-26, 1 file, -2/+2)
|\ \ \
| |/ /
| * | MySQL-5.5.36 merge (Sergei Golubchik, 2014-02-17, 1 file, -2/+2)
| |\ \
| | |/
| | |     (without few incorrect bugfixes and with 1250 files where only a copyright year was changed)
| | * Updated/added copyright headers (Murthy Narkedimilli, 2014-01-06, 1 file, -1/+1)
| | |
* | | MDEV-5163 Merge WEIGHT_STRING function from MySQL-5.6 (Alexander Barkov, 2013-10-23, 1 file, -9/+100)
| | |
* | | MDEV-4928 Merge collation customization improvements (Alexander Barkov, 2013-10-02, 1 file, -17/+17)
| | |     Merging the following MySQL-5.6 changes:
| | |     - WL#5624: Collation customization improvements
| | |       http://dev.mysql.com/worklog/task/?id=5624
| | |     - WL#4013: Unicode german2 collation
| | |       http://dev.mysql.com/worklog/task/?id=4013
| | |     - Bug#62429 XML: ExtractValue, UpdateXML max arg length 127 chars
| | |       http://bugs.mysql.com/bug.php?id=62429
| | |       (required by WL#5624)
* | | 10.0-monty merge (Sergei Golubchik, 2013-07-21, 1 file, -3/+3)
|\ \ \
| |/ /
|/| |
| | |     includes:
| | |     * remove some remnants of "Bug#14521864: MYSQL 5.1 TO 5.5 BUGS PARTITIONING"
| | |     * introduce LOCK_share, now LOCK_ha_data is strictly for engines
| | |     * rea_create_table() always creates .par file (even in "frm-only" mode)
| | |     * fix a 5.6 bug, temp file leak on dummy ALTER TABLE
| * | Temporary commit of 10.0-merge (Michael Widenius, 2013-03-26, 1 file, -3/+3)
| | |
* | | mysql-5.5.31 merge (Sergei Golubchik, 2013-05-07, 1 file, -12/+39)
|\ \ \
| |/ /
|/| /
| |/