path: root/lib/stdlib/uc_spec/gen_unicode_mod.escript
Commit message (Author, Age, Files, Lines)
* Update Unicode to version 15.0.0 (Maintenance App, 2023-03-06, 1 file, -1/+1)
|   This is an automated commit created by the Maintenance project https://github.com/eksperimental/maintenance
|   Before merging, please read the release notes by visiting <http://www.unicode.org/versions/Unicode15.0.0/> and assess if additional changes are necessary in the code base.
* Update script to take 'update_tests' argument (Dan Gudmundsson, 2023-03-06, 1 file, -10/+22)
|   Make it possible to update the binary test file when updating the version.
* string, unicode_util: Stricten tests for integers (Björn Gustavsson, 2022-06-28, 1 file, -28/+34)
|
* Improve guards and bad list input (Dan Gudmundsson, 2022-06-23, 1 file, -12/+14)
|   Do not return bad codepoints such as -1.
|   Improve the guards and check that the code raises errors for bad input in list strings.
* Adds codepoint category to lookup/1 (Dan Gudmundsson, 2022-06-23, 1 file, -39/+227)
|   Category can be useful to user programs, such as in terminal handling.
|   With these two commits, loaded code size increases by 21% and BEAM file size increases by 23%.
* Add a utility is_wide(grapheme cluster) function (Dan Gudmundsson, 2022-06-09, 1 file, -10/+96)
|   From a non-East Asian perspective we can limit the number of wide codepoints to ~120 ranges. We lose a lot of 'width' information but can keep the file size small. It should be OK to increase the file size by that amount.
|   This is useful when editing fixed-width characters to count the number of columns displayed. Wide characters should take 2 columns in a standard terminal.
|   Emoji presentation sequences are not validated; if the presentation selector is included, the sequence is assumed to be correct and wide.
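The column-counting idea in this commit can be illustrated with a minimal Python sketch. This is not the Erlang `is_wide` implementation: it assumes that "wide" means the East Asian Width classes W and F from Python's `unicodedata` tables, and the helper names `is_wide` and `display_columns` are hypothetical.

```python
import unicodedata


def is_wide(ch: str) -> bool:
    """True if the code point is East Asian Wide (W) or Fullwidth (F)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def display_columns(text: str) -> int:
    """Approximate terminal columns: 2 per wide code point, 1 otherwise.

    Simplification (assumption): counts per code point rather than per
    grapheme cluster, and gives combining marks and emoji presentation
    sequences no special treatment.
    """
    return sum(2 if is_wide(ch) else 1 for ch in text)
```

A full solution would first split the text into grapheme clusters and then measure each cluster, which is what the commit describes for the Erlang function.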
* Merge pull request #5785 from chrrasmussen/stdlib/string-next_grapheme-fix (Dan Gudmundsson, 2022-03-30, 1 file, -9/+9)
|\
| |   stdlib: Keep the tail of the strings passed to string:next_grapheme/1 OTP-18009
| * Make sure to keep tail in strings containing emojis as well (Christian Rasmussen, 2022-03-12, 1 file, -6/+6)
| |
| * Make sure to keep tail in string:next_grapheme/1 (Christian Rasmussen, 2022-03-12, 1 file, -3/+3)
| |
* | Update copyright year (Erlang/OTP, 2022-03-23, 1 file, -1/+1)
| |
* | Update Unicode version in lib/stdlib/uc_spec/gen_unicode_mod.escript (Eksperimental, 2022-02-14, 1 file, -1/+1)
|/
* Update copyright year (Rickard Green, 2021-12-13, 1 file, -1/+1)
|
* Fix syntax for inlining a function (Dániel Szoboszlay, 2021-10-05, 1 file, -1/+1)
|
* Update Unicode to 13.0 (Eksperimental, 2021-04-02, 1 file, -1/+1)
|
* Update to Unicode 12.1 (José Valim, 2019-08-05, 1 file, -1/+1)
|
* unicode_util gc/1 (Dan Gudmundsson, 2019-05-02, 1 file, -7/+7)
|   Could expand binary to list for too many elements.
|   Fix and add tests.
* Fix bug string:slice/3 on bad input (Dan Gudmundsson, 2019-04-30, 1 file, -4/+8)
|   Fixed bug in slice which could wrongly return <<>> for non-utf8 binary input.
|   Also give a better error reason when non-utf8 binaries are given as input to some functions.
* stdlib: Optimize handling of Unicode in the string module (Hans Bolinder, 2019-03-20, 1 file, -63/+134)
|   Unroll some of the functions returning codepoints and grapheme clusters.
* stdlib: Optimize handling of Unicode in the string module (Hans Bolinder, 2019-03-20, 1 file, -10/+21)
|   The unicode_util:cp() function handles deep lists faster by returning the rest of the input more balanced to the right than before.
* Update to Unicode-11 (Dan Gudmundsson, 2018-09-28, 1 file, -74/+75)
|   Update input files for the code-generator and tests.
|   Added emoji-data.txt for the new rule on how to handle emoji.
|   Unicode has simplified the rules for emoji grapheme clusters:
|   From:
|     GB10 (E_Base | EBG) Extend* × E_Modifier
|     GB11 ZWJ × (Glue_After_Zwj | EBG)
|   To:
|     GB11 \p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}
|   Update the code generator to handle the new rule.
* Merge branch 'maint' (Dan Gudmundsson, 2017-11-30, 1 file, -7/+24)
|\
| |   * maint:
| |     Avoid falling measurements testcases on slow machines
| |     stdlib: string optimize special case for ASCII
| |     stdlib: Minor unicode_util opts
| * stdlib: Minor unicode_util opts (Dan Gudmundsson, 2017-11-29, 1 file, -7/+24)
| |   Exit early for Latin-1
* | Merge branch 'siri/string-new-api' (Siri Hansen, 2017-09-15, 1 file, -12/+12)
|\ \
| | |   * siri/string-new-api: (28 commits)
| | |     hipe (test): Do not use deprecated functions in string(3)
| | |     dialyzer (test): Do not use deprecated functions in string(3)
| | |     eunit (test): Do not use deprecated functions in string(3)
| | |     system (test): Do not use deprecated functions in string(3)
| | |     system (test): Do not use deprecated functions in string(3)
| | |     mnesia (test): Do not use deprecated functions in string(3)
| | |     Deprecate old string functions
| | |     observer: Do not use deprecated functions in string(3)
| | |     common_test: Do not use deprecated functions in string(3)
| | |     eldap: Do not use deprecated functions in string(3)
| | |     et: Do not use deprecated functions in string(3)
| | |     os_mon: Do not use deprecated functions in string(3)
| | |     debugger: Do not use deprecated functions in string(3)
| | |     runtime_tools: Do not use deprecated functions in string(3)
| | |     asn1: Do not use deprecated functions in string(3)
| | |     compiler: Do not use deprecated functions in string(3)
| | |     sasl: Do not use deprecated functions in string(3)
| | |     reltool: Do not use deprecated functions in string(3)
| | |     kernel: Do not use deprecated functions in string(3)
| | |     hipe: Do not use deprecated functions in string(3)
| | |     ...
| | |   Conflicts:
| | |     lib/eunit/src/eunit_lib.erl
| | |     lib/observer/src/crashdump_viewer.erl
| | |     lib/reltool/src/reltool_target.erl
| * | Deprecate old string functions (Dan Gudmundsson, 2017-09-15, 1 file, -12/+12)
| |/
| |   They should not be used.
* | Update to Unicode 10 (José Valim, 2017-07-03, 1 file, -1/+1)
|/
* Return error tuple on unicode normalization functions (José Valim, 2017-05-22, 1 file, -16/+33)
|   Prior to this patch, the normalization functions in the unicode module would raise a function clause error for non-utf8 binaries. This patch changes it so it returns {error, SoFar, Invalid} as characters_to_binary and characters_to_list do in the unicode module.
|   Note string:next_codepoint/1 and string:next_grapheme had to be changed accordingly and also return an error tuple.
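The `{error, SoFar, Invalid}` convention described above can be sketched in Python as an analogy. This is not the Erlang code: `decode_or_error` is a hypothetical helper, and it assumes that "SoFar" is the successfully decoded prefix and "Invalid" is the remaining input starting at the first bad byte.

```python
def decode_or_error(data: bytes):
    """Decode UTF-8, returning an error tuple instead of raising.

    On success, returns the decoded string. On failure, returns
    ("error", so_far, invalid), mirroring the shape of Erlang's
    {error, SoFar, Invalid}.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte, so
        # everything before it is valid UTF-8 and decodes cleanly.
        so_far = data[:exc.start].decode("utf-8")
        invalid = data[exc.start:]
        return ("error", so_far, invalid)
```

Returning a tuple lets the caller keep the valid prefix and inspect the offending bytes, rather than losing both to an exception.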
* Add unicode_util (Dan Gudmundsson, 2017-04-24, 1 file, -0/+894)
    A base for unicode functions, not intended to be a user API.
    Whitespace returns a reasonable subset of non nobreak whitespace characters.
    Implementation notes: Make function clauses instead of using arrays and store tuples instead of maps to save space.