path: root/lib/util/charset
Commit message | Author | Age | Files | Lines
* lib: Stay ASCII-compatible for toupper_m/tolower_m | Volker Lendecke | 2022-04-04 | 1 | -6/+0

  This is an alternative patch for MR2339: It seems that Windows AD in the Turkish locale is ASCII-compatible with 'i'. Björn tells me that the Turkish locale is the only one where upper/lower casing letters in the ASCII range is not compatible with ASCII. Simplify our code by not calling the locale-specific standard toupper/tolower for the ASCII range but rely on our tables.

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Alexander Bokovoy <ab@samba.org>
  Reviewed-by: Andreas Schneider <asn@samba.org>
  Autobuild-User(master): Volker Lendecke <vl@samba.org>
  Autobuild-Date(master): Mon Apr 4 11:45:24 UTC 2022 on sn-devel-184
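The locale-independent ASCII mapping described in this commit can be sketched as below. This is only an illustration under assumptions: `ascii_toupper()` and `ascii_tolower()` are hypothetical names, and Samba's real `toupper_m()`/`tolower_m()` use lookup tables that this log does not show.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not Samba's API. They map only 'a'..'z' and
 * 'A'..'Z' with plain arithmetic and never call the locale-aware
 * toupper()/tolower(), so 'i' <-> 'I' holds under any locale.
 */
static uint32_t ascii_toupper(uint32_t c)
{
	if (c >= 'a' && c <= 'z') {
		return c - ('a' - 'A');
	}
	return c;
}

static uint32_t ascii_tolower(uint32_t c)
{
	if (c >= 'A' && c <= 'Z') {
		return c + ('a' - 'A');
	}
	return c;
}

/* Usage example: the mapping never depends on setlocale(). */
void demo_ascii_case(void)
{
	assert(ascii_toupper('i') == 'I');
	assert(ascii_tolower('I') == 'i');
	assert(ascii_toupper('1') == '1'); /* non-letters pass through */
}
```

Codepoints outside the ASCII letter ranges are returned unchanged here; a real implementation would hand them to its own case tables.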
* charset_macosxfs.c: fix compilation on macOS | Alex Richardson | 2021-10-13 | 1 | -1/+2

  The DEBUG macro was missing and the CFStringGetBytes() was triggering a -Werror,-Wpointer-sign build failure.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14862
  Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* util/charset: warn loudly on unexpected E2BIG | Douglas Bagnall | 2021-06-18 | 1 | -2/+2

  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Reviewed-by: Jeremy Allison <jra@samba.org>
  Autobuild-User(master): Jeremy Allison <jra@samba.org>
  Autobuild-Date(master): Fri Jun 18 04:27:17 UTC 2021 on sn-devel-184
* util/iconv: reject improperly packed UTF-8 | Douglas Bagnall | 2021-06-18 | 1 | -11/+21

  If we allow a string that encodes say '\0' as a multi-byte sequence, we are open to confusion where we mix NUL terminated strings with sized data blobs, which is to say EVERYWHERE.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14684
  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Reviewed-by: Jeremy Allison <jra@samba.org>
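To make "improperly packed" concrete: the two bytes `0xC0 0x80` decode arithmetically to U+0000, so a lenient decoder would let a NUL slip past code that only looks for a literal zero byte. The sketch below rejects such overlong forms. It assumes a hypothetical function name, `utf8_decode_strict()`, and is not the code from Samba's `iconv.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical strict decoder, not Samba's implementation.
 * Returns the number of bytes consumed (1..4) and stores the codepoint
 * in *cp, or returns 0 for truncated, malformed or overlong input.
 */
static size_t utf8_decode_strict(const uint8_t *s, size_t len, uint32_t *cp)
{
	/* Smallest codepoint that legitimately needs n bytes. */
	static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
	size_t n, i;
	uint32_t c;

	if (len == 0) {
		return 0;
	}
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	}
	if ((s[0] & 0xE0) == 0xC0) {
		n = 2;
		c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		n = 3;
		c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		n = 4;
		c = s[0] & 0x07;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}
	if (len < n) {
		return 0; /* sequence runs past the caller's buffer */
	}
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xC0) != 0x80) {
			return 0;
		}
		c = (c << 6) | (s[i] & 0x3F);
	}
	if (c < min_cp[n]) {
		return 0; /* overlong: a shorter encoding exists */
	}
	if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
		return 0; /* outside Unicode, or a surrogate */
	}
	*cp = c;
	return n;
}

/* Usage example: an overlong NUL is refused, a real 2-byte char is not. */
void demo_utf8_strict(void)
{
	const uint8_t overlong_nul[] = {0xC0, 0x80};
	const uint8_t e_acute[] = {0xC3, 0xA9};
	uint32_t cp = 0;

	assert(utf8_decode_strict(overlong_nul, 2, &cp) == 0);
	assert(utf8_decode_strict(e_acute, 2, &cp) == 2 && cp == 0xE9);
}
```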
* lib: Fix 'charset' dependencies | Volker Lendecke | 2021-01-12 | 1 | -1/+1

  With this, 'charset' could be a SAMBA_LIBRARY without any undefined symbols

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
  Autobuild-User(master): Jeremy Allison <jra@samba.org>
  Autobuild-Date(master): Tue Jan 12 01:19:26 UTC 2021 on sn-devel-184
* lib: Avoid "includes.h" in lib/util/charset/ | Volker Lendecke | 2021-01-12 | 6 | -6/+20

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* lib: Remove using talloc_stack from lib/util/charset/ | Volker Lendecke | 2021-01-12 | 1 | -16/+20

  'charset' should be as standalone as possible, and for this one use talloc_stackframe() is not really necessary.

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* lib: Move utf16_len[_n]() to lib/util/charset/ | Volker Lendecke | 2021-01-12 | 2 | -0/+43

  util_unistr.c references it, avoid broken dependencies

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* build: Move weird.c and charset_macosxfs.c to ICONV_WRAPPER | Volker Lendecke | 2021-01-12 | 1 | -2/+13

  iconv.c directly references them, it does not make sense to have it without them.

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* lib: Simplify "weird" charset code | Volker Lendecke | 2021-01-12 | 1 | -24/+21

  Don't depend on DEBUG. This is a pure developer module, the developer should be able to figure out what's going on after this has abort()ed.

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* lib: Move ucs2_align() to 'charset' subsystem | Volker Lendecke | 2021-01-12 | 2 | -0/+10

  Fix a circular dependency: util_str_common.c depends on 'charset', which depends on util_str_common.c. Fix that.

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* lib: Use hex_byte() in ucs2hex_pull() | Volker Lendecke | 2021-01-08 | 1 | -8/+7

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Ralph Boehme <slow@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* lib/util: remove extra safe_string.h file | Matthew DeVore | 2020-08-28 | 2 | -0/+7

  lib/util/safe_string.h is similar to source3/include/safe_string.h, but the former has fewer checks. It is missing bcopy, strcasecmp, and strncasecmp.

  Add the missing elements to lib/util/safe_string.h and remove the other safe_string.h, which is in the source3-specific path. To accommodate existing uses of str(n?)casecmp, add #undef lines to source files where they are used.

  Signed-off-by: Matthew DeVore <matvore@google.com>
  Reviewed-by: David Mulder <dmulder@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
  Autobuild-User(master): Jeremy Allison <jra@samba.org>
  Autobuild-Date(master): Fri Aug 28 02:18:40 UTC 2020 on sn-devel-184
* lib/util: add talloc_alpha_strcpy() | Ralph Boehme | 2020-02-06 | 1 | -0/+3

  Signed-off-by: Ralph Boehme <slow@samba.org>
  Reviewed-by: Andreas Schneider <asn@samba.org>
* CVE-2019-14907 lib/util: Do not print the failed to convert string into the logs | Andrew Bartlett | 2020-01-21 | 1 | -18/+20

  The string may be in another charset, or may be sensitive and certainly may not be terminated. It is not safe to just print.

  Found by Robert Święcki using a fuzzer he wrote for smbd.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=14208
  Signed-off-by: Andrew Bartlett <abartlet@samba.org>
* util: Free memory in charset torture test to satisfy sanitizer | Swen Schillig | 2019-08-08 | 1 | -0/+10

  Signed-off-by: Swen Schillig <swen@linux.ibm.com>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
  Reviewed-by: Matthias Dieter Wallnöfer <mdw@samba.org>
* charset: add tests for Unicode NFC <-> NFD conversion | Ralph Boehme | 2019-08-07 | 1 | -0/+228

  Signed-off-by: Ralph Boehme <slow@samba.org>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
  Autobuild-User(master): Andrew Bartlett <abartlet@samba.org>
  Autobuild-Date(master): Wed Aug 7 07:25:39 UTC 2019 on sn-devel-184
* charset: add support for Unicode normalisation with libicu | Ralph Boehme | 2019-08-07 | 3 | -1/+178

  This adds a direct conversion hook using libicu to perform NFC <-> NFD conversion on UTF8 strings. The defined charset strings are "UTF8-NFC" and "UTF8-NFD". To convert from one to the other, the caller calls smb_iconv_open() with the desired source and target charsets, e.g.

  smb_iconv_open("UTF8-NFD", "UTF8-NFC");

  for converting from NFC to NFD.

  Signed-off-by: Ralph Boehme <slow@samba.org>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
* lib/util/charset: clang: Fix Value stored to 'reason' is never read warning | Noel Power | 2019-06-11 | 1 | -4/+4

  Fixes:

  lib/util/charset/convert_string.c:301:5: warning: Value stored to 'reason' is never read <--[clang]

  Signed-off-by: Noel Power <noel.power@suse.com>
  Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>
* util/charset/torture: ensure each cp850 high bytes is 3 utf8 bytes | Douglas Bagnall | 2019-05-15 | 1 | -0/+52

  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
* util/charset/convert: do not pretend to realloc | Douglas Bagnall | 2019-05-15 | 1 | -16/+7

  It seems very likely that our clever attempts to dynamically realloc the output buffer were never triggered. Two lines of reasoning lead to this conclusion:

  1. We allocate 3 * srclen to start with, but no conversion we use will need more than that.

     To be precise, from 8-bit charsets we will only deal with codepoints in the Unicode basic multilingual plane (up to 0xFFFF). These can all be expressed as 3 or fewer utf-8 bytes. In UTF16 they are naturally 2 bytes, while in the DOS codes they are 1 byte. We have checked the code tables, and can not find a plausible (e.g. not EBCDIC) DOS code page or unix charset that is outside this range.

     Clients cannot choose the code page, the only code pages we will use come from 'unix charset' and 'dos charset' smb.conf parameters.

     Therefore the worst that can possibly happen is we expand 1 byte into 3 (specifically, when converting some e.g. CP850 codepoints to UTF-8).

  2. If the reallocation was ever used, the results would have been catastrophically wrong, as the input pointer was not reset.

  Therefore we skip the complication of the goto loop and let E2BIG be just another impossible error to report.

  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
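The 3 * srclen bound can be checked with a short sketch. It assumes hypothetical helper names (`utf8_len_bmp()`, `worst_case_utf8_len()`) and uses CP850 byte 0xB0, which maps to U+2591, as an example of a single input byte that becomes three UTF-8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of UTF-8 bytes needed for a codepoint in
 * the basic multilingual plane (U+0000..U+FFFF). Surrogate values are
 * not treated specially in this sketch.
 */
static size_t utf8_len_bmp(uint16_t cp)
{
	if (cp < 0x80) {
		return 1;
	}
	if (cp < 0x800) {
		return 2;
	}
	return 3;
}

/*
 * Hypothetical helper: output size that is always enough when every
 * source byte maps to one BMP codepoint. Overflow of the multiplication
 * is deliberately not handled here.
 */
static size_t worst_case_utf8_len(size_t srclen)
{
	return srclen * 3;
}

/* Usage example: one CP850 byte (0xB0 -> U+2591) needs three bytes. */
void demo_worst_case(void)
{
	assert(utf8_len_bmp(0x2591) == 3);
	assert(utf8_len_bmp(0x2591) <= worst_case_utf8_len(1));
}
```

Because no BMP codepoint needs more than three bytes, a buffer of this size cannot produce `E2BIG` for the single-byte source charsets the message discusses.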
* util/charset/convert: when retrying, retry from the start | Douglas Bagnall | 2019-05-15 | 1 | -1/+2

  iconv() advances the inbuf pointer; if we decide to realloc and re-iconv, we need to reset inbuf to the source string

  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
* util/charset/convert: do not overflow dest len in corner case | Douglas Bagnall | 2019-05-15 | 1 | -2/+2

  Now, if destlen were SIZE_MAX - 1, destlen * 2 would wrap to SIZE_MAX - 3, which makes (destlen * 2 + 2) == SIZE_MAX - 1, the same number again. So we need the <= comparison in this case.

  As things stand, it is not actually possible for destlen to be SIZE_MAX (because it is always an even number after the first round, and the first round is constrained to be < SIZE_MAX / 2), but *if* destlen was SIZE_MAX, destlen * 2 + 2 would be 0, so that case is OK. Similarly the SIZE_MAX - 2 and smaller cases were covered by the original formula.

  We add the comment for people who are wondering WTF is going on with all this destlen manipulation.

  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
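The wrap-around arithmetic in this message can be verified directly, since unsigned `size_t` arithmetic is defined to wrap modulo `SIZE_MAX + 1`. The sketch below uses hypothetical helper names; the exact expression in `convert_string.c` is not shown in this log, so treat the comparison as an assumption reconstructed from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: next buffer size, computed with wrapping size_t. */
static size_t grown_destlen(size_t destlen)
{
	return destlen * 2 + 2;
}

/*
 * Hypothetical helper: true when growing did not yield a strictly
 * larger size. Using '<' instead of '<=' would miss the case where the
 * wrapped result equals the old value.
 */
static bool growth_overflowed(size_t destlen)
{
	return grown_destlen(destlen) <= destlen;
}

/* Usage example: the three corner cases named in the commit message. */
void demo_destlen_overflow(void)
{
	/* SIZE_MAX - 1 wraps back to exactly the same number. */
	assert(grown_destlen(SIZE_MAX - 1) == SIZE_MAX - 1);
	/* SIZE_MAX wraps to 0. */
	assert(grown_destlen(SIZE_MAX) == 0);
	/* SIZE_MAX - 2 wraps to a smaller number, caught by '<' alone. */
	assert(grown_destlen(SIZE_MAX - 2) < SIZE_MAX - 2);
}
```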
* util/charset/convert: do not overflow dest len | Douglas Bagnall | 2019-05-15 | 1 | -1/+10

  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
* util/charset/convert_string: always set length | Douglas Bagnall | 2019-05-15 | 1 | -0/+3

  In failure cases the destination string pointer is set to NULL, but the size is not changed. Some callers have not been checking the return value and passing the destination pointer and uninitialised length onto other functions. We can curse and blame those callers, but let's also keep them safe.

  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
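The defensive pattern described here, putting both output parameters into a defined state before anything can fail, might look like the following sketch. `sketch_convert()` is a hypothetical stand-in that merely copies its input; it is not Samba's `convert_string_talloc()` and uses `malloc()` rather than talloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a conversion routine. The "conversion" is an
 * identity copy; the point is only the handling of *dest and *destlen.
 */
static bool sketch_convert(const char *src, size_t srclen,
			   char **dest, size_t *destlen)
{
	char *out;

	/* Define both outputs first, so every failure path is safe. */
	*dest = NULL;
	*destlen = 0;

	if (src == NULL) {
		return false;
	}
	out = malloc(srclen + 1);
	if (out == NULL) {
		return false;
	}
	memcpy(out, src, srclen);
	out[srclen] = '\0';

	*dest = out;
	*destlen = srclen;
	return true;
}

/* Usage example: a careless caller still sees NULL and 0 on failure. */
void demo_always_set_length(void)
{
	char *d = NULL;
	size_t n = 12345; /* stale value that must not survive a failure */

	assert(!sketch_convert(NULL, 4, &d, &n));
	assert(d == NULL && n == 0);
}
```

A caller that ignores the return value and forwards `d` and `n` now passes a NULL pointer with length 0 rather than a NULL pointer with an arbitrary length.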
* lib:util: Use C99 initializer for weird_table in charset | Andreas Schneider | 2019-01-28 | 1 | -3/+13

  Signed-off-by: Andreas Schneider <asn@samba.org>
  Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
* lib:util: Use C99 initializer for builtin_functions in iconv | Andreas Schneider | 2019-01-28 | 1 | -14/+69

  Signed-off-by: Andreas Schneider <asn@samba.org>
  Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
* lib:util: Use #ifdef instead of #if for config.h definitions | Andreas Schneider | 2018-11-28 | 2 | -3/+3

  Signed-off-by: Andreas Schneider <asn@samba.org>
  Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>
* lib:charset: Fix error messages from charset conversion | Christof Schmitt via samba-technical | 2018-07-07 | 2 | -3/+1

  When e.g. trying to access a filename through Samba that does not adhere to the encoding configured in 'unix charset', the log will show the encoding problem, followed by "strstr_m: src malloc fail". The problem is that strstr_m assumes that any failure from push/pull_ucs2_talloc is a memory allocation problem, which is not correct. Address this by removing the misleading messages and add a missing message in convert_string_talloc_handle.

  Signed-off-by: Christof Schmitt <cs@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* util/charset/iconv: use read_hex_bytes rather than sscanf | Douglas Bagnall | 2018-05-31 | 1 | -3/+6

  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
* lib:util: Add FALL_THROUGH statements in charset/charset_macosxfs.c | Andreas Schneider | 2018-03-01 | 1 | -6/+8

  Signed-off-by: Andreas Schneider <asn@samba.org>
  Reviewed-by: Andrew Bartlett <abartlet@samba.org>
* charset: fix str[n]casecmp_m() by comparing lower case values | Stefan Metzmacher | 2017-09-15 | 1 | -4/+28

  The commits c615ebed6e3d273a682806b952d543e834e5630d^..f19ab5d334e3fb15761fb009e5de876dfc6ea785 replaced Str[n]CaseCmp() by str[n]casecmp_m().

  The logic we had in str[n]casecmp_w() used to compare the upper cased as well as the lower cased versions of the characters and returned the difference between the lower cased versions.

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13018
  Signed-off-by: Stefan Metzmacher <metze@samba.org>
  Reviewed-by: Ralph Boehme <slow@samba.org>
  Autobuild-User(master): Ralph Böhme <slow@samba.org>
  Autobuild-Date(master): Fri Sep 15 02:23:29 CEST 2017 on sn-devel-144
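Why the choice of case matters for the returned value: ASCII has six characters (such as '_', 0x5F) between 'Z' and 'a', so a letter sorts before '_' when both sides are upper-cased and after it when both are lower-cased. The sketch below follows the logic described in the message, restricted to ASCII, with hypothetical names; it is not Samba's multibyte `strcasecmp_m()`.

```c
#include <assert.h>

/* Hypothetical ASCII-only case helpers. */
static int lower_ascii(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

static int upper_ascii(unsigned char c)
{
	return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

/*
 * Hypothetical ASCII-only comparison: characters match if either their
 * upper-case or their lower-case forms agree; on a mismatch the result
 * is the difference of the LOWER-case forms.
 */
static int sketch_casecmp(const char *a, const char *b)
{
	const unsigned char *pa = (const unsigned char *)a;
	const unsigned char *pb = (const unsigned char *)b;

	while (*pa != '\0' && *pb != '\0') {
		int la = lower_ascii(*pa);
		int lb = lower_ascii(*pb);

		if (la != lb && upper_ascii(*pa) != upper_ascii(*pb)) {
			return la - lb;
		}
		pa++;
		pb++;
	}
	/* One or both strings ended; NUL maps to 0 in lower_ascii(). */
	return lower_ascii(*pa) - lower_ascii(*pb);
}

/* Usage example: 'A' vs '_' gives the same result as 'a' vs '_'. */
void demo_casecmp(void)
{
	assert(sketch_casecmp("A", "_") == 'a' - '_');
	assert(sketch_casecmp("a", "_") == 'a' - '_');
}
```

For plain ASCII the upper-case and lower-case checks always agree, so the double test only becomes meaningful for Unicode characters whose case mappings are not one-to-one.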
* charset/tests: also tests the system str[n]casecmp() | Stefan Metzmacher | 2017-09-14 | 1 | -0/+35

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13018
  Signed-off-by: Stefan Metzmacher <metze@samba.org>
  Reviewed-by: Ralph Boehme <slow@samba.org>
* charset/tests: add more str[n]casecmp_m() tests to demonstrate the bug | Stefan Metzmacher | 2017-09-14 | 1 | -1/+7

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13018
  Signed-off-by: Stefan Metzmacher <metze@samba.org>
  Reviewed-by: Ralph Boehme <slow@samba.org>
* charset/tests: assert the exact values of str[n]casecmp_m() | Stefan Metzmacher | 2017-09-14 | 1 | -17/+17

  BUG: https://bugzilla.samba.org/show_bug.cgi?id=13018
  Signed-off-by: Stefan Metzmacher <metze@samba.org>
  Reviewed-by: Ralph Boehme <slow@samba.org>
* lib:charset: Remove use of talloc_autofree_context() for global_iconv_handle | Jeremy Allison | 2017-04-18 | 1 | -3/+9

  All other callers use NULL here anyway, so there's no need to use a special context for get_iconv_handle().

  Signed-off-by: Jeremy Allison <jra@samba.org>
  Reviewed-by: Andreas Schneider <asn@samba.org>
* lib:charset: Make global_iconv_handle private | Jeremy Allison | 2017-04-18 | 2 | -2/+1

  Signed-off-by: Jeremy Allison <jra@samba.org>
  Reviewed-by: Andreas Schneider <asn@samba.org>
* lib:charset: Add utility functions reinit_iconv_handle() and free_iconv_handle(void) | Jeremy Allison | 2017-04-18 | 2 | -0/+23

  Not yet used. Will enable us to make global_iconv_handle private.

  Signed-off-by: Jeremy Allison <jra@samba.org>
  Reviewed-by: Andreas Schneider <asn@samba.org>
* lib: Avoid an includes.h | Volker Lendecke | 2017-03-28 | 1 | -1/+3

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* lib: Avoid an includes.h | Volker Lendecke | 2017-03-28 | 1 | -3/+5

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* util:charset: Return EILSEQ in smb_iconv() if newer libc is detected | Andreas Schneider | 2017-02-01 | 2 | -3/+23

  This is the behaviour of glibc 2.24 and newer.

  Signed-off-by: Andreas Schneider <asn@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
  Autobuild-User(master): Jeremy Allison <jra@samba.org>
  Autobuild-Date(master): Wed Feb 1 05:16:46 CET 2017 on sn-devel-144
* lib/util/charset: Optimize next_codepoint for the ascii case | Volker Lendecke | 2017-01-22 | 1 | -0/+4

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Ralph Boehme <slow@samba.org>
* util: Fix the documentation of push_utf8_talloc() | Andreas Schneider | 2016-09-09 | 1 | -8/+17

  Signed-off-by: Andreas Schneider <asn@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* lib: Give base64.c its own .h | Volker Lendecke | 2016-05-04 | 1 | -0/+1

  Signed-off-by: Volker Lendecke <vl@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* CVE-2015-5330: next_codepoint_handle_ext: don't short-circuit UTF16 low bytes | Douglas Bagnall | 2015-12-09 | 1 | -1/+4

  UTF16 contains zero bytes when it is encoding ASCII (for example), so we can't assume the absence of the 0x80 bit means a one byte encoding. No current callers use UTF16.

  Bug: https://bugzilla.samba.org/show_bug.cgi?id=11599
  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Pair-programmed-with: Andrew Bartlett <abartlet@samba.org>
  Reviewed-by: Ralph Boehme <slow@samba.org>
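A small sketch of why the shortcut is unsafe for UTF-16: U+0141 is stored in UTF-16LE as the bytes `0x41 0x01`, so the first byte has no 0x80 bit and looks like ASCII 'A', yet the character occupies two bytes. The enum and function below are hypothetical and only illustrate the guard; they are not the code of `next_codepoint_handle_ext()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical charset tags for this sketch only. */
enum sketch_charset {
	SKETCH_UTF8,
	SKETCH_CP850,
	SKETCH_UTF16LE
};

/*
 * Hypothetical guard: may the first byte be returned directly as a
 * one-byte codepoint? Only for ASCII-compatible charsets, and only when
 * the high bit is clear.
 */
static bool ascii_fast_path_ok(enum sketch_charset cs,
			       const uint8_t *s, size_t len)
{
	if (len == 0) {
		return false;
	}
	if (cs == SKETCH_UTF16LE) {
		return false; /* every code unit is two bytes wide */
	}
	return (s[0] & 0x80) == 0;
}

/* Usage example: U+0141 in UTF-16LE starts with 0x41 but is not 'A'. */
void demo_utf16_fast_path(void)
{
	const uint8_t l_stroke_utf16le[] = {0x41, 0x01};
	const uint8_t ascii_a[] = {'A'};

	assert(!ascii_fast_path_ok(SKETCH_UTF16LE, l_stroke_utf16le, 2));
	assert(ascii_fast_path_ok(SKETCH_UTF8, ascii_a, 1));
}
```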
* CVE-2015-5330: strupper_talloc_n_handle(): properly count characters | Douglas Bagnall | 2015-12-09 | 1 | -1/+2

  When a codepoint eats more than one byte we really want to know, especially if the string is not NUL terminated.

  Bug: https://bugzilla.samba.org/show_bug.cgi?id=11599
  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Pair-programmed-with: Andrew Bartlett <abartlet@samba.org>
  Reviewed-by: Ralph Boehme <slow@samba.org>
* CVE-2015-5330: Fix handling of unicode near string endings | Douglas Bagnall | 2015-12-09 | 4 | -14/+25

  Until now next_codepoint_ext() and next_codepoint_handle_ext() were using strnlen(str, 5) to determine how much string they should try to decode. This ended up looking past the end of the string when it was not null terminated and the final character looked like a multi-byte encoding. The fix is to let the caller say how long the string can be.

  Bug: https://bugzilla.samba.org/show_bug.cgi?id=11599
  Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
  Pair-programmed-with: Andrew Bartlett <abartlet@samba.org>
  Reviewed-by: Ralph Boehme <slow@samba.org>
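The fix described here, letting the caller bound how many bytes may be examined, can be sketched for UTF-8 as follows. The function names are hypothetical and the sketch only inspects lead bytes (continuation bytes are not validated), so it illustrates the length handling rather than the behaviour of `next_codepoint_ext()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes of the next UTF-8 character,
 * reading at most len bytes. Returns 0 if the lead byte is invalid or
 * the character would extend past the caller-supplied length.
 */
static size_t next_char_size(const uint8_t *s, size_t len)
{
	size_t need;

	if (len == 0) {
		return 0;
	}
	if (s[0] < 0x80) {
		need = 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		need = 2;
	} else if ((s[0] & 0xF0) == 0xE0) {
		need = 3;
	} else if ((s[0] & 0xF8) == 0xF0) {
		need = 4;
	} else {
		return 0;
	}
	return (need <= len) ? need : 0;
}

/*
 * Hypothetical helper: count characters in a buffer that need not be
 * NUL terminated, advancing by the bytes each character consumed.
 */
static size_t count_chars_n(const uint8_t *s, size_t len, bool *ok)
{
	size_t count = 0;

	*ok = true;
	while (len > 0) {
		size_t used = next_char_size(s, len);

		if (used == 0) {
			*ok = false; /* truncated or invalid: stop here */
			break;
		}
		s += used;
		len -= used;
		count++;
	}
	return count;
}

/* Usage example: a buffer cut off in the middle of a 2-byte character. */
void demo_bounded_decode(void)
{
	const uint8_t buf[] = {'a', 0xC3, 0xA9};
	bool ok;

	assert(count_chars_n(buf, 3, &ok) == 2 && ok);
	assert(count_chars_n(buf, 2, &ok) == 1 && !ok);
}
```

With `len` set to 2 the decoder never touches `buf[2]`, which is the over-read the `strnlen(str, 5)` approach could not rule out for unterminated input.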
* Remove function name from callers of DBG_* | Christof Schmitt | 2015-10-21 | 1 | -2/+2

  It is now added automatically.

  Signed-off-by: Christof Schmitt <cs@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
* lib/util/charset: reduce loglevel for push_ucs2_talloc error | Ralph Boehme | 2015-07-14 | 1 | -2/+2

  push_ucs2_talloc() may have failed because of EILSEQ, not a failing malloc. Log the failure with DBG_WARNING instead of level 0.

  Signed-off-by: Ralph Boehme <slow@samba.org>
  Reviewed-by: Jeremy Allison <jra@samba.org>
  Autobuild-User(master): Jeremy Allison <jra@samba.org>
  Autobuild-Date(master): Tue Jul 14 03:59:05 CEST 2015 on sn-devel-104
* lib/util:charset/tests: improve strlen_m[_term[_null]]() testing | Stefan Metzmacher | 2015-07-03 | 1 | -1/+13

  They differ in their "" vs. NULL handling.

  Signed-off-by: Stefan Metzmacher <metze@samba.org>
  Reviewed-by: Guenther Deschner <gd@samba.org>
  Autobuild-User(master): Günther Deschner <gd@samba.org>
  Autobuild-Date(master): Fri Jul 3 05:02:45 CEST 2015 on sn-devel-104