author: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> 2021-11-16 20:23:04 +0000
committer: Ralph Boehme <slow@samba.org> 2021-11-17 04:36:36 +0000
commit: fccb105e079df7bfe22b6887262128ab9e81064d
tree: a09968f9d1c1651b5de429a7725f72b47444bb40 /testdata/source-chars-bad.c
parent: 1c8ea2448eaacb84c1c134e9597a5f873779b0a4
pytests: check that we don't have bad format characters
Unicode has format control characters that affect the appearance (including the apparent order) of other characters. Some of these, like the bidi controls (for mixing left-to-right scripts with right-to-left scripts), can be used to make text that means one thing look very much like it means another thing.

The potential for duplicity using these characters has recently been publicised under the name “Trojan Source”, and CVE-2021-42694. A specific example, as it affects the Rust language, is CVE-2021-42574.

We don't have many format control characters in our code. In fact, we have just the zero-width space (\u200b) and the redundant byte order mark (\ufeff), and this test aims to ensure we keep it that way.

The test uses a series of allow-lists and deny-lists to check most text files for unknown format control characters. The filtering is fairly conservative but not exhaustive. For example, XML and text files are checked, but UTF-16 files are not.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Ralph Boehme <slow@samba.org>
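The kind of check described above can be sketched roughly as follows. This is not the test added by this commit, only a minimal illustration: the function name find_format_chars and the constant ALLOWED_FORMAT_CHARS are hypothetical, and the sketch assumes that "format control character" means Unicode general category Cf and that the allow-list holds only the two characters named in the message.

```python
import unicodedata

# Assumption: the allow-list holds only the two characters the commit
# message says are already present in the tree.
ALLOWED_FORMAT_CHARS = frozenset({"\u200b", "\ufeff"})


def find_format_chars(text, allowed=ALLOWED_FORMAT_CHARS):
    """Return (line, column, name) for each format character not in `allowed`.

    Assumption: "format control character" is taken to mean Unicode
    general category Cf, which includes the bidi controls.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf" and ch not in allowed:
                # Fall back to the code point if the character has no name.
                name = unicodedata.name(ch, "U+%04X" % ord(ch))
                hits.append((lineno, col, name))
    return hits


if __name__ == "__main__":
    sample = "int x = 1; /* \u202e hidden */\nok\u200b\n"
    for lineno, col, name in find_format_chars(sample):
        print("line %d, column %d: %s" % (lineno, col, name))
```

A real test would also need to decide which files to read and how to decode them. The sketch only covers scanning text that has already been decoded, which is one reason UTF-16 files would need separate handling.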
Diffstat (limited to 'testdata/source-chars-bad.c')
0 files changed, 0 insertions, 0 deletions