author Douglas Bagnall <douglas.bagnall@catalyst.net.nz> 2021-11-17 20:17:53 +0000
committer Andrew Bartlett <abartlet@samba.org> 2021-12-03 18:53:43 +0000
commit dab828f63c0a6bf0bb96920fd36383f6cbe43179
tree fea5ec65dc1d59a852b37d691c386b039a40966d /testdata
parent 0f7e58b0e29778711d3385adbba957c175c3bdef
pytest/source_char: check for mixed direction text

As pointed out in https://lwn.net/Articles/875964, forbidding bidi marker characters is not always going to be enough to avoid right-to-left vs left-to-right confusion. Consider this:

    $ python -c's = "b = x # 2 * n * m"; print(s); print(s.replace("x", "א").replace("n", "ח"))'
    b = x # 2 * n * m
    b = א # 2 * ח * m

Those two lines are semantically the same, with the Hebrew letters "א" and "ח" replacing "x" and "n". But they look like they mean different things.

It is not enough to say we only allow these scripts (or indeed non-ASCII) in strings and comments, as demonstrated in this example:

    $ python -c's = "b = \"x#\" # n"; print(s); print(s.replace("x", "א").replace("n", "ח"))'
    b = "x#" # n
    b = "א#" # ח

where the second line is visually disordered but looks valid. Any series of neutral characters between two RTL characters will be reversed (and possibly mirrored).

In practice this affects one file, which is a text file for testing Unicode normalisation. I think, for the reasons shown above, we are unlikely to see legitimate RTL code except perhaps in documentation files, but if we do, we can add those files to the allow-list.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>

Autobuild-User(master): Andrew Bartlett <abartlet@samba.org>
Autobuild-Date(master): Fri Dec 3 18:53:43 UTC 2021 on sn-devel-184
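
To make the idea of "mixed direction text" concrete, here is a minimal sketch of how such a check could be written with the standard library. It is not the code in samba.tests.source_chars; the function names `is_mixed_direction` and `mixed_direction_lines` are hypothetical, and the choice to flag any line containing both a strong left-to-right and a strong right-to-left character is an assumption made for illustration. The Hebrew letters are written as escapes so the example itself renders unambiguously.

```python
import unicodedata

# Unicode bidi classes for strongly directional characters.
# "R" and "AL" are right-to-left (e.g. Hebrew, Arabic); "L" is left-to-right.
RTL_CLASSES = {"R", "AL"}
LTR_CLASSES = {"L"}


def is_mixed_direction(line):
    """Return True if the line has both strong LTR and strong RTL characters."""
    classes = {unicodedata.bidirectional(ch) for ch in line}
    return bool(classes & RTL_CLASSES) and bool(classes & LTR_CLASSES)


def mixed_direction_lines(text):
    """Yield (line_number, line) for every line that mixes strong directions."""
    for number, line in enumerate(text.splitlines(), start=1):
        if is_mixed_direction(line):
            yield number, line


if __name__ == "__main__":
    # \u05d0 is HEBREW LETTER ALEF, \u05d7 is HEBREW LETTER HET.
    sample = "a = x # 2 * n * m\nb = \u05d0 # 2 * \u05d7 * m\n"
    for number, line in mixed_direction_lines(sample):
        print(f"line {number} mixes text directions: {line!r}")
```

A line made up only of RTL letters and neutral characters is not flagged by this sketch, since it has no strong LTR character to conflict with; a real check may want a stricter or looser rule, plus an allow-list like the one mentioned above.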
Diffstat (limited to 'testdata')
-rw-r--r-- testdata/source-chars-bidi.py 24
1 files changed, 24 insertions, 0 deletions
diff --git a/testdata/source-chars-bidi.py b/testdata/source-chars-bidi.py
new file mode 100644
index 00000000000..d728da503da
--- /dev/null
+++ b/testdata/source-chars-bidi.py
@@ -0,0 +1,24 @@
+# Used in samba.tests.source_chars to ensure bi-directional text is
+# caught. (make test TESTS=samba.tests.source_chars)
+
+x = א =2
+ח = n = 3
+
+a = x # 2 * n * m
+b = א # 2 * ח * m
+c = "x#" # n
+d = "א#" # ח
+e = f"x{x}n{n}"
+f = f"א{א}ח{ח}"
+
+print(a)
+print(b)
+print(c)
+print(d)
+print(e)
+print(f)
+
+assert a == b
+assert c == d.replace("א", "x")
+assert e[1] == f[1]
+assert e[3] == f[3]
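
The assertions in this test file hold because Python works on the stored (logical) order of characters, not on the order a bidi-aware terminal or editor displays. As a rough aid for inspecting such lines, the following sketch lists each character's Unicode name and bidi class in logical order. The helper name `describe` is hypothetical and is not part of the Samba test suite.

```python
import unicodedata


def describe(line):
    """Return (character name, bidi class) pairs in logical (stored) order."""
    return [
        (unicodedata.name(ch, "UNNAMED"), unicodedata.bidirectional(ch))
        for ch in line
    ]


if __name__ == "__main__":
    # Logical content of the string literal from the `d = ...` line:
    # HEBREW LETTER ALEF followed by '#', regardless of how it is displayed.
    for name, bidi_class in describe("\u05d0#"):
        print(f"{bidi_class:>3}  {name}")
```

Reading the output top to bottom gives the order the interpreter sees, which is the order that matters for comparisons such as `e[1] == f[1]`.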