author | Allan Sandfeld Jensen <allan.jensen@qt.io> | 2018-01-04 13:39:18 +0100 |
---|---|---|
committer | Allan Sandfeld Jensen <allan.jensen@qt.io> | 2018-01-05 09:48:31 +0000 |
commit | 5971c2e5153a73062e87ccf961166611385ac7a4 (patch) | |
tree | 0baf600d4cdfe81f261c8a548ff8ce2536157bb1 | |
parent | 28e6df60abee1df955e91614f558391b0f8141af (diff) | |
download | qtwebengine-chromium-5971c2e5153a73062e87ccf961166611385ac7a4.tar.gz |
[Backport] Change the script mixing policy to highly restrictive
The current script mixing policy (moderately restrictive) allows
mixing of Latin-ASCII and one non-Latin script (unless the non-Latin
script is Cyrillic or Greek).
This CL tightens up the policy to block mixing of Latin-ASCII and
a non-Latin script unless the non-Latin script is Chinese (Hanzi,
Bopomofo), Japanese (Kanji, Hiragana, Katakana) or Korean (Hangul,
Hanja).
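The rule described above can be illustrated with a minimal, self-contained sketch. It is not the ICU implementation that the patch configures via `uspoof_setRestrictionLevel`; the `Script` enum, `ScriptOf`, and `PassesHighlyRestrictive` are hypothetical names, and the hard-coded code point ranges are a rough stand-in for real Unicode script data (all unlisted scripts are lumped together as `Other`, so the sketch cannot tell two such scripts apart).

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, heavily simplified script classes. Real code would use
// Unicode script (extension) data, e.g. through ICU.
enum class Script {
  Common, Latin, Greek, Cyrillic, Armenian,
  Hiragana, Katakana, Bopomofo, Han, Hangul, Other
};

// Assumption: approximate block ranges only, for illustration.
Script ScriptOf(char32_t c) {
  if ((c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z')) return Script::Latin;
  if (c < 0x80) return Script::Common;  // ASCII digits, hyphen, etc.
  if (c >= 0x0370 && c <= 0x03FF) return Script::Greek;
  if (c >= 0x0400 && c <= 0x04FF) return Script::Cyrillic;
  if (c >= 0x0530 && c <= 0x058F) return Script::Armenian;
  if (c >= 0x3040 && c <= 0x309F) return Script::Hiragana;
  if (c >= 0x30A0 && c <= 0x30FF) return Script::Katakana;
  if (c >= 0x3100 && c <= 0x312F) return Script::Bopomofo;
  if (c >= 0x4E00 && c <= 0x9FFF) return Script::Han;
  if (c >= 0xAC00 && c <= 0xD7A3) return Script::Hangul;
  return Script::Other;  // every other script, lumped together
}

// True if every script in `used` is a member of `allowed`.
bool IsSubset(const std::set<Script>& used, const std::set<Script>& allowed) {
  for (Script s : used) {
    if (allowed.count(s) == 0) return false;
  }
  return true;
}

// Sketch of the "highly restrictive" level: a label passes if it uses a
// single script, or if its scripts fit within one of the three logical
// CJK combinations (each optionally mixed with Latin).
bool PassesHighlyRestrictive(const std::u32string& label) {
  std::set<Script> used;
  for (char32_t c : label) {
    Script s = ScriptOf(c);
    if (s != Script::Common) used.insert(s);
  }
  if (used.size() <= 1) return true;
  return IsSubset(used, {Script::Latin, Script::Han, Script::Bopomofo}) ||
         IsSubset(used, {Script::Latin, Script::Han, Script::Hiragana,
                         Script::Katakana}) ||
         IsSubset(used, {Script::Latin, Script::Han, Script::Hangul});
}
```

Under these assumptions, a Latin + Katakana label passes, while Latin + Armenian or Latin + Devanagari labels are rejected, which matches the direction of the test changes in this commit.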
Major gTLDs (.net/.org/.com) do not allow the registration of
a domain that has both Latin and a non-Latin script. The only
exception is names with Latin + Chinese/Japanese/Korean scripts.
The same is true of ccTLDs with IDNs.
Given the above registration rules of major gTLDs and ccTLDs, allowing
mixing of Latin and non-Latin other than CJK has no practical effect. In
the meantime, domain names in TLDs with a laxer policy on script mixing
would be subject to a potential spoofing attempt with the current
moderately restrictive script mixing policy. To protect users from those
risks, there are a few ad-hoc rules in place.
By switching to highly restrictive, those ad-hoc rules can be removed,
simplifying the IDN display policy implementation a bit.
This is also coordinated with Mozilla. See
https://bugzilla.mozilla.org/show_bug.cgi?id=1399939 .
BUG=726950, 756226, 756456, 756735, 770465
TEST=components_unittests --gtest_filter=*IDN*
Reviewed-on: https://chromium-review.googlesource.com/688825
Reviewed-by: Brett Wilson <brettw@chromium.org>
Reviewed-by: Lucas Garron <lgarron@chromium.org>
Commit-Queue: Jungshik Shin <jshin@chromium.org>
(CVE-2017-15424, CVE-2017-15425, CVE-2017-15426)
Change-Id: I8a79bf804c911c354a14dba34d7915c3e93ea59f
Reviewed-by: Michael Brüning <michael.bruning@qt.io>
-rw-r--r-- | chromium/components/url_formatter/idn_spoof_checker.cc | 26 |
-rw-r--r-- | chromium/components/url_formatter/url_formatter_unittest.cc | 12 |
2 files changed, 17 insertions, 21 deletions
diff --git a/chromium/components/url_formatter/idn_spoof_checker.cc b/chromium/components/url_formatter/idn_spoof_checker.cc
index 376b2c65e05..fd4eb5a92b4 100644
--- a/chromium/components/url_formatter/idn_spoof_checker.cc
+++ b/chromium/components/url_formatter/idn_spoof_checker.cc
@@ -64,13 +64,14 @@ IDNSpoofChecker::IDNSpoofChecker() {
   // MIXED_SCRIPT_CONFUSABLE, WHOLE_SCRIPT_CONFUSABLE, MIXED_NUMBERS, ANY_CASE})
   // This default configuration is adjusted below as necessary.

-  // Set the restriction level to moderate. It allows mixing Latin with another
-  // script (+ COMMON and INHERITED). Except for Chinese(Han + Bopomofo),
-  // Japanese(Hiragana + Katakana + Han), and Korean(Hangul + Han), only one
-  // script other than Common and Inherited can be mixed with Latin. Cyrillic
-  // and Greek are not allowed to mix with Latin.
+  // Set the restriction level to high. It allows mixing Latin with one logical
+  // CJK script (+ COMMON and INHERITED), but does not allow any other script
+  // mixing (e.g. Latin + Cyrillic, Latin + Armenian, Cyrillic + Greek). Note
+  // that each of {Han + Bopomofo} for Chinese, {Hiragana, Katakana, Han} for
+  // Japanese, and {Hangul, Han} for Korean is treated as a single logical
+  // script.
   // See http://www.unicode.org/reports/tr39/#Restriction_Level_Detection
-  uspoof_setRestrictionLevel(checker_, USPOOF_MODERATELY_RESTRICTIVE);
+  uspoof_setRestrictionLevel(checker_, USPOOF_HIGHLY_RESTRICTIVE);

   // Sets allowed characters in IDN labels and turns on USPOOF_CHAR_LIMIT.
   SetAllowedUnicodeSet(&status);
@@ -234,14 +235,9 @@ bool IDNSpoofChecker::SafeToDisplayAsUnicode(base::StringPiece16 label,
     //   label otherwise entirely in Katakna or Hiragana.
     // - Disallow U+0585 (Armenian Small Letter Oh) and U+0581 (Armenian Small
     //   Letter Co) to be next to Latin.
-    // - Disallow Latin 'o' and 'g' next to Armenian.
-    // - Disalow mixing of Latin and Canadian Syllabary.
-    // - Disalow mixing of Latin and Tifinagh.
     // - Disallow combining diacritical mark (U+0300-U+0339) after a non-LGC
     //   character. Other combining diacritical marks are not in the allowed
     //   character set.
-    // - Disallow Arabic non-spacing marks after non-Arabic characters.
-    // - Disallow Hebrew non-spacing marks after non-Hebrew characters.
     // - Disallow U+0307 (dot above) after 'i', 'j', 'l' or dotless i (U+0131).
     //   Dotless j (U+0237) is not in the allowed set to begin with.
     dangerous_pattern = new icu::RegexMatcher(
@@ -254,15 +250,7 @@ bool IDNSpoofChecker::SafeToDisplayAsUnicode(base::StringPiece16 label,
             R"(^[\p{scx=kana}]+[\u3078-\u307a][\p{scx=kana}]+$|)"
             R"(^[\p{scx=hira}]+[\u30d8-\u30da][\p{scx=hira}]+$|)"
             R"([a-z]\u30fb|\u30fb[a-z]|)"
-            R"(^[\u0585\u0581]+[a-z]|[a-z][\u0585\u0581]+$|)"
-            R"([a-z][\u0585\u0581]+[a-z]|)"
-            R"(^[og]+[\p{scx=armn}]|[\p{scx=armn}][og]+$|)"
-            R"([\p{scx=armn}][og]+[\p{scx=armn}]|)"
-            R"([\p{sc=cans}].*[a-z]|[a-z].*[\p{sc=cans}]|)"
-            R"([\p{sc=tfng}].*[a-z]|[a-z].*[\p{sc=tfng}]|)"
             R"([^\p{scx=latn}\p{scx=grek}\p{scx=cyrl}][\u0300-\u0339]|)"
-            R"([^\p{scx=arab}][\u064b-\u0655\u0670]|)"
-            R"([^\p{scx=hebr}]\u05b4|)"
             R"([ijl\u0131]\u0307)",
             -1, US_INV),
         0, status);
diff --git a/chromium/components/url_formatter/url_formatter_unittest.cc b/chromium/components/url_formatter/url_formatter_unittest.cc
index a2446efa2e0..7b484e02efb 100644
--- a/chromium/components/url_formatter/url_formatter_unittest.cc
+++ b/chromium/components/url_formatter/url_formatter_unittest.cc
@@ -205,10 +205,18 @@ const IDNTestCase idn_cases[] = {
      false},
     // Devanagari + Latin
     {"xn--ab-3ofh8fqbj6h.in", L"ab\x0939\x093f\x0928\x094d\x0926\x0940.in",
-     true},
+     false},
     // Thai + Latin
     {"xn--ab-jsi9al4bxdb6n.th",
-     L"ab\x0e20\x0e32\x0e29\x0e32\x0e44\x0e17\x0e22.th", true},
+     L"ab\x0e20\x0e32\x0e29\x0e32\x0e44\x0e17\x0e22.th", false},
+    // Armenian + Latin
+    {"xn--bs-red.com", L"b\x057ds.com", false},
+    // Tibetan + Latin
+    {"xn--foo-vkm.com", L"foo\x0f37.com", false},
+    // Oriya + Latin
+    {"xn--fo-h3g.com", L"fo\x0b66.com", false},
+    // Gujarati + Latin
+    {"xn--fo-isg.com", L"fo\x0ae6.com", false},
     // <vitamin in Katakana>b1.com
     {"xn--b1-xi4a7cvc9f.com",
      L"\x30d3\x30bf\x30df\x30f3"
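Two of the pattern rules that remain after this change, `[a-z]\u30fb|\u30fb[a-z]` and `[ijl\u0131]\u0307`, can be illustrated without ICU. The following is a minimal sketch only: `MatchesRemainingPattern` and `IsAsciiLower` are hypothetical names, and the real code evaluates these alternatives as part of a single `icu::RegexMatcher` alongside the other rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true for ASCII lowercase letters, i.e. [a-z].
bool IsAsciiLower(char32_t c) { return c >= U'a' && c <= U'z'; }

// Sketch of two surviving rules, checked on adjacent code point pairs:
//  - Katakana middle dot (U+30FB) directly next to an ASCII letter.
//  - Combining dot above (U+0307) directly after 'i', 'j', 'l' or
//    dotless i (U+0131).
bool MatchesRemainingPattern(const std::u32string& label) {
  const char32_t kMiddleDot = 0x30FB;
  const char32_t kDotAbove = 0x0307;
  const char32_t kDotlessI = 0x0131;
  for (std::size_t i = 0; i + 1 < label.size(); ++i) {
    const char32_t a = label[i];
    const char32_t b = label[i + 1];
    // [a-z]\u30fb|\u30fb[a-z]
    if ((IsAsciiLower(a) && b == kMiddleDot) ||
        (a == kMiddleDot && IsAsciiLower(b))) {
      return true;
    }
    // [ijl\u0131]\u0307
    if (b == kDotAbove &&
        (a == U'i' || a == U'j' || a == U'l' || a == kDotlessI)) {
      return true;
    }
  }
  return false;
}
```

A label for which this returns true would be treated as dangerous under these two rules; a middle dot between two Katakana characters, or a dot above after 'e', does not match.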