author | Matthew Pickering <matthewtpickering@gmail.com> | 2021-06-11 10:48:25 +0100 |
---|---|---|
committer | Marge Bot <ben+marge-bot@smart-cactus.org> | 2021-06-23 02:58:35 -0400 |
commit | 7f6454fb8cd92b2b2ad4e88fa6d81e34d43edb9a (patch) | |
tree | 22dbe8c64e1761856913450fa297e2797c905fa2 /libraries/ghc-boot | |
parent | 87f57ecf2523e83d8dd9cad919a6f2010f630ad0 (diff) | |
download | haskell-7f6454fb8cd92b2b2ad4e88fa6d81e34d43edb9a.tar.gz |
Optimiser: Correctly deal with strings starting with unicode characters in exprConApp_maybe
For example:
"\0" is encoded as the bytes C0 80. The rule would correctly use a decoding
function to work out that the first character was encoded as C0 80, but then
just used BS.tail, which drops a single byte, so the rest of the string was
the byte 80. This resulted in "\0" being transformed into
'\C0\80' : unpackCStringUTF8# "80"
which is obviously bogus.
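The mismatch can be reproduced outside the compiler with plain `bytestring` calls. The sketch below is only an illustration and not the code from the rule: `seqLen`, `nul`, `buggyRest` and `fixedRest` are hypothetical names, and `seqLen` assumes well-formed input.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Width in bytes of the UTF-8 sequence introduced by a lead byte.
-- Hypothetical helper for illustration; assumes well-formed input and
-- treats a stray continuation byte as a one-byte sequence.
seqLen :: Word8 -> Int
seqLen b
  | b < 0xC0  = 1
  | b < 0xE0  = 2
  | b < 0xF0  = 3
  | otherwise = 4

-- The encoding of "\0" described in the commit message: bytes C0 80.
nul :: BS.ByteString
nul = BS.pack [0xC0, 0x80]

-- The buggy shape: the head is decoded as a two-byte character elsewhere,
-- but the tail drops only one byte.
buggyRest :: BS.ByteString -> BS.ByteString
buggyRest = BS.tail

-- A consistent tail: drop exactly as many bytes as the first character used.
fixedRest :: BS.ByteString -> BS.ByteString
fixedRest bs = case BS.uncons bs of
  Nothing     -> bs
  Just (b, _) -> BS.drop (seqLen b) bs
```

Here `buggyRest nul` leaves the dangling continuation byte 80 behind, while `fixedRest nul` is empty.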
I rewrote the function to call utf8UnconsByteString directly and avoid
the round trip through FastString, so the head and tail are now computed
by the same call.
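The idea of computing head and tail in one call can be sketched in pure Haskell. This is not the GHC implementation, which decodes through a pointer with utf8DecodeChar. `unconsUtf8` and `leadMask` are hypothetical names, and the decoder below does only minimal validation (truncated input gives `Nothing`, out-of-range code points become U+FFFD).

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)

-- Decode the first character and return it together with the remaining
-- bytes, so both results come from the same width computation.
-- Hypothetical sketch; assumes mostly well-formed (possibly modified) UTF-8.
unconsUtf8 :: BS.ByteString -> Maybe (Char, BS.ByteString)
unconsUtf8 bs = do
  (b0, _) <- BS.uncons bs
  let n | b0 < 0xC0 = 1
        | b0 < 0xE0 = 2
        | b0 < 0xF0 = 3
        | otherwise = 4
      (hd, rest) = BS.splitAt n bs
      -- Payload bits of the lead byte.
      lead = fromIntegral b0 .&. leadMask n
      -- Append six payload bits from each continuation byte.
      cp = BS.foldl' (\acc b -> (acc `shiftL` 6) .|. (fromIntegral b .&. 0x3F))
                     lead (BS.drop 1 hd)
      c | cp > 0x10FFFF = '\xFFFD'
        | otherwise     = chr cp
  if BS.length hd == n then Just (c, rest) else Nothing
  where
    leadMask :: Int -> Int
    leadMask 1 = 0xFF
    leadMask 2 = 0x1F
    leadMask 3 = 0x0F
    leadMask _ = 0x07
```

With this shape, `unconsUtf8 (BS.pack [0xC0, 0x80])` yields the NUL character and an empty remainder, because the same `n` decides both how many bytes are decoded and how many are dropped.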
Fixes #19976
Diffstat (limited to 'libraries/ghc-boot')
-rw-r--r-- | libraries/ghc-boot/GHC/Utils/Encoding.hs | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/libraries/ghc-boot/GHC/Utils/Encoding.hs b/libraries/ghc-boot/GHC/Utils/Encoding.hs
index ba07784b0d..5eb3779b3b 100644
--- a/libraries/ghc-boot/GHC/Utils/Encoding.hs
+++ b/libraries/ghc-boot/GHC/Utils/Encoding.hs
@@ -22,6 +22,7 @@ module GHC.Utils.Encoding (
         utf8CharStart,
         utf8DecodeChar,
         utf8DecodeByteString,
+        utf8UnconsByteString,
         utf8DecodeShortByteString,
         utf8CompareShortByteString,
         utf8DecodeStringLazy,
@@ -169,6 +170,14 @@ utf8DecodeByteString :: ByteString -> [Char]
 utf8DecodeByteString (BS.PS fptr offset len)
   = utf8DecodeStringLazy fptr offset len
 
+utf8UnconsByteString :: ByteString -> Maybe (Char, ByteString)
+utf8UnconsByteString (BS.PS _ _ 0) = Nothing
+utf8UnconsByteString (BS.PS fptr offset len)
+  = unsafeDupablePerformIO $
+      withForeignPtr fptr $ \ptr -> do
+        let (c,n) = utf8DecodeChar (ptr `plusPtr` offset)
+        return $ Just (c, BS.PS fptr (offset + n) (len - n))
+
 utf8DecodeStringLazy :: ForeignPtr Word8 -> Int -> Int -> [Char]
 utf8DecodeStringLazy fp offset (I# len#)
   = unsafeDupablePerformIO $ do