author | Sylvain Henry <sylvain@haskus.fr> | 2019-01-18 12:30:31 +0100 |
---|---|---|
committer | Marge Bot <ben+marge-bot@smart-cactus.org> | 2019-03-08 14:05:10 -0500 |
commit | 224a6b864c6aa0d851fcbf79469e5702b1116dbc (patch) | |
tree | 888b79e9f177c988d06365d0a218c41878225467 /compiler/basicTypes | |
parent | 5be7ad7861c8d39f60b7101fd8d8e816ff50353a (diff) | |
download | haskell-224a6b864c6aa0d851fcbf79469e5702b1116dbc.tar.gz |
TH: support raw bytes literals (#14741)
GHC represents String literals as ByteString internally for efficiency
reasons. However, until now it wasn't possible to efficiently create
large string literals with TH (e.g. to embed a file in a binary, cf #14741):
TH code had to unpack the bytes into a [Word8] that GHC then had to re-pack
into a ByteString.
This patch adds the possibility to efficiently create a "string" literal
from raw bytes. We get the following compile times for different sizes
of TH created literals:
|| Size || Before || After || Gain ||
|| 30K || 2.307s || 2.299s || 0% ||
|| 3M || 3.073s || 2.400s || 21% ||
|| 30M || 8.517s || 3.390s || 60% ||
Ticket #14741 can be fixed if the original code uses this new TH feature.
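
As a rough illustration of the feature described above, the sketch below builds a raw bytes literal directly from a ByteString's underlying buffer, without going through a [Word8] list. It assumes a template-haskell version that provides `Bytes`, `mkBytes` and the `BytesPrimL` constructor, and it uses `toForeignPtr` from the bytestring package; the helper name `bytesLitE` is hypothetical and not part of this patch.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import Language.Haskell.TH.Syntax (Bytes (..), Exp (..), Lit (..), mkBytes)

-- | Hypothetical helper (not part of the patch): wrap the buffer behind a
-- ByteString in a BytesPrimL literal. The bytes are referenced through the
-- ForeignPtr, so no intermediate [Word8] list is built.
bytesLitE :: B.ByteString -> Exp
bytesLitE bs =
  let (fp, off, len) = BI.toForeignPtr bs
   in LitE (BytesPrimL (mkBytes fp (fromIntegral off) (fromIntegral len)))

main :: IO ()
main =
  -- Inspect the generated AST node instead of splicing it, so the example
  -- stays in a single module.
  case bytesLitE (B.pack [0xDE, 0xAD, 0xBE, 0xEF]) of
    LitE (BytesPrimL b) ->
      putStrLn ("literal of " ++ show (bytesSize b) ++ " bytes")
    _ -> putStrLn "unexpected expression"
```

When used in a splice from another module, for example `$(pure (bytesLitE contents))`, the resulting expression should have type `Addr#`, so the consumer has to pair it with the known length (for instance via `unsafePackAddressLen` from Data.ByteString.Unsafe) to recover a ByteString.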
Diffstat (limited to 'compiler/basicTypes')
-rw-r--r-- | compiler/basicTypes/Literal.hs | 14 |
1 files changed, 14 insertions, 0 deletions
diff --git a/compiler/basicTypes/Literal.hs b/compiler/basicTypes/Literal.hs
index bfc3783d2b..8dd6708eda 100644
--- a/compiler/basicTypes/Literal.hs
+++ b/compiler/basicTypes/Literal.hs
@@ -188,6 +188,20 @@ Note [Natural literals]
 ~~~~~~~~~~~~~~~~~~~~~~~
 Similar to Integer literals.
 
+Note [String literals]
+~~~~~~~~~~~~~~~~~~~~~~
+
+String literals are UTF-8 encoded and stored into ByteStrings in the following
+ASTs: Haskell, Core, Stg, Cmm. TH can also emit ByteString based string literals
+with the BytesPrimL constructor (see #14741).
+
+It wasn't true before as [Word8] was used in Cmm AST and in TH which was quite
+bad for performance with large strings (see #16198 and #14741).
+
+To include string literals into output objects, the assembler code generator has
+to embed the UTF-8 encoded binary blob. See Note [Embedding large binary blobs]
+for more details.
+
 -}
 
 instance Binary LitNumType where