author: Sylvain Henry <sylvain@haskus.fr> 2019-01-18 12:30:31 +0100
committer: Marge Bot <ben+marge-bot@smart-cactus.org> 2019-03-08 14:05:10 -0500
commit: 224a6b864c6aa0d851fcbf79469e5702b1116dbc
tree: 888b79e9f177c988d06365d0a218c41878225467 /compiler/basicTypes
parent: 5be7ad7861c8d39f60b7101fd8d8e816ff50353a
TH: support raw bytes literals (#14741)
GHC represents String literals as ByteString internally for efficiency
reasons. However, until now it wasn't possible to efficiently create
large string literals with TH (e.g. to embed a file in a binary, cf #14741):
TH code had to unpack the bytes into a [Word8] that GHC then had to re-pack
into a ByteString.

This patch adds the possibility to efficiently create a "string" literal
from raw bytes. We get the following compile times for different sizes
of TH-created literals:

|| Size || Before || After  || Gain ||
|| 30K  || 2.307s || 2.299s || 0%   ||
|| 3M   || 3.073s || 2.400s || 21%  ||
|| 30M  || 8.517s || 3.390s || 60%  ||

Ticket #14741 can be fixed if the original code uses this new TH feature.
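As an illustration of the feature described above, here is a minimal sketch of how client code might build a raw bytes literal from TH. It assumes a GHC with template-haskell 2.16 or later (where `mkBytes` and `bytesPrimL` are exported from `Language.Haskell.TH`) and the bytestring package. The `embedded` name and the six-byte payload are hypothetical; a real use would typically read a file with `runIO` inside the splice.

```haskell
{-# LANGUAGE TemplateHaskell #-}
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import Data.ByteString.Unsafe (unsafePackAddressLen)
import Language.Haskell.TH
import System.IO.Unsafe (unsafePerformIO)

-- | Hypothetical payload embedded at compile time as a raw bytes literal.
embedded :: B.ByteString
embedded =
  $( do -- Build the payload at compile time (stand-in for reading a file).
        let payload = B.pack [0, 1, 2, 253, 254, 255]
            -- Expose the underlying buffer without unpacking to [Word8].
            (fptr, off, len) = BI.toForeignPtr payload
            bytes = mkBytes fptr (fromIntegral off) (fromIntegral len)
        -- The literal has type Addr#; wrap it back into a ByteString
        -- of known length so that NUL bytes are preserved.
        [| unsafePerformIO
             (unsafePackAddressLen len $(litE (bytesPrimL bytes))) |] )

main :: IO ()
main = print (B.unpack embedded)
```

The point of the sketch is that the bytes go from a `ForeignPtr Word8` straight into the literal, so no intermediate `[Word8]` list is built on either side of the splice.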
Diffstat (limited to 'compiler/basicTypes')
-rw-r--r-- compiler/basicTypes/Literal.hs 14
1 files changed, 14 insertions, 0 deletions
diff --git a/compiler/basicTypes/Literal.hs b/compiler/basicTypes/Literal.hs
index bfc3783d2b..8dd6708eda 100644
--- a/compiler/basicTypes/Literal.hs
+++ b/compiler/basicTypes/Literal.hs
@@ -188,6 +188,20 @@ Note [Natural literals]
~~~~~~~~~~~~~~~~~~~~~~~
Similar to Integer literals.
+Note [String literals]
+~~~~~~~~~~~~~~~~~~~~~~
+
+String literals are UTF-8 encoded and stored in ByteStrings in the following
+ASTs: Haskell, Core, Stg, Cmm. TH can also emit ByteString-based string literals
+with the BytesPrimL constructor (see #14741).
+
+This was not always the case: [Word8] was previously used in the Cmm AST and in
+TH, which performed badly with large strings (see #16198 and #14741).
+
+To include string literals in output objects, the assembler code generator has
+to embed the UTF-8 encoded binary blob. See Note [Embedding large binary blobs]
+for more details.
+
-}
instance Binary LitNumType where