path: root/compiler/GHC/Utils/Encoding.hs
Commit message | Author | Age | Files | Lines
* Initial ShortText code and conversion of package db code | Wander Hillen | 2020-10-13 | 1 | -526/+0
  Metric Decrease: Naperian T10421 T10421a T10547 T12150 T12234 T12425 T13035 T18140 T18304 T5837 T6048 T13253-spj T18282 T18223 T3064 T9961
  Metric Increase T13701 HFSKJH
* Make Z-encoding comment into a note | Leif Metcalf | 2020-09-17 | 1 | -1/+2
* Remove "Ord FastString" instance | Sylvain Henry | 2020-09-01 | 1 | -0/+33
  FastStrings can be compared in 2 ways: by Unique or lexically. We don't want to bless one particular way with an "Ord" instance because it leads to bugs (#18562) or to suboptimal code (e.g. using lexical comparison while a Unique comparison would suffice).

  UTF-8 encoding has the advantage that sorting strings by their encoded bytes also sorts them by their Unicode code points, without having to decode the actual code points. BUT GHC uses Modified UTF-8 which diverges from UTF-8 by encoding \0 as 0xC080 instead of 0x00 (to avoid null bytes in the middle of a String so that the string can still be null-terminated).

  This patch adds a new `utf8CompareShortByteString` function that performs sorting by bytes but that also takes Modified UTF-8 into account. It is much more performant than decoding the strings into [Char] to perform comparisons (which we did in the previous patch).

  Bump haddock submodule
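  The byte-wise comparison described in this commit message can be illustrated with a minimal sketch. It is not GHC's `utf8CompareShortByteString`; the function name `compareModifiedUtf8` is hypothetical, it works on plain lists of bytes rather than a ShortByteString, and it assumes both inputs are valid Modified UTF-8 (so the byte 0xC0 only ever appears as the first half of the two-byte NUL encoding 0xC0 0x80).

```haskell
import Data.Word (Word8)

-- | Hypothetical sketch: compare two valid Modified UTF-8 byte sequences
-- so that the result matches code-point order. Plain byte-wise comparison
-- is almost enough; the only exception is NUL, which Modified UTF-8 writes
-- as 0xC0 0x80 even though it is the smallest code point.
compareModifiedUtf8 :: [Word8] -> [Word8] -> Ordering
compareModifiedUtf8 = go
  where
    go [] [] = EQ
    go [] _  = LT
    go _  [] = GT
    -- Both sides start with an encoded NUL: skip it on both sides.
    go (0xC0 : 0x80 : xs) (0xC0 : 0x80 : ys) = go xs ys
    -- Only one side starts with NUL. The other side is non-empty and starts
    -- with a different character, which must be larger than NUL.
    go (0xC0 : 0x80 : _) _ = LT
    go _ (0xC0 : 0x80 : _) = GT
    -- Otherwise ordinary byte order agrees with code-point order.
    go (x : xs) (y : ys) =
      case compare x y of
        EQ    -> go xs ys
        other -> other
```

  Under these assumptions, `compareModifiedUtf8 [0xC0, 0x80] [0x01]` yields `LT`, whereas a plain `compare` on the same two byte lists yields `GT`, which is the divergence the commit message describes.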
* Encoding: Reformat utf8EncodeShortByteString to be more consistent | Daniel Gröber | 2020-07-22 | 1 | -5/+5
* Encoding: Remove redundant use of withForeignPtr | Daniel Gröber | 2020-07-22 | 1 | -2/+3
* Use IO constructor instead of `stToIO . ST` | Daniel Gröber | 2020-07-22 | 1 | -1/+1
* Encoding: Add comment about tricky ForeignPtr lifetime | Daniel Gröber | 2020-07-22 | 1 | -0/+4
* Pass specialised utf8DecodeChar# to utf8DecodeLazy# for performance | Daniel Gröber | 2020-07-22 | 1 | -13/+11
  Currently we're passing an indexWord8OffAddr#-type function to utf8DecodeLazy#, which then passes it on to utf8DecodeChar#. By passing one of utf8DecodeCharAddr# or utf8DecodeCharByteArray# instead, we benefit from the inlining and specialization already done for those.
* Use ShortByteString for FastString | Daniel Gröber | 2020-07-22 | 1 | -66/+105
  There are multiple reasons we want this:
  - Fewer allocations: ByteString has 3 fields, ShortByteString just has one.
  - ByteString memory is pinned:
    - This can cause fragmentation issues (see for example #13110) but also
    - makes using FastStrings in compact regions impossible.

  Metric Decrease: T5837 T12150 T12234 T12425
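  As a rough illustration of the two representations named above, the following sketch uses the public `bytestring` API (`Data.ByteString.Short`) to copy a ByteString into a ShortByteString and back. It does not show how FastString itself is implemented; the helper names `toUnpinned` and `roundTrips` are hypothetical.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Short as SBS

-- | Hypothetical helper: copy the contents of a ByteString into a
-- ShortByteString, which stores its bytes in an unpinned heap object.
toUnpinned :: BC.ByteString -> SBS.ShortByteString
toUnpinned = SBS.toShort

-- | Converting to a ShortByteString and back preserves the bytes.
roundTrips :: String -> Bool
roundTrips s = SBS.fromShort (toUnpinned bs) == bs
  where
    bs = BC.pack s
```

  Both conversions copy the data, so a sketch like this is only meant to show the types involved, not a performance-sensitive path.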
* Clarify leaf module names for new module hierarchy | Takenobu Tani | 2020-06-10 | 1 | -1/+1
  This updates comments only.

  This patch replaces leaf module names according to the new module hierarchy [1][2] as follows:
  * Expand leaf names to easily find the module path: for instance, `Id.hs` to `GHC.Types.Id`.
  * Modify leaf names according to new module hierarchy: for instance, `Convert.hs` to `GHC.ThToHs`.
  * Fix typo: for instance, `GHC.Core.TyCo.Rep.hs` to `GHC.Core.TyCo.Rep`

  See also !3375

  [1]: https://gitlab.haskell.org/ghc/ghc/-/wikis/Make-GHC-codebase-more-modular
  [2]: https://gitlab.haskell.org/ghc/ghc/issues/13009
* Modules: Utils and Data (#13009) | Sylvain Henry | 2020-04-26 | 1 | -0/+450
  Update Haddock submodule

  Metric Increase: haddock.compiler