summaryrefslogtreecommitdiff
path: root/compiler/codeGen
Commit message (Collapse)AuthorAgeFilesLines
* Rip out object splittingwip/kill-obj-splitterBen Gamari2018-11-273-49/+7
| | | | | | | | | | | | | | | The splitter is an evil Perl script that processes assembler code. Its job can be done better by the linker's --gc-sections flag. GHC passes this flag to the linker whenever -split-sections is passed on the command line. This is based on @DemiMarie's D2768. Fixes Trac #11315 Fixes Trac #9832 Fixes Trac #8964 Fixes Trac #8685 Fixes Trac #8629
* Revert "Remove redundant check in cgCase"Ömer Sinan Ağacan2018-11-261-4/+7
| | | | | | This reverts commit d13b7d60650cb84af11ee15b3f51c3511548cfdb. (See discussion in D5358)
* Implement late lambda liftSebastian Graf2018-11-233-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This implements a selective lambda-lifting pass late in the STG pipeline. Lambda lifting has the effect of avoiding closure allocation at the cost of having to make former free vars available at call sites, possibly enlarging closures surrounding call sites in turn. We identify beneficial cases by means of an analysis that estimates closure growth. There's a Wiki page at https://ghc.haskell.org/trac/ghc/wiki/LateLamLift. Reviewers: simonpj, bgamari, simonmar Reviewed By: simonpj Subscribers: rwbarton, carter GHC Trac Issues: #9476 Differential Revision: https://phabricator.haskell.org/D5224
* Fix unused-import warningsDavid Eichmann2018-11-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a fairly long-standing bug (dating back to 2015) in RdrName.bestImport, namely commit 9376249b6b78610db055a10d05f6592d6bbbea2f Author: Simon Peyton Jones <simonpj@microsoft.com> Date: Wed Oct 28 17:16:55 2015 +0000 Fix unused-import stuff in a better way In that patch got the sense of the comparison back to front, and thereby failed to implement the unused-import rules described in Note [Choosing the best import declaration] in RdrName This led to Trac #13064 and #15393 Fixing this bug revealed a bunch of unused imports in libraries; the ones in the GHC repo are part of this commit. The two important changes are * Fix the bug in bestImport * Modified the rules by adding (a) in Note [Choosing the best import declaration] in RdrName Reason: the previosu rules made Trac #5211 go bad again. And the new rule (a) makes sense to me. In unravalling this I also ended up doing a few other things * Refactor RnNames.ImportDeclUsage to use a [GlobalRdrElt] for the things that are used, rather than [AvailInfo]. This is simpler and more direct. * Rename greParentName to greParent_maybe, to follow GHC naming conventions * Delete dead code RdrName.greUsedRdrName Bumps a few submodules. Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27 Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5312
* LLVM: Use generic code for small size quot-rem opsPeter Trommler2018-11-221-2/+2
|
* Rename literal constructorsSylvain Henry2018-11-222-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | In a previous patch we replaced some built-in literal constructors (MachInt, MachWord, etc.) with a single LitNumber constructor. In this patch we replace the `Mach` prefix of the remaining constructors with `Lit` for consistency (e.g., LitChar, LitLabel, etc.). Sadly the name `LitString` was already taken for a kind of FastString and it would become misleading to have both `LitStr` (literal constructor renamed after `MachStr`) and `LitString` (FastString variant). Hence this patch renames the FastString variant `PtrString` (which is more accurate) and the literal string constructor now uses the least surprising `LitString` name. Both `Literal` and `LitString/PtrString` have recently seen breaking changes so doing this kind of renaming now shouldn't harm much. Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers Subscribers: tdammers, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4881
* Remove redundant check in cgCaseÖmer Sinan Ağacan2018-11-201-7/+4
| | | | | | | | | | | | | D5339 (part of D5324) removed the dead case binder analysis done during CoreToStg so this condition always holds now. Test Plan: Validated locally. Reviewers: sgraf, bgamari, simonmar Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5358
* Don't track free variables in STG syntax by defaultSebastian Graf2018-11-194-29/+35
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Currently, `CoreToStg` annotates `StgRhsClosure`s with their set of non-global free variables. This free variable information is only needed in the final code generation step (i.e. `StgCmm.codeGen`), which leads to transformations such as `StgCse` and `StgUnarise` having to maintain this information. This is tiresome and unnecessary, so this patch introduces a trees-to-grow-like approach that only introduces the free variable set into the syntax tree in the code gen pass, along with a free variable analysis on STG terms to generate that information. Fixes #15754. Reviewers: simonpj, osa1, bgamari, simonmar Reviewed By: osa1 Subscribers: rwbarton, carter GHC Trac Issues: #15754 Differential Revision: https://phabricator.haskell.org/D5324
* Introduce Int16# and Word16#Abhiroop Sarkar2018-11-172-0/+47
| | | | | | | | | | | | This builds off of D4475. Bumps binary submodule. Reviewers: carter, AndreasK, hvr, goldfire, bgamari, simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D5006
* Remove StgBinderInfo and related computation in CoreToStgÖmer Sinan Ağacan2018-11-124-97/+10
| | | | | | | | | | | | | | | | | | | | | | - The StgBinderInfo type was never used in the code gen, so the type, related computation in CoreToStg, and some comments about it are removed. See #15770 for more details. - Simplified CoreToStg after removing the StgBinderInfo computation: removed StgBinderInfo arguments and mfix stuff. The StgBinderInfo values were not used in the code gen, but I still run nofib just to make sure: 0.0% change in allocations and binary sizes. Test Plan: Validated locally Reviewers: simonpj, simonmar, bgamari, sgraf Reviewed By: sgraf Subscribers: AndreasK, sgraf, rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5232
* [LlvmCodeGen] Fixes for Int8#/Word8#Michal Terepeta2018-11-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes two isssues: - Using bitcast for MO_XX_Conv Arguments to a bitcast must be of the same size. We should be using `trunc` and `zext` instead. - Using unsupported MO_*_QuotRem for LLVM The two primops `MO_*_QuotRem` are not supported by the LLVM backend, so we shouldn't use them for `Int8#`/`Word8#` (just as we do not use them for `Int#`/`Word#`). Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: manually run tests with WAY=llvm Reviewers: bgamari, simonmar Reviewed By: bgamari Subscribers: rwbarton, carter GHC Trac Issues: #15864 Differential Revision: https://phabricator.haskell.org/D5304
* Add Int8# and Word8#Michal Terepeta2018-11-022-14/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the first step of implementing: https://github.com/ghc-proposals/ghc-proposals/pull/74 The main highlights/changes: primops.txt.pp gets two new sections for two new primitive types for signed and unsigned 8-bit integers (Int8# and Word8 respectively) along with basic arithmetic and comparison operations. PrimRep/RuntimeRep get two new constructors for them. All of the primops translate into the existing MachOPs. For CmmCalls the codegen will now zero-extend the values at call site (so that they can be moved to the right register) and then truncate them back their original width. x86 native codegen needed some updates, since it wasn't able to deal with the new widths, but all the changes are quite localized. LLVM backend seems to just work. This is the second attempt at merging this, after the first attempt in D4475 had to be backed out due to regressions on i386. Bumps binary submodule. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate (on both x86-{32,64}) Reviewers: bgamari, hvr, goldfire, simonmar Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5258
* Fix some broken links (#15733)Fangyi Zhou2018-10-251-1/+1
| | | | | | | | | | | | | | | | Change some URLs from hackage.haskell.org/trac to ghc.haskell.org/trac Test Plan: manually verify links work Reviewers: bgamari, simonmar, mpickering Reviewed By: bgamari, mpickering Subscribers: mpickering, rwbarton, carter GHC Trac Issues: #15733 Differential Revision: https://phabricator.haskell.org/D5257
* Add RubbishLit for absent bindings of UnliftedRepSebastian Graf2018-10-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Trac #9279 reminded us that the worker wrapper transformation copes really badly with absent unlifted boxed bindings. As `Note [Absent errors]` in WwLib.hs points out, we can't just use `absentError` for unlifted bindings because there is no bottom to hide the error in. So instead, we synthesise a new `RubbishLit` of type `forall (a :: TYPE 'UnliftedRep). a`, which code-gen may subsitute for any boxed value. We choose `()`, so that there is a good chance that the program crashes instead instead of leading to corrupt data, should absence analysis have been too optimistic (#11126). Reviewers: simonpj, hvr, goldfire, bgamari, simonmar Reviewed By: simonpj Subscribers: osa1, rwbarton, carter GHC Trac Issues: #15627, #9279, #4306, #11126 Differential Revision: https://phabricator.haskell.org/D5153
* Fix dataToTag# argument evaluationÖmer Sinan Ağacan2018-10-102-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | See #15696 for more details. We now always enter dataToTag# argument (done in generated Cmm, in StgCmmExpr). Any high-level optimisations on dataToTag# applications are done by the simplifier. Looking at tag bits (instead of reading the info table) for small types is left to another diff. Incorrect test T14626 is removed. We no longer do this optimisation (see comment:44, comment:45, comment:60). Comments and notes about special cases around dataToTag# are removed. We no longer have any special cases around it in Core. Other changes related to evaluating primops (seq# and dataToTag#) will be pursued in follow-up diffs. Test Plan: Validates with three regression tests Reviewers: simonpj, simonmar, hvr, bgamari, dfeuer Reviewed By: simonmar Subscribers: rwbarton, carter GHC Trac Issues: #15696 Differential Revision: https://phabricator.haskell.org/D5201
* Revert "Add Int8# and Word8#"Ben Gamari2018-10-092-62/+14
| | | | | | | | | This unfortunately broke i386 support since it introduced references to byte-sized registers that don't exist on that architecture. Reverts binary submodule This reverts commit 5d5307f943d7581d7013ffe20af22233273fba06.
* Add Int8# and Word8#Michal Terepeta2018-10-072-14/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the first step of implementing: https://github.com/ghc-proposals/ghc-proposals/pull/74 The main highlights/changes: - `primops.txt.pp` gets two new sections for two new primitive types for signed and unsigned 8-bit integers (`Int8#` and `Word8` respectively) along with basic arithmetic and comparison operations. `PrimRep`/`RuntimeRep` get two new constructors for them. All of the primops translate into the existing `MachOP`s. - For `CmmCall`s the codegen will now zero-extend the values at call site (so that they can be moved to the right register) and then truncate them back their original width. - x86 native codegen needed some updates, since it wasn't able to deal with the new widths, but all the changes are quite localized. LLVM backend seems to just work. Bumps binary submodule. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate with new tests Reviewers: hvr, goldfire, bgamari, simonmar Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4475
* Add a missing write barrier to small array writesÖmer Sinan Ağacan2018-10-061-0/+1
| | | | | | | | | | | | | | | | Write barriers for large array writes were added in D2525, as a part of #12469. However it seems we forgot about small arrays. This patch adds the same write barrier to small array writes. Reviewers: simonmar, bgamari Reviewed By: simonmar Subscribers: rwbarton, carter GHC Trac Issues: #12469 Differential Revision: https://phabricator.haskell.org/D5209
* Finish stable splitDavid Feuer2018-08-291-8/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Long ago, the stable name table and stable pointer tables were one. Now, they are separate, and have significantly different implementations. I believe the time has come to finish the split that began in #7674. * Divide `rts/Stable` into `rts/StableName` and `rts/StablePtr`. * Give each table its own mutex. * Add FFI functions `hs_lock_stable_ptr_table` and `hs_unlock_stable_ptr_table` and document them. These are intended to replace the previously undocumented `hs_lock_stable_tables` and `hs_lock_stable_tables`, which are now documented as deprecated synonyms. * Make `eqStableName#` use pointer equality instead of unnecessarily comparing stable name table indices. Reviewers: simonmar, bgamari, erikd Reviewed By: bgamari Subscribers: rwbarton, carter GHC Trac Issues: #15555 Differential Revision: https://phabricator.haskell.org/D5084
* Fix precision of asinh/acosh/atanh by making them primopsArtem Pelenitsyn2018-08-211-0/+6
| | | | | | | | | | Reviewers: hvr, bgamari, simonmar, jrtc27 Reviewed By: bgamari Subscribers: alpmestan, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D5034
* Replace most occurences of foldl with foldl'.klebinger.andreas@gmx.at2018-08-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds foldl' to GhcPrelude and changes must occurences of foldl to foldl'. This leads to better performance especially for quick builds where GHC does not perform strictness analysis. It does change strictness behaviour when we use foldl' to turn a argument list into function applications. But this is only a drawback if code looks ONLY at the last argument but not at the first. And as the benchmarks show leads to fewer allocations in practice at O2. Compiler performance for Nofib: O2 Allocations: -1 s.d. ----- -0.0% +1 s.d. ----- -0.0% Average ----- -0.0% O2 Compile Time: -1 s.d. ----- -2.8% +1 s.d. ----- +1.3% Average ----- -0.8% O0 Allocations: -1 s.d. ----- -0.2% +1 s.d. ----- -0.1% Average ----- -0.2% Test Plan: ci Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal Reviewed By: bgamari, monoidal Subscribers: tdammers, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4929
* Turn on MonadFail desugaring by defaultHerbert Valerio Riedel2018-08-073-19/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This contains two commits: ---- Make GHC's code-base compatible w/ `MonadFail` There were a couple of use-sites which implicitly used pattern-matches in `do`-notation even though the underlying `Monad` didn't explicitly support `fail` This refactoring turns those use-sites into explicit case discrimations and adds an `MonadFail` instance for `UniqSM` (`UniqSM` was the worst offender so this has been postponed for a follow-up refactoring) --- Turn on MonadFail desugaring by default This finally implements the phase scheduled for GHC 8.6 according to https://prime.haskell.org/wiki/Libraries/Proposals/MonadFail#Transitionalstrategy This also preserves some tests that assumed MonadFail desugaring to be active; all ghc boot libs were already made compatible with this `MonadFail` long ago, so no changes were needed there. Test Plan: Locally performed ./validate --fast Reviewers: bgamari, simonmar, jrtc27, RyanGlScott Reviewed By: bgamari Subscribers: bgamari, RyanGlScott, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D5028
* Built-in Natural literals in CoreSylvain Henry2018-06-152-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for built-in Natural literals in Core. - Replace MachInt,MachWord, LitInteger, etc. with a single LitNumber constructor with a LitNumType field - Support built-in Natural literals - Add desugar warning for negative literals - Move Maybe(..) from GHC.Base to GHC.Maybe for module dependency reasons This patch introduces only a few rules for Natural literals (compared to Integer's rules). Factorization of the built-in rules for numeric literals will be done in another patch as this one is already big to review. Test Plan: validate test build with integer-simple Reviewers: hvr, bgamari, goldfire, Bodigrim, simonmar Reviewed By: bgamari Subscribers: phadej, simonpj, RyanGlScott, carter, hsyl20, rwbarton, thomie GHC Trac Issues: #14170, #14465 Differential Revision: https://phabricator.haskell.org/D4212
* Rename some mutable closure types for consistencyÖmer Sinan Ağacan2018-06-051-8/+8
| | | | | | | | | | | | | | | | | | | | | | | SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY SMALL_MUT_ARR_PTRS_FROZEN -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN MUT_ARR_PTRS_FROZEN0 -> MUT_ARR_PTRS_FROZEN_DIRTY MUT_ARR_PTRS_FROZEN -> MUT_ARR_PTRS_FROZEN_CLEAN Naming is now consistent with other CLEAR/DIRTY objects (MVAR, MUT_VAR, MUT_ARR_PTRS). (alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR) Removed a few comments in Scav.c about FROZEN0 being on the mut_list because it's now clear from the closure type. Reviewers: bgamari, simonmar, erikd Reviewed By: simonmar Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4784
* remove dead maybeIsLFConGabor Greif2018-05-291-6/+1
|
* Comments and refactoring onlySimon Marlow2018-05-171-1/+1
| | | | Addressing review comments on D4637
* Merge FUN_STATIC closure with its SRTSimon Marlow2018-05-162-18/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The idea here is to save a little code size and some work in the GC, by collapsing FUN_STATIC closures and their SRTs. This is (4) in a series; see D4632 for more details. There's a tradeoff here: more complexity in the compiler in exchange for a modest code size reduction (probably around 0.5%). Results: * GHC binary itself (statically linked) is 1% smaller * -0.2% binary sizes in nofib (-0.5% module sizes) Full nofib results comparing D4634 with this: P177 (ignore runtimes, these aren't stable on my laptop) Test Plan: validate, nofib Reviewers: bgamari, niteria, simonpj, erikd Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4637
* An overhaul of the SRT representationSimon Marlow2018-05-161-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: - Previously we would hvae a single big table of pointers per module, with a set of bitmaps to reference entries within it. The new representation is identical to a static constructor, which is much simpler for the GC to traverse, and we get to remove the complicated bitmap-traversal code from the GC. - Rewrite all the code to generate SRTs in CmmBuildInfoTables, and document it much better (see Note [SRTs]). This has been something I've wanted to do since we moved to the new code generator, I finally had the opportunity to finish it while on a transatlantic flight recently :) There are a series of 4 diffs: 1. D4632 (this one), which does the bulk of the changes 2. D4633 which adds support for smaller `CmmLabelDiffOff` constants 3. D4634 which takes advantage of D4632 and D4633 to save a word in info tables that have an SRT on x86_64. This is where most of the binary size improvement comes from. 4. D4637 which makes a further optimisation to merge some SRTs with static FUN closures. This adds some complexity and the benefits are fairly modest, so it's not clear yet whether we should do this. Results (after (3), on x86_64) - GHC itself (staticaly linked) is 5.2% smaller - -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176 - I measured the overhead of traversing all the static objects in a major GC in GHC itself by doing `replicateM_ 1000 performGC` as the first thing in `Main.main`. The new version was 5-10% faster, but the results did vary quite a bit. - I'm not sure if there's a compile-time difference, the results are too unreliable. Test Plan: validate Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1 Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4632
* Add 'addWordC#' PrimOpSebastian Graf2018-05-051-10/+62
| | | | | | | | | | | | | | | | | | | This is mostly for congruence with 'subWordC#' and '{add,sub}IntC#'. I found 'plusWord2#' while implementing this, which both lacks documentation and has a slightly different specification than 'addWordC#', which means the generic implementation is unnecessarily complex. While I was at it, I also added lacking meta-information on PrimOps and refactored 'subWordC#'s generic implementation to be branchless. Reviewers: bgamari, simonmar, jrtc27, dfeuer Reviewed By: bgamari, dfeuer Subscribers: dfeuer, thomie, carter Differential Revision: https://phabricator.haskell.org/D4592
* Add unaligned bytearray access primops. Fixes #4442.Reiner Pope2018-03-251-0/+53
| | | | | | | | | | | | Reviewers: bgamari, simonmar Reviewed By: bgamari Subscribers: dfeuer, rwbarton, thomie, carter GHC Trac Issues: #4442 Differential Revision: https://phabricator.haskell.org/D4488
* Fix seq# case of exprOkForSpeculationSimon Peyton Jones2018-03-201-6/+9
| | | | | | | | | This subtle patch fixes Trac #5129 (again; comment:20 and following). I took the opportunity to document seq# properly; see Note [seq# magic] in PrelRules, and Note [seq# and expr_ok] in CoreUtils.
* Improve accuracy of get/setAllocationCounterBen Gamari2018-03-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: get/setAllocationCounter didn't take into account allocations in the current block. This was known at the time, but it turns out to be important to have more accuracy when using these in a fine-grained way. Test Plan: New unit test to test incrementally larger allocaitons. Before I got results like this: ``` +0 +0 +0 +0 +0 +4096 +0 +0 +0 +0 +0 +4064 +0 +0 +4088 +4056 +0 +0 +0 +4088 +4096 +4056 +4096 ``` Notice how the results aren't always monotonically increasing. After this patch: ``` +344 +416 +488 +560 +632 +704 +776 +848 +920 +992 +1064 +1136 +1208 +1280 +1352 +1424 +1496 +1568 +1640 +1712 +1784 +1856 +1928 +2000 +2072 +2144 ``` Reviewers: hvr, erikd, simonmar, jrtc27, trommler Reviewed By: simonmar Subscribers: trommler, jrtc27, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4363
* Get rid of more CPP in cmm/ and codeGen/Michal Terepeta2018-03-198-33/+11
| | | | | | | | | | | | | | | | | | This removes a bunch of unnecessary includes of `HsVersions.h` along with unnecessary CPP (e.g., due to checking for DEBUG which can be achieved by looking at `debugIsOn`) Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: bgamari, simonmar Reviewed By: bgamari Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4462
* Fix interpreter with profilingSimon Marlow2018-03-063-16/+28
| | | | | | | | | | | | | | | | | | | This was broken by D3746 and/or D3809, but unfortunately we didn't notice because CI at the time wasn't building the profiling way. Test Plan: ``` cd testsuite/test/profiling/should_run make WAY=ghci-ext-prof ``` Reviewers: bgamari, michalt, hvr, erikd Subscribers: rwbarton, thomie, carter GHC Trac Issues: #14705 Differential Revision: https://phabricator.haskell.org/D4437
* myThreadId# is trivial; make it an inline primopSimon Marlow2018-02-181-0/+3
| | | | | | | | | | | | | | | The pattern `threadCapability =<< myThreadId` is used a lot in code that uses `hs_try_putmvar`, I want to make it cheaper. Test Plan: validate Reviewers: bgamari, erikd Reviewed By: bgamari Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4381
* Tidy up and consolidate canned CmmReg and CmmGlobalsSimon Marlow2018-02-189-66/+45
| | | | | | | | | | | | Test Plan: validate Reviewers: bgamari, erikd Reviewed By: bgamari Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4380
* Add ptr-eq short-cut to `compareByteArrays#` primitiveHerbert Valerio Riedel2018-01-261-0/+43
| | | | | | | | | | | | | | | | | | This is an obvious optimisation whose overhead is neglectable but which significantly simplifies the common uses of `compareByteArrays#` which would otherwise require to make *careful* use of `reallyUnsafePtrEquality#` or (equally fragile) `byteArrayContents#` which can result in less optimal assembler code being generated. Test Plan: carefully examined generated cmm/asm code; validate via phab Reviewers: alexbiehl, bgamari, simonmar Reviewed By: bgamari, simonmar Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4319
* Add new mbmi and mbmi2 compiler flagsJohn Ky2018-01-211-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines. Implement x86 code generator for pdep and pext. Properly initialise bmiVersion field. pdep and pext test cases Fix pattern match for pdep and pext instructions Fix build of pdep and pext code for 32-bit architectures Test Plan: Validate Reviewers: austin, simonmar, bgamari, angerman Reviewed By: bgamari Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy GHC Trac Issues: #14206 Differential Revision: https://phabricator.haskell.org/D4236
* Remove unused extern cost centre collectionÖmer Sinan Ağacan2018-01-181-1/+1
| | | | | | | | | | Reviewers: bgamari, simonmar Reviewed By: simonmar Subscribers: alexbiehl, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4309
* Revert "Improve accuracy of get/setAllocationCounter"Ben Gamari2018-01-181-2/+2
| | | | This reverts commit a1a689dda48113f3735834350fb562bb1927a633.
* Typos in commentsGabor Greif2018-01-171-1/+1
|
* Improve accuracy of get/setAllocationCounterSimon Marlow2018-01-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: get/setAllocationCounter didn't take into account allocations in the current block. This was known at the time, but it turns out to be important to have more accuracy when using these in a fine-grained way. Test Plan: New unit test to test incrementally larger allocaitons. Before I got results like this: ``` +0 +0 +0 +0 +0 +4096 +0 +0 +0 +0 +0 +4064 +0 +0 +4088 +4056 +0 +0 +0 +4088 +4096 +4056 +4096 ``` Notice how the results aren't always monotonically increasing. After this patch: ``` +344 +416 +488 +560 +632 +704 +776 +848 +920 +992 +1064 +1136 +1208 +1280 +1352 +1424 +1496 +1568 +1640 +1712 +1784 +1856 +1928 +2000 +2072 +2144 ``` Reviewers: niteria, bgamari, hvr, erikd Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4288
* Cache the number of data cons in DataTyCon and SumTyConBartosz Nitka2018-01-041-8/+3
| | | | | | | | | | | | | | | | This is a follow-up after faf60e85 - Make tagForCon non-linear. On the mailing list @simonpj suggested to solve the linear behavior by caching the sizes. Test Plan: ./validate Reviewers: simonpj, simonmar, bgamari, austin Reviewed By: simonpj Subscribers: carter, goldfire, rwbarton, thomie, simonpj Differential Revision: https://phabricator.haskell.org/D4131
* Get rid of some stuttering in comments and docsGabor Greif2017-12-191-1/+1
|
* typos in local varGabor Greif2017-12-141-1/+1
|
* Revert "Add new mbmi and mbmi2 compiler flags"Ben Gamari2017-11-221-78/+0
| | | | | | This broke the 32-bit build. This reverts commit f5dc8ccc29429d0a1d011f62b6b430f6ae50290c.
* Add new mbmi and mbmi2 compiler flagsJohn Ky2017-11-151-0/+78
| | | | | | | | | | | | | | | | | This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines. Test Plan: Validate Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd Reviewed By: bgamari Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie GHC Trac Issues: #14206 Differential Revision: https://phabricator.haskell.org/D4063
* Allow packing constructor fieldsMichal Terepeta2017-10-294-36/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is another step for fixing #13825 and is based on D38 by Simon Marlow. The change allows storing multiple constructor fields within the same word. This currently applies only to `Float`s, e.g., ``` data Foo = Foo {-# UNPACK #-} !Float {-# UNPACK #-} !Float ``` on 64-bit arch, will now store both fields within the same constructor word. For `WordX/IntX` we'll need to introduce new primop types. Main changes: - We now use sizes in bytes when we compute the offsets for constructor fields in `StgCmmLayout` and introduce padding if necessary (word-sized fields are still word-aligned) - `ByteCodeGen` had to be updated to correctly construct the data types. This required some new bytecode instructions to allow pushing things that are not full words onto the stack (and updating `Interpreter.c`). Note that we only use the packed stuff when constructing data types (i.e., for `PACK`), in all other cases the behavior should not change. - `RtClosureInspect` was changed to handle the new layout when extracting subterms. This seems to be used by things like `:print`. I've also added a test for this. - I deviated slightly from Simon's approach and use `PrimRep` instead of `ArgRep` for computing the size of fields. This seemed more natural and in the future we'll probably want to introduce new primitive types (e.g., `Int8#`) and `PrimRep` seems like a better place to do that (where we already have `Int64Rep` for example). `ArgRep` on the other hand seems to be more focused on calling functions. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: bgamari, simonmar, austin, hvr, goldfire, erikd Reviewed By: bgamari Subscribers: maoe, rwbarton, thomie GHC Trac Issues: #13825 Differential Revision: https://phabricator.haskell.org/D3809
* Turn `compareByteArrays#` out-of-line primop into inline primopalexbiehl2017-10-291-1/+40
| | | | | | | | | | | | Depends on D4090 Reviewers: austin, bgamari, erikd, simonmar, alexbiehl Reviewed By: bgamari Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D4091
* Make tagForCon non-linearBartosz Nitka2017-10-271-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Computing the number of constructors for TyCon is linear in the number of constructors. That's wasteful if all you want to check is if that number is smaller than what fits in tag bits (usually 8 things). What this change does is to use a function that can determine the ineqaulity without computing the size. This improves compile time on a module with a data type that has 10k constructors. The variance in total time is (suspiciously) high, but going by the best of 3 the numbers are 8.186s vs 7.511s. For 1000 constructors the difference isn't noticeable: 0.646s vs 0.624s. The hot spots were cgDataCon and cgEnumerationTyCon where tagForCon is called in a loop. One alternative would be to pass down the size. Test Plan: harbormaster Reviewers: bgamari, simonmar, austin Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D4116