summaryrefslogtreecommitdiff
path: root/compiler/nativeGen/CFG.hs
Commit message (Collapse)AuthorAgeFilesLines
* Modules: CmmToAsm (#13009)Sylvain Henry2020-02-241-1320/+0
|
* Modules: Driver (#13009)Sylvain Henry2020-02-211-1/+1
| | | | submodule updates: nofib, haddock
* compiler: Qualify imports of Data.ListBen Gamari2020-02-081-1/+1
|
* Module hierarchy: Cmm (cf #13009)Sylvain Henry2020-01-251-13/+12
|
* Fix more typos, via an improved Levenshtein-style correctorBrian Wignall2020-01-121-9/+9
|
* Fix bug in the x86 backend involving the CFG.Andreas Klebinger2019-10-231-39/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is part two of fixing #17334. There are two parts to this commit: - A bugfix for computing loop levels - A bugfix of basic block invariants in the NCG. ----------------------------------------------------------- In the first bug we ended up with a CFG of the sort: [A -> B -> C] This was represented via maps as fromList [(A,B),(B,C)] and later transformed into a adjacency array. However the transformation did not include block C in the array (since we only looked at the keys of the map). This was still fine until we tried to look up successors for C and tried to read outside of the array bounds when accessing C. In order to prevent this in the future I refactored to code to include all nodes as keys in the map representation. And make this a invariant which is checked in a few places. Overall I expect this to make the code more robust as now any failed lookup will represent an error, versus failed lookups sometimes being expected and sometimes not. In terms of performance this makes some things cheaper (getting a list of all nodes) and others more expensive (adding a new edge). Overall this adds up to no noteable performance difference. ----------------------------------------------------------- Part 2: When the NCG generated a new basic block, it did not always insert a NEWBLOCK meta instruction in the stream which caused a quite subtle bug. During instruction selection a statement `s` in a block B with control of the sort: B -> C will sometimes result in control flow of the sort: ┌ < ┐ v ^ B -> B1 ┴ -> C as is the case for some atomic operations. Now to keep the CFG in sync when introducing B1 we clearly want to insert it between B and C. However there is a catch when we have to deal with self loops. We might start with code and a CFG of these forms: loop: stmt1 ┌ < ┐ .... v ^ stmtX loop ┘ stmtY .... goto loop: Now we introduce B1: ┌ ─ ─ ─ ─ ─┐ loop: │ ┌ < ┐ │ instrs v │ │ ^ .... loop ┴ B1 ┴ ┘ instrsFromX stmtY goto loop: This is simple, all outgoing edges from loop now simply start from B1 instead and the code generator knows which new edges it introduced for the self loop of B1. Disaster strikes if the statement Y follows the same pattern. If we apply the same rule that all outgoing edges change then we end up with: loop ─> B1 ─> B2 ┬─┐ │ │ └─<┤ │ │ └───<───┘ │ └───────<────────┘ This is problematic. The edge B1->B1 is modified as expected. However the modification is wrong! The assembly in this case looked like this: _loop: <instrs> _B1: ... cmpxchgq ... jne _B1 <instrs> <end _B1> _B2: ... cmpxchgq ... jne _B2 <instrs> jmp loop There is no edge _B2 -> _B1 here. It's still a self loop onto _B1. The problem here is that really B1 should be two basic blocks. Otherwise we have control flow in the *middle* of a basic block. A contradiction! So to account for this we add yet another basic block marker: _B: <instrs> _B1: ... cmpxchgq ... jne _B1 jmp _B1' _B1': <instrs> <end _B1> _B2: ... Now when inserting B2 we will only look at the outgoing edges of B1' and everything will work out nicely. You might also wonder why we don't insert jumps at the end of _B1'. There is no way another block ends up jumping to the labels _B1 or _B2 since they are essentially invisible to other blocks. View them as control flow labels local to the basic block if you'd like. Not doing this ultimately caused (part 2 of) #17334.
* Add loop level analysis to the NCG backend.klebinger.andreas@gmx.at2019-10-161-64/+684
| | | | | | | | | | | | | | | | | | | | | | | | | | | For backends maintaining the CFG during codegen we can now find loops and their nesting level. This is based on the Cmm CFG and dominator analysis. As a result we can estimate edge frequencies a lot better for methods, resulting in far better code layout. Speedup on nofib: ~1.5% Increase in compile times: ~1.9% To make this feasible this commit adds: * Dominator analysis based on the Lengauer-Tarjan Algorithm. * An algorithm estimating global edge frequences from branch probabilities - In CFG.hs A few static branch prediction heuristics: * Expect to take the backedge in loops. * Expect to take the branch NOT exiting a loop. * Expect integer vs constant comparisons to be false. We also treat heap/stack checks special for branch prediction to avoid them being treated as loops.
* Fix #17334 where NCG did not properly update the CFG.wip/andreask/17334Andreas Klebinger2019-10-131-3/+4
| | | | | | | | | | | | | Statements can change the basic block in which instructions are placed during instruction selection. We have to keep track of this switch of the current basic block as we need this information in order to properly update the CFG. This commit implements this change and fixes #17334. We do so by having stmtToInstr return the new block id if a statement changed the basic block.
* Remove unused imports of the form 'import foo ()' (Fixes #17065)James Foster2019-08-151-1/+1
| | | | | | | | | | | These kinds of imports are necessary in some cases such as importing instances of typeclasses or intentionally creating dependencies in the build system, but '-Wunused-imports' can't detect when they are no longer needed. This commit removes the unused ones currently in the code base (not including test files or submodules), with the hope that doing so may increase parallelism in the build system by removing unnecessary dependencies.
* Rip out object splittingBen Gamari2019-03-051-6/+1
| | | | | | | | | | | | | | | The splitter is an evil Perl script that processes assembler code. Its job can be done better by the linker's --gc-sections flag. GHC passes this flag to the linker whenever -split-sections is passed on the command line. This is based on @DemiMarie's D2768. Fixes Trac #11315 Fixes Trac #9832 Fixes Trac #8964 Fixes Trac #8685 Fixes Trac #8629
* Allow resizing the stack for the graph allocator.klebinger.andreas@gmx.at2019-02-081-0/+18
| | | | | | | | | | The graph allocator now dynamically resizes the number of stack slots when running into the limit. This fixes #8657. Also loop membership of basic blocks is now available in the register allocator for cost heuristics.
* A few typofixesGabor Greif2019-01-231-1/+1
|
* Taming the Kind Inference MonsterSimon Peyton Jones2018-11-291-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My original goal was (Trac #15809) to move towards using level numbers as the basis for deciding which type variables to generalise, rather than searching for the free varaibles of the environment. However it has turned into a truly major refactoring of the kind inference engine. Let's deal with the level-numbers part first: * Augment quantifyTyVars to calculate the type variables to quantify using level numbers, and compare the result with the existing approach. That is; no change in behaviour, just a WARNing if the two approaches give different answers. * To do this I had to get the level number right when calling quantifyTyVars, and this entailed a bit of care, especially in the code for kind-checking type declarations. * However, on the way I was able to eliminate or simplify a number of calls to solveEqualities. This work is incomplete: I'm not /using/ level numbers yet. When I subsequently get rid of any remaining WARNings in quantifyTyVars, that the level-number answers differ from the current answers, then I can rip out the current "free vars of the environment" stuff. Anyway, this led me into deep dive into kind inference for type and class declarations, which is an increasingly soggy part of GHC. Richard already did some good work recently in commit 5e45ad10ffca1ad175b10f6ef3327e1ed8ba25f3 Date: Thu Sep 13 09:56:02 2018 +0200 Finish fix for #14880. The real change that fixes the ticket is described in Note [Naughty quantification candidates] in TcMType. but I kept turning over stones. So this patch has ended up with a pretty significant refactoring of that code too. Kind inference for types and classes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Major refactoring in the way we generalise the inferred kind of a TyCon, in kcTyClGroup. Indeed, I made it into a new top-level function, generaliseTcTyCon. Plus a new Note to explain it Note [Inferring kinds for type declarations]. * We decided (Trac #15592) not to treat class type variables specially when dealing with Inferred/Specified/Required for associated types. That simplifies things quite a bit. I also rewrote Note [Required, Specified, and Inferred for types] * Major refactoring of the crucial function kcLHsQTyVars: I split it into kcLHsQTyVars_Cusk and kcLHsQTyVars_NonCusk because the two are really quite different. The CUSK case is almost entirely rewritten, and is much easier because of our new decision not to treat the class variables specially * I moved all the error checks from tcTyClTyVars (which was a bizarre place for it) into generaliseTcTyCon and/or the CUSK case of kcLHsQTyVars. Now tcTyClTyVars is extremely simple. * I got rid of all the all the subtleties in tcImplicitTKBndrs. Indeed now there is no difference between tcImplicitTKBndrs and kcImplicitTKBndrs; there is now a single bindImplicitTKBndrs. Same for kc/tcExplicitTKBndrs. None of them monkey with level numbers, nor build implication constraints. scopeTyVars is gone entirely, as is kcLHsQTyVarBndrs. It's vastly simpler. I found I could get rid of kcLHsQTyVarBndrs entirely, in favour of the bnew bindExplicitTKBndrs. Quantification ~~~~~~~~~~~~~~ * I now deal with the "naughty quantification candidates" of the previous patch in candidateQTyVars, rather than in quantifyTyVars; see Note [Naughty quantification candidates] in TcMType. I also killed off closeOverKindsCQTvs in favour of the same strategy that we use for tyCoVarsOfType: namely, close over kinds at the occurrences. And candidateQTyVars no longer needs a gbl_tvs argument. * Passing the ContextKind, rather than the expected kind itself, to tc_hs_sig_type_and_gen makes it easy to allocate the expected result kind (when we are in inference mode) at the right level. Type families ~~~~~~~~~~~~~~ * I did a major rewrite of the impenetrable tcFamTyPats. The result is vastly more comprehensible. * I got rid of kcDataDefn entirely, quite a big function. * I re-did the way that checkConsistentFamInst works, so that it allows alpha-renaming of invisible arguments. * The interaction of kind signatures and family instances is tricky. Type families: see Note [Apparently-nullary families] Data families: see Note [Result kind signature for a data family instance] and Note [Eta-reduction for data families] * The consistent instantation of an associated type family is tricky. See Note [Checking consistent instantiation] and Note [Matching in the consistent-instantation check] in TcTyClsDecls. It's now checked in TcTyClsDecls because that is when we have the relevant info to hand. * I got tired of the compromises in etaExpandFamInst, so I did the job properly by adding a field cab_eta_tvs to CoAxBranch. See Coercion.etaExpandCoAxBranch. tcInferApps and friends ~~~~~~~~~~~~~~~~~~~~~~~ * I got rid of the mysterious and horrible ClsInstInfo argument to tcInferApps, checkExpectedKindX, and various checkValid functions. It was horrible! * I got rid of [Type] result of tcInferApps. This list was used only in tcFamTyPats, when checking the LHS of a type instance; and if there is a cast in the middle, the list is meaningless. So I made tcInferApps simpler, and moved the complexity (not much) to tcInferApps. Result: tcInferApps is now pretty comprehensible again. * I refactored the many function in TcMType that instantiate skolems. Smaller things * I rejigged the error message in checkValidTelescope; I think it's quite a bit better now. * checkValidType was not rejecting constraints in a kind signature forall (a :: Eq b => blah). blah2 That led to further errors when we then do an ambiguity check. So I make checkValidType reject it more aggressively. * I killed off quantifyConDecl, instead calling kindGeneralize directly. * I fixed an outright bug in tyCoVarsOfImplic, where we were not colleting the tyvar of the kind of the skolems * Renamed ClsInstInfo to AssocInstInfo, and made it into its own data type * Some fiddling around with pretty-printing of family instances which was trickier than I thought. I wanted wildcards to print as plain "_" in user messages, although they each need a unique identity in the CoAxBranch. Some other oddments * Refactoring around the trace messages from reportUnsolved. * A bit of extra tc-tracing in TcHsSyn.commitFlexi This patch fixes a raft of bugs, and includes tests for them. * #14887 * #15740 * #15764 * #15789 * #15804 * #15817 * #15870 * #15874 * #15881
* Fixup the new code layout patch for SplitObjs.klebinger.andreas@gmx.at2018-11-221-3/+7
| | | | | | | | | | | | | | | When splitting objects we sometimes generate dummy CmmProcs containing bottom in some fields. Code introduced in the new code layout patch looked at these which blew up the compiler. Now we instead check first if the function actually contains code. Reviewers: bgamari Subscribers: simonpj, rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5357
* NCG: New code layout algorithm.Andreas Klebinger2018-11-171-0/+635
Summary: This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms. Performance varies slightly be CPU/Machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall depending on flags/machine performance in other benchmarks improved significantly. Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec, containers, text and xeno. While the magnitude of gains differed three different CPUs where tested with all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell, Skylake * Library benchmark results summarized: * containers: ~1.5% faster * aeson: ~2% faster * megaparsec: ~2-5% faster * xml library benchmarks: 0.2%-1.1% faster * vector-benchmarks: 1-4% faster * text: 5.5% faster On average GHC compile times go down, as GHC compiled with the new layout is faster than the overhead introduced by using the new layout algorithm, Things this patch does: * Move code responsilbe for block layout in it's own module. * Move the NcgImpl Class into the NCGMonad module. * Extract a control flow graph from the input cmm. * Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old codelayout. * Assign weights to the edges in the CFG based on type and limited static analysis which are then used for block layout. * Once we have the final code layout eliminate some redundant jumps. In particular turn a sequences of: jne .foo jmp .bar foo: into je bar foo: .. Test Plan: ci Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott Reviewed By: RyanGlScott Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton GHC Trac Issues: #15124 Differential Revision: https://phabricator.haskell.org/D4726