| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
Add StgToCmm module hierarchy. Platform modules that are used in several
other places (NCG, LLVM codegen, Cmm transformations) are put into
GHC.Platform.
|
|
|
|
|
|
|
|
|
|
|
| |
These kinds of imports are necessary in some cases such as
importing instances of typeclasses or intentionally creating
dependencies in the build system, but '-Wunused-imports' can't
detect when they are no longer needed. This commit removes the
unused ones currently in the code base (not including test files
or submodules), with the hope that doing so may increase
parallelism in the build system by removing unnecessary
dependencies.
|
|
|
|
|
|
|
| |
ghc-pkg needs to be aware of platforms so it can figure out which
subdire within the user package db to use. This is admittedly
roundabout, but maybe Cabal could use the same notion of a platform as
GHC to good affect too.
|
|
|
|
|
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds foldl' to GhcPrelude and changes must occurences
of foldl to foldl'. This leads to better performance especially
for quick builds where GHC does not perform strictness analysis.
It does change strictness behaviour when we use foldl' to turn
a argument list into function applications. But this is only a
drawback if code looks ONLY at the last argument but not at the first.
And as the benchmarks show leads to fewer allocations in practice
at O2.
Compiler performance for Nofib:
O2 Allocations:
-1 s.d. ----- -0.0%
+1 s.d. ----- -0.0%
Average ----- -0.0%
O2 Compile Time:
-1 s.d. ----- -2.8%
+1 s.d. ----- +1.3%
Average ----- -0.8%
O0 Allocations:
-1 s.d. ----- -0.2%
+1 s.d. ----- -0.1%
Average ----- -0.2%
Test Plan: ci
Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal
Reviewed By: bgamari, monoidal
Subscribers: tdammers, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4929
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Fix the naming and comments to indicate that we are calculating
*reverse* postorder (and not the standard postorder).
- Rewrite the calculation to avoid CPS code. I found it fairly
difficult to understand and the new one seems faster (according to
nofib, decreases compiler allocations by 0.2%)
- Remove `LabelsPtr`, which seems unnecessary and could be *really*
confusing. For instance, previously:
`postorder_dfs_from <block with label X>`
and
`postorder_dfs_from <label X>`
would actually mean quite different things (and give different
results).
- Change the `Dataflow` module to always use entry of the graph for
reverse postorder calculation. This should be the only change in
behavior of this commit.
Previously, if the caller provided initial facts for some of the
labels, we would use those labels for our postorder calculation.
However, I don't think that's correct in general - if the initial
facts did not contain the entry of the graph, we would never analyze
the blocks reachable from the entry but unreachable from the labels
provided with the initial facts. It seems that the only analysis that
used this was proc-point analysis, which I think would always include
the entry block (so I don't think there's any bug due to this).
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4464
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: bgamari, AndreasK, erikd
Reviewed By: AndreasK
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4398
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This basically replaces all uses of `foldl` with `foldl'`. I've looked
at all the call sites and there doesn't seem to be any reason to prefer
the lazy version.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4463
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: bgamari, erikd
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4380
|
|
|
|
| |
... in CmmSink
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CmmProcs which have *lots* of local variables take a considerable
amount of time in CmmSink. This was noticed by @tdammers in #7258
while compiling files with large records (~200-400 fields).
Before:
```
Sun Oct 29 19:58 2017 Time and Allocation Profiling Report (Final)
ghc-stage2 +RTS -p -RTS
-B/Users/alexbiehl/git/ghc/inplace/lib /Users/alexbiehl/Downloads/W2.hs
-fforce-recomp -O2
total time = 26.00 secs (25996 ticks @ 1000 us, 1 processor)
total alloc = 14,921,627,912 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
sink CmmPipeline
compiler/cmm/CmmPipeline.hs:(104,13)-(105,59) 55.7 15.9
SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:761:39-74 19.5 30.6
FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66 4.2 9.0
RegAlloc-linear AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55) 4.0 11.1
pprNativeCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65) 2.8 6.3
NewStranal SimplCore compiler/simplCore/SimplCore.hs:480:40-63 1.6 3.7
OccAnal SimplCore compiler/simplCore/SimplCore.hs:(739,22)-(740,67) 1.5 3.5
StgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62) 1.2 2.4
regLiveness AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52) 1.2 1.9
genMachCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62) 0.9 1.8
NativeCodeGen CodeOutput compiler/main/CodeOutput.hs:171:18-78 0.9 2.1
CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67 0.8 1.9
```
After:
```
Sun Oct 29 19:18 2017 Time and Allocation Profiling Report (Final)
ghc-stage2 +RTS -p -RTS
-B/Users/alexbiehl/git/ghc/inplace/lib /Users/alexbiehl/Downloads/W2.hs
-fforce-recomp -O2
total time = 13.31 secs (13307 ticks @ 1000 us, 1 processor)
total alloc = 15,772,184,488 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
SimplTopBinds SimplCore
compiler/simplCore/SimplCore.hs:761:39-74 38.3 29.0
sink CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59) 13.2 20.3
RegAlloc-linear AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55) 8.3 10.5
FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66 8.1 8.5
pprNativeCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65) 5.4 5.9
NewStranal SimplCore compiler/simplCore/SimplCore.hs:480:40-63 3.1 3.5
OccAnal SimplCore compiler/simplCore/SimplCore.hs:(739,22)-(740,67) 2.9 3.3
StgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62) 2.3 2.3
regLiveness AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52) 2.1 1.8
NativeCodeGen CodeOutput compiler/main/CodeOutput.hs:171:18-78 1.7 2.0
genMachCode AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62) 1.6 1.7
CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67 1.4 1.8
foldNodesBwdOO Hoopl.Dataflow compiler/cmm/Hoopl/Dataflow.hs:(397,1)-(403,17) 1.1 0.8
```
Reviewers: austin, bgamari, simonmar
Reviewed By: bgamari
Subscribers: duog, rwbarton, thomie, tdammers
GHC Trac Issues: #7258
Differential Revision: https://phabricator.haskell.org/D4145
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This switches the compiler/ component to get compiled with
-XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
modules.
This is motivated by the upcoming "Prelude" re-export of
`Semigroup((<>))` which would cause lots of name clashes in every
modulewhich imports also `Outputable`
Reviewers: austin, goldfire, bgamari, alanz, simonmar
Reviewed By: bgamari
Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
Differential Revision: https://phabricator.haskell.org/D3989
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This copies the subset of Hoopl's functionality needed by GHC to
`cmm/Hoopl` and removes the dependency on the Hoopl package.
The main motivation for this change is the confusing/noisy interface
between GHC and Hoopl:
- Hoopl has `Label` which is GHC's `BlockId` but different than
GHC's `CLabel`
- Hoopl has `Unique` which is different than GHC's `Unique`
- Hoopl has `Unique{Map,Set}` which are different than GHC's
`Uniq{FM,Set}`
- GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
needed just to filter the exposed functions (filter out some of the
Hoopl's and add the GHC ones)
With this change, we'll be able to simplify this significantly.
It'll also be much easier to do invasive changes (Hoopl is a public
package on Hackage with users that depend on the current behavior)
This should introduce no changes in functionality - it merely
copies the relevant code.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: austin, bgamari, simonmar
Reviewed By: bgamari, simonmar
Subscribers: simonpj, kavon, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3616
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch in in preparation for the fix to Trac #13397
The code generator has a special case for
case tagToEnum (a>#b) of
False -> e1
True -> e2
but it was not doing nearly so well on
case a>#b of
DEFAULT -> e1
1# -> e2
This patch arranges to behave essentially identically in
both cases. In due course we can eliminate the special
case for tagToEnum#, once we've completed Trac #13397.
The changes are:
* Make CmmSink swizzle the order of a conditional where necessary;
see Note [Improving conditionals] in CmmSink
* Hack the general case of StgCmmExpr.cgCase so that it use
NoGcInAlts for conditionals. This doesn't seem right, but it's
the same choice as the tagToEnum version. Without it, code size
increases a lot (more heap checks).
There's a loose end here.
* Add comments in CmmOpt.cmmMachOpFoldM
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This continues removal of `BlockId` module in favor of Hoopl's `Label`.
Most of the changes here are mechanical, apart from the orphan
`Outputable` instances for `LabelMap` and `LabelSet`. For now I've
moved them to `cmm/Hoopl`, since it's already trying to manage all
imports from Hoopl (to avoid any collisions).
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: validate
Reviewers: bgamari, austin, simonmar
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2800
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
On x86_64, commit e2f6bbd3a27685bc667655fdb093734cb565b4cf assigned
the STG registers F1 and D1 the same hardware register (xmm1), and
the same for the registers F2 and D2, etc. When mixing calls to
functions involving Float#s and Double#s, this can cause wrong Cmm
optimizations that assume the F1 and D1 registers are independent.
Reviewers: simonpj, austin
Reviewed By: austin
Subscribers: simonpj, thomie, bgamari
Differential Revision: https://phabricator.haskell.org/D993
GHC Trac Issues: #10521
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
CodeGen.Platform.hs was changed with the following diff:
-#endif
globalRegMaybe _ = Nothing
+#elif MACHREGS_NO_REGS
+globalRegMaybe _ = Nothing
+#else
+globalRegMaybe = panic "globalRegMaybe not defined for this platform"
+#endif
which causes globalRegMaybe ot panic for arch ARM.
This patch ensures globalRegMaybe is not called on ARM.
Signed-off-by: Moritz Angermann <moritz@lichtzwerge.de>
Test Plan: Building arm cross-compiler (e.g. --target=arm-apple-darwin10)
Reviewers: hvr, ezyang, simonmar, rwbarton, austin
Reviewed By: austin
Subscribers: dterei, bgamari, simonmar, ezyang, carter
Differential Revision: https://phabricator.haskell.org/D208
GHC Trac Issues: #9593
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the second attempt to add this functionality. The first
attempt was reverted in 950fcae46a82569e7cd1fba1637a23b419e00ecd, due
to register allocator failure on x86. Given how the register
allocator currently works, we don't have enough registers on x86 to
support cmpxchg using complicated addressing modes. Instead we fall
back to a simpler addressing mode on x86.
Adds the following primops:
* atomicReadIntArray#
* atomicWriteIntArray#
* fetchSubIntArray#
* fetchOrIntArray#
* fetchXorIntArray#
* fetchAndIntArray#
Makes these pre-existing out-of-line primops inline:
* fetchAddIntArray#
* casIntArray#
|
|
|
|
|
|
|
|
| |
This commit caused the register allocator to fail on i386.
This reverts commit d8abf85f8ca176854e9d5d0b12371c4bc402aac3 and
04dd7cb3423f1940242fdfe2ea2e3b8abd68a177 (the second being a fix to
the first).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add more primops for atomic ops on byte arrays
Adds the following primops:
* atomicReadIntArray#
* atomicWriteIntArray#
* fetchSubIntArray#
* fetchOrIntArray#
* fetchXorIntArray#
* fetchAndIntArray#
Makes these pre-existing out-of-line primops inline:
* fetchAddIntArray#
* casIntArray#
|
| |
|
| |
|
|
|
|
| |
This reverts commit a79613a75c7da0d3d225850382f0f578a07113b5.
|
|
|
|
|
|
|
|
| |
We don't yet understand WHY commit ad15c2, which is to do with
CmmSink, causes seg-faults on Windows, but it certainly seems to. So
reverting it is a stop-gap, but we need to un-block the 7.8 release.
Many thanks to awson for identifying the offending commit.
|
|
|
|
| |
By using the constant-folder to reduce it to an integer.
|
|
|
|
|
|
|
| |
Inlining global registers and constants made code slightly larger in
some cases. I finally got around to looking into why, and discovered
one reason: we weren't discarding dead code in some cases. This patch
fixes it.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit does two things:
* Allows duplicating of global registers and literals by inlining
them. Previously we would only inline global register or literal
if it was used only once.
* Changes method of determining conflicts between a node and an
assignment. New method has two advantages. It relies on
DefinerOfRegs and UserOfRegs typeclasses, so if a set of registers
defined or used by a node should ever change, `conflicts` function
will use the changed definition. This definition also catches
more cases than the previous one (namely CmmCall and CmmForeignCall)
which is a step towards making it possible to run sinking pass
before stack layout (currently this doesn't work).
This patch also adds a lot of comments that are result of about two-week
long investigation of how sinking pass works and why it does what it does.
|
|
|
|
| |
And update comments
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
We would like to calculate register liveness for global registers as well as
local registers, so this patch generalizes the existing infrastructure to set
the stage.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
We were making an aggressive assumption that foreign calls cannot
clobber heap or stack memory, which for the majority of foreign calls
is true, but we violate the assumption in the implementation of
primops in the RTS. This was causing crashes in some STM tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main change here is that the Cmm parser now allows high-level cmm
code with argument-passing and function calls. For example:
foo ( gcptr a, bits32 b )
{
if (b > 0) {
// we can make tail calls passing arguments:
jump stg_ap_0_fast(a);
}
return (x,y);
}
More details on the new cmm syntax are in Note [Syntax of .cmm files]
in CmmParse.y.
The old syntax is still more-or-less supported for those occasional
code fragments that really need to explicitly manipulate the stack.
However there are a couple of differences: it is now obligatory to
give a list of live GlobalRegs on every jump, e.g.
jump %ENTRY_CODE(Sp(0)) [R1];
Again, more details in Note [Syntax of .cmm files].
I have rewritten most of the .cmm files in the RTS into the new
syntax, except for AutoApply.cmm which is generated by the genapply
program: this file could be generated in the new syntax instead and
would probably be better off for it, but I ran out of enthusiasm.
Some other changes in this batch:
- The PrimOp calling convention is gone, primops now use the ordinary
NativeNodeCall convention. This means that primops and "foreign
import prim" code must be written in high-level cmm, but they can
now take more than 10 arguments.
- CmmSink now does constant-folding (should fix #7219)
- .cmm files now go through the cmmPipeline, and as a result we
generate better code in many cases. All the object files generated
for the RTS .cmm files are now smaller. Performance should be
better too, but I haven't measured it yet.
- RET_DYN frames are removed from the RTS, lots of code goes away
- we now have some more canned GC points to cover unboxed-tuples with
2-4 pointers, which will reduce code size a little.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
I've switched to passing DynFlags rather than Platform, as (a) it's
simpler to not have to extract targetPlatform in so many places, and
(b) it may be useful to have DynFlags around in future.
|
|
|
|
|
|
|
|
| |
This means that we now generate the same code whatever platform we are
on, which should help avoid changes on one platform breaking the build
on another.
It's also another step towards full cross-compilation.
|
| |
|