diff options
| author | Ömer Sinan Ağacan <omeragacan@gmail.com> | 2019-10-14 15:44:58 +0300 | 
|---|---|---|
| committer | Ömer Sinan Ağacan <omeragacan@gmail.com> | 2020-01-31 12:21:10 +0300 | 
| commit | c846618ae0f8601515683a4c7677c20c3272a50f (patch) | |
| tree | 22caa68b7c6cdf3464d078e556f7eac19400b0bf /compiler/nativeGen/X86/Instr.hs | |
| parent | 01b15b835a7555c501df862b4dc8cc8eaff86afc (diff) | |
| download | haskell-c846618ae0f8601515683a4c7677c20c3272a50f.tar.gz | |
Do CafInfo/SRT analysis in Cmm
This patch removes all CafInfo predictions and various hacks to preserve
predicted CafInfos from the compiler and assigns final CafInfos to
interface Ids after code generation. SRT analysis is extended to support
static data, and Cmm generator is modified to allow generating
static_link fields after SRT analysis.
This also fixes `-fcatch-bottoms`, which introduces error calls in case
expressions in CorePrep, which runs *after* CoreTidy (which is where we
decide on CafInfos) and turns previously non-CAFFY things into CAFFY.
Fixes #17648
Fixes #9718
Evaluation
==========
NoFib
-----
Boot with: `make boot mode=fast`
Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1`
--------------------------------------------------------------------------------
        Program           Size    Allocs    Instrs     Reads    Writes
--------------------------------------------------------------------------------
             CS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            CSD          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             FS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
              S          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             VS          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            VSD          -0.0%      0.0%     -0.0%     -0.0%     -0.5%
            VSM          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           anna          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
           ansi          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           atom          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         awards          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         banner          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
     bernouilli          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
   binary-trees          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          boyer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         boyer2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           bspt          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      cacheprof          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       calendar          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       cichelli          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        circsim          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       clausify          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
  comp_lab_zift          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       compress          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      compress2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
    constraints          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
   cryptarithm1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
   cryptarithm2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            cse          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
   digits-of-e1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
   digits-of-e2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         dom-lt          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          eliza          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          event          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
    exact-reals          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         exp3_8          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         expert          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
 fannkuch-redux          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          fasta          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            fem          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            fft          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           fft2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       fibheaps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           fish          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          fluid          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
         fulsom          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         gamteb          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            gcd          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
    gen_regexps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         genfft          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
             gg          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           grep          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         hidden          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            hpg          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
            ida          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          infer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        integer          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      integrate          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
   k-nucleotide          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          kahan          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        knights          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         lambda          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
     last-piece          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           lcss          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           life          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           lift          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         linear          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
      listcompr          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       listcopy          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       maillist          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         mandel          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        mandel2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           mate          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        minimax          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        mkhprog          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
     multiplier          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         n-body          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       nucleic2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           para          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      paraffins          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         parser          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
        parstof          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
            pic          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       pidigits          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          power          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         pretty          -0.0%      0.0%     -0.3%     -0.4%     -0.4%
         primes          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      primetest          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         prolog          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         puzzle          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         queens          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        reptile          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
reverse-complem          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        rewrite          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           rfib          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            rsa          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            scc          -0.0%      0.0%     -0.3%     -0.5%     -0.4%
          sched          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            scs          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         simple          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
          solid          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        sorting          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
  spectral-norm          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         sphere          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
         symalg          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
            tak          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      transform          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       treejoin          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      typecheck          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
        veritas          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           wang          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
      wave4main          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
   wheel-sieve1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
   wheel-sieve2          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           x2n1          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
--------------------------------------------------------------------------------
            Min          -0.1%      0.0%     -0.3%     -0.5%     -0.5%
            Max          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
 Geometric Mean          -0.0%     -0.0%     -0.0%     -0.0%     -0.0%
--------------------------------------------------------------------------------
        Program           Size    Allocs    Instrs     Reads    Writes
--------------------------------------------------------------------------------
        circsim          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
    constraints          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       fibheaps          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
       gc_bench          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           hash          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
           lcss          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
          power          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
     spellcheck          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
--------------------------------------------------------------------------------
            Min          -0.1%      0.0%     -0.0%     -0.0%     -0.0%
            Max          -0.0%      0.0%     -0.0%     -0.0%     -0.0%
 Geometric Mean          -0.0%     +0.0%     -0.0%     -0.0%     -0.0%
Manual inspection of programs in testsuite/tests/programs
---------------------------------------------------------
I built these programs with a bunch of dump flags and `-O` and compared
STG, Cmm, and Asm dumps and file sizes.
(Below the numbers in parenthesis show number of modules in the program)
These programs have identical compiler (same .hi and .o sizes, STG, and
Cmm and Asm dumps):
- Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3),
  andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2),
  jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4),
  jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1),
  bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1),
  strict_anns (1), thurston-module-arith (2), okeefe_neural (1),
  joao-circular (6), 10queens (1)
Programs with different compiler outputs:
- jl_defaults (1): For some reason GHC HEAD marks a lot of top-level
  `[Int]` closures as CAFFY for no reason. With this patch we no longer
  make them CAFFY and generate less SRT entries. For some reason Main.o
  is slightly larger with this patch (1.3%) and the executable sizes are
  the same. (I'd expect both to be smaller)
- launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked
  as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the
  executable sizes are the same.
- galois_raytrace (13): Differences are in the Parse module. There are a
  lot, but some of the changes are caused by the fact that for some
  reason (I think a bug) GHC HEAD marks the dictionary for `Functor
  Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the
  same.
- north_array: We now generate less SRT entries because some of array
  primops used in this program like `NewArrayOp` get eliminated during
  Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24%
  larger (9224 bytes from 9000 bytes), executable sizes are the same.
- seward-space-leak: Difference in this program is better shown by this
  smaller example:
      module Lib where
      data CDS
        = Case [CDS] [(Int, CDS)]
        | Call CDS CDS
      instance Eq CDS where
        Case sels1 rets1 == Case sels2 rets2 =
            sels1 == sels2 && rets1 == rets2
        Call a1 b1 == Call a2 b2 =
            a1 == a2 && b1 == b2
        _ == _ =
            False
   In this program GHC HEAD builds a new SRT for the recursive group of
   `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==`
   in its SRT field, and `==` uses the SRT object as its SRT. With this
   patch we use the closure for `/=` as the SRT and add `==` there. Then
   `/=` gets an empty SRT field and `==` points to `/=` in its SRT
   field.
   This change looks fine to me.
   Main.o gets 0.07% larger, executable sizes are identical.
head.hackage
------------
head.hackage's CI script builds 428 packages from Hackage using this
patch with no failures.
Compiler performance
--------------------
The compiler perf tests report that the compiler allocates slightly more
(worst case observed so far is 4%). However most programs in the test
suite are small, single file programs. To benchmark compiler performance
on something more realistic I build Cabal (the library, 236 modules)
with different optimisation levels. For the "max residency" row I run
GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows
are generated with just `-s`. (This is because `-i0` causes running GC
much more frequently and as a result "bytes copied" gets inflated by
more than 25x in some cases)
* -O0
|                 | GHC HEAD       | This MR        | Diff   |
| --------------- | -------------- | -------------- | ------ |
| Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% |
| Bytes copied    |  4,926,037,184 |  4,990,638,760 | +1.31% |
| Max residency   |    421,225,624 |    424,324,264 | +0.73% |
* -O1
|                 | GHC HEAD        | This MR         | Diff   |
| --------------- | --------------- | --------------- | ------ |
| Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% |
| Bytes copied    |  26,943,452,560 |  27,089,972,296 | +0.54% |
| Max residency   |     982,643,440 |     991,663,432 | +0.91% |
* -O2
|                 | GHC HEAD        | This MR         | Diff   |
| --------------- | --------------- | --------------- | ------ |
| Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% |
| Bytes copied    |  37,044,237,616 |  36,121,690,472 | -2.49% |
| Max residency   |   1,071,600,328 |   1,086,396,256 | +1.38% |
Extra compiler allocations
--------------------------
Runtime allocations of programs are as reported above (NoFib section).
The compiler now allocates more than before. Main source of allocation
in this patch compared to base commit is the new SRT algorithm
(GHC.Cmm.Info.Build). Below is some of the extra work we do with this
patch, numbers generated by profiled stage 2 compiler when building a
pathological case (the test 'ManyConstructors') with '-O2':
- We now sort the final STG for a module, which means traversing the
  entire program, generating free variable set for each top-level
  binding, doing SCC analysis, and re-ordering the program. In
  ManyConstructors this step allocates 97,889,952 bytes.
- We now do SRT analysis on static data, which in a program like
  ManyConstructors causes analysing 10,000 bindings that we would
  previously just skip. This step allocates 70,898,352 bytes.
- We now maintain an SRT map for the entire module as we compile Cmm
  groups:
      data ModuleSRTInfo = ModuleSRTInfo
        { ...
        , moduleSRTMap :: SRTMap
        }
   (SRTMap is just a strict Map from the 'containers' library)
   This map gets an entry for most bindings in a module (exceptions are
   THUNKs and CAFFY static functions). For ManyConstructors this map
   gets 50015 entries.
- Once we're done with code generation we generate a NameSet from SRTMap
  for the non-CAFFY names in the current module. This set gets the same
  number of entries as the SRTMap.
- Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using
  the NameSet generated in the previous step. This usually does the
  least amount of allocation among the work listed here.
Only place with this patch where we do less work in the CAF analysis in
the tidying pass (CoreTidy). However that doesn't save us much, as the
pass still needs to traverse the whole program and update IdInfos for
other reasons. Only thing we don't here do is the `hasCafRefs` pass over
the RHS of bindings, which is a stateless pass that returns a boolean
value, so it doesn't allocate much.
(Metric changes blow are all increased allocations)
Metric changes
--------------
Metric Increase:
    ManyAlternatives
    ManyConstructors
    T13035
    T14683
    T1969
    T9961
Diffstat (limited to 'compiler/nativeGen/X86/Instr.hs')
| -rw-r--r-- | compiler/nativeGen/X86/Instr.hs | 8 | 
1 files changed, 4 insertions, 4 deletions
| diff --git a/compiler/nativeGen/X86/Instr.hs b/compiler/nativeGen/X86/Instr.hs index 4591464671..422bb96de4 100644 --- a/compiler/nativeGen/X86/Instr.hs +++ b/compiler/nativeGen/X86/Instr.hs @@ -174,7 +174,7 @@ data Instr          -- some static data spat out during code          -- generation.  Will be extracted before          -- pretty-printing. -        | LDATA   Section (Alignment, CmmStatics) +        | LDATA   Section (Alignment, RawCmmStatics)          -- start a new basic block.  Useful during          -- codegen, removed later.  Preceding @@ -1017,9 +1017,9 @@ shortcutJump fn insn = shortcutJump' fn (setEmpty :: LabelSet) insn      shortcutJump' _ _ other = other  -- Here because it knows about JumpDest -shortcutStatics :: (BlockId -> Maybe JumpDest) -> (Alignment, CmmStatics) -> (Alignment, CmmStatics) -shortcutStatics fn (align, Statics lbl statics) -  = (align, Statics lbl $ map (shortcutStatic fn) statics) +shortcutStatics :: (BlockId -> Maybe JumpDest) -> (Alignment, RawCmmStatics) -> (Alignment, RawCmmStatics) +shortcutStatics fn (align, RawCmmStatics lbl statics) +  = (align, RawCmmStatics lbl $ map (shortcutStatic fn) statics)    -- we need to get the jump tables, so apply the mapping to the entries    -- of a CmmData too. | 
