author     Ben Gamari <ben@smart-cactus.org>           2020-08-05 19:24:10 -0400
committer  Marge Bot <ben+marge-bot@smart-cactus.org>  2020-08-07 08:35:21 -0400
commit     15b36de030ecdd60897bc7a6a02bdeabd0825be4 (patch)
tree       6d1eb105b0b14840dfa133882d63f105408bf10b
parent     6402c1240d5bd768b8fe8b4368413932bedbe107 (diff)
nativeGen: One approach to fix #18527
Previously the code generator could produce corrupt C call sequences due to register overlap between MachOp lowerings and the platform's calling convention. We fix this using a hack described in Note [Evaluate C-call arguments before placing in destination registers].
-rw-r--r--compiler/GHC/Cmm/MachOp.hs3
-rw-r--r--compiler/GHC/CmmToAsm/X86/CodeGen.hs100
2 files changed, 95 insertions, 8 deletions
diff --git a/compiler/GHC/Cmm/MachOp.hs b/compiler/GHC/Cmm/MachOp.hs
index 077f663161..558cb13a7e 100644
--- a/compiler/GHC/Cmm/MachOp.hs
+++ b/compiler/GHC/Cmm/MachOp.hs
@@ -45,6 +45,9 @@ native code generators to handle.
Most operations are parameterised by the 'Width' that they operate on.
Some operations have separate signed and unsigned versions, and float
and integer versions.
+
+Note that there are a variety of places in the native code generator where we
+assume that the code produced for a MachOp does not introduce new blocks.
-}
data MachOp
diff --git a/compiler/GHC/CmmToAsm/X86/CodeGen.hs b/compiler/GHC/CmmToAsm/X86/CodeGen.hs
index fa1a7d4884..f210cebb2d 100644
--- a/compiler/GHC/CmmToAsm/X86/CodeGen.hs
+++ b/compiler/GHC/CmmToAsm/X86/CodeGen.hs
@@ -287,11 +287,11 @@ we construct as a separate data type and the actual control flow graph in the co
Instead we now return the new basic block if a statement causes a change
in the current block and use the block for all following statements.
-For this reason genCCall is also split into two parts.
-One for calls which *won't* change the basic blocks in
-which successive instructions will be placed.
-A different one for calls which *are* known to change the
-basic block.
+For this reason genCCall is also split into two parts. One for calls which
+*won't* change the basic blocks in which successive instructions will be
+placed (since they only evaluate CmmExpr, which can only contain MachOps, which
+cannot introduce basic blocks in their lowerings). A different one for calls
+which *are* known to change the basic block.
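+
+To illustrate, a minimal sketch of the resulting shape (hypothetical,
+simplified signatures; the real functions in this module take further
+parameters):
+
+  -- cannot change the current basic block; returns straight-line code
+  genCCall64' :: ForeignTarget -> [CmmFormal] -> [CmmActual] -> NatM InstrBlock
+
+  -- may change the current basic block; also returns the new block id, if any
+  genCCall :: Bool -> ForeignTarget -> [CmmFormal] -> [CmmActual]
+           -> BlockId -> NatM (InstrBlock, Maybe BlockId)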
-}
@@ -1028,6 +1028,9 @@ getRegister' _ is32Bit (CmmMachOp mop [x, y]) = do -- dyadic MachOps
tmp. This is likely to be better, because the reg alloc can
eliminate this reg->reg move here (it won't eliminate the other one,
because the move is into the fixed %ecx).
+ * in the case of C calls the use of %ecx here can interfere with arguments.
+   We avoid this with the hack described in Note [Evaluate C-call
+   arguments before placing in destination registers].
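+
+   As a reminder (illustrative AT&T syntax), x86 variable-count shifts
+   take their count only in %cl, which is why %ecx is fixed here:
+
+       movl %esi, %ecx   # shift amount must be moved into %ecx
+       shll %cl, %eax    # shift %eax left by %cl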
-}
shift_code width instr x y{-amount-} = do
x_code <- getAnyReg x
@@ -2022,6 +2025,7 @@ genCCall is32Bit (PrimTarget (MO_AtomicRMW width amop))
arg <- getNewRegNat format
arg_code <- getAnyReg n
platform <- ncgPlatform <$> getConfig
+
let dst_r = getRegisterReg platform (CmmLocal dst)
(code, lbl) <- op_code dst_r arg amode
return (addr_code `appOL` arg_code arg `appOL` code, Just lbl)
@@ -2667,9 +2671,12 @@ genCCall' _ is32Bit target dest_regs args bid = do
return code
_ -> panic "genCCall: Wrong number of arguments/results for imul2"
- _ -> if is32Bit
- then genCCall32' target dest_regs args
- else genCCall64' target dest_regs args
+ _ -> do
+ (instrs0, args') <- evalArgs bid args
+ instrs1 <- if is32Bit
+ then genCCall32' target dest_regs args'
+ else genCCall64' target dest_regs args'
+ return (instrs0 `appOL` instrs1)
where divOp1 platform signed width results [arg_x, arg_y]
= divOp platform signed width results Nothing arg_x arg_y
@@ -2732,6 +2739,83 @@ genCCall' _ is32Bit target dest_regs args bid = do
addSubIntC _ _ _ _ _ _ _ _
= panic "genCCall: Wrong number of arguments/results for addSubIntC"
+{-
+Note [Evaluate C-call arguments before placing in destination registers]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When producing code for C calls we must take care when placing arguments
+in their final registers. Specifically, we must ensure that temporary register
+usage due to evaluation of one argument does not clobber a register in which we
+already placed a previous argument (e.g. the code generation logic for
+MO_Shl can clobber %rcx due to x86 instruction limitations).
+
+This is precisely what happened in #18527. Consider this C--:
+
+ (result::I64) = call "ccall" doSomething(_s2hp::I64, 2244, _s2hq::I64, _s2hw::I64 | (1 << _s2hz::I64));
+
+Here we are calling the C function `doSomething` with four arguments, the last
+of which involves a non-trivial MO_Shl expression. In this case the NCG could
+naively generate the following assembly (where $tmp denotes some temporary
+register and $argN denotes the register for argument N, as dictated by the
+platform's calling convention):
+
+    mov _s2hp, $arg1    # place first argument
+    mov $2244, $arg2    # place second argument
+    mov _s2hq, $arg3    # place third argument
+
+    # Compute 1 << _s2hz
+    mov $1, $tmp
+    mov _s2hz, %rcx
+    shl %cl, $tmp
+
+    # Compute (_s2hw | (1 << _s2hz))
+    mov _s2hw, $arg4
+    or $tmp, $arg4
+
+    # Perform the call
+    call doSomething
+
+This code is outright broken on Windows, whose calling convention assigns
+$arg1 to %rcx. This means that the evaluation of the last argument clobbers
+the first argument.
+
+To avoid this we use a rather awful hack: when producing code for a C call with
+at least one non-trivial argument, we first evaluate all of the arguments into
+local registers before moving them into their final calling-convention-defined
+homes. This is performed by 'evalArgs'. Here we define "non-trivial" to be an
+expression which might contain a MachOp, since these are the only cases which
+might clobber registers. Furthermore, we use a conservative approximation of
+this condition (looking only at the top level of CmmExprs) to avoid spending
+too much effort trying to decide whether we can take the fast path.
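+
+Under this scheme the example above would instead compile roughly as follows
+(a sketch only; the $vregN names denote fresh virtual registers and are
+illustrative, not names the NCG actually uses):
+
+    # First evaluate every argument into a fresh virtual register
+    mov _s2hp, $vreg1
+    mov $2244, $vreg2
+    mov _s2hq, $vreg3
+    mov $1, $tmp
+    mov _s2hz, %rcx
+    shl %cl, $tmp
+    mov _s2hw, $vreg4
+    or $tmp, $vreg4
+
+    # Only then move them into their calling-convention homes
+    mov $vreg1, $arg1
+    mov $vreg2, $arg2
+    mov $vreg3, $arg3
+    mov $vreg4, $arg4
+
+    call doSomething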
+
+Note that this hack *also* applies to calls to out-of-line PrimTargets (which
+are lowered via a C call) since outOfLineCmmOp produces the call via
+(stmtToInstrs (CmmUnsafeForeignCall ...)), which will ultimately end up
+back in genCCall{32,64}.
+-}
+
+-- | See Note [Evaluate C-call arguments before placing in destination registers]
+evalArgs :: BlockId -> [CmmActual] -> NatM (InstrBlock, [CmmActual])
+evalArgs bid actuals
+ | any mightContainMachOp actuals = do
+ regs_blks <- mapM evalArg actuals
+ return (concatOL $ map fst regs_blks, map snd regs_blks)
+ | otherwise = return (nilOL, actuals)
+ where
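+ -- Conservative, top-level-only approximation of "might contain a MachOp";
+ -- see the Note above for why this is sufficient.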
+ mightContainMachOp (CmmReg _) = False
+ mightContainMachOp (CmmRegOff _ _) = False
+ mightContainMachOp (CmmLit _) = False
+ mightContainMachOp _ = True
+
+ evalArg :: CmmActual -> NatM (InstrBlock, CmmExpr)
+ evalArg actual = do
+ platform <- getPlatform
+ lreg <- newLocalReg $ cmmExprType platform actual
+ (instrs, bid1) <- stmtToInstrs bid $ CmmAssign (CmmLocal lreg) actual
+ -- The above assignment shouldn't change the current block
+ MASSERT(isNothing bid1)
+ return (instrs, CmmReg $ CmmLocal lreg)
+
+ newLocalReg :: CmmType -> NatM LocalReg
+ newLocalReg ty = LocalReg <$> getUniqueM <*> pure ty
+
-- Note [DIV/IDIV for bytes]
--
-- IDIV reminder: