diff options
author | Matthew Pickering <matthewtpickering@gmail.com> | 2021-07-20 11:49:22 +0200 |
---|---|---|
committer | Matthew Pickering <matthewtpickering@gmail.com> | 2021-12-28 09:47:53 +0000 |
commit | fd42ab5fa1df847a6b595dfe4b63d9c7eecbf400 (patch) | |
tree | 3bd7add640ee4e1340de079a16a05fd34548925f /compiler/GHC/Unit/Module/Graph.hs | |
parent | 3219610e3ba6cb6a5cd1f4e32e2b4befea5bd384 (diff) | |
download | haskell-fd42ab5fa1df847a6b595dfe4b63d9c7eecbf400.tar.gz |
Multiple Home Units
Multiple home units allows you to load different packages which may depend on
each other into one GHC session. This will allow both GHCi and HLS to support
multi component projects more naturally.
Public Interface
~~~~~~~~~~~~~~~~
In order to specify multiple units, the -unit @⟨filename⟩ flag
is given multiple times with a response file containing the arguments for each unit.
The response file contains a newline separated list of arguments.
```
ghc -unit @unitLibCore -unit @unitLib
```
where the `unitLibCore` response file contains the normal arguments that cabal would pass to `--make` mode.
```
-this-unit-id lib-core-0.1.0.0
-i
-isrc
LibCore.Utils
LibCore.Types
```
The response file for lib, can specify a dependency on lib-core, so then modules in lib can use modules from lib-core.
```
-this-unit-id lib-0.1.0.0
-package-id lib-core-0.1.0.0
-i
-isrc
Lib.Parse
Lib.Render
```
Then when the compiler starts in --make mode it will compile both units lib and lib-core.
There is also very basic support for multiple home units in GHCi, at the
moment you can start a GHCi session with multiple units but only the
:reload is supported. Most commands in GHCi assume a single home unit,
and so it is additional work to work out how to modify the interface to
support multiple loaded home units.
Options used when working with Multiple Home Units
There are a few extra flags which have been introduced specifically for
working with multiple home units. The flags allow a home unit to pretend
it’s more like an installed package, for example, specifying the package
name, module visibility and reexported modules.
-working-dir ⟨dir⟩
It is common to assume that a package is compiled in the directory
where its cabal file resides. Thus, all paths used in the compiler
are assumed to be relative to this directory. When there are
multiple home units the compiler is often not operating in the
standard directory and instead where the cabal.project file is
located. In this case the -working-dir option can be passed which
specifies the path from the current directory to the directory the
unit assumes to be it’s root, normally the directory which contains
the cabal file.
When the flag is passed, any relative paths used by the compiler are
offset by the working directory. Notably this includes -i and
-I⟨dir⟩ flags.
-this-package-name ⟨name⟩
This flag papers over the awkward interaction of the PackageImports
and multiple home units. When using PackageImports you can specify
the name of the package in an import to disambiguate between modules
which appear in multiple packages with the same name.
This flag allows a home unit to be given a package name so that you
can also disambiguate between multiple home units which provide
modules with the same name.
-hidden-module ⟨module name⟩
This flag can be supplied multiple times in order to specify which
modules in a home unit should not be visible outside of the unit it
belongs to.
The main use of this flag is to be able to recreate the difference
between an exposed and hidden module for installed packages.
-reexported-module ⟨module name⟩
This flag can be supplied multiple times in order to specify which
modules are not defined in a unit but should be reexported. The
effect is that other units will see this module as if it was defined
in this unit.
The use of this flag is to be able to replicate the reexported
modules feature of packages with multiple home units.
Offsetting Paths in Template Haskell splices
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When using Template Haskell to embed files into your program,
traditionally the paths have been interpreted relative to the directory
where the .cabal file resides. This causes problems for multiple home
units as we are compiling many different libraries at once which have
.cabal files in different directories.
For this purpose we have introduced a way to query the value of the
-working-dir flag to the Template Haskell API. By using this function we
can implement a makeRelativeToProject function which offsets a path
which is relative to the original project root by the value of
-working-dir.
```
import Language.Haskell.TH.Syntax ( makeRelativeToProject )
foo = $(makeRelativeToProject "./relative/path" >>= embedFile)
```
> If you write a relative path in a Template Haskell splice you should use the makeRelativeToProject function so that your library works correctly with multiple home units.
A similar function already exists in the file-embed library. The
function in template-haskell implements this function in a more robust
manner by honouring the -working-dir flag rather than searching the file
system.
Closure Property for Home Units
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For tools or libraries using the API there is one very important closure
property which must be adhered to:
> Any dependency which is not a home unit must not (transitively) depend
on a home unit.
For example, if you have three packages p, q and r, then if p depends on
q which depends on r then it is illegal to load both p and r as home
units but not q, because q is a dependency of the home unit p which
depends on another home unit r.
If you are using GHC by the command line then this property is checked,
but if you are using the API then you need to check this property
yourself. If you get it wrong you will probably get some very confusing
errors about overlapping instances.
Limitations of Multiple Home Units
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are a few limitations of the initial implementation which will be smoothed out on user demand.
* Package thinning/renaming syntax is not supported
* More complicated reexports/renaming are not yet supported.
* It’s more common to run into existing linker bugs when loading a
large number of packages in a session (for example #20674, #20689)
* Backpack is not yet supported when using multiple home units.
* Dependency chasing can be quite slow with a large number of
modules and packages.
* Loading wired-in packages as home units is currently not supported
(this only really affects GHC developers attempting to load
template-haskell).
* Barely any normal GHCi features are supported, it would be good to
support enough for ghcid to work correctly.
Despite these limitations, the implementation works already for nearly
all packages. It has been testing on large dependency closures,
including the whole of head.hackage which is a total of 4784 modules
from 452 packages.
Internal Changes
~~~~~~~~~~~~~~~~
* The biggest change is that the HomePackageTable is replaced with the
HomeUnitGraph. The HomeUnitGraph is a map from UnitId to HomeUnitEnv,
which contains information specific to each home unit.
* The HomeUnitEnv contains:
- A unit state, each home unit can have different package db flags
- A set of dynflags, each home unit can have different flags
- A HomePackageTable
* LinkNode: A new node type is added to the ModuleGraph, this is used to
place the linking step into the build plan so linking can proceed in
parralel with other packages being built.
* New invariant: Dependencies of a ModuleGraphNode can be completely
determined by looking at the value of the node. In order to achieve
this, downsweep now performs a more complete job of downsweeping and
then the dependenices are recorded forever in the node rather than
being computed again from the ModSummary.
* Some transitive module calculations are rewritten to use the
ModuleGraph which is more efficient.
* There is always an active home unit, which simplifies modifying a lot
of the existing API code which is unit agnostic (for example, in the
driver).
The road may be bumpy for a little while after this change but the
basics are well-tested.
One small metric increase, which we accept and also submodule update to
haddock which removes ExtendedModSummary.
Closes #10827
-------------------------
Metric Increase:
MultiLayerModules
-------------------------
Co-authored-by: Fendor <power.walross@gmail.com>
Diffstat (limited to 'compiler/GHC/Unit/Module/Graph.hs')
-rw-r--r-- | compiler/GHC/Unit/Module/Graph.hs | 162 |
1 files changed, 97 insertions, 65 deletions
diff --git a/compiler/GHC/Unit/Module/Graph.hs b/compiler/GHC/Unit/Module/Graph.hs index 0df5779416..a225c50f27 100644 --- a/compiler/GHC/Unit/Module/Graph.hs +++ b/compiler/GHC/Unit/Module/Graph.hs @@ -6,9 +6,9 @@ module GHC.Unit.Module.Graph ( ModuleGraph , ModuleGraphNode(..) + , nodeDependencies , emptyMG , mkModuleGraph - , mkModuleGraph' , extendMG , extendMGInst , extendMG' @@ -16,7 +16,6 @@ module GHC.Unit.Module.Graph , mapMG , mgModSummaries , mgModSummaries' - , mgExtendedModSummaries , mgElemModule , mgLookupModule , mgBootModules @@ -36,6 +35,10 @@ module GHC.Unit.Module.Graph , mkNodeKey , msKey + + , moduleGraphNodeUnitId + + , ModNodeKeyWithUid(..) ) where @@ -60,9 +63,9 @@ import GHC.Utils.Outputable import System.FilePath import qualified Data.Map as Map import GHC.Types.Unique.DSet -import GHC.Types.SrcLoc import qualified Data.Set as Set import GHC.Unit.Module +import GHC.Linker.Static.Utils -- | A '@ModuleGraphNode@' is a node in the '@ModuleGraph@'. -- Edges between nodes mark dependencies arising from module imports @@ -70,21 +73,51 @@ import GHC.Unit.Module data ModuleGraphNode -- | Instantiation nodes track the instantiation of other units -- (backpack dependencies) with the holes (signatures) of the current package. - = InstantiationNode InstantiatedUnit + = InstantiationNode UnitId InstantiatedUnit -- | There is a module summary node for each module, signature, and boot module being built. - | ModuleNode ExtendedModSummary + | ModuleNode [NodeKey] ModSummary + -- | Link nodes are whether are are creating a linked product (ie executable/shared object etc) for a unit. + | LinkNode [NodeKey] UnitId -moduleGraphNodeModSum :: ModuleGraphNode -> Maybe ExtendedModSummary +moduleGraphNodeModule :: ModuleGraphNode -> Maybe ModuleName +moduleGraphNodeModule mgn = ms_mod_name <$> (moduleGraphNodeModSum mgn) + +moduleGraphNodeModSum :: ModuleGraphNode -> Maybe ModSummary moduleGraphNodeModSum (InstantiationNode {}) = Nothing -moduleGraphNodeModSum (ModuleNode ems) = Just ems +moduleGraphNodeModSum (LinkNode {}) = Nothing +moduleGraphNodeModSum (ModuleNode _ ms) = Just ms -moduleGraphNodeModule :: ModuleGraphNode -> Maybe ModuleName -moduleGraphNodeModule = fmap (ms_mod_name . emsModSummary) . moduleGraphNodeModSum +moduleGraphNodeUnitId :: ModuleGraphNode -> UnitId +moduleGraphNodeUnitId mgn = + case mgn of + InstantiationNode uid _iud -> uid + ModuleNode _ ms -> toUnitId (moduleUnit (ms_mod ms)) + LinkNode _ uid -> uid instance Outputable ModuleGraphNode where ppr = \case - InstantiationNode iuid -> ppr iuid - ModuleNode ems -> ppr ems + InstantiationNode _ iuid -> ppr iuid + ModuleNode nks ms -> ppr (ms_mnwib ms) <+> ppr nks + LinkNode uid _ -> text "LN:" <+> ppr uid + +data NodeKey = NodeKey_Unit {-# UNPACK #-} !InstantiatedUnit + | NodeKey_Module {-# UNPACK #-} !ModNodeKeyWithUid + | NodeKey_Link !UnitId + deriving (Eq, Ord) + +instance Outputable NodeKey where + ppr nk = pprNodeKey nk + +pprNodeKey :: NodeKey -> SDoc +pprNodeKey (NodeKey_Unit iu) = ppr iu +pprNodeKey (NodeKey_Module mk) = ppr mk +pprNodeKey (NodeKey_Link uid) = ppr uid + +data ModNodeKeyWithUid = ModNodeKeyWithUid { mnkModuleName :: ModuleNameWithIsBoot + , mnkUnitId :: UnitId } deriving (Eq, Ord) + +instance Outputable ModNodeKeyWithUid where + ppr (ModNodeKeyWithUid mnwib uid) = ppr uid <> colon <> ppr mnwib -- | A '@ModuleGraph@' contains all the nodes from the home package (only). See -- '@ModuleGraphNode@' for information about the nodes. @@ -125,8 +158,9 @@ needsTemplateHaskellOrQQ mg = mg_needs_th_or_qq mg mapMG :: (ModSummary -> ModSummary) -> ModuleGraph -> ModuleGraph mapMG f mg@ModuleGraph{..} = mg { mg_mss = flip fmap mg_mss $ \case - InstantiationNode iuid -> InstantiationNode iuid - ModuleNode (ExtendedModSummary ms bds) -> ModuleNode (ExtendedModSummary (f ms) bds) + InstantiationNode uid iuid -> InstantiationNode uid iuid + LinkNode uid nks -> LinkNode uid nks + ModuleNode deps ms -> ModuleNode deps (f ms) , mg_non_boot = mapModuleEnv f mg_non_boot } @@ -137,10 +171,7 @@ mgTransDeps :: ModuleGraph -> Map.Map NodeKey (Set.Set NodeKey) mgTransDeps = mg_trans_deps mgModSummaries :: ModuleGraph -> [ModSummary] -mgModSummaries mg = [ m | ModuleNode (ExtendedModSummary m _) <- mgModSummaries' mg ] - -mgExtendedModSummaries :: ModuleGraph -> [ExtendedModSummary] -mgExtendedModSummaries mg = [ ems | ModuleNode ems <- mgModSummaries' mg ] +mgModSummaries mg = [ m | ModuleNode _ m <- mgModSummaries' mg ] mgModSummaries' :: ModuleGraph -> [ModuleGraphNode] mgModSummaries' = mg_mss @@ -163,9 +194,9 @@ isTemplateHaskellOrQQNonBoot ms = -- | Add an ExtendedModSummary to ModuleGraph. Assumes that the new ModSummary is -- not an element of the ModuleGraph. -extendMG :: ModuleGraph -> ExtendedModSummary -> ModuleGraph -extendMG ModuleGraph{..} ems@(ExtendedModSummary ms _) = ModuleGraph - { mg_mss = ModuleNode ems : mg_mss +extendMG :: ModuleGraph -> [NodeKey] -> ModSummary -> ModuleGraph +extendMG ModuleGraph{..} deps ms = ModuleGraph + { mg_mss = ModuleNode deps ms : mg_mss , mg_trans_deps = td , mg_non_boot = case isBootSummary ms of IsBoot -> mg_non_boot @@ -176,24 +207,25 @@ extendMG ModuleGraph{..} ems@(ExtendedModSummary ms _) = ModuleGraph , mg_needs_th_or_qq = mg_needs_th_or_qq || isTemplateHaskellOrQQNonBoot ms } where - (gg, _lookup_node) = moduleGraphNodes False (ModuleNode ems : mg_mss) + (gg, _lookup_node) = moduleGraphNodes False (ModuleNode deps ms : mg_mss) td = allReachable gg (mkNodeKey . node_payload) -extendMGInst :: ModuleGraph -> InstantiatedUnit -> ModuleGraph -extendMGInst mg depUnitId = mg - { mg_mss = InstantiationNode depUnitId : mg_mss mg +extendMGInst :: ModuleGraph -> UnitId -> InstantiatedUnit -> ModuleGraph +extendMGInst mg uid depUnitId = mg + { mg_mss = InstantiationNode uid depUnitId : mg_mss mg } +extendMGLink :: ModuleGraph -> UnitId -> [NodeKey] -> ModuleGraph +extendMGLink mg uid nks = mg { mg_mss = LinkNode nks uid : mg_mss mg } + extendMG' :: ModuleGraph -> ModuleGraphNode -> ModuleGraph extendMG' mg = \case - InstantiationNode depUnitId -> extendMGInst mg depUnitId - ModuleNode ems -> extendMG mg ems - -mkModuleGraph :: [ExtendedModSummary] -> ModuleGraph -mkModuleGraph = foldr (flip extendMG) emptyMG + InstantiationNode uid depUnitId -> extendMGInst mg uid depUnitId + ModuleNode deps ms -> extendMG mg deps ms + LinkNode deps uid -> extendMGLink mg uid deps -mkModuleGraph' :: [ModuleGraphNode] -> ModuleGraph -mkModuleGraph' = foldr (flip extendMG') emptyMG +mkModuleGraph :: [ModuleGraphNode] -> ModuleGraph +mkModuleGraph = foldr (flip extendMG') emptyMG -- | This function filters out all the instantiation nodes from each SCC of a -- topological sort. Use this with care, as the resulting "strongly connected components" @@ -202,8 +234,9 @@ mkModuleGraph' = foldr (flip extendMG') emptyMG filterToposortToModules :: [SCC ModuleGraphNode] -> [SCC ModSummary] filterToposortToModules = mapMaybe $ mapMaybeSCC $ \case - InstantiationNode _ -> Nothing - ModuleNode (ExtendedModSummary node _) -> Just node + InstantiationNode _ _ -> Nothing + LinkNode{} -> Nothing + ModuleNode _deps node -> Just node where -- This higher order function is somewhat bogus, -- as the definition of "strongly connected component" @@ -217,9 +250,17 @@ filterToposortToModules = mapMaybe $ mapMaybeSCC $ \case as -> Just $ CyclicSCC as showModMsg :: DynFlags -> Bool -> ModuleGraphNode -> SDoc -showModMsg _ _ (InstantiationNode indef_unit) = +showModMsg dflags _ (LinkNode {}) = + let staticLink = case ghcLink dflags of + LinkStaticLib -> True + _ -> False + + platform = targetPlatform dflags + exe_file = exeFileName platform staticLink (outputFile_ dflags) + in text exe_file +showModMsg _ _ (InstantiationNode _uid indef_unit) = ppr $ instUnitInstanceOf indef_unit -showModMsg dflags recomp (ModuleNode (ExtendedModSummary mod_summary _)) = +showModMsg dflags recomp (ModuleNode _ mod_summary) = if gopt Opt_HideSourcePaths dflags then text mod_str else hsep $ @@ -244,7 +285,6 @@ showModMsg dflags recomp (ModuleNode (ExtendedModSummary mod_summary _)) = - type SummaryNode = Node Int ModuleGraphNode summaryNodeKey :: SummaryNode -> Int @@ -261,22 +301,23 @@ summaryNodeSummary = node_payload -- .hs, by introducing a cycle. Additionally, it ensures that we will always -- process the .hs-boot before the .hs, and so the HomePackageTable will always -- have the most up to date information. -unfilteredEdges :: Bool -> ModuleGraphNode -> [NodeKey] -unfilteredEdges drop_hs_boot_nodes = \case - InstantiationNode iuid -> - NodeKey_Module . flip GWIB NotBoot <$> uniqDSetToList (instUnitHoles iuid) - ModuleNode (ExtendedModSummary ms bds) -> - [ NodeKey_Unit inst_unit | inst_unit <- bds ] ++ - (NodeKey_Module . flip GWIB hs_boot_key . unLoc <$> ms_home_srcimps ms) ++ - [ NodeKey_Module $ GWIB (ms_mod_name ms) IsBoot +nodeDependencies :: Bool -> ModuleGraphNode -> [NodeKey] +nodeDependencies drop_hs_boot_nodes = \case + LinkNode deps _uid -> deps + InstantiationNode uid iuid -> + NodeKey_Module . (\mod -> ModNodeKeyWithUid (GWIB mod NotBoot) uid) <$> uniqDSetToList (instUnitHoles iuid) + ModuleNode deps ms -> + [ NodeKey_Module $ (ModNodeKeyWithUid (GWIB (ms_mod_name ms) IsBoot) (ms_unitid ms)) | not $ drop_hs_boot_nodes || ms_hsc_src ms == HsBootFile - ] ++ - (NodeKey_Module . flip GWIB NotBoot . unLoc <$> ms_home_imps ms) + ] ++ map drop_hs_boot deps where -- Drop hs-boot nodes by using HsSrcFile as the key hs_boot_key | drop_hs_boot_nodes = NotBoot -- is regular mod or signature | otherwise = IsBoot + drop_hs_boot (NodeKey_Module (ModNodeKeyWithUid (GWIB mn IsBoot) uid)) = (NodeKey_Module (ModNodeKeyWithUid (GWIB mn hs_boot_key) uid)) + drop_hs_boot x = x + moduleGraphNodes :: Bool -> [ModuleGraphNode] -> (Graph SummaryNode, NodeKey -> Maybe SummaryNode) moduleGraphNodes drop_hs_boot_nodes summaries = @@ -299,39 +340,30 @@ moduleGraphNodes drop_hs_boot_nodes summaries = -- We use integers as the keys for the SCC algorithm nodes :: [SummaryNode] - nodes = [ DigraphNode s key $ out_edge_keys $ unfilteredEdges drop_hs_boot_nodes s + nodes = [ DigraphNode s key $ out_edge_keys $ nodeDependencies drop_hs_boot_nodes s | (s, key) <- numbered_summaries -- Drop the hi-boot ones if told to do so , case s of - InstantiationNode _ -> True - ModuleNode ems -> not $ isBootSummary (emsModSummary ems) == IsBoot && drop_hs_boot_nodes + InstantiationNode {} -> True + LinkNode {} -> True + ModuleNode _ ms -> not $ isBootSummary ms == IsBoot && drop_hs_boot_nodes ] out_edge_keys :: [NodeKey] -> [Int] out_edge_keys = mapMaybe lookup_key -- If we want keep_hi_boot_nodes, then we do lookup_key with -- IsBoot; else False - -type ModNodeKey = ModuleNameWithIsBoot - -data NodeKey = NodeKey_Unit {-# UNPACK #-} !InstantiatedUnit | NodeKey_Module {-# UNPACK #-} !ModNodeKey - deriving (Eq, Ord) - -instance Outputable NodeKey where - ppr nk = pprNodeKey nk - newtype NodeMap a = NodeMap { unNodeMap :: Map.Map NodeKey a } deriving (Functor, Traversable, Foldable) mkNodeKey :: ModuleGraphNode -> NodeKey mkNodeKey = \case - InstantiationNode x -> NodeKey_Unit x - ModuleNode x -> NodeKey_Module $ ms_mnwib (emsModSummary x) + InstantiationNode _ iu -> NodeKey_Unit iu + ModuleNode _ x -> NodeKey_Module $ msKey x + LinkNode _ uid -> NodeKey_Link uid -msKey :: ModSummary -> ModuleNameWithIsBoot -msKey = ms_mnwib +msKey :: ModSummary -> ModNodeKeyWithUid +msKey ms = ModNodeKeyWithUid (ms_mnwib ms) (ms_unitid ms) -pprNodeKey :: NodeKey -> SDoc -pprNodeKey (NodeKey_Unit iu) = ppr iu -pprNodeKey (NodeKey_Module mk) = ppr mk +type ModNodeKey = ModuleNameWithIsBoot |