{-# LANGUAGE TypeSynonymInstances #-} {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE DeriveFunctor #-} -- | Units are library components from Cabal packages compiled and installed in -- a database module GHC.Unit ( module GHC.Unit.Types , module GHC.Unit.Info , module GHC.Unit.Parser , module GHC.Unit.State , module GHC.Unit.Subst , module GHC.Unit.Module ) where import GHC.Unit.Types import GHC.Unit.Info import GHC.Unit.Parser import GHC.Unit.State import GHC.Unit.Subst import GHC.Unit.Module -- Note [About Units] -- ~~~~~~~~~~~~~~~~~~ -- -- Haskell users are used to manipulate Cabal packages. These packages are -- identified by: -- - a package name :: String -- - a package version :: Version -- - (a revision number, when they are registered on Hackage) -- -- Cabal packages may contain several components (libraries, programs, -- testsuites). In GHC we are mostly interested in libraries because those are -- the components that can be depended upon by other components. Components in a -- package are identified by their component name. Historically only one library -- component was allowed per package, hence it didn't need a name. For this -- reason, component name may be empty for one library component in each -- package: -- - a component name :: Maybe String -- -- UnitId -- ------ -- -- Cabal libraries can be compiled in various ways (different compiler options -- or Cabal flags, different dependencies, etc.), hence using package name, -- package version and component name isn't enough to identify a built library. -- We use another identifier called UnitId: -- -- package name \ -- package version | ________ -- component name | hash of all this ==> | UnitId | -- Cabal flags | -------- -- compiler options | -- dependencies' UnitId / -- -- Fortunately GHC doesn't have to generate these UnitId: they are provided by -- external build tools (e.g. Cabal) with `-this-unit-id` command-line parameter. -- -- UnitIds are important because they are used to generate internal names -- (symbols, etc.). -- -- Wired-in units -- -------------- -- -- Certain libraries are known to the compiler, in that we know about certain -- entities that reside in these libraries. The compiler needs to declare static -- Modules and Names that refer to units built from these libraries. -- -- Hence UnitIds of wired-in libraries are fixed. Instead of letting Cabal chose -- the UnitId for these libraries, their .cabal file uses the following stanza to -- force it to a specific value: -- -- ghc-options: -this-unit-id ghc-prim -- taken from ghc-prim.cabal -- -- The RTS also uses entities of wired-in units by directly referring to symbols -- such as "base_GHCziIOziException_heapOverflow_closure" where the prefix is -- the UnitId of "base" unit. -- -- Unit databases -- -------------- -- -- Units are stored in databases in order to be reused by other codes: -- -- UnitKey ---> UnitInfo { exposed modules, package name, package version -- component name, various file paths, -- dependencies :: [UnitKey], etc. } -- -- Because of the wired-in units described above, we can't exactly use UnitIds -- as UnitKeys in the database: if we did this, we could only have a single unit -- (compiled library) in the database for each wired-in library. As we want to -- support databases containing several different units for the same wired-in -- library, we do this: -- -- * for non wired-in units: -- * UnitId = UnitKey = Identifier (hash) computed by Cabal -- -- * for wired-in units: -- * UnitKey = Identifier computed by Cabal (just like for non wired-in units) -- * UnitId = unit-id specified with -this-unit-id command-line flag -- -- We can expose several units to GHC via the `package-id ` -- command-line parameter. We must use the UnitKeys of the units so that GHC can -- find them in the database. -- -- GHC then replaces the UnitKeys with UnitIds by taking into account wired-in -- units: these units are detected thanks to their UnitInfo (especially their -- package name). -- -- For example, knowing that "base", "ghc-prim" and "rts" are wired-in packages, -- the following dependency graph expressed with UnitKeys (as found in the -- database) will be transformed into a similar graph expressed with UnitIds -- (that are what matters for compilation): -- -- UnitKeys -- ~~~~~~~~ ---> rts-1.0-hashABC <-- -- | | -- | | -- foo-2.0-hash123 --> base-4.1-hashXYZ ---> ghc-prim-0.5.3-hashABC -- -- UnitIds -- ~~~~~~~ ---> rts <-- -- | | -- | | -- foo-2.0-hash123 --> base ---------------> ghc-prim -- -- -- Module signatures / indefinite units / instantiated units -- --------------------------------------------------------- -- -- GHC distinguishes two kinds of units: -- -- * definite: units for which every module has an associated code object -- (i.e. real compiled code in a .o/.a/.so/.dll/...) -- -- * indefinite: units for which some modules are replaced by module -- signatures. -- -- Module signatures are a kind of interface (similar to .hs-boot files). They -- are used in place of some real code. GHC allows real modules from other -- units to be used to fill these module holes. The process is called -- "unit/module instantiation". -- -- You can think of this as polymorphism at the module level: module signatures -- give constraints on the "type" of module that can be used to fill the hole -- (where "type" means types of the exported module entitites, etc.). -- -- Module signatures contain enough information (datatypes, abstract types, type -- synonyms, classes, etc.) to typecheck modules depending on them but not -- enough to compile them. As such, indefinite units found in databases only -- provide module interfaces (the .hi ones this time), not object code. -- -- To distinguish between indefinite and finite unit ids at the type level, we -- respectively use 'IndefUnitId' and 'DefUnitId' datatypes that are basically -- wrappers over 'UnitId'. -- -- Unit instantiation -- ------------------ -- -- Indefinite units can be instantiated with modules from other units. The -- instantiating units can also be instantiated themselves (if there are -- indefinite) and so on. The 'Unit' datatype represents a unit which may have -- been instantiated: -- -- data Unit = RealUnit DefUnitId -- | VirtUnit InstantiatedUnit -- -- 'InstantiatedUnit' has two interesting fields: -- -- * instUnitInstanceOf :: IndefUnitId -- -- ^ the indefinite unit that is instantiated -- -- * instUnitInsts :: [(ModuleName,(Unit,ModuleName)] -- -- ^ a list of instantiations, where an instantiation is: -- (module hole name, (instantiating unit, instantiating module name)) -- -- A 'Unit' may be indefinite or definite, it depends on whether some holes -- remain in the instantiated unit OR in the instantiating units (recursively). -- -- Pretty-printing UnitId -- ---------------------- -- -- GHC mostly deals with UnitIds which are some opaque strings. We could display -- them when we pretty-print a module origin, a name, etc. But it wouldn't be -- very friendly to the user because of the hash they usually contain. E.g. -- -- foo-4.18.1:thelib-XYZsomeUglyHashABC -- -- Instead when we want to pretty-print a 'UnitId' we query the database to -- get the 'UnitInfo' and print something nicer to the user: -- -- foo-4.18.1:thelib -- -- We do the same for wired-in units. -- -- Currently (2020-04-06), we don't thread the database into every function that -- pretty-prints a Name/Module/Unit. Instead querying the database is delayed -- until the `SDoc` is transformed into a `Doc` using the database that is -- active at this point in time. This is an issue because we want to be able to -- unload units from the database and we also want to support several -- independent databases loaded at the same time (see #14335). The alternatives -- we have are: -- -- * threading the database into every function that pretty-prints a UnitId -- for the user (directly or indirectly). -- -- * storing enough info to correctly display a UnitId into the UnitId -- datatype itself. This is done in the IndefUnitId wrapper (see -- 'UnitPprInfo' datatype) but not for every 'UnitId'. Statically defined -- 'UnitId' for wired-in units would have empty UnitPprInfo so we need to -- find some places to update them if we want to display wired-in UnitId -- correctly. This leads to a solution similar to the first one above. -- -- Note [VirtUnit to RealUnit improvement] -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- -- Over the course of instantiating VirtUnits on the fly while typechecking an -- indefinite library, we may end up with a fully instantiated VirtUnit. I.e. -- one that could be compiled and installed in the database. During -- type-checking we generate a virtual UnitId for it, say "abc". -- -- Now the question is: do we have a matching installed unit in the database? -- Suppose we have one with UnitId "xyz" (provided by Cabal so we don't know how -- to generate it). The trouble is that if both units end up being used in the -- same type-checking session, their names won't match (e.g. "abc:M.X" vs -- "xyz:M.X"). -- -- As we want them to match we just replace the virtual unit with the installed -- one: for some reason this is called "improvement". -- -- There is one last niggle: improvement based on the package database means -- that we might end up developing on a package that is not transitively -- depended upon by the packages the user specified directly via command line -- flags. This could lead to strange and difficult to understand bugs if those -- instantiations are out of date. The solution is to only improve a -- unit id if the new unit id is part of the 'preloadClosure'; i.e., the -- closure of all the packages which were explicitly specified. -- Note [Representation of module/name variables] -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- In our ICFP'16, we use to represent module holes, and {A.T} to represent -- name holes. This could have been represented by adding some new cases -- to the core data types, but this would have made the existing 'moduleName' -- and 'moduleUnit' partial, which would have required a lot of modifications -- to existing code. -- -- Instead, we use a fake "hole" unit: -- -- ===> hole:A -- {A.T} ===> hole:A.T -- -- This encoding is quite convenient, but it is also a bit dangerous too, -- because if you have a 'hole:A' you need to know if it's actually a -- 'Module' or just a module stored in a 'Name'; these two cases must be -- treated differently when doing substitutions. 'renameHoleModule' -- and 'renameHoleUnit' assume they are NOT operating on a -- 'Name'; 'NameShape' handles name substitutions exclusively.