{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE DeriveFunctor #-}

-- | Units are library components from Cabal packages compiled and installed in
-- a database
module GHC.Unit
   ( module GHC.Unit.Types
   , module GHC.Unit.Info
   , module GHC.Unit.Parser
   , module GHC.Unit.State
   , module GHC.Unit.Subst
   , module GHC.Unit.Module
   )
where

import GHC.Unit.Types
import GHC.Unit.Info
import GHC.Unit.Parser
import GHC.Unit.State
import GHC.Unit.Subst
import GHC.Unit.Module

-- Note [About Units]
-- ~~~~~~~~~~~~~~~~~~
--
-- Haskell users are used to manipulate Cabal packages. These packages are
-- identified by:
--    - a package name :: String
--    - a package version :: Version
--    - (a revision number, when they are registered on Hackage)
--
-- Cabal packages may contain several components (libraries, programs,
-- testsuites). In GHC we are mostly interested in libraries because those are
-- the components that can be depended upon by other components. Components in a
-- package are identified by their component name. Historically only one library
-- component was allowed per package, hence it didn't need a name. For this
-- reason, component name may be empty for one library component in each
-- package:
--    - a component name :: Maybe String
--
-- UnitId
-- ------
--
-- Cabal libraries can be compiled in various ways (different compiler options
-- or Cabal flags, different dependencies, etc.), hence using package name,
-- package version and component name isn't enough to identify a built library.
-- We use another identifier called UnitId:
--
--   package name             \
--   package version          |                       ________
--   component name           | hash of all this ==> | UnitId |
--   Cabal flags              |                       --------
--   compiler options         |
--   dependencies' UnitId     /
--
-- Fortunately GHC doesn't have to generate these UnitId: they are provided by
-- external build tools (e.g. Cabal) with `-this-unit-id` command-line parameter.
--
-- UnitIds are important because they are used to generate internal names
-- (symbols, etc.).
--
-- Wired-in units
-- --------------
--
-- Certain libraries are known to the compiler, in that we know about certain
-- entities that reside in these libraries. The compiler needs to declare static
-- Modules and Names that refer to units built from these libraries.
--
-- Hence UnitIds of wired-in libraries are fixed. Instead of letting Cabal chose
-- the UnitId for these libraries, their .cabal file uses the following stanza to
-- force it to a specific value:
--
--    ghc-options: -this-unit-id ghc-prim    -- taken from ghc-prim.cabal
--
-- The RTS also uses entities of wired-in units by directly referring to symbols
-- such as "base_GHCziIOziException_heapOverflow_closure" where the prefix is
-- the UnitId of "base" unit.
--
-- Unit databases
-- --------------
--
-- Units are stored in databases in order to be reused by other codes:
--
--    UnitKey ---> UnitInfo { exposed modules, package name, package version
--                            component name, various file paths,
--                            dependencies :: [UnitKey], etc. }
--
-- Because of the wired-in units described above, we can't exactly use UnitIds
-- as UnitKeys in the database: if we did this, we could only have a single unit
-- (compiled library) in the database for each wired-in library. As we want to
-- support databases containing several different units for the same wired-in
-- library, we do this:
--
--    * for non wired-in units:
--       * UnitId = UnitKey = Identifier (hash) computed by Cabal
--
--    * for wired-in units:
--       * UnitKey = Identifier computed by Cabal (just like for non wired-in units)
--       * UnitId  = unit-id specified with -this-unit-id command-line flag
--
-- We can expose several units to GHC via the `package-id <UnitKey>`
-- command-line parameter. We must use the UnitKeys of the units so that GHC can
-- find them in the database.
--
-- GHC then replaces the UnitKeys with UnitIds by taking into account wired-in
-- units: these units are detected thanks to their UnitInfo (especially their
-- package name).
--
-- For example, knowing that "base", "ghc-prim" and "rts" are wired-in packages,
-- the following dependency graph expressed with UnitKeys (as found in the
-- database) will be transformed into a similar graph expressed with UnitIds
-- (that are what matters for compilation):
--
--    UnitKeys
--    ~~~~~~~~                             ---> rts-1.0-hashABC <--
--                                         |                      |
--                                         |                      |
--    foo-2.0-hash123 --> base-4.1-hashXYZ ---> ghc-prim-0.5.3-hashABC
--
--    UnitIds
--    ~~~~~~~                              ---> rts <--
--                                         |          |
--                                         |          |
--    foo-2.0-hash123 --> base ---------------> ghc-prim
--
--
-- Module signatures / indefinite units / instantiated units
-- ---------------------------------------------------------
--
-- GHC distinguishes two kinds of units:
--
--    * definite: units for which every module has an associated code object
--    (i.e. real compiled code in a .o/.a/.so/.dll/...)
--
--    * indefinite: units for which some modules are replaced by module
--    signatures.
--
-- Module signatures are a kind of interface (similar to .hs-boot files). They
-- are used in place of some real code. GHC allows real modules from other
-- units to be used to fill these module holes. The process is called
-- "unit/module instantiation".
--
-- You can think of this as polymorphism at the module level: module signatures
-- give constraints on the "type" of module that can be used to fill the hole
-- (where "type" means types of the exported module entitites, etc.).
--
-- Module signatures contain enough information (datatypes, abstract types, type
-- synonyms, classes, etc.) to typecheck modules depending on them but not
-- enough to compile them. As such, indefinite units found in databases only
-- provide module interfaces (the .hi ones this time), not object code.
--
-- To distinguish between indefinite and finite unit ids at the type level, we
-- respectively use 'IndefUnitId' and 'DefUnitId' datatypes that are basically
-- wrappers over 'UnitId'.
--
-- Unit instantiation
-- ------------------
--
-- Indefinite units can be instantiated with modules from other units. The
-- instantiating units can also be instantiated themselves (if there are
-- indefinite) and so on. The 'Unit' datatype represents a unit which may have
-- been instantiated:
--
--    data Unit = RealUnit DefUnitId
--              | VirtUnit InstantiatedUnit
--
-- 'InstantiatedUnit' has two interesting fields:
--
--    * instUnitInstanceOf :: IndefUnitId
--       -- ^ the indefinite unit that is instantiated
--
--    * instUnitInsts :: [(ModuleName,(Unit,ModuleName)]
--       -- ^ a list of instantiations, where an instantiation is:
--            (module hole name, (instantiating unit, instantiating module name))
--
-- A 'Unit' may be indefinite or definite, it depends on whether some holes
-- remain in the instantiated unit OR in the instantiating units (recursively).
--
-- Pretty-printing UnitId
-- ----------------------
--
-- GHC mostly deals with UnitIds which are some opaque strings. We could display
-- them when we pretty-print a module origin, a name, etc. But it wouldn't be
-- very friendly to the user because of the hash they usually contain. E.g.
--
--    foo-4.18.1:thelib-XYZsomeUglyHashABC
--
-- Instead when we want to pretty-print a 'UnitId' we query the database to
-- get the 'UnitInfo' and print something nicer to the user:
--
--    foo-4.18.1:thelib
--
-- We do the same for wired-in units.
--
-- Currently (2020-04-06), we don't thread the database into every function that
-- pretty-prints a Name/Module/Unit. Instead querying the database is delayed
-- until the `SDoc` is transformed into a `Doc` using the database that is
-- active at this point in time. This is an issue because we want to be able to
-- unload units from the database and we also want to support several
-- independent databases loaded at the same time (see #14335). The alternatives
-- we have are:
--
--    * threading the database into every function that pretty-prints a UnitId
--    for the user (directly or indirectly).
--
--    * storing enough info to correctly display a UnitId into the UnitId
--    datatype itself. This is done in the IndefUnitId wrapper (see
--    'UnitPprInfo' datatype) but not for every 'UnitId'. Statically defined
--    'UnitId' for wired-in units would have empty UnitPprInfo so we need to
--    find some places to update them if we want to display wired-in UnitId
--    correctly. This leads to a solution similar to the first one above.
--
-- Note [VirtUnit to RealUnit improvement]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
-- Over the course of instantiating VirtUnits on the fly while typechecking an
-- indefinite library, we may end up with a fully instantiated VirtUnit. I.e.
-- one that could be compiled and installed in the database. During
-- type-checking we generate a virtual UnitId for it, say "abc".
--
-- Now the question is: do we have a matching installed unit in the database?
-- Suppose we have one with UnitId "xyz" (provided by Cabal so we don't know how
-- to generate it). The trouble is that if both units end up being used in the
-- same type-checking session, their names won't match (e.g. "abc:M.X" vs
-- "xyz:M.X").
--
-- As we want them to match we just replace the virtual unit with the installed
-- one: for some reason this is called "improvement".
--
-- There is one last niggle: improvement based on the package database means
-- that we might end up developing on a package that is not transitively
-- depended upon by the packages the user specified directly via command line
-- flags.  This could lead to strange and difficult to understand bugs if those
-- instantiations are out of date.  The solution is to only improve a
-- unit id if the new unit id is part of the 'preloadClosure'; i.e., the
-- closure of all the packages which were explicitly specified.

-- Note [Representation of module/name variables]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- In our ICFP'16, we use <A> to represent module holes, and {A.T} to represent
-- name holes.  This could have been represented by adding some new cases
-- to the core data types, but this would have made the existing 'moduleName'
-- and 'moduleUnit' partial, which would have required a lot of modifications
-- to existing code.
--
-- Instead, we use a fake "hole" unit:
--
--      <A>   ===> hole:A
--      {A.T} ===> hole:A.T
--
-- This encoding is quite convenient, but it is also a bit dangerous too,
-- because if you have a 'hole:A' you need to know if it's actually a
-- 'Module' or just a module stored in a 'Name'; these two cases must be
-- treated differently when doing substitutions.  'renameHoleModule'
-- and 'renameHoleUnit' assume they are NOT operating on a
-- 'Name'; 'NameShape' handles name substitutions exclusively.