diff options
Diffstat (limited to 'docs/backpack')
| -rw-r--r-- | docs/backpack/algorithm.pdf | bin | 299029 -> 286307 bytes | |||
| -rw-r--r-- | docs/backpack/algorithm.tex | 819 | 
2 files changed, 233 insertions, 586 deletions
| diff --git a/docs/backpack/algorithm.pdf b/docs/backpack/algorithm.pdfBinary files differ index 9d72314d19..22692dd80e 100644 --- a/docs/backpack/algorithm.pdf +++ b/docs/backpack/algorithm.pdf diff --git a/docs/backpack/algorithm.tex b/docs/backpack/algorithm.tex index 2de50f5f8f..588f10e8cc 100644 --- a/docs/backpack/algorithm.tex +++ b/docs/backpack/algorithm.tex @@ -56,28 +56,50 @@  \maketitle -This document describes the Backpack shaping and typechecking -passes, as we intend to implement it. +\noindent +In this document, we look at the compilation of a Backpack unit. +Here are the steps a unit goes through: -\section{Changelog} - -\paragraph{April 28, 2015}  A signature declaration no longer provides -a signature in the technical shaping sense; the motivation for this change -is explained in \textbf{In-scope signatures are not provisions}.  The simplest -consequence of this is that all requirements are importable (Derek has stated that he doesn't -think this will be too much of a problem in practice); it is also possible to -extend shape with a \verb|signatures| field, although some work has to be -done specifying coherence conditions between \verb|signatures| and \verb|requirements|. +\begin{enumerate} +    \item The \textbf{unit renamer} takes the Backpack file consisting +    of units transforms each unit name to an indefinite unit ID \I{IndefUnitId}. +    In particular, it associates each unit name with its local binding +    site (unit declaration), or an external unit declaration from +    the indefinite unit database. + +    \item The \textbf{dependency solver} takes a \I{unit} +    and converts it into a +    directed acyclic graph representing the +    dependency structure of the declarations in a unit. +    It also computes the \I{Module} of each module/signature +    declaration, the \I{UnitKey} of each include, and the overall +    requirements of the unit. + +    \item The \textbf{shaping pass} takes the DAG +    and computes its \I{Shape}.  A \I{Shape} describes precisely what +    a unit provides and requires at the Haskell declaration level (\I{AvailInfo}). + +    \item The \textbf{indefinite pipeline} takes the DAG and the shape +    and typechecks each module and signature against the indefinite unit +    database.  The type checking results are saved in the indefinite +    unit database under the \I{IndefiniteUnitId}. + +    \item The \textbf{definite pipeline} takes the DAG as well as a +    \emph{hole mapping} specifying how each requirement in the unit +    is to be filled, and type-checks and compiles the unit against the +    installed unit database.  The type checking results and object files +    are saved to the installed unit database under the \I{InstalledUnitId}. +\end{enumerate}  \section{Front-end syntax}  \begin{figure}[htpb]  $$  \begin{array}{rcll} -p,q,r && \mbox{Package names} \\ +p,q,r && \mbox{Unit names} \\  m,n   && \mbox{Module names} \\[1em] -\multicolumn{3}{l}{\mbox{\bf Packages}} \\ -  \I{pkg} & ::= & \verb|package|\; p\; [\I{provreq}]\; \verb|where {| d_1 \verb|;| \ldots \verb|;| d_n \verb|}| \\[1em] +\multicolumn{3}{l}{\mbox{\bf Units}} \\ +  \I{unit} & ::= & \verb|unit|\; p\; [\I{provreq}]\; \verb|where {| d_1 \verb|;| \ldots \verb|;| d_n \verb|}| \\[1em]  \multicolumn{3}{l}{\mbox{\bf Declarations}} \\    d & ::= & \verb|module|\;    m \; [exports]\; \verb|where|\; \I{body} \\      & |   & \verb|signature|\; m \; [exports]\; \verb|where|\; \I{body} \\ @@ -95,10 +117,139 @@ $$  \caption{Syntax of Backpack} \label{fig:syntax}  \end{figure} -The syntax of Backpack is given in Figure~\ref{fig:syntax}. -See the ``Backpack manual'' for more explanation about the syntax.  It -is slightly simplified here by removing any constructs which are easily implemented as -syntactic sugar (e.g., a bare $m$ in a renaming is simply $m\; \verb|as|\; m$.) +A (slightly simplified) syntax of Backpack is given in Figure~\ref{fig:syntax}. + +\newpage +\section{Unit renamer} + +\begin{figure}[htpb] +$$ +\begin{array}{rcll} +  \I{InstalledPackageId} &  & \mbox{Installed package IDs} \\ +  \I{IndefiniteUnitId} & ::= & \I{InstalledPackageId}\, \verb|-|\, p  \\ +  \I{InstalledUnitId} & ::= & \I{IndefiniteUnitId} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| & \mbox{Also known as \I{UnitKey}} \\ +  \I{Module} & ::= & \I{InstalledUnitId} \verb|:| m \\ +\end{array} +$$ +\caption{Unit identification} \label{fig:ids} +\end{figure} + +The unit renamer is responsible for transforming unit names $p$ into \I{IndefiniteUnitId}s, given some current \I{InstalledPackageId} (\I{ThisInstalledPackageId}) and a mapping from $p$ to +\I{IndefiniteUnitId} (\I{UnitNameMap}).  Its operation on a Backpack file (collection of units) is very simple: + +\begin{itemize} +    \item Every unit declaration $\verb|unit|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$ +    \item Every unit include $\verb|include|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$ if $p$ was declared in the Backpack file; otherwise it is renamed according to \I{UnitNameMap}. +\end{itemize} +% +The purpose of an \emph{IndefiniteUnitId} is to uniquely identify the results of typechecking +an indefinite unit; similarly, an \emph{InstalledUnitId} uniquely identifies +the results of compiling a unit with all its holes +filled.  It thus records a \emph{hole mapping} which specifies how each +hole was filled. + +An \emph{installed package ID} (IPID) is an opaque string provided by +Cabal which uniquely identifies an installed package.  A recipe for +computing an IPID would incorporate both the source info, such as the hash of the +source code distribution tarball, as well as build info, such +as the selected Cabal and GHC flags, what the provided mapping from +$p$ to $IndefiniteUnitId$ was, etc. + +\paragraph{The difference between units and packages} +Cabal packages are: + +\begin{itemize} +    \item The unit of distribution +    \item The unit that Hackage handles +    \item The unit of versioning +    \item The unit of ownership (who maintains it etc)  +\end{itemize} + +Backpack units are the building blocks of modular development; +there may be multiple units per a Cabal package.  While in theory +Cabal could do sophisticated things with multiple units in a +package, we expect Cabal +to pick a distinguished unit (with the same unit name $p$ as +the package) which serves as the publically visible unit. + +\paragraph{Notational convention} +In the rest of this document, when it is unambiguous, we will use $p$, $q$ and $r$ +interchangeably with \I{IndefiniteUnitId}, as after unit renaming, there +are no occurrences of unit names. + +\newpage +\section{Dependency solver} + +\begin{figure}[htpb] +$$ +\begin{array}{rcll} +  \tilde{d} & ::= & \verb|module|\;    Module \; [exports]\; \verb|where|\; \I{body} \\ +    & |   & \verb|signature|\; \I{Module} \; [exports]\; \verb|where|\; \I{body} \\ +    & |   & \verb|include|\; p\,\verb|(|\, m\; \verb|->|\; \I{Module}\,\verb|,|\;\ldots\,\verb|)| \\ +    & |   & \verb|merge|\; \I{Module}\; \verb|of|\; (\I{Module}, \I{IsSource?}) \ldots +\end{array} +$$ +\caption{Resolved declarations} \label{fig:resolved} +\end{figure} + +The first phase of compilation is defines a directed acyclic graph on +the source syntax representing the dependency structure of the +modules/signatures/includes in the unit. This DAG has a node per: + +\begin{itemize} +    \item Each source-level module, signature and include, and +    \item Each unfilled requirement (called a ``signature merge'' node). +\end{itemize} +% +Each module, signature and signature merge node can be identified +with the tuple $\left(\I{Module}, \I{IsSource?}\right)$, where \I{IsSource?} +is true for signatures and false for modules and signature merges. +The four nodes are described in Figure~\ref{fig:resolved}. + +The edges of the directed graph signify a ``depends on'' relation, and are +defined as follows: + +\begin{itemize} +    \item A module/signature $m$ depends on $\verb|include|\; p$ if $m$ imports a module provided by $p$. +    \item A module/signature $m$ depends on a module/signature merge $n$ if $m$ imports $n$. +    \item A module/signature $m$ depends on a signature $n$ if $m$ \verb|{-# SOURCE #-}| imports $n$. +    \item A module/signature merge $m$ depends on a local signature $m$ (if it exists). +    \item A module/signature merge $m$ depends on a $\verb|include|\; p$, if the include requires $m$. +\end{itemize} +% +For compilation, these extra edges can also be defined if they +do not introduce a cycle: + +\begin{itemize} +    \item An $\verb|include|\; p$ depends on $\verb|include|\; q$ if, for some module name $m$, $p$ requires $m$ and $q$ provides $m$. +    \item An $\verb|include|\; p$ depends on a module $m$ if $p$ requires a module named $m$. +\end{itemize} +% +If the resulting graph has a cycle, this is an error. + +\paragraph{Computing unfilled requirements}  To compute unfilled requirements, +maintain two sets of module names: the provisions $P$ and the possible requirements $R'$.  For each declaration: + +\begin{itemize} +    \item $\verb|include|\; p$: union provisions with $P$ and requirements with $R'$. +    \item $\verb|module|\; m$: add $m$ to $P$ +    \item $\verb|signature|\; m$: add $m$ to $R'$ +\end{itemize} +% +The unfilled requirements $R=R'-P$. + +\paragraph{Computing the \I{Module} of declarations} +The \I{Module} of any declaration $m$ in a unit $p$ is simply +\verb|p(A -> HOLE:A, ...):m|, where the hole map is a map +from each unfilled requirement $n$ to \verb|HOLE:n|. + +\paragraph{Computing the hole mapping of includes}  In absence of mutual +recursion of includes, the DAG is acyclic with include-include edges.  Process includes +in this topological order, maintaining a mapping of provided modules $\Gamma$, accumulating +provisions of includes as we go along.  For each $\verb|include|; p$, the hole map is +simply the requirements of $p$, mapping $m$ to $\Gamma(m)$ if it is defined, and \verb|HOLE:m| otherwise. + +With mutual recursion, we have to use the regular tree unification algorithm described in the Backpack paper.  We omit it from here for now.  \newpage  \section{Shaping} @@ -106,21 +257,22 @@ syntactic sugar (e.g., a bare $m$ in a renaming is simply $m\; \verb|as|\; m$.)  \begin{figure}[htpb]  $$  \begin{array}{rcll} -\I{Shape} & ::= & \verb|provides:|\; m \; \verb|->|\; \I{Module}\; \verb|{|\, \I{Name} \verb|,|\, \ldots \, \verb|};| \ldots \\ -      &     & \verb|requires:| \; m \; \verb|->|\; \textcolor{white}{\I{Module}}\; \verb|{| \, \I{Name} \verb|,| \, \ldots \, \verb|}| \verb|;| \ldots \\ -\I{PkgKey} & ::= & p \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| \\ -\I{Module} & ::= & \I{PkgKey} \verb|:| m \\ +\I{Shape} & ::= & \verb|provides:|\; m \; \verb|->|\; \I{Module}\; \verb|{|\, \I{AvailInfo} \verb|,|\, \ldots \, \verb|};| \ldots \\ +      &     & \verb|requires:| \; m \; \verb|->|\; \textcolor{white}{\I{Module}}\; \verb|{| \, \I{AvailInfo} \verb|,| \, \ldots \, \verb|}| \verb|;| \ldots \\ +\I{AvailInfo} & ::= & \I{Name} & \mbox{Plain identifiers} \\ +          & |   & \I{Name} \, \verb|{| \, \I{Name}_0\verb|,| \, \ldots\verb|,| \, \I{Name}_n \, \verb|}| & \mbox{Type constructors} \\  \I{Name}   & ::= & \I{Module} \verb|.| \I{OccName} \\  \I{OccName} & & \mbox{Unqualified name in a namespace}  \end{array}  $$ -\caption{Semantic entities in Backpack} \label{fig:semantic} +\caption{Shaping} \label{fig:semantic}  \end{figure}  Shaping computes a \I{Shape}, whose form is described in Figure~\ref{fig:semantic}. -A shape describes what modules a package implements and exports (the \emph{provides}) -and what signatures a package needs to have filled in (the \emph{requires}).  Both -provisions and requires can be imported after a package is included. +A shape describes what modules a unit implements and exports (the \emph{provides}) +and what signatures a unit needs to have filled in (the \emph{requires}).  Both +provisions and requires are available for import by units which include this +unit.  We incrementally build a shape by starting with an empty  shape context and adding to it as follows: @@ -128,27 +280,16 @@ shape context and adding to it as follows:  \begin{enumerate}      \item Calculate the shape of a declaration, with respect to the          current shape context.  (e.g., by renaming a module/signature, -        or using the shape from an included package.) +        or using the shape from an included unit.)      \item Merge this shape into the shape context.  \end{enumerate} -The final shape context is the shape of the package as a whole. +The final shape context is the shape of the unit as a whole.  Optionally, we can also compute the renamed syntax trees of  modules and signatures. -%   (There is a slight -%   technical difficulty here, where to do shaping, we actually need an \texttt{AvailInfo}, -%   so we can resolve \texttt{T(..)} style imports.) - -%   One variation of shaping also computes the renamed version of a package, -%   i.e., where each identifier in the module and signature is replaced with -%   the original name (equivalently, we record the output of GHC's renaming -%   pass). This simplifies type checking because you no longer have to -%   recalculate the set of available names, which otherwise would be lost. -%   See more about this in the type checking section. - -In the description below, we'll assume \verb|THIS| is the package key -of the package being processed. +In the description below, we'll assume \verb|THIS| is the unit key +of the unit being processed.  \begin{aside}  \textbf{\textit{OccName} is implied by \textit{Name}.} @@ -171,11 +312,11 @@ Presently, however, such an \I{OccName} annotation would be redundant: it can be  \end{aside}  \begin{aside} -\textbf{Holes of a package are a mapping, not a set.} Why can't the \I{PkgKey} just record a -set of \I{Module}s, e.g. $\I{PkgKey}\;::= \; \I{SrcPkgKey} \; \verb|{| \; \I{Module} \; \verb|}|$?  Consider: +\textbf{Holes of a unit are a mapping, not a set.} Why can't the \I{UnitKey} just record a +set of \I{Module}s, e.g. $\I{UnitKey}\;::= \; \I{SrcUnitKey} \; \verb|{| \; \I{Module} \; \verb|}|$?  Consider:  \begin{verbatim} -    package p (A) requires (H1, H2) where +    unit p (A) requires (H1, H2) where          signature H1(T) where              data T          signature H2(T) where @@ -185,7 +326,7 @@ set of \I{Module}s, e.g. $\I{PkgKey}\;::= \; \I{SrcPkgKey} \; \verb|{| \; \I{Mod              import qualified H2              data A = A H1.T H2.T -    package q (A12, A21) where +    unit q (A12, A21) where          module I1(T) where              data T = T Int          module I2(T) where @@ -206,13 +347,13 @@ why not specify it as \verb|A -> { T, foo }|,  e.g., \verb|required: { ModName -> { OccName } }|?  Consider:  \begin{verbatim} -    package p () requires (A, B) where +    unit p () requires (A, B) where          signature A(T) where              data T          signature B(T) where              import T  \end{verbatim} -The requirements of this package specify that \verb|A.T| $=$ \verb|B.T|; this +The requirements of this unit specify that \verb|A.T| $=$ \verb|B.T|; this  can be expressed with \I{Name}s as  \begin{verbatim} @@ -229,7 +370,7 @@ are equal when their \I{Name}s are the same; however,  for values, it is less clear why we care.  But consider this example:  \begin{verbatim} -    package p (A) requires (H1, H2) where +    unit p (A) requires (H1, H2) where          signature H1(x) where              x :: Int          signature H2(x) where @@ -262,7 +403,7 @@ as might occur with aliasing:  The benefit of this restriction is that when a requirement is filled,  it is obvious that this is the only requirement that is filled: you won't  magically cause some other requirements to be filled.  The downside is -it's not possible to write a package which looks for an interface it is +it's not possible to write a unit which looks for an interface it is  looking for in one of $n$ names, accepting any name as an acceptable linkage.  If aliasing was allowed, we'd need a separate physical shaping context,  to make sure multiple mentions of the same hole were consistent. @@ -320,18 +461,18 @@ will actually fill a signature.  Consider this example, where  a signature is required and exposed:  \begin{verbatim} -    package a-sigs (A) requires (A) where -- *** +    unit a-sigs (A) requires (A) where -- ***          signature A where              data T -    package a-user (B) requires (A) where +    unit a-user (B) requires (A) where          signature A where              data T              x :: T          module B where              ... -    package p where +    unit p where          include a-sigs          include a-user  \end{verbatim} @@ -346,7 +487,7 @@ of \verb|a-sigs| and \verb|a-user|.  \begin{verbatim} -    package a-sigs (M as A) requires (H as A) where +    unit a-sigs (M as A) requires (H as A) where          signature H(T) where              data T          module M(T) where @@ -356,7 +497,7 @@ of \verb|a-sigs| and \verb|a-user|.  We rightly should error, since the provision is a module. And in this situation:  \begin{verbatim} -    package a-sigs (H as A) requires (H) where +    unit a-sigs (H as A) requires (H) where          signature H(T) where              data T  \end{verbatim} @@ -373,7 +514,7 @@ alternative to \verb|() requires (exposed A)|.  \newpage  \subsection{\texttt{include pkg (X) requires (Y)}} -We merge with the transformed shape of package \verb|pkg|, where this +We merge with the transformed shape of unit \verb|pkg|, where this  shape is transformed by:  \begin{itemize} @@ -389,27 +530,27 @@ If there are no thinnings/renamings, you just merge the  shape unchanged! Here is an example:  \begin{verbatim} -    package p (M) requires (H) where +    unit p (M) requires (H) where          signature H where              data T          module M where              import H              data S = S T -    package q (A) where +    unit q (A) where          module X where              data T = T          include p (M as A) requires (H as X)  \end{verbatim}  % -The shape of package \verb|p| is: +The shape of unit \verb|p| is:  \begin{verbatim}      requires: M -> { p(H -> HOLE:H):M.S }      provides: H -> { HOLE:H.T }  \end{verbatim}  % -Thus, when we process the \verb|include| in package \verb|q|, +Thus, when we process the \verb|include| in unit \verb|q|,  we make the following two changes: we rename the provisions,  and we rename the requirements, substituting \verb|HOLE|s.  The resulting shape to be merged in is: @@ -436,11 +577,11 @@ simple.  Merging combines two shapes, filling requirements with  implementations, unifying \I{Name}s, and unioning requirements; it is  the most complicated part of the shaping process. -The best way to think about merging is that we take two packages with +The best way to think about merging is that we take two units with  inputs (requirements) and outputs (provisions) and ``wiring'' them up so  that outputs feed into inputs.  In the absence  of mutual recursion, this wiring process is \emph{directed}: the provisions -of the first package feed into the requirements of the second package, +of the first unit feed into the requirements of the second unit,  but never vice versa.  (With mutual recursion, things can go in the opposite  direction as well.) @@ -479,17 +620,17 @@ pile of examples of it in action.  \subsubsection{A simple example} -In the following set of packages: +In the following set of units:  \begin{verbatim} -    package p(M) requires (A) where +    unit p(M) requires (A) where          signature A(T) where              data T          module M(T, S) where              import A(T)              data S = S T -    package q where +    unit q where          module A where              data T = T          include p @@ -575,33 +716,6 @@ they are also an instructive example for signature merging.  \Red{I'm pretty sure any choice of \textit{Name} is OK, since the  subsequent substitution will make it alpha-equivalent.} -%   \subsubsection{Leaky requirements} - -%   Both requirements and provisions can be imported, but requirements -%   are always available - -%\Red{How to relax this so hs-boot works} - -%\Red{Example of how loopy modules which rename requirements make it un-obvious whether or not -%a requirement is still required.  Same example works declaration level.} - -%\Red{package p (A) requires (A); the input output ports should be the same} - -% We figure out the requirements (because no loopy modules) -% -% package p (A, B) requires (B) -%   include base -%   sig B(T) -%       import Prelude(T) -% -% requirement example -% -% mental model: you start with an empty package, and you start accreting -% things on things, merging things together as you discover that this is -% the case. - -%\newpage -  \subsection{Export declarations}  If an explicit export declaration is given, the final shape is the @@ -610,7 +724,7 @@ appropriate renaming applied to provisions and requirements.  (Requirements  are implicitly passed through if they are not named.)  If no explicit export declaration is given, the final shape is  the computed shape, including only provisions which were defined -in the declarations of the package. +in the declarations of the unit.  \begin{aside}  \textbf{Signature visibility, and defaulting} @@ -626,11 +740,11 @@ for a signature.  If we use the same behavior as with modules,  a strange situation can occur:  \begin{verbatim} -    package p where -- S is visible +    unit p where -- S is visible          signature S where              x :: True -    package q where -- use defaulting +    unit q where -- use defaulting          include p          signature S where              y :: True @@ -638,7 +752,7 @@ a strange situation can occur:              import S              z = x && y      -- OK -    package r where +    unit r where          include q          module N where              import S @@ -656,12 +770,12 @@ the signature as ``in-line'', and all declarations are now provided  The latter interpretation avoids having to keep track of providedness  per declarations, and means that you can always express defaulting -behavior by writing an explicit provides declaration on the package. +behavior by writing an explicit provides declaration on the unit.  However, it has the odd behavior of making empty signatures semantically  meaningful:  \begin{verbatim} -package q where +unit q where      include p      signature S where  \end{verbatim} @@ -669,94 +783,12 @@ package q where  %  %   SPJ: This would be too complicated (if there's yet a third way) -\subsection{Package key} - -What is \verb|THIS|?  It is the package name, plus for every requirement \verb|M|, -a mapping \verb|M -> HOLE:M|.  Annoyingly, you don't know the full set of -requirements until the end of shaping, so you don't know the package key ahead of time; -however, it can be substituted at the end easily. -  \clearpage  \newpage -\section{Type constructor exports} - -In the previous section, we described the \I{Name}s of a -module as a flat namespace; but actually, there is one level of -hierarchy associated with type-constructors.  The type: - -\begin{verbatim} -    data A = B { foo :: Int } -\end{verbatim} -% -brings three \I{OccName}s into scope, \verb|A|, \verb|B| -and \verb|foo|, but the constructors and record selectors are -considered \emph{children} -of \verb|A|: in an import list, they can be implicitly brought -into scope with \verb|A(..)|, or individually brought into -scope with \verb|foo| or \verb|pattern B| (using the new \verb|PatternSynonyms| -extension).  Symmetrically, a module may export only \emph{some} -of the constructors/selectors of a type; it may not even -export the type itself! - -We \emph{absolutely} need this information to rename a module or -signature, which means that there is a little bit of extra information -we have to collect when shaping.  What is this information?  If we take -GHC's internal representation at face value, we have the more complex -semantic representation seen in Figure~\ref{fig:semantic2}: - -\begin{figure}[htpb] -$$ -\begin{array}{rcll} -\I{Shape} & ::= & \verb|provides:|\; m \; \verb|->|\; \I{Module}\; \verb|{|\, \I{AvailInfo} \verb|,|\, \ldots \, \verb|};| \ldots \\ -      &     & \verb|requires:| \; m \; \verb|->|\; \textcolor{white}{\I{Module}}\; \verb|{| \, \I{AvailInfo} \verb|,| \, \ldots \, \verb|}| \verb|;| \ldots \\ -\I{AvailInfo} & ::= & \I{Name} & \mbox{Plain identifiers} \\ -          & |   & \I{Name} \, \verb|{| \, \I{Name}_0\verb|,| \, \ldots\verb|,| \, \I{Name}_n \, \verb|}| & \mbox{Type constructors} \\ -\end{array} -$$ -\caption{Enriched semantic entities in Backpack} \label{fig:semantic2} -\end{figure} -% -For type constructors, the outer \I{Name} identifies the \emph{parent} -identifier, which may not necessarily be in scope (define this to be the \texttt{availName}); the inner list consists -of the children identifiers that are actually in scope.  If a wildcard -is written, all of the child identifiers are brought into scope. -In the following examples, we've ensured that -types and constructors are unambiguous, although in Haskell proper they -live in separate namespaces; we've also elided the \verb|THIS| package -key from the identifiers. +\subsection{Merging AvailInfos} -\begin{verbatim} -    module M(A(..)) where -        data A = B { foo :: Int } -    -- M.A{ M.A, M.B, M.foo } - -    module N(A) where -        data A = B { foo :: Int } -    -- N.A{ N.A } - -    module O(foo) where -        data A = B { foo :: Int } -    -- O.A{ O.foo } - -    module A where -        data T = S { bar :: Int } -    module B where -        data T = S { baz :: Bool } -    module C(bar, baz) where -        import A(bar) -        import B(baz) -    -- A.T{ A.bar }, B.T{ B.baz } -    -- NB: it would be illegal for the type constructors -    -- A.T and B.T to be both exported from C! -\end{verbatim} -% -Previously, we stated that we simply merged \I{Name}s based on their -\I{OccName}s.  We now must consider what it means to merge \I{AvailInfo}s. - -\subsection{Algorithm} - -Our merging algorithm takes two sets of \I{AvailInfo}s and merges them +We describe how to take two sets of \I{AvailInfo}s and merges them  into one set.  In the degenerate case where every \I{AvailInfo} is a  $Name$, this algorithm operates the same as the original algorithm.  Merging proceeds in two steps: unification and then simple union. @@ -784,8 +816,6 @@ takes the \I{AvailInfo} \verb|HOLE:A.T { HOLE:A.B, HOLE:A.foo }| to  on children \I{Name}s is \emph{only} carried out by substituting on the outer name;  we will never directly substitute children. -\subsection{Examples} -  Unfortunately, there are a number of tricky scenarios:  \paragraph{Merging when type constructors are not in scope} @@ -904,14 +934,14 @@ The technical difficulty is that we now need to unify a plain identifier  Consider this situation:  \begin{verbatim} -    package p where +    unit p where          signature H(A, foo, bar) where              data A              foo :: A -> Int              bar :: A -> Bool          module X(A, foo) where              import H -    package q where +    unit q where          include p          signature H(bar) where              data A = A { foo :: Int, bar :: Bool } @@ -926,7 +956,7 @@ information elsewhere.  If the wildcard is not allowed, here is another situation:  \begin{verbatim} -    package p where +    unit p where          -- define without record selectors          signature X1(A, foo) where              data A @@ -934,7 +964,7 @@ If the wildcard is not allowed, here is another situation:          module M1(A, foo) where              import X1 -    package q where +    unit q where          -- define with record selectors (X1s unify)          signature X1(A(..)) where              data A = A { foo :: Int, bar :: Bool } @@ -947,7 +977,7 @@ If the wildcard is not allowed, here is another situation:          signature Y2(bar) where              import X2 -    package r where +    unit r where          include p          include q @@ -1002,7 +1032,7 @@ $$  \end{figure}  In general terms, -type checking an indefinite package (a package with holes) involves +type checking an indefinite unit (a unit with holes) involves  calculating, for every module, a \I{ModIface} representing the  type/interface of the module in question (which is serialized  to disk).  The general form of these @@ -1014,7 +1044,7 @@ with the \I{Name}. (We will say more about this lookup process shortly.)  For example, given:  \begin{verbatim} -    package p where +    unit p where          signature H where              data T          module A(S, T) where @@ -1121,10 +1151,10 @@ declaration.)  associated with a \I{ModName} in scope, e.g. in this situation:  \begin{verbatim} -    package p where +    unit p where          signature S where              data A -    package q where +    unit q where          include p          signature S where              data B @@ -1143,13 +1173,13 @@ produce a warning for each deprecated signature.  extended Backpack to allow hiding signatures from import.  \begin{verbatim} -    package p requires (H) where -- H is hidden from import +    unit p requires (H) where -- H is hidden from import          module A where              instance Eq (a -> b) where -- orphan          signature H {-# DEPRECATED "Don't use me" #-} where              import A -    package q where +    unit q where          include p          signature H where              data T @@ -1180,7 +1210,7 @@ we have to type-check the \I{ModIface} with the following adjustments:            \verb|A -> HOLE:B|, \emph{unlike} the disjoint            substitutions applied by shaping.)      \item Perform a \I{Name} substitution as follows: for any name -          with a package key that is a $\verb|HOLE|$, +          with a unit key that is a $\verb|HOLE|$,            substitute with the recorded \I{Name} in the requirements of the shape.            Otherwise, look up the (unique) \I{ModIface} for the \I{Module},            and subsitute with the corresponding \I{Name} in the \I{mi\_exports}. @@ -1188,17 +1218,17 @@ we have to type-check the \I{ModIface} with the following adjustments:  \paragraph{Signatures}  For signatures, we have a \I{Module} of the form  $\verb|HOLE:|m$.  Unlike modules, there are multiple \I{ModIface}s associated with a hole. -We distinguish each separate \I{ModIface} by considering the full \I{PkgKey} +We distinguish each separate \I{ModIface} by considering the full \I{UnitKey}  it was defined in, e.g. \verb|p(A -> HOLE:C, B -> q():B)|; call this -the hole's \emph{defining package key}; the set of \I{ModIface}s for a hole -and their defining package keys can easily be calculated during shaping. +the hole's \emph{defining unit key}; the set of \I{ModIface}s for a hole +and their defining unit keys can easily be calculated during shaping.  To generate the \I{ModDetails} associated with a hole, we type-check each  \I{ModIface}, with the following adjustments:  \begin{enumerate}      \item Perform a \I{Module} substitution according to the instantiation -        of the defining package key.  (NB: This may rename the hole itself!) +        of the defining unit key.  (NB: This may rename the hole itself!)      \item Perform a \I{Name} substitution as follows, in the same manner          as would be done in the case of modules.      \item When these \I{ModDetails} are merged into the EPT, some merging @@ -1213,14 +1243,14 @@ sure that we can always find out the correct \I{Name} to substitute to.  This isn't obviously true, consider:  \begin{verbatim} -    package p where +    unit p where          signature S(foo) where              data T              foo :: T          module M(bar) where              import S              bar = foo -    package q where +    unit q where          module A(T(..)) where              data T = T              foo = T @@ -1261,13 +1291,13 @@ exported by the signature}.  \textbf{Special case export rule for record selectors.}  Here is the analogous case for  record selectors:  \begin{verbatim} -    package p where +    unit p where          signature S(foo) where              data T = T { foo :: Int }          module M(bar) where              import S              bar = foo -    package q where +    unit q where          module A(T(..)) where              data T = T { foo :: Int }          module S(foo) where @@ -1283,387 +1313,4 @@ for \verb|T|, because the export of \verb|foo| is an \I{AvailTC} which  does mention \verb|T|.  \end{aside} -\section{Cabal} - -Design goals: - -\begin{itemize} -    \item Backpack files are user-written.  (In an earlier design, we had -        the idea that Cabal would generate Backpack files; however, we've -        since made Backpack files more user-friendly and reasonable to -        write by hand since they are reasonably designed for user development.) - -    \item Backpack files are optional.  A package can add a Backpack file -        to replace some (but not all) of the fields in a Cabal description. - -    \item Backpack files can be compiled without GHC, if it is self-contained -        with respect to all the indefinite packages it includes.  To include -        an indefinite package which is not locally defined but installed -        to the package database, you must use Cabal. - -    \item Backpack packages are \emph{unversioned}; you never see a version -        number in a Backpack package. -\end{itemize} - -\subsection{Versioning} - -In this section, we discuss how version numbers from Cabal factor into -Backpack.  In particular, versioning impacts the specification of \I{PkgKey}s. -See \url{https://ghc.haskell.org/trac/ghc/wiki/Commentary/Packages/Concepts} -for more background, and \url{https://ghc.haskell.org/trac/ghc/ticket/10566} -for implementation progress. - -\paragraph{Design goals} -Here are some design goals for versioning: - -\begin{enumerate} -    \item GHC doesn't know anything about version numbers: this is Cabal -    specific information.  There are a few cases in GHC today where -    this design goal is already in force: pre-7.10, linker -    symbols were prefixed using a package name and version, but GHC -    simply represented this internally as an opaque string.  And in -    today's GHC, package qualified imports only allow qualification by -    package name, and not by version. - -    \item Cabal doesn't know anything about package keys: GHC is -    responsible for calculating the package key of a package.  This is -    because GHC must be able to maintain a mapping between the unhashed -    and hashed versions of a key, and the hashing process must be -    deterministic.  If Cabal needs to generate a new package key, it -    must do so through GHC.  (This is NOT how this is happening in GHC 7.10.) - -    \item Our design should, in principle, support mutual recursion -    between packages, even if the implementation does not at the moment. - -    \item GHC should not lose functionality, i.e. it should still be -    possible to link together the same package with different versions; -    however, Cabal may arrange for this to not occur by default unless a -    user explicitly asks for it. - -    \item A Cabal source package identifier (e.g. \verb|foo-0.1|), which is -    a unit of distribution, is a distinct -    concept from a Backpack package (which we have referred to previously -    in the document as a mere package name), because a single Cabal file may -    ship a Backpack file that defines multiple internal packages. -\end{enumerate} - -These goals imply a few things: - -\begin{enumerate} -    \item Backpack files should not contain any version numbers, -    and should be agnostic to versioning.  Backpack files are parsed -    and interpreted by GHC, and version numbers are Cabal's provenance! - -    \item As a corollary, if you want to refer to a specific version of -    a package from a Backpack file, this has to be done by giving the -    alternate version a different package name, e.g. \verb|network-old|. -    (It is tempting to want to simply say that this means we should allow -    version numbers into GHC, but consider more complicated situations where -    you want to refer to two instances of \verb|foo|, but one compiled -    with \verb|bar-0.1| and the other compiled with \verb|bar-0.2|, then -    your description of which package to pick up becomes considerably more -    complicated than just a package name and version.  Better to defer -    this decision to Cabal.) - -    \item Package keys must record versioning information, otherwise -    we can't link together two different versions of the same package. -    This is due to our backwards-compatibility requirement. -\end{enumerate} - -\paragraph{Package keys} - -To allow linking together multiple versions of -the same package, we must record versioning information into the -\I{PkgKey}.  To do this, we include in the \I{PkgKey} a \I{VersionHash}. -Cabal is responsible for defining \I{VersionHash} and may do whatever it -wants, but we give two possible -definitions in Figure~\ref{fig:version}. - -\begin{figure}[htpb] -$$ -\begin{array}{rcll} -p && \mbox{Package name} \\ -\I{SrcPkgId} && \mbox{Cabal source package ID, e.g. } \verb|foo-0.1| \\[1em] -\I{VersionHash} & ::= & \I{SrcPkgId}\; \verb|{| \, p_0 \; \verb|->| \; \I{VersionHash}_0 \verb|,|\, \ldots\, p_n \; \verb|->| \; \I{VersionHash}_n \, \verb|}| & \mbox{Full version hash} \\ -\I{VersionHash'} & ::= & \I{SrcPkgId} \; \verb|{| \, \I{SrcPkgId}_0 \verb|,|\, \ldots\, \verb|,|\, \I{SrcPkgId}_n \, \verb|}| & \mbox{Simplified version hash} \\ -\I{PkgKey} & ::= & p\verb|-|\I{VersionHash} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| \\ -\end{array} -$$ -\caption{Version hash} \label{fig:version} -\end{figure} - -The difference between a full version hash and a simplified version hash -is what linking restrictions they impose on programs: the full version -hash supports linking arbitrary versions of packages with arbitrary -other versions, whereas the simplified hash has a Cabal-style requirement -that there be some globally consistent mapping from package name to version. - -The full version hash has some subtleties: - -\begin{itemize} -    \item Each sub-\I{VersionHash} recorded in a \I{VersionHash} is -    identified by a package name, which may not necessarily equal the -    package name embedded in the \I{SrcPkgId} in the \I{VersionHash}.  This permits us to calculate -    a \I{VersionHash} for a package like: -\begin{verbatim} -    package p where -        include network (Network) -        include network-old (Network as Network.Old) -        ... -\end{verbatim} -    if we want \verb|network| to refer to \verb|network-2.0| and -    \verb|network-old| to refer to \verb|network-1.0|.  Without -    identifying each subdependency by package name, we could not -    distinguish the recorded \I{VersionHash}s for \verb|network-old| and \verb|network|. - -    \item If a package name is locally specified in a Backpack -    file, it does not occur in the \I{VersionHash}: \I{VersionHash} -    strictly operates over Cabal's notion of package identity. - -    \item You might wonder why we need a \I{VersionHash} as well as a \I{PkgKey}; -    why not just specify \I{PkgKey} as $\I{SrcPkgId} \; \verb|{| \, p \; \verb|->| \; \I{PkgKey} \verb|,|\, \ldots\, \verb|}| \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)|$?  However, there is ``too much'' information in the \I{PkgKey}, causing the scheme to not work with mutual recursion: - -\begin{verbatim} -    package p where -        module M -        include q -\end{verbatim} - -    To specify the package key of \verb|p|, we need the package key of \verb|q|; to -    specify the package key of \verb|q|, we need the module identifier of \verb|M| -    which contains the package key of \verb|p|: circularity!  (The simplified -    version hash does not have this problem as it is not recursive.) -\end{itemize} - -\subsection{Distribution and installation} - -How are Backpack files installed so other people can use them? - -\paragraph{Challenges} - -\begin{itemize} -    \item Prior to Backpack, when a Cabal package (e.g. unit of -    distribution) was compiled and installed would result in a single -    entry in the installed package database.  With Backpack, compiling a -    package could result in multiple entries in the installed package -    database: (1) for indefinite packages which were instantiated, and -    (2) when there are multiple packages in a Backpack file. - -    \item Relatedly, when we include an indefinite package, we may need -    to rebuild it with our specific dependencies.  This makes compiling -    a Backpack file much more similar to \verb|cabal-install| than to -    \verb|Cabal|; however, the dependency structure is something that -    only GHC can calculate. -\end{itemize} - -\paragraph{Why distribute Backpack files?} - -Backpack files offer a convenient mechanism of defining multiple packages -with inline syntax for modules.  Further syntax extensions could allow us -to give people a MixML style of programming in Haskell. - -A Backpack file is not a replacement for a Cabal file: -\verb|exposed-modules| and similar fields are not necessary but we still -need a \verb|build-depends| to provide version bounds (until Backpack -can also be used to handle version dependency.)  This makes it easy -for cabal-install to do its job. - -This means we distinguish a package name $p$ which occurs in a Backpack -file and a Cabal \I{SrcPkgId}: Cabal creates a mapping between these. -So to refer to an old version of a package, you would refer to it with -a different name $q$, and then tell Cabal about the version bound constraints -you want. - -\paragraph{Definite packages} - -Suppose we have written a Backpack file that looks like: - -\begin{verbatim} -    package helper where -        include base -        module P -    package mypackage where -        include containers -        include helper -        module Q -\end{verbatim} - -and have written a Cabal file for it intending to distribute it on -Hackage under the name \verb|mypackage-0.1|.  In the end, we will end -up with the following entries in our installed package database: - -\begin{verbatim} -    name: "mypackage" -    id: mypackage-1.0-IPID -    version: 1.0 -    key: XXX -    # e.g. mypackage-AAA {} -    version-hash: AAA -    # e.g. mypackage-1.0 { base -> base-4.7 , containers -> containers-0.5 } -    depends: mypackage$helper-1.0-IPID, base-4.7-IPID -    --- -    name: "mypackage$helper" -    version: 1.0 -    id: mypackage$helper-1.0-IPID -    key: YYY -    # e.g. helper-AAA {} -    version-hash: AAA -    depends: containers-0.5-IPID -\end{verbatim} -% -Things to note: - -\begin{enumerate} -    \item The package in the Backpack file with the same name as the Cabal -    package has special status: this is the package which is registered -    to the installed package database under the same name.  All other packages -    are \emph{qualified} under the Cabal package name, e.g. \verb|mypackage$helper|. - -    \item The version hash, as described previously, is computed once for all -    packages in the Backpack file, and the \verb|version| and \verb|version-hash| -    are the same across all of them. - -    \item The key varies between the packages, since the $p$ parameter is different -    in each one. - -    \item The installed package ID incorporates information about the package name. - -    \item Dependencies are only recorded directly \verb|include|d packages in a Backpack package.  (GHC has to communicate to Cabal what the includes of every subpackage are.) -\end{enumerate} -% -A more complex example with instantiated packages looks similar: - -\begin{verbatim} -    package helper where -        signature Data.Map -        module P -    package mypackage where -        include containers (Data.Map) -        include helper -        module Q -\end{verbatim} -% -however, now the instantiation is recorded in the database as well. - -\begin{verbatim} -    name: "mypackage" -    id: mypackage-1.0-IPID -    version: 1.0 -    key: XXX -    # e.g. mypackage-AAA {} -    version-hash: AAA -    # e.g. mypackage-1.0 { containers -> containers-0.5 } -    depends: mypackage$helper-1.0-IPID, containers-0.5-IPID -    --- -    name: "mypackage$helper" -    version: 1.0 -    id: mypackage$helper-1.0-IPID -    key: YYY -    # e.g. helper-AAA { Data.Map -> containers-KEY:Data.Map } -    version-hash: AAA -    depends: (none) -    instantiated-with: -        Data.Map -> Data.Map@containers-0.5-IPID -\end{verbatim} -% -More remarks: - -\begin{enumerate} -    \item Cabal's recorded \verb|instantiated-with| records installed -    package IDs, so that the used implementation is uniquely determined. - -    \item Conversely, \verb|depends| does NOT record non-textual dependencies -    such as instantiated holes.  \Red{is this necessary} - -    \item IPID includes information about how holes were instantiated. -\end{enumerate} - -\paragraph{GHC to Cabal} - -When GHC compiles a Backpack file, it is the only entity which knows -about the subpackages of a package.  In order to make sure they are -all correctly installed, GHC has to communicate back some meta-data to -Cabal: for each package, - -\begin{itemize} -    \item The (computed) package keys -    \item The dependencies -    \item The instantiation -\end{itemize} - -I guess we have to define some format to do this.  GHC can't directly -write to the package database, because it doesn't know how to write in -the Cabal-specific portion of the information. - -\Red{This is clunky, is there a way to eliminate this?  It's not possible -for Cabal out of the box to handle this, since it assumes no module name conflicts -but there definitely may be some in Backpack.} - -\paragraph{Indefinite package database} - -The indefinite package database records indefinite packages (with holes) -that have been typechecked.  An indefinite package is associated with a -(possibly unlimited) number of instantiated versions of the package, -which have been fully instantiated and compiled. - -An indefinite package is a new type of entry in the existing installed package -database. \Red{or maybe another entry in a different database}  Here are the important things to keep track of for an -indefinite package: - -\begin{itemize} -    \item Where do the (indefinite) interface files live? (NB: there are no -    libraries since we haven't compiled the package.) -    \item Where does the shape information live?  (We could put it with the -    interface files, it's a pretty similar binary file.) -    \item Where does the source live, so we can recompile it when we instantiate it. -    (If it's empty, we'll have to refetch it from Hackage or something). -    \item Where does the Cabal configuration (result of running -    \verb|cabal configure|) live, so that we build it with the same dependencies, flags, etc. -\end{itemize} - -Associated with an indefinite package is some number of instantiated versions -of this package.  These are identified by package key (the installed package ID -is the same) and are morally ``sub''-packages of the indefinite package, -although they get their own entries.  \Red{Alternate plan: put them together. -Distinction between Cabal package and Backpack package.} - -What makes installed indefinite packages difficult is that GHC may need to -recompile them on the fly depending on an include. - -\paragraph{The plan} - -\Red{To be worked out} - -%   Description: cabal-install only computes package-name edge labeling, -%   then attempts to compile.  If the package is indefinite, Cabal -%   type checks and installs the interface files, source code and -%   configuration information (TODO: this is something GHC has -%   to understand\ldots) to the package database.  If the package -%   is definite, Cabal goes and ahead and builds it.  During compilation, -%   when processing an include GHC may notice that a package depends on an -%   instantiation of an indefinite package that is not compiled; GHC -%   goes ahead and builds it using the saved information. - -%   Con: We need to install indefinite packages, including all of -%   the source and information we'd need to actually build it -%   (the result of a configure?  Only Cabal really knows how -%   to understand that; so it should be like a Cabal configured -%   package?  If GHC calls in that's annoying.)  It would be nice -%   if this was done cabal-install style, but there are many downside -%   to deferring all of this processing to cabal-install. - - -%   Model: GHC compiles everything itself -%       GHC needs to report multiple distinct compile products to Cabal -%       GHC needs to ``reset'' the EPS (but only for type checking) - -%   Model: Cabal pre-compiles dependencies, and then GHC handles the rest -%       Trouble: Cabal needs to be able to read the bkp file to find out what the instantiation is -%           Fix: Have a GHC mode to output this information. Also, if Cabal is doing an old style it already knows. -%       Trouble: seems wrong for normal Cabal to isntall it -%           Think about it like a CACHE - - - -  \end{document} % chktex 16 | 
