Diffstat (limited to 'compiler')
-rw-r--r-- | compiler/parser/ApiAnnotation.hs | 114
-rw-r--r-- | compiler/parser/Lexer.x          |  15
-rw-r--r-- | compiler/parser/Parser.y         |  14
3 files changed, 88 insertions, 55 deletions
diff --git a/compiler/parser/ApiAnnotation.hs b/compiler/parser/ApiAnnotation.hs
index ac784bcea4..b20f23f066 100644
--- a/compiler/parser/ApiAnnotation.hs
+++ b/compiler/parser/ApiAnnotation.hs
@@ -23,21 +23,40 @@ import Data.Data
 {-
 Note [Api annotations]
 ~~~~~~~~~~~~~~~~~~~~~~
-In order to do source to source conversions using the GHC API, the
-locations of all elements of the original source needs to be tracked.
-This includes keywords such as 'let' / 'in' / 'do' etc as well as
-punctuation such as commas and braces, and also comments.
+Given a parse tree of a Haskell module, how can we reconstruct
+the original Haskell source code, retaining all whitespace and
+source code comments? We need to track the locations of all
+elements from the original source: this includes keywords such as
+'let' / 'in' / 'do' etc as well as punctuation such as commas and
+braces, and also comments. We collectively refer to this
+metadata as the "API annotations".
 
-These are captured in a structure separate from the parse tree, and
-returned in the pm_annotations field of the ParsedModule type.
+Rather than annotate the resulting parse tree with these locations
+directly (this would be a major change to some fairly core data
+structures in GHC), we instead capture locations for these elements in a
+structure separate from the parse tree, and returned in the
+pm_annotations field of the ParsedModule type.
 
-The non-comment annotations are stored indexed to the SrcSpan of the
-AST element containing them, together with a AnnKeywordId value
-identifying the specific keyword being captured.
+The full ApiAnns type is
+
+> type ApiAnns = ( Map.Map ApiAnnKey [SrcSpan] -- non-comments
+>                , Map.Map SrcSpan [Located AnnotationComment]) -- comments
+
+NON-COMMENT ELEMENTS
+
+Intuitively, every AST element directly contains a bag of keywords
+(keywords can show up more than once in a node: a semicolon i.e. newline
+can show up multiple times before the next AST element), each of which
+needs to be associated with its location in the original source code.
+
+Consequently, the structure that records non-comment elements is logically
+a two level map, from the SrcSpan of the AST element containing it, to
+a map from keywords ('AnnKeyWord') to all locations of the keyword directly
+in the AST element:
 
 > type ApiAnnKey = (SrcSpan,AnnKeywordId)
 >
-> Map.Map ApiAnnKey SrcSpan
+> Map.Map ApiAnnKey [SrcSpan]
 
 So
 
@@ -50,35 +69,44 @@ would result in the AST element
 and the annotations
 
   (span,AnnLet) having the location of the 'let' keyword
+  (span,AnnEqual) having the location of the '=' sign
   (span,AnnIn) having the location of the 'in' keyword
 
+For any given element in the AST, there is only a set number of
+keywords that are applicable for it (e.g., you'll never see an
+'import' keyword associated with a let-binding.) The set of allowed
+keywords is documented in a comment associated with the constructor
+of a given AST element, although the ground truth is in Parser
+and RdrHsSyn (which actually add the annotations; see #13012).
 
-The comments are indexed to the SrcSpan of the lowest AST element
-enclosing them
-
-> Map.Map SrcSpan [Located AnnotationComment]
-
-So the full ApiAnns type is
-
-> type ApiAnns = ( Map.Map ApiAnnKey SrcSpan
->                , Map.Map SrcSpan [Located AnnotationComment])
+COMMENT ELEMENTS
 
+Every comment is associated with a *located* AnnotationComment.
+We associate comments with the lowest (most specific) AST element
+enclosing them:
 
-This is done in the lexer / parser as follows.
+> Map.Map SrcSpan [Located AnnotationComment]
 
+PARSER STATE
 
-The PState variable in the lexer has the following variables added
+There are three fields in PState (the parser state) which play a role
+with annotations.
 
 > annotations :: [(ApiAnnKey,[SrcSpan])],
 > comment_q :: [Located AnnotationComment],
 > annotations_comments :: [(SrcSpan,[Located AnnotationComment])]
 
-The first and last store the values that end up in the ApiAnns value
-at the end via Map.fromList
+The 'annotations' and 'annotations_comments' fields are simple: they simply
+accumulate annotations that will end up in 'ApiAnns' at the end
+(after they are passed to Map.fromList).
 
-The comment_q captures comments as they are seen in the token stream,
+The 'comment_q' field captures comments as they are seen in the token stream,
 so that when they are ready to be allocated via the parser they are
-available.
+available (at the time we lex a comment, we don't know what the enclosing
+AST node of it is, so we can't associate it with a SrcSpan in
+annotations_comments).
+
+PARSER EMISSION OF ANNOTATIONS
 
 The parser interacts with the lexer using the function
 
@@ -88,35 +116,11 @@ which takes the AST element SrcSpan, the annotation keyword and the
 target SrcSpan.
 
 This adds the annotation to the `annotations` field of `PState` and
-transfers any comments in `comment_q` to the `annotations_comments`
-field.
-
-Parser
-------
-
-The parser implements a number of helper types and methods for the
-capture of annotations
-
-> type AddAnn = (SrcSpan -> P ())
->
-> mj :: AnnKeywordId -> Located e -> (SrcSpan -> P ())
-> mj a l = (\s -> addAnnotation s a (gl l))
-
-AddAnn represents the addition of an annotation a to a provided
-SrcSpan, and `mj` constructs an AddAnn value.
-
-> ams :: Located a -> [AddAnn] -> P (Located a)
-> ams a@(L l _) bs = (mapM_ (\a -> a l) bs) >> return a
-
-So the production in Parser.y for the HsLet AST element is
-
-        | 'let' binds 'in' exp {% ams (sLL $1 $> $ HsLet (snd $ unLoc $2) $4)
-                                      (mj AnnLet $1:mj AnnIn $3
-                                       :(fst $ unLoc $2)) }
-
-This adds an AnnLet annotation for 'let', an AnnIn for 'in', as well
-as any annotations that may arise in the binds. This will include open
-and closing braces if they are used to delimit the let expressions.
+transfers any comments in `comment_q` WHICH ARE ENCLOSED by
+the SrcSpan of this element to the `annotations_comments`
+field. (Comments which are outside of this annotation are deferred
+until later. 'allocateComments' in 'Lexer' is responsible for
+making sure we only attach comments that actually fit in the 'SrcSpan'.)
 
 The wiki page describing this feature is
 https://ghc.haskell.org/trac/ghc/wiki/ApiAnnotations
@@ -124,9 +128,11 @@ https://ghc.haskell.org/trac/ghc/wiki/ApiAnnotations
 -}
 -- ---------------------------------------------------------------------
 
+-- If you update this, update the Note [Api annotations] above
 type ApiAnns = ( Map.Map ApiAnnKey [SrcSpan]
                , Map.Map SrcSpan [Located AnnotationComment])
 
+-- If you update this, update the Note [Api annotations] above
 type ApiAnnKey = (SrcSpan,AnnKeywordId)
 
 
diff --git a/compiler/parser/Lexer.x b/compiler/parser/Lexer.x
index 14a7cb2ffa..6c4abe047a 100644
--- a/compiler/parser/Lexer.x
+++ b/compiler/parser/Lexer.x
@@ -2797,6 +2797,21 @@ clean_pragma prag = canon_ws (map toLower (unprefix prag))
 -- | Encapsulated call to addAnnotation, requiring only the SrcSpan of
 -- the AST construct the annotation belongs to; together with the
 -- AnnKeywordId, this is is the key of the annotation map
+--
+-- This type is useful for places in the parser where it is not yet
+-- known what SrcSpan an annotation should be added to. The most
+-- common situation is when we are parsing a list: the annotations
+-- need to be associated with the AST element that *contains* the
+-- list, not the list itself. 'AddAnn' lets us defer adding the
+-- annotations until we finish parsing the list and are now parsing
+-- the enclosing element; we then apply the 'AddAnn' to associate
+-- the annotations. Another common situation is where a common fragment of
+-- the AST has been factored out but there is no separate AST node for
+-- this fragment (this occurs in class and data declarations). In this
+-- case, the annotation belongs to the parent data declaration.
+--
+-- The usual way an 'AddAnn' is created is using the 'mj' ("make jump")
+-- function, and then it can be discharged using the 'ams' function.
 type AddAnn = SrcSpan -> P ()
 
 addAnnotation :: SrcSpan -- SrcSpan of enclosing AST construct
diff --git a/compiler/parser/Parser.y b/compiler/parser/Parser.y
index 9fe8e01998..112c4a9b44 100644
--- a/compiler/parser/Parser.y
+++ b/compiler/parser/Parser.y
@@ -3555,7 +3555,19 @@ am a (b,s) = do
   addAnnotation l b (gl s)
   return av
 
--- |Add a list of AddAnns to the given AST element
+-- | Add a list of AddAnns to the given AST element. For example,
+-- the parsing rule for @let@ looks like:
+--
+-- @
+--     | 'let' binds 'in' exp {% ams (sLL $1 $> $ HsLet (snd $ unLoc $2) $4)
+--                                   (mj AnnLet $1:mj AnnIn $3
+--                                    :(fst $ unLoc $2)) }
+-- @
+--
+-- This adds an AnnLet annotation for @let@, an AnnIn for @in@, as well
+-- as any annotations that may arise in the binds. This will include open
+-- and closing braces if they are used to delimit the let expressions.
+--
 ams :: Located a -> [AddAnn] -> P (Located a)
 ams a@(L l _) bs = addAnnsAt l bs >> return a
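The updated Note describes the non-comment half of ApiAnns as a map keyed by the SrcSpan of the enclosing AST element together with a keyword, holding every location of that keyword inside the element. The following is a minimal Haskell sketch of that shape, not GHC code. `Span`, `Keyword`, `buildAnns`, `keywordLocs` and `letExample` are hypothetical names; `Span` is a simplified single-line stand-in for SrcSpan (start and end column, end exclusive), and `Keyword` only mirrors the three AnnKeywordId constructors used in the Note's let example. The expression `let x = 1 in 2 * x` is chosen here for illustration.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-in for GHC's SrcSpan: a start and end column on a
-- single line, end exclusive. The real SrcSpan also records file and lines.
data Span = Span Int Int
  deriving (Eq, Ord, Show)

-- Toy keyword type mirroring three AnnKeywordId constructors.
data Keyword = AnnLet | AnnEqual | AnnIn
  deriving (Eq, Ord, Show)

-- Key of the non-comment map: (span of enclosing AST element, keyword).
type AnnKey = (Span, Keyword)

-- Toy version of the non-comment half of ApiAnns.
type Anns = Map.Map AnnKey [Span]

-- Turn the accumulated (key, locations) pairs into the final map.
-- fromListWith (++) merges locations when a key occurs more than once.
buildAnns :: [(AnnKey, [Span])] -> Anns
buildAnns = Map.fromListWith (++)

-- Every location of a keyword directly inside the given AST element.
keywordLocs :: Anns -> Span -> Keyword -> [Span]
keywordLocs anns elemSpan kw = Map.findWithDefault [] (elemSpan, kw) anns

-- Annotations for the expression  let x = 1 in 2 * x  (columns 0..17).
letExample :: Anns
letExample = buildAnns
  [ ((whole, AnnLet),   [Span 0 3])    -- "let"
  , ((whole, AnnEqual), [Span 6 7])    -- "="
  , ((whole, AnnIn),    [Span 10 12])  -- "in"
  ]
  where
    whole = Span 0 18                  -- the whole let expression
```

Building with `Map.fromListWith (++)` rather than `Map.fromList` is a choice made for this sketch: if the same (span, keyword) key is emitted twice, as can happen with repeated semicolons, the locations are merged instead of one entry replacing the other. The diff itself only says the accumulated list is passed to Map.fromList.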
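The Note explains that comments wait in `comment_q` until an element whose SrcSpan encloses them is annotated, at which point they move to `annotations_comments`; comments outside that span are deferred. A minimal sketch of that policy follows, assuming the parser annotates inner elements before the elements that contain them (a bottom-up LR parser such as one generated by Happy reduces inner productions first), which is what makes the lowest enclosing element win. `allocateCommentsSketch` and `flushComments` are hypothetical and are not GHC's `allocateComments`, whose signature the diff does not show; comments are plain strings and spans are single-line column ranges.

```haskell
import Data.List (partition)
import qualified Data.Map as Map

-- Hypothetical single-line span: start and end column, end exclusive.
data Span = Span Int Int
  deriving (Eq, Ord, Show)

-- A value tagged with its source span (a toy version of GHC's Located).
data Located a = L Span a
  deriving (Eq, Show)

-- Comments are plain strings in this sketch.
type Comment = Located String

-- Toy version of the annotations_comments accumulator.
type CommentMap = Map.Map Span [Comment]

-- True when the outer span fully contains the inner one.
encloses :: Span -> Span -> Bool
encloses (Span s1 e1) (Span s2 e2) = s1 <= s2 && e2 <= e1

-- Split the pending queue into comments enclosed by the element
-- (to attach now) and the rest (to keep queued for a later element).
allocateCommentsSketch :: Span -> [Comment] -> ([Comment], [Comment])
allocateCommentsSketch elemSpan = partition (\(L s _) -> elemSpan `encloses` s)

-- Called when an AST element with the given span is annotated:
-- move its enclosed comments from the queue into the comment map.
flushComments :: Span -> ([Comment], CommentMap) -> ([Comment], CommentMap)
flushComments elemSpan (queue, cmap)
  | null attached = (rest, cmap)
  | otherwise     = (rest, Map.insertWith (++) elemSpan attached cmap)
  where
    (attached, rest) = allocateCommentsSketch elemSpan queue
```

For example, with a queue holding a comment at columns 2 to 8 and another at 20 to 30, flushing an inner element spanning 0 to 10 attaches only the first comment; flushing the enclosing element spanning 0 to 40 afterwards attaches the second and empties the queue.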
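The `AddAnn` comment added to Lexer.x and the `ams` example added to Parser.y describe deferring annotations until the span of the enclosing element is known. The sketch below models this with a toy state monad standing in for `P` that only accumulates `(key, locations)` pairs, like the `annotations` field of PState. The bodies of `mj` and `ams` follow the definitions quoted in the Note text this commit removes (with `getLoc` in place of `gl`); `P`, `Span`, `Located`, `Keyword`, `runAnns` and `letRule` are hypothetical simplifications. `letRule` imitates the `let` production for `let { x = 1 } in x`: the brace annotations are built while the bindings are parsed, before the let expression's span exists, and `ams` discharges them together with AnnLet and AnnIn.

```haskell
import qualified Data.Map as Map

-- Hypothetical single-line span: start and end column, end exclusive.
data Span = Span Int Int
  deriving (Eq, Ord, Show)

-- A value tagged with its source span (toy version of GHC's Located).
data Located a = L Span a
  deriving Show

-- Toy keyword type mirroring a few AnnKeywordId constructors.
data Keyword = AnnLet | AnnIn | AnnOpenC | AnnCloseC
  deriving (Eq, Ord, Show)

-- Accumulated annotations, standing in for PState's 'annotations' field.
type AnnList = [((Span, Keyword), [Span])]

-- Toy parser monad: a state monad over the annotation list only.
newtype P a = P { runP :: AnnList -> (a, AnnList) }

instance Functor P where
  fmap f (P g) = P $ \s -> let (a, s') = g s in (f a, s')

instance Applicative P where
  pure a = P $ \s -> (a, s)
  P pf <*> P pa = P $ \s ->
    let (f, s1) = pf s
        (a, s2) = pa s1
    in (f a, s2)

instance Monad P where
  P g >>= k = P $ \s -> let (a, s1) = g s in runP (k a) s1

getLoc :: Located a -> Span
getLoc (L s _) = s

-- Record that keyword kw occurs at loc inside the element spanning elemSpan.
addAnnotation :: Span -> Keyword -> Span -> P ()
addAnnotation elemSpan kw loc = P $ \s -> ((), ((elemSpan, kw), [loc]) : s)

-- An annotation still waiting for the span of its enclosing element.
type AddAnn = Span -> P ()

-- "make jump": keyword plus token location; enclosing span supplied later.
mj :: Keyword -> Located e -> AddAnn
mj kw tok = \elemSpan -> addAnnotation elemSpan kw (getLoc tok)

-- Discharge the deferred annotations against the element's own span.
ams :: Located a -> [AddAnn] -> P (Located a)
ams a@(L l _) bs = mapM_ ($ l) bs >> return a

-- Run a parse and build the final map from the accumulated list.
runAnns :: P a -> (a, Map.Map (Span, Keyword) [Span])
runAnns p = let (a, anns) = runP p [] in (a, Map.fromList anns)

-- Hypothetical reduction of the let rule for:  let { x = 1 } in x
letRule :: P (Located String)
letRule = ams expr (mj AnnLet letTok : mj AnnIn inTok : bindAnns)
  where
    letTok   = L (Span 0 3) "let"
    inTok    = L (Span 14 16) "in"
    -- Built while parsing the bindings, before the let's span is known.
    bindAnns = [mj AnnOpenC (L (Span 4 5) "{"), mj AnnCloseC (L (Span 12 13) "}")]
    expr     = L (Span 0 18) "HsLet ..."
```

Running `runAnns letRule` yields four entries, all keyed by the let expression's span (columns 0 to 18), including the two brace annotations that were created without knowing that span.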