Ticket #1886 (closed bug: fixed)

Opened 2 years ago

Last modified 11 months ago

GHC API should preserve and provide access to comments

Reported by: claus
Owned by:
Component: GHC API
Version: 6.9
Keywords: GHC API, comments, program transformation, layout
Cc: j.waldmann
Operating System: Unknown/Multiple
Test Case:
Architecture: Unknown/Multiple
Type of failure:

Description

one class of applications of the GHC API consists of program transformations (refactoring, source to source optimisation, partial evaluation, ..) and code layouters (pretty-print, 2html, syntax-colouring, ..). but, even ignoring layout, parsing and pretty-printing with the GHC API does not currently preserve the source (nor does it generate syntactically valid code..).

consider this simple test: we want to parse a module, then pretty-print it (we might want to adjust the layout, or switch between layout and explicit braces). applying the attached code to itself gives this result:

$ /cygdrive/c/fptools/ghc/compiler/stage2/ghc-inplace -package ghc -e main API_Layout.hs
module API where
import DynFlags
import GHC
import PprTyThing
import System.Process
import System.IO
import Outputable
import Data.Maybe
instance Num () where
    []
    []
    { fromInteger = undefined }
mode = CompManager
compileToCoreFlag = False
writer >| cmd = runInteractiveCommand cmd >>= \ (i, o, e, p) -> writer i
cmd |> reader = runInteractiveCommand cmd >>= \ (i, o, e, p) -> reader o
ghcDir = "c:/fptools/ghc/compiler/stage2/ghc-inplace --print-libdir"
       |>
         (fmap dropLineEnds . hGetContents)
       where
           dropLineEnds = filter (not . (`elem` "\r\n"))
main = defaultErrorHandler defaultDynFlags
     $ do s <- newSession . Just =<< ghcDir
          flags <- getSessionDynFlags s
          (flags, _) <- parseDynamicFlags flags ["-package ghc"]
            GHC.defaultCleanupHandler flags
          $ do setSessionDynFlags s (flags {hscTarget = HscInterpreted})
                 addTarget s =<< guessTarget "API_Layout.hs" Nothing
               load s LoadAllTargets
               prelude <- findModule s (mkModuleName "Prelude") Nothing
               usermod <- findModule s (mkModuleName "API") Nothing
               setContext s [usermod] [prelude]
               Just cm <- checkModule s (mkModuleName "API") compileToCoreFlag
               unqual <- getPrintUnqual s
                   printForUser stdout unqual $ ppr $ parsedSource cm

this has lost all comments, including pragmas, and is syntactically invalid!

one suggestion, to avoid upsetting the rest of ghc, would be to preserve the comments, with source locations, but to separate them from the main abstract syntax tree. there would also need to be a way to link ast fragments to comments, which might be slightly awkward. perhaps something like:

-- was there a comment just preceding the current AST fragment?
commentsBefore :: AST -> Maybe String
-- was there a comment immediately following the current AST fragment?
commentsAfter :: AST -> Maybe String
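
a minimal sketch of how such lookups might work if comments were kept in a separate table keyed by source location. the Span and Comment types here are hypothetical and much simpler than GHC's own source spans, and unlike the signatures above, the comment table and the fragment's span are passed in explicitly rather than recovered from the AST node:

```haskell
import Data.List (find)

-- Hypothetical, simplified source span: first and last line only.
data Span = Span { startLine :: Int, endLine :: Int }
  deriving (Eq, Show)

-- A comment stored outside the AST, tagged with the span it occupied.
data Comment = Comment { commentSpan :: Span, commentText :: String }
  deriving (Eq, Show)

-- The comment that ends on the line directly above the fragment, if any.
commentsBefore :: [Comment] -> Span -> Maybe String
commentsBefore cs frag =
  commentText <$> find (\c -> endLine (commentSpan c) + 1 == startLine frag) cs

-- The comment that starts on the line directly below the fragment, if any.
commentsAfter :: [Comment] -> Span -> Maybe String
commentsAfter cs frag =
  commentText <$> find (\c -> startLine (commentSpan c) == endLine frag + 1) cs
```

adjacency by line number is only one possible linking rule; a real implementation would probably also need to handle end-of-line comments and blank lines between a comment and the code it documents.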

Attachments

API_Layout.hs Download (1.6 KB) - added by claus 2 years ago.
a module parsing and pretty-printing itself via the GHC API

Change History

Changed 2 years ago by claus

a module parsing and pretty-printing itself via the GHC API

Changed 2 years ago by claus

  • keywords GHC API, comments, program transformation, layout added

i forgot one important example of program transformations that would also need layout preservation: version updates to follow library api changes. i think someone once started a business with this kind of thing?-)

related ticket: #1467 (api reorganisation of stages)

Changed 2 years ago by igloo

  • difficulty set to Unknown
  • milestone set to 6.10 branch

Changed 2 years ago by claus

see also this thread on cvs-ghc, messages before and after this one:

 should haddock.ghc be a sub-repo of ghc?

Changed 2 years ago by j.waldmann

  • cc j.waldmann added

Changed 22 months ago by claus

see also this thread for a simpler breakdown of what is needed, and how it might be achieved:

 http://www.haskell.org/pipermail/haskell-cafe/2008-May/042671.html

Changed 19 months ago by Jedai

My proposal is to support access to a special kind of token stream including comments. As the tokens themselves aren't enough to get back to the source that produced them (some aesthetic details disappear), I also create a function to add source strings to the tokens in a stream and a function to show such a "rich" token stream. HaRe uses the following model: get the AST and the token stream >>> modify AST &&& propagate changes to token stream >>> second (pretty print the token stream).

While this model may not be as convenient as we could hope for, it works and the guts of this process could eventually become a package on Hackage, separate from HaRe.
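
The idea of a "rich" token stream can be illustrated with a small self-contained sketch. Everything in it is an assumption made for illustration: the Loc type, the function names attachSource and renderRich, and the restriction to single-line tokens are not taken from the actual patch or from the GHC API. It only shows the two steps described above: pairing each located token with the source text it covers, and printing the stream back so that each token lands at its original position.

```haskell
-- Hypothetical, simplified token location: 1-based line, 1-based start
-- column, and exclusive end column. Tokens are assumed not to span lines.
data Loc = Loc { locLine :: Int, locStart :: Int, locEnd :: Int }
  deriving (Eq, Show)

-- Pair each located token with the exact source text it covers.
attachSource :: String -> [(Loc, tok)] -> [((Loc, tok), String)]
attachSource src toks = [ ((l, t), slice l) | (l, t) <- toks ]
  where
    srcLines = lines src
    slice (Loc ln s e) = take (e - s) (drop (s - 1) (srcLines !! (ln - 1)))

-- Rebuild source text from a rich token stream, padding with newlines
-- and spaces so that every token starts at its recorded position.
renderRich :: [((Loc, tok), String)] -> String
renderRich = go 1 1
  where
    go _ _ [] = ""
    go ln col ts@(((Loc l s e, _), str) : rest)
      | l > ln    = replicate (l - ln) '\n' ++ go l 1 ts
      | otherwise = replicate (s - col) ' ' ++ str ++ go l e rest
```

Because comments are ordinary entries in the stream, they survive the round trip together with the original spacing, which is what plain pretty-printing of the AST loses.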

Changed 18 months ago by simonmar

  • architecture changed from Unknown to Unknown/Multiple

Changed 18 months ago by simonmar

  • os changed from Unknown to Unknown/Multiple

Changed 11 months ago by igloo

  • milestone changed from 6.10 branch to 6.12 branch

Changed 11 months ago by simonmar

  • status changed from new to closed
  • resolution set to fixed

We now have

getRichTokenStream :: GhcMonad m => Module -> m [(Located Token, String)]
showRichTokenStream :: [(Located Token, String)] -> String

amongst other things, thanks to Jedai. If this isn't enough, please re-open.