Scrap Your Reprinter: Example
=============================

Reprinting takes a source file and its (possible transformed) AST and
"stitches" them together into a new source file. This library provides
a generic reprinting algorithm that works on any AST with some modest
requirements. Where any changes to the AST have been made the
reprinting algorithm can be parameterised to hook into
application-specific functionality for handling nodes in the AST that
have been marked as transformed (e.g., applying a pretty printer to
these parts).

This module gives an introduction to library usage. For a better view
of the library itself, [the 2017
paper](https://www.cs.kent.ac.uk/people/staff/dao7/publ/reprinter2017.pdf)
goes over implementation in depth. (This module is adapted from
Section 3.4.)

We demonstrate the library on a limited integer expression language (reused for
the library tests). This is a literate Haskell/Markdown file, so feel free to
follow along in GHCi or your favourite text viewer.

\begin{code}
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE FlexibleInstances  #-}

module Text.Reprinter.Example where

import Text.Reprinter
import Control.Monad.State      -- for later example
import Data.Char                -- for parsing
\end{code}

Introduction
------------
*(Section 1 of the 2017 paper covers this in better detail.)*

A compiler translates source code to a target language. Sometimes when writing
language tools, you may find yourself writing a compiler where the source and
target languages are the same; automated code refactoring tools in IDEs
provide a common set of examples. Such tools must be careful not to remove
*secondary notation* like whitespace and comments. This, in short, can be a
pain to do well.

The reprinter library allows you to write a reprinter for any algebraic data
type supporting a minimal interface the algorithm needs to track changes.

This module designs a whitespace-flexible language with comments, and uses the
library to allow reprinting that preserves such secondary notation.

Language definition
-------------------
Let's take a language targeting integer addition, plus variable assignments. Our
top-level type will be an SSA-like list of *variable declaration-assignments*:

\begin{code}
type AST a = [Decl a]
data Decl a = Decl a Span String (Expr a)
    deriving (Eq, Data, Typeable, Show)
\end{code}

A `Decl a span var expr` represents the assignment of the value of an
expression `expr` to a variable `var`. The AST is composed of a sequence
(list) of these `Decl`s.

Expressions are formed of variables, literals, and additions over expressions:

\begin{code}
data Expr a
  = Plus a Span (Expr a) (Expr a)
  | Var a Span String
  | Const a Span Int
    deriving (Eq, Data, Typeable, Show)
\end{code}

For our reprinting algorithm, every refactorable node in the AST must
store position information (`Span`, i.e., the start and end point of
this piece of syntax in the source code text) and whether it's been
refactored (and thus needs reprinting). In this case, we've
parameterised our AST over an arbitrary type `a`, which we specialise
in the rest of this file to `Bool` to represent whether this node has
been refactored or not. In a more complex AST, you could add this as a
field to an existing node annotation record type.

Note that the algorithm requires ASTs to have `Data` and `Typeable` instances.
Deriving these automatically requires the `DeriveDataTypeable` language pragma.

*(Section 1.1 in the 2017 paper gives an illustrated step-by-step example of a
transformation and reprint.)*

Concrete syntax and goals
-------------------------
Let's digress for a while to discuss our language's concrete syntax, since
reprinting uses abstract and concrete syntax simultaneously. Our language is
going to look something like this:

\begin{code}
exBasic :: String
exBasic = "x = +(0,1)\n"
\end{code}

We permit arbitrary spacing for prettier code, like so:

\begin{code}
exPrettier :: String
exPrettier = unlines
  [ "var = +(0  , 1)"
  , "x   = +(var, 2)"
  ]
\end{code}

And lines can be empty, or comments:

\begin{code}
exComment :: String
exComment = unlines
  [ "// slightly superfluous variable"
  , "zero = 0"
  , ""
  , "// somewhat useful variable"
  , "x = +(zero, 1)"
  ]
\end{code}

Knowing all this, our aim is to take a formatted program source, parse it, apply
a transformation to the AST, then reprint the program while keeping the original
formatting. Starting with the given source (taken from the 2017 paper)

\begin{code}
exPaper :: String
exPaper = unlines
  [ "x = +(1,2)"
  , "y  =  +(x,0)"
  , "// Calculate z"
  , "z  =  +( 1,  +(+(0,x) ,y) )"
  ]
\end{code}

We'll produce the following refactored and reprinted output:

    > putStr exPaper
    x = +(1,2)
    y  =  +(x,0)
    // Calculate z
    z  =  +( 1,  +(+(0,x) ,y) )
    > (putStr . refactor) exPaper
    x = +(1,2)
    y  =  x
    // Calculate z
    z  =  +( 1,  +(x ,y) )

Writing a transformation
------------------------
Putting concrete syntax aside, let's write a transformation for our AST - a
refactoring. A nice obvious one is replacing `x+0` (and `0+x`) expressions with
just `x`.

\begin{code}
refactorZero :: AST Bool -> AST Bool
refactorZero = map $ \(Decl a s n e) -> Decl a s n (go e)
  where
    go (Plus _ s e (Const _ _ 0)) = markRefactored (go e) s
    go (Plus _ s (Const _ _ 0) e) = markRefactored (go e) s
    go (Plus a s e1 e2) = Plus a s (go e1) (go e2)
    go e = e

    markRefactored (Plus _ _ e1 e2) s = Plus True s e1 e2
    markRefactored (Var _ _ n) s      = Var True s n
    markRefactored (Const _ _ i) s    = Const True s i
\end{code}

Note that when marking nodes as refactored (`markRefactored`), we
replace the `Span` of the refactored node with the span of the
original `x+0` node- this allows the reprinting algorithm to replace
the original part of the source code with the new refactored node.

In concrete syntax, we're making changes like:

    + ( x , 0 )    becomes
    x

See how `x` is pulled out. The `+(x,y)` expression is directly replaced with
`x`, so we make sure to use the original span. Any comments following the
expression will be `shifted' - *not* removed, because the reprinter only makes
changes when a node in the AST indicates it has been refactored. Parts of a
source file that aren't captured in the AST will be printed with no changes.

Reprinter plumbing
------------------
We have an AST and a transformation on it. Next, we need to tell the reprinter
how to use our AST.

\begin{code}
-- FlexibleInstances used to define this without a wrapper
instance Refactorable (Expr Bool) where
  isRefactored (Plus True _ _ _) = Just Replace
  isRefactored (Var True _ _)    = Just Replace
  isRefactored (Const True _ _)  = Just Replace
  isRefactored _                 = Nothing

  getSpan (Plus _ s _ _) = s
  getSpan (Var _ s _)    = s
  getSpan (Const _ s _)  = s
\end{code}

Your AST's `Refactorable` instances will depend on how you store annotations in
your tree. Likely you store refactoring information inside a larger record type.
Perhaps you disallow refactoring at the type level for certain nodes. In this
case, we're only writing an instance for `Expr`s, because we don't reprint
`Decl`s directly. (If we wrote a variable renaming transformation, then it would
be needed.)

We're almost there. Next we define a generic query over the AST that determines
what we do for each node in the AST. This reprinting is straightforward:

  * If an `Expr` is marked as refactored, replace it with the updated `Expr`
    pretty-printed (AST -> concrete syntax)
  * Else skip (if the node was a `Decl`, or an unrefactored `Expr`)

Reprintings of this type can be generated with `genReprinting`. A later example
writes the reprinting directly to annotate all nodes of a certain type. For now,
let's code that reprinting and the required pretty printer:

\begin{code}
-- See the 2017 paper and SYB documentation for more info on 'extQ' queries.
exprReprinter :: Reprinting String Identity
exprReprinter = catchAll `extQ` reprintExpr
  where
    reprintExpr x = genReprinting (return . prettyExpr) (x :: Expr Bool)

-- | Print an expression in canonical string form.
prettyExpr :: Expr a -> String
prettyExpr (Plus _ _ e1 e2) = "+(" <> prettyExpr e1 <> ", " <> prettyExpr e2 <> ")"
prettyExpr (Var _ _ n)      = n
prettyExpr (Const _ _ n)    = show n

-- Note that we don't define a pretty printer for declarations, as we're not
-- refactoring on that level, so won't ever reprint them.
\end{code}

`catchAll \`extQ\` reprintExpr` essentially says "try casting my argument to
use in `reprintExpr`, else default to `catchAll`" where `catchAll` always
returns `Nothing` (meaning no refactoring/don't reprint). See the 2017 paper and
Scrap Your Boilerplate (SYB) materials for more details.

Finally, we put together a function that parses, runs our refactoring, then
reprints.

\begin{code}
-- | Parse and refactor, then run the reprinter with the original source and
--   updated AST.
refactor :: String -> String
refactor s =
      runIdentity
    . flip (reprint exprReprinter) s
    . refactorZero
    . parse $ s

\end{code}

Further example: reprinting `After`
-----------------------------------
Using a monadic reprinter, we can write more complex reprintings. This example
from the 2017 paper annotates every variable declaration with its value.
Declarations are evaluated in order, building up a variable-value association
list. The list is stored in the `State` monad, which is passed along through the
reprinting.

\begin{code}
commentPrinter :: Reprinting String (State [(String, Int)])
commentPrinter = catchAll `extQ` decl
  where
    decl (Decl _ s v e) = do
      val <- eval (e :: Expr Bool)
      case val of
        Nothing -> return $ Nothing -- declaration expression referenced a
                                    -- variable before assignment: no annotation
        Just val -> do
          modify ((v,val) :)    -- add mapping to environment
          let msg = " // " <> v <> " = " <> show val
          return $ Just (After, msg, s)

eval :: Expr a -> State [(String, Int)] (Maybe Int)
eval (Plus _ _ e1 e2) = do
  e1' <- eval e1
  e2' <- eval e2
  return $ (+) <$> e1' <*> e2'
eval (Const _ _ i) = return $ Just i
eval (Var _ _ s) = get >>= return . lookup s

refactorComment :: String -> String
refactorComment input =
      flip evalState []
    . flip (reprint commentPrinter) input
    . parse $ input
\end{code}

Unscrapped boilerplate: parser for example language
---------------------------------------------------
The remainder of this module defines a simple monadic parser for the language.
It attempts to generate a position-tagged AST from a `String`.

\begin{code}
parse :: String -> AST Bool
parse s = evalState parseDecl (s, initPosition)

type Parser = State (String, Position)

parseDecl :: Parser (AST Bool)
parseDecl = do
   (xs, p1) <- get
   case xs of
       [] -> return []
       ('\n':xs) -> do
         put (xs, advanceLine p1)
         parseDecl
       _ -> do
         case commentPrefix xs of
           Just (comment, rest) -> do
             put (rest, p1)
             parseDecl
           Nothing -> do
             name <- many isAlpha
             spaces
             char '='
             spaces
             expr <- parseExpr
             p2 <- getPos
             char '\n'
             (xs, p') <- get
             put (xs, advanceLine p')
             rest <- parseDecl
             return $ Decl False (p1, p2) name expr : rest

commentPrefix :: String -> Maybe (String, String)
commentPrefix [] = Nothing
commentPrefix (' ':xs) = commentPrefix xs
commentPrefix ('/':'/':xs) = Just $ break (== '\n') xs
commentPrefix _ = Nothing

parseExpr :: Parser (Expr Bool)
parseExpr = do
    p1 <- getPos
    isPlus <- charP '+'
    if isPlus then do
      char '('
      spaces
      n <- parseExpr
      spaces
      charP ','
      spaces
      m <- parseExpr
      spaces
      char ')'
      p2 <- getPos
      return $ Plus False (p1, p2) n m
    else do
       isVar <- peekChar isAlpha
       if isVar then do
           name <- many isAlpha
           p2 <- getPos
           return $ Var False (p1, p2) name
       else do
           num <- many isDigit
           p2 <- getPos
           return $ Const False (p1, p2) $ read num

-- Some monadic parser helpers (standard)

getPos :: Parser Position
getPos = do
   (_, p) <- get
   return p

many :: (Char -> Bool) -> Parser String
many p = do
    (xs, pos) <- get
    case xs of
      (x:xs) | p x -> do
          put (xs, advanceCol pos)
          ys <- many p
          return $ x : ys
      _ -> return ""

spaces = many (==' ')

char :: Char -> Parser ()
char c = do
    (xs, pos) <- get
    case xs of
       (x:xs') -> if x == c
                then do
                  put (xs', advanceCol pos)
                  return ()
                else error $ "Expecting " ++ [c] ++ " but got " ++ [x]
       [] -> error $ "Expecting " ++ [c] ++ " but got empty"

charP :: Char -> Parser Bool
charP c =  do
    (xs, pos) <- get
    case xs of
       (x:xs') -> if x == c
                then do
                   put (xs', advanceCol pos)
                   return True
                else return False
       [] -> error $ "Expecting " ++ (c : " but got empty")

peekChar :: (Char -> Bool) -> Parser Bool
peekChar p =  do
    (xs, pos) <- get
    case xs of
       (x:_) -> if p x
                then return True
                else return False
\end{code}