# S-expresso S-expresso is a Haskell library designed to help you parse and print data or source code encoded as an S-expression. It provides a very flexible parser and (for now) a flat printer. # What is an S-expression Basically, an S-expression is a special form of tree structured data. An S-expression object is either an atom or a list of atoms and other S-expressions. This datatype is the definition of an S-expression for S-expresso. ~~~haskell data SExpr b a = SList b [SExpr b a] | SAtom a ~~~ The parameter `a` allows you to specify the datatype of atoms and the parameter `b` is usefull for keeping metadata about S-expression like source position for example. `SExpr` is not equivalent to `[a]` because the later cannot distinguish between an atom `(SAtom _)` and a tree containing only one atom `(SList _ [SAtom _])`. `SExpr` is also not equivalent to `Tree a` from `Data.Tree` because the later cannot encode the empty tree `(SList _ [])` and does not enforce that atoms are at the leaves. ## The Sexp type If you are only interested by the atoms, you can use the type alias `Sexp` that is a variant of the more general 'SExpr' data type with no data for the 'SList' constructor. ~~~haskell type Sexp a = SExpr () a ~~~ This type also comes with a bidirectional pattern synonym also named `Sexp` for object of the form `SExpr () _`. ~~~ x = Sexp [A 3] <-> x = SList () [SAtom 3] foo (Sexp xs) <-> foo (SList () xs) foo (Sexp (Sexp ys : A x : xs)) <-> foo (SList () (SList () ys : SAtom x : xs)) ~~~ ## Pattern synonyms S-expresso defines four pattern synonyms to ease your programming with `SExpr`. The patterns `L` helps you match the `SList` constructor and only its sublist, disregarding the `b` field. The pattern `:::` and `Nil` helps you specify the shape of the sublist of an `SList` constructor and finally the pattern `A` is a shorthand for `SAtom`. Together they make working with `SExpr` a little easier. ~~~ a = A 3 <-> a = SAtom 3 foo (A x) <-> foo (SAtom x) foo (A x1 ::: A x2 ::: Nil) <-> foo (SList _ [SAtom x1, SAtom x2]) foo (A x ::: L xs)) <-> foo (SList _ (SAtom x : xs)) foo (L ys ::: A x ::: L xs)) <-> foo (SList _ (SList _ ys : SAtom x : xs)) foo (L x) <-> foo (SList _ x) ~~~ Notice that you need to end the pattern `:::` with `Nil` for the empty list or `L xs` for matching the remainder of the list. Indeed, if you write ~~~ foo (x ::: xs) = ... ~~~ this is equivalent to : ~~~ foo (SList b (x : rest)) = let xs = SList b rest in ... ~~~ You can refer to the documentation of the `:::` constructor for more information. # Parsing S-expressions The parsing is based on [megaparsec](http://hackage.haskell.org/package/megaparsec). S-expresso allows you to customize the following : * The parser for atoms * The opening tag (usually "("), the closing tag (usually ")") and a possible dependency of the closing tag on the opening one. * If some space is required or optional between any pair of atoms. * How to parse space (ex: treat comments as whitespace) The library offers amoung others the `decodeOne` and `decode` functions. The former only reads one S-expression while the other parses many S-expressions. Both functions creates a megaparsec parser from a `SExprParser` argument. The `SExprParser` is the data type that defines how to read an S-expression. The easiest way to create a `SExprParser` is to use the function `plainSExprParser` with your own custom atom parser. This will create a parser where S-expression starts with "(", ends with ")" and space is mandatory between atoms. ~~~haskell Import Data.Void Import qualified Data.Text as T Import Text.Megaparsec Import Text.Megaparsec.Char Import qualified Text.Megaparser.Char.Lexer as L atom = some letter sexp = decode $ plainSExprParser atom -- Returns (SList () [SAtom "hello", SAtom "world"]) ex1 = parse sexp "" "(hello world)" -- Returns (SList () [SAtom "hello", SAtom "world", SList () [SAtom "bonjour"]]) ex2 = parse sexp "" " (hello world(bonjour)) " -- Returns SAtom "hola" ex2 = parse sexp "" "hola" ~~~ ## Customizing the SExprParser S-expresso provides many functions to modify the behavior of the parser. For example, you can use the functions `setTags`, `setTagsFromList`, `setSpace` and `setSpacingRule` to modify the behavior of the parser. Following on the preceding example: ~~~haskell -- setTags data MyType = List | Vector listOrVector = let sTag = (char '(' >> return List) <|> (string "#(" >> return Vector) eTag = \t -> char ')' >> return t p = setTags sTag eTag $ plainSExprParser atom in decode p -- Returns (SList List [SList Vector [SAtom "a", SAtom "b"], SAtom "c"]) ex3 = parse listOrVector "" "(#(a b) c)" -- setTagsFromList listOrVector2 = decode $ setTagsFromList [("(",")",List),("#(",")",Vector)] $ plainSExprParser atom -- Returns (SList List [SList Vector [SAtom "a", SAtom "b"], SAtom "c"]) ex4 = parse listOrVector2 "" "(#(a b) c)" -- setSpace withComments = decode $ -- See megaparsec Space in Megaparsec.Char.Lexer setSpace (L.Space Space1 (skipLineComment ";") empty) $ plainSExprParser atom -- Returns (SList () [SAtom "hello", SList () [SAtom "bonjour"]]) ex5 = parse withComments "" "(hello ;world\n (bonjour))" -- setSpacingRule optionalSpace = decode $ setSpacingRule spaceIsOptional $ plainSExprParser (some letter <|> some digitChar) -- Returns (SList () [SAtom "hello", SAtom "1234", SAtom "world"]) ex5 = parse optionalSpace "" "(hello1234world)" ~~~ You can also directly build a custom SExprParser with the constructor `SExprParser`. ## Adding Source Location If you need the source position of the atoms and s-expression, the function `withLocation` transforms an `SExprParser b a` into `SExprParser (Located b) (Located a)`. The `Located` datatype is defined [here](https://github.com/archambaultv/sexpresso/blob/master/src/Data/SExpresso/Parse/Location.hs).