pinchot-0.6.0.0: Build parsers and ASTs for context-free grammars

Safe HaskellNone
LanguageHaskell2010

Pinchot

Contents

Description

Pinchot provides a simple language that you use to write a Haskell program that describes a context-free grammar. When run, this program creates a value that stores your context-free grammar. You can then use Template Haskell to take this value and generate a series of data types that correspond to your context-free grammar. You can also use Template Haskell to create an Earley parser that will parse all strings in the context-free language.

You will need to have

{-# LANGUAGE TypeFamilies #-}

at the top of any module in which you use Template Haskell to splice in the resulting data types.

For examples, please consult Pinchot.Examples.

You should also look at the BNF Converter.

http://bnfc.digitalgrammars.com

Primary differences between BNFC and this library:

  • the BNF Converter works as a standalone binary that parses text BNF files. With Pinchot you specify your grammar in Haskell.
  • the BNF Converter currently generates many more outputs, such as LaTeX. It also generates code for many languages. Pinchot only works in Haskell.
  • the BNF Converter generates input for parser generators like Happy and Bison. Pinchot currently only generates input for the Haskell Earley library.
  • Pinchot integrates seamlessly into Haskell using Template Haskell.
  • the BNF Converter is GPL. Pinchot is BSD3.

Pinchot grows and harvests syntax trees, so it is named after Gifford Pinchot, first chief of the United States Forest Service.

Synopsis

Intervals

data Intervals a Source

Groups of terminals. Create an Intervals using include, exclude, solo and pariah. Combine Intervals using mappend, which will combine both the included and excluded terminal symbols from each operand.

include :: a -> a -> Intervals a Source

Include a range of symbols in the Intervals. For instance, to include the characters a, b, and c, use include a c.

exclude :: a -> a -> Intervals a Source

Exclude a range of symbols in the Intervals. Each symbol that is excluded is not included in the Intervals, even if it is also included.

solo :: a -> Intervals a Source

Include a single symbol.

pariah :: a -> Intervals a Source

Exclude a single symbol.

Simple production rules

data Pinchot t a Source

Constructs new Rules. t is the type of the token; often this will be Char.

Pinchot is a Monad and an Applicative so you can combine computations using the usual methods of those classes. Also, Pinchot is a MonadFix. This allows you to construct a Rule that depends on itself, and to construct sets of Rules that have mutually recursive dependencies. MonadFix also allows you to use the GHC RecursiveDo extension. Put

{-# LANGUAGE RecursiveDo #-}

at the top of your module, then use mdo instead of do. Because an mdo block is recursive, you can use a binding before it is defined, just as you can in a set of let bindings.

type RuleName = String Source

Type synonym for the name of a production rule. This will be the name of the type constructor for the corresponding type in the AST, so this must be a valid Haskell type constructor name.

If you are creating a terminal, option, list, list1, or wrap, the RuleName will also be used for the name of the single data construtor. If you are creating a nonTerminal, you will specify the name of each data constructor with AlternativeName.

type AlternativeName = String Source

Type synonym the the name of an alternative in a nonTerminal. This name must not conflict with any other data constructor, either one specified as an AlternativeName or one that was created using terminal, option, list, or list1.

data Rule t Source

A single production rule. It may be a terminal or a non-terminal.

Instances

Eq t => Eq (Rule t) Source 
Ord t => Ord (Rule t) Source 
Show t => Show (Rule t) Source 

terminal Source

Arguments

:: RuleName 
-> Intervals t

Valid terminal symbols

-> Pinchot t (Rule t) 

Creates a terminal production rule.

terminalSeq Source

Arguments

:: RuleName 
-> Seq t

Sequence of terminal symbols to recognize

-> Pinchot t (Rule t) 

Creates a production for a sequence of terminals. Useful for parsing specific words.

nonTerminal Source

Arguments

:: RuleName 
-> Seq (AlternativeName, Seq (Rule t))

Alternatives. There must be at least one alternative; otherwise, an error will result. In each pair (a, b), a will be the data constructor, so this must be a valid Haskell data constructor name. b is the sequence of production rules, which can be empty (this is how to create an epsilon production).

-> Pinchot t (Rule t) 

Creates a new non-terminal production rule.

union :: RuleName -> Seq (AlternativeName, Rule t) -> Pinchot t (Rule t) Source

Creates a new non-terminal production rule where each alternative produces only one rule.

record Source

Arguments

:: RuleName

The name of this rule, which is used both as the type name and the name of the sole data constructor.

-> Seq (Rule t)

The right-hand side of this rule. This sequence can be empty, which results in an epsilon production.

-> Pinchot t (Rule t) 

Creates a new non-terminal production rule with only one alternative where each field has a record name. The name of each record is:

_f'RULE_NAME'INDEX'FIELD_TYPE

where RULE_NAME is the name of this rule, INDEX is the index number for this field (starting with 0), and FIELD_TYPE is the type of the field itself. For an example, see Address.

Currently there is no way to change the names of the record fields.

Rules that modify other rules

list Source

Arguments

:: RuleName 
-> Rule t

The resulting Rule is a sequence of productions of this Rule; that is, this Rule may appear zero or more times.

-> Pinchot t (Rule t) 

Creates a rule for the production of a sequence of other rules.

list1 Source

Arguments

:: RuleName 
-> Rule t

The resulting Rule produces this Rule at least once.

-> Pinchot t (Rule t) 

Creates a rule for a production that appears at least once.

option Source

Arguments

:: RuleName 
-> Rule t

The resulting Rule optionally produces this Rule; that is, this Rule may appear once or not at all.

-> Pinchot t (Rule t) 

Creates a rule for a production that optionally produces another rule.

wrap Source

Arguments

:: RuleName 
-> Rule t

The resulting Rule simply wraps this Rule.

-> Pinchot t (Rule t) 

Creates a newtype wrapper.

label :: String -> Rule t -> Rule t Source

Name a Rule for use in error messages. If you do not name a rule using this combinator, the rule's type name will be used in error messages.

(<?>) :: Pinchot t (Rule t) -> String -> Pinchot t (Rule t) infixr 0 Source

Infix form of label for use in a Pinchot; handy for use in do or mdo notation.

Transforming an AST to code

earleyGrammar Source

Arguments

:: Lift t 
=> String

Module prefix. You have to make sure that the data types you created with ruleTreeToTypes or with allRulesToTypes are in scope, either because they were spliced into the same module that earleyParser is spliced into, or because they are imported into scope. The spliced Template Haskell code has to know where to look for these data types. If you did an unqualified import or if the types are in the same module as is the splice of earleyParser, just pass the empty string here. If you did a qualified import, pass the appropriate namespace here.

For example, if you used import qualified MyAst, pass "MyAst" here. If you used import qualified Data.MyLibrary.MyAst as MyLibrary.MyAst, pass "MyLibrary.MyAst" here.

For an example where the types are in the same module, see Pinchot.Examples.PostalAstRuleTree or Pinchot.Examples.PostalAstAllRules.

For an example using a qualified import, see Pinchot.Examples.QualifiedImport.

-> Pinchot t (Rule t)

Creates an Earley parser for the Rule that the Pinchot returns.

-> Q Exp 

Creates an Earley grammar for a given Rule. For examples of how to use this, see the source code for Pinchot.Examples.PostalAstRuleTree and for Pinchot.Examples.PostalAstAllRules.

type MakeOptics = Bool Source

Should optics be made?

makeOptics :: MakeOptics Source

Creates optics.

pragma at the top of the module in which you splice this in.

Creates the listed optics for each kind of Rule, as follows:

  • terminal: Prism' a b, where a is the type of the terminal token (often Char) and b is the type of this particular production. For an example, see _Comma.
>>> ',' ^? _Comma
Just (Comma ',')
>>> 'a' ^? _Comma
Nothing
>>> Comma ',' ^. re _Comma
','

Thus this gives you a safe way to insert tokens into types made with terminal (useful if you want to construct a syntax tree.)

noOptics :: MakeOptics Source

Do not make any optics.

allRulesToTypes Source

Arguments

:: Lift t 
=> MakeOptics 
-> Name

Terminal type constructor name. Typically you will use the Template Haskell quoting mechanism to get this.

-> Seq Name

What to derive. For instance, you might use Eq, Ord, and Show here. Each created data type will derive these instances.

-> Pinchot t a

The return value from the Pinchot is ignored.

-> DecsQ 

Creates data types for every Rule created in the Pinchot. The data types are created in the same order in which they were created in the Pinchot. When spliced, the DecsQ is a list of declarations, each of which is an appropriate data or newtype. For an example use of allRulesToTypes, see Pinchot.Examples.PostalAstAllRules.

ruleTreeToTypes Source

Arguments

:: Lift t 
=> MakeOptics 
-> Name

Terminal type constructor name. Typically you will use the Template Haskell quoting mechanism to get this.

-> Seq Name

What to derive. For instance, you might use Eq, Ord, and Show here. Each created data type will derive these instances.

-> Pinchot t (Rule t)

A data type is created for the Rule that the Pinchot returns, and for the ancestors of the Rule.

-> DecsQ 

Creates data types only for the Rule returned from the Pinchot, and for its ancestors.

Manipulating productions

class Production a where Source

Typeclass for all productions, which allows you to extract a sequence of terminal symbols from any production.

Associated Types

type Terminal a :: * Source

Methods

terminals :: a -> Seq (Terminal a) Source

Instances

Production Address Source 
Production MaybeSuffix Source 
Production MaybeDirection Source 
Production SpaceSuffix Source 
Production DirectionSpace Source 
Production ZipCode Source 
Production State Source 
Production City Source 
Production StreetName Source 
Production Number Source 
Production Words Source 
Production PreSpacedWords Source 
Production PreSpacedWord Source 
Production PostalWord Source 
Production Letters Source 
Production Comma Source 
Production Space Source 
Production Suffix Source 
Production Boulevard Source 
Production Way Source 
Production Avenue Source 
Production Street Source 
Production Direction Source 
Production West Source 
Production East Source 
Production South Source 
Production North Source 
Production Letter Source 
Production Digits Source 
Production Digit Source 
Production Address Source 
Production MaybeSuffix Source 
Production MaybeDirection Source 
Production SpaceSuffix Source 
Production DirectionSpace Source 
Production ZipCode Source 
Production State Source 
Production City Source 
Production StreetName Source 
Production Number Source 
Production Words Source 
Production PreSpacedWords Source 
Production PreSpacedWord Source 
Production PostalWord Source 
Production Letters Source 
Production Comma Source 
Production Space Source 
Production Suffix Source 
Production Boulevard Source 
Production Way Source 
Production Avenue Source 
Production Street Source 
Production Direction Source 
Production West Source 
Production East Source 
Production South Source 
Production North Source 
Production Letter Source 
Production Digits Source 
Production Digit Source 
Production ZipCode Source 
Production State Source 
Production City Source 
Production Comma Source 
Production Boulevard Source 
Production Way Source 
Production Avenue Source 
Production Street Source 
Production Suffix Source 
Production SpaceSuffix Source 
Production MaybeSuffix Source 
Production PreSpacedWord Source 
Production PreSpacedWords Source 
Production Letters Source 
Production Letter Source 
Production PostalWord Source 
Production Words Source 
Production StreetName Source 
Production West Source 
Production East Source 
Production South Source 
Production North Source 
Production Direction Source 
Production DirectionSpace Source 
Production MaybeDirection Source 
Production Space Source 
Production Digit Source 
Production Digits Source 
Production Number Source 
Production Address Source