pinchot-0.24.0.0: Write grammars, not parsers

Safe HaskellNone
LanguageHaskell2010

Pinchot

Contents

Description

Pinchot provides a simple language that you use to write a Haskell values that describes a context-free grammar. You can then use Template Haskell to take this value and generate a series of data types that correspond to your context-free grammar. You can also use Template Haskell to create an Earley parser that will parse all strings in the context-free language. Other handy utilities generate functions that will return all the terminal characters from a parsed production rule. It is also possible to easily determine the location (line, column, and position) of any parsed production or character.

Everything you typically need should be in this module.

For examples, please consult Pinchot.Examples.

You should also look at the BNF Converter.

http://bnfc.digitalgrammars.com

Primary differences between BNFC and this library:

  • the BNF Converter works as a standalone binary that parses text BNF files. With Pinchot you specify your grammar in Haskell.
  • the BNF Converter currently generates many more outputs, such as LaTeX. It also generates code for many languages. Pinchot only works in Haskell.
  • the BNF Converter generates input for parser generators like Happy and Bison. Pinchot currently only generates input for the Haskell Earley library.
  • Pinchot integrates seamlessly into Haskell using Template Haskell.
  • the BNF Converter is GPL. Pinchot is BSD3.

Pinchot grows and harvests syntax trees, so it is named after Gifford Pinchot, first chief of the United States Forest Service.

Synopsis

Production rules

type RuleName = String Source #

Type synonym for the name of a production rule. This will be the name of the type constructor for the corresponding type that will be created, so this must be a valid Haskell type constructor name. Typically each context-free grammar that you write will have several production rules; you will want to make sure that every RuleName that you create for a single context-free grammar is unique. However, Pinchot currently does not check for uniqueness. If you use names that are not unique, GHC will give an error message when you try to splice the resulting code, as the data types will not have unique names.

data Rule t Source #

A single production rule.

Instances

Show t => Show (Rule t) Source # 

Methods

showsPrec :: Int -> Rule t -> ShowS #

show :: Rule t -> String #

showList :: [Rule t] -> ShowS #

Generic (Rule t) Source # 

Associated Types

type Rep (Rule t) :: * -> * #

Methods

from :: Rule t -> Rep (Rule t) x #

to :: Rep (Rule t) x -> Rule t #

PrettyVal t => PrettyVal (Rule t) Source # 

Methods

prettyVal :: Rule t -> Value #

listValue :: [Rule t] -> Value

type Rep (Rule t) Source # 
type Rep (Rule t) = D1 (MetaData "Rule" "Pinchot.Types" "pinchot-0.24.0.0-FNVmzC3fiVb8ilWRaT4srD" False) (C1 (MetaCons "Rule" PrefixI True) ((:*:) (S1 (MetaSel (Just Symbol "_ruleName") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 RuleName)) ((:*:) (S1 (MetaSel (Just Symbol "_ruleDescription") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 (Maybe String))) (S1 (MetaSel (Just Symbol "_ruleType") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 (RuleType t))))))

type BranchName = String Source #

Type synonym the the name of an alternative in a nonTerminal. This name must not conflict with any other data constructor in your grammar.

terminal Source #

Arguments

:: RuleName 
-> Q (TExp (t -> Bool))

Valid terminal symbols. This is a typed Template Haskell expression. To use it, make sure you have

{-# LANGUAGE TemplateHaskell #-}

at the top of your module, and then use the Template Haskell quotes, like this:

terminal "AtoZ" [|| (\c -> c >= 'A' && c <= 'Z') ||]
-> Rule t 

Creates a terminal production rule. Example: rLetter.

nonTerminal Source #

Arguments

:: RuleName

Will be used for the name of the resulting type

-> [(BranchName, [Rule t])]

Branches of the non-terminal production rule. This list must have at least one element; otherwise, an error will result.

-> Rule t 

Creates a non-terminal production rule. This is the most flexible way to create non-terminals. You can even create a non-terminal that depends on itself. Example: rLetters.

union Source #

Arguments

:: RuleName

Will be used for the name of the resulting type

-> [Rule t]

List of branches. There must be at least one branch; otherwise a compile-time error will occur.

-> Rule t 

Creates a non-terminal production rule where each branch has only one production. This function ultimately uses nonTerminal. Each branch is assigned a BranchName that is

RULE_NAME'PRODUCTION_NAME

where RULE_NAME is the name of the rule itself, and PRODUCTION_NAME is the rule name for what is being produced.

Example: rDirection.

series Source #

Arguments

:: RuleName

Will be used for the name of the resulting type, and for the name of the sole data constructor

-> [t]

The list of tokens to use. This must have at least one item; otherwise this function will apply error. This list must be finite.

-> Rule t 

Creates a production for a sequence of terminals. Useful for parsing specific words. When used with syntaxTrees, the resulting data type is a newtype that wraps a NonEmpty (t, a), where t is the type of the token (often Char) and a is an arbitrary metadata type.

Examples: rBoulevard.

wrap Source #

Arguments

:: RuleName

Will be used for the name of the resulting data type, and for the name of the sole data constructor

-> Rule t

The resulting Rule simply wraps this Rule.

-> Rule t 

Creates a newtype wrapper. Example: rCity.

record Source #

Arguments

:: RuleName

The name of this rule, which is used both as the type name and for the name of the sole data constructor

-> [Rule t]

The right-hand side of this rule. This sequence can be empty, which results in an epsilon production.

-> Rule t 

Creates a new non-terminal production rule with only one alternative where each field has a record name. The name of each record is:

_r'RULE_NAME'INDEX'FIELD_TYPE

where RULE_NAME is the name of this rule, INDEX is the index number for this field (starting with 0), and FIELD_TYPE is the type of the field itself.

Currently there is no way to change the names of the record fields.

Example: rAddress.

opt :: Rule t -> Rule t Source #

Creates a rule for a production that optionally produces another rule. The name for the created Rule is the name of the Rule to which this function is applied, with 'Opt appended to the end.

Example: rOptNewline.

star :: Rule t -> Rule t Source #

Creates a rule for the production of a sequence of other rules. The name for the created Rule is the name of the Rule to which this function is applied, with 'Star appended.

Example: rPreSpacedWord.

plus :: Rule t -> Rule t Source #

Creates a rule for a production that appears at least once. The name for the created Rule is the name of the Rule to which this function is applied, with 'Plus appended.

Example: rDigits.

Errors

label :: Rule t -> String -> Rule t Source #

Name a Rule for use in error messages. If you do not name a rule using this combinator, the rule's type name will be used in error messages.

(<?>) :: Rule t -> String -> Rule t infixr 0 Source #

Infix synonym for label. Example: rDigit.

Qualifiers

type Qualifier = String Source #

Many functions take an argument that holds the name qualifier for the module that contains the data types created by applying a function such as syntaxTrees or earleyProduct.

You will have to make sure that these data types are in scope. The spliced Template Haskell code has to know where to look for these data types. If you did an unqualified import or if the types are in the same module as the function that takes a Qualifier argument, just pass the empty string here. If you did a qualified import, use the appropriate qualifier here.

For example, if you used import qualified MyAst, pass "MyAst" here. If you used import qualified Data.MyLibrary.MyAst as MyLibrary.MyAst, pass "MyLibrary.MyAst" here.

Creating data types corresponding to grammars

syntaxTrees Source #

Arguments

:: [Name]

What to derive, e.g. [''Eq, ''Ord, ''Show]

-> [Rule t] 
-> DecsQ 

Makes the top-level declarations for each given Rule and for all ancestors of the given Rules. Since ancestors are included, you can get the entire tree of types that you need by applying this function to a single start symbol. Example: Pinchot.Examples.SyntaxTrees.

allRulesRecord Source #

Arguments

:: Qualifier

Qualifier for data types corresponding to those created from the Rules

-> [Rule t]

A record is created that holds a value for each Rule in the Seq, as well as for every ancestor of these Rules.

-> DecsQ

When spliced, this will create a single declaration that is a record with the name Productions.

a'NAME

where NAME is the name of the type. Don't count on these records being in any particular order.

Creates a record data type that holds a value of type

Prod r String (t, a) (p t a)

where

  • r is left universally quantified
  • t is the token type (often Char)
  • a is any additional information about each token (often Loc)
  • p is the type of the particular production

This always creates a single product type whose name is Productions; currently the name cannot be configured.

Example: Pinchot.Examples.AllRulesRecord.

Typeclass instances and optics

wrappedInstances :: [Rule t] -> DecsQ Source #

Creates a Wrapped instance for each Rule and its ancestors, if there is an instance. Only terminal, wrap, opt, star, and plus get instances of Wrapped.

This must be spliced in the same module in which the syntax tree types are created; this way, no orphans are created. Since ancestors are included, you can get the entire tree of types that you need by applying this function to a single start symbol.

Example: Pinchot.Examples.SyntaxTrees.

bifunctorInstances :: [Rule t] -> DecsQ Source #

Creates an instance of Bifunctor for every Rule in the Seq as well as their ancestors.

This function must be spliced in the same module as the module in which the syntax tree types are created. This avoids problems with orphan instances. Since ancestors are included, you can get the entire tree of instances that you need by applying this function to a single start symbol.

Example: Pinchot.Examples.SyntaxTrees.

semigroupInstances :: [Rule t] -> DecsQ Source #

Where possible, creates an instance of Semigroup for every Rule in the Seq as well as their ancestors. Only star and plus rules, as well as rules that through one or more layers of wrap ultimately wrap a star or plus rule, get a Semigroup instance.

This function must be spliced in the same module as the module in which the syntax tree types are created. This avoids problems with orphan instances. Since ancestors are included, you can get the entire tree of instances that you need by applying this function to a single start symbol.

Example: Pinchot.Examples.SyntaxTrees.

monoidInstances :: [Rule t] -> DecsQ Source #

Where possible, creates an instance of Monoid for every Rule in the Seq as well as their ancestors. Only star rules, as well as rules that through one or more layers of wrap ultimately wrap a star rule, get a Monoid instance.

This function must be spliced in the same module as the module in which the syntax tree types are created. This avoids problems with orphan instances. Since ancestors are included, you can get the entire tree of instances that you need by applying this function to a single start symbol.

Example: Pinchot.Examples.SyntaxTrees.

prettyInstances :: [Rule t] -> DecsQ Source #

Creates a PrettyVal instance for each Rule in the Seq, as well as for all the ancestors of each Rule in the Seq.

This function must be spliced in the same module as the module in which the syntax tree types are created. This avoids problems with orphan instances. Since ancestors are included, you can get the entire tree of instances that you need by applying this function to a single start symbol.

Example: Pinchot.Examples.SyntaxTrees.

rulesToOptics Source #

Arguments

:: (Lift t, Data t) 
=> Qualifier

Qualifier for module containing the data types that will get optics

-> Name

Type name for the terminal

-> [Rule t] 
-> Q [Dec] 

Creates optics declarations for a Rule, if optics can be made for the Rule:

Each rule in the sequence of Rule, as well as all ancestors of those Rules, will be handled.

Example: Pinchot.Examples.RulesToOptics.

Creating Earley grammars

earleyGrammarFromRule Source #

Arguments

:: (Data t, Lift t) 
=> Qualifier

Module prefix holding the data types created with syntaxTrees

-> Rule t

Create a grammar for this Rule

-> Q Exp 

Creates an expression that has type

Grammar r (Prod r String (c, a) (p c a))

where r is left universally quantified; c is the terminal type (often Char), a is arbitrary metadata about each token (often Loc) and p is the data type corresponding to the given Rule.

Example: addressGrammar.

earleyProduct Source #

Arguments

:: (Data t, Lift t) 
=> Qualifier

Qualifier for data types corresponding to those created from the Rules

-> Qualifier

Qualifier for the type created with allRulesRecord

-> [Rule t]

Creates an Earley grammar that contains a Prod for each Rule in this list, as well as all the ancestors of these Rules.

-> ExpQ

When spliced, earleyProduct creates an expression whose type is Grammar r (Productions r t a), where Productions is the type created by allRulesRecord; r is left universally quantified; t is the token type (often Char), and a is any additional information about each token (often Loc).

Creates a Grammar that contains a Prod for every given Rule and its ancestors. Example: addressAllProductions.

Terminalizers

terminalizeRuleExp :: Qualifier -> Rule t -> Q Exp Source #

For the given rule, returns an expression that has type of either

Production a -> [(t, a)]

or

Production a -> NonEmpty (t, a)

where Production is the production corresponding to the given Rule, and t is the terminal token type. NonEmpty is returned for productions that must always contain at least one terminal symbol; for those that can be empty, Seq is returned.

Example: terminalizeAddress.

terminalizers Source #

Arguments

:: Qualifier

Qualifier for the module containing the data types created from the Rules

-> [Rule t] 
-> Q [Dec] 

For all the given rules and their ancestors, creates declarations that reduce the rule and all its ancestors to terminal symbols. Each rule gets a declaration named t'RULE_NAME where RULE_NAME is the name of the rule. The type of the declaration is either

Production a -> [(t, a)]

or

Production a -> NonEmpty (t, a)

where Production is the production corresponding to the given Rule, t is the terminal token type (often Char), and a is arbitrary metadata about each token (often Loc). NonEmpty is returned for productions that must always contain at least one terminal symbol; for those that can be empty, Seq is returned.

Example: Pinchot.Examples.Terminalize.

Locations

data Loc Source #

A location.

Constructors

Loc 

Fields

Instances

Eq Loc Source # 

Methods

(==) :: Loc -> Loc -> Bool #

(/=) :: Loc -> Loc -> Bool #

Data Loc Source # 

Methods

gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b) -> (forall g. g -> c g) -> Loc -> c Loc #

gunfold :: (forall b r. Data b => c (b -> r) -> c r) -> (forall r. r -> c r) -> Constr -> c Loc #

toConstr :: Loc -> Constr #

dataTypeOf :: Loc -> DataType #

dataCast1 :: Typeable (* -> *) t => (forall d. Data d => c (t d)) -> Maybe (c Loc) #

dataCast2 :: Typeable (* -> * -> *) t => (forall d e. (Data d, Data e) => c (t d e)) -> Maybe (c Loc) #

gmapT :: (forall b. Data b => b -> b) -> Loc -> Loc #

gmapQl :: (r -> r' -> r) -> r -> (forall d. Data d => d -> r') -> Loc -> r #

gmapQr :: (r' -> r -> r) -> r -> (forall d. Data d => d -> r') -> Loc -> r #

gmapQ :: (forall d. Data d => d -> u) -> Loc -> [u] #

gmapQi :: Int -> (forall d. Data d => d -> u) -> Loc -> u #

gmapM :: Monad m => (forall d. Data d => d -> m d) -> Loc -> m Loc #

gmapMp :: MonadPlus m => (forall d. Data d => d -> m d) -> Loc -> m Loc #

gmapMo :: MonadPlus m => (forall d. Data d => d -> m d) -> Loc -> m Loc #

Ord Loc Source # 

Methods

compare :: Loc -> Loc -> Ordering #

(<) :: Loc -> Loc -> Bool #

(<=) :: Loc -> Loc -> Bool #

(>) :: Loc -> Loc -> Bool #

(>=) :: Loc -> Loc -> Bool #

max :: Loc -> Loc -> Loc #

min :: Loc -> Loc -> Loc #

Read Loc Source # 
Show Loc Source # 

Methods

showsPrec :: Int -> Loc -> ShowS #

show :: Loc -> String #

showList :: [Loc] -> ShowS #

Generic Loc Source # 

Associated Types

type Rep Loc :: * -> * #

Methods

from :: Loc -> Rep Loc x #

to :: Rep Loc x -> Loc #

PrettyVal Loc Source # 

Methods

prettyVal :: Loc -> Value #

listValue :: [Loc] -> Value

type Rep Loc Source # 

locations :: Traversable t => t Char -> t (Char, Loc) Source #

Adds locations to a list of characters.

noLocations :: Functor f => f a -> f (a, ()) Source #

Takes a list of tokens and assigns empty locations.

Running parsers with locations

locatedFullParses Source #

Arguments

:: (forall r. Grammar r (Prod r String (Char, Loc) (p Char Loc)))

Earley grammar with production that you want to parse.

-> [Char]

Source text

-> ([p Char Loc], Report String [(Char, Loc)])

A list of successful parses that when to the end of the source string, along with the Earley report showing possible errors.

Obtains all full Earley parses from a given input string, after assigning a location to every Char. Example: address.