| Safe Haskell | None |
|---|---|
| Language | Haskell2010 |
Pinchot
Contents
Description
Pinchot provides a simple language that you use to write a Haskell values that describes a context-free grammar. You can then use Template Haskell to take this value and generate a series of data types that correspond to your context-free grammar. You can also use Template Haskell to create an Earley parser that will parse all strings in the context-free language. Other handy utilities generate functions that will return all the terminal characters from a parsed production rule. It is also possible to easily determine the location (line, column, and position) of any parsed production or character.
Everything you typically need should be in this module.
For examples, please consult Pinchot.Examples.
You should also look at the BNF Converter.
http://bnfc.digitalgrammars.com
Primary differences between BNFC and this library:
- the BNF Converter works as a standalone binary that parses text BNF files. With Pinchot you specify your grammar in Haskell.
- the BNF Converter currently generates many more outputs, such as LaTeX. It also generates code for many languages. Pinchot only works in Haskell.
- the BNF Converter generates input for parser generators like Happy and Bison. Pinchot currently only generates input for the Haskell Earley library.
- Pinchot integrates seamlessly into Haskell using Template Haskell.
- the BNF Converter is GPL. Pinchot is BSD3.
Pinchot grows and harvests syntax trees, so it is named after Gifford Pinchot, first chief of the United States Forest Service.
- data Intervals a
- include :: a -> a -> Intervals a
- exclude :: a -> a -> Intervals a
- solo :: a -> Intervals a
- pariah :: a -> Intervals a
- data NonEmpty a = NonEmpty {}
- front :: forall a. Lens' (NonEmpty a) a
- rest :: forall a. Lens' (NonEmpty a) (Seq a)
- flatten :: NonEmpty a -> Seq a
- seqToNonEmpty :: Seq a -> Maybe (NonEmpty a)
- prependSeq :: Seq a -> NonEmpty a -> NonEmpty a
- appendSeq :: NonEmpty a -> Seq a -> NonEmpty a
- append :: NonEmpty a -> NonEmpty a -> NonEmpty a
- singleton :: a -> NonEmpty a
- type RuleName = String
- data Rule t
- type BranchName = String
- terminal :: RuleName -> Intervals t -> Rule t
- nonTerminal :: RuleName -> Seq (BranchName, Seq (Rule t)) -> Rule t
- union :: RuleName -> Seq (Rule t) -> Rule t
- terminals :: RuleName -> String -> Rule Char
- wrap :: RuleName -> Rule t -> Rule t
- record :: RuleName -> Seq (Rule t) -> Rule t
- opt :: Rule t -> Rule t
- star :: Rule t -> Rule t
- plus :: Rule t -> Rule t
- label :: Rule t -> String -> Rule t
- (<?>) :: Rule t -> String -> Rule t
- type Qualifier = String
- syntaxTrees :: Name -> [Name] -> Seq (Rule t) -> DecsQ
- allRulesRecord :: Qualifier -> Name -> Seq (Rule t) -> DecsQ
- wrappedInstances :: Seq (Rule t) -> DecsQ
- rulesToOptics :: Lift t => Qualifier -> Name -> Seq (Rule t) -> Q [Dec]
- earleyGrammarFromRule :: Lift t => Qualifier -> Rule t -> Q Exp
- earleyProduct :: Lift t => Qualifier -> Qualifier -> Seq (Rule t) -> ExpQ
- terminalizeRuleExp :: Qualifier -> Rule t -> Q Exp
- terminalizers :: Qualifier -> Name -> Seq (Rule t) -> Q [Dec]
- data Loc = Loc {}
- line :: Lens' Loc Int
- col :: Lens' Loc Int
- pos :: Lens' Loc Int
- locations :: FoldableLL full Char => full -> Seq (Char, Loc)
- noLocations :: FoldableLL full item => full -> Seq (item, ())
- locatedFullParses :: FoldableLL full Char => (forall r. Grammar r (Prod r String (Char, Loc) (p Char Loc))) -> full -> ([p Char Loc], Report String (Seq (Char, Loc)))
Intervals
Non-empty
A non-empty sequence.
seqToNonEmpty :: Seq a -> Maybe (NonEmpty a) Source
append :: NonEmpty a -> NonEmpty a -> NonEmpty a Source
Associative operation that appends to NonEmpty.
Production rules
Type synonym for the name of a production rule. This will be the
name of the type constructor for the corresponding type that will
be created, so this must be a valid Haskell type constructor name.
Typically each context-free grammar that you write will have
several production rules; you will want to make sure that every
RuleName that you create for a single context-free grammar is
unique. However, Pinchot currently does not check for
uniqueness. If you use names that are not unique, GHC will give
an error message when you try to splice the resulting code, as
the data types will not have unique names.
A single production rule.
type BranchName = String Source
Type synonym the the name of an alternative in a nonTerminal.
This name must not conflict with any other data constructor in
your grammar.
Creates a terminal production rule. Example:
rLetter.
Arguments
| :: RuleName | Will be used for the name of the resulting type |
| -> Seq (BranchName, Seq (Rule t)) | Branches of the non-terminal production rule. This |
| -> Rule t |
Creates a non-terminal production rule. This is the most
flexible way to create non-terminals. You can even create a
non-terminal that depends on itself. Example:
rLetters.
Arguments
| :: RuleName | Will be used for the name of the resulting type |
| -> Seq (Rule t) | List of branches. There must be at least one branch; otherwise a compile-time error will occur. |
| -> Rule t |
Creates a non-terminal production rule where each branch has
only one production. This function ultimately uses
nonTerminal. Each branch is assigned a BranchName that is
RULE_NAME'PRODUCTION_NAME
where RULE_NAME is the name of the rule itself, and
PRODUCTION_NAME is the rule name for what is being produced.
Example: rDirection.
Arguments
| :: RuleName | Will be used for the name of the resulting type, and for the name of the sole data constructor |
| -> String | |
| -> Rule Char |
Creates a production for a sequence of terminals. Useful for
parsing specific words. Ultimately this is simply a function
that creates a Rule using the record function.
In terminals n s, For each Char in the String, a Rule is
created whose RuleName is n followed by an apostrophe
followed by the index of the position of the Char.
Examples: rBoulevard.
Arguments
| :: RuleName | Will be used for the name of the resulting data type, and for the name of the sole data constructor |
| -> Rule t | |
| -> Rule t |
Creates a newtype wrapper. Example:
rCity.
Arguments
| :: RuleName | The name of this rule, which is used both as the type name and for the name of the sole data constructor |
| -> Seq (Rule t) | The right-hand side of this rule. This sequence can be empty, which results in an epsilon production. |
| -> Rule t |
Creates a new non-terminal production rule with only one alternative where each field has a record name. The name of each record is:
_r'RULE_NAME'INDEX'FIELD_TYPE
where RULE_NAME is the name of this rule, INDEX is the index number
for this field (starting with 0), and FIELD_TYPE is the type of the
field itself.
Currently there is no way to change the names of the record fields.
Example: rAddress.
opt :: Rule t -> Rule t Source
Creates a rule for a production that optionally produces another
rule. The name for the created Rule is the name of the Rule to
which this function is applied, with 'Opt appended to the end.
Example: rOptNewline.
star :: Rule t -> Rule t Source
Creates a rule for the production of a sequence of other rules.
The name for the created Rule is the name of the Rule to which
this function is applied, with 'Star appended.
Example: rPreSpacedWord.
Errors
label :: Rule t -> String -> Rule t Source
Name a Rule for use in error messages. If you do not name a
rule using this combinator, the rule's type name will be used in
error messages.
Qualifiers
type Qualifier = String Source
Many functions take an argument that holds the name qualifier
for the module that contains the data types created by applying a
function such as syntaxTrees or
earleyProduct.
You will have to make sure that these data types are in scope.
The spliced Template Haskell code has to know where to
look for these data types. If you did an unqualified import or
if the types are in the same module as the function that takes a
Qualifier argument, just pass the empty string here. If you did a
qualified import, use the appropriate qualifier here.
For example, if you used import qualified MyAst, pass
"MyAst" here. If you used import qualified
Data.MyLibrary.MyAst as MyLibrary.MyAst, pass
"MyLibrary.MyAst" here.
I recommend that you always create a new module and that all you
do in that module is apply syntaxTrees or
earleyProduct, and that you then perform an import
qualified to bring those names into scope in the module in which
you use a function that takes a Qualifier argument. This
avoids unlikely, but possible, issues that could otherwise arise
due to naming conflicts.
Creating data types corresponding to grammars
Arguments
| :: Name | Name of terminal type. Typically you will get this from the
Template Haskell quoting mechanism, e.g. |
| -> [Name] | What to derive, e.g. |
| -> Seq (Rule t) | |
| -> DecsQ |
Makes the top-level declarations for each given Rule and for
all ancestors of the given Rules. Since ancestors are
included, you can get the entire tree of types that you need by
applying this function to a single start symbol. Example:
Pinchot.Examples.SyntaxTrees.
Arguments
| :: Qualifier | Qualifier for data types corresponding to those created from
the |
| -> Name | Name of terminal type. Typically you will get this through
the Template Haskell quoting mechanism, such as |
| -> Seq (Rule t) | A record is created that holds a value for each |
| -> DecsQ | When spliced, this will create a single declaration that is a
record with the name a'NAME where |
Creates a record data type that holds a value of type
ProdrString(t, a) (p t a)
where
ris left universally quantifiedtis the token type (oftenChar)ais any additional information about each token (oftenLoc)pis the type of the particular production
This always creates a single product type whose name is
Productions; currently the name cannot be configured.
Example: Pinchot.Examples.SyntaxTrees.
Wrappers and optics
wrappedInstances :: Seq (Rule t) -> DecsQ Source
Creates a Wrapped instance for each Rule and its
ancestors, if there is an instance.
Only terminal, wrap,
opt, star, and plus
get instances of Wrapped.
This must be spliced in the same module in which the syntax tree types are created; this way, no orphans are created. Since ancestors are included, you can get the entire tree of types that you need by applying this function to a single start symbol.
Example: Pinchot.Examples.SyntaxTrees.
Arguments
| :: Lift t | |
| => Qualifier | Qualifier for module containing the data types that will get optics |
| -> Name | Type name for the terminal |
| -> Seq (Rule t) | |
| -> Q [Dec] |
Creates optics declarations for a Rule, if optics can
be made for the Rule:
terminalgets a singlePrismnonTerminalgets aPrismfor each constructorrecordgets a singleLenswrap,opt,star, andplusdo not get optics.
Each rule in the sequence of Rule, as well as all ancestors of
those Rules, will be handled.
Example: Pinchot.Examples.RulesToOptics.
Creating Earley grammars
Arguments
| :: Lift t | |
| => Qualifier | Module prefix holding the data types created with
|
| -> Rule t | Create a grammar for this |
| -> Q Exp |
Creates an expression that has type
Grammar r (Prod r String (c, a) (p c a))
where r is left universally quantified; c is the terminal
type (often Char), a is arbitrary metadata about each token
(often Loc) and p is the data type corresponding to
the given Rule.
Example: addressGrammar.
Arguments
| :: Lift t | |
| => Qualifier | Qualifier for data types corresponding to those created from
the |
| -> Qualifier | Qualifier for the type created with |
| -> Seq (Rule t) | Creates an Earley grammar that contains a |
| -> ExpQ | When spliced, |
Creates a Grammar that contains a
Prod for every given Rule and its ancestors.
Example: addressAllProductions.
Terminalizers
terminalizeRuleExp :: Qualifier -> Rule t -> Q Exp Source
For the given rule, returns an expression that has type of either
Production a -> Seq (t, a)
or
Production a -> NonEmpty (t, a)
where Production is the production corresponding to the given
Rule, and t is the terminal token type. NonEmpty is
returned for productions that must always contain at least one
terminal symbol; for those that can be empty, Seq is returned.
Example: terminalizeAddress.
Arguments
| :: Qualifier | Qualifier for the module containing the data types created
from the |
| -> Name | Name of terminal type. Typically you will get this through
the Template Haskell quoting mechanism, such as |
| -> Seq (Rule t) | |
| -> Q [Dec] |
For all the given rules and their ancestors, creates
declarations that reduce the rule and all its ancestors to
terminal symbols. Each rule gets a declaration named
t'RULE_NAME where RULE_NAME is the name of the rule. The
type of the declaration is either
Production a -> Seq (t, a)
or
Production a -> NonEmpty (t, a)
where Production is the production corresponding to the given
Rule, t is the terminal token type (often Char), and a is
arbitrary metadata about each token (often Loc). NonEmpty is
returned for productions that must always contain at least one
terminal symbol; for those that can be empty, Seq is returned.
Example: Pinchot.Examples.Terminalize.
Locations
A location.
locations :: FoldableLL full Char => full -> Seq (Char, Loc) Source
Takes any ListLike value based on Char (Seq, Text,
String, etc.) and creates a Seq which pairs each Char with
its location. Example: locatedFullParses.
noLocations :: FoldableLL full item => full -> Seq (item, ()) Source
Breaks a ListLike into a Seq but does not assign locations.
Running parsers with locations
Arguments
| :: FoldableLL full Char | |
| => (forall r. Grammar r (Prod r String (Char, Loc) (p Char Loc))) | Earley grammar with production that you want to parse. |
| -> full | |
| -> ([p Char Loc], Report String (Seq (Char, Loc))) | A list of successful parses that when to the end of the source string, along with the Earley report showing possible errors. |