Safe Haskell | None |
---|---|
Language | Haskell2010 |
Pinchot provides a simple language that you use to write a Haskell values that describes a context-free grammar. You can then use Template Haskell to take this value and generate a series of data types that correspond to your context-free grammar. You can also use Template Haskell to create an Earley parser that will parse all strings in the context-free language. Other handy utilities generate functions that will return all the terminal characters from a parsed production rule. It is also possible to easily determine the location (line, column, and position) of any parsed production or character.
Everything you typically need should be in this module.
For examples, please consult Pinchot.Examples.
You should also look at the BNF Converter.
http://bnfc.digitalgrammars.com
Primary differences between BNFC and this library:
- the BNF Converter works as a standalone binary that parses text BNF files. With Pinchot you specify your grammar in Haskell.
- the BNF Converter currently generates many more outputs, such as LaTeX. It also generates code for many languages. Pinchot only works in Haskell.
- the BNF Converter generates input for parser generators like Happy and Bison. Pinchot currently only generates input for the Haskell Earley library.
- Pinchot integrates seamlessly into Haskell using Template Haskell.
- the BNF Converter is GPL. Pinchot is BSD3.
Pinchot grows and harvests syntax trees, so it is named after Gifford Pinchot, first chief of the United States Forest Service.
- data Intervals a
- include :: a -> a -> Intervals a
- exclude :: a -> a -> Intervals a
- solo :: a -> Intervals a
- pariah :: a -> Intervals a
- data NonEmpty a = NonEmpty {}
- front :: forall a. Lens' (NonEmpty a) a
- rest :: forall a. Lens' (NonEmpty a) (Seq a)
- flatten :: NonEmpty a -> Seq a
- seqToNonEmpty :: Seq a -> Maybe (NonEmpty a)
- prependSeq :: Seq a -> NonEmpty a -> NonEmpty a
- appendSeq :: NonEmpty a -> Seq a -> NonEmpty a
- append :: NonEmpty a -> NonEmpty a -> NonEmpty a
- singleton :: a -> NonEmpty a
- type RuleName = String
- data Rule t
- type BranchName = String
- terminal :: RuleName -> Intervals t -> Rule t
- nonTerminal :: RuleName -> Seq (BranchName, Seq (Rule t)) -> Rule t
- union :: RuleName -> Seq (Rule t) -> Rule t
- terminals :: RuleName -> String -> Rule Char
- wrap :: RuleName -> Rule t -> Rule t
- record :: RuleName -> Seq (Rule t) -> Rule t
- opt :: Rule t -> Rule t
- star :: Rule t -> Rule t
- plus :: Rule t -> Rule t
- label :: Rule t -> String -> Rule t
- (<?>) :: Rule t -> String -> Rule t
- type Qualifier = String
- syntaxTrees :: [Name] -> Seq (Rule t) -> DecsQ
- allRulesRecord :: Qualifier -> Seq (Rule t) -> DecsQ
- wrappedInstances :: Seq (Rule t) -> DecsQ
- bifunctorInstances :: Seq (Rule t) -> DecsQ
- semigroupInstances :: Seq (Rule t) -> DecsQ
- monoidInstances :: Seq (Rule t) -> DecsQ
- rulesToOptics :: Lift t => Qualifier -> Name -> Seq (Rule t) -> Q [Dec]
- earleyGrammarFromRule :: Lift t => Qualifier -> Rule t -> Q Exp
- earleyProduct :: Lift t => Qualifier -> Qualifier -> Seq (Rule t) -> ExpQ
- terminalizeRuleExp :: Qualifier -> Rule t -> Q Exp
- terminalizers :: Qualifier -> Seq (Rule t) -> Q [Dec]
- data Loc = Loc {}
- line :: Lens' Loc Int
- col :: Lens' Loc Int
- pos :: Lens' Loc Int
- locations :: FoldableLL full Char => full -> Seq (Char, Loc)
- noLocations :: FoldableLL full item => full -> Seq (item, ())
- locatedFullParses :: FoldableLL full Char => (forall r. Grammar r (Prod r String (Char, Loc) (p Char Loc))) -> full -> ([p Char Loc], Report String (Seq (Char, Loc)))
Intervals
Non-empty
A non-empty sequence.
Monad NonEmpty Source # | |
Functor NonEmpty Source # | |
Applicative NonEmpty Source # | |
Foldable NonEmpty Source # | |
Traversable NonEmpty Source # | |
Eq a => Eq (NonEmpty a) Source # | |
Data a => Data (NonEmpty a) Source # | |
Ord a => Ord (NonEmpty a) Source # | |
Show a => Show (NonEmpty a) Source # | |
Semigroup (NonEmpty a) Source # | |
append :: NonEmpty a -> NonEmpty a -> NonEmpty a Source #
Associative operation that appends to NonEmpty
.
Production rules
type RuleName = String Source #
Type synonym for the name of a production rule. This will be the
name of the type constructor for the corresponding type that will
be created, so this must be a valid Haskell type constructor name.
Typically each context-free grammar that you write will have
several production rules; you will want to make sure that every
RuleName
that you create for a single context-free grammar is
unique. However, Pinchot currently does not check for
uniqueness. If you use names that are not unique, GHC will give
an error message when you try to splice the resulting code, as
the data types will not have unique names.
A single production rule.
type BranchName = String Source #
Type synonym the the name of an alternative in a nonTerminal
.
This name must not conflict with any other data constructor in
your grammar.
Creates a terminal production rule. Example:
rLetter
.
:: RuleName | Will be used for the name of the resulting type |
-> Seq (BranchName, Seq (Rule t)) | Branches of the non-terminal production rule. This |
-> Rule t |
Creates a non-terminal production rule. This is the most
flexible way to create non-terminals. You can even create a
non-terminal that depends on itself. Example:
rLetters
.
:: RuleName | Will be used for the name of the resulting type |
-> Seq (Rule t) | List of branches. There must be at least one branch; otherwise a compile-time error will occur. |
-> Rule t |
Creates a non-terminal production rule where each branch has
only one production. This function ultimately uses
nonTerminal
. Each branch is assigned a BranchName
that is
RULE_NAME'PRODUCTION_NAME
where RULE_NAME
is the name of the rule itself, and
PRODUCTION_NAME
is the rule name for what is being produced.
Example: rDirection
.
:: RuleName | Will be used for the name of the resulting type, and for the name of the sole data constructor |
-> String | |
-> Rule Char |
Creates a production for a sequence of terminals. Useful for
parsing specific words. Ultimately this is simply a function
that creates a Rule
using the record
function.
In terminals n s
, For each Char
in the String
, a Rule
is
created whose RuleName
is n
followed by an apostrophe
followed by the index of the position of the Char
.
Examples: rBoulevard
.
:: RuleName | Will be used for the name of the resulting data type, and for the name of the sole data constructor |
-> Rule t | |
-> Rule t |
Creates a newtype wrapper. Example:
rCity
.
:: RuleName | The name of this rule, which is used both as the type name and for the name of the sole data constructor |
-> Seq (Rule t) | The right-hand side of this rule. This sequence can be empty, which results in an epsilon production. |
-> Rule t |
Creates a new non-terminal production rule with only one alternative where each field has a record name. The name of each record is:
_r'RULE_NAME'INDEX'FIELD_TYPE
where RULE_NAME
is the name of this rule, INDEX
is the index number
for this field (starting with 0), and FIELD_TYPE
is the type of the
field itself.
Currently there is no way to change the names of the record fields.
Example: rAddress
.
opt :: Rule t -> Rule t Source #
Creates a rule for a production that optionally produces another
rule. The name for the created Rule
is the name of the Rule
to
which this function is applied, with 'Opt
appended to the end.
Example: rOptNewline
.
star :: Rule t -> Rule t Source #
Creates a rule for the production of a sequence of other rules.
The name for the created Rule
is the name of the Rule
to which
this function is applied, with 'Star
appended.
Example: rPreSpacedWord
.
Errors
label :: Rule t -> String -> Rule t Source #
Name a Rule
for use in error messages. If you do not name a
rule using this combinator, the rule's type name will be used in
error messages.
Qualifiers
type Qualifier = String Source #
Many functions take an argument that holds the name qualifier
for the module that contains the data types created by applying a
function such as syntaxTrees
or
earleyProduct
.
You will have to make sure that these data types are in scope.
The spliced Template Haskell code has to know where to
look for these data types. If you did an unqualified import
or
if the types are in the same module as the function that takes a
Qualifier
argument, just pass the empty string here. If you did a
qualified import, use the appropriate qualifier here.
For example, if you used import qualified MyAst
, pass
"MyAst"
here. If you used import qualified
Data.MyLibrary.MyAst as MyLibrary.MyAst
, pass
"MyLibrary.MyAst"
here.
I recommend that you always create a new module and that all you
do in that module is apply syntaxTrees
or
earleyProduct
, and that you then perform an import
qualified
to bring those names into scope in the module in which
you use a function that takes a Qualifier
argument. This
avoids unlikely, but possible, issues that could otherwise arise
due to naming conflicts.
Creating data types corresponding to grammars
Makes the top-level declarations for each given Rule
and for
all ancestors of the given Rule
s. Since ancestors are
included, you can get the entire tree of types that you need by
applying this function to a single start symbol. Example:
Pinchot.Examples.SyntaxTrees.
:: Qualifier | Qualifier for data types corresponding to those created from
the |
-> Seq (Rule t) | A record is created that holds a value for each |
-> DecsQ | When spliced, this will create a single declaration that is a
record with the name a'NAME where |
Creates a record data type that holds a value of type
Prod
rString
(t, a) (p t a)
where
r
is left universally quantifiedt
is the token type (oftenChar
)a
is any additional information about each token (oftenLoc
)p
is the type of the particular production
This always creates a single product type whose name is
Productions
; currently the name cannot be configured.
Example: Pinchot.Examples.AllRulesRecord.
Typeclass instances and optics
wrappedInstances :: Seq (Rule t) -> DecsQ Source #
Creates a Wrapped
instance for each Rule
and its
ancestors, if there is an instance.
Only terminal
, wrap
,
opt
, star
, and plus
get instances of Wrapped
.
This must be spliced in the same module in which the syntax tree types are created; this way, no orphans are created. Since ancestors are included, you can get the entire tree of types that you need by applying this function to a single start symbol.
Example: Pinchot.Examples.SyntaxTrees.
bifunctorInstances :: Seq (Rule t) -> DecsQ Source #
Creates an instance of Bifunctor
for every Rule
in the Seq
as well as their ancestors.
This function must be spliced in the same module as the module in which the syntax tree types are created. This avoids problems with orphan instances. Since ancestors are included, you can get the entire tree of instances that you need by applying this function to a single start symbol.
Example: Pinchot.Examples.SyntaxTrees.
semigroupInstances :: Seq (Rule t) -> DecsQ Source #
Where possible, creates an instance of Semigroup
for every Rule
in the Seq
as well as their ancestors. Only
star
and plus
rules, as well as rules that through one or
more layers of wrap
ultimately wrap a star
or plus
rule,
get a Semigroup
instance.
This function must be spliced in the same module as the module in which the syntax tree types are created. This avoids problems with orphan instances. Since ancestors are included, you can get the entire tree of instances that you need by applying this function to a single start symbol.
Example: Pinchot.Examples.SyntaxTrees.
monoidInstances :: Seq (Rule t) -> DecsQ Source #
Where possible, creates an instance of Monoid
for every Rule
in the Seq
as well as their ancestors. Only
star
rules, as well as rules that through one or
more layers of wrap
ultimately wrap a star
rule,
get a Monoid
instance.
This function must be spliced in the same module as the module in which the syntax tree types are created. This avoids problems with orphan instances. Since ancestors are included, you can get the entire tree of instances that you need by applying this function to a single start symbol.
Example: Pinchot.Examples.SyntaxTrees.
:: Lift t | |
=> Qualifier | Qualifier for module containing the data types that will get optics |
-> Name | Type name for the terminal |
-> Seq (Rule t) | |
-> Q [Dec] |
Creates optics declarations for a Rule
, if optics can
be made for the Rule
:
terminal
gets a singlePrism
nonTerminal
gets aPrism
for each constructorrecord
gets a singleLens
wrap
,opt
,star
, andplus
do not get optics.
Each rule in the sequence of Rule
, as well as all ancestors of
those Rule
s, will be handled.
Example: Pinchot.Examples.RulesToOptics.
Creating Earley grammars
earleyGrammarFromRule Source #
:: Lift t | |
=> Qualifier | Module prefix holding the data types created with
|
-> Rule t | Create a grammar for this |
-> Q Exp |
Creates an expression that has type
Grammar
r (Prod r String (c, a) (p c a))
where r
is left universally quantified; c
is the terminal
type (often Char
), a
is arbitrary metadata about each token
(often Loc
) and p
is the data type corresponding to
the given Rule
.
Example: addressGrammar
.
:: Lift t | |
=> Qualifier | Qualifier for data types corresponding to those created from
the |
-> Qualifier | Qualifier for the type created with |
-> Seq (Rule t) | Creates an Earley grammar that contains a |
-> ExpQ | When spliced, |
Creates a Grammar
that contains a
Prod
for every given Rule
and its ancestors.
Example: addressAllProductions
.
Terminalizers
terminalizeRuleExp :: Qualifier -> Rule t -> Q Exp Source #
For the given rule, returns an expression that has type of either
Production a -> Seq (t, a)
or
Production a -> NonEmpty (t, a)
where Production
is the production corresponding to the given
Rule
, and t
is the terminal token type. NonEmpty
is
returned for productions that must always contain at least one
terminal symbol; for those that can be empty, Seq
is returned.
Example: terminalizeAddress
.
:: Qualifier | Qualifier for the module containing the data types created
from the |
-> Seq (Rule t) | |
-> Q [Dec] |
For all the given rules and their ancestors, creates
declarations that reduce the rule and all its ancestors to
terminal symbols. Each rule gets a declaration named
t'RULE_NAME
where RULE_NAME
is the name of the rule. The
type of the declaration is either
Production a -> Seq (t, a)
or
Production a -> NonEmpty (t, a)
where Production
is the production corresponding to the given
Rule
, t
is the terminal token type (often Char
), and a
is
arbitrary metadata about each token (often Loc
). NonEmpty
is
returned for productions that must always contain at least one
terminal symbol; for those that can be empty, Seq
is returned.
Example: Pinchot.Examples.Terminalize.
Locations
A location.
locations :: FoldableLL full Char => full -> Seq (Char, Loc) Source #
Takes any ListLike value based on Char
(Seq
, Text
,
String
, etc.) and creates a Seq
which pairs each Char
with
its location. Example: locatedFullParses
.
noLocations :: FoldableLL full item => full -> Seq (item, ()) Source #
Breaks a ListLike into a Seq
but does not assign locations.
Running parsers with locations
:: FoldableLL full Char | |
=> (forall r. Grammar r (Prod r String (Char, Loc) (p Char Loc))) | Earley grammar with production that you want to parse. |
-> full | |
-> ([p Char Loc], Report String (Seq (Char, Loc))) | A list of successful parses that when to the end of the source string, along with the Earley report showing possible errors. |