Earley-0.12.0.1: Parsing all context-free grammars using Earley's algorithm.

Safe HaskellNone
LanguageHaskell2010

Text.Earley

Contents

Description

Parsing all context-free grammars using Earley's algorithm.

Synopsis

Context-free grammars

data Prod r e t a Source #

A production.

The type parameters are:

a: The return type of the production.

t for terminal: The type of the terminals that the production operates on.

e for expected: The type of names, used for example to report expected tokens.

r for rule: The type of a non-terminal. This plays a role similar to the s in the type ST s a. Since the parser function expects the r to be universally quantified, there is not much to do with this parameter other than leaving it universally quantified.

As an example, Prod r String Char Int is the type of a production that returns an Int, operates on (lists of) characters and reports String names.

Most of the functionality of Prods is obtained through its instances, e.g. Functor, Applicative, and Alternative.

Instances

Functor (Prod r e t) Source # 

Methods

fmap :: (a -> b) -> Prod r e t a -> Prod r e t b #

(<$) :: a -> Prod r e t b -> Prod r e t a #

Applicative (Prod r e t) Source # 

Methods

pure :: a -> Prod r e t a #

(<*>) :: Prod r e t (a -> b) -> Prod r e t a -> Prod r e t b #

(*>) :: Prod r e t a -> Prod r e t b -> Prod r e t b #

(<*) :: Prod r e t a -> Prod r e t b -> Prod r e t a #

Alternative (Prod r e t) Source # 

Methods

empty :: Prod r e t a #

(<|>) :: Prod r e t a -> Prod r e t a -> Prod r e t a #

some :: Prod r e t a -> Prod r e t [a] #

many :: Prod r e t a -> Prod r e t [a] #

(IsString t, Eq t, (~) * a t) => IsString (Prod r e t a) Source #

String literals can be interpreted as Terminals that match that string.

>>> :set -XOverloadedStrings
>>> import Data.Text (Text)
>>> let determiner = "the" <|> "a" <|> "an" :: Prod r e Text Text

Methods

fromString :: String -> Prod r e t a #

Monoid (Prod r e t a) Source # 

Methods

mempty :: Prod r e t a #

mappend :: Prod r e t a -> Prod r e t a -> Prod r e t a #

mconcat :: [Prod r e t a] -> Prod r e t a #

terminal :: (t -> Maybe a) -> Prod r e t a Source #

Match a token for which the given predicate returns Just a, and return the a.

(<?>) :: Prod r e t a -> e -> Prod r e t a infixr 0 Source #

A named production (used for reporting expected things).

data Grammar r a Source #

A context-free grammar.

The type parameters are:

a: The return type of the grammar (often a Prod).

r for rule: The type of a non-terminal. This plays a role similar to the s in the type ST s a. Since the parser function expects the r to be universally quantified, there is not much to do with this parameter other than leaving it universally quantified.

Most of the functionality of Grammars is obtained through its instances, e.g. Monad and MonadFix. Note that GHC has syntactic sugar for MonadFix: use {-# LANGUAGE RecursiveDo #-} and mdo instead of do.

Instances

Monad (Grammar r) Source # 

Methods

(>>=) :: Grammar r a -> (a -> Grammar r b) -> Grammar r b #

(>>) :: Grammar r a -> Grammar r b -> Grammar r b #

return :: a -> Grammar r a #

fail :: String -> Grammar r a #

Functor (Grammar r) Source # 

Methods

fmap :: (a -> b) -> Grammar r a -> Grammar r b #

(<$) :: a -> Grammar r b -> Grammar r a #

MonadFix (Grammar r) Source # 

Methods

mfix :: (a -> Grammar r a) -> Grammar r a #

Applicative (Grammar r) Source # 

Methods

pure :: a -> Grammar r a #

(<*>) :: Grammar r (a -> b) -> Grammar r a -> Grammar r b #

(*>) :: Grammar r a -> Grammar r b -> Grammar r b #

(<*) :: Grammar r a -> Grammar r b -> Grammar r a #

rule :: Prod r e t a -> Grammar r (Prod r e t a) Source #

Create a new non-terminal by giving its production.

Derived operators

satisfy :: (t -> Bool) -> Prod r e t t Source #

Match a token that satisfies the given predicate. Returns the matched token.

token :: Eq t => t -> Prod r e t t Source #

Match a single token.

namedToken :: Eq t => t -> Prod r t t t Source #

Match a single token and give it the name of the token.

list :: Eq t => [t] -> Prod r e t [t] Source #

Match a list of tokens in sequence.

listLike :: (Eq t, ListLike i t) => i -> Prod r e t i Source #

Match a ListLike of tokens in sequence.

Deprecated operators

symbol :: Eq t => t -> Prod r e t t Source #

Deprecated: Use token instead

namedSymbol :: Eq t => t -> Prod r e t t Source #

Deprecated: Use namedToken instead

word :: Eq t => [t] -> Prod r e t [t] Source #

Deprecated: Use list or listLike instead

Parsing

data Report e i Source #

A parsing report, which contains fields that are useful for presenting errors to the user if a parse is deemed a failure. Note however that we get a report even when we successfully parse something.

Constructors

Report 

Fields

  • position :: Int

    The final position in the input (0-based) that the parser reached.

  • expected :: [e]

    The named productions processed at the final position.

  • unconsumed :: i

    The part of the input string that was not consumed, which may be empty.

Instances

(Eq i, Eq e) => Eq (Report e i) Source # 

Methods

(==) :: Report e i -> Report e i -> Bool #

(/=) :: Report e i -> Report e i -> Bool #

(Ord i, Ord e) => Ord (Report e i) Source # 

Methods

compare :: Report e i -> Report e i -> Ordering #

(<) :: Report e i -> Report e i -> Bool #

(<=) :: Report e i -> Report e i -> Bool #

(>) :: Report e i -> Report e i -> Bool #

(>=) :: Report e i -> Report e i -> Bool #

max :: Report e i -> Report e i -> Report e i #

min :: Report e i -> Report e i -> Report e i #

(Read i, Read e) => Read (Report e i) Source # 
(Show i, Show e) => Show (Report e i) Source # 

Methods

showsPrec :: Int -> Report e i -> ShowS #

show :: Report e i -> String #

showList :: [Report e i] -> ShowS #

data Result s e i a Source #

The result of a parse.

Constructors

Ended (Report e i)

The parser ended.

Parsed (ST s [a]) Int i (ST s (Result s e i a))

The parser parsed a number of as. These are given as a computation, ST s [a] that constructs the as when run. We can thus save some work by ignoring this computation if we do not care about the results. The Int is the position in the input where these results were obtained, the i the rest of the input, and the last component is the continuation.

Instances

Functor (Result s e i) Source # 

Methods

fmap :: (a -> b) -> Result s e i a -> Result s e i b #

(<$) :: a -> Result s e i b -> Result s e i a #

type Parser e i a = forall s. i -> ST s (Result s e i a) Source #

parser :: ListLike i t => (forall r. Grammar r (Prod r e t a)) -> Parser e i a Source #

Create a parser from the given grammar.

allParses :: Parser e i a -> i -> ([(a, Int)], Report e i) Source #

Return all parses from the result of a given parser. The result may contain partial parses. The Ints are the position at which a result was produced.

The elements of the returned list of results are sorted by their position in ascending order. If there are multiple results at the same position they are returned in an unspecified order.

fullParses :: ListLike i t => Parser e i a -> i -> ([a], Report e i) Source #

Return all parses that reached the end of the input from the result of a given parser.

If there are multiple results they are returned in an unspecified order.

Recognition

report :: Parser e i a -> i -> Report e i Source #

See e.g. how far the parser is able to parse the input string before it fails. This can be much faster than getting the parse results for highly ambiguous grammars.

Language generation

type Generator t a = forall s. ST s (Result s t a) Source #

generator :: (forall r. Grammar r (Prod r e t a)) -> [t] -> Generator t a Source #

Create a language generator for given grammar and list of allowed tokens.

language :: Generator t a -> [(a, [t])] Source #

Run a generator, returning all members of the language.

The members are returned as parse results paired with the list of tokens used to produce the result. The elements of the returned list of results are sorted by their length in ascending order. If there are multiple results of the same length they are returned in an unspecified order.

upTo :: Int -> Generator t a -> [(a, [t])] Source #

upTo n gen runs the generator gen, returning all members of the language that are of length less than or equal to n.

The members are returned as parse results paired with the list of tokens used to produce the result. The elements of the returned list of results are sorted by their length in ascending order. If there are multiple results of the same length they are returned in an unspecified order.

exactly :: Int -> Generator t a -> [(a, [t])] Source #

exactly n gen runs the generator gen, returning all members of the language that are of length equal to n.

The members are returned as parse results paired with the list of tokens used to produce the result. If there are multiple results they are returned in an unspecified order.