Text.ParserCombinators.UU.Core

Contents

Provides
Eof
Location
The type describing parsers: P
- Parsers are functors: fmap
- Parsers are Applicative: <*>, <*, *> and pure
- Parsers are Alternative: <|> and empty
- An alternative for the Alternative, which is greedy: <<|>
- Parsers can recognise single tokens: pSym and pSymExt
- Parsers are Monads: >>= and return
Additional useful combinators
Maintaining Progress Information
Auxiliary functions and types
- Checking for non-sensical combinations: must_be_non_empty and must_be_non_empties
- The type Nat for describing the minimal number of tokens consumed

Description

The module Core contains the basic functionality of the parser library. It uses the breadth-first module to realise online generation of results, the error correction administration, and dealing with ambiguous grammars; it defines the types of the elementary parsers and recognisers involved. For typical use cases of the library see the module Text.ParserCombinators.UU.Examples.

`Provides`

class Provides state symbol token | state symbol -> token where

The function splitState plays a crucial role in splitting up the state. The symbol parameter tells us what kind of thing, and even which value of that kind, is expected from the input. The state and the symbol type together determine what kind of token has to be returned. Since the function is overloaded we do not have to invent all kinds of different names for our elementary parsers.

Methods

splitState :: symbol -> (token -> state -> Steps a) -> state -> Steps a

Instances

(Eq a, Show a, IsLocationUpdatedBy loc a) => Provides (Str a loc) a a
(Show a, Eq a, IsLocationUpdatedBy loc [a]) => Provides (Str a loc) (Token a) [a]
(Show a, IsLocationUpdatedBy loc [a]) => Provides (Str a loc) (Munch a) [a]
(Ord a, Show a, IsLocationUpdatedBy loc a) => Provides (Str a loc) (a, a) a
(Show a, IsLocationUpdatedBy loc a) => Provides (Str a loc) (a -> Bool, String, a) a
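
The way one overloaded function can serve several kinds of symbol can be sketched in isolation. The block below is a simplified stand-in, not the library's class: the names ProvidesSketch, splitSketch, letterA and lower are hypothetical, the state is assumed to be a plain String, and the result is a Maybe instead of Steps a. It only illustrates how the functional dependency lets the symbol type (a single character, or a pair denoting a range) select the instance and fix the token type.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

-- Simplified stand-in for the library class (assumption): the continuation
-- result is wrapped in Maybe rather than in Steps.
class ProvidesSketch state symbol token | state symbol -> token where
  splitSketch :: symbol -> (token -> state -> r) -> state -> Maybe r

-- A single character as symbol: accept exactly that character.
instance ProvidesSketch String Char Char where
  splitSketch expected k (c:cs) | c == expected = Just (k c cs)
  splitSketch _ _ _ = Nothing

-- A pair as symbol: accept any character in the inclusive range.
instance ProvidesSketch String (Char, Char) Char where
  splitSketch (lo, hi) k (c:cs) | lo <= c && c <= hi = Just (k c cs)
  splitSketch _ _ _ = Nothing

-- The same function name is used for both kinds of symbol.
letterA :: String -> Maybe (Char, String)
letterA = splitSketch 'a' (\tok rest -> (tok, rest))

lower :: String -> Maybe (Char, String)
lower = splitSketch ('a', 'z') (\tok rest -> (tok, rest))
```

In this sketch the continuation receives the recognised token and the remaining state, which mirrors the shape of the second argument of splitState.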

`Eof`

class Eof state where

Methods

eof :: state -> Bool

deleteAtEnd :: state -> Maybe (Cost, state)

Instances

Show a => Eof (Str a loc)

`Location`

class IsLocationUpdatedBy loc a where

The input state may contain a location which can be used in error messages. Since we do not want to fix our input to be just a String, we provide an interface which can be used to advance the location by passing its information in the function splitState.

Methods

advance :: loc -> a -> loc

Instances

IsLocationUpdatedBy (Int, Int) Char
IsLocationUpdatedBy (Int, Int) String
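
As an illustration of the interface, the following sketch re-declares the class locally and gives two instances for a (line, column) pair. The counting rules used here (a newline moves to the next line and resets the column to 0, every other character adds one column) are an assumption made for the example and are not taken from the library's own instances.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- Local re-declaration of the class shown above, so the sketch is self-contained.
class IsLocationUpdatedBy loc a where
  advance :: loc -> a -> loc

-- Assumed rule: newline starts a new line at column 0,
-- any other character moves one column to the right.
instance IsLocationUpdatedBy (Int, Int) Char where
  advance (line, _)   '\n' = (line + 1, 0)
  advance (line, col) _    = (line, col + 1)

-- A whole string advances the location one character at a time.
instance IsLocationUpdatedBy (Int, Int) String where
  advance = foldl advance

-- Location reached after consuming a piece of text, starting at line 1, column 0.
locationAfter :: String -> (Int, Int)
locationAfter = advance (1 :: Int, 0 :: Int)
```

Because the location type is a class parameter, a different input type only needs its own instance; the parsers themselves do not change.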

The type describing parsers: `P`

data P st a

Constructors

P (forall r. (a -> st -> Steps r) -> st -> Steps r) (forall r. (st -> Steps r) -> st -> Steps (a, r)) (forall r. (st -> Steps r) -> st -> Steps r) Nat (Maybe a)

Instances

Monad (P st)
Functor (P state)
MonadPlus (P st)
Applicative (P state)
Alternative (P state)

Parsers are functors: `fmap`

Parsers are Applicative: `<*>`, `<*`, `*>` and `pure`

Parsers are Alternative: `<|>` and `empty`

An alternative for the Alternative, which is greedy: `<<|>`

Parsers can recognise single tokens: `pSym` and `pSymExt`

pSymExt :: Provides state symbol token => Nat -> Maybe token -> symbol -> P state token

Many parsing libraries do not make a distinction between the terminal symbols of the language recognised and the tokens actually constructed from the input. The distinction matters e.g. when we want to recognise an integer or an identifier: we are also interested in which integer occurred in the input, or which identifier. The function pSymExt takes as argument a value of some type symbol, and returns a value of type token. The parser will in general depend on some state which is maintained holding the input. The functional dependency fixes the token type, based on the symbol type and the state type of the parser. Since pSymExt is overloaded, both the type and the value of symbol determine how to decompose the input into a token and the remaining input. pSymExt takes two extra parameters: the first describes the minimal number of tokens recognised, and the second tells whether the symbol can recognise the empty string and, if so, which value is to be returned in that case.

pSym :: Provides state symbol token => symbol -> P state token

pSym covers the most common case of recognising a symbol: a single token is removed from the input, and it cannot recognise the empty string.

Parsers are Monads: `>>=` and `return`

Additional useful combinators

Controlling the text of error reporting: `<?>`

(<?>) :: P state a -> String -> P state a

The parsers build a list of symbols which are expected at a specific point. This list is used to report errors. Quite often it is more informative to get e.g. the name of the non-terminal. The <?> combinator replaces this list of symbols by its right-hand side argument.

Parsers can be disambiguated using micro-steps: `micro`

Dealing with (non-empty) Ambiguous parsers: `amb`

amb :: P st a -> P st [a]

Parse errors can be retrieved from the state: `pErrors`

class Stores state error | state -> error where

getErrors retrieves the correcting steps made since the last time the function was called. The result can, using a monad, be used to control how to proceed with the parsing process.

Methods

getErrors :: state -> ([error], state)

Instances

Stores (Str a loc) (Error loc)

pErrors :: Stores st error => P st [error]

The current position can be retrieved from the state: `pPos`

class HasPosition state pos | state -> pos where

pPos retrieves the current position from the state. The result can, using a monad, be used to control how to proceed with the parsing process.

Methods

getPos :: state -> pos

Instances

HasPosition (Str a loc) loc

pPos :: HasPosition st pos => P st pos

Starting and finalising the parsing process: `pEnd` and `parse`

pEnd :: (Stores st error, Eof st) => P st [error]

The function pEnd should be called at the end of the parsing process. It deletes any unconsumed input, and reports its presence as an error.

parse :: Eof t => P t a -> t -> a

The state may temporarily change type: `pSwitch`

pSwitch :: (st1 -> (st2, st2 -> st1)) -> P st2 a -> P st1 a

pSwitch takes the current state and converts it into a different type of state, to which its argument parser is applied. The second component of the result is a function which converts the remaining state of this parser back into a value of the original type.
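
The shape of the pSwitch signature can be illustrated with a toy parser type. Everything in the block below (Toy, switchToy, item, onInput) is hypothetical and far simpler than the library's P: a toy parser is just a function from a state to an optional result plus remaining state. The sketch only shows how the split function yields both the inner state and the way back to the outer state.

```haskell
-- Toy parser type (assumption, NOT the library's P).
newtype Toy st a = Toy { runToy :: st -> Maybe (a, st) }

-- Same argument structure as pSwitch: split the outer state into an inner
-- state plus a function that rebuilds the outer state afterwards.
switchToy :: (st1 -> (st2, st2 -> st1)) -> Toy st2 a -> Toy st1 a
switchToy split (Toy p) = Toy $ \outer ->
  let (inner, back) = split outer
  in case p inner of
       Nothing          -> Nothing
       Just (a, inner') -> Just (a, back inner')

-- An inner parser that only knows about String input.
item :: Toy String Char
item = Toy $ \s -> case s of
  (c:cs) -> Just (c, cs)
  []     -> Nothing

-- The outer state carries an extra Int next to the input; the inner parser
-- never sees it, and it is restored unchanged afterwards.
onInput :: Toy String a -> Toy (Int, String) a
onInput = switchToy (\(n, s) -> (s, \s' -> (n, s')))
```

Returning the conversion function from the split, rather than passing it separately, lets it capture whatever part of the outer state the inner parser does not use.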

Maintaining Progress Information

type Cost = Int

The data type Steps is the core data type around which the parsers are constructed. It describes a tree structure of streams containing (in an interleaved way) both the online result of the parsing process and progress information. Recognising an input token should correspond to a certain amount of Progress, which tells how much of the input state was consumed. The Progress is used to implement the breadth-first search process, in which alternatives are examined in a more-or-less synchronised way. The meaning of the various Steps constructors is as follows:

Step: A token was successfully recognised, and as a result the input was advanced by the distance Progress.
Apply: The type of value represented by the Steps changes by applying the function parameter.
Fail: A correcting step has to be made to the input; the first parameter contains information about what was expected in the input, and the second parameter describes the various corrected alternatives, each with an associated Cost.
Micro: A small cost is inserted in the sequence, which is used to disambiguate. Use with care!

The last two constructors of Steps, End_h and End_f, play a role in recognising ambiguous non-terminals. For a full description see the technical report referred to from the README file.

type Progress = Int

type Strings = [String]

data Steps a where

Constructors

Step :: Progress -> Steps a -> Steps a
Apply :: forall a b. (b -> a) -> Steps b -> Steps a
Fail :: Strings -> [Strings -> (Cost, Steps a)] -> Steps a
Micro :: Cost -> Steps a -> Steps a
End_h :: ([a], [a] -> Steps r) -> Steps (a, r) -> Steps (a, r)
End_f :: [Steps a] -> Steps a -> Steps a
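
To make the interleaving of results and progress concrete, here is a reduced sketch of such a structure. It keeps only Step, Apply and Micro from the constructors above and adds a hypothetical leaf constructor Done so that small values can be built by hand; Fail, End_h and End_f are left out. The functions evalSketch and totalProgress are illustrative names, not the library's eval.

```haskell
{-# LANGUAGE GADTs #-}

type Progress = Int
type Cost     = Int

-- Reduced stand-in for Steps (assumption): no error correction, no ambiguity.
data StepsSketch a where
  Step  :: Progress -> StepsSketch a -> StepsSketch a
  Apply :: (b -> a) -> StepsSketch b -> StepsSketch a
  Micro :: Cost -> StepsSketch a -> StepsSketch a
  Done  :: a -> StepsSketch a          -- hypothetical leaf, not in the library

-- Extract the result: progress and cost markers are skipped,
-- Apply nodes transform the result of the rest of the stream.
evalSketch :: StepsSketch a -> a
evalSketch (Step _ rest)  = evalSketch rest
evalSketch (Apply f rest) = f (evalSketch rest)
evalSketch (Micro _ rest) = evalSketch rest
evalSketch (Done x)       = x

-- Sum the progress information, ignoring the result.
totalProgress :: StepsSketch a -> Progress
totalProgress (Step n rest)  = n + totalProgress rest
totalProgress (Apply _ rest) = totalProgress rest
totalProgress (Micro _ rest) = totalProgress rest
totalProgress (Done _)       = 0

-- Two recognised tokens with a result transformation in between.
example :: StepsSketch Int
example = Step 1 (Apply (+ 1) (Step 2 (Done 41)))
```

Keeping progress and result in one stream is what allows a consumer to compare alternatives by progress alone, without forcing their results.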

eval :: Steps a -> a

push :: v -> Steps r -> Steps (v, r)

apply :: Steps (b -> a, (b, r)) -> Steps (a, r)

pushapply :: (b -> a) -> Steps (b, r) -> Steps (a, r)

norm :: Steps a -> Steps a

best :: Steps a -> Steps a -> Steps a

best' :: Steps b -> Steps b -> Steps b

getCheapest :: Int -> [(Int, Steps a)] -> Steps a

traverse :: Int -> Steps a -> Int -> Int -> Int

removeEnd_h :: Steps (a, r) -> Steps r

removeEnd_f :: Steps r -> Steps [r]

Auxiliary functions and types

Checking for non-sensical combinations: `must_be_non_empty` and `must_be_non_empties`

must_be_non_empty :: [Char] -> P t t1 -> t2 -> t2

The function checks whether its second argument is a parser which can recognise the empty sequence. If so, an error message is given using the name of the context. If not, the third argument is returned. This is useful in testing for illogical combinations. For its use see the module Text.ParserCombinators.UU.Derived.

must_be_non_empties :: [Char] -> P t1 t -> P t3 t2 -> t4 -> t4

This function is similar to the above, but can be used in situations where we recognise a sequence of elements separated by other elements. This does not make sense if both parsers can recognise the empty string. Your grammar is then highly ambiguous.
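
The check itself is small and can be sketched on a toy parser representation. In the block below ToyP, mustBeNonEmpty and guardedMany are hypothetical names; ToyP models only the last field of P (the Maybe holding the result for empty input), and the error text is invented for the example.

```haskell
-- Toy stand-in for P (assumption): only the "result on empty input" is modelled.
-- Nothing means the parser cannot recognise the empty string.
newtype ToyP a = ToyP { emptyResult :: Maybe a }

-- Same argument order as must_be_non_empty: context name, parser to check,
-- and the value to return when the check passes.
mustBeNonEmpty :: String -> ToyP a -> b -> b
mustBeNonEmpty context (ToyP (Just _)) _ =
  error ("The combinator " ++ context
         ++ " requires an argument that cannot recognise the empty string")
mustBeNonEmpty _ _ result = result

-- A hypothetical iterating combinator would guard its own definition like this.
guardedMany :: ToyP a -> String
guardedMany p = mustBeNonEmpty "guardedMany" p "iteration is well defined"
```

Since the check inspects only a statically known field of the parser, it can run when the grammar is built rather than while input is being parsed.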

The type `Nat` for describing the minimal number of tokens consumed

data Nat

The data type Nat is used to represent the minimal length of a parser. Care should be taken not to evaluate the right-hand side of the binary functions nat_min and nat_add more than necessary.

Constructors

Zero
Succ Nat
Infinite

Instances

Show Nat
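
Why the right-hand argument must not be evaluated too eagerly becomes clear with a recursive grammar, where the minimal length of a non-terminal is defined in terms of itself. The sketch below re-declares Nat locally and defines natMin and natAdd as illustrative functions; these are assumptions written for this example and are not claimed to match the library's nat_min and nat_add.

```haskell
-- Local copy of the data type above, with Eq added for the example.
data Nat = Zero | Succ Nat | Infinite
  deriving (Eq, Show)

-- Minimum of two lengths. The first clause never inspects the right argument,
-- and the recursive case exposes a Succ before looking any deeper.
natMin :: Nat -> Nat -> Nat
natMin Zero      _         = Zero
natMin Infinite  r         = r
natMin l         Infinite  = l
natMin _         Zero      = Zero
natMin (Succ l)  (Succ r)  = Succ (natMin l r)

-- Sum of two lengths; the right argument is only passed along, never inspected.
natAdd :: Nat -> Nat -> Nat
natAdd Zero     r = r
natAdd Infinite _ = Infinite
natAdd (Succ l) r = Succ (natAdd l r)

-- Minimal length of a hypothetical grammar  list ::= x | x list.
-- The definition refers to itself, yet evaluates to Succ Zero because
-- natMin stops before it needs the recursive occurrence.
minLen :: Nat
minLen = natMin (Succ Zero) (natAdd (Succ Zero) minLen)
```

With a version that forced its right argument completely, the definition of minLen would loop instead of producing a result.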

module Control.Applicative
