FormalGrammars-0.2.0.0: (Context-free) grammars in formal language theory

Safe HaskellNone
LanguageHaskell2010

FormalLanguage.CFG.Grammar.Types

Description

The data types that define a CFG.

Synopsis

Documentation

data Index Source

Encode the index of the syntactic or terminal variable.

In case of grammar-based indexing, keep indexRange empty. The indexStep keeps track of any +k / -k given in the production rules.

We allow indexing terminals now, too. When glueing together terminals, one might want to be able to differentiate between terminals.

Constructors

Index 

Instances

newtype SymbolName Source

Newtype wrapper for symbol names.

Constructors

SymbolName 

Fields

_getSteName :: String
 

newtype Tape Source

The tape, a terminal operates on. Terminals on different tapes could still have the same SymbolName but different type and input!

Constructors

Tape 

Fields

_getTape :: Int
 

Instances

data SynTermEps Source

Symbols, potentially with an index or more than one.

Constructors

SynVar

Syntactic variables.

Fields

_name :: SymbolName
 
_index :: [Index]
 
SynTerm 

Fields

_name :: SymbolName
 
_index :: [Index]
 
Term

Regular old terminal symbol -- reads stuff from the input.

Fields

_name :: SymbolName
 
_index :: [Index]
 
Deletion

This sym denotes the case, where we have an Deletion terminal, i.e. something is matched to nothing. This is actually just a regular terminal symbol, we just treat it differently.

Epsilon

Finally, a real epsilon. Again, these are somewhat regular terminal symbols, but it is important to be able to recognize these, when trying to create outside variants of our algorithms.

newtype Symbol Source

The length of the list encodes the dimension of the symbol. Forms a monoid over dimensional concatenation.

Constructors

Symbol 

newtype AttributeFunction Source

The name of an attribute function

Constructors

Attr 

Fields

_getAttr :: String
 

data Rule Source

Production rules for at-most CFGs.

Constructors

Rule 

Fields

_lhs :: Symbol

the left-hand side of the rule

_attr :: [AttributeFunction]

the attribute for this rule

_rhs :: [Symbol]

the right-hand side with a collection of terminals and syntactic variables

Instances

data Grammar Source

Complete descrition of a grammar. In principle it would be enough to hold _rules and the _start symbol name. We also store dimensionless names for syntactiv variables, and terminals. This makes certain checks easier or possible.

We store all single-tape symbol names dimensionless. This means that, for terminals, symbols with the same name have the same tape. This is slightly inconvenient for special applications (say Protein-DNA alignment) but one can easily rename terminals.

TODO better way to handle indexed symbols?

Constructors

Grammar 

Fields

_synvars :: Map SymbolName SynTermEps

regular syntactic variables, without dimension

_synterms :: Map SymbolName SynTermEps

Terminal synvars are somewhat weird. They are used in Outside grammars, and hold previously calculated inside values.

_termvars :: Map SymbolName SynTermEps

regular terminal symbols

_outside :: DerivedGrammar

Is this an automatically derived outside grammar

_rules :: Set Rule

set of production rules

_start :: Symbol

start symbol

_params :: Map String Index

any global variables

_grammarName :: String

grammar name

_write :: Bool

some grammar file requested this grammar to be expanded into code -- TODO remove, we have an emission queue