Copyright | (c) 2017 Johannes Hildén |
---|---|
License | BSD3 |
Maintainer | johannes@isomorphic.co |
Stability | experimental |
Portability | GHC |
Safe Haskell | Safe |
Language | Haskell2010 |
Background
In formal language theory, the Dyck language consists of all strings of
evenly balanced left and right parentheses, brackets, or some other
symbols, together with the empty word. Words in this language (named
after German mathematician Walther von Dyck) are known as Dyck words,
some examples of which are ()()()
, (())((()))
, and ((()()))()
.
The type of Dyck language considered here is defined over a binary alphabet. We will take this alphabet to be the set Σ = {(, )} in the following examples. The binary Dyck language is the subset of Σ* (the Kleene closure of Σ) of all words that satisfy two conditions:
- The number of left brackets must be the same as the number of right brackets.
- Going from left to right, for each character read, the total number of right brackets visited must be less than or equal to the number of left brackets up to the current position.
E.g., (()(()
and ())(())()
are not Dyck words.
When regarded as a combinatorial class—with the size
of a word defined as
the number of bracket pairs it contains—the counting sequence associated
with the Dyck language is the Catalan numbers.
\[ \begin{array}{ccl} \text{Size} & \text{Count} & \text{Words} \\ 0 & 1 & \epsilon \\ 1 & 1 & ⟨⟩ \\ 2 & 2 & ⟨⟩⟨⟩, \ ⟨⟨⟩⟩ \\ 3 & 5 & ⟨⟩⟨⟩⟨⟩, \ ⟨⟩⟨⟨⟩⟩, \ ⟨⟨⟩⟩⟨⟩, \ ⟨⟨⟩⟨⟩⟩, \ ⟨⟨⟨⟩⟩⟩ \\ 4 & 14 & ⟨⟩⟨⟩⟨⟩⟨⟩, \ ⟨⟩⟨⟩⟨⟨⟩⟩, \ ⟨⟩⟨⟨⟩⟩⟨⟩, \ ⟨⟩⟨⟨⟩⟨⟩⟩, \ ⟨⟩⟨⟨⟨⟩⟩⟩, \ ⟨⟨⟩⟩⟨⟩⟨⟩, \ ⟨⟨⟩⟩⟨⟨⟩⟩, \ ⟨⟨⟩⟨⟩⟩⟨⟩, \\ & & ⟨⟨⟨⟩⟩⟩⟨⟩, \ ⟨⟨⟩⟨⟩⟨⟩⟩, \ ⟨⟨⟩⟨⟨⟩⟩⟩, \ ⟨⟨⟨⟩⟩⟨⟩⟩, \ ⟨⟨⟨⟩⟨⟩⟩⟩, \ ⟨⟨⟨⟨⟩⟩⟩⟩ \\ 5 & 42 & ⟨⟩⟨⟩⟨⟩⟨⟩⟨⟩, \ ⟨⟩⟨⟩⟨⟩⟨⟨⟩⟩, \ ⟨⟩⟨⟩⟨⟨⟩⟩⟨⟩, \ ⟨⟩⟨⟩⟨⟨⟩⟨⟩⟩, \ ⟨⟩⟨⟩⟨⟨⟨⟩⟩⟩, \ \dots, \ ⟨⟨⟨⟨⟨⟩⟩⟩⟩⟩ \\ 6 & 132 & \dots \end{array} \]
- type Size = Integer
- type Rank = Integer
- data DyckWord
- empty :: DyckWord
- size :: DyckWord -> Size
- setAlphabet :: Char -> Char -> DyckWord -> DyckWord
- concatWords :: DyckWord -> DyckWord -> DyckWord
- fromText :: Text -> Either String DyckWord
- fromText' :: Text -> DyckWord
- toText :: DyckWord -> Text
- rank :: DyckWord -> Rank
- rankRelative :: DyckWord -> Rank
- unrank :: Rank -> DyckWord
- unrankRelative :: Size -> Rank -> Maybe DyckWord
- unrankRelative' :: Size -> Rank -> DyckWord
- wordsOfSize :: Size -> [DyckWord]
Types
Opaque Dyck word type. For functions to build Dyck words, see:
empty
,fromText
,fromText'
,unrank
,unrankRelative
, andunrankRelative'
.
Conceptually, every non-empty Dyck word has the form (a)b, where a and b are Dyck words. The BNF grammar for this language is given by \( \omega = \epsilon \mid ( \omega ) \omega \).
Eq DyckWord Source # | Two Dyck words are considered equal when they have the same absolute rank.
|
Ord DyckWord Source # | Dyck words of the same size are ordered lexicographically, and a smaller Dyck word always gets precedence over a bigger one.
|
Show DyckWord Source # | |
Monoid DyckWord Source # | The Dyck language forms a monoid under |
Creating and inspecting Dyck words
size :: DyckWord -> Size Source #
The size of a Dyck word is the number of bracket pairs it contains, i.e., half of the length of the word's string representation. Inductively, it can be defined as
\[ \text{size}\;w = \begin{cases} 0 && \text{if} \; w = \epsilon \\ 1 + \text{size}\;u + \text{size}\;v && \text{if} \; w = (u)v. \end{cases} \]
Change the alphabet of a Dyck word. An alphabet has two characters, here referred to as the left and right symbol, respectively. For example:
>>>
toText (setAlphabet 'x' 'o' (unrank 55))
xoxxoxxooo
concatWords :: DyckWord -> DyckWord -> DyckWord Source #
Concatenate two Dyck words. Corresponds to ordinary string concatenation.
\[ \begin{align} \epsilon +\!\!\!+\;w &= w \\ (u)v +\!\!\!+\;w &= (u)[ v +\!\!\!+\; w ] \end{align} \]
If the two words have different alphabets, the concatenated word will inherit the first word's symbol set.
Textual form
Ranking and unranking
To rank a Dyck word is to determine its position in some ordered sequence of words. The dual of this—to produce the Dyck word corresponding to a position in said sequence—is called unranking. The order we are interested in here is known as shortlex, and it demands that a smaller Dyck word always gets precedence over a bigger one. When comparing words of the same size, normal lexicographical order applies.
Relative vs. absolute rank
In this library, we adopt the following (non-standard) terminology.
The relative rank of a Dyck word w is its position in the sequence of
those words with the same size as w. By contrast, the absolute rank
means a word's position in the shortlex sequence of all Dyck words.
For example: The (absolute) rank of (((())))
is 9, but the relative rank
of the same word is 0, since it is the first word of size four.
\[ \begin{array}{lccc} \text{Word} & \text{Size} & \text{Rank} & \text{Relative rank} \\ \epsilon & 0 & 0 & 0 \\ ⟨⟩ & 1 & 1 & 0 \\ ⟨⟨⟩⟩ & 2 & 2 & 0 \\ ⟨⟩⟨⟩ & 2 & 3 & 1 \\ ⟨⟨⟨⟩⟩⟩ & 3 & 4 & 0 \\ ⟨⟨⟩⟨⟩⟩ & 3 & 5 & 1 \\ ⟨⟨⟩⟩⟨⟩ & 3 & 6 & 2 \\ ⟨⟩⟨⟨⟩⟩ & 3 & 7 & 3 \\ ⟨⟩⟨⟩⟨⟩ & 3 & 8 & 4 \\ ⟨⟨⟨⟨⟩⟩⟩⟩ & 4 & 9 & 0 \\ ⟨⟨⟨⟩⟨⟩⟩⟩ & 4 & 10 & 1 \\ \;\;\;\; \vdots & \vdots & \vdots & \vdots \end{array} \]
If we let \(r(w)\) denote the relative rank of a word \(w\), and \(C_i\) the \(i^{th}\) Catalan number, then the absolute rank \(R(w)\) of \(w\) is given by the formula \( R(w) = r(w) + \sum_{i=0}^{s-1} C_i, \) where \(s\) is the size of \(w\).
rank :: DyckWord -> Rank Source #
Return the absolute rank of a Dyck word. That is, its position in the (shortlex) ordered sequence of all Dyck words.
rankRelative :: DyckWord -> Rank Source #
Return the relative rank of a Dyck word.
unrank :: Rank -> DyckWord Source #
Return the n-th Dyck word in the (shortlex) ordered sequence of all Dyck words.
unrankRelative' :: Size -> Rank -> DyckWord Source #
Unsafe version of unrankRelative
. This function throws an error if the
given rank is outside of the valid range.
wordsOfSize :: Size -> [DyckWord] Source #
Return a lexicographically ordered list with all Dyck words of a specific size.