Safe Haskell | Safe-Inferred |
---|---|
Language | Haskell2010 |
Data.Position
Contents
- data Position = Position {}
- type Advance s = RE s (Position -> Position)
- data Positioned a = Positioned a Position
- type PosRE s a = RE (Positioned s) (Positioned a)
- zeroPosition :: Position
- firstPosition :: Position
- emptyAdvance :: Advance s
- defaultAdvance :: Advance s
- psymAdvance :: (s -> Bool) -> (Position -> Position) -> Advance s
- symAdvance :: Eq s => s -> (Position -> Position) -> Advance s
- linecharAdvance :: Eq s => s -> Int -> Advance s
- stringAdvance :: Eq s => [s] -> (Position -> Position) -> Advance s
- newlineAdvance :: Eq s => [s] -> Advance s
- commonAdvance :: Int -> Bool -> Bool -> Bool -> Bool -> Advance Char
- (<++>) :: Advance s -> Advance s -> Advance s
- tryAdvance :: Advance s -> Position -> [s] -> (Position, [s])
- tryAdvanceC :: Advance s -> Position -> s -> Position
- advance :: Advance s -> Position -> [s] -> (Position, [s])
- advanceC :: Advance s -> Position -> s -> Position
- defaultAnnotate :: Position -> [s] -> [Positioned s]
- enrichOnce :: Advance s -> Position -> [s] -> ([Positioned s], Position, [s])
- enrichOnceD :: (Position -> [s] -> [Positioned s]) -> Advance s -> Advance s -> Position -> [s] -> ([Positioned s], Position, [s])
- enrich :: Advance s -> [s] -> ([Positioned s], Position)
- enrichD :: (Position -> [s] -> [Positioned s]) -> Advance s -> Advance s -> [s] -> ([Positioned s], Position, [s])
- bless :: RE s a -> PosRE s a
- tokens :: Advance s -> RE s a -> [s] -> ([Positioned a], Maybe (Positioned s))
- textInfo :: Advance s -> [s] -> (Int, Int, Int)
Types
Represents a position in a text. The intended usage is holding the next available position in a file. In other words: If a character would be appended to the file, what its position would be.
Constructors
Position | |
type Advance s = RE s (Position -> Position) Source
Represents an advancement of the next available position marker due to
reading a character. For example, the letter A moves forward by one column,
while linefeed ('\n'
) moves to the beginning of the next line.
The character type is a type parameter.
An advance includes a pattern and a change. The pattern determines to which characters, or character sequences, this advance applies. The change determines how to advance the position in the pattern is matched. It can also choose different advances depending on the match, e.g. "move 1 column if matched "a" and move 4 columns if matched "t".
data Positioned a Source
A value with a position attached.
Constructors
Positioned a Position |
Instances
Eq a => Eq (Positioned a) | |
Show a => Show (Positioned a) |
type PosRE s a = RE (Positioned s) (Positioned a) Source
Applicative regex (Text.Regex.Applicative) which takes position-tagged symbols and returns a position-tagged result.
Special Positions
zeroPosition :: Position Source
The position before the first character in a file, to be used as an initial value before reading actual characters.
firstPosition :: Position Source
The position of the first character in a file.
Special Advances
emptyAdvance :: Advance s Source
The zero advance. It doesn't match any input and doesn't consume any characters. Applying it doesn't change the position.
defaultAdvance :: Advance s Source
The default advance when reading a character, e.g. a letter or a digit. The new character would have column number higher by 1, and character index higher by once (advances by 1 for each character read). The pattern accepts any single character.
Creating Advances
psymAdvance :: (s -> Bool) -> (Position -> Position) -> Advance s Source
Create an advance for a single character based on a predicate.
symAdvance :: Eq s => s -> (Position -> Position) -> Advance s Source
Create an advance for the given character.
Create an advance for a line character with the specified width. This is mainly useful for tabs and perhaps the various space characters in Unicode. Example for tab:
tabAdv = linecharAdvance '\t' 8
stringAdvance :: Eq s => [s] -> (Position -> Position) -> Advance s Source
Create an advance for the given character sequence.
newlineAdvance :: Eq s => [s] -> Advance s Source
Create an advance for a character or sequence of characters expressing a newline, i.e. starting a new line. As the advance expresses the position after the character, applying the advance results with a position at column 1.
Arguments
:: Int | Tab width (usually 2, 4 or 8) |
-> Bool | Whether carriage return (CR) counts as a newline |
-> Bool | Whether linefeed (LF) counts as a newline |
-> Bool | Whether the sequence CR LF counts as a newline |
-> Bool | Whether formfeed (FF) counts as a newline |
-> Advance Char |
Create a set of common advances supporting tabs and newlines. More
advances can easily be added by |
ing them to the result. The result
doesn't include the default advance.
(<++>) :: Advance s -> Advance s -> Advance s infixl 4 Source
Concatenate two advances into a single advance accepting their patterns
in order, and applying the advances on top of each other. For example,
concatenating an advance for
and an advance for a
results with an
advance accepting b
"ab"
and moving the position 2 columns forward.
Applying Advances
tryAdvance :: Advance s -> Position -> [s] -> (Position, [s]) Source
Given a list of remaining characters to read, the next position in the file and a set of advance rules, try to consume characters once and determine what is the next position after reading them. Example:
>>>
tryAdvance defaultAdvance (Position 1 1 1) "abc"
(Position 1 2 2,"bc")
If there is no match, it returns the input position and the input list, i.e. no characters will be consumed.
tryAdvanceC :: Advance s -> Position -> s -> Position Source
Like tryAdvance
, but reads one character at most. In the general case
you'll want to use tryAdvance
, because tryAdvanceC
breaks chains. For
example, while tryAdvance
can recognize "rn"
as a single newline,
tryAdvanceC
will consume only the '\r'
, splitting the string into 2
newlines.
If there is no match, the input position is returned.
advance :: Advance s -> Position -> [s] -> (Position, [s]) Source
Given a list of remaining characters to read, the next position in the file and a set of advance rules, consume characters once and determine what is the next position after reading them.
The defaultAdvance
is appended (using <|>
) to the given advance.
Therefore, if the given list isn't empty, at leat character will be
consumed. The intended use is to encode all the special cases (tab,
newlines, non-spacing marks, etc.) in the given advance, and let the
defaultAdvance
catch the rest.
advanceC :: Advance s -> Position -> s -> Position Source
Like advance
, but reads exactly one character. Patterns which require
more than one character fail to match. Like tryAdvanceC
, but has the
defaultAdvance
appended, which means is always consumes given a non-empty
list.
Utilities Based on Advances
defaultAnnotate :: Position -> [s] -> [Positioned s] Source
Given the next position and a list matched there, annotate the symbols with position information. For a single character, it is simply the given position. For a sequence, this annotation assigns all the symbols the same line and column, incrementing only the character index.
>>>
defaultAnnotate (Position 1 1 1) "a"
[Positioned 'a' (Position 1 1 1)]
>>>
defaultAnnotate (Position 1 1 1) "\r\n"
[Positioned '\r' (Position 1 1 1), Positioned '\n' (Position 1 1 2)]
The last example would give the same positions to any list of the same
length, e.g. "ab"
instead of "rn"
.
enrichOnce :: Advance s -> Position -> [s] -> ([Positioned s], Position, [s]) Source
Given an advance rule, the next available position and a symbol list,
consume symbols once. Return a list of them, annotated with position
information, as well as the next position and the rest of the input.
On empty input, return []
, the given position and the input list.
If more than one character is matched, the sequence is annotated with consecutive character indices, but with the same line and column.
>>>
enrichOnce (newlineAdvance "\r\n") (Position 1 1 1) "\r\nhello"
( [ Positioned '\r' (Position 1 1 1) , Positioned '\n' (Position 1 1 2) ] , Position 2 1 3 , "hello" )
Arguments
:: (Position -> [s] -> [Positioned s]) | annotation function |
-> Advance s | default advance |
-> Advance s | advance rule |
-> Position | initial position |
-> [s] | input list |
-> ([Positioned s], Position, [s]) |
Given an advance rule, the next available position and a symbol list, try
to consume symbols once. If consumed, return a list of them, annotated with
position information, as well as the next position and the rest of the
input. Otherwise, return []
, the given position and the input list.
If more than one character is matched, the sequence is annotated using the function passed as the first parameter.
>>>
let ann = defaultAnnotate; adv = empty
>>>
enrichOnceD ann adv (newlineAdvance "\r\n") (Position 1 1 1) "\r\nhello"
( [ Positioned '\r' (Position 1 1 1) , Positioned '\n' (Position 1 1 2) ] , Position 2 1 3 , "hello" )
enrich :: Advance s -> [s] -> ([Positioned s], Position) Source
Given a list of symbols, annotate it with position based on advance rules. Each symbol is annotated with its position in the text. In addition to the annotated list, the next available position is returned (i.e. the position of the next symbol, if another symbol were appended to the list).
>>>
enrich defaultAdvance "abc"
( [ Positioned 'a' (Position 1 1 1)) , Positioned 'b' (Position 1 2 2)) ] , Position 1 3 3 )
It is implemented using the defaultAdvance
as a default, i.e. the entire
list is always consumed.
enrichD :: (Position -> [s] -> [Positioned s]) -> Advance s -> Advance s -> [s] -> ([Positioned s], Position, [s]) Source
Like enrich
, but takes an annotation function as the first parameter,
and a default advance as the second parameter. The rest of the parameters
are the same ones enrich
takes. It allows using custom defaults. To have
no default advance, pass empty
.
Since a match of the whole list isn't guaranteed, there is an additional
list in the return type, containing the rest of the input. If the entire
input is matched, that list will be []
. If no input is matched at all,
the annotated list is []
, the position is firstPosition
and the
additional list (rest of input) is the input list.
bless :: RE s a -> PosRE s a Source
Given a regex, create an equivalent position-aware regex. The resulting regex reads position-tagged symbols, and returns a position-tagged result.
Arguments
:: Advance s | Advance rule for position tagging, e.g. made with
|
-> RE s a | Regex which selects and returns a single token |
-> [s] | Input list of symbols |
-> ([Positioned a], Maybe (Positioned s)) | List of tokens matched. If the entire input was
matched, the second element is |
Tokenize an input list and get list of tokens. If there was an error (no regex match), get the text position at which it happened.
textInfo :: Advance s -> [s] -> (Int, Int, Int) Source
Get some numbers describing the given text (list of symbols):
- The total number of lines
- The length (number of columns) of the last line
- The total number of characters
Note that this probably isn't the fastest implementation. It's possible to compute directly by counting the lines and the characters. This function is here anyway, as a demonstration of using this library.
>>>
let adv = commonAdvance 4 True True True True
>>>
textInfo adv "Hello world!\nHow are you?\nWonderful!"
(3,11,36)