flatparse-0.4.1.0: High-performance parsing from strict bytestrings
Safe Haskell: Safe-Inferred
Language: Haskell2010

FlatParse.Basic

Description

Parser supporting custom error types and embeddable IO or ST actions, but no other bells and whistles.

If you need efficient indentation parsing, consider FlatParse.Stateful instead.

Parser types

newtype ParserT (st :: ZeroBitType) e a Source #

ParserT st e a is a parser with a state token type st, an error type e and a return type a. The different state token types support different embedded effects; see Parser, ParserIO and ParserST below.

Constructors

ParserT 

Fields

Instances

MonadIO (ParserIO e) Source #

You may lift IO actions into a ParserIO using liftIO.

Instance details

Defined in FlatParse.Basic.Parser

Methods

liftIO :: IO a -> ParserIO e a #

Alternative (ParserT st e) Source #

By default, parser choice (<|>) arbitrarily backtracks on parser failure.

Instance details

Defined in FlatParse.Basic.Parser

Methods

empty :: ParserT st e a #

(<|>) :: ParserT st e a -> ParserT st e a -> ParserT st e a #

some :: ParserT st e a -> ParserT st e [a] #

many :: ParserT st e a -> ParserT st e [a] #

Applicative (ParserT st e) Source # 
Instance details

Defined in FlatParse.Basic.Parser

Methods

pure :: a -> ParserT st e a #

(<*>) :: ParserT st e (a -> b) -> ParserT st e a -> ParserT st e b #

liftA2 :: (a -> b -> c) -> ParserT st e a -> ParserT st e b -> ParserT st e c #

(*>) :: ParserT st e a -> ParserT st e b -> ParserT st e b #

(<*) :: ParserT st e a -> ParserT st e b -> ParserT st e a #

Functor (ParserT st e) Source # 
Instance details

Defined in FlatParse.Basic.Parser

Methods

fmap :: (a -> b) -> ParserT st e a -> ParserT st e b #

(<$) :: a -> ParserT st e b -> ParserT st e a #

Monad (ParserT st e) Source # 
Instance details

Defined in FlatParse.Basic.Parser

Methods

(>>=) :: ParserT st e a -> (a -> ParserT st e b) -> ParserT st e b #

(>>) :: ParserT st e a -> ParserT st e b -> ParserT st e b #

return :: a -> ParserT st e a #

MonadPlus (ParserT st e) Source # 
Instance details

Defined in FlatParse.Basic.Parser

Methods

mzero :: ParserT st e a #

mplus :: ParserT st e a -> ParserT st e a -> ParserT st e a #

type Parser = ParserT PureMode Source #

The type of pure parsers.

type ParserIO = ParserT IOMode Source #

The type of parsers which can embed IO actions.

type ParserST s = ParserT (STMode s) Source #

The type of parsers which can embed ST actions.

Running parsers

data Result e a Source #

Higher-level boxed data type for parsing results.

Constructors

OK a !ByteString

Contains return value and unconsumed input.

Fail

Recoverable-by-default failure.

Err !e

Unrecoverable-by-default error.

Instances

Functor (Result e) Source # 
Instance details

Defined in FlatParse.Basic

Methods

fmap :: (a -> b) -> Result e a -> Result e b #

(<$) :: a -> Result e b -> Result e a #

(Show a, Show e) => Show (Result e a) Source # 
Instance details

Defined in FlatParse.Basic

Methods

showsPrec :: Int -> Result e a -> ShowS #

show :: Result e a -> String #

showList :: [Result e a] -> ShowS #

runParser :: Parser e a -> ByteString -> Result e a Source #

Run a parser.

runParserUtf8 :: Parser e a -> String -> Result e a Source #

Run a parser on a String, converting it to the corresponding UTF-8 bytes.

Reminder: a ByteString literal written with OverloadedStrings is not UTF-8 encoded; each Char is truncated to a single byte. For non-ASCII literal input, use this wrapper or convert your input properly first.

runParserIO :: ParserIO e a -> ByteString -> IO (Result e a) Source #

Run an IO-based parser.

runParserST :: (forall s. ParserST s e a) -> ByteString -> Result e a Source #

Run an ST-based parser.
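The runners above can be sketched as follows. This is a minimal illustration, not part of the library: the names number and loggedNumber are hypothetical, and the error type is fixed to () for simplicity.

```haskell
import Control.Monad.IO.Class (liftIO)
import FlatParse.Basic
  ( Parser, ParserIO, Result(..)
  , anyAsciiDecimalInt, runParserIO, runParserUtf8, strToUtf8 )

-- A pure parser with no custom errors (e = ()). Hypothetical name.
number :: Parser () Int
number = anyAsciiDecimalInt

-- The same parser in IO mode, logging the value it read. Hypothetical name.
loggedNumber :: ParserIO () Int
loggedNumber = do
  n <- anyAsciiDecimalInt
  liftIO (putStrLn ("parsed " ++ show n))
  pure n

main :: IO ()
main = do
  -- Success: the Result carries the value and the unconsumed input.
  print (runParserUtf8 number "123abc")
  -- No leading digit: a recoverable failure.
  print (runParserUtf8 number "abc")
  -- IO-based parsers are run inside IO.
  r <- runParserIO loggedNumber (strToUtf8 "42")
  print r
```

Pattern matching on OK, Fail and Err is the usual way to consume a Result, since the listed instances are only Functor and Show.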

Primitive result types

type Res# (st :: ZeroBitType) e a = (# st, ResI# e a #) Source #

Primitive parser result wrapped with a state token.

You should rarely need to manipulate values of this type directly. Use the provided bidirectional pattern synonyms OK#, Fail# and Err#.

pattern OK# :: (st :: ZeroBitType) -> a -> Addr# -> Res# st e a Source #

Res# constructor for a successful parse. Contains the return value and a pointer to the rest of the input buffer, plus a state token.

pattern Err# :: (st :: ZeroBitType) -> e -> Res# st e a Source #

Res# constructor for errors which are by default non-recoverable. Contains the error, plus a state token.

pattern Fail# :: (st :: ZeroBitType) -> Res# st e a Source #

Res# constructor for recoverable failure. Contains only a state token.

type ResI# e a = (# (# a, Addr# #) | (# #) | (# e #) #) Source #

Primitive parser result.

Embedding ST operations

liftST :: ST s a -> ParserST s e a Source #

Run an ST action in a ParserST.
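As a small sketch of embedding ST actions, the parser below counts matched characters in an STRef. The name countAs is hypothetical, and a mutable counter is used only to show liftST; a pure fold would normally be simpler.

```haskell
import Data.STRef (modifySTRef', newSTRef, readSTRef)
import FlatParse.Basic
  ( ParserST, Result(..), liftST, runParserST
  , skipMany, skipSatisfyAscii, strToUtf8 )

-- Count leading 'a' characters using a mutable counter. Hypothetical name.
countAs :: ParserST s () Int
countAs = do
  ref <- liftST (newSTRef 0)
  -- The counter is bumped only after a successful match.
  skipMany (skipSatisfyAscii (== 'a') *> liftST (modifySTRef' ref (+ 1)))
  liftST (readSTRef ref)

main :: IO ()
main = print (runParserST countAs (strToUtf8 "aaab"))
```

Because runParserST requires the parser to be polymorphic in s, the STRef cannot escape the parser run.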

UTF conversion

strToUtf8 :: String -> ByteString Source #

Convert a String to a UTF-8-encoded ByteString.

utf8ToStr :: ByteString -> String Source #

Convert a UTF-8-encoded ByteString to a String.
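The difference between a proper UTF-8 conversion and a byte-truncating one can be seen in a short sketch. Data.ByteString.Char8.pack is used here as a stand-in for what a ByteString string literal does; it is not part of this module.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import FlatParse.Basic (strToUtf8, utf8ToStr)

main :: IO ()
main = do
  -- U+03BB (Greek small lambda) takes two bytes in UTF-8.
  print (B.unpack (strToUtf8 "λ"))   -- [206,187]
  -- Char8.pack keeps only the low 8 bits of each Char.
  print (B.unpack (BC.pack "λ"))     -- [187]
  -- Round trip through the UTF-8 helpers.
  print (utf8ToStr (strToUtf8 "λx") == "λx")
```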

Character predicates

isDigit :: Char -> Bool Source #

isDigit c = '0' <= c && c <= '9'

isLatinLetter :: Char -> Bool Source #

isLatinLetter c = ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z')

isGreekLetter :: Char -> Bool Source #

isGreekLetter c = ('Α' <= c && c <= 'Ω') || ('α' <= c && c <= 'ω')

Parsers

Bytewise

eof :: ParserT st e () Source #

Succeed if the input is empty.

take :: Int -> ParserT st e ByteString Source #

Read the given number of bytes as a ByteString.

Throws a runtime error if given a negative integer.

This does no copying. The ByteString returned is a "slice" of the input, and will keep it alive. To avoid this, use copy on the output.

take# :: Int# -> ParserT st e ByteString Source #

Read n# bytes as a ByteString. Fails if fewer than n# bytes are available.

Throws a runtime error if given a negative integer.

This does no copying. The ByteString returned is a "slice" of the input, and will keep it alive. To avoid this, use copy on the output.

takeUnsafe# :: Int# -> ParserT st e ByteString Source #

Read n# bytes as a ByteString. Fails if fewer than n# bytes are available.

Undefined behaviour if given a negative integer.

This does no copying. The ByteString returned is a "slice" of the input, and will keep it alive. To avoid this, use copy on the output.

takeRest :: ParserT st e ByteString Source #

Consume the rest of the input. May return the empty bytestring.

This does no copying. The ByteString returned is a "slice" of the input, and will keep it alive. To avoid this, use copy on the output.
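The slicing note that recurs above suggests a pattern like the following sketch. The names tagSlice and tagCopy are hypothetical; copy here is Data.ByteString.copy.

```haskell
import qualified Data.ByteString as B
import qualified FlatParse.Basic as FP

-- Returns a slice that shares (and keeps alive) the whole input buffer.
tagSlice :: FP.Parser () B.ByteString
tagSlice = FP.take 4

-- Returns an independent 4-byte copy, so the input can be collected.
tagCopy :: FP.Parser () B.ByteString
tagCopy = B.copy <$> FP.take 4

main :: IO ()
main = do
  let input = FP.strToUtf8 "RIFFpayload"
  print (FP.runParser tagSlice input)
  print (FP.runParser tagCopy input)
  -- Asking for more bytes than are available is an ordinary failure.
  print (FP.runParser (FP.take 99) input)
```

Copying is worthwhile mainly when a small result outlives a large input; for short-lived results the zero-copy slice is cheaper.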

skip :: Int -> ParserT st e () Source #

Skip forward n bytes. Fails if fewer than n bytes are available.

Throws a runtime error if given a negative integer.

skip# :: Int# -> ParserT st e () Source #

Skip forward n# bytes. Fails if fewer than n# bytes are available.

Throws a runtime error if given a negative integer.

skipBack :: Int -> ParserT st e () Source #

Go back i bytes in the input. Takes a positive integer.

Extremely unsafe. Makes no checks. Almost certainly a Bad Idea.

skipBack# :: Int# -> ParserT st e () Source #

Go back n# bytes. Takes a positive integer.

Extremely unsafe. Makes no checks. Almost certainly a Bad Idea.

atSkip# :: Int# -> ParserT st e a -> ParserT st e a Source #

Skip forward n# bytes and run the given parser. Fails if fewer than n# bytes are available.

Throws a runtime error if given a negative integer.

atSkipUnsafe# :: Int# -> ParserT st e r -> ParserT st e r Source #

Skip forward n# bytes and run the given parser. Fails if fewer than n# bytes are available.

Undefined behaviour if given a negative integer.

bytes :: [Word] -> Q Exp Source #

Read a sequence of bytes. This is a template function: you can use it as $(bytes [3, 4, 5]), for example, and the splice has type Parser e (). For a non-TH variant see byteString.

bytesUnsafe :: [Word] -> Q Exp Source #

Template function, creates a Parser e () which unsafely parses a given sequence of bytes.

The caller must guarantee that the input has enough bytes.

byteString :: ByteString -> ParserT st e () Source #

Parse a given ByteString.

If the bytestring is statically known, consider using bytes instead.
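A sketch contrasting the two ways of matching fixed bytes. The names magicTH and magicDyn and the two-byte magic number are made up for illustration.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import qualified Data.ByteString as B
import FlatParse.Basic
  ( Parser, Result(..), byteString, bytes, runParser, takeRest )

-- Bytes known at compile time: use the Template Haskell splice.
magicTH :: Parser () ()
magicTH = $(bytes [0xCA, 0xFE])

-- Bytes only known at runtime: use byteString.
magicDyn :: B.ByteString -> Parser () ()
magicDyn = byteString

main :: IO ()
main = do
  let input = B.pack [0xCA, 0xFE, 0x01]
  print (runParser (magicTH *> takeRest) input)
  print (runParser (magicDyn (B.pack [0xCA, 0xFE]) *> takeRest) input)
```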

anyCString :: ParserT st e ByteString Source #

Read a null-terminated bytestring (a C-style string).

Consumes the null terminator.

anyVarintProtobuf :: ParserT st e Int Source #

Read a protobuf-style varint into a positive Int.

Protobuf-style varints are byte-aligned. For each byte, the lower 7 bits are data and the MSB indicates if there are further bytes. Once fully parsed, the 7-bit payloads are concatenated and interpreted as a little-endian unsigned integer.

Fails if the varint exceeds the positive Int range.

Really, these are varnats. They also match with the LEB128 varint encoding.

Protobuf encodes negatives in unsigned integers using zigzag encoding. See the fromZigzag family of functions for this functionality.

Further reading: https://developers.google.com/protocol-buffers/docs/encoding#varints
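To make the encoding concrete, here is a sketch that decodes the two-byte varint 0xAC 0x02 both with the parser and with a hand-written reference fold. The helper refDecode is hypothetical and simplified: it assumes its argument is exactly one complete varint and ignores the continuation bits.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import qualified Data.ByteString as B
import Data.Word (Word8)
import FlatParse.Basic (Parser, Result(..), anyVarintProtobuf, runParser)

-- Reference decoding: 7 payload bits per byte, least significant group first.
-- Assumes the list holds exactly one complete varint (illustrative only).
refDecode :: [Word8] -> Int
refDecode = foldr (\b acc -> fromIntegral (b .&. 0x7F) .|. (acc `shiftL` 7)) 0

varint :: Parser () Int
varint = anyVarintProtobuf

main :: IO ()
main = do
  let encoded = [0xAC, 0x02]  -- 44 + 2 * 128 = 300
  print (refDecode encoded)
  print (runParser varint (B.pack encoded))
```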

Combinators

(<|>) :: ParserT st e a -> ParserT st e a -> ParserT st e a infixr 6 Source #

Choose between two parsers. If the first parser fails, try the second one, but if the first one throws an error, propagate the error. This operation can arbitrarily backtrack.

Note: this exported operator has a different fixity (infixr 6) than the same operator in Control.Applicative (infixl 3). Hide this operator if you want to use the Alternative version.

branch :: ParserT st e a -> ParserT st e b -> ParserT st e b -> ParserT st e b Source #

Branch on a parser: if the first argument succeeds, continue with the second, else with the third. This can produce slightly more efficient code than (<|>). Moreover, branch does not backtrack from the true/false cases.
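A brief sketch of choice and branch. The parser names are hypothetical. Note the parentheses around each alternative: with the infixr 6 fixity of this module's (<|>), the operator binds tighter than (<$) and (<$>), which are infixl 4.

```haskell
import FlatParse.Basic
  ( Parser, Result(..), (<|>), anyAsciiDecimalInt, branch
  , byteString, runParserUtf8, skipSatisfyAscii, strToUtf8 )

-- Choice: try "true", and on failure backtrack and try "false".
boolLit :: Parser () Bool
boolLit = (True  <$ byteString (strToUtf8 "true"))
      <|> (False <$ byteString (strToUtf8 "false"))

-- Branch: if a minus sign is consumed, commit to the negated number.
signedInt :: Parser () Int
signedInt =
  branch (skipSatisfyAscii (== '-'))
         (negate <$> anyAsciiDecimalInt)
         anyAsciiDecimalInt

main :: IO ()
main = do
  print (runParserUtf8 boolLit "false!")
  print (runParserUtf8 signedInt "-12")
  print (runParserUtf8 signedInt "12")
```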

notFollowedBy :: ParserT st e a -> ParserT st e b -> ParserT st e a Source #

Succeed if the first parser succeeds and the second one fails.

chainl :: (b -> a -> b) -> ParserT st e b -> ParserT st e a -> ParserT st e b Source #

An analogue of the list foldl function: first parse a b, then parse zero or more a-s, and combine the results in a left-nested way by the b -> a -> b function. Note: this is not the usual chainl function from the parsec libraries!

chainr :: (a -> b -> b) -> ParserT st e a -> ParserT st e b -> ParserT st e b Source #

An analogue of the list foldr function: parse zero or more a-s, terminated by a b, and combine the results in a right-nested way using the a -> b -> b function. Note: this is not the usual chainr function from the parsec libraries!
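Since these differ from the parsec functions of the same name, a small sketch may help. The parsers digit, decimal and untilSemi are hypothetical examples.

```haskell
import Data.Char (ord)
import FlatParse.Basic
  ( Parser, Result(..), chainl, chainr, isDigit, isLatinLetter
  , runParserUtf8, satisfyAscii, skipSatisfyAscii )

-- A single decimal digit as a number.
digit :: Parser () Int
digit = (\c -> ord c - ord '0') <$> satisfyAscii isDigit

-- Left fold: start from one digit, then fold in zero or more further digits.
decimal :: Parser () Int
decimal = chainl (\acc d -> acc * 10 + d) digit digit

-- Right fold: zero or more letters, terminated by a semicolon.
untilSemi :: Parser () String
untilSemi =
  chainr (:) (satisfyAscii isLatinLetter) ([] <$ skipSatisfyAscii (== ';'))

main :: IO ()
main = do
  print (runParserUtf8 decimal "123")
  print (runParserUtf8 untilSemi "ab;c")
```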

lookahead :: ParserT st e a -> ParserT st e a Source #

Save the parsing state, then run a parser, then restore the state.
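The notFollowedBy and lookahead combinators can be illustrated with a keyword parser and a one-character peek. The names keywordLet and peekChar are hypothetical.

```haskell
import FlatParse.Basic
  ( Parser, Result(..), anyAsciiChar, byteString, isLatinLetter
  , lookahead, notFollowedBy, runParserUtf8, satisfyAscii, strToUtf8 )

-- Match the keyword "let" only when it is not the prefix of a longer word.
keywordLet :: Parser () ()
keywordLet =
  byteString (strToUtf8 "let") `notFollowedBy` satisfyAscii isLatinLetter

-- Look at the next ASCII character without consuming it.
peekChar :: Parser () Char
peekChar = lookahead anyAsciiChar

main :: IO ()
main = do
  print (runParserUtf8 keywordLet "let x")
  print (runParserUtf8 keywordLet "letter")
  print (runParserUtf8 peekChar "abc")
```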

ensure :: Int -> ParserT st e () Source #

Assert that there are at least n bytes remaining.

Undefined behaviour if given a negative integer.

ensure# :: Int# -> ParserT st e () Source #

Assert that there are at least n# bytes remaining.

Undefined behaviour if given a negative integer.

withEnsure :: Int -> ParserT st e r -> ParserT st e r Source #

Assert that there are at least n bytes remaining (CPS).

Undefined behaviour if given a negative integer.

withEnsure1 :: ParserT st e r -> ParserT st e r Source #

Assert that there is at least 1 byte remaining (CPS).

withEnsure# :: Int# -> ParserT st e r -> ParserT st e r Source #

Assert that there are at least n# bytes remaining (CPS).

Undefined behaviour if given a negative integer.

isolate :: Int -> ParserT st e a -> ParserT st e a Source #

isolate n p runs the parser p isolated to the next n bytes. All isolated bytes must be consumed.

Throws a runtime error if given a negative integer.

isolate# :: Int# -> ParserT st e a -> ParserT st e a Source #

isolate# n# p runs the parser p isolated to the next n# bytes. All isolated bytes must be consumed.

Throws a runtime error if given a negative integer.

isolateUnsafe# :: Int# -> ParserT st e a -> ParserT st e a Source #

isolateUnsafe# n# p runs the parser p isolated to the next n# bytes. All isolated bytes must be consumed.

Undefined behaviour if given a negative integer.
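One plausible use of isolate is a length-prefixed field. The sketch below parses a netstring-like format ("5:hello,"); the format and the name netstring are chosen only for illustration. It relies on takeRest stopping at the end of the isolated region, which is how isolation is understood here.

```haskell
import qualified Data.ByteString as B
import FlatParse.Basic
  ( Parser, Result(..), anyAsciiDecimalInt, isolate
  , runParserUtf8, skipSatisfyAscii, takeRest )

-- <length> ':' <payload of exactly that many bytes> ','
netstring :: Parser () B.ByteString
netstring = do
  n <- anyAsciiDecimalInt
  skipSatisfyAscii (== ':')
  -- The inner parser sees only the next n bytes and must consume them all.
  payload <- isolate n takeRest
  skipSatisfyAscii (== ',')
  pure payload

main :: IO ()
main = print (runParserUtf8 netstring "5:hello,tail")
```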

switch :: Q Exp -> Q Exp Source #

This is a template function which makes it possible to branch on a collection of string literals in an efficient way. By using switch, such branching is compiled to a trie of primitive parsing operations, which has optimized control flow, vectorized reads and grouped checking for needed input bytes.

The syntax is slightly magical: it overloads the usual case expression. An example:

    $(switch [| case _ of
        "foo" -> pure True
        "bar" -> pure False |])

The underscore is mandatory in case _ of. Each branch must be a string literal, but optionally we may have a default case, like in

    $(switch [| case _ of
        "foo" -> pure 10
        "bar" -> pure 20
        _     -> pure 30 |])

All case right hand sides must be parsers with the same type. That type is also the type of the whole switch expression.

A switch has longest match semantics, and the order of cases does not matter, except for the default case, which may only appear as the last case.

If a switch does not have a default case, and no case matches the input, then it returns with failure, without having consumed any input. A fallthrough to the default case also does not consume any input.
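The longest-match and default-case rules can be sketched in a complete module. The Token type and the name token are hypothetical.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import FlatParse.Basic (Parser, Result(..), runParserUtf8, switch)

data Token = Foo | FooBar | Other
  deriving (Eq, Show)

-- "foobar" wins over "foo" when both match, regardless of case order.
token :: Parser () Token
token = $(switch [| case _ of
  "foo"    -> pure Foo
  "foobar" -> pure FooBar
  _        -> pure Other |])

main :: IO ()
main = do
  print (runParserUtf8 token "foobar")
  print (runParserUtf8 token "foo!")
  -- Falling through to the default case consumes nothing.
  print (runParserUtf8 token "xyz")
```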

switchWithPost :: Maybe (Q Exp) -> Q Exp -> Q Exp Source #

Switch expression with an optional first argument for performing a post-processing action after every successful branch matching, not including the default branch. For example, if we have ws :: ParserT st e () for a whitespace parser, we might want to consume whitespace after matching on any of the switch cases. For that case, we can define a "lexeme" version of switch as follows.

  switch' :: Q Exp -> Q Exp
  switch' = switchWithPost (Just [| ws |])

Note that this switch' function cannot be used in the same module it's defined in, because of the stage restriction of Template Haskell.

rawSwitchWithPost :: Maybe (Q Exp) -> [(String, Q Exp)] -> Maybe (Q Exp) -> Q Exp Source #

Version of switchWithPost without syntactic sugar. The second argument is the list of cases, the third is the default case.

many :: Alternative f => f a -> f [a] #

Zero or more.

skipMany :: ParserT st e a -> ParserT st e () Source #

Skip a parser zero or more times.

some :: Alternative f => f a -> f [a] #

One or more.

skipSome :: ParserT st e a -> ParserT st e () Source #

Skip a parser one or more times.
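A minimal sketch of the repetition combinators, using hypothetical names ws, ident and idents for a whitespace skipper and a lexeme-style identifier parser.

```haskell
import FlatParse.Basic
  ( Parser, Result(..), isLatinLetter, many, runParserUtf8
  , satisfyAscii, skipMany, skipSatisfyAscii, some )

-- Zero or more spaces, discarding the results.
ws :: Parser () ()
ws = skipMany (skipSatisfyAscii (== ' '))

-- One or more letters, followed by optional trailing spaces.
ident :: Parser () String
ident = some (satisfyAscii isLatinLetter) <* ws

-- Zero or more identifiers; stops at the first non-letter.
idents :: Parser () [String]
idents = ws *> many ident

main :: IO ()
main = print (runParserUtf8 idents "  foo bar 1")
```

The skipping variants avoid building a result list, so they are the natural choice when the matched values are not needed.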

Errors and failures

empty :: Alternative f => f a #

The identity of <|>.

failed :: ParserT st e a Source #

The failing parser. By default, parser choice (<|>) arbitrarily backtracks on parser failure.

try :: ParserT st e a -> ParserT st e a Source #

Convert a parsing error into failure.

err :: e -> ParserT st e a Source #

Throw a parsing error. By default, parser choice (<|>) can't backtrack on parser error. Use try to convert an error to a recoverable failure.

fails :: ParserT st e a -> ParserT st e () Source #

Convert a parsing failure to a success.

cut :: ParserT st e a -> e -> ParserT st e a Source #

Convert a parsing failure to an error.

cutting :: ParserT st e a -> e -> (e -> e -> e) -> ParserT st e a Source #

Run the parser. If we get a failure, throw the given error, but if we get an error, merge the inner and the newly given errors using the e -> e -> e function. This can be useful for implementing parsing errors which may propagate hints or accumulate contextual information.

optional :: ParserT st e a -> ParserT st e (Maybe a) Source #

Convert a parsing failure to a Maybe. If possible, use withOption instead.

optional_ :: ParserT st e a -> ParserT st e () Source #

Convert a parsing failure to a ().

withOption :: ParserT st e a -> (a -> ParserT st e r) -> ParserT st e r -> ParserT st e r Source #

CPS'd version of optional. This is usually more efficient, since it gets rid of the extra Maybe allocation.
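The interplay of failures and errors can be sketched with a small custom error type. The Error type and the parser names are hypothetical.

```haskell
import FlatParse.Basic
  ( Parser, Result(..), (<|>), anyAsciiDecimalInt, cut
  , runParserUtf8, skipSatisfyAscii, try, withOption )

newtype Error = Expected String
  deriving Show

-- A missing number becomes an unrecoverable error instead of a failure.
number :: Parser Error Int
number = anyAsciiDecimalInt `cut` Expected "number"

-- Optional minus sign, handled without allocating a Maybe.
signed :: Parser Error Int
signed = withOption (skipSatisfyAscii (== '-'))
                    (\_ -> negate <$> number)
                    number

-- try turns the error back into a failure, so (<|>) can recover from it.
numberOrZero :: Parser Error Int
numberOrZero = try number <|> pure 0

main :: IO ()
main = do
  print (runParserUtf8 number "x")
  print (runParserUtf8 signed "-7")
  print (runParserUtf8 numberOrZero "x")
```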

Position

newtype Pos Source #

Byte offset counted backwards from the end of the buffer. Note: the Ord instance for Pos considers the earlier positions to be smaller.

Constructors

Pos 

Fields

Instances

Show Pos Source # 
Instance details

Defined in FlatParse.Common.Position

Methods

showsPrec :: Int -> Pos -> ShowS #

show :: Pos -> String #

showList :: [Pos] -> ShowS #

Eq Pos Source # 
Instance details

Defined in FlatParse.Common.Position

Methods

(==) :: Pos -> Pos -> Bool #

(/=) :: Pos -> Pos -> Bool #

Ord Pos Source # 
Instance details

Defined in FlatParse.Common.Position

Methods

compare :: Pos -> Pos -> Ordering #

(<) :: Pos -> Pos -> Bool #

(<=) :: Pos -> Pos -> Bool #

(>) :: Pos -> Pos -> Bool #

(>=) :: Pos -> Pos -> Bool #

max :: Pos -> Pos -> Pos #

min :: Pos -> Pos -> Pos #

endPos :: Pos Source #

The end of the input.

addrToPos# :: Addr# -> Addr# -> Pos Source #

Very unsafe conversion between a primitive address and a position. The first argument points to the end of the buffer, the second argument is being converted.

posToAddr# :: Addr# -> Pos -> Addr# Source #

Very unsafe conversion between a primitive address and a position. The first argument points to the end of the buffer.

data Span Source #

A pair of positions.

Constructors

Span !Pos !Pos 

Instances

Show Span Source # 
Instance details

Defined in FlatParse.Common.Position

Methods

showsPrec :: Int -> Span -> ShowS #

show :: Span -> String #

showList :: [Span] -> ShowS #

Eq Span Source # 
Instance details

Defined in FlatParse.Common.Position

Methods

(==) :: Span -> Span -> Bool #

(/=) :: Span -> Span -> Bool #

unsafeSlice :: ByteString -> Span -> ByteString Source #

Slice into a ByteString using a Span. The result is invalid if the Span is not a valid slice of the first argument.

getPos :: ParserT st e Pos Source #

Get the current position in the input.

setPos :: Pos -> ParserT st e () Source #

Set the input position.

Warning: this can result in crashes if the position points outside the current buffer. It is always safe to setPos values which came from getPos with the current input.

spanOf :: ParserT st e a -> ParserT st e Span Source #

Return the consumed span of a parser.

withSpan :: ParserT st e a -> (a -> Span -> ParserT st e b) -> ParserT st e b Source #

Bind the result together with the span of the result. CPS'd version of spanOf for better unboxing.

byteStringOf :: ParserT st e a -> ParserT st e ByteString Source #

Return the ByteString consumed by a parser. Note: it's more efficient to use spanOf and withSpan instead.

withByteString :: ParserT st e a -> (a -> ByteString -> ParserT st e b) -> ParserT st e b Source #

CPS'd version of byteStringOf. This can be more efficient because GHC unboxes the result more eagerly, but using spanOf or withSpan instead is more efficient still.

inSpan :: Span -> ParserT st e a -> ParserT st e a Source #

Run a parser in a given input Span.

The input position is restored after the parser is finished, so inSpan does not consume input and has no side effect.

Warning: this operation may crash if the given span points outside the current parsing buffer. It's always safe to use inSpan if the Span comes from a previous withSpan or spanOf call on the current input.

validPos :: ByteString -> Pos -> Bool Source #

Check whether a Pos points into a ByteString.

posLineCols :: ByteString -> [Pos] -> [(Int, Int)] Source #

Compute corresponding line and column numbers for each Pos in a list, assuming UTF8 encoding. Throws an error on invalid positions. Note: computing lines and columns may traverse the ByteString, but it traverses it only once regardless of the length of the position list.

mkPos :: ByteString -> (Int, Int) -> Pos Source #

Create a Pos from a line and column number. Throws an error on out-of-bounds line and column numbers.
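A sketch of working with spans and positions after a parse. The parser digitsSpan is hypothetical; the span it returns is valid for unsafeSlice because it comes from spanOf on the same input.

```haskell
import FlatParse.Basic
  ( Parser, Result(..), Span(..), isDigit, isLatinLetter, posLineCols
  , runParser, skipMany, skipSatisfyAscii, skipSome, spanOf
  , strToUtf8, unsafeSlice )

-- Skip letters and newlines, then return the span covering a run of digits.
digitsSpan :: Parser () Span
digitsSpan =
  skipMany (skipSatisfyAscii (\c -> isLatinLetter c || c == '\n'))
    *> spanOf (skipSome (skipSatisfyAscii isDigit))

main :: IO ()
main = do
  let input = strToUtf8 "ab\n12 rest"
  case runParser digitsSpan input of
    OK sp@(Span start _) _ -> do
      -- Recover the bytes covered by the span.
      print (unsafeSlice input sp)
      -- Translate the span start into a (line, column) pair.
      print (posLineCols input [start])
    _ -> putStrLn "no digits found"
```

Keeping spans during parsing and slicing or computing line and column numbers only when needed (for example, when reporting an error) avoids doing that work on the hot path.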

Text

UTF-8

char :: Char -> Q Exp Source #

Parse a UTF-8 character literal. This is a template function: you can use it as $(char 'x'), for example, and the splice in this case has type Parser e ().

string :: String -> Q Exp Source #

Parse a UTF-8 string literal. This is a template function: you can use it as $(string "foo"), for example, and the splice has type Parser e ().

anyChar :: ParserT st e Char Source #

Parse any single Unicode character encoded using UTF-8 as a Char.

skipAnyChar :: ParserT st e () Source #

Skip any single Unicode character encoded using UTF-8.

satisfy :: (Char -> Bool) -> ParserT st e Char Source #

Parse a UTF-8 Char for which a predicate holds.

skipSatisfy :: (Char -> Bool) -> ParserT st e () Source #

Skip a UTF-8 Char for which a predicate holds.

fusedSatisfy :: (Char -> Bool) -> (Char -> Bool) -> (Char -> Bool) -> (Char -> Bool) -> ParserT st e Char Source #

This is a variant of satisfy which allows more optimization. We can pick four testing functions for the four cases for the possible number of bytes in the UTF-8 character. So in fusedSatisfy f1 f2 f3 f4, if we read a one-byte character, the result is scrutinized with f1, for a two-byte character, with f2, and so on. This can result in dramatic lexing speedups.

For example, if we want to accept any letter, the naive solution would be to use isLetter, but this accesses a large lookup table of Unicode character classes. We can do better with fusedSatisfy isLatinLetter isLetter isLetter isLetter, since here the isLatinLetter is inlined into the UTF-8 decoding, and it probably handles a great majority of all cases without accessing the character table.

skipFusedSatisfy :: (Char -> Bool) -> (Char -> Bool) -> (Char -> Bool) -> (Char -> Bool) -> Parser e () Source #

Skipping variant of fusedSatisfy.
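The letter example described above can be written out as follows. The names letter and word are hypothetical; isLetter comes from Data.Char.

```haskell
import Data.Char (isLetter)
import FlatParse.Basic
  ( Parser, Result(..), fusedSatisfy, isLatinLetter, runParserUtf8, some )

-- One-byte characters use the cheap ASCII test; longer encodings fall back
-- to the general Unicode predicate.
letter :: Parser () Char
letter = fusedSatisfy isLatinLetter isLetter isLetter isLetter

word :: Parser () String
word = some letter

main :: IO ()
main = print (runParserUtf8 word "λx1")
```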

takeLine :: ParserT st e String Source #

Parse the rest of the current line as a String. Assumes UTF-8 encoding, throws an error if the encoding is invalid.

takeRestString :: ParserT st e String Source #

Take the rest of the input as a String. Assumes UTF-8 encoding.

linesUtf8 :: ByteString -> [String] Source #

Break a UTF-8-encoded ByteString into lines. Throws an error on invalid input. This is mostly useful for grabbing specific source lines for displaying error messages.

ASCII

anyAsciiChar :: ParserT st e Char Source #

Parse any single ASCII character (a single byte) as a Char.

More efficient than anyChar for ASCII-only input.

skipAnyAsciiChar :: ParserT st e () Source #

Skip any single ASCII character (a single byte).

More efficient than skipAnyChar for ASCII-only input.

satisfyAscii :: (Char -> Bool) -> ParserT st e Char Source #

Parse an ASCII Char for which a predicate holds.

Assumption: the predicate must only return True for ASCII-range characters. Otherwise this function might read a 128-255 range byte, thereby breaking UTF-8 decoding.

skipSatisfyAscii :: (Char -> Bool) -> ParserT st e () Source #

Skip an ASCII Char for which a predicate holds. Assumption: the predicate must only return True for ASCII-range characters.

ASCII-encoded numbers

anyAsciiDecimalWord :: ParserT st e Word Source #

Parse a non-empty ASCII decimal digit sequence as a Word. Fails on overflow.

anyAsciiDecimalInt :: ParserT st e Int Source #

Parse a non-empty ASCII decimal digit sequence as a positive Int. Fails on overflow.

anyAsciiDecimalInteger :: ParserT st e Integer Source #

Parse a non-empty ASCII decimal digit sequence as a positive Integer.

anyAsciiHexWord :: ParserT st e Word Source #

Parse a non-empty, case-insensitive ASCII hexadecimal digit sequence as a Word. Fails on overflow.

anyAsciiHexInt :: ParserT st e Int Source #

Parse a non-empty, case-insensitive ASCII hexadecimal digit sequence as a positive Int. Fails on overflow.
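A short sketch of the number parsers on a few inputs. The names hex and dec are hypothetical. Note that these parsers read bare digit sequences: no sign and no "0x" prefix.

```haskell
import FlatParse.Basic
  ( Parser, Result(..), anyAsciiDecimalInt, anyAsciiHexInt, runParserUtf8 )

hex :: Parser () Int
hex = anyAsciiHexInt

dec :: Parser () Int
dec = anyAsciiDecimalInt

main :: IO ()
main = do
  -- Hex digits are accepted in either case.
  print (runParserUtf8 hex "fF;")
  -- Parsing stops at the first non-digit.
  print (runParserUtf8 dec "42px")
  -- A leading sign is not a digit, so the digit sequence is empty.
  print (runParserUtf8 dec "-5")
```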

Machine integers

Debugging parsers

traceLine :: ParserT st e String Source #

Parse the rest of the current line as a String, but restore the parsing state. Assumes UTF-8 encoding. This can be used for debugging.

traceRest :: ParserT st e String Source #

Get the rest of the input as a String, but restore the parsing state. Assumes UTF-8 encoding. This can be used for debugging.

Unsafe

unsafeSpanToByteString :: Span -> ParserT st e ByteString Source #

Create a ByteString from a Span.

The result is invalid if the Span points outside the current buffer, or if the Span start is greater than the end position.

IO

unsafeLiftIO :: IO a -> ParserT st e a Source #

Embed an IO action in a ParserT. This is slightly safer than unsafePerformIO because it will be sequenced correctly with respect to the surrounding actions, and its execution is guaranteed.

Parsers

anyCStringUnsafe :: ParserT st e ByteString Source #

Read a null-terminated bytestring (a C-style string), where the bytestring is known to be null-terminated somewhere in the input.

Highly unsafe. Unless you have a guarantee that the string will be null terminated before the input ends, use anyCString instead. Honestly, I'm not sure if this is a good function to define. But here it is.

Fails on GHC versions older than 9.0, since we make use of the cstringLength# primop introduced in GHC 9.0, and we aren't very useful without it.

Consumes the null terminator.