| Copyright | (c) Dong Han 2017-2018 | 
|---|---|
| License | BSD | 
| Maintainer | winterland1989@gmail.com | 
| Stability | experimental | 
| Portability | non-portable | 
| Safe Haskell | Safe-Inferred | 
| Language | Haskell2010 | 
Z.Data.Parser
Description
This module provide a simple resumable Parser, which is suitable for binary protocol and simple textual protocol parsing. Both binary parsers (decodePrim ,etc) and textual parsers are provided, and they all work on Bytes.
You can use Alternative instance to do backtracking, each branch will either succeed and may consume some input, or fail without consume anything. It's recommend to use peek or peekMaybe to avoid backtracking if possible to get high performance.
Error message can be attached using <?>, which have very small overhead, so it's recommended to attach a message in front of a composed parser like xPacket = "Foo.Bar.xPacket" ? do ..., following is an example message when parsing an integer failed:
    >parse int "foo"
    ([102,111,111],Left ["Z.Data.Parser.Numeric.int","Std.Data.Parser.Base.takeWhile1: no satisfied byte"])
    -- It's easy to see we're trying to match a leading sign or digit here
Use parser-combinators to get combinators based on
Applicative or Monad instance, such as manyTill, sepBy, etc.
Synopsis
- data Result e r
- type ParseError = [Text]
- data Parser a
- (<?>) :: Text -> Parser a -> Parser a
- parse :: Parser a -> Bytes -> (Bytes, Either ParseError a)
- parse' :: Parser a -> Bytes -> Either ParseError a
- parseChunk :: Parser a -> Bytes -> Result ParseError a
- parseChunkList :: Parser a -> [Bytes] -> ([Bytes], Either ParseError a)
- type ParseChunks m err x = m Bytes -> Bytes -> m (Bytes, Either err x)
- parseChunks :: Monad m => (Bytes -> Result e a) -> ParseChunks m e a
- finishParsing :: Result ParseError a -> (Bytes, Either ParseError a)
- runAndKeepTrack :: Parser a -> Parser (Result ParseError a, Bytes)
- match :: Parser a -> Parser (Bytes, a)
- ensureN :: Int -> Text -> Parser ()
- endOfInput :: Parser ()
- atEnd :: Parser Bool
- currentChunk :: Parser Bytes
- decodePrim :: forall a. Unaligned a => Parser a
- newtype BE a = BE {- getBE :: a
 
- newtype LE a = LE {- getLE :: a
 
- decodePrimLE :: forall a. Unaligned (LE a) => Parser a
- decodePrimBE :: forall a. Unaligned (BE a) => Parser a
- scan :: s -> (s -> Word8 -> Maybe s) -> Parser (Bytes, s)
- scanChunks :: forall s. s -> (s -> Bytes -> Either s (Bytes, Bytes, s)) -> Parser (Bytes, s)
- peekMaybe :: Parser (Maybe Word8)
- peek :: Parser Word8
- satisfy :: (Word8 -> Bool) -> Parser Word8
- satisfyWith :: (Word8 -> a) -> (a -> Bool) -> Parser a
- anyWord8 :: Parser Word8
- word8 :: Word8 -> Parser ()
- char8 :: Char -> Parser ()
- anyChar8 :: Parser Char
- anyCharUTF8 :: Parser Char
- charUTF8 :: Char -> Parser ()
- char7 :: Char -> Parser ()
- anyChar7 :: Parser Char
- skipWord8 :: Parser ()
- endOfLine :: Parser ()
- skip :: Int -> Parser ()
- skipWhile :: (Word8 -> Bool) -> Parser ()
- skipSpaces :: Parser ()
- take :: Int -> Parser Bytes
- takeN :: (Word8 -> Bool) -> Int -> Parser Bytes
- takeTill :: (Word8 -> Bool) -> Parser Bytes
- takeWhile :: (Word8 -> Bool) -> Parser Bytes
- takeWhile1 :: (Word8 -> Bool) -> Parser Bytes
- takeRemaining :: Parser Bytes
- takeUTF8 :: Int -> Parser Text
- bytes :: Bytes -> Parser ()
- bytesCI :: Bytes -> Parser ()
- text :: Text -> Parser ()
- uint :: forall a. (Integral a, Bounded a) => Parser a
- int :: forall a. (Integral a, Bounded a) => Parser a
- integer :: Parser Integer
- uint_ :: forall a. (Integral a, Bounded a) => Parser a
- int_ :: (Integral a, Bounded a) => Parser a
- digit :: Parser Int
- hex :: forall a. (Integral a, FiniteBits a) => Parser a
- hex' :: forall a. (Integral a, FiniteBits a) => Parser a
- hex_ :: (Integral a, Bits a) => Parser a
- rational :: Fractional a => Parser a
- float :: Parser Float
- double :: Parser Double
- scientific :: Parser Scientific
- scientifically :: (Scientific -> a) -> Parser a
- rational' :: Fractional a => Parser a
- float' :: Parser Float
- double' :: Parser Double
- scientific' :: Parser Scientific
- scientifically' :: (Scientific -> a) -> Parser a
- day :: Parser Day
- localTime :: Parser LocalTime
- timeOfDay :: Parser TimeOfDay
- timeZone :: Parser (Maybe TimeZone)
- utcTime :: Parser UTCTime
- zonedTime :: Parser ZonedTime
- uuid :: Parser UUID
- decodeUUID :: Parser UUID
- fail' :: Text -> Parser a
- failWithInput :: (Bytes -> Text) -> Parser a
- unsafeLiftIO :: IO a -> Parser a
- decodeWord :: Parser Word
- decodeWord64 :: Parser Word64
- decodeWord32 :: Parser Word32
- decodeWord16 :: Parser Word16
- decodeWord8 :: Parser Word8
- decodeInt :: Parser Int
- decodeInt64 :: Parser Int64
- decodeInt32 :: Parser Int32
- decodeInt16 :: Parser Int16
- decodeInt8 :: Parser Int8
- decodeDouble :: Parser Double
- decodeFloat :: Parser Float
- decodeWordLE :: Parser Word
- decodeWord64LE :: Parser Word64
- decodeWord32LE :: Parser Word32
- decodeWord16LE :: Parser Word16
- decodeIntLE :: Parser Int
- decodeInt64LE :: Parser Int64
- decodeInt32LE :: Parser Int32
- decodeInt16LE :: Parser Int16
- decodeDoubleLE :: Parser Double
- decodeFloatLE :: Parser Float
- decodeWordBE :: Parser Word
- decodeWord64BE :: Parser Word64
- decodeWord32BE :: Parser Word32
- decodeWord16BE :: Parser Word16
- decodeIntBE :: Parser Int
- decodeInt64BE :: Parser Int64
- decodeInt32BE :: Parser Int32
- decodeInt16BE :: Parser Int16
- decodeDoubleBE :: Parser Double
- decodeFloatBE :: Parser Float
Parser types
Simple parsing result, that represent respectively:
- Success: the remaining unparsed data and the parsed value
- Failure: the remaining unparsed data and the error message
- Partial: that need for more input data, supply empty bytes to indicate endOfInput
type ParseError = [Text] Source #
Type alias for error message
Simple CPSed parser
A parser takes a failure continuation, and a success one, while the success continuation is
 usually composed by Monad instance, the failure one is more like a reader part, which can
 be modified via <?>. If you build parsers from ground, a pattern like this can be used:
   xxParser = do
     ensureN errMsg ...            -- make sure we have some bytes
     Parser $  kf k s inp ->      -- fail continuation, success continuation, state token and input
       ...
       ... kf errMsg (if input not OK)
       ... k s ... (if we get something useful for next parser)
 Running a parser
parse :: Parser a -> Bytes -> (Bytes, Either ParseError a) Source #
Parse the complete input, without resupplying, return the rest bytes
parse' :: Parser a -> Bytes -> Either ParseError a Source #
Parse the complete input, without resupplying
parseChunk :: Parser a -> Bytes -> Result ParseError a Source #
Parse an input chunk
parseChunkList :: Parser a -> [Bytes] -> ([Bytes], Either ParseError a) Source #
Parse the complete input list, without resupplying, return the rest bytes list.
Parsers in Z.Data.Parser will take empty as EOF, so please make sure there are no emptys
 mixed into the chunk list.
type ParseChunks m err x = m Bytes -> Bytes -> m (Bytes, Either err x) Source #
Type alias for a streaming parser, draw chunk from Monad m with a initial chunk,
 return result in Either err x.
parseChunks :: Monad m => (Bytes -> Result e a) -> ParseChunks m e a Source #
Run a chunk parser with an initial input string, and a monadic action that can supply more input if needed.
finishParsing :: Result ParseError a -> (Bytes, Either ParseError a) Source #
Finish parsing and fetch result, feed empty bytes if it's Partial result.
runAndKeepTrack :: Parser a -> Parser (Result ParseError a, Bytes) Source #
match :: Parser a -> Parser (Bytes, a) Source #
Return both the result of a parse and the portion of the input that was consumed while it was being parsed.
Basic parsers
ensureN :: Int -> Text -> Parser () Source #
Ensure that there are at least n bytes available. If not, the
 computation will escape with Partial.
Since this parser is used in many other parsers, an extra error param is provide to attach custom error info.
endOfInput :: Parser () Source #
Test whether all input has been consumed, i.e. there are no remaining
 undecoded bytes. Fail if not atEnd.
Test whether all input has been consumed, i.e. there are no remaining undecoded bytes.
currentChunk :: Parser Bytes Source #
Get current input chunk, draw new chunk if neccessary. null means EOF.
Note this is different from takeRemaining, currentChunk only return what's
 left in current input chunk.
Primitive decoders
decodePrim :: forall a. Unaligned a => Parser a Source #
Decode a primitive type in host byte order.
big endianess wrapper
Instances
little endianess wrapper
Instances
decodePrimLE :: forall a. Unaligned (LE a) => Parser a Source #
Decode a primitive type in little endian.
decodePrimBE :: forall a. Unaligned (BE a) => Parser a Source #
Decode a primitive type in big endian.
More parsers
scan :: s -> (s -> Word8 -> Maybe s) -> Parser (Bytes, s) Source #
A stateful scanner.  The predicate consumes and transforms a
 state argument, and each transformed state is passed to successive
 invocations of the predicate on each byte of the input until one
 returns Nothing or the input ends.
This parser does not fail.  It will return an empty string if the
 predicate returns Nothing on the first byte of input.
scanChunks :: forall s. s -> (s -> Bytes -> Either s (Bytes, Bytes, s)) -> Parser (Bytes, s) Source #
Similar to scan, but working on Bytes chunks, The predicate
 consumes a Bytes chunk and transforms a state argument,
 and each transformed state is passed to successive invocations of
 the predicate on each chunk of the input until one chunk got splited to
 Right (V.Bytes, V.Bytes) or the input ends.
Note the fields of result triple will not be forced by scanChunks, you may need to add seq or strict annotation to
 avoid thunks and unintentional references to buffer.
peekMaybe :: Parser (Maybe Word8) Source #
Match any byte, to perform lookahead. Returns Nothing if end of
 input has been reached. Does not consume any input.
Match any byte, to perform lookahead. Does not consume any input, but will fail if end of input has been reached.
satisfy :: (Word8 -> Bool) -> Parser Word8 Source #
The parser satisfy p succeeds for any byte for which the
 predicate p returns True. Returns the byte that is actually
 parsed.
digit = satisfy isDigit
    where isDigit w = w >= 48 && w <= 57satisfyWith :: (Word8 -> a) -> (a -> Bool) -> Parser a Source #
The parser satisfyWith f p transforms a byte, and succeeds if
 the predicate p returns True on the transformed value. The
 parser returns the transformed byte that was parsed.
anyCharUTF8 :: Parser Char Source #
Decode next few bytes as an UTF8 char.
Don't use this method as UTF8 decoder, it's slower than validate.
endOfLine :: Parser () Source #
Match either a single newline byte '\n', or a carriage
 return followed by a newline byte "\r\n".
skipWhile :: (Word8 -> Bool) -> Parser () Source #
Skip past input for as long as the predicate returns True.
skipSpaces :: Parser () Source #
Skip over white space using isSpace.
takeN :: (Word8 -> Bool) -> Int -> Parser Bytes Source #
Similar to take, but requires the predicate to succeed on next N bytes
 of input, and take N bytes(no matter if N+1 byte satisfy predicate or not).
takeTill :: (Word8 -> Bool) -> Parser Bytes Source #
Consume input as long as the predicate returns False or reach the end of input,
 and return the consumed input.
takeWhile :: (Word8 -> Bool) -> Parser Bytes Source #
Consume input as long as the predicate returns True or reach the end of input,
 and return the consumed input.
takeUTF8 :: Int -> Parser Text Source #
Take N bytes and validate as UTF8, failed if not UTF8 encoded.
Numeric parsers
Decimal
uint :: forall a. (Integral a, Bounded a) => Parser a Source #
Parse and decode an unsigned decimal number.
Will fail in case of overflow.
int :: forall a. (Integral a, Bounded a) => Parser a Source #
Parse a decimal number with an optional leading '+' or '-' sign
 character.
This parser will fail if overflow happens.
uint_ :: forall a. (Integral a, Bounded a) => Parser a Source #
Same with uint, but sliently cast in case of overflow.
int_ :: (Integral a, Bounded a) => Parser a Source #
Same with int, but sliently cast if overflow happens.
Hex
hex :: forall a. (Integral a, FiniteBits a) => Parser a Source #
Parse and decode an unsigned hex number, fail if input length is larger than (bit_size/4). The hex digits
 'a' through 'f' may be upper or lower case.
This parser does not accept a leading "0x" string, and consider
 sign bit part of the binary hex nibbles, e.g.
>>>parse' hex "FF" == Right (-1 :: Int8)>>>parse' hex "7F" == Right (127 :: Int8)>>>parse' hex "7Ft" == Right (127 :: Int8)>>>parse' hex "7FF" == Left ["Z.Data.Parser.Numeric.hex","hex numeric number overflow"]
hex' :: forall a. (Integral a, FiniteBits a) => Parser a Source #
Same with hex, but only take as many as (bit_size/4) bytes.
>>>parse' hex "FF" == Right (-1 :: Int8)>>>parse' hex "7F" == Right (127 :: Int8)>>>parse' hex "7Ft" == Right (127 :: Int8)>>>parse' hex "7FF" == Right (127 :: Int8)
hex_ :: (Integral a, Bits a) => Parser a Source #
Same with hex, but silently cast in case of overflow.
>>>parse' hex "FF" == Right (-1 :: Int8)>>>parse' hex "7F" == Right (127 :: Int8)>>>parse' hex "7Ft" == Right (127 :: Int8)>>>parse' hex "7FF" == Right (-1 :: Int8)
Fractional
rational :: Fractional a => Parser a Source #
Parse a rational number.
The syntax accepted by this parser is the same as for double.
Note: this parser is not safe for use with inputs from untrusted
 sources.  An input with a suitably large exponent such as
 "1e1000000000" will cause a huge Integer to be allocated,
 resulting in what is effectively a denial-of-service attack.
In most cases, it is better to use double or scientific
 instead.
double :: Parser Double Source #
Parse a rational number and round to Double.
This parser accepts an optional leading sign character, followed by
 at least one decimal digit.  The syntax similar to that accepted by
 the read function, with the exception that a trailing '.' or
 'e' not followed by a number is not consumed.
Examples with behaviour identical to read:
parse' double "3"     == ("", Right 3.0)
parse' double "3.1"   == ("", Right 3.1)
parse' double "3e4"   == ("", Right 30000.0)
parse' double "3.1e4" == ("", Right 31000.0)parse' double ".3"    == (".3", Left ParserError)
parse' double "e3"    == ("e3", Left ParserError)Examples of differences from read:
parse' double "3.foo" == (".foo", Right 3.0)
parse' double "3e"    == ("e",    Right 3.0)
parse' double "-3e"   == ("e",    Right -3.0)This function does not accept string representations of "NaN" or "Infinity".
scientific :: Parser Scientific Source #
Parse a scientific number.
The syntax accepted by this parser is the same as for double.
scientifically :: (Scientific -> a) -> Parser a Source #
Parse a scientific number and convert to result using a user supply function.
The syntax accepted by this parser is the same as for double.
Stricter fractional(rfc8259)
rational' :: Fractional a => Parser a Source #
Parse a rational number.
The syntax accepted by this parser is the same as for double'.
Note: this parser is not safe for use with inputs from untrusted
 sources.  An input with a suitably large exponent such as
 "1e1000000000" will cause a huge Integer to be allocated,
 resulting in what is effectively a denial-of-service attack.
In most cases, it is better to use double' or scientific'
 instead.
double' :: Parser Double Source #
More strict number parsing(rfc8259).
scientific support parse 2314. and 21321exyz without eating extra dot or e via
 backtrack, this is not allowed in some strict grammer such as JSON, so we make an
 non-backtrack strict number parser separately using LL(1) lookahead. This parser also
 agree with read on extra dot or e handling:
parse' double "3.foo" == Left ParseError parse' double "3e" == Left ParseError
Leading zeros or + sign is also not allowed:
parse' double "+3.14" == Left ParseError parse' double "0014" == Left ParseError
If you have a similar grammer, you can use this parser to save considerable time.
     number = [ minus ] int [ frac ] [ exp ]
     decimal-point = %x2E       ; .
     digit1-9 = %x31-39         ; 1-9
     e = %x65 / %x45            ; e E
     exp = e [ minus / plus ] 1*DIGIT
     frac = decimal-point 1*DIGIT
This function does not accept string representations of "NaN" or "Infinity". reference: https://tools.ietf.org/html/rfc8259#section-6
scientific' :: Parser Scientific Source #
Parse a scientific number.
The syntax accepted by this parser is the same as for double'.
scientifically' :: (Scientific -> a) -> Parser a Source #
Parse a scientific number and convert to result using a user supply function.
The syntax accepted by this parser is the same as for double'.
Time
Parse a date of the form [+,-]YYYY-MM-DD.
Invalid date(leap year rule violation, etc.) will be rejected.
localTime :: Parser LocalTime Source #
Parse a date and time, of the form YYYY-MM-DD HH:MM[:SS[.SSS]].
 The space may be replaced with a T.  The number of seconds is optional
 and may be followed by a fractional component.
timeZone :: Parser (Maybe TimeZone) Source #
Parse a time zone, and return Nothing if the offset from UTC is
 zero. (This makes some speedups possible.)
utcTime :: Parser UTCTime Source #
Behaves as zonedTime, but converts any time zone offset into a -- UTC time.
zonedTime :: Parser ZonedTime Source #
Parse a date with time zone info. Acceptable formats:
YYYY-MM-DD HH:MM Z YYYY-MM-DD HH:MM:SS Z YYYY-MM-DD HH:MM:SS.SSS Z
The first space may instead be a T, and the second space is
 optional.  The Z represents UTC.  The Z may be replaced with a
 time zone offset of the form +0000 or -08:00, where the first
 two digits are hours, the : is optional and the second two digits
 (also optional) are minutes.
UUID
Parse texutal UUID bytes(lower or upper-cased), e.g. 550e8400-e29b-41d4-a716-446655440000
decodeUUID :: Parser UUID Source #
Decode binary UUID(two 64-bits word in big-endian), as described in RFC 4122.
Misc
failWithInput :: (Bytes -> Text) -> Parser a Source #
Similar to fail`, but can produce error message with current input chunk.
unsafeLiftIO :: IO a -> Parser a Source #
Specialized primitive parser
decodeWord :: Parser Word Source #
decodeInt8 :: Parser Int8 Source #
decodeIntLE :: Parser Int Source #
decodeIntBE :: Parser Int Source #