json-stream-0.4.5.0: Incremental applicative JSON parser
LicenseBSD-style
Maintainerpalkovsky.ondrej@gmail.com
Stabilityexperimental
Portabilityportable
Safe HaskellSafe-Inferred
LanguageHaskell2010

Data.JsonStream.Parser

Description

An incremental applicative-style JSON parser, suitable for high performance memory efficient stream parsing.

The parser is optionally using Data.Aeson types and FromJSON instance, it can be easily combined with aeson monadic parsing instances when appropriate.

Synopsis

How to use this library

>>> parseByteString value "[1,2,3]" :: [[Int]]
[[1,2,3]]

The value parser matches any FromJSON value. The above command is essentially identical to the aeson decode function; the parsing process can generate more objects, therefore the results is [a].

Example of json-stream style parsing:

>>> parseByteString (arrayOf integer) "[1,2,3]" :: [Int]
[1,2,3]

Parsers can be combinated using <*> and <|> operators. The parsers are run in parallel and return combinations of the parsed values.

>>> let text = "[{\"name\": \"John\", \"age\": 20}, {\"age\": 30, \"name\": \"Frank\"} ]"
>>> let parser = arrayOf $ (,) <$> "name" .: string <*> "age"  .: integer
>>> parseByteString  parser text :: [(T.Text,Int)]
[("John",20),("Frank",30)]

When parsing larger values, it is advisable to use lazy ByteStrings. The parsing is then more memory efficient as less lexical state is needed to be held in memory for parallel parsers.

More examples are available on https://github.com/ondrap/json-stream.

Performance

The parser tries to do the least amount of work to get the job done, skipping over items that are not required. General guidelines to get best performance:

Do not use the value parser for the whole object if the object is big. Consider using the objectOf parser to parse objects instead of direct applicative parsing for creating objects if they have lots of records. Every <*> outside of direct objectOf parser causes parallel parsing. Too many parallel parsers kill performance.

arrayOf $ objectOf $ MyStructure <$> "field1" .: string <*> "field2" .: integer <*> .... <*> "field20" .: string

will be the fastest and use the least memory.

arrayOf value :: Parser MyStructure -- MyStructure with FromJSON instance

will probably still behave better than

arrayOf $ MyStructure <$> "field1" .: string <*> "field2" .: integer <*> .... <*> "field20" .: string

and also better (at least memory-wise) than

value :: Parser [MyStructure]

unless the structure has hundreths of fields and you are parsing only a substructure.

The integer parser was optimized in such a way that the integer numbers skip the conversion to Scientific, resulting in a slightly faster speed.

It is possible to use the *> operator to filter objects based on a condition, e.g.:

arrayOf $ id <$> "error" .: number
              *> "name" .: string

This will return all objects that contain attribute error with number content. The parser will skip trying to decode the name attribute if error is not found.

Constant space decoding

Constant space decoding is possible if the grammar does not specify non-constant operations. The non-constant operations are value, string, many and in some instances <*>.

The value parser works by creating an aeson AST and passing it to the parseJSON method. The AST can consume a lot of memory before it is rejected in parseJSON. To achieve constant space the parsers safeString, objectOf, number, integer, real and bool must be used; these parsers reject and do not parse data if it does not match the type.

The object key length is limited to ~64K. Object records with longer key are ignored and unparsed.

Numbers are limited to 200.000 digits. Longer numbers will make the parsing fail.

The many parser works by accumulating all matched values. Obviously, number of such values influences the amount of used memory.

The <*> operator runs both parsers in parallel and when they are both done, it produces combinations of the received values. It is constant-space as long as the number of element produced by child parsers is limited by a constant. This can be achieved by using .! and .: functions combined with constant space parsers or limiting the number of returned elements with takeI.

Running many parallel parsers (e.g. when parsing objects with a lot of fields) will slow things done. Use the objectOf.

If the source object contains an object with multiple keys with a same name, json-stream matches the key multiple times. The only exception is objectWithKey (.: and .:?) that return at most one value for a given key.

Aeson compatibility

The parser uses internally Data.Aeson types, so that the FromJSON instances are directly usable with the value parser. It may be more convenient to parse the outer structure with json-stream and the inner objects with aeson as long as constant-space decoding is not required.

Json-stream defines the object-access operators .:, .:? but in a slightly different albeit more natural way. New operators are .! for array access and .| to handle missing values.

>>> let test = "[{\"name\": \"test1\", \"value\": 1}, {\"name\": \"test2\", \"value\": null}, {\"name\": \"test3\"}]"
>>> let person = (,) <$> "name" .: string <*> "value" .: integer .| (-1)
>>> let people = arrayOf person
>>> parseByteString people test :: [(T.Text, Int)]
[("test1",1),("test2",-1),("test3",-1)]

The above code would run 3 parsers in parallel to get the appropriate results. You can use the objectOf parser to get a similar result in a more performant way.

>>> let test = "[{\"name\": \"test1\", \"value\": 1}, {\"name\": \"test2\", \"value\": null}, {\"name\": \"test3\"}]"
>>> let person = objectOf $ (,) <$> "name" .: string <*> "value" .: integer .| (-1)
>>> let people = arrayOf person
>>> parseByteString people test :: [(T.Text, Int)]
[("test1",1),("test2",-1),("test3",-1)]

The Parser type

data Parser a Source #

A representation of the parser.

Instances

Instances details
Alternative Parser Source #

Match items from the first parser, if none is matched, return items from the second parser. Constant-space if second parser returns constant number of items. .| is implemented using this operator.

>>> let json = "[{\"key1\": [1,2], \"key2\": [5,6], \"key3\": [8,9]}]"
>>> let parser = arrayOf $ "key1" .: (arrayOf value) <|> "key2" .: (arrayOf value)
>>> parseByteString parser json :: [Int]
[1,2]
>>> let parser = arrayOf $ "key-non" .: (arrayOf value) <|> "key2" .: (arrayOf value)
>>> parseByteString parser json :: [Int]
[5,6]

many - Gather matches and return them as list.

>>> let json = "[{\"keys\":[1,2], \"values\":[5,6]}, {\"keys\":[9,8], \"values\":[7,6]}]"
>>> let parser = arrayOf $ (,) <$> many ("keys" .: arrayOf integer) <*> many ("values" .: arrayOf integer)
>>> parseByteString parser json :: [([Int], [Int])]
[([1,2],[5,6]),([9,8],[7,6])]
Instance details

Defined in Data.JsonStream.Parser

Methods

empty :: Parser a #

(<|>) :: Parser a -> Parser a -> Parser a #

some :: Parser a -> Parser [a] #

many :: Parser a -> Parser [a] #

Applicative Parser Source #

<*> will run both parsers in parallel and combine results.

It behaves as a list functor (produces all combinations), but the typical use is:

>>> :set -XOverloadedStrings
>>> let text = "[{\"name\": \"John\", \"age\": 20}, {\"age\": 30, \"name\": \"Frank\"}]"
>>> let parser = arrayOf $ (,) <$> "name" .: string <*> "age"  .: integer
>>> parseByteString parser text :: [(T.Text,Int)]
[("John",20),("Frank",30)]
Instance details

Defined in Data.JsonStream.Parser

Methods

pure :: a -> Parser a #

(<*>) :: Parser (a -> b) -> Parser a -> Parser b #

liftA2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c #

(*>) :: Parser a -> Parser b -> Parser b #

(<*) :: Parser a -> Parser b -> Parser a #

Functor Parser Source # 
Instance details

Defined in Data.JsonStream.Parser

Methods

fmap :: (a -> b) -> Parser a -> Parser b #

(<$) :: a -> Parser b -> Parser a #

Monoid (Parser a) Source #

<> will run both parsers in parallel yielding from both as the data comes

>>> :m +Data.Monoid
>>> let test = "[{\"key1\": [1,2], \"key2\": [5,6], \"key3\": [8,9]}]"
>>> let parser = arrayOf $ "key1" .: (arrayOf value) <> "key2" .: (arrayOf value)
>>> parseByteString parser test :: [Int]
[1,2,5,6]
Instance details

Defined in Data.JsonStream.Parser

Methods

mempty :: Parser a #

mappend :: Parser a -> Parser a -> Parser a #

mconcat :: [Parser a] -> Parser a #

Semigroup (Parser a) Source # 
Instance details

Defined in Data.JsonStream.Parser

Methods

(<>) :: Parser a -> Parser a -> Parser a #

sconcat :: NonEmpty (Parser a) -> Parser a #

stimes :: Integral b => b -> Parser a -> Parser a #

data ParseOutput a Source #

Result of parsing. Contains continuations to continue parsing.

Constructors

ParseYield a (ParseOutput a)

Returns a value from a parser.

ParseNeedData (ByteString -> ParseOutput a)

Parser needs more data to continue parsing.

ParseFailed String

Parsing failed, error is reported.

ParseDone ByteString

Parsing finished, unparsed data is returned.

Parsing functions

runParser :: Parser a -> ParseOutput a Source #

Run streaming parser, immediately returns ParseNeedData.

runParser' :: Parser a -> ByteString -> ParseOutput a Source #

Run streaming parser with initial input.

parseByteString :: Parser a -> ByteString -> [a] Source #

Parse a bytestring, generate lazy list of parsed values. If an error occurs, throws an exception.

>>> parseByteString (arrayOf integer) "[1,2,3,4]" :: [Int]
[1,2,3,4]
>>> parseByteString (arrayOf ("name" .: string)) "[{\"name\":\"KIWI\"}, {\"name\":\"BIRD\"}]"
["KIWI","BIRD"]

parseLazyByteString :: Parser a -> ByteString -> [a] Source #

Parse a lazy bytestring, generate lazy list of parsed values. If an error occurs, throws an exception.

Aeson in-place replacement functions

decode :: FromJSON a => ByteString -> Maybe a Source #

Deserialize a JSON value from lazy ByteString.

If this fails due to incomplete or invalid input, Nothing is returned.

The input must consist solely of a JSON document, with no trailing data except for whitespace.

eitherDecode :: FromJSON a => ByteString -> Either String a Source #

Like decode but returns an error message when decoding fails.

decodeStrict :: FromJSON a => ByteString -> Maybe a Source #

Like decode, but on strict ByteString

FromJSON parser

value :: FromJSON a => Parser a Source #

Match FromJSON value. Equivalent to valueWith parseJSON.

>>> let json = "[{\"key1\": [1,2], \"key2\": [5,6]}]"
>>> parseByteString (arrayOf value) json :: [AE.Value]
[Object (fromList [("key2",Array [Number 5.0,Number 6.0]),("key1",Array [Number 1.0,Number 2.0])])]

valueWith :: (Value -> Parser a) -> Parser a Source #

Match values with a Parser. Returns values for which the given parser succeeds.

string :: Parser Text Source #

Parse string value, skip parsing otherwise.

byteString :: Parser ByteString Source #

Parse raw bytestring value (json string expected), skip parsing otherwise. The returned value is not unescaped.

Constant space parsers

safeString :: Int -> Parser Text Source #

Stops parsing string after the limit is reached. The string will not be matched if it exceeds the size. The size is the size of escaped string including escape characters.

number :: Parser Scientific Source #

Parse number, return in scientific format.

integer :: forall i. (Integral i, Bounded i) => Parser i Source #

Parse to bounded integer type (not Integer). If you are using integer numbers, use this parser. It skips the conversion JSON -> Scientific -> Int and uses an Int directly.

real :: RealFloat a => Parser a Source #

Parse to float/double.

bool :: Parser Bool Source #

Parse bool, skip if the type is not bool.

jNull :: Parser () Source #

Match a null value.

safeByteString :: Int -> Parser ByteString Source #

Stops parsing string after the limit is reached. The string will not be matched if it exceeds the size. The size is the size of escaped string including escape characters. The return value is not unescaped.

Structure operators

(.:) :: OnObject o a => Text -> Parser a -> o a infixr 7 Source #

Synonym for objectWithKey. The .: operators can be chained.

>>> let json = "{\"key1\": {\"nested-key\": 3}}"
>>> parseByteString ("key1" .: "nested-key" .: integer) json :: [Int]
> [3]

It works both as a standalone parser and as a part of objectOf parser

>>> let test = "[{\"name\": \"test1\", \"value\": 1}, {\"name\": \"test2\", \"value\": null}, {\"name\": \"test3\"}]"
>>> let person = objectOf $ (,) <$> "name" .: string <*> "value" .: integer .| (-1)
>>> let people = arrayOf person
>>> parseByteString people test :: [(T.Text, Int)]
[("test1",1),("test2",-1),("test3",-1)]

(.:?) :: OnObject o a => Text -> Parser a -> o (Maybe a) infixr 7 Source #

Returns Nothing if value is null or does not exist or match. Otherwise returns Just value.

key .:? val = optional (key .: val)

It could be similarly used in the objectOf parser

(.|) :: OnObject o a => o a -> a -> o a infixl 6 Source #

Return default value if the parsers on the left hand didn't produce a result.

p .| defval = p <|> pure defval

The operator works on complete left side, the following statements are equal:

Record <$>  "key1" .: "nested-key" .: value .| defaultValue
Record <$> (("key1" .: "nested-key" .: value) .| defaultValue)

(.!) :: Int -> Parser a -> Parser a infixr 7 Source #

Synonym for arrayWithIndexOf. Matches n-th item in array.

>>> parseByteString (arrayOf (1 .! bool)) "[ [1,true,null], [2,false], [3]]" :: [Bool]
[True,False]

Structure parsers

objectWithKey :: Text -> Parser a -> Parser a Source #

Match only specific key of an object. This function will return only the first matched value in an object even if the source JSON defines the key multiple times (in violation of the specification).

objectItems :: Parser a -> Parser (Text, a) Source #

Match all key-value pairs of an object, return them as a tuple. If the source object defines same key multiple times, all values are matched.

objectValues :: Parser a -> Parser a Source #

Match all key-value pairs of an object, return only values. If the source object defines same key multiple times, all values are matched. Keys are ignored.

objectKeyValues :: (Text -> Parser a) -> Parser a Source #

Match all key-value pairs of an object, and parse the value based on the key. If the source object defines same key multiple times, all values are matched.

arrayOf :: Parser a -> Parser a Source #

Match all items of an array.

arrayWithIndexOf :: Int -> Parser a -> Parser a Source #

Match nith item in an array.

indexedArrayOf :: Parser a -> Parser (Int, a) Source #

Match all items of an array, add index to output.

nullable :: Parser a -> Parser (Maybe a) Source #

Parses a field with a possible null value.

Fast structure parser

objectOf :: forall f. Object f -> Parser f Source #

Parser for faster object parsing

The whole object is parsed in a single run. Use the .: combinator to access the fields; you may not access the same field more than once. If you try to access the same field, an error is called.

The operators .:, .:?, <|> and <> are supported and will produce the same results as if used directly with parallel parsing.

data Object f Source #

Representation for applicative JSON one-pass object parsing

Instances

Instances details
Alternative Object Source # 
Instance details

Defined in Data.JsonStream.Parser

Methods

empty :: Object a #

(<|>) :: Object a -> Object a -> Object a #

some :: Object a -> Object [a] #

many :: Object a -> Object [a] #

Applicative Object Source # 
Instance details

Defined in Data.JsonStream.Parser

Methods

pure :: a -> Object a #

(<*>) :: Object (a -> b) -> Object a -> Object b #

liftA2 :: (a -> b -> c) -> Object a -> Object b -> Object c #

(*>) :: Object a -> Object b -> Object b #

(<*) :: Object a -> Object b -> Object a #

Functor Object Source # 
Instance details

Defined in Data.JsonStream.Parser

Methods

fmap :: (a -> b) -> Object a -> Object b #

(<$) :: a -> Object b -> Object a #

Semigroup (Object a) Source # 
Instance details

Defined in Data.JsonStream.Parser

Methods

(<>) :: Object a -> Object a -> Object a #

sconcat :: NonEmpty (Object a) -> Object a #

stimes :: Integral b => b -> Object a -> Object a #

Parsing modifiers

filterI :: (a -> Bool) -> Parser a -> Parser a Source #

Let only items matching a condition pass.

>>> parseByteString (filterI (>5) $ arrayOf integer) "[1,2,3,4,5,6,7,8,9,0]" :: [Int]
[6,7,8,9]

takeI :: Int -> Parser a -> Parser a Source #

Take maximum n matching items.

>>> parseByteString (takeI 3 $ arrayOf integer) "[1,2,3,4,5,6,7,8,9,0]" :: [Int]
[1,2,3]

mapWithFailure :: (a -> Either String b) -> Parser a -> Parser b Source #

A back-door for lifting of possibly failing actions. If an action fails with Left value, convert it into failure of parsing

manyReverse :: Parser a -> Parser [a] Source #

Identical to fmap reverse . many but more efficient. If you don't care about the order of the results but plan to fully evaluate the list, this can be slightly more efficient than many as it avoids the accumulating thunks.

foldI :: (b -> a -> b) -> b -> Parser a -> Parser b Source #

Fold over values in stream

foldMapI :: Monoid m => (a -> m) -> Parser a -> Parser m Source #

Strict foldMap over values in stream

unFoldI :: Parser [a] -> Parser a Source #

From a list of values generate single values

catMaybeI :: Parser (Maybe a) -> Parser a Source #

Filter Nothing values out of a stream

SAX-like parsers

arrayFound :: a -> a -> Parser a -> Parser a Source #

Generate start/end values when an array is found, in between run a parser. The inner parser is not run if an array is not found.

>>> let test = "[[1,2,3],true,[],false,{\"key\":1}]" :: BS.ByteString
>>> parseByteString (arrayOf (arrayFound 10 20 (1 .! integer))) test :: [Int]
[10,2,20,10,20]

objectFound :: a -> a -> Parser a -> Parser a Source #

Generate start/end values when an object is found, in between run a parser. The inner parser is not run if an array is not found.