Safe Haskell | Safe-Inferred |
---|---|
Language | Haskell2010 |
Avro encoding and decoding routines.
This library provides a high level interface for encoding and decoding Haskell values in Apache's Avro serialization format.
Synopsis
- data Schema where
- Null
- Boolean
- Int { }
- Long { }
- Float
- Double
- Bytes { }
- String { }
- Array { }
- Map { }
- NamedType TypeName
- Record { }
- Enum { }
- Union { }
- Fixed {
- name :: TypeName
- aliases :: [TypeName]
- size :: Int
- logicalTypeF :: Maybe LogicalTypeFixed
- pattern Int' :: Schema
- pattern Long' :: Schema
- pattern Bytes' :: Schema
- pattern String' :: Schema
- data Field = Field {
- fldName :: Text
- fldAliases :: [Text]
- fldDoc :: Maybe Text
- fldOrder :: Maybe Order
- fldType :: Schema
- fldDefault :: Maybe DefaultValue
- data Order
- data TypeName = TN {}
- data Decimal = Decimal {}
- newtype LogicalTypeBytes = DecimalB Decimal
- data LogicalTypeFixed
- data LogicalTypeInt
- data LogicalTypeLong
- data LogicalTypeString = UUID
- data ReadSchema
- deconflict :: Schema -> Schema -> Either String ReadSchema
- readSchemaFromSchema :: Schema -> ReadSchema
- encodeValue :: (HasAvroSchema a, ToAvro a) => a -> ByteString
- encodeValueWithSchema :: ToAvro a => Schema -> a -> ByteString
- decodeValue :: forall a. (HasAvroSchema a, FromAvro a) => ByteString -> Either String a
- decodeValueWithSchema :: FromAvro a => ReadSchema -> ByteString -> Either String a
- decodeContainer :: forall a. (HasAvroSchema a, FromAvro a) => ByteString -> [Either String a]
- decodeContainerWithEmbeddedSchema :: forall a. FromAvro a => ByteString -> [Either String a]
- decodeContainerWithReaderSchema :: forall a. FromAvro a => Schema -> ByteString -> [Either String a]
- encodeContainer :: forall a. (HasAvroSchema a, ToAvro a) => Codec -> [[a]] -> IO ByteString
- encodeContainerWithSchema :: ToAvro a => Codec -> Schema -> [[a]] -> IO ByteString
- encodeContainerWithSync :: ToAvro a => Codec -> Schema -> ByteString -> [[a]] -> ByteString
- newSyncBytes :: IO ByteString
- extractContainerValuesBytes :: ByteString -> Either String (Schema, [Either String ByteString])
- decodeContainerValuesBytes :: forall a. FromAvro a => Schema -> ByteString -> Either String (Schema, [Either String (a, ByteString)])
- class ToAvro a
- class FromAvro a
- data Codec
- nullCodec :: Codec
- deflateCodec :: Codec
- class HasAvroSchema a where
- schemaOf :: HasAvroSchema a => a -> Schema
Schema
N.B. It is possible to create a Haskell value (of Schema
type) that is
not a valid Avro schema by violating one of the above or one of the
conditions called out in validateSchema
.
Null | |
Boolean | |
Int | |
Long | |
Float | |
Double | |
Bytes | |
String | |
Array | |
Map | |
NamedType TypeName | |
Record | |
Enum | |
Union | |
Fixed | |
|
Instances
Field | |
|
Instances
Instances
FromJSON Order Source # | |
ToJSON Order Source # | |
Defined in Data.Avro.Schema.Schema | |
Generic Order Source # | |
Show Order Source # | |
NFData Order Source # | |
Defined in Data.Avro.Schema.Schema | |
Eq Order Source # | |
Ord Order Source # | |
Lift Order Source # | |
type Rep Order Source # | |
Defined in Data.Avro.Schema.Schema type Rep Order = D1 ('MetaData "Order" "Data.Avro.Schema.Schema" "avro-0.6.1.2-CUddNuGftGHAxf2IYjn53O" 'False) (C1 ('MetaCons "Ascending" 'PrefixI 'False) (U1 :: Type -> Type) :+: (C1 ('MetaCons "Descending" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "Ignore" 'PrefixI 'False) (U1 :: Type -> Type))) |
A named type in Avro has a name and, optionally, a namespace.
A name is a string that starts with an ASCII letter or underscore followed by letters, underscores and digits:
name ::= [A-Za-z_][A-Za-z0-9_]*
Examples include "_foo7"
, Bar_
and "x"
.
A namespace is a sequence of names with the same lexical
structure. When written as a string, the components of a namespace
are separated with dots ("com.example"
).
TypeName
represents a fullname—a name combined with a
namespace. These are written and parsed as dot-separated
strings. The TypeName
TN Foo ["com", "example"]
is rendered
as "com.example.Foo"
.
Fullnames have to be globally unique inside an Avro schema.
A namespace of []
or [""]
is the "null namespace". In avro
an explicitly null-namespaced identifier is written as ".Foo"
Instances
The decimal
logical type represents arbitrary-precision decimal
numbers. Numbers are represented as unscaled * (10 ** -scale)
where scale
is part of the logical type and unscaled
is an
integer represented by the underlying primitive type.
Instances of the decimal
logical type need to specify a scale
and precision
.
decimal
can be encoded as one of several different primitive
types:
bytes
fixed
long
int
For long
and int
, unscaled
is the underlying number.
For bytes
and fixed
, unscaled
is represented as a
two's-complement signed integer in big-endian byte order.
Note: int
and long
representations for decimal
are not part
of the current Avro
specification,
but they are supported by some language implementations including
the official Java library. Implementations that do not support this
should ignore the logical type and use the underlying primitive
type instead.
Instances
Generic Decimal Source # | |
Show Decimal Source # | |
NFData Decimal Source # | |
Defined in Data.Avro.Schema.Schema | |
Eq Decimal Source # | |
Ord Decimal Source # | |
Lift Decimal Source # | |
type Rep Decimal Source # | |
Defined in Data.Avro.Schema.Schema type Rep Decimal = D1 ('MetaData "Decimal" "Data.Avro.Schema.Schema" "avro-0.6.1.2-CUddNuGftGHAxf2IYjn53O" 'False) (C1 ('MetaCons "Decimal" 'PrefixI 'True) (S1 ('MetaSel ('Just "precision") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedStrict) (Rec0 Integer) :*: S1 ('MetaSel ('Just "scale") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedStrict) (Rec0 Integer))) |
newtype LogicalTypeBytes Source #
Instances
data LogicalTypeFixed Source #
DecimalF Decimal | An arbitrary-precision signed decimal number. See |
Duration | An interval of time, represented as some number of months, days and milliseconds. Encoded as three little-endian unsigned integers for months, days and milliseconds respectively. |
Instances
data LogicalTypeInt Source #
DecimalI Decimal | An arbitrary-precision signed decimal number. See |
Date | A date (eg Encoded as the number of days before/after the Unix epoch (1970-01-01). |
TimeMillis | A time of day with millisecond precision. Encoded as the number of milliseconds after midnight. |
Instances
data LogicalTypeLong Source #
DecimalL Decimal | An arbitrary-precision signed decimal number. See |
TimeMicros | A time of day with microsecond precision. Encoded as the number of microseconds after midnight. |
TimestampMillis | A UTC timestamp with millisecond precision. Encoded as the number of milliseconds before/after the Unix epoch (1970-01-01 00:00:00.000). |
TimestampMicros | A UTC timestamp with microsecond precision. Encoded as the number of microseconds before/after the Unix epoch (1970-01-01 00:00:00.000000). |
LocalTimestampMillis | A timestamp in the local timezone, whatever that happens to be, with millisecond precision. Encoded as the number of milliseconds before/after the Unix epoch (1970-01-01 00:00:00.000). |
LocalTimestampMicros | A timestamp in the local timezone, whatever that happens to be, with microsecond precision. Encoded as the number of microseconds before/after the Unix epoch (1970-01-01 00:00:00.000000). |
Instances
data LogicalTypeString Source #
Instances
Deconflicting schemas
data ReadSchema Source #
This type represents a deconflicted version of a Schema
.
Schema resolution is described in Avro specification: https://avro.apache.org/docs/current/spec.html#Schema+Resolution
This library represents "normal" schema and "deconflicted" schema as different types to avoid confusion between these two usecases (we shouldn't serialise values with such schema) and to be able to accomodate some extra information that links between how data is supposed transformed between what reader wants and what writer has.
Instances
deconflict :: Schema -> Schema -> Either String ReadSchema Source #
deconflict writer reader
will produce a schema that can decode
with the writer's schema into the form specified by the reader's schema.
Schema resolution rules are described by the specification: https://avro.apache.org/docs/current/spec.html#Schema+Resolution
readSchemaFromSchema :: Schema -> ReadSchema Source #
Converts Schema
into ReadSchema
. This function may be useful when it is known
that the writer and the reader schemas are the same.
Individual values
encodeValue :: (HasAvroSchema a, ToAvro a) => a -> ByteString Source #
Serialises an individual value into Avro using the schema
from its coresponding HasAvroSchema
instance.
encodeValueWithSchema :: ToAvro a => Schema -> a -> ByteString Source #
Serialises an individual value into Avro with the schema provided.
decodeValue :: forall a. (HasAvroSchema a, FromAvro a) => ByteString -> Either String a Source #
Deserialises an individual value from Avro using the schema from its coresponding HasAvroSchema
.
NOTE: This function is only to be used when reader and writes schemas are known to be the same. Because only one schema is known at this point, and it is the reader schema, no decondlicting can be performed.
decodeValueWithSchema :: FromAvro a => ReadSchema -> ByteString -> Either String a Source #
Deserialises an individual value from Avro.
Working with containers
Decoding containers
decodeContainer :: forall a. (HasAvroSchema a, FromAvro a) => ByteString -> [Either String a] Source #
Decodes the container using a schema from HasAvroSchema
as a reader schema.
Errors are reported as a part of the list and the list will stop at first error. This means that the consumer will get all the "good" content from the container until the error is detected, then this error and then the list is finished.
decodeContainerWithEmbeddedSchema :: forall a. FromAvro a => ByteString -> [Either String a] Source #
Decodes the container as a list of values of the requested type.
Errors are reported as a part of the list and the list will stop at first error. This means that the consumer will get all the "good" content from the container until the error is detected, then this error and then the list is finished.
decodeContainerWithReaderSchema :: forall a. FromAvro a => Schema -> ByteString -> [Either String a] Source #
Decodes the container as a list of values of the requested type.
The provided reader schema will be de-conflicted with the schema embedded with the container.
Errors are reported as a part of the list and the list will stop at first error. This means that the consumer will get all the "good" content from the container until the error is detected, then this error and then the list is finished.
encodeContainer :: forall a. (HasAvroSchema a, ToAvro a) => Codec -> [[a]] -> IO ByteString Source #
Encode chunks of values into a container, using 16 random bytes for
the synchronization markers and a corresponding HasAvroSchema
schema.
Blocks are compressed (or not) according to the given Codec
(nullCodec
or deflateCodec
).
encodeContainerWithSchema :: ToAvro a => Codec -> Schema -> [[a]] -> IO ByteString Source #
Encode chunks of values into a container, using 16 random bytes for
the synchronization markers. Blocks are compressed (or not) according
to the given Codec
(nullCodec
or deflateCodec
).
encodeContainerWithSync :: ToAvro a => Codec -> Schema -> ByteString -> [[a]] -> ByteString Source #
Encode chunks of objects into a container, using the provided ByteString as the synchronization markers.
newSyncBytes :: IO ByteString Source #
Generates a new synchronization marker for encoding Avro containers
Extracting containers' data
extractContainerValuesBytes :: ByteString -> Either String (Schema, [Either String ByteString]) Source #
Splits container into a list of individual avro-encoded values.
This is particularly useful when slicing up containers into one or more smaller files. By extracting the original bytestring it is possible to avoid re-encoding data.
decodeContainerValuesBytes :: forall a. FromAvro a => Schema -> ByteString -> Either String (Schema, [Either String (a, ByteString)]) Source #
Splits container into a list of individual avro-encoded values. This version provides both encoded and decoded values.
This is particularly useful when slicing up containers into one or more smaller files. By extracting the original bytestring it is possible to avoid re-encoding data.
Classes
Describes how to encode Haskell data types into Avro bytes
Instances
Descrives how to convert a given intermediate Value
into a Haskell data type.
Instances
Compression
A Codec
allows for compression/decompression of a block in an
Avro container according to the Avro spec.
nullCodec
specifies null
required by Avro spec.
(see https://avro.apache.org/docs/1.8.1/spec.html#null)
deflateCodec :: Codec Source #
deflateCodec
specifies deflate
codec required by Avro spec.
(see https://avro.apache.org/docs/1.8.1/spec.html#deflate)
class HasAvroSchema a where Source #
Instances
schemaOf :: HasAvroSchema a => a -> Schema Source #