Safe Haskell | None |
---|
This module implements encoding and decoding of CSV data. The implementation is RFC 4180 compliant, with the following extensions:
- Empty lines are ignored.
- Non-escaped fields may contain any characters except double-quotes, commas, carriage returns, and newlines
- Escaped fields may contain any characters (but double-quotes need to be escaped).
- decode :: FromRecord a => Bool -> ByteString -> Either String (Vector a)
- decodeByName :: FromNamedRecord a => ByteString -> Either String (Header, Vector a)
- encode :: ToRecord a => Vector a -> ByteString
- encodeByName :: ToNamedRecord a => Header -> Vector a -> ByteString
- data DecodeOptions = DecodeOptions {
- decDelimiter :: !Word8
- defaultDecodeOptions :: DecodeOptions
- decodeWith :: FromRecord a => DecodeOptions -> Bool -> ByteString -> Either String (Vector a)
- decodeByNameWith :: FromNamedRecord a => DecodeOptions -> ByteString -> Either String (Header, Vector a)
- data EncodeOptions = EncodeOptions {
- encDelimiter :: !Word8
- defaultEncodeOptions :: EncodeOptions
- encodeWith :: ToRecord a => EncodeOptions -> Vector a -> ByteString
- encodeByNameWith :: ToNamedRecord a => EncodeOptions -> Header -> Vector a -> ByteString
- type Csv = Vector Record
- type Record = Vector Field
- type Field = ByteString
- type Header = Vector Name
- type Name = ByteString
- type NamedRecord = HashMap ByteString ByteString
- class FromRecord a where
- parseRecord :: Record -> Parser a
- data Parser a
- runParser :: Parser a -> Either String a
- index :: FromField a => Record -> Int -> Parser a
- (.!) :: FromField a => Record -> Int -> Parser a
- class ToRecord a where
- record :: [ByteString] -> Record
- newtype Only a = Only {
- fromOnly :: a
- class FromNamedRecord a where
- parseNamedRecord :: NamedRecord -> Parser a
- lookup :: FromField a => NamedRecord -> ByteString -> Parser a
- (.:) :: FromField a => NamedRecord -> ByteString -> Parser a
- class ToNamedRecord a where
- toNamedRecord :: a -> NamedRecord
- namedRecord :: [(ByteString, ByteString)] -> NamedRecord
- namedField :: ToField a => ByteString -> a -> (ByteString, ByteString)
- (.=) :: ToField a => ByteString -> a -> (ByteString, ByteString)
- class FromField a where
- parseField :: Field -> Parser a
- class ToField a where
Usage example
A short encoding usage example:
>>> encode $ fromList [("John" :: Text, 27), ("Jane", 28)] Chunk "John,27\r\nJane,28\r\n" Empty
Since string literals are overloaded we have to supply a type
signature as the compiler couldn't deduce which string type (i.e.
String
or Text
) we want to use. In most cases type inference
will infer the type from the context and you can omit type
signatures.
A short decoding usage example:
>>> decode False "John,27\r\nJane,28\r\n" :: Either String (Vector (Text, Int)) Right (fromList [("John",27),("Jane",28)])
We pass False
as the first argument to indicate that the CSV
input data isn't preceded by a header.
In practice, the return type of decode
rarely needs to be given,
as it can often be inferred from the context.
Treating CSV data as opaque byte strings
Sometimes you might want to work with a CSV file which contents is
unknown to you. For example, you might want remove the second
column of a file without knowing anything about its content. To
parse a CSV file to a generic representation, just convert each
record to a
value, like so:
Vector
ByteString
decode False "John,27\r\nJane,28\r\n" :: Either String (Vector (Vector ByteString)) Right (fromList [fromList ["John","27"],fromList ["Jane","28"]])
As the example output above shows, all the fields are returned as
uninterpreted ByteString
values.
Custom type conversions
Most of the time the existing FromField
and ToField
instances
do what you want. However, if you need to parse a different format
(e.g. hex) but use a type (e.g. Int
) for which there's already a
FromField
instance, you need to use a newtype
. Example:
newtype Hex = Hex Int parseHex :: ByteString -> Parser Int parseHex = ... instance FromField Hex where parseField s = Hex <$> parseHex s
Other than giving an explicit type signature, you can pattern match
on the newtype
constructor to indicate which type conversion you
want to have the library use:
case decode False "0xff,0xaa\r\n0x11,0x22\r\n" of Left err -> putStrLn err Right v -> forM_ v $ \ (Hex val1, Hex val2) -> print (val1, val2)
Encoding and decoding
Encoding and decoding is a two step process. To encode a value, it
is first converted to a generic representation, using either
ToRecord
or ToNamedRecord
. The generic representation is then
encoded as CSV data. To decode a value the process is reversed and
either FromRecord
or FromNamedRecord
is used instead. Both
these steps are combined in the encode
and decode
functions.
:: FromRecord a | |
=> Bool | Data contains header that should be skipped |
-> ByteString | CSV data |
-> Either String (Vector a) |
Efficiently deserialize CSV records from a lazy ByteString
.
If this fails due to incomplete or invalid input,
is
returned. Equivalent to Left
msg
.
decodeWith
defaultDecodeOptions
:: FromNamedRecord a | |
=> ByteString | CSV data |
-> Either String (Header, Vector a) |
Efficiently deserialize CSV records from a lazy ByteString
.
If this fails due to incomplete or invalid input,
is
returned. The data is assumed to be preceeded by a header.
Equivalent to Left
msg
.
decodeByNameWith
defaultDecodeOptions
encode :: ToRecord a => Vector a -> ByteStringSource
Efficiently serialize CSV records as a lazy ByteString
.
encodeByName :: ToNamedRecord a => Header -> Vector a -> ByteStringSource
Efficiently serialize CSV records as a lazy ByteString
. The
header is written before any records and dictates the field order.
Encoding and decoding options
These functions can be used to control how data is encoded and decoded. For example, they can be used to encode data in a tab-separated format instead of in a comma-separated format.
data DecodeOptions Source
Options that controls how data is decoded. These options can be used to e.g. decode tab-separated data instead of comma-separated data.
To avoid having your program stop compiling when new fields are
added to DecodeOptions
, create option records by overriding
values in defaultDecodeOptions
. Example:
myOptions = defaultDecodeOptions { decDelimiter = fromIntegral (ord '\t') }
DecodeOptions | |
|
defaultDecodeOptions :: DecodeOptionsSource
Decoding options for parsing CSV files.
:: FromRecord a | |
=> DecodeOptions | Decoding options |
-> Bool | Data contains header that should be skipped |
-> ByteString | CSV data |
-> Either String (Vector a) |
Like decode
, but lets you customize how the CSV data is parsed.
:: FromNamedRecord a | |
=> DecodeOptions | Decoding options |
-> ByteString | CSV data |
-> Either String (Header, Vector a) |
Like decodeByName
, but lets you customize how the CSV data is
parsed.
data EncodeOptions Source
Options that controls how data is encoded. These options can be used to e.g. encode data in a tab-separated format instead of in a comma-separated format.
To avoid having your program stop compiling when new fields are
added to EncodeOptions
, create option records by overriding
values in defaultEncodeOptions
. Example:
myOptions = defaultEncodeOptions { encDelimiter = fromIntegral (ord '\t') }
EncodeOptions | |
|
defaultEncodeOptions :: EncodeOptionsSource
Encoding options for CSV files.
encodeWith :: ToRecord a => EncodeOptions -> Vector a -> ByteStringSource
Like encode
, but lets you customize how the CSV data is
encoded.
encodeByNameWith :: ToNamedRecord a => EncodeOptions -> Header -> Vector a -> ByteStringSource
Like encodeByName
, but lets you customize how the CSV data is
encoded.
Core CSV types
type Field = ByteStringSource
A single field within a record.
type Header = Vector NameSource
The header corresponds to the first line a CSV file. Not all CSV files have a header.
type Name = ByteStringSource
A header has one or more names, describing the data in the column following the name.
type NamedRecord = HashMap ByteString ByteStringSource
A record corresponds to a single line in a CSV file, indexed by the column name rather than the column index.
Type conversion
There are two ways to convert CSV records to and from and user-defined data types: index-based conversion and name-based conversion.
Index-based record conversion
Index-based conversion lets you convert CSV records to and from user-defined data types by referring to a field's position (its index) in the record. The first column in a CSV file is given index 0, the second index 1, and so on.
class FromRecord a whereSource
A type that can be converted from a single CSV record, with the possibility of failure.
When writing an instance, use empty
, mzero
, or fail
to make a
conversion fail, e.g. if a Record
has the wrong number of
columns.
Given this example data:
John,56 Jane,55
here's an example type and instance:
data Person = Person { name :: Text, age :: Int } instance FromRecord Person where parseRecord v | length v == 2 = Person <$> v .! 0 <*> v .! 1 | otherwise = mzero
parseRecord :: Record -> Parser aSource
FromField a => FromRecord [a] | |
FromField a => FromRecord (Vector a) | |
(FromField a, Unbox a) => FromRecord (Vector a) | |
FromField a => FromRecord (Only a) | |
(FromField a, FromField b) => FromRecord (a, b) | |
(FromField a, FromField b, FromField c) => FromRecord (a, b, c) | |
(FromField a, FromField b, FromField c, FromField d) => FromRecord (a, b, c, d) | |
(FromField a, FromField b, FromField c, FromField d, FromField e) => FromRecord (a, b, c, d, e) | |
(FromField a, FromField b, FromField c, FromField d, FromField e, FromField f) => FromRecord (a, b, c, d, e, f) | |
(FromField a, FromField b, FromField c, FromField d, FromField e, FromField f, FromField g) => FromRecord (a, b, c, d, e, f, g) |
Conversion of a field to a value might fail e.g. if the field is
malformed. This possibility is captured by the Parser
type, which
lets you compose several field conversions together in such a way
that if any of them fail, the whole record conversion fails.
index :: FromField a => Record -> Int -> Parser aSource
Retrieve the nth field in the given record. The result is
empty
if the value cannot be converted to the desired type.
Raises an exception if the index is out of bounds.
index
is a simple convenience function that is equivalent to
. If you're certain that the index is not
out of bounds, using parseField
(v !
idx)
is
somewhat faster.
parseField
(unsafeIndex
v idx)
A type that can be converted to a single CSV record.
An example type and instance:
data Person = Person { name :: Text, age :: Int } instance ToRecord Person where toRecord (Person name age) = 'record' [ 'toField' name, 'toField' age]
Outputs data on this form:
John,56 Jane,55
ToField a => ToRecord [a] | |
ToField a => ToRecord (Vector a) | |
(ToField a, Unbox a) => ToRecord (Vector a) | |
ToField a => ToRecord (Only a) | |
(ToField a, ToField b) => ToRecord (a, b) | |
(ToField a, ToField b, ToField c) => ToRecord (a, b, c) | |
(ToField a, ToField b, ToField c, ToField d) => ToRecord (a, b, c, d) | |
(ToField a, ToField b, ToField c, ToField d, ToField e) => ToRecord (a, b, c, d, e) | |
(ToField a, ToField b, ToField c, ToField d, ToField e, ToField f) => ToRecord (a, b, c, d, e, f) | |
(ToField a, ToField b, ToField c, ToField d, ToField e, ToField f, ToField g) => ToRecord (a, b, c, d, e, f, g) |
record :: [ByteString] -> RecordSource
Construct a record from a list of ByteString
s. Use toField
to convert values to ByteString
s for use with record
.
Haskell lacks a single-element tuple type, so if you CSV data
with just one column you can use the Only
type to represent a
single-column result.
Name-based record conversion
Name-based conversion lets you convert CSV records to and from user-defined data types by referring to a field's name. The names of the fields are defined by the first line in the file, also known as the header. Name-based conversion is more robust to changes in the file structure e.g. to reording or addition of columns, but can be a bit slower.
class FromNamedRecord a whereSource
A type that can be converted from a single CSV record, with the possibility of failure.
When writing an instance, use empty
, mzero
, or fail
to make a
conversion fail, e.g. if a Record
has the wrong number of
columns.
Given this example data:
name,age John,56 Jane,55
here's an example type and instance:
{-# LANGUAGE OverloadedStrings #-} data Person = Person { name :: Text, age :: Int } instance FromRecord Person where parseNamedRecord m = Person <$> m .: "name" <*> m .: "age"
Note the use of the OverloadedStrings
language extension which
enables ByteString
values to be written as string literals.
parseNamedRecord :: NamedRecord -> Parser aSource
FromField a => FromNamedRecord (Map ByteString a) | |
FromField a => FromNamedRecord (HashMap ByteString a) |
lookup :: FromField a => NamedRecord -> ByteString -> Parser aSource
Retrieve a field in the given record by name. The result is
empty
if the field is missing or if the value cannot be converted
to the desired type.
(.:) :: FromField a => NamedRecord -> ByteString -> Parser aSource
Alias for lookup
.
class ToNamedRecord a whereSource
A type that can be converted to a single CSV record.
An example type and instance:
data Person = Person { name :: Text, age :: Int } instance ToRecord Person where toNamedRecord (Person name age) = namedRecord [ "name" .= name, "age" .= age]
toNamedRecord :: a -> NamedRecordSource
ToField a => ToNamedRecord (Map ByteString a) | |
ToField a => ToNamedRecord (HashMap ByteString a) |
namedRecord :: [(ByteString, ByteString)] -> NamedRecordSource
Construct a named record from a list of name-value ByteString
pairs. Use .=
to construct such a pair from a name and a value.
namedField :: ToField a => ByteString -> a -> (ByteString, ByteString)Source
Construct a pair from a name and a value. For use with
namedRecord
.
(.=) :: ToField a => ByteString -> a -> (ByteString, ByteString)Source
Alias for namedField
.
Field conversion
The FromField
and ToField
classes define how to convert between
Field
s and values you care about (e.g. Int
s). Most of the time
you don't need to write your own instances as the standard ones
cover most use cases.
A type that can be converted from a single CSV field, with the possibility of failure.
When writing an instance, use empty
, mzero
, or fail
to make a
conversion fail, e.g. if a Field
can't be converted to the given
type.
Example type and instance:
{-# LANGUAGE OverloadedStrings #-} data Color = Red | Green | Blue instance FromField Color where parseField s | s == "R" = pure Red | s == "G" = pure Green | s == "B" = pure Blue | otherwise = mzero
parseField :: Field -> Parser aSource
FromField Char | Assumes UTF-8 encoding. |
FromField Double | Accepts same syntax as |
FromField Float | Accepts same syntax as |
FromField Int | Accepts a signed decimal number. |
FromField Int8 | Accepts a signed decimal number. |
FromField Int16 | Accepts a signed decimal number. |
FromField Int32 | Accepts a signed decimal number. |
FromField Int64 | Accepts a signed decimal number. |
FromField Integer | Accepts a signed decimal number. |
FromField Word | Accepts an unsigned decimal number. |
FromField Word8 | Accepts an unsigned decimal number. |
FromField Word16 | Accepts an unsigned decimal number. |
FromField Word32 | Accepts an unsigned decimal number. |
FromField Word64 | Accepts an unsigned decimal number. |
FromField ByteString | |
FromField ByteString | |
FromField Text | Assumes UTF-8 encoding. |
FromField Text | Assumes UTF-8 encoding. |
FromField [Char] | Assumes UTF-8 encoding. |
FromField a => FromField (Maybe a) |
A type that can be converted to a single CSV field.
Example type and instance:
{-# LANGUAGE OverloadedStrings #-} data Color = Red | Green | Blue instance ToField Color where toField Red = "R" toField Green = "G" toField Blue = "B"
ToField Char | Uses UTF-8 encoding. |
ToField Double | Uses decimal notation or scientific notation, depending on the number. |
ToField Float | Uses decimal notation or scientific notation, depending on the number. |
ToField Int | Uses decimal encoding with optional sign. |
ToField Int8 | Uses decimal encoding with optional sign. |
ToField Int16 | Uses decimal encoding with optional sign. |
ToField Int32 | Uses decimal encoding with optional sign. |
ToField Int64 | Uses decimal encoding with optional sign. |
ToField Integer | Uses decimal encoding with optional sign. |
ToField Word | Uses decimal encoding. |
ToField Word8 | Uses decimal encoding. |
ToField Word16 | Uses decimal encoding. |
ToField Word32 | Uses decimal encoding. |
ToField Word64 | Uses decimal encoding. |
ToField ByteString | |
ToField ByteString | |
ToField Text | Uses UTF-8 encoding. |
ToField Text | Uses UTF-8 encoding. |
ToField [Char] | Uses UTF-8 encoding. |
ToField a => ToField (Maybe a) |