| Safe Haskell | None |
|---|---|
| Language | Haskell98 |
Data.Csv
Contents
Description
This module implements encoding and decoding of CSV data. The implementation is RFC 4180 compliant, with the following extensions:
- Empty lines are ignored.
- Non-escaped fields may contain any characters except double-quotes, commas, carriage returns, and newlines.
- Escaped fields may contain any characters (but double-quotes need to be escaped).
- data HasHeader
- decode :: FromRecord a => HasHeader -> ByteString -> Either String (Vector a)
- decodeByName :: FromNamedRecord a => ByteString -> Either String (Header, Vector a)
- encode :: ToRecord a => [a] -> ByteString
- encodeByName :: ToNamedRecord a => Header -> [a] -> ByteString
- data DecodeOptions = DecodeOptions {
- decDelimiter :: !Word8
- defaultDecodeOptions :: DecodeOptions
- decodeWith :: FromRecord a => DecodeOptions -> HasHeader -> ByteString -> Either String (Vector a)
- decodeByNameWith :: FromNamedRecord a => DecodeOptions -> ByteString -> Either String (Header, Vector a)
- data EncodeOptions = EncodeOptions {
- encDelimiter :: !Word8
- encUseCrLf :: !Bool
- encIncludeHeader :: !Bool
- defaultEncodeOptions :: EncodeOptions
- encodeWith :: ToRecord a => EncodeOptions -> [a] -> ByteString
- encodeByNameWith :: ToNamedRecord a => EncodeOptions -> Header -> [a] -> ByteString
- type Csv = Vector Record
- type Record = Vector Field
- type Field = ByteString
- type Header = Vector Name
- type Name = ByteString
- type NamedRecord = HashMap ByteString ByteString
- class FromRecord a where
- parseRecord :: Record -> Parser a
- data Parser a
- runParser :: Parser a -> Either String a
- index :: FromField a => Record -> Int -> Parser a
- (.!) :: FromField a => Record -> Int -> Parser a
- unsafeIndex :: FromField a => Record -> Int -> Parser a
- class ToRecord a where
- record :: [ByteString] -> Record
- newtype Only a = Only {
- fromOnly :: a
- class FromNamedRecord a where
- parseNamedRecord :: NamedRecord -> Parser a
- lookup :: FromField a => NamedRecord -> ByteString -> Parser a
- (.:) :: FromField a => NamedRecord -> ByteString -> Parser a
- class ToNamedRecord a where
- toNamedRecord :: a -> NamedRecord
- namedRecord :: [(ByteString, ByteString)] -> NamedRecord
- namedField :: ToField a => ByteString -> a -> (ByteString, ByteString)
- (.=) :: ToField a => ByteString -> a -> (ByteString, ByteString)
- class FromField a where
- parseField :: Field -> Parser a
- class ToField a where
Usage example
A short encoding usage example:
>>> encode [("John" :: Text, 27), ("Jane", 28)]
"John,27\r\nJane,28\r\n"Since string literals are overloaded we have to supply a type
signature as the compiler couldn't deduce which string type (i.e.
String or Text) we want to use. In most cases type inference
will infer the type from the context and you can omit type
signatures.
A short decoding usage example:
>>> decode NoHeader "John,27\r\nJane,28\r\n" :: Either String (Vector (Text, Int))
Right [("John",27),("Jane",28)]We pass NoHeader as the first argument to indicate that the CSV
input data isn't preceded by a header.
In practice, the return type of decode rarely needs to be given,
as it can often be inferred from the context.
Treating CSV data as opaque byte strings
Sometimes you might want to work with a CSV file which contents is
unknown to you. For example, you might want remove the second
column of a file without knowing anything about its content. To
parse a CSV file to a generic representation, just convert each
record to a value, like so:Vector ByteString
decode NoHeader "John,27\r\nJane,28\r\n" :: Either String (Vector (Vector ByteString)) Right [["John","27"],["Jane","28"]]
As the example output above shows, all the fields are returned as
uninterpreted ByteString values.
Custom type conversions
Most of the time the existing FromField and ToField instances
do what you want. However, if you need to parse a different format
(e.g. hex) but use a type (e.g. Int) for which there's already a
FromField instance, you need to use a newtype. Example:
newtype Hex = Hex Int
parseHex :: ByteString -> Parser Int
parseHex = ...
instance FromField Hex where
parseField s = Hex <$> parseHex sOther than giving an explicit type signature, you can pattern match
on the newtype constructor to indicate which type conversion you
want to have the library use:
case decode NoHeader "0xff,0xaa\r\n0x11,0x22\r\n" of
Left err -> putStrLn err
Right v -> forM_ v $ \ (Hex val1, Hex val2) ->
print (val1, val2)If a field might be in one several different formats, you can use a newtype to normalize the result:
newtype HexOrDecimal = HexOrDecimal Int
instance FromField DefaultToZero where
parseField s = case runParser (parseField s :: Parser Hex) of
Left err -> HexOrDecimal <$> parseField s -- Uses Int instance
Right n -> pure $ HexOrDecimal nYou can use the unit type, (), to ignore a column. The
parseField method for () doesn't look at the Field and thus
always decodes successfully. Note that it lacks a corresponding
ToField instance. Example:
case decode NoHeader "foo,1\r\nbar,22" of
Left err -> putStrLn err
Right v -> forM_ v $ \ ((), i) -> print (i :: Int)Dealing with bad data
If your input might contain invalid fields, you can write a custom
FromField instance to deal with them. Example:
newtype DefaultToZero = DefaultToZero Int
instance FromField DefaultToZero where
parseField s = case runParser (parseField s) of
Left err -> pure $ DefaultToZero 0
Right n -> pure $ DefaultToZero nEncoding and decoding
Encoding and decoding is a two step process. To encode a value, it
is first converted to a generic representation, using either
ToRecord or ToNamedRecord. The generic representation is then
encoded as CSV data. To decode a value the process is reversed and
either FromRecord or FromNamedRecord is used instead. Both
these steps are combined in the encode and decode functions.
Is the CSV data preceded by a header?
Arguments
| :: FromRecord a | |
| => HasHeader | Data contains header that should be skipped |
| -> ByteString | CSV data |
| -> Either String (Vector a) |
Efficiently deserialize CSV records from a lazy ByteString.
If this fails due to incomplete or invalid input, is
returned. Equivalent to Left msg.decodeWith defaultDecodeOptions
Arguments
| :: FromNamedRecord a | |
| => ByteString | CSV data |
| -> Either String (Header, Vector a) |
Efficiently deserialize CSV records from a lazy ByteString.
If this fails due to incomplete or invalid input, is
returned. The data is assumed to be preceeded by a header.
Equivalent to Left msg.decodeByNameWith defaultDecodeOptions
encode :: ToRecord a => [a] -> ByteString Source
Efficiently serialize CSV records as a lazy ByteString.
encodeByName :: ToNamedRecord a => Header -> [a] -> ByteString Source
Efficiently serialize CSV records as a lazy ByteString. The
header is written before any records and dictates the field order.
Encoding and decoding options
These functions can be used to control how data is encoded and decoded. For example, they can be used to encode data in a tab-separated format instead of in a comma-separated format.
data DecodeOptions Source
Options that controls how data is decoded. These options can be used to e.g. decode tab-separated data instead of comma-separated data.
To avoid having your program stop compiling when new fields are
added to DecodeOptions, create option records by overriding
values in defaultDecodeOptions. Example:
myOptions = defaultDecodeOptions {
decDelimiter = fromIntegral (ord '\t')
}Constructors
| DecodeOptions | |
Fields
| |
Instances
defaultDecodeOptions :: DecodeOptions Source
Decoding options for parsing CSV files.
Arguments
| :: FromRecord a | |
| => DecodeOptions | Decoding options |
| -> HasHeader | Data contains header that should be skipped |
| -> ByteString | CSV data |
| -> Either String (Vector a) |
Like decode, but lets you customize how the CSV data is parsed.
Arguments
| :: FromNamedRecord a | |
| => DecodeOptions | Decoding options |
| -> ByteString | CSV data |
| -> Either String (Header, Vector a) |
Like decodeByName, but lets you customize how the CSV data is
parsed.
data EncodeOptions Source
Options that controls how data is encoded. These options can be used to e.g. encode data in a tab-separated format instead of in a comma-separated format.
To avoid having your program stop compiling when new fields are
added to EncodeOptions, create option records by overriding
values in defaultEncodeOptions. Example:
myOptions = defaultEncodeOptions {
encDelimiter = fromIntegral (ord '\t')
}N.B. The encDelimiter must not be the quote character (i.e.
") or one of the record separator characters (i.e. \n or
\r).
Constructors
| EncodeOptions | |
Fields
| |
Instances
defaultEncodeOptions :: EncodeOptions Source
Encoding options for CSV files.
encodeWith :: ToRecord a => EncodeOptions -> [a] -> ByteString Source
Like encode, but lets you customize how the CSV data is
encoded.
encodeByNameWith :: ToNamedRecord a => EncodeOptions -> Header -> [a] -> ByteString Source
Like encodeByName, but lets you customize how the CSV data is
encoded.
Core CSV types
type Field = ByteString Source
A single field within a record.
type Header = Vector Name Source
The header corresponds to the first line a CSV file. Not all CSV files have a header.
type Name = ByteString Source
A header has one or more names, describing the data in the column following the name.
type NamedRecord = HashMap ByteString ByteString Source
A record corresponds to a single line in a CSV file, indexed by the column name rather than the column index.
Type conversion
There are two ways to convert CSV records to and from and user-defined data types: index-based conversion and name-based conversion.
Index-based record conversion
Index-based conversion lets you convert CSV records to and from user-defined data types by referring to a field's position (its index) in the record. The first column in a CSV file is given index 0, the second index 1, and so on.
class FromRecord a where Source
A type that can be converted from a single CSV record, with the possibility of failure.
When writing an instance, use empty, mzero, or fail to make a
conversion fail, e.g. if a Record has the wrong number of
columns.
Given this example data:
John,56 Jane,55
here's an example type and instance:
data Person = Person { name :: !Text, age :: !Int }
instance FromRecord Person where
parseRecord v
| length v == 2 = Person <$>
v .! 0 <*>
v .! 1
| otherwise = mzeroMinimal complete definition
Nothing
Methods
parseRecord :: Record -> Parser a Source
Instances
| FromField a => FromRecord [a] | |
| FromField a => FromRecord (Vector a) | |
| (FromField a, Unbox a) => FromRecord (Vector a) | |
| FromField a => FromRecord (Only a) | |
| (FromField a, FromField b) => FromRecord (a, b) | |
| (FromField a, FromField b, FromField c) => FromRecord (a, b, c) | |
| (FromField a, FromField b, FromField c, FromField d) => FromRecord (a, b, c, d) | |
| (FromField a, FromField b, FromField c, FromField d, FromField e) => FromRecord (a, b, c, d, e) | |
| (FromField a, FromField b, FromField c, FromField d, FromField e, FromField f) => FromRecord (a, b, c, d, e, f) | |
| (FromField a, FromField b, FromField c, FromField d, FromField e, FromField f, FromField g) => FromRecord (a, b, c, d, e, f, g) | |
| (FromField a, FromField b, FromField c, FromField d, FromField e, FromField f, FromField g, FromField h) => FromRecord (a, b, c, d, e, f, g, h) | |
| (FromField a, FromField b, FromField c, FromField d, FromField e, FromField f, FromField g, FromField h, FromField i) => FromRecord (a, b, c, d, e, f, g, h, i) |
Conversion of a field to a value might fail e.g. if the field is
malformed. This possibility is captured by the Parser type, which
lets you compose several field conversions together in such a way
that if any of them fail, the whole record conversion fails.
index :: FromField a => Record -> Int -> Parser a Source
Retrieve the nth field in the given record. The result is
empty if the value cannot be converted to the desired type.
Raises an exception if the index is out of bounds.
index is a simple convenience function that is equivalent to
. If you're certain that the index is not
out of bounds, using parseField (v ! idx)unsafeIndex is somewhat faster.
unsafeIndex :: FromField a => Record -> Int -> Parser a Source
Like index but without bounds checking.
A type that can be converted to a single CSV record.
An example type and instance:
data Person = Person { name :: !Text, age :: !Int }
instance ToRecord Person where
toRecord (Person name age) = record [
toField name, toField age]Outputs data on this form:
John,56 Jane,55
Minimal complete definition
Nothing
Instances
| ToField a => ToRecord [a] | |
| ToField a => ToRecord (Vector a) | |
| (ToField a, Unbox a) => ToRecord (Vector a) | |
| ToField a => ToRecord (Only a) | |
| (ToField a, ToField b) => ToRecord (a, b) | |
| (ToField a, ToField b, ToField c) => ToRecord (a, b, c) | |
| (ToField a, ToField b, ToField c, ToField d) => ToRecord (a, b, c, d) | |
| (ToField a, ToField b, ToField c, ToField d, ToField e) => ToRecord (a, b, c, d, e) | |
| (ToField a, ToField b, ToField c, ToField d, ToField e, ToField f) => ToRecord (a, b, c, d, e, f) | |
| (ToField a, ToField b, ToField c, ToField d, ToField e, ToField f, ToField g) => ToRecord (a, b, c, d, e, f, g) | |
| (ToField a, ToField b, ToField c, ToField d, ToField e, ToField f, ToField g, ToField h) => ToRecord (a, b, c, d, e, f, g, h) | |
| (ToField a, ToField b, ToField c, ToField d, ToField e, ToField f, ToField g, ToField h, ToField i) => ToRecord (a, b, c, d, e, f, g, h, i) |
record :: [ByteString] -> Record Source
Construct a record from a list of ByteStrings. Use toField
to convert values to ByteStrings for use with record.
Haskell lacks a single-element tuple type, so if you CSV data
with just one column you can use the Only type to represent a
single-column result.
Name-based record conversion
Name-based conversion lets you convert CSV records to and from user-defined data types by referring to a field's name. The names of the fields are defined by the first line in the file, also known as the header. Name-based conversion is more robust to changes in the file structure e.g. to reording or addition of columns, but can be a bit slower.
class FromNamedRecord a where Source
A type that can be converted from a single CSV record, with the possibility of failure.
When writing an instance, use empty, mzero, or fail to make a
conversion fail, e.g. if a Record has the wrong number of
columns.
Given this example data:
name,age John,56 Jane,55
here's an example type and instance:
{-# LANGUAGE OverloadedStrings #-}
data Person = Person { name :: !Text, age :: !Int }
instance FromNamedRecord Person where
parseNamedRecord m = Person <$>
m .: "name" <*>
m .: "age"Note the use of the OverloadedStrings language extension which
enables ByteString values to be written as string literals.
Minimal complete definition
Nothing
Methods
parseNamedRecord :: NamedRecord -> Parser a Source
Instances
| FromField a => FromNamedRecord (Map ByteString a) | |
| FromField a => FromNamedRecord (HashMap ByteString a) |
lookup :: FromField a => NamedRecord -> ByteString -> Parser a Source
Retrieve a field in the given record by name. The result is
empty if the field is missing or if the value cannot be converted
to the desired type.
(.:) :: FromField a => NamedRecord -> ByteString -> Parser a Source
Alias for lookup.
class ToNamedRecord a where Source
A type that can be converted to a single CSV record.
An example type and instance:
data Person = Person { name :: !Text, age :: !Int }
instance ToNamedRecord Person where
toNamedRecord (Person name age) = namedRecord [
"name" .= name, "age" .= age]Minimal complete definition
Nothing
Methods
toNamedRecord :: a -> NamedRecord Source
Instances
| ToField a => ToNamedRecord (Map ByteString a) | |
| ToField a => ToNamedRecord (HashMap ByteString a) |
namedRecord :: [(ByteString, ByteString)] -> NamedRecord Source
Construct a named record from a list of name-value ByteString
pairs. Use .= to construct such a pair from a name and a value.
namedField :: ToField a => ByteString -> a -> (ByteString, ByteString) Source
Construct a pair from a name and a value. For use with
namedRecord.
(.=) :: ToField a => ByteString -> a -> (ByteString, ByteString) Source
Alias for namedField.
Field conversion
The FromField and ToField classes define how to convert between
Fields and values you care about (e.g. Ints). Most of the time
you don't need to write your own instances as the standard ones
cover most use cases.
class FromField a where Source
A type that can be converted from a single CSV field, with the possibility of failure.
When writing an instance, use empty, mzero, or fail to make a
conversion fail, e.g. if a Field can't be converted to the given
type.
Example type and instance:
{-# LANGUAGE OverloadedStrings #-}
data Color = Red | Green | Blue
instance FromField Color where
parseField s
| s == "R" = pure Red
| s == "G" = pure Green
| s == "B" = pure Blue
| otherwise = mzeroMethods
parseField :: Field -> Parser a Source
Instances
| FromField Char | Assumes UTF-8 encoding. |
| FromField Double | Accepts same syntax as |
| FromField Float | Accepts same syntax as |
| FromField Int | Accepts a signed decimal number. Ignores whitespace. |
| FromField Int8 | Accepts a signed decimal number. Ignores whitespace. |
| FromField Int16 | Accepts a signed decimal number. Ignores whitespace. |
| FromField Int32 | Accepts a signed decimal number. Ignores whitespace. |
| FromField Int64 | Accepts a signed decimal number. Ignores whitespace. |
| FromField Integer | Accepts a signed decimal number. Ignores whitespace. |
| FromField Word | Accepts an unsigned decimal number. Ignores whitespace. |
| FromField Word8 | Accepts an unsigned decimal number. Ignores whitespace. |
| FromField Word16 | Accepts an unsigned decimal number. Ignores whitespace. |
| FromField Word32 | Accepts an unsigned decimal number. Ignores whitespace. |
| FromField Word64 | Accepts an unsigned decimal number. Ignores whitespace. |
| FromField () | Ignores the |
| FromField ByteString | |
| FromField ByteString | |
| FromField Text | Assumes UTF-8 encoding. Fails on invalid byte sequences. |
| FromField Text | Assumes UTF-8 encoding. Fails on invalid byte sequences. |
| FromField [Char] | Assumes UTF-8 encoding. Fails on invalid byte sequences. |
| FromField a => FromField (Maybe a) | |
| FromField a => FromField (Either Field a) |
A type that can be converted to a single CSV field.
Example type and instance:
{-# LANGUAGE OverloadedStrings #-}
data Color = Red | Green | Blue
instance ToField Color where
toField Red = "R"
toField Green = "G"
toField Blue = "B"Instances
| ToField Char | Uses UTF-8 encoding. |
| ToField Double | Uses decimal notation or scientific notation, depending on the number. |
| ToField Float | Uses decimal notation or scientific notation, depending on the number. |
| ToField Int | Uses decimal encoding with optional sign. |
| ToField Int8 | Uses decimal encoding with optional sign. |
| ToField Int16 | Uses decimal encoding with optional sign. |
| ToField Int32 | Uses decimal encoding with optional sign. |
| ToField Int64 | Uses decimal encoding with optional sign. |
| ToField Integer | Uses decimal encoding with optional sign. |
| ToField Word | Uses decimal encoding. |
| ToField Word8 | Uses decimal encoding. |
| ToField Word16 | Uses decimal encoding. |
| ToField Word32 | Uses decimal encoding. |
| ToField Word64 | Uses decimal encoding. |
| ToField ByteString | |
| ToField ByteString | |
| ToField Text | Uses UTF-8 encoding. |
| ToField Text | Uses UTF-8 encoding. |
| ToField [Char] | Uses UTF-8 encoding. |
| ToField a => ToField (Maybe a) |