The cassava package

cassava is a library for parsing and encoding RFC 4180 compliant comma-separated values (CSV) data, which is a textual line-oriented format commonly used for exchanging tabular data.

cassava's API includes support for

Moreover, this library is designed to be easy to use; for instance, here's a very simple example of encoding CSV data:

>>> Data.Csv.encode [("John",27),("Jane",28)]
"John,27\r\nJane,28\r\n"

Please refer to the documentation in Data.Csv and the included README for more usage examples.
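As a complement to the GHCi session above, here is a minimal sketch of a compiled program that encodes the same rows and decodes them again. The names rows and roundTrip are hypothetical and chosen for illustration; the explicit type signatures stand in for the defaulting that GHCi performs.

```haskell
import Data.Csv (HasHeader (NoHeader), decode, encode)
import qualified Data.Vector as V

-- Hypothetical sample rows; the signature fixes the field types
-- that GHCi would otherwise pick by defaulting.
rows :: [(String, Int)]
rows = [("John", 27), ("Jane", 28)]

-- Encode the rows to a lazy ByteString, then decode that ByteString
-- back into a list of tuples. A parse failure shows up as a Left.
roundTrip :: Either String [(String, Int)]
roundTrip = V.toList <$> decode NoHeader (encode rows)

main :: IO ()
main = print roundTrip
```

Because decode returns an Either, a malformed input is reported as a Left value with an error message rather than as an exception.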



Properties

Versions: 0.1.0.0, 0.1.0.1, 0.2.0.0, 0.2.1.0, 0.2.1.1, 0.2.1.2, 0.2.2.0, 0.3.0.0, 0.3.0.1, 0.4.0.0, 0.4.1.0, 0.4.2.0, 0.4.2.1, 0.4.2.2, 0.4.2.3, 0.4.2.4, 0.4.3.0, 0.4.3.1, 0.4.4.0, 0.4.5.0, 0.4.5.1, 0.5.0.0, 0.5.1.0
Change log: CHANGES.md
Dependencies: array (>=0.4 && <0.6), attoparsec (>=0.11.3.0 && <0.14), base (>=4.5 && <5), bytestring (>=0.9.2 && <0.11), bytestring-builder (>=0.10.8 && <0.11), containers (>=0.4.2 && <0.6), deepseq (>=1.1 && <1.5), fail (==4.9.*), ghc-prim (==0.2.*), hashable (<1.3), nats (>=1 && <1.2), Only (>=0.1 && <0.1.1), scientific (>=0.3.4.7 && <0.4), semigroups (==0.18.*), text (<1.3), text-short (==0.1.*), unordered-containers (<0.3), vector (>=0.8 && <0.13)
License: BSD3
Copyright: (c) 2012 Johan Tibell, (c) 2012 Bryan O'Sullivan, (c) 2011 MailRank, Inc.
Author: Johan Tibell
Maintainer: hvr@gnu.org
Category: Text, Web, CSV
Home page: https://github.com/hvr/cassava
Bug tracker: https://github.com/hvr/cassava/issues
Source repository: head: git clone https://github.com/hvr/cassava.git
Uploaded: Sat Aug 12 16:05:20 UTC 2017 by HerbertValerioRiedel

Flags

Name: bytestring--lt-0_10_4
Description: bytestring < 0.10.4
Default: Enabled
Type: Automatic

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Readme for cassava-0.5.1.0

cassava: A CSV parsing and encoding library

Please refer to the package description for an overview of cassava.

Usage example

Here's the two-second crash course in using the library. Given a CSV file with this content:

John Doe,50000
Jane Doe,60000

here's how you'd process it record-by-record:

{-# LANGUAGE ScopedTypeVariables #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv
import qualified Data.Vector as V

main :: IO ()
main = do
    csvData <- BL.readFile "salaries.csv"
    case decode NoHeader csvData of
        Left err -> putStrLn err
        Right v -> V.forM_ v $ \ (name, salary :: Int) ->
            putStrLn $ name ++ " earns " ++ show salary ++ " dollars"
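The same decoding approach can be adapted to other delimiters through decodeWith and DecodeOptions. The following is a minimal sketch for tab-separated input; the names tabOptions and parseTsv are hypothetical, and the sample data is passed in as a string instead of being read from a file.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL8
import Data.Char (ord)
import Data.Csv (DecodeOptions (decDelimiter), HasHeader (NoHeader),
                 decodeWith, defaultDecodeOptions)
import qualified Data.Vector as V

-- Default decoding options with the delimiter switched from a comma
-- to a tab. decDelimiter is a Word8, hence the conversion.
tabOptions :: DecodeOptions
tabOptions = defaultDecodeOptions { decDelimiter = fromIntegral (ord '\t') }

-- Hypothetical helper: decode headerless tab-separated records.
parseTsv :: BL8.ByteString -> Either String (V.Vector (String, Int))
parseTsv = decodeWith tabOptions NoHeader

main :: IO ()
main =
    case parseTsv (BL8.pack "John Doe\t50000\nJane Doe\t60000\n") of
        Left err -> putStrLn err
        Right v  -> V.forM_ v $ \ (name, salary) ->
            putStrLn $ name ++ " earns " ++ show salary ++ " dollars"
```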

If you want to parse a file that includes a header, like this one

name,salary
John Doe,50000
Jane Doe,60000

use decodeByName:

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative
import qualified Data.ByteString.Lazy as BL
import Data.Csv
import qualified Data.Vector as V

data Person = Person
    { name   :: !String
    , salary :: !Int
    }

instance FromNamedRecord Person where
    parseNamedRecord r = Person <$> r .: "name" <*> r .: "salary"

main :: IO ()
main = do
    csvData <- BL.readFile "salaries.csv"
    case decodeByName csvData of
        Left err -> putStrLn err
        Right (_, v) -> V.forM_ v $ \ p ->
            putStrLn $ name p ++ " earns " ++ show (salary p) ++ " dollars"

You can find more code examples in the examples/ folder as well as smaller usage examples in the Data.Csv module documentation.
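For the opposite direction, a record type can be written out with a header line via encodeByName. This is a minimal sketch that reuses the Person type from the example above; the ToNamedRecord instance, the people list, and the csvOutput name are assumptions made for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BL8
import Data.Csv (ToNamedRecord (..), encodeByName, header, namedRecord, (.=))

data Person = Person
    { name   :: !String
    , salary :: !Int
    }

-- Map each field of Person to a named column.
instance ToNamedRecord Person where
    toNamedRecord p = namedRecord
        [ "name"   .= name p
        , "salary" .= salary p
        ]

-- Hypothetical sample data.
people :: [Person]
people = [Person "John Doe" 50000, Person "Jane Doe" 60000]

-- The header fixes both the column order and the first output line.
csvOutput :: BL8.ByteString
csvOutput = encodeByName (header ["name", "salary"]) people

main :: IO ()
main = BL8.putStr csvOutput
```

Naming the columns explicitly in the header keeps the output order independent of the order in which fields are listed in the instance.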

Project Goals for cassava

There's no end to what people consider CSV data. Most programs don't follow RFC 4180, so one has to make a judgment call about which contributions to accept. Not everything gets accepted, because otherwise we'd end up with a (slow) general-purpose parsing library, and there are plenty of those. The goal is to accept roughly what the Python csv module accepts.

The Python csv module (which is implemented in C) is also considered the baseline for performance. Adding options (e.g. the parsing "flexibility" mentioned above) has to be traded off against performance. There have been complaints about performance in the past; therefore, if in doubt, performance wins over features.

Last but not least, it's important to keep the dependency footprint light, as each additional dependency incurs costs and risks in terms of maintenance overhead and loss of flexibility. A new package dependency should therefore only be added if it is known to be reliable and there's a clear benefit that outweighs the cost.

Further reading

The primary API documentation for cassava is its Haddock documentation which can be found at http://hackage.haskell.org/package/cassava/docs/Data-Csv.html

Below are listed additional recommended third-party blogposts and tutorials