csv-conduit: A flexible, fast, conduit-based CSV parser library for Haskell.

[ bsd3, conduit, csv, data, library, text ]

CSV files are the de facto standard in many situations involving data transfer, particularly when dealing with enterprise applications or disparate database systems.

While there are a number of CSV libraries in Haskell, at the time of this project's start in 2010, there wasn't one that provided all of the following:

  • Full flexibility in quote characters, separators, input/output

  • Constant space operation

  • Robust parsing, correctness and error resiliency

  • Convenient interface that supports a variety of use cases

  • Fast operation

This library is an attempt to close these gaps. Please note that this library started its life based on the enumerator package and has recently been ported to work with conduits instead. In the process, it has been greatly simplified thanks to the modular nature of the conduits library.

Following the port to conduits, the library has also gained the ability to parameterize on the stream type and work both with ByteString and Text.

For more documentation and examples, check out the README at:

http://github.com/ozataman/csv-conduit


Versions [RSS] 0.1, 0.2, 0.2.1.1, 0.3, 0.3.0.1, 0.3.0.2, 0.3.0.3, 0.4.1, 0.5.0, 0.5.1, 0.6.2, 0.6.2.1, 0.6.3, 0.6.5, 0.6.6, 0.6.7, 0.6.8, 0.6.8.1, 0.7.0.0, 0.7.1.0, 0.7.2.0, 0.7.3.0
Dependencies attoparsec (>=0.10), attoparsec-conduit, base (>=4 && <5), bytestring, conduit (>=0.4 && <0.5), containers (>=0.3), directory, monad-control, safe, text, transformers (>=0.2), unix-compat (>=0.2.1.1) [details]
License BSD-3-Clause
Author Ozgun Ataman
Maintainer Ozgun Ataman <ozataman@gmail.com>
Category Data, Conduit
Home page http://github.com/ozataman/csv-conduit
Uploaded by OzgunAtaman at 2012-04-16T16:50:07Z
Distributions Debian:0.7.1.0
Reverse Dependencies 5 direct, 1 indirect [details]
Downloads 20340 total (47 in the last 30 days)
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for csv-conduit-0.2

README

CSV Files and Haskell

CSV files are the de facto standard in many cases of data transfer, particularly when dealing with enterprise applications or disparate database systems.

While there are a number of CSV libraries in Haskell, at the time of this project's start in 2010, there wasn't one that provided all of the following:

  • Full flexibility in quote characters, separators, input/output
  • Constant space operation
  • Robust parsing and error resiliency
  • Fast operation
  • Convenient interface that supports a variety of use cases

This library is an attempt to close these gaps.

This package

csv-conduit is a conduit-based CSV parsing library that is easy to use, flexible and fast. It can also operate in constant space, which is critical in many real-world use cases.

Introduction

  • The CSVeable typeclass implements the key operations.
  • CSVeable is parameterized on both a stream type and a target CSV row type.
  • There are two basic row types. They implement exactly the same operations, so you can choose the right one for the job at hand:
    • type MapRow t = Map t t
    • type Row t = [t]
  • Parsing from a CSV stream and rendering back into a CSV stream are both done with the conduits defined in this library.
  • Use the full flexibility and modularity of conduits for sources and sinks.
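
To make the difference between the two row types concrete, here is a minimal sketch that uses only Data.Map from the containers package. The two type synonyms are copied from the list above; `toMapRow` and `fromMapRow` are hypothetical helpers written for this illustration and are not part of csv-conduit's API.

```haskell
import qualified Data.Map as M

-- Row types as given in the list above.
type Row t = [t]
type MapRow t = M.Map t t

-- Hypothetical helper (not from csv-conduit): pair a header row with a
-- data row to get a keyed row. If the two rows differ in length, zip
-- drops the surplus cells.
toMapRow :: Ord t => Row t -> Row t -> MapRow t
toMapRow header cells = M.fromList (zip header cells)

-- Hypothetical helper: read cells back out in a given column order,
-- substituting a default for any column missing from the map.
fromMapRow :: Ord t => t -> Row t -> MapRow t -> Row t
fromMapRow def header m = map (\k -> M.findWithDefault def k m) header
```

A positional `Row` suits transformations that work on column order, while a `MapRow` lets you look cells up by column name, for example `M.lookup "age" (toMapRow ["name", "age"] ["Ada", "36"])`.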

Speed

While fast operation is of concern, I have so far cared more about correct operation and a flexible API. Please let me know if you notice any performance regressions or optimization opportunities.

Usage Examples

Example #1: Basics Using Convenience API

{-# LANGUAGE OverloadedStrings #-}

import Data.Conduit
import Data.Conduit.Binary
import qualified Data.Conduit.List as CL
import Data.CSV.Conduit
import Data.Text (Text)

-- Just reverse the columns of every row
myProcessor :: Monad m => Conduit (Row Text) m (Row Text)
myProcessor = CL.map reverse

-- Read input.csv, run each row through myProcessor, write output.csv
test :: IO ()
test = runResourceT $
  transformCSV defCSVSettings
               (sourceFile "input.csv")
               myProcessor
               (sinkFile "output.csv")
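
Since the processor in Example #1 is a pure function lifted with `CL.map`, the per-row logic can be written and tested on plain lists before it goes anywhere near a stream. The sketch below shows one such function; `selectColumns` is a hypothetical name chosen for this illustration and is not provided by csv-conduit.

```haskell
-- Positional row type, as defined in the Introduction.
type Row t = [t]

-- Hypothetical row transformation: keep only the columns at the given
-- zero-based indices, in the order requested. Indices that fall outside
-- the row are skipped rather than treated as errors.
selectColumns :: [Int] -> Row t -> Row t
selectColumns idxs row = [ c | i <- idxs, Just c <- [nth i row] ]
  where
    -- Safe positional lookup.
    nth n xs
      | n < 0     = Nothing
      | otherwise = case drop n xs of
                      (c:_) -> Just c
                      []    -> Nothing
```

Under the same assumptions as Example #1, a function like this would take the place of `reverse`, as in `CL.map (selectColumns [2, 0])`.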

Example #2: Basics Using Conduit API

{-# LANGUAGE OverloadedStrings #-}

import Data.Conduit
import Data.Conduit.Binary
import qualified Data.Conduit.List as CL
import Data.CSV.Conduit
import Data.Text (Text)

-- Placeholder processor: passes every row through unchanged.
-- Replace with your own row transformation.
myProcessor :: Monad m => Conduit (Row Text) m (Row Text)
myProcessor = CL.map id

-- Let's simply stream from a file, parse the CSV, reserialize it
-- and push back into another file.
test :: IO ()
test = runResourceT $
  sourceFile "test/BigFile.csv" $=
  intoCSV defCSVSettings $=
  myProcessor $=
  fromCSV defCSVSettings $$
  sinkFile "test/BigFileOut.csv"