csv-enumerator: A flexible, fast, enumerator-based CSV parser library for Haskell.

[ bsd3, data, library ]

CSV files are the de facto standard in many situations involving data transfer, particularly when dealing with enterprise applications or disparate database systems.

While there are a number of CSV libraries in Haskell, at the time of this project's start in 2010, there wasn't one that provided all of the following:

Full flexibility in quote characters, separators, input/output
Constant space operation
Robust parsing, correctness and error resiliency
Convenient interface that supports a variety of use cases
Fast operation

This library is an attempt to close these gaps.

For more documentation and examples, check out the README at:

http://github.com/ozataman/csv-enumerator

The API is fairly well documented and I would encourage you to keep your haddocks handy. If you run into problems, just email me or holler over at #haskell.


Modules


Data
- CSV
  - Data.CSV.Enumerator
    - Data.CSV.Enumerator.Parser

Downloads

csv-enumerator-0.9.5.tar.gz (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

OzgunAtaman

Versions	0.8, 0.8.2, 0.9.0, 0.9.2, 0.9.2.1, 0.9.3, 0.9.5, 0.10.1.0, 0.10.1.1, 0.10.2.0
Dependencies	attoparsec (>=0.8), attoparsec-enumerator (>=0.2), base (>=4 && <5), bytestring, containers (>=0.3), directory, enumerator (>=0.4.5), safe, transformers (>=0.2), unix-compat (>=0.2.1.1)
License	BSD-3-Clause
Author	Ozgun Ataman
Maintainer	Ozgun Ataman <ozataman@gmail.com>
Category	Data
Home page	http://github.com/ozataman/csv-enumerator
Uploaded	by OzgunAtaman at 2011-11-15T18:09:57Z
Distributions
Reverse Dependencies	2 direct, 0 indirect
Downloads	7673 total (22 in the last 30 days)
Status	Docs uploaded by user; build status unknown (no reports yet)

Readme for csv-enumerator-0.9.5


README

CSV Files and Haskell

CSV files are the de facto standard in many data-transfer situations, particularly when dealing with enterprise applications or disparate database systems.

While there are a number of CSV libraries in Haskell, at the time of this project's start in 2010, there wasn't one that provided all of the following:

Full flexibility in quote characters, separators, input/output
Constant space operation
Robust parsing and error resiliency
Fast operation
Convenient interface that supports a variety of use cases

This library is an attempt to close these gaps.

This package

csv-enumerator is an enumerator-based CSV parsing library that is easy to use, flexible, and fast. It can also operate in constant space, which is critical in many real-world use cases.

Introduction

- ByteStrings are used for everything.
- There are two basic row types, and they implement exactly the same operations, so you can choose the right one for the job at hand:
  - type MapRow = Map ByteString ByteString
  - type Row = [ByteString]
- Folding over a CSV file can be thought of as the most basic operation.
- Higher-level convenience functions are provided to "map" over CSV files, modifying and transforming them along the way.
- Helpers are provided for simple input/output of CSV files in straightforward use cases.
- For extreme or advanced use cases, you can drop down to the Enumerator/Iteratee level and, among other things, interleave IO.
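The two row shapes and the fold-first view can be illustrated without touching the library at all. The sketch below is self-contained and assumes nothing about csv-enumerator's real functions: it re-declares the README's `Row` and `MapRow` synonyms locally, uses a hypothetical helper `toMapRows` that pairs a header row with each data row, and folds over the result to total one column. The "mapping" style described above then amounts to `concatMap` with a function that returns zero or more rows.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import Data.List (foldl')
import qualified Data.Map as M
import Data.Maybe (fromMaybe)

-- Local copies of the README's row types (re-declared for illustration only).
type Row = [ByteString]
type MapRow = M.Map ByteString ByteString

-- Hypothetical helper: pair the header row with each data row.
-- Fields beyond the header length are dropped by 'zip'.
toMapRows :: Row -> [Row] -> [MapRow]
toMapRows header = map (M.fromList . zip header)

-- A strict left fold over rows: sum a numeric column, treating
-- missing or non-numeric cells as 0.
sumColumn :: ByteString -> [MapRow] -> Int
sumColumn col = foldl' step 0
  where
    step acc row = acc + fromMaybe 0 (M.lookup col row >>= readInt)
    readInt bs = case B.readInt bs of
      Just (n, rest) | B.null rest -> Just n
      _                            -> Nothing

-- "Mapping" over rows: each input row yields zero or more output rows.
-- Here rows with an empty "name" cell are dropped.
dropEmptyNames :: MapRow -> [MapRow]
dropEmptyNames row
  | M.findWithDefault "" "name" row == "" = []
  | otherwise                             = [row]

main :: IO ()
main = do
  let header = ["name", "qty"]
      rows   = toMapRows header [["apple", "3"], ["", "5"], ["pear", "x"]]
  print (sumColumn "qty" rows)                    -- prints 8
  print (length (concatMap dropEmptyNames rows))  -- prints 2
```

A fold like `sumColumn` touches each row once and keeps only an accumulator, which is the property that makes fold-based processing a natural fit for constant-space streaming.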

API Docs

The API is quite well documented and I would encourage you to keep it handy.

Speed

While speed matters, I have so far prioritized correct operation and a flexible API. Please let me know if you notice any performance regressions or optimization opportunities.

Usage Examples

Example 1: Basic Operation

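{-# LANGUAGE OverloadedStrings #-}

import Data.CSV.Enumerator
import qualified Data.ByteString.Char8 as B
import Data.Char (isSpace)
import qualified Data.Map as M
import Data.Map ((!))

-- Naive whitespace stripper; B.reverse (not the list reverse) keeps it on ByteStrings
strip :: B.ByteString -> B.ByteString
strip = B.reverse . B.dropWhile isSpace . B.reverse . B.dropWhile isSpace

-- A function that takes a row and "emits" zero or more rows as output.
-- Here each row yields exactly one row with Column1 stripped
-- (note: (!) fails if a row has no Column1).
processRow :: MapRow -> [MapRow]
processRow row = [M.insert "Column1" fixedCol row]
  where fixedCol = strip (row ! "Column1")

-- Read InputFile.csv with the default settings, apply processRow to every row,
-- and write the resulting rows to OutputFile.csv
main = mapCSVFile "InputFile.csv" defCSVSettings processRow "OutputFile.csv"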

And we are done.
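Because `processRow` is a pure function, it can be checked in isolation before `mapCSVFile` is pointed at real files. The sketch below needs only `containers` and `bytestring`: it re-declares `MapRow` locally instead of importing it, so it runs without csv-enumerator installed. It uses `M.adjust` rather than `(!)` so a missing column is left alone, and it adds a hypothetical variant, `processRowDropBlank`, that uses the zero-output case to filter rows out.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import Data.Char (isSpace)
import qualified Data.Map as M

-- Local stand-in for the library's MapRow synonym.
type MapRow = M.Map ByteString ByteString

-- Strip leading and trailing whitespace from a ByteString.
strip :: ByteString -> ByteString
strip = B.reverse . B.dropWhile isSpace . B.reverse . B.dropWhile isSpace

-- Same idea as the README example, but total: rows without Column1 pass through.
processRow :: MapRow -> [MapRow]
processRow row = [M.adjust strip "Column1" row]

-- Hypothetical variant: emit no row when Column1 is blank after stripping.
processRowDropBlank :: MapRow -> [MapRow]
processRowDropBlank row
  | B.null (strip (M.findWithDefault "" "Column1" row)) = []
  | otherwise                                           = processRow row

main :: IO ()
main = do
  let rows = [ M.fromList [("Column1", "  a b  "), ("Column2", "x")]
             , M.fromList [("Column1", "   "), ("Column2", "y")] ]
  mapM_ print (concatMap processRow rows)
  print (length (concatMap processRowDropBlank rows))  -- prints 1
```

Once a function like this behaves as expected on in-memory rows, the same function can be handed to the file-level helpers unchanged.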

Further examples to be provided at a later time.

TODO - Next Steps

Refactor all operations to use iterCSV as the basic building block (in progress).
The CSVeable typeclass can be refactored to have a more minimal definition.
Move mapCSVFiles out of the typeclass, if possible.
Consider a dedicated Exception type for the library, and properly notify the user when parsing-related problems occur.
Some operations can be further broken down to their atoms, increasing the flexibility of the library.
Operating on Text in addition to ByteString would be phenomenal.
A test-suite needs to be added.
Some benchmarking would be nice.
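The exception item in the list above could take many shapes. As one purely hypothetical direction (the type and function names below are illustrative and are not part of csv-enumerator's API), a library-specific type might carry the failing line number or the missing column name, and be thrown and caught through `Control.Exception`:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Control.Exception (Exception, throwIO, try)

-- Hypothetical error type; constructor and field names are illustrative only.
data CSVError
  = ParseError { csvLine :: Int, csvMessage :: String }
  | MissingColumn String
  deriving (Show, Eq)

instance Exception CSVError

-- Hypothetical check that raises a specific CSVError instead of a generic exception.
requireColumns :: [String] -> [String] -> IO ()
requireColumns header wanted =
  case filter (`notElem` header) wanted of
    []      -> pure ()
    (c : _) -> throwIO (MissingColumn c)

main :: IO ()
main = do
  r <- try (requireColumns ["Column1"] ["Column1", "Column2"])
  case r of
    Left (e :: CSVError) -> putStrLn ("caught: " ++ show e)  -- prints: caught: MissingColumn "Column2"
    Right ()             -> putStrLn "all columns present"
```

A dedicated type like this lets callers catch only CSV-related failures with `try` or `catch`, instead of handling every `SomeException`.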

Any and all kinds of help are much appreciated!