The csv-enumerator package

[Tags:bsd3, library]

CSV files are the de facto standard in many situations involving data transfer, particularly when dealing with enterprise applications or disparate database systems.

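While there are a number of CSV libraries in Haskell, at the time of this project's start in 2010, there wasn't one that provided all of the following: full flexibility in quote characters, separators and input/output; constant-space operation; robust parsing and error resiliency; fast operation; and a convenient interface that supports a variety of use cases.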

This library is an attempt to close these gaps.

For more documentation and examples, check out the README at:

http://github.com/ozataman/csv-enumerator

The API is fairly well documented and I would encourage you to keep your haddocks handy. If you run into problems, just email me or holler over at #haskell.



Properties

Versions: 0.8, 0.8.2, 0.9.0, 0.9.2, 0.9.2.1, 0.9.3, 0.9.5, 0.10.1.0, 0.10.1.1, 0.10.2.0
Dependencies: attoparsec (>=0.8), attoparsec-enumerator (>=0.2), base (==4.*), bytestring, containers (>=0.3), directory, enumerator (>=0.4.5), safe, transformers (>=0.2), unix-compat (>=0.2.1.1)
License: BSD3
Author: Ozgun Ataman
Maintainer: Ozgun Ataman <ozataman@gmail.com>
Stability: Unknown
Category: Data
Home page: http://github.com/ozataman/csv-enumerator
Uploaded: Tue Nov 15 18:09:57 UTC 2011 by OzgunAtaman
Distributions: NixOS:0.10.2.0
Downloads: 1983 total (28 in the last 30 days)
Status: Docs uploaded by user
Build status: Unknown (no reports yet)


Readme for csv-enumerator-0.9.5

README

CSV Files and Haskell

CSV files are the de facto standard in many cases of data transfer, particularly when dealing with enterprise applications or disparate database systems.

While there are a number of CSV libraries in Haskell, at the time of this project's start in 2010, there wasn't one that provided all of the following:

  • Full flexibility in quote characters, separators, input/output
  • Constant space operation
  • Robust parsing and error resiliency
  • Fast operation
  • Convenient interface that supports a variety of use cases

This library is an attempt to close these gaps.

This package

csv-enumerator is an enumerator-based CSV parsing library that is easy to use, flexible and fast. It also lets you process files in constant space, which is absolutely critical in many real-world use cases.
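Why constant space matters is easiest to see with a strict left fold. The sketch below is not csv-enumerator's API: it swaps in lazy ByteString I/O plus `foldl'` in place of enumerators/iteratees, and it splits fields naively on commas with no quote handling (exactly the kind of detail a real CSV parser takes care of). `sumColumn` is a hypothetical helper that sums one integer column, skipping the header row and any non-numeric field.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.List (foldl')

-- Hypothetical helper: sum the integer values found in column `col`
-- (0-based), skipping the header line. Fields are split naively on ','
-- with no support for quoting.
sumColumn :: Int -> BL.ByteString -> Int
sumColumn col = foldl' step 0 . drop 1 . BL.lines
  where
    -- foldl' forces the running total at every step; when the input is
    -- read lazily (e.g. via BL.readFile), only the total and the current
    -- line need to be in memory at once.
    step acc line =
      case drop col (BL.split ',' line) of
        (field : _) | Just (n, _) <- BL.readInt field -> acc + n
        _ -> acc  -- missing or non-numeric field: leave the total unchanged

main :: IO ()
main = do
  -- In-memory sample; replacing it with `BL.readFile "InputFile.csv"`
  -- would stream the file instead.
  let sample = BL.pack "name,qty\napple,2\npear,3\n"
  print (sumColumn 1 sample)
```

The same fold shape carries over to an enumerator-based design: the iteratee plays the role of `step`, receiving one parsed row at a time.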

Introduction

  • ByteStrings are used for everything
  • There are two basic row types, and they implement exactly the same operations, so you can choose the right one for the job at hand:
    • type MapRow = Map ByteString ByteString
    • type Row = [ByteString]
  • Folding over a CSV file can be thought of as the most basic operation.
  • Higher level convenience functions are provided to "map" over CSV files, modifying and transforming them along the way.
  • Helpers are provided for reading and writing CSV files in simple use cases.
  • For advanced use cases, the user can drop down to the Enumerator/Iteratee level and, among other things, interleave IO with processing.
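A minimal sketch of the two row types and the "map" idea, assuming nothing beyond bytestring and containers. `Row` and `MapRow` are local copies of the README's type definitions, not imports from the library, and `toMapRow` and `mapRows` are hypothetical helpers: the first pairs a header row with a data row, the second applies a function that may emit zero or more rows per input row. Because `mapRows` is polymorphic in the row type, it works unchanged on both `Row` and `MapRow`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString.Char8 as B
import qualified Data.Map as M

-- Local copies of the README's two row types (not imported from csv-enumerator).
type Row = [B.ByteString]
type MapRow = M.Map B.ByteString B.ByteString

-- Hypothetical helper: pair a header row with a data row to build a MapRow.
-- zip stops at the shorter list, so surplus fields are silently dropped.
toMapRow :: Row -> Row -> MapRow
toMapRow header row = M.fromList (zip header row)

-- Hypothetical helper: the "map" operation, where each input row may
-- produce zero, one, or several output rows.
mapRows :: (r -> [r]) -> [r] -> [r]
mapRows = concatMap

main :: IO ()
main = do
  let header :: Row
      header = ["name", "qty"]
      rows :: [Row]
      rows = [["apple", "2"], ["pear", "0"]]
      -- Keep a row only if its "qty" field is not "0".
      keepNonZero :: MapRow -> [MapRow]
      keepNonZero r = [r | M.lookup "qty" r /= Just "0"]
  -- MapRow: address fields by column name.
  print (mapRows keepNonZero (map (toMapRow header) rows))
  -- Row: the same mapRows works on plain lists, here duplicating each row.
  print (mapRows (\r -> [r, r]) rows)
```

`MapRow` trades some speed and memory for name-based field access; `Row` is cheaper but ties your code to column positions.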

API Docs

The API is quite well documented, and I would encourage you to keep the Haddock documentation handy.

Speed

While fast operation is a concern, so far I have cared more about correct operation and a flexible API. Please let me know if you notice any performance regressions or optimization opportunities.

Usage Examples

Example 1: Basic Operation

{-# LANGUAGE OverloadedStrings #-}

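import Data.CSV.Enumerator
import Data.Char (isSpace)
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as M
import Data.Map ((!))

-- Naive whitespace stripper: drop leading whitespace, then drop trailing
-- whitespace by reversing, dropping, and reversing back.
strip :: B.ByteString -> B.ByteString
strip = B.reverse . B.dropWhile isSpace . B.reverse . B.dropWhile isSpace

-- A function that takes a row and "emits" zero or more rows as output.
-- Here every row is emitted once, with "Column1" stripped of whitespace.
-- Note: (!) raises an error if a row has no "Column1" field.
processRow :: MapRow -> [MapRow]
processRow row = [M.insert "Column1" fixedCol row]
  where fixedCol = strip (row ! "Column1")

-- Read InputFile.csv, run processRow on each row, write OutputFile.csv.
main = mapCSVFile "InputFile.csv" defCSVSettings processRow "OutputFile.csv"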

and we are done.
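The `processRow` in Example 1 always emits exactly one row. A variant sketch that uses the "zero or more rows" contract to drop rows whose `Column1` is missing or blank is shown below. To keep it self-contained, `MapRow` is defined locally to match the README's definition; in a real program you would import `Data.CSV.Enumerator` instead and pass `processRow` to `mapCSVFile` exactly as in Example 1. Using `M.lookup` rather than `(!)` also avoids a runtime error on rows without that column.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Char (isSpace)
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as M

-- Local stand-in for the library's MapRow type.
type MapRow = M.Map B.ByteString B.ByteString

-- Naive whitespace stripper, as in Example 1.
strip :: B.ByteString -> B.ByteString
strip = B.reverse . B.dropWhile isSpace . B.reverse . B.dropWhile isSpace

-- Emits no rows when Column1 is missing or blank after stripping;
-- otherwise emits one row with the stripped value written back.
processRow :: MapRow -> [MapRow]
processRow row =
  case strip <$> M.lookup "Column1" row of
    Just s | not (B.null s) -> [M.insert "Column1" s row]
    _                       -> []

main :: IO ()
main = do
  let rows :: [MapRow]
      rows =
        [ M.fromList [("Column1", "  a  "), ("Column2", "x")]
        , M.fromList [("Column1", "   "), ("Column2", "y")]
        , M.fromList [("Column2", "z")]
        ]
  -- concatMap mimics how each input row expands into zero or more outputs.
  print (concatMap processRow rows)
```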

Further examples to be provided at a later time.

TODO - Next Steps

  • Refactor all operations to use iterCSV as the basic building block -- in progress.
  • The CSVeable typeclass can be refactored to have a more minimal definition.
  • Get mapCSVFiles out of the typeclass if possible.
  • Need to think about specializing an Exception type for the library and properly notifying the user when parsing-related problems occur.
  • Some operations can be further broken down to their atoms, increasing the flexibility of the library.
  • Operating on Text in addition to ByteString would be phenomenal.
  • A test-suite needs to be added.
  • Some benchmarking would be nice.

Any and all help is much appreciated!