hw-dsv: Unbelievably fast streaming DSV file parser

[ bsd3, csv, data-structures, library, program, simd, succinct-data-structures, text ]

Please see the README on GitHub at https://github.com/haskell-works/hw-dsv#readme



Automatic Flags

Enable bmi2 instruction set


Enable SSE 4.2 optimisations.


Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.


Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.


Versions 0.2, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6, 0.3.7, 0.3.8, 0.4.0
Change log ChangeLog.md
Dependencies base (>=4.7 && <4.12), bits-extra (>= && <0.1), bytestring (>=0.10 && <0.11), deepseq (>=1.4 && <1.5), hedgehog (>=0.5 && <0.7), hw-bits (>= && <0.8), hw-dsv, hw-prim (>= && <0.7), hw-rankselect (>= && <0.13), hw-rankselect-base (>= && <0.4), lens (>=4.15 && <5), optparse-applicative (>=0.13 && <0.15), resourcet (>=1.1 && <1.3), semigroups (>=0.8.4 && <0.19), transformers (>=0.4 && <0.6), vector (>= && <0.13) [details]
License BSD-3-Clause
Copyright 2018 John Ky
Author John Ky
Maintainer newhoggy@gmail.com
Revised Revision 2 made by GeorgeWilson at 2018-09-27T00:31:42Z
Category Text, Web, CSV
Home page https://github.com/haskell-works/hw-dsv#readme
Bug tracker https://github.com/haskell-works/hw-dsv/issues
Source repo head: git clone https://github.com/haskell-works/hw-dsv
Uploaded by GeorgeWilson at 2018-06-18T07:00:48Z
Reverse Dependencies 3 direct, 3 indirect
Executables hw-dsv
Downloads 8188 total (37 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Status Docs available [build log]
Last success reported on 2018-06-18 [all 1 reports]

Readme for hw-dsv-0.2.1




Unbelievably fast streaming DSV file parser built on succinct data structures.

This library uses BMI2 CPU instructions on x86-based CPUs that support them, provided it is compiled with the appropriate flags on ghc-8.4.1 or later.



For basic performance, it is sufficient to build, test and benchmark the library as follows. The library will then be compiled to use the broadword implementation of rank and select, which has reasonable performance.

stack build
stack test
stack bench
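
For readers unfamiliar with rank and select, the following is a minimal sketch of what the two operations compute on a single 64-bit word. The names `rank1` and `select1` are hypothetical and are not part of hw-dsv or hw-rankselect. `select1` is written as a naive bit-by-bit scan for clarity; it is not the broadword algorithm, which would replace the loop with a fixed number of word-parallel operations.

```haskell
import Data.Bits (popCount, shiftL, testBit, (.&.))
import Data.Word (Word64)

-- | Number of 1-bits among the lowest i bits of w (hypothetical helper).
rank1 :: Word64 -> Int -> Int
rank1 w i
  | i <= 0    = 0
  | i >= 64   = popCount w
  | otherwise = popCount (w .&. ((1 `shiftL` i) - 1))

-- | 0-based position of the k-th 1-bit of w (k counts from 1), if it exists.
-- Naive linear scan, shown only to define the operation.
select1 :: Word64 -> Int -> Maybe Int
select1 w k = go 0 k
  where
    go pos remaining
      | pos >= 64                       = Nothing
      | testBit w pos && remaining == 1 = Just pos
      | testBit w pos                   = go (pos + 1) (remaining - 1)
      | otherwise                       = go (pos + 1) remaining
```

For the word 22 (binary 10110), `rank1 22 3` counts the set bits among positions 0 to 2, and `select1 22 3` returns the position of the third set bit.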

For best performance, add the bmi2 flag to target the BMI2 instruction set:

stack build   --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-dsv:bmi2
stack test    --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-dsv:bmi2
stack bench   --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-dsv:bmi2
stack install --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-dsv:bmi2
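
As a rough illustration of why BMI2 helps, the sketch below models the BMI2 PDEP (parallel bit deposit) instruction in software and uses it for select within a word: depositing the single bit `1 << (k - 1)` into the set positions of `w` leaves exactly the k-th set bit, and counting trailing zeros gives its position. The names `pdepSoft` and `selectPdep` are hypothetical, and this is an assumption about the general technique rather than a copy of how hw-dsv or its dependencies implement it. With hardware support, the loop in `pdepSoft` becomes a single instruction.

```haskell
import Data.Bits (countTrailingZeros, popCount, shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Word (Word64)

-- | Software model of PDEP: deposit the low bits of 'src' into the
-- positions where 'mask' has 1-bits, lowest mask bit first.
pdepSoft :: Word64 -> Word64 -> Word64
pdepSoft src mask = go src mask 0
  where
    go _ 0 acc = acc
    go s m acc =
      let lowest = m .&. negate m                       -- lowest set bit of the mask
          acc'   = if testBit s 0 then acc .|. lowest else acc
      in go (s `shiftR` 1) (m .&. (m - 1)) acc'         -- clear that mask bit, advance source

-- | 0-based position of the k-th 1-bit of w (k counts from 1), if it exists.
selectPdep :: Word64 -> Int -> Maybe Int
selectPdep w k
  | k < 1 || k > popCount w = Nothing
  | otherwise               = Just (countTrailingZeros (pdepSoft (1 `shiftL` (k - 1)) w))
```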

Benchmark results

The following benchmark shows the kind of performance gain that can be expected from enabling the BMI2 instruction set on CPU targets that support it. Benchmarks were run on a 2.9 GHz Intel Core i7 running macOS High Sierra.

With BMI2 disabled:

$ stack install
$ cat 7g.csv | pv -t -e -b -a | hw-dsv query-lazy -k 0 -k 1 -d , -e '|' > /dev/null
7.08GiB 0:07:25 [16.3MiB/s]

With BMI2 enabled:

$ stack install --flag bits-extra:bmi2 --flag hw-bits:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-dsv:bmi2
$ cat 7g.csv | pv -t -e -b -a | hw-dsv query-lazy -k 0 -k 1 -d , -e '|' > /dev/null
7.08GiB 0:00:52 [ 138MiB/s]
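
The reported rates can be checked from the sizes and elapsed times shown by pv. The sketch below is only that arithmetic; `throughputMiB` and the other names are hypothetical. It gives about 16.3 MiB/s without BMI2 and about 139 MiB/s with it, close to the 138 MiB/s that pv printed (the displayed size and time are rounded), for a speedup of roughly 8.6 times.

```haskell
-- | Throughput in MiB/s, given a size in GiB and an elapsed time in seconds.
throughputMiB :: Double -> Double -> Double
throughputMiB sizeGiB seconds = sizeGiB * 1024 / seconds

withoutBmi2, withBmi2, speedup :: Double
withoutBmi2 = throughputMiB 7.08 (7 * 60 + 25)  -- 0:07:25 is 445 seconds
withBmi2    = throughputMiB 7.08 52             -- 0:00:52 is 52 seconds
speedup     = (7 * 60 + 25) / 52                -- ratio of elapsed times
```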

Using hw-dsv as a library

{-# LANGUAGE ScopedTypeVariables #-}

module Example where

import qualified Data.ByteString.Lazy              as LBS
import qualified Data.Vector                       as DV
import qualified HaskellWorks.Data.Dsv.Lazy.Cursor as SVL

example :: IO ()
example = do
  -- Read the input lazily so that large files can be streamed
  bs <- LBS.readFile "sample.csv"
  -- Build a cursor over the input, using ',' as the field delimiter
  let c = SVL.makeCursor ',' bs
  -- Each row is a vector of fields
  let rows :: [DV.Vector LBS.ByteString] = SVL.toListVector c

  return ()
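
To give a feel for index-based parsing, here is a small dependency-free sketch that first records where delimiters and newlines occur and then slices fields out by position. It is not hw-dsv's implementation: all names are hypothetical, it works on `String` rather than on bit-vectors with rank and select, and it assumes the input has no quoted fields and no trailing newline.

```haskell
-- | One flag per input character: True where the character equals the marker.
markPositions :: Char -> String -> [Bool]
markPositions c = map (== c)

-- | 0-based positions of the True flags; a stand-in for repeated select queries.
setPositions :: [Bool] -> [Int]
setPositions bits = [i | (i, True) <- zip [0 ..] bits]

-- | Cut a string at the given ascending positions, dropping the cut characters.
sliceAt :: [Int] -> String -> [String]
sliceAt cuts s = go 0 cuts
  where
    go start []       = [piece start (length s)]
    go start (p : ps) = piece start p : go (p + 1) ps
    piece from to     = take (to - from) (drop from s)

-- | Split unquoted DSV text into rows of fields using the two position indexes.
parseUnquoted :: Char -> String -> [[String]]
parseUnquoted delim text =
  [ sliceAt (setPositions (markPositions delim row)) row
  | row <- sliceAt (setPositions (markPositions '\n' text)) text
  ]
```

For example, `parseUnquoted ',' "a,b\nc,,d"` yields two rows, the second of which contains an empty middle field.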