hw-dsv: Unbelievably fast streaming DSV file parser



Please see the README on Github at https://github.com/haskell-works/hw-dsv#readme



Properties

Versions: 0.1.0.0, 0.2, 0.2.1, 0.3.0
Change log: ChangeLog.md
Dependencies: base (>=4.7 && <5), bits-extra (>=0.0.1.2 && <0.1), bytestring (==0.10.*), deepseq (==1.4.*), ghc-prim, hedgehog (>=0.5 && <0.7), hw-bits (>=0.7.0.2 && <0.8), hw-dsv, hw-prim (>=0.6.2.14 && <0.7), hw-rankselect (>=0.12.0.2 && <0.13), hw-rankselect-base (>=0.3.2.0 && <0.4), hw-simd (>=0.1.1.2 && <0.2), lens (>=4.15 && <5), optparse-applicative (>=0.13 && <0.15), resourcet (>=1.1 && <1.3), semigroups (>=0.8.4 && <0.19), transformers (>=0.4 && <0.6), vector (>=0.12.0.1 && <0.13)
License: BSD-3-Clause
Copyright: 2018 John Ky
Author: John Ky
Maintainer: newhoggy@gmail.com
Category: Text, CSV, SIMD, Succinct Data Structures, Data Structures
Home page: https://github.com/haskell-works/hw-dsv#readme
Bug tracker: https://github.com/haskell-works/hw-dsv/issues
Source repository: head: git clone https://github.com/haskell-works/hw-dsv
Executables: hw-dsv
Uploaded: Thu Sep 27 01:27:23 UTC 2018 by haskellworks


Flags

Name    Description                      Default    Type
avx2    Enable avx2 instruction set      Disabled   Automatic
bmi2    Enable bmi2 instruction set      Disabled   Automatic
sse42   Enable SSE 4.2 optimisations.    Enabled    Automatic

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.



Readme for hw-dsv-0.3.0


hw-dsv


Unbelievably fast streaming DSV file parser built on succinct data structures.

This library can use BMI2 and AVX2 CPU instructions on x86-based CPUs that support them, provided it is compiled with the appropriate flags on ghc-8.4.1 or later.

Compilation

Pre-requisites:

For basic performance, it is sufficient to build, test and benchmark the library as follows. The library will be compiled to use the broadword implementation of rank & select, which has reasonable performance.

stack build
stack test
stack bench

For best performance, add the bmi2 and avx2 flags to target the BMI2 and AVX2 instruction sets:

stack build   --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-simd:bmi2 --flag hw-simd:avx2 --flag hw-dsv:bmi2 --flag hw-dsv:avx2
stack test    --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-simd:bmi2 --flag hw-simd:avx2 --flag hw-dsv:bmi2 --flag hw-dsv:avx2
stack bench   --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-simd:bmi2 --flag hw-simd:avx2 --flag hw-dsv:bmi2 --flag hw-dsv:avx2
stack install --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-simd:bmi2 --flag hw-simd:avx2 --flag hw-dsv:bmi2 --flag hw-dsv:avx2

For slightly older CPUs, add only the bmi2 flag to target the BMI2 instruction set:

stack build   --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-simd:bmi2 --flag hw-dsv:bmi2
stack test    --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-simd:bmi2 --flag hw-dsv:bmi2
stack bench   --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-simd:bmi2 --flag hw-dsv:bmi2
stack install --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-simd:bmi2 --flag hw-dsv:bmi2
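
The rank and select operations mentioned above can be illustrated on a single 64-bit word. The following is a naive sketch for intuition only, not the broadword or BMI2 implementation used by hw-rankselect; the names rank1 and select1 and the bit-numbering conventions are assumptions made for this example. In a succinct DSV index, a bit vector presumably marks the byte positions of delimiters and newlines, so a select query locates the k-th field or row boundary without rescanning the text.

```haskell
import Data.Bits (popCount, shiftL, testBit, (.&.))
import Data.Word (Word64)

-- | rank1 w i: number of 1-bits among the lowest i bits of w.
rank1 :: Word64 -> Int -> Int
rank1 w i
  | i >= 64   = popCount w
  | i <= 0    = 0
  | otherwise = popCount (w .&. ((1 `shiftL` i) - 1))

-- | select1 w k: zero-based position of the k-th 1-bit of w (k counts from 1),
-- or Nothing if w has fewer than k set bits. This is a plain linear scan.
select1 :: Word64 -> Int -> Maybe Int
select1 w k
  | k < 1     = Nothing
  | otherwise = go 0 k
  where
    go pos remaining
      | pos >= 64     = Nothing
      | testBit w pos = if remaining == 1
                          then Just pos
                          else go (pos + 1) (remaining - 1)
      | otherwise     = go (pos + 1) remaining

main :: IO ()
main = do
  let w = 0x29 :: Word64   -- binary 101001: bits 0, 3 and 5 are set
  print (rank1 w 4)        -- prints 2 (bits 0 and 3 lie below position 4)
  print (select1 w 3)      -- prints Just 5 (the third set bit is bit 5)
```

Broadword techniques replace the loop in select1 with a fixed sequence of arithmetic on the whole word, and the bmi2 flag presumably swaps in dedicated hardware instructions, which would be consistent with the speed-up seen when the flag is enabled.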

Benchmark results

The following benchmarks show the kind of performance gain that can be expected from enabling the BMI2 and AVX2 instruction sets on CPUs that support them. Benchmarks were run on a 2.9 GHz Intel Core i7 under macOS High Sierra.

With BMI2 disabled:

$ stack install
$ cat 7gb.csv | pv -t -e -b -a | hw-dsv query-lazy -k 0 -k 1 -d , -e '|' > /dev/null
7.08GiB 0:07:25 [16.3MiB/s]

With BMI2 and AVX2 enabled:

$ stack install --flag bits-extra:bmi2 --flag hw-bits:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-dsv:bmi2 --flag hw-dsv:avx2
$ cat 7gb.csv | pv -t -e -b -a | hw-dsv query-lazy -k 0 -k 1 -d , -e '|' > /dev/null
7.08GiB 0:00:39 [ 181MiB/s]

With only BMI2 enabled:

$ stack install --flag bits-extra:bmi2 --flag hw-bits:bmi2 --flag hw-rankselect-base:bmi2 --flag hw-rankselect:bmi2 --flag hw-dsv:bmi2
$ cat 7gb.csv | pv -t -e -b -a | hw-dsv query-lazy -k 0 -k 1 -d , -e '|' > /dev/null
7.08GiB 0:00:43 [ 165MiB/s]
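
To put the three runs side by side, the speed-up can be recomputed from the elapsed times that pv reports. The sketch below is plain arithmetic on the figures above; the helper names are made up for this example. Because pv rounds elapsed time to whole seconds, the recomputed throughput differs slightly from pv's own averages (16.3, 181 and 165 MiB/s).

```haskell
-- | Convert minutes and seconds to total seconds.
toSeconds :: Int -> Int -> Double
toSeconds m s = fromIntegral (m * 60 + s)

-- | Average throughput in MiB/s for a size in GiB and an elapsed time in seconds.
throughputMiB :: Double -> Double -> Double
throughputMiB sizeGiB secs = sizeGiB * 1024 / secs

-- | How many times faster a run is than the baseline run.
speedup :: Double -> Double -> Double
speedup baselineSecs secs = baselineSecs / secs

main :: IO ()
main = do
  let sizeGiB  = 7.08
      baseline = toSeconds 7 25   -- BMI2 disabled
      both     = toSeconds 0 39   -- BMI2 and AVX2 enabled
      bmi2Only = toSeconds 0 43   -- only BMI2 enabled
  putStrLn ("baseline MiB/s:       " ++ show (throughputMiB sizeGiB baseline))
  putStrLn ("BMI2 + AVX2 speed-up: " ++ show (speedup baseline both))
  putStrLn ("BMI2 only speed-up:   " ++ show (speedup baseline bmi2Only))
```

On these figures, enabling BMI2 alone is roughly 10 times faster than the baseline, and adding AVX2 brings that to roughly 11 times.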

Using hw-dsv as a library

{-# LANGUAGE ScopedTypeVariables #-}

module Example where

import qualified Data.ByteString.Lazy              as LBS
import qualified Data.Vector                       as DV
import qualified HaskellWorks.Data.Dsv.Lazy.Cursor as SVL

example :: IO ()
example = do
  -- Read the input as a lazy ByteString so that large files can be streamed
  bs <- LBS.readFile "sample.csv"
  -- Build a cursor over the input, with ',' as the field delimiter
  let c = SVL.makeCursor ',' bs
  -- Each row is a vector of fields
  let rows :: [DV.Vector LBS.ByteString] = SVL.toListVector c

  return ()
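
The example above stops once rows has been built. As a minimal sketch of what could be done with it, the following selects columns 0 and 1 from each row and joins them with '|', which is roughly what the query-lazy command in the benchmarks appears to do with -k 0 -k 1 -e '|'. It uses only the bytestring and vector packages; selectColumns and renderRow are hypothetical helpers, not part of hw-dsv, and the hand-built rows stand in for the output of SVL.toListVector. No quoting or escaping is handled.

```haskell
import qualified Data.ByteString.Lazy       as LBS
import qualified Data.ByteString.Lazy.Char8 as LC8
import           Data.Maybe                 (mapMaybe)
import qualified Data.Vector                as DV

-- | Keep only the fields at the given zero-based column indices.
-- Indices that fall outside a row are skipped rather than raising an error.
selectColumns :: [Int] -> DV.Vector LBS.ByteString -> [LBS.ByteString]
selectColumns ks row = mapMaybe (row DV.!?) ks

-- | Select columns from one row and join them with an output delimiter.
renderRow :: Char -> [Int] -> DV.Vector LBS.ByteString -> LBS.ByteString
renderRow outDelim ks = LBS.intercalate (LC8.singleton outDelim) . selectColumns ks

main :: IO ()
main = do
  -- Hand-built rows with the same type as the rows in the example above.
  let rows = [ DV.fromList (map LC8.pack ["a", "b", "c"])
             , DV.fromList (map LC8.pack ["d", "e", "f"])
             ]
  -- Prints "a|b" and then "d|e".
  mapM_ (LC8.putStrLn . renderRow '|' [0, 1]) rows
```

Because rows is an ordinary lazy list, mapping over it this way should only force each row as it is written out, which fits the streaming use case.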