hw-bits: Conduits for tokenizing streams.

[ bit, bsd3, data, library ]

Please see README.md


Versions [faq] 0.0.0.1, 0.0.0.2, 0.0.0.3, 0.0.0.5, 0.0.0.6, 0.0.0.7, 0.0.0.8, 0.0.0.9, 0.0.0.10, 0.0.0.11, 0.0.0.12, 0.1.0.0, 0.1.0.1, 0.2.0.0, 0.2.0.1, 0.2.0.2, 0.3.0.0, 0.4.0.0, 0.5.0.0, 0.5.0.1, 0.5.0.2, 0.5.0.3, 0.6.0.0, 0.7.0.0, 0.7.0.1, 0.7.0.2, 0.7.0.3, 0.7.0.4, 0.7.0.5, 0.7.0.6, 0.7.0.7, 0.7.0.8
Dependencies array, attoparsec (>=0.10), base (>=4.7 && <4.9), bytestring, conduit (>=1.1 && <1.3), criterion (>=1.1.0.0 && <1.2), deepseq (<1.5), ghc-prim, hw-bits, lens, mmap, mono-traversable, parsec, QuickCheck, random, resourcet (>=1.1), safe, text, vector (>=0.6 && <0.12), word8 [details]
License BSD-3-Clause
Copyright 2016 John Ky
Author John Ky
Maintainer newhoggy@gmail.com
Revised Revision 1 made by GeorgeWilson at Wed May 30 02:02:32 UTC 2018
Category Data, Conduit
Home page http://github.com/haskell-works/hw-bits#readme
Source repo head: git clone https://github.com/haskell-works/hw-bits
Uploaded by newhoggy at Mon Apr 11 21:57:00 UTC 2016
Distributions LTSHaskell:0.7.0.6, NixOS:0.7.0.8, Stackage:0.7.0.6
Executables hw-bits-example
Downloads 10810 total (448 in the last 30 days)
Rating 2.0 (votes: 1) [estimated by rule of succession]
Docs available [build log]
Last success reported on 2016-04-11 [all 1 reports]

Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.

Readme for hw-bits-0.0.0.2

hw-succinct

Conduits for tokenizing streams.

hw-succinct is a succinct JSON parsing library. It uses succinct data-structures to allow traversal of large JSON strings with minimal memory overhead.

It is currently considered experimental.

For an example, see app/Main.hs.
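
To give a feel for the kind of index a succinct parser can build, here is a toy sketch of "interest bits": one flag per input character, set where a JSON node starts. It is an illustration only, not this library's implementation. The names interestBits and showBits are hypothetical, the input is assumed to be well-formed JSON, and a real index would pack the flags into machine words rather than a list of Bool.

```haskell
import Data.Char (isAlphaNum)

-- | One flag per input character. True marks the first character of a JSON
-- node (object, array, string, number or literal). Toy helper with a
-- hypothetical name; assumes well-formed JSON.
interestBits :: String -> [Bool]
interestBits = go False False
  where
    -- First flag: inside a string. Second flag: inside a number or literal.
    go :: Bool -> Bool -> String -> [Bool]
    go _ _ [] = []
    go True _ ('\\' : _ : cs) = False : False : go True False cs  -- escaped char
    go True _ ('"' : cs) = False : go False False cs              -- string ends
    go True _ (_ : cs) = False : go True False cs
    go False _ ('"' : cs) = True : go True False cs               -- string starts
    go False inTok (c : cs)
      | c `elem` "{[" = True : go False False cs
      | isAlphaNum c || c `elem` "-+." = not inTok : go False True cs
      | otherwise = False : go False False cs

-- | Render the flags as a string of 0s and 1s.
showBits :: [Bool] -> String
showBits = map (\b -> if b then '1' else '0')
```

For the input {"a":[1,true]}, showBits (interestBits "{\"a\":[1,true]}") gives "11000110100000": five set bits for the object, the string "a", the array, the number and the literal. The index costs one bit per input byte, so its size stays proportional to the document rather than to a tree of heap objects.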

Prerequisites

  • Install haskell-stack.
  • Install hlint (e.g. stack install hlint).

Building

Run the following in the shell:

git clone git@github.com:haskell-works/hw-succinct.git
cd hw-succinct
stack setup
stack build
stack test
stack ghci --ghc-options -XOverloadedStrings \
  --main-is hw-succinct:exe:hw-succinct-example

Memory benchmark

Parsing large JSON files in Scala with Argonaut

      S0U       EU           OU       MU     CCSU CMD
--------- --------- ----------- -------- -------- ---------------------------------------------------------------
      0.0  80,526.3    76,163.6 72,338.6 13,058.6 sbt console
      0.0 536,660.4    76,163.6 72,338.6 13,058.6 import java.io._, argonaut._, Argonaut._
      0.0 552,389.1    76,163.6 72,338.6 13,058.6 val file = new File("/Users/jky/Downloads/78mbs.json")
      0.0 634,066.5    76,163.6 72,338.6 13,058.6 val array = new Array[Byte](file.length.asInstanceOf[Int])
      0.0 644,552.3    76,163.6 72,338.6 13,058.6 val is = new FileInputStream("/Users/jky/Downloads/78mbs.json")
      0.0 655,038.1    76,163.6 72,338.6 13,058.6 is.read(array)
294,976.0 160,159.7 1,100,365.0 79,310.8 13,748.1 val json = new String(array)
285,182.9 146,392.6 1,956,264.5 82,679.8 14,099.6 val data = Parse.parse(json)
                    ***********

The column names match those of jstat -gc output, which reports usage in kB: S0U (survivor space 0), EU (eden), OU (old generation), MU (metaspace), CCSU (compressed class space). The asterisks highlight the OU column.

Parsing large JSON files in Haskell with Aeson

Mem (MB) CMD
-------- ---------------------------------------------------------
     302 import Data.Aeson
     302 import qualified Data.ByteString.Lazy as BSL
     302 json78m <- BSL.readFile "/Users/jky/Downloads/78mbs.json"
    1400 let !x = decode json78m :: Maybe Value

Parsing large JSON files in Haskell with hw-succinct

Mem (MB) CMD
-------- ---------------------------------------------------------
     274 import Foreign
     274 import qualified Data.Vector.Storable as DVS
     274 import qualified Data.ByteString as BS
     274 import System.IO.MMap
     274 import Data.Word
     274 (fptr :: ForeignPtr Word8, offset, size) <- mmapFileForeignPtr "/Users/jky/Downloads/78mbs.json" ReadOnly Nothing
     601 cursor <- measure (fromForeignRegion (fptr, offset, size) :: JsonCursor BS.ByteString (BitShown (DVS.Vector Word64)) (SimpleBalancedParens (DVS.Vector Word64)))

Examples

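-- GHCi session. Enable the extensions used below first:
--   :set -XBangPatterns -XScopedTypeVariables
-- Assumes the hw-succinct modules are in scope, e.g. in the stack ghci session from the Building section.
import Foreign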
import qualified Data.Vector.Storable as DVS
import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as BSI
import System.IO.MMap
import Data.Word
import System.CPUTime
(fptr :: ForeignPtr Word8, offset, size) <- mmapFileForeignPtr "/Users/jky/Downloads/78mbs.json" ReadOnly Nothing
cursor <- measure (fromForeignRegion (fptr, offset, size) :: JsonCursor BS.ByteString (BitShown (DVS.Vector Word64)) (SimpleBalancedParens (DVS.Vector Word64)))
let !bs = BSI.fromForeignPtr (castForeignPtr fptr) offset size
x <- measure $ jsonBsToInterestBs bs
let !y = runListConduit [bs] (unescape' "")

import Foreign
import qualified Data.Vector.Storable as DVS
import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as BSI
import System.IO.MMap
import Data.Word
import System.CPUTime
(fptr :: ForeignPtr Word8, offset, size) <- mmapFileForeignPtr "/Users/jky/Downloads/part40.json" ReadOnly Nothing
let !bs = BSI.fromForeignPtr (castForeignPtr fptr) offset size
x <- measure $ BS.concat $ runListConduit [bs] (blankJson =$= blankedJsonToInterestBits)
x <- measure $ jsonBsToInterestBs bs

jsonTokenAt $ J.nextSibling $ J.firstChild $ J.nextSibling $ J.firstChild $ J.firstChild cursor
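
The cursor navigation above can be pictured with a toy balanced-parentheses sequence, where every JSON node contributes one open and one close paren. The sketch below is an assumption-laden illustration, not the library's code: the BP type, parens, findClose and isOpen are hypothetical names, firstChild and nextSibling here only mirror the spirit of the J.firstChild and J.nextSibling calls, and the linear scans stand in for the constant-time operations a real succinct structure aims for.

```haskell
-- | Toy balanced-parentheses sequence: True is an open paren, False a close.
-- A node is identified by the 0-based index of its open paren.
type BP = [Bool]

-- | Build a sequence from a string such as "(()())". Hypothetical helper.
parens :: String -> BP
parens = map (== '(')

-- | True when index i is in range and holds an open paren.
isOpen :: BP -> Int -> Bool
isOpen bp i = i >= 0 && i < length bp && bp !! i

-- | Index of the close paren matching the open paren at index p.
findClose :: BP -> Int -> Maybe Int
findClose bp p
  | not (isOpen bp p) = Nothing
  | otherwise = go (1 :: Int) (p + 1) (drop (p + 1) bp)
  where
    go _ _ [] = Nothing
    go depth i (b : bs)
      | b = go (depth + 1) (i + 1) bs
      | depth == 1 = Just i
      | otherwise = go (depth - 1) (i + 1) bs

-- | A node has a child exactly when its open paren is followed by another open.
firstChild :: BP -> Int -> Maybe Int
firstChild bp p
  | isOpen bp p && isOpen bp (p + 1) = Just (p + 1)
  | otherwise = Nothing

-- | The next sibling, if any, opens right after this node's matching close.
nextSibling :: BP -> Int -> Maybe Int
nextSibling bp p = do
  c <- findClose bp p
  if isOpen bp (c + 1) then Just (c + 1) else Nothing
```

The document {"a":[1,true]} has the shape "(()(()()))": the object at index 0, the string "a" at 1, the array at 3, the number at 4 and the literal at 6. With bp = parens "(()(()()))", the chain firstChild bp 0 >>= nextSibling bp >>= firstChild bp >>= nextSibling bp evaluates to Just 6, the position of true.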
