bytestring: Fast, packed, strict and lazy byte arrays with a list interface

[ bsd3, data, library ] [ Propose Tags ]

A time and space-efficient implementation of byte vectors using packed Word8 arrays, suitable for high performance use, both in terms of large data quantities, or high speed requirements. Byte vectors are encoded as strict Word8 arrays of bytes, and lazy lists of strict chunks, held in a ForeignPtr, and can be passed between C and Haskell with little effort.

Test coverage data for this library is available at: http://code.haskell.org/~dons/tests/bytestring/hpc_index.html


[Skip to Readme]

Downloads

Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

  • No Candidates
Versions [RSS] 0.9, 0.9.0.1, 0.9.0.2, 0.9.0.3, 0.9.0.4, 0.9.1.0, 0.9.1.1, 0.9.1.2, 0.9.1.3, 0.9.1.4, 0.9.1.5, 0.9.1.6, 0.9.1.7, 0.9.1.8, 0.9.1.9, 0.9.1.10, 0.9.2.0, 0.9.2.1, 0.10.0.0, 0.10.0.1, 0.10.0.2, 0.10.2.0, 0.10.4.0, 0.10.4.1, 0.10.6.0, 0.10.8.0, 0.10.8.1, 0.10.8.2, 0.10.9.0, 0.10.10.0, 0.10.10.1, 0.10.12.0, 0.10.12.1, 0.11.0.0, 0.11.1.0, 0.11.2.0, 0.11.3.0, 0.11.3.1, 0.11.4.0, 0.11.5.0, 0.11.5.1, 0.11.5.2, 0.11.5.3, 0.12.0.0, 0.12.0.1, 0.12.0.2, 0.12.1.0
Dependencies base (<4.2), ghc-prim [details]
License BSD-3-Clause
Copyright Copyright (c) Don Stewart 2005-2008, (c) Duncan Coutts 2006-2007, (c) David Roundy 2003-2005.
Author Don Stewart, Duncan Coutts
Maintainer dons@galois.com, duncan@haskell.org
Revised Revision 1 made by HerbertValerioRiedel at 2014-12-22T07:41:25Z
Category Data
Home page http://www.cse.unsw.edu.au/~dons/fps.html
Uploaded by DonaldStewart at 2008-08-25T21:06:34Z
Distributions Arch:0.11.4.0, Fedora:0.11.4.0
Reverse Dependencies 6042 direct, 8759 indirect [details]
Downloads 116740 total (519 in the last 30 days)
Rating 2.75 (votes: 18) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for bytestring-0.9.1.1

[back to package description]
------------------------------------------------------------------------
               ByteString : Fast, packed strings of bytes
------------------------------------------------------------------------

This library provides the Data.ByteString library -- strict and lazy
byte arrays manipulable as strings -- providing very time and space
efficient string and IO operations.

For very large data requirements, or constraints on heap size,
Data.ByteString.Lazy is provided, a lazy list of bytestring chunks.
Efficient processing of multi-gigabyte data can be achieved this way.

Requirements:
        > Cabal
        > GHC 6.4 or greater, or hugs

Building:
        > runhaskell Setup.lhs configure --prefix=/f/g
        > runhaskell Setup.lhs build
        > runhaskell Setup.lhs install

After installation, you can run the testsuite as follows:
    
        > cd tests ; make
    or
        > cd tests ; make hugs

For the full test and benchmark suite, you need GHC and Hugs:

        > cd tests ; make everything

Authors:
    ByteString was derived from the GHC PackedString library,
    originally written by Bryan O'Sullivan, and then by Simon Marlow.
    It was adapted, and greatly extended for darcs by David Roundy, and
    others. Don Stewart cleaned up and further extended the implementation.
    Duncan Coutts wrote much of the .Lazy code. Don, Duncan and Roman
    Leshchinskiy wrote the fusion system.

------------------------------------------------------------------------

Performance, some random numbers (with GHC):

This table compares the performance of common operations ByteString,
from various string libraries.

Size of test data: 21256k, Linux 3.2Ghz P4

                          FPS7       SPS     PS      [a]    
++                        0.028      !       !       1.288   
length                    0.000      0.000   0.000   0.131   
pack                      0.303      0.502   0.337   -       
unpack                    3.319*     1.630   7.445   -       
compare                   0.000      0.000   0.000   0.000   
index                     0.000      0.000   0.000   0.000   
map                       2.762*     2.917   4.813   7.286   
filter                    0.304      2.805   0.954   0.305   
take                      0.000      0.000   0.024   0.005   
drop                      0.000      0.000   11.768  0.130   
takeWhile                 0.000      1.498   0.000   0.000   
dropWhile                 0.000      1.985   8.447   0.130   
span                      0.000      9.289   11.144  0.131   
break                     0.000      9.383   11.268  0.133   
lines                     0.052      1.114   1.367   2.790   
unlines                   0.048      !       !       10.950  
words                     1.344      2.128   5.644   4.184   
unwords                   0.016      !       !       1.305   
reverse                   0.024      12.997  13.018  1.622   
concat                    0.000      12.701  11.459  1.163   
cons                      0.016      2.064   8.358   0.131   
empty                     0.000      0.000   0.000   0.000   
head                      0.000      0.000   0.000   0.000   
tail                      0.000      0.000   14.490  0.130   
elem                      0.000      1.490   0.001   0.000   
last                      0.000      -       -       0.143   
init                      0.000      -       -       1.147   
inits                     0.414      -       -       !       
tails                     0.460      -       -       1.136   
intersperse               0.040      -       -       10.517  
any                       0.000      -       -       0.000   
all                       0.000      -       -       0.000   
sort                      0.168      -       -       !
maximum                   0.024      -       -       0.183
minimum                   0.025      -       -       0.185
replicate                 0.000      -       -       0.053   
findIndex                 0.096
find                      0.120      -       -       0.000   
elemIndex                 0.000      -       -       0.000   
elemIndicies              0.008      -       -       0.314   
foldl                     0.148
spanEnd                   0.000
snoc                      0.016
filterChar                0.031      
filterNotChar             0.124
join                      0.016      
split                     0.032      
findIndices               0.408      
splitAt                   0.000      
lineIndices               0.029      
breakOn                   0.000      
breakSpace                0.000 
splitWith                 0.329 
dropSpace                 0.000 
dropSpaceEnd              0.000 
joinWithChar              0.017
join /                    0.016 
zip                       0.960 
zipWith                   0.892 
isSubstringOf             0.039 
isPrefixOf                0.000 
isSuffixOf                0.000
count                     0.021

Key: FPS6 = FPS 0.6
     SPS  = Simon Marlow's packedstring prototype
     PS   = Data.PackedString
     [a]  = [Char]

     -    = no function exists
     !    = stack or memory exhaustion

------------------------------------------------------------------------

== Stress testing really big strings

Doing some stress testing of FPS, here are some results for 0.5G strings.

3.2Ghz box, 2G physical mem.

Size of test data: 524288k
Size of test data: 524288k
                Char8   Word8

Effectively O(1) or O(m) where m < n
    all             0.000   0.000   
    any             0.000   0.004   
    break           0.000   0.000   
    breakChar       0.000   0.000   
    breakSpace      0.000   
    compare         0.000   
    concat          0.000   
    drop            0.000   
    dropSpace       0.000   
    dropSpaceEnd    0.000   
    dropWhile       0.000   0.000   
    elem            0.000   0.000   
    elemIndex       0.000   0.000   
    elemIndexLast   0.000   0.000   
    empty           0.000   
    head            0.000   0.000   
    index           0.000   0.000   
    init            0.000   
    last            0.000   0.000   
    length          0.000   
    notElem         0.000   0.000   
    span            0.000   0.000   
    spanChar        0.000   0.000   
    spanEnd         0.000   0.000   
    splitAt         0.000   
    tail            0.000   
    take            0.000   
    takeWhile       0.000   0.000   
    isPrefixOf      0.000   
    isSuffixOf      0.000   
    addr1           0.000   
    addr2           0.000   

O(n)
    ++              0.676   
    map             6.080   5.868   
    cons            0.396   0.396   
    snoc            0.400   0.400   
    find            3.240   
    split           1.204   1.200   
    lines           2.000   
    foldl           3.804   
    unwords         0.552   
    reverse         0.884   
    findIndex       3.128   
    filterChar      0.756   0.732   
    filter/='f'     8.265   7.012   
    filterNotChar   4.456   3.388   
    join            0.400   
    sort            4.344   
    maximum         0.776   0.764   
    minimum         0.772   0.776   
    replicate       0.008   0.000   
    elemIndices     0.240   0.240   
    lineIndices     1.092   
    joinWithChar    0.400   0.400   
    isSubstringOf   0.052   
    count           0.748   

slow O(n)
    words           38.722  
    group           77.261  
    groupBy         96.226  
    inits           32.430  
    tails           23.225  
    findIndices     13.841  15.825  
    splitWith       18.445  19.225  
    zip             33.926  
    zipWith         33.562