streaming-bytestring-0.1.0.2: effectful bytestrings, or: lazy bytestring done right

Copyright(c) Don Stewart 2006 (c) Duncan Coutts 2006-2011 (c) Michael Thompson 2015
LicenseBSD-style
Maintainerwhat_is_it_to_do_anything@yahoo.com
Stabilityexperimental
Portabilityportable
Safe HaskellNone
LanguageHaskell2010

Data.ByteString.Streaming

Contents

Description

A time and space-efficient implementation of effectful byte streams using a stream of packed Word8 arrays, suitable for high performance use, both in terms of large data quantities, or high speed requirements. Streaming ByteStrings are encoded as streams of strict chunks of bytes.

A key feature of streaming ByteStrings is the means to manipulate large or unbounded streams of data without requiring the entire sequence to be resident in memory. To take advantage of this you have to write your functions in a streaming style, e.g. classic pipeline composition. The default I/O chunk size is 32k, which should be good in most circumstances.

Some operations, such as concat, append, reverse and cons, have better complexity than their Data.ByteString equivalents, due to optimisations resulting from the list spine structure. For other operations streaming, like lazy, ByteStrings are usually within a few percent of strict ones.

This module is intended to be imported qualified, to avoid name clashes with Prelude functions. eg.

import qualified Data.ByteString.Streaming as B

Original GHC implementation by Bryan O'Sullivan. Rewritten to use UArray by Simon Marlow. Rewritten to support slices and use ForeignPtr by David Roundy. Rewritten again and extended by Don Stewart and Duncan Coutts. Lazy variant by Duncan Coutts and Don Stewart. Streaming variant by Michael Thompson, following the ideas of Gabriel Gonzales' pipes-bytestring

Synopsis

The ByteString type

data ByteString m r Source

A space-efficient representation of a succession of Word8 vectors, supporting many efficient operations.

An effectful ByteString contains 8-bit bytes, or by using the operations from Data.ByteString.Streaming.Char8 it can be interpreted as containing 8-bit characters.

Introducing and eliminating ByteStrings

empty :: ByteString m () Source

O(1) The empty ByteString -- i.e. return ()

singleton :: Monad m => Word8 -> ByteString m () Source

O(1) Yield a Word8 as a minimal ByteString

pack :: Monad m => Stream (Of Word8) m r -> ByteString m r Source

O(n) Convert a monadic stream of individual Word8s into a packed byte stream.

unpack :: Monad m => ByteString m r -> Stream (Of Word8) m r Source

O(n) Converts a packed byte stream into a stream of individual bytes.

fromLazy :: Monad m => ByteString -> ByteString m () Source

O(c) Transmute a lazy bytestring to its representation as a monadic stream of chunks.

>>> Q.putStrLn $ Q.fromLazy "hi"
hi
>>> Q.fromLazy "hi"
Chunk "hi" (Empty (()))  -- note: a 'show' instance works in the identity monad
>>> Q.fromLazy $ BL.fromChunks ["here", "are", "some", "chunks"]
Chunk "here" (Chunk "are" (Chunk "some" (Chunk "chunks" (Empty (())))))

toLazy :: Monad m => ByteString m () -> m ByteString Source

O(n) Convert a monadic byte stream into a single lazy ByteString with the same internal chunk structure.

>>> Q.toLazy "hello"
"hello"

toLazy' :: Monad m => ByteString m r -> m (Of ByteString r) Source

O(n) Convert a monadic byte stream into a single lazy ByteString with the same invisible chunk structure, retaining the original return value.

>>> Q.toLazy' "hello"
"hello" :> ()
>>> S.toListM $ mapsM Q.toLazy' $ Q.lines $ "one\ntwo\three\nfour\nfive\n"
["one","two\three","four","five",""]

fromChunks :: Monad m => Stream (Of ByteString) m r -> ByteString m r Source

O(c) Convert a monadic stream of individual strict ByteString chunks into a byte stream.

toChunks :: Monad m => ByteString m r -> Stream (Of ByteString) m r Source

O(c) Convert a byte stream into a stream of individual strict bytestrings. This of course exposes the internal chunk structure.

fromStrict :: ByteString -> ByteString m () Source

O(1) yield a strict ByteString chunk.

toStrict :: Monad m => ByteString m () -> m ByteString Source

O(n) Convert a byte stream into a single strict ByteString.

Note that this is an expensive operation that forces the whole monadic ByteString into memory and then copies all the data. If possible, try to avoid converting back and forth between streaming and strict bytestrings.

toStrict' :: Monad m => ByteString m r -> m (Of ByteString r) Source

O(n) Convert a monadic byte stream into a single strict ByteString, retaining the return value of the original pair. This operation is for use with mapsM.

mapsM R.toStrict' :: Monad m => Stream (ByteString m) m r -> Stream (Of ByteString) m r 

It is subject to all the objections one makes to toStrict.

drain :: Monad m => ByteString m r -> m r Source

wrap :: m (ByteString m r) -> ByteString m r Source

Smart constructor for Go.

Transforming ByteStrings

map :: Monad m => (Word8 -> Word8) -> ByteString m r -> ByteString m r Source

O(n) map f xs is the ByteString obtained by applying f to each element of xs.

intercalate :: Monad m => ByteString m () -> Stream (ByteString m) m r -> ByteString m r Source

O(n) The intercalate function takes a ByteString and a list of ByteStrings and concatenates the list after interspersing the first argument between each element of the list.

Basic interface

cons :: Monad m => Word8 -> ByteString m r -> ByteString m r Source

O(1) cons is analogous to '(:)' for lists.

cons' :: Word8 -> ByteString m r -> ByteString m r Source

O(1) Unlike cons, 'cons\'' is strict in the ByteString that we are consing onto. More precisely, it forces the head and the first chunk. It does this because, for space efficiency, it may coalesce the new byte onto the first 'chunk' rather than starting a new 'chunk'.

So that means you can't use a lazy recursive contruction like this:

let xs = cons\' c xs in xs

You can however use cons, as well as repeat and cycle, to build infinite byte streams.

snoc :: Monad m => ByteString m r -> Word8 -> ByteString m r Source

O(n/c) Append a byte to the end of a ByteString

append :: Monad m => ByteString m r -> ByteString m s -> ByteString m s Source

O(n/c) Append two

filter :: Monad m => (Word8 -> Bool) -> ByteString m r -> ByteString m r Source

O(n) filter, applied to a predicate and a ByteString, returns a ByteString containing those characters that satisfy the predicate.

uncons :: Monad m => ByteString m r -> m (Maybe (Word8, ByteString m r)) Source

O(1) Extract the head and tail of a ByteString, or Nothing if it is empty

nextByte :: Monad m => ByteString m r -> m (Either r (Word8, ByteString m r)) Source

O(1) Extract the head and tail of a ByteString, or its return value if it is empty

Direct chunk handling

consChunk :: ByteString -> ByteString m r -> ByteString m r Source

Smart constructor for Chunk.

chunk :: ByteString -> ByteString m () Source

Yield-style smart constructor for Chunk.

foldrChunks :: Monad m => (ByteString -> a -> a) -> a -> ByteString m r -> m a Source

Consume the chunks of an effectful ByteString with a natural right fold.

foldlChunks :: Monad m => (a -> ByteString -> a) -> a -> ByteString m r -> m (Of a r) Source

Substrings

Breaking strings

break :: Monad m => (Word8 -> Bool) -> ByteString m r -> ByteString m (ByteString m r) Source

break p is equivalent to span (not . p).

drop :: Monad m => Int64 -> ByteString m r -> ByteString m r Source

O(n/c) drop n xs returns the suffix of xs after the first n elements, or [] if n > length xs.

>>> Q.putStrLn $ Q.drop 6 "Wisconsin"
sin
>>> Q.putStrLn $ Q.drop 16 "Wisconsin"
>>> 

group :: Monad m => ByteString m r -> Stream (ByteString m) m r Source

The group function take`5s a ByteString and returns a list of ByteStrings such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,

group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]

It is a special case of groupBy, which allows the programmer to supply their own equality test.

span :: Monad m => (Word8 -> Bool) -> ByteString m r -> ByteString m (ByteString m r) Source

span p xs breaks the ByteString into two segments. It is equivalent to (takeWhile p xs, dropWhile p xs)

splitAt :: Monad m => Int64 -> ByteString m r -> ByteString m (ByteString m r) Source

O(n/c) splitAt n xs is equivalent to (take n xs, drop n xs).

>>> rest <- Q.putStrLn $ Q.splitAt 3 "therapist is a danger to good hyphenation, as Knuth notes"
the
>>> Q.putStrLn $ Q.splitAt 19 rest
rapist is a danger 

splitWith :: Monad m => (Word8 -> Bool) -> ByteString m r -> Stream (ByteString m) m r Source

O(n) Splits a ByteString into components delimited by separators, where the predicate returns True for a separator element. The resulting components do not contain the separators. Two adjacent separators result in an empty component in the output. eg.

splitWith (=='a') "aabbaca" == ["","","bb","c",""]
splitWith (=='a') []        == []

take :: Monad m => Int64 -> ByteString m r -> ByteString m () Source

O(n/c) take n, applied to a ByteString xs, returns the prefix of xs of length n, or xs itself if n > length xs.

Note that in the streaming context this drops the final return value; splitAt preserves this information, and is sometimes to be preferred.

>>> Q.putStrLn $ Q.take 8 $ "Is there a God?" >> return True
Is there
>>> Q.putStrLn $ "Is there a God?" >> return True
Is there a God?
True
>>> rest <- Q.putStrLn $ Q.splitAt 8 $ "Is there a God?" >> return True
Is there
>>> Q.drain rest
True

takeWhile :: Monad m => (Word8 -> Bool) -> ByteString m r -> ByteString m () Source

takeWhile, applied to a predicate p and a ByteString xs, returns the longest prefix (possibly empty) of xs of elements that satisfy p.

Breaking into many substrings

split :: Monad m => Word8 -> ByteString m r -> Stream (ByteString m) m r Source

O(n) Break a ByteString into pieces separated by the byte argument, consuming the delimiter. I.e.

split '\n' "a\nb\nd\ne" == ["a","b","d","e"]
split 'a'  "aXaXaXa"    == ["","X","X","X",""]
split 'x'  "x"          == ["",""]

and

intercalate [c] . split c == id
split == splitWith . (==)

As for all splitting functions in this library, this function does not copy the substrings, it just constructs new ByteStrings that are slices of the original.

Special folds

concat :: Monad m => Stream (ByteString m) m r -> ByteString m r Source

O(n) Concatenate a stream of byte streams.

Building ByteStrings

Infinite ByteStrings

repeat :: Word8 -> ByteString m r Source

repeat x is an infinite ByteString, with x the value of every element.

>>> R.stdout $ R.take 50 $ R.repeat 60
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>> 
>>> Q.putStrLn $ Q.take 50 $ Q.repeat 'z'
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

iterate :: (Word8 -> Word8) -> Word8 -> ByteString m r Source

O(n) Concatenate a list of ByteStrings. concat :: (Monad m) => [ByteString m ()] -> ByteString m () concat css0 = to css0 where go css (Empty m') = to css go css (Chunk c cs) = Chunk c (go css cs) go css (Go m) = Go (liftM (go css) m) to [] = Empty () to (cs:css) = go css cs

iterate f x returns an infinite ByteString of repeated applications -- of f to x:

iterate f x == [x, f x, f (f x), ...]
>>> R.stdout $ R.take 50 $ R.iterate succ 39
()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXY>>> 
>>> Q.putStrLn $ Q.take 50 $ Q.iterate succ '\''
()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXY

cycle :: Monad m => ByteString m r -> ByteString m s Source

cycle ties a finite ByteString into a circular one, or equivalently, the infinite repetition of the original ByteString. For an empty bytestring (like return 17) it of course makes an unproductive loop

>>> Q.putStrLn $ Q.take 7 $ Q.cycle  "y\n"
y
y
y
y

Unfolding ByteStrings

unfoldM :: Monad m => (a -> Maybe (Word8, a)) -> a -> ByteString m () Source

O(n) The unfoldr function is analogous to the Stream 'unfoldr'. unfoldr builds a ByteString from a seed value. The function takes the element and returns Nothing if it is done producing the ByteString or returns Just (a,b), in which case, a is a prepending to the ByteString and b is used as the next element in a recursive call.

unfoldr :: (a -> Either r (Word8, a)) -> a -> ByteString m r Source

unfold is like unfoldr but stops when the co-algebra returns Left; the result is the return value of the 'ByteString m r' 'unfoldr uncons = id'

Folds, including support for Foldl

foldr :: Monad m => (Word8 -> a -> a) -> a -> ByteString m () -> m a Source

foldr, applied to a binary operator, a starting value -- (typically the right-identity of the operator), and a ByteString, -- reduces the ByteString using the binary operator, from right to left.

foldr cons = id

fold :: Monad m => (x -> Word8 -> x) -> x -> (x -> b) -> ByteString m () -> m b Source

fold, applied to a binary operator, a starting value (typically the left-identity of the operator), and a ByteString, reduces the ByteString using the binary operator, from left to right. We use the style of the foldl libarary for left folds

fold' :: Monad m => (x -> Word8 -> x) -> x -> (x -> b) -> ByteString m r -> m (Of b r) Source

'fold\'' keeps the return value of the left-folded bytestring. Useful for simultaneous folds over a segmented bytestream

head :: Monad m => ByteString m r -> m Word8 Source

O(1) Extract the first element of a ByteString, which must be non-empty.

head' :: Monad m => ByteString m r -> m (Of (Maybe Word8) r) Source

O(c) Extract the first element of a ByteString, which must be non-empty.

last :: Monad m => ByteString m r -> m Word8 Source

O(n/c) Extract the last element of a ByteString, which must be finite and non-empty.

last' :: Monad m => ByteString m r -> m (Of (Maybe Word8) r) Source

length :: Monad m => ByteString m r -> m Int Source

length' :: Monad m => ByteString m r -> m (Of Int r) Source

O(n/c) length' returns the length of a byte stream as an Int together with the return value. This makes various maps possible

>>> Q.length' "one\ntwo\three\nfour\nfive\n"
23 :> ()
>>> S.print $ S.take 3 $ mapsM Q.length' $ Q.lines "one\ntwo\three\nfour\nfive\n"
3
8
4

null :: Monad m => ByteString m r -> m Bool Source

O(1) Test whether a ByteString is empty. The value is of course in the base monad.

>>> Q.null "one\ntwo\three\nfour\nfive\n"
False
>>> Q.null $ Q.take 0 Q.stdin
True
>>> :t Q.null $ Q.take 0 Q.stdin
Q.null $ Q.take 0 Q.stdin :: MonadIO m => m Bool

null' :: Monad m => ByteString m r -> m (Of Bool r) Source

O(1) Test whether a ByteString is empty, collecting its return value; -- to reach the return value, this operation must check the whole length of the string.

>>> Q.null' "one\ntwo\three\nfour\nfive\n"
False :> ()
[*Main]
>>> Q.null' ""
True :> ()
>>> S.print $ mapsM R.null' $ Q.lines "yours,\nMeredith"
False
False

count :: Monad m => Word8 -> ByteString m r -> m Int Source

count returns the number of times its argument appears in the ByteString

count = length . elemIndices

count' :: Monad m => Word8 -> ByteString m r -> m (Of Int r) Source

I/O with ByteStrings

Standard input and output

getContents :: MonadIO m => ByteString m () Source

getContents. Equivalent to hGetContents stdin. Will read lazily

stdin :: MonadIO m => ByteString m () Source

Pipes-style nomenclature for getContents

stdout :: MonadIO m => ByteString m r -> m r Source

Pipes-style nomenclature for putStr

interact :: (ByteString IO () -> ByteString IO r) -> IO r Source

Similar to hPut except that it will never block. Instead it returns any tail that did not get written. This tail may be empty in the case that the whole string was written, or the whole original string if nothing was written. Partial writes are also possible.

Note: on Windows and with Haskell implementation other than GHC, this function does not work correctly; it behaves identically to hPut.

hPutNonBlocking :: MonadIO m => Handle -> ByteString m r -> ByteString m r hPutNonBlocking _ (Empty r) = Empty r hPutNonBlocking h (Go m) = Go $ liftM (hPutNonBlocking h) m hPutNonBlocking h bs@(Chunk c cs) = do c' <- lift $ S.hPutNonBlocking h c case S.length c' of l' | l' == S.length c -> hPutNonBlocking h cs 0 -> bs _ -> Chunk c' cs {--}

A synonym for hPut, for compatibility

hPutStr :: Handle -> ByteString IO r -> IO r hPutStr = hPut

  • - | Write a ByteString to stdout putStr :: ByteString IO r -> IO r putStr = hPut IO.stdout

Files

readFile :: MonadIO m => FilePath -> ByteString m () Source

Read an entire file into a chunked 'ByteString IO ()'. The Handle will be held open until EOF is encountered.

writeFile :: FilePath -> ByteString IO r -> IO r Source

Write a ByteString to a file.

appendFile :: FilePath -> ByteString IO r -> IO r Source

Append a ByteString to a file.

I/O with Handles

fromHandle :: MonadIO m => Handle -> ByteString m () Source

Pipes-style nomenclature for hGetContents

toHandle :: MonadIO m => Handle -> ByteString m r -> m r Source

Pipes nomenclature for hPut

hGet :: MonadIO m => Handle -> Int -> ByteString m () Source

Read n bytes into a ByteString, directly from the specified Handle.

hGetContents :: MonadIO m => Handle -> ByteString m () Source

Read entire handle contents lazily into a ByteString. Chunks are read on demand, using the default chunk size.

Once EOF is encountered, the Handle is closed.

Note: the Handle should be placed in binary mode with hSetBinaryMode for hGetContents to work correctly.

hGetContentsN :: MonadIO m => Int -> Handle -> ByteString m () Source

Read entire handle contents lazily into a ByteString. Chunks are read on demand, in at most k-sized chunks. It does not block waiting for a whole k-sized chunk, so if less than k bytes are available then they will be returned immediately as a smaller chunk.

The handle is closed on EOF.

Note: the Handle should be placed in binary mode with hSetBinaryMode for hGetContentsN to work correctly.

hGetN :: MonadIO m => Int -> Handle -> Int -> ByteString m () Source

Read n bytes into a ByteString, directly from the specified Handle, in chunks of size k.

hGetNonBlocking :: MonadIO m => Handle -> Int -> ByteString m () Source

hGetNonBlocking is similar to hGet, except that it will never block waiting for data to become available, instead it returns only whatever data is available. If there is no data available to be read, hGetNonBlocking returns empty.

Note: on Windows and with Haskell implementation other than GHC, this function does not work correctly; it behaves identically to hGet.

hGetNonBlockingN :: MonadIO m => Int -> Handle -> Int -> ByteString m () Source

hGetNonBlockingN is similar to hGetContentsN, except that it will never block waiting for data to become available, instead it returns only whatever data is available. Chunks are read on demand, in k-sized chunks.

hPut :: MonadIO m => Handle -> ByteString m r -> m r Source

Outputs a ByteString to the specified Handle.

Etc.

zipWithStream :: Monad m => (forall x. a -> ByteString m x -> ByteString m x) -> [a] -> Stream (ByteString m) m r -> Stream (ByteString m) m r Source

distribute :: (Monad m, MonadTrans t, MFunctor t, Monad (t m), Monad (t (ByteString m))) => ByteString (t m) a -> t (ByteString m) a Source

Given a byte stream on a transformed monad, make it possible to 'run' transformer.