typed-streams-0.1.0.1: A stream based replacement for lists

Safe HaskellNone
LanguageHaskell2010

Data.Stream.Typed

Description

The motivation of this library is at least partly demostrated by the following problem with lists:

Consider the following code (which is taken from Tests.hs from this package btw):

f :: Int -> Int
f x = x*(x .&. 3)

g :: Int -> Int
g x = x*(x .&. 7)

f and g are just silly example functions, which are effectively:

f x = x * (x mod 8)
g x = x * (x mod 16)

Now lets say we want to take some "list", apply f to it, apply g to it, append both these together, and fold them. A straightforward way would be this:

sumG :: (Functor t, Foldable t, Semigroup (t Int)) => t Int -> Int
sumG x = foldl' (+) 0 ((fmap f x) <> (fmap g x))

For comparison sake, lets write a hand written version of this function:

fast :: Int -> Int
fast n = go g (go f 0 1 n) 1 n where
  go :: (Int -> Int) -> Int -> Int -> Int -> Int
  go f = go' where
    go' :: Int -> Int -> Int -> Int
    go' acc s i = if i == 0 then acc else let next_acc = acc + f s in next_acc `seq` go' next_acc (s + 1) (i - 1)

What you will probably find is, at least with GHC 8.0.2 which I've tested it with:

sumG [1..n]

is about ten times slower than

fast n

Even though they should be doing the same thing.

But, using this stream library, and EnumFromTo from another package, you can write:

sumG (enumFromTo 1 n)

And this runs almost as fast as the handwritten code.

Now you may be able to get this speed out of ordinary lists with some fancy rewrite rules (and indeed this Stream library does have a few fancy rewrite rules itself) there more theortical advantages that Stream can have over lists.

Unlike ordinary lists, streams do not store the data directly. They just store a way to generate the data.

What does this mean?

At the moment, the main way to process a stream is to fold over it. You can't really deconstruct it step by step. But generally folds give you enough power to process a list.

Also, if you fold over a stream twice, you'll have to recalculate it. This is a good and bad thing, It can be bad because you have to recalculate, but it's good because you won't use up memory. For many lists used in practice, they're simple enough to regenerate instead of storing, and it prevents huge heap usage from code like this:

average x = (foldl' (+) 0 x) / (length x)

There's other advantages to this approach. Firstly, appending streams is always a constant time operation. Always. Even if the first stream is infinite. All appending streams does is generate a new "stream" which has the two appended streams as data items.

Actually, our stream data type is more sophisticated than this. A Stream is a type of two variables, the second is the element type as usual, but the first is the "Length". Streams can be the following lengths:

  • Infinite
  • Unknown
  • RunTime
  • CompileTime
  • Empty

Infinite streams are well, infinte, not much to say here.

Unknown streams are streams we don't know the length of. They could be infinite or finite. Ordinary lists are like this.

RunTime streams have a defined finite length, which takes constant time to access.

CompileTime streams have their length as a compile time factor.

Empty streams are well, empty.

Having these different types can be useful. We might want a safe "toVector" function that takes only RunTime streams, and immediately allocates the vector to that size before filling it.

But Stream is indeed a GADT.

Currently there are 34 different types of streams. These range from simple streams just with a state and a "next_state" function, to streams representing appended streams, concatenated streams, etc.

There's even streams that are a wrapper for Foldable types, so instead of converting everything to a list, you can just wrap your data in a stream and combine data of all different types seemlessly.

I believe there's lots of opportunity to optimise this library. Potentially (if I got to understand the GHC API better) streams could carry around code blocks, which could compile just in time (JIT) when required. This could allow for fast code to be generated in situations where there are complex transformations, perhaps based on runtime branching, which the inliner can miss.

However, currently optimisation is limited. Indeed, the only optimisation I've to optimise the example given in this documentation. But it does show the potential, and it is an extensible framework.

Synopsis

Documentation

data Stream x a Source #

Instances

Monad (Stream RunTime) Source # 
Monad (Stream Unknown) Source # 
Functor (Stream l) Source # 

Methods

fmap :: (a -> b) -> Stream l a -> Stream l b #

(<$) :: a -> Stream l b -> Stream l a #

Applicative (Stream RunTime) Source # 
Applicative (Stream Unknown) Source # 
Foldable (Stream l) Source # 

Methods

fold :: Monoid m => Stream l m -> m #

foldMap :: Monoid m => (a -> m) -> Stream l a -> m #

foldr :: (a -> b -> b) -> b -> Stream l a -> b #

foldr' :: (a -> b -> b) -> b -> Stream l a -> b #

foldl :: (b -> a -> b) -> b -> Stream l a -> b #

foldl' :: (b -> a -> b) -> b -> Stream l a -> b #

foldr1 :: (a -> a -> a) -> Stream l a -> a #

foldl1 :: (a -> a -> a) -> Stream l a -> a #

toList :: Stream l a -> [a] #

null :: Stream l a -> Bool #

length :: Stream l a -> Int #

elem :: Eq a => a -> Stream l a -> Bool #

maximum :: Ord a => Stream l a -> a #

minimum :: Ord a => Stream l a -> a #

sum :: Num a => Stream l a -> a #

product :: Num a => Stream l a -> a #

Alternative (Stream RunTime) Source # 
Alternative (Stream Unknown) Source # 
(Alternative (Stream l), Monad (Stream l)) => MonadPlus (Stream l) Source # 

Methods

mzero :: Stream l a #

mplus :: Stream l a -> Stream l a -> Stream l a #

Enum a => EnumFromTo (RunTimeStream a) Source # 
Enum a => EnumFromTo (UnknownStream a) Source # 
Enum a => EnumFrom (UnknownStream a) Source # 
Enum a => EnumFrom (InfiniteStream a) Source # 
Semigroup (Stream RunTime a) Source # 
Semigroup (Stream Unknown a) Source # 
Semigroup (Stream Infinite a) Source # 
Monoid (Stream RunTime a) Source # 
Monoid (Stream Unknown a) Source # 
type Element (Stream l a) Source # 
type Element (Stream l a) = a

type CompileTime n = Known (CompileTimeLength (NatLength n)) Source #

type RunTime = Known RunTimeLength Source #

data Length Source #

Constructors

Unknown 
Infinite 

class ToStream a where Source #

Add instances to the toStream class to allow for easy conversion to streams. Technically you could just use runTimeFoldableToStream and ,unknownFoldableToStream to wrap data in streams, but with this approach you can specialise for particular datatypes if appropriate.

Minimal complete definition

toStream

Methods

toStream :: a -> Stream (LengthT a) (Element a) Source #

Instances

ToStream ByteString Source # 
ToStream ByteString Source # 
ToStream [a] Source # 

Methods

toStream :: [a] -> Stream (LengthT [a]) (Element [a]) Source #

ToStream (Vector a) Source # 

Methods

toStream :: Vector a -> Stream (LengthT (Vector a)) (Element (Vector a)) Source #

Unbox a => ToStream (Vector a) Source # 

Methods

toStream :: Vector a -> Stream (LengthT (Vector a)) (Element (Vector a)) Source #

ToStream (Array i e) Source # 

Methods

toStream :: Array i e -> Stream (LengthT (Array i e)) (Element (Array i e)) Source #

type family Element mono :: * #

Type family for getting the type of the elements of a monomorphic container.

Instances

type Element ByteString 
type Element ByteString 
type Element IntSet 
type Element Text 
type Element Text 
type Element [a] 
type Element [a] = a
type Element (Maybe a) 
type Element (Maybe a) = a
type Element (IO a) 
type Element (IO a) = a
type Element (Identity a) 
type Element (Identity a) = a
type Element (Option a) 
type Element (Option a) = a
type Element (NonEmpty a) 
type Element (NonEmpty a) = a
type Element (ZipList a) 
type Element (ZipList a) = a
type Element (IntMap a) 
type Element (IntMap a) = a
type Element (Tree a) 
type Element (Tree a) = a
type Element (Seq a) 
type Element (Seq a) = a
type Element (ViewL a) 
type Element (ViewL a) = a
type Element (ViewR a) 
type Element (ViewR a) = a
type Element (Set e) 
type Element (Set e) = e
type Element (HashSet e) 
type Element (HashSet e) = e
type Element (Vector a) 
type Element (Vector a) = a
type Element (Vector a) 
type Element (Vector a) = a
type Element (Vector a) 
type Element (Vector a) = a
type Element (r -> a) 
type Element (r -> a) = a
type Element (Either a b) 
type Element (Either a b) = b
type Element (a, b) 
type Element (a, b) = b
type Element (Array i e) # 
type Element (Array i e) = e
type Element (Arg a b) 
type Element (Arg a b) = b
type Element (WrappedMonad m a) 
type Element (WrappedMonad m a) = a
type Element (Map k v) 
type Element (Map k v) = v
type Element (MaybeT m a) 
type Element (MaybeT m a) = a
type Element (ListT m a) 
type Element (ListT m a) = a
type Element (HashMap k v) 
type Element (HashMap k v) = v
type Element (WrappedArrow a b c) 
type Element (WrappedArrow a b c) = c
type Element (Const * m a) 
type Element (Const * m a) = a
type Element (StateT s m a) 
type Element (StateT s m a) = a
type Element (StateT s m a) 
type Element (StateT s m a) = a
type Element (WriterT w m a) 
type Element (WriterT w m a) = a
type Element (WriterT w m a) 
type Element (WriterT w m a) = a
type Element (IdentityT * m a) 
type Element (IdentityT * m a) = a
type Element (Product * f g a) 
type Element (Product * f g a) = a
type Element (ContT * r m a) 
type Element (ContT * r m a) = a
type Element (ReaderT * r m a) 
type Element (ReaderT * r m a) = a
type Element (Compose * * f g a) 
type Element (Compose * * f g a) = a
type Element (RWST r w s m a) 
type Element (RWST r w s m a) = a
type Element (RWST r w s m a) 
type Element (RWST r w s m a) = a

Both runTimeFoldableToStream and unknownFoldableToStream wraps a Foldable data into a stream. Which one you use is a matter of choice, but generally you should use runTimeFoldableToStream for structures like Vector which have a fixed and constant time list operation, and unknownFoldableToStream for structures like list, particularly when you don't yet know their length.

By default runTimeFoldableToStream just calls length to work out it's length, but if say, you've got a list but you already know it's length (and that it's finite), then runTimeFoldableToStreamWithLength might be the more appropriate choice.

empty :: EmptyStream a Source #

type family AppendLength (a :: Length) (b :: Length) where ... Source #

Equations

AppendLength _ Infinite = Infinite 
AppendLength Infinite _ = Infinite 
AppendLength Empty y = y 
AppendLength x Empty = x 
AppendLength (CompileTime n1) (CompileTime n2) = CompileTime (n1 + n2) 
AppendLength (Known l1) (Known l2) = RunTime 
AppendLength _ _ = Unknown 

append :: forall l1 l2 a. Stream l1 a -> Stream l2 a -> Stream (AppendLength l1 l2) a Source #

Whilst appending two streams of the same type always results in the same type, appending two streams of different types can always be done, with the result type selected as appropriately as possible.

Both zip and zipWith aren't optimised currently, they just convert both sides to lists and zip them sadly.

zip :: Stream l1 a -> Stream l2 b -> Stream (ZipLength l1 l2) (a, b) Source #

zipWith :: forall l1 l2 a b c. (a -> b -> c) -> Stream l1 a -> Stream l2 b -> Stream (ZipLength l1 l2) c Source #

type family ZipLength (a :: Length) (b :: Length) where ... Source #

Equations

ZipLength Empty _ = Empty 
ZipLength _ Empty = Empty 
ZipLength x Infinite = x 
ZipLength Infinite y = y 
ZipLength (CompileTime n1) (CompileTime n2) = CompileTime (Min n1 n2) 
ZipLength (Known l1) (Known l2) = RunTime 
ZipLength _ _ = Unknown 

filter :: forall l a. (a -> Bool) -> Stream l a -> UnknownStream a Source #

concat :: CanNormalConcat l ~ True => Stream l (Stream l a) -> Stream l a Source #

concat like a restricted version of mixedConcat where the input and output types are the same.

Note concat does not work on streams with compile time length, as with these streams the length is included in the type so obviously concatenating them changes the type.

concatMap :: CanNormalConcat l ~ True => (a -> Stream l b) -> Stream l a -> Stream l b Source #

iterate :: (a -> a) -> a -> Stream Infinite a Source #

null :: Foldable t => forall a. t a -> Bool #

Test whether the structure is empty. The default implementation is optimized for structures that are similar to cons-lists, because there is no general way to do better.

unfoldr :: (b -> Maybe (a, b)) -> b -> UnknownStream a Source #

safeLength :: forall l a. Stream l a -> SafeLength Source #

lengthRunTime :: forall l a. Stream (Known l) a -> Int Source #

safeHead :: SafeHead l ~ True => Stream l a -> a Source #

safeHead will only work on types which are guarenteed to have a head, like infinite streams and compile time streams of length at least 1.

unsafeHead :: Foldable t => t a -> a Source #

Just like Prelude's head, errors out if there's a problem.

maybeHead :: Stream l a -> Maybe a Source #

Returns Just a if list has a head, Nothing otherwise.

mixedConcat :: forall l1 l2 a. IsLengthType l2 => Stream l1 (Stream l2 a) -> Stream (ConcatLength l1 l2) a Source #

mixedConcat is like the usual "concat", i.e. [[a]] -> [a] except it works with nested streams of different types, e.g. RunTimeStream (UnknownStream a)

memotise :: Stream l a -> Stream l a Source #

As discussed in the intro to this module, by default streams when evaluated don't store their data. memotise is effectively an "id" style function, but it takes the stream and stores it in either a Vector or list. For RunTimeStreams, we use a Vector, as we know the length, but for UnknownStreams and InfiniteStreams we use a list.

strictMemotise :: Unbox a => Stream l a -> Stream l a Source #

strictMemotise can be used for streams of Unboxed types. It then stores the data in an unboxed vector. Note that this only works for streams of RunTime or CompileTime length, obviously we can't put an infinite length vector in a vector, and we're not sure if unknown length vectors are finite.

So in the case of infinite or unknown vectors, we just fall back to the normal memotise behaviour.

wrapUnknown :: Stream l a -> UnknownStream a Source #

Changes the type of any streams length to UnknownStream.

Note that whilst now you can not distinguish this stream's length using the type system, it still retains all it's previous behaviour. So if you wrapUnknown a run time length stream, it's length function will still work in constant time.

wrapRunTime :: Stream (Known l) a -> RunTimeStream a Source #

Like wrapUnknown but instead to RunTimeStream.

Of course, only runtime, compile time or empty streams can be converted to runtime streams, because runtime streams must know their length.