Safe Haskell | None |
---|---|
Language | Haskell2010 |
The motivation of this library is at least partly demostrated by the following problem with lists:
Consider the following code (which is taken from Tests.hs from this package btw):
f :: Int -> Int f x = x*(x .&. 3) g :: Int -> Int g x = x*(x .&. 7)
f
and g
are just silly example functions, which are effectively:
f x = x * (x mod 8) g x = x * (x mod 16)
Now lets say we want to take some "list", apply f to it, apply g to it, append both these together, and fold them. A straightforward way would be this:
sumG :: (Functor t, Foldable t, Semigroup (t Int)) => t Int -> Int sumG x = foldl' (+) 0 ((fmap f x) <> (fmap g x))
For comparison sake, lets write a hand written version of this function:
fast :: Int -> Int fast n = go g (go f 0 1 n) 1 n where go :: (Int -> Int) -> Int -> Int -> Int -> Int go f = go' where go' :: Int -> Int -> Int -> Int go' acc s i = if i == 0 then acc else let next_acc = acc + f s in next_acc `seq` go' next_acc (s + 1) (i - 1)
What you will probably find is, at least with GHC 8.0.2 which I've tested it with:
sumG [1..n]
is about ten times slower than
fast n
Even though they should be doing the same thing.
But, using this stream library, and EnumFromTo
from another package, you can write:
sumG (enumFromTo 1 n)
And this runs almost as fast as the handwritten code.
Now you may be able to get this speed out of ordinary lists with some fancy rewrite rules
(and indeed this Stream library does have a few fancy rewrite rules itself) there more
theortical advantages that Stream
can have over lists.
Unlike ordinary lists, streams do not store the data directly. They just store a way to generate the data.
What does this mean?
At the moment, the main way to process a stream is to fold over it. You can't really deconstruct it step by step. But generally folds give you enough power to process a list.
Also, if you fold over a stream twice, you'll have to recalculate it. This is a good and bad thing, It can be bad because you have to recalculate, but it's good because you won't use up memory. For many lists used in practice, they're simple enough to regenerate instead of storing, and it prevents huge heap usage from code like this:
average x = (foldl' (+) 0 x) / (length x)
There's other advantages to this approach. Firstly, appending streams is always a constant time operation. Always. Even if the first stream is infinite. All appending streams does is generate a new "stream" which has the two appended streams as data items.
Actually, our stream data type is more sophisticated than this. A Stream
is a type of two variables, the second
is the element type as usual, but the first is the "Length". Streams can be the following lengths:
- Infinite
- Unknown
- RunTime
- CompileTime
- Empty
Infinite streams are well, infinte, not much to say here.
Unknown streams are streams we don't know the length of. They could be infinite or finite. Ordinary lists are like this.
RunTime streams have a defined finite length, which takes constant time to access.
CompileTime streams have their length as a compile time factor.
Empty streams are well, empty.
Having these different types can be useful. We might want a safe "toVector" function that takes only RunTime streams, and immediately allocates the vector to that size before filling it.
But Stream
is indeed a GADT.
Currently there are 34 different types of streams. These range from simple streams just with a state and a "next_state" function, to streams representing appended streams, concatenated streams, etc.
There's even streams that are a wrapper for Foldable
types, so instead of converting everything to a list,
you can just wrap your data in a stream and combine data of all different types seemlessly.
I believe there's lots of opportunity to optimise this library. Potentially (if I got to understand the GHC API better) streams could carry around code blocks, which could compile just in time (JIT) when required. This could allow for fast code to be generated in situations where there are complex transformations, perhaps based on runtime branching, which the inliner can miss.
However, currently optimisation is limited. Indeed, the only optimisation I've to optimise the example given in this documentation. But it does show the potential, and it is an extensible framework.
- data Stream x a
- type CompileTimeStream n a = Stream (CompileTime n) a
- type RunTimeStream a = Stream RunTime a
- type UnknownStream a = Stream Unknown a
- type InfiniteStream a = Stream Infinite a
- type CompileTime n = Known (CompileTimeLength (NatLength n))
- type RunTime = Known RunTimeLength
- data Length
- class ToStream a where
- type family Element mono :: *
- runTimeFoldableToStream :: Foldable t => t a -> RunTimeStream a
- runTimeFoldableToStreamWithLength :: Foldable t => Int -> t a -> RunTimeStream a
- unknownFoldableToStream :: Foldable t => t a -> UnknownStream a
- empty :: EmptyStream a
- singleton :: a -> CompileTimeStream 1 a
- type family AppendLength (a :: Length) (b :: Length) where ...
- append :: forall l1 l2 a. Stream l1 a -> Stream l2 a -> Stream (AppendLength l1 l2) a
- zip :: Stream l1 a -> Stream l2 b -> Stream (ZipLength l1 l2) (a, b)
- zipWith :: forall l1 l2 a b c. (a -> b -> c) -> Stream l1 a -> Stream l2 b -> Stream (ZipLength l1 l2) c
- type family ZipLength (a :: Length) (b :: Length) where ...
- filter :: forall l a. (a -> Bool) -> Stream l a -> UnknownStream a
- concat :: CanNormalConcat l ~ True => Stream l (Stream l a) -> Stream l a
- concatMap :: CanNormalConcat l ~ True => (a -> Stream l b) -> Stream l a -> Stream l b
- replicate :: Integral b => b -> a -> RunTimeStream a
- iterate :: (a -> a) -> a -> Stream Infinite a
- repeat :: a -> Stream Infinite a
- cycle :: Stream l a -> Stream Infinite a
- null :: Foldable t => forall a. t a -> Bool
- unfoldr :: (b -> Maybe (a, b)) -> b -> UnknownStream a
- safeLength :: forall l a. Stream l a -> SafeLength
- data SafeLength
- lengthRunTime :: forall l a. Stream (Known l) a -> Int
- safeHead :: SafeHead l ~ True => Stream l a -> a
- unsafeHead :: Foldable t => t a -> a
- maybeHead :: Stream l a -> Maybe a
- mixedConcat :: forall l1 l2 a. IsLengthType l2 => Stream l1 (Stream l2 a) -> Stream (ConcatLength l1 l2) a
- type family ConcatLength (l1 :: Length) (l2 :: Length) where ...
- memotise :: Stream l a -> Stream l a
- strictMemotise :: Unbox a => Stream l a -> Stream l a
- wrapUnknown :: Stream l a -> UnknownStream a
- wrapRunTime :: Stream (Known l) a -> RunTimeStream a
Documentation
Monad (Stream RunTime) Source # | |
Monad (Stream Unknown) Source # | |
Functor (Stream l) Source # | |
Applicative (Stream RunTime) Source # | |
Applicative (Stream Unknown) Source # | |
Foldable (Stream l) Source # | |
Alternative (Stream RunTime) Source # | |
Alternative (Stream Unknown) Source # | |
(Alternative (Stream l), Monad (Stream l)) => MonadPlus (Stream l) Source # | |
Enum a => EnumFromTo (RunTimeStream a) Source # | |
Enum a => EnumFromTo (UnknownStream a) Source # | |
Enum a => EnumFrom (UnknownStream a) Source # | |
Enum a => EnumFrom (InfiniteStream a) Source # | |
Semigroup (Stream RunTime a) Source # | |
Semigroup (Stream Unknown a) Source # | |
Semigroup (Stream Infinite a) Source # | |
Monoid (Stream RunTime a) Source # | |
Monoid (Stream Unknown a) Source # | |
type Element (Stream l a) Source # | |
type CompileTimeStream n a = Stream (CompileTime n) a Source #
type RunTimeStream a = Stream RunTime a Source #
type UnknownStream a = Stream Unknown a Source #
type InfiniteStream a = Stream Infinite a Source #
type CompileTime n = Known (CompileTimeLength (NatLength n)) Source #
class ToStream a where Source #
Add instances to the toStream
class to allow for easy conversion to streams.
Technically you could just use runTimeFoldableToStream
and ,unknownFoldableToStream
to wrap data in
streams, but with this approach you can specialise for particular datatypes if appropriate.
type family Element mono :: * #
Type family for getting the type of the elements of a monomorphic container.
Both runTimeFoldableToStream
and unknownFoldableToStream
wraps a Foldable data into a stream.
Which one you use is a matter of choice, but generally you should use runTimeFoldableToStream
for structures like Vector which have a fixed and constant time list operation, and
unknownFoldableToStream
for structures like list, particularly when you don't yet know their length.
By default runTimeFoldableToStream
just calls length
to work out it's length, but if say, you've got
a list but you already know it's length (and that it's finite), then runTimeFoldableToStreamWithLength
might be the more appropriate choice.
runTimeFoldableToStream :: Foldable t => t a -> RunTimeStream a Source #
runTimeFoldableToStreamWithLength :: Foldable t => Int -> t a -> RunTimeStream a Source #
unknownFoldableToStream :: Foldable t => t a -> UnknownStream a Source #
singleton :: a -> CompileTimeStream 1 a Source #
type family AppendLength (a :: Length) (b :: Length) where ... Source #
AppendLength _ Infinite = Infinite | |
AppendLength Infinite _ = Infinite | |
AppendLength Empty y = y | |
AppendLength x Empty = x | |
AppendLength (CompileTime n1) (CompileTime n2) = CompileTime (n1 + n2) | |
AppendLength (Known l1) (Known l2) = RunTime | |
AppendLength _ _ = Unknown |
append :: forall l1 l2 a. Stream l1 a -> Stream l2 a -> Stream (AppendLength l1 l2) a Source #
Whilst appending two streams of the same type always results in the same type, appending two streams of different types can always be done, with the result type selected as appropriately as possible.
Both zip
and zipWith
aren't optimised currently, they just convert both sides to lists and zip them sadly.
zipWith :: forall l1 l2 a b c. (a -> b -> c) -> Stream l1 a -> Stream l2 b -> Stream (ZipLength l1 l2) c Source #
type family ZipLength (a :: Length) (b :: Length) where ... Source #
ZipLength Empty _ = Empty | |
ZipLength _ Empty = Empty | |
ZipLength x Infinite = x | |
ZipLength Infinite y = y | |
ZipLength (CompileTime n1) (CompileTime n2) = CompileTime (Min n1 n2) | |
ZipLength (Known l1) (Known l2) = RunTime | |
ZipLength _ _ = Unknown |
concat :: CanNormalConcat l ~ True => Stream l (Stream l a) -> Stream l a Source #
concat
like a restricted version of mixedConcat
where the input and output types are the same.
Note concat
does not work on streams with compile time length, as with these streams the length is
included in the type so obviously concatenating them changes the type.
replicate :: Integral b => b -> a -> RunTimeStream a Source #
null :: Foldable t => forall a. t a -> Bool #
Test whether the structure is empty. The default implementation is optimized for structures that are similar to cons-lists, because there is no general way to do better.
unfoldr :: (b -> Maybe (a, b)) -> b -> UnknownStream a Source #
safeLength :: forall l a. Stream l a -> SafeLength Source #
lengthRunTime :: forall l a. Stream (Known l) a -> Int Source #
safeHead :: SafeHead l ~ True => Stream l a -> a Source #
safeHead
will only work on types which are guarenteed to have a head, like infinite streams
and compile time streams of length at least 1.
unsafeHead :: Foldable t => t a -> a Source #
Just like Prelude's head
, errors out if there's a problem.
mixedConcat :: forall l1 l2 a. IsLengthType l2 => Stream l1 (Stream l2 a) -> Stream (ConcatLength l1 l2) a Source #
mixedConcat
is like the usual "concat", i.e. [[a]] -> [a]
except it works with nested
streams of different types, e.g. RunTimeStream (UnknownStream a)
type family ConcatLength (l1 :: Length) (l2 :: Length) where ... Source #
ConcatLength Empty _ = Empty | |
ConcatLength _ Empty = Empty | |
ConcatLength Infinite Infinite = Infinite | |
ConcatLength Infinite (CompileTime n1) = Infinite | |
ConcatLength (CompileTime n1) Infinite = Infinite | |
ConcatLength Infinite _ = Unknown | |
ConcatLength _ Infinite = Unknown | |
ConcatLength Unknown _ = Unknown | |
ConcatLength _ Unknown = Unknown | |
ConcatLength RunTime _ = RunTime | |
ConcatLength _ RunTime = RunTime | |
ConcatLength (CompileTime n1) (CompileTime n2) = CompileTime (n1 * n2) |
memotise :: Stream l a -> Stream l a Source #
As discussed in the intro to this module, by default streams when evaluated don't store their data.
memotise
is effectively an "id" style function, but it takes the stream and stores it in either a
Vector or list. For RunTimeStreams
, we use a Vector, as we know the length, but for UnknownStreams
and
InfiniteStreams
we use a list.
strictMemotise :: Unbox a => Stream l a -> Stream l a Source #
strictMemotise
can be used for streams of Unboxed types. It then stores the data in
an unboxed vector. Note that this only works for streams of RunTime or CompileTime length,
obviously we can't put an infinite length vector in a vector, and we're not sure if unknown length
vectors are finite.
So in the case of infinite or unknown vectors, we just fall back to the normal memotise
behaviour.
wrapUnknown :: Stream l a -> UnknownStream a Source #
Changes the type of any streams length to UnknownStream
.
Note that whilst now you can not distinguish this stream's length using the type system, it still
retains all it's previous behaviour. So if you wrapUnknown
a run time length stream, it's length function
will still work in constant time.
wrapRunTime :: Stream (Known l) a -> RunTimeStream a Source #
Like wrapUnknown
but instead to RunTimeStream
.
Of course, only runtime, compile time or empty streams can be converted to runtime streams, because runtime streams must know their length.