| Safe Haskell | None |
|---|---|
Data.Conduit
Description
Let's start off with a few simple examples of conduit usage. First, a file
copy utility:
>>> :load Data.Conduit.Binary
>>> runResourceT $ sourceFile "input.txt" $$ sinkFile "output.txt"
runResourceT is a function provided by the resourcet package, and ensures
that resources are properly cleaned up, even in the presence of exceptions. The
type system will enforce that runResourceT is called as needed. The remainder
of this tutorial will not discuss runResourceT; please see the documentation
in resourcet for more information.
Looking at the rest of our example, there are three components to understand:
sourceFile, sinkFile, and the $$ operator (called "connect"). These
represent the most basic building blocks in conduit: a Source produces a
stream of values, a Sink consumes such a stream, and $$ will combine these
together.
In the case of file copying, there was no value produced by the Sink.
However, a Sink will often produce some result value. For example:
>>> :load Data.Conduit.List
>>> :module +Prelude
>>> sourceList [1..10] $$ fold (+) 0
55
sourceList is a convenience function for turning a list into a Source.
fold implements a strict left fold for consuming the input stream.
The next major aspect to the conduit library is the Conduit type.
This type represents a stream transformer. In order to use a Conduit, we
must fuse it with either a Source or Sink. For example:
>>> :load Data.Conduit.List
>>> :module +Prelude
>>> sourceList [1..10] $= Data.Conduit.List.map (+1) $$ consume
[2,3,4,5,6,7,8,9,10,11]
Notice the addition of the $=, or left fuse operator. This combines a
Source and a Conduit into a new Source, which can then be connected to a
Sink (in this case, consume). We can similarly perform right fusion to
combine a Conduit and Sink, or middle fusion to combine two Conduits.
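The three fusion forms can be compared side by side. The following is a minimal sketch, assuming a conduit version that exports the operators listed on this page; the names numbers, increment, and double are invented for the illustration.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

-- A source and two transformers to combine in different ways.
-- (Illustrative names, not part of the library.)
numbers :: Monad m => Source m Int
numbers = CL.sourceList [1 .. 10]

increment :: Monad m => Conduit Int m Int
increment = CL.map (+ 1)

double :: Monad m => Conduit Int m Int
double = CL.map (* 2)

main :: IO ()
main = do
    -- Left fuse: Source + Conduit gives a new Source.
    viaLeft <- (numbers $= increment) $$ CL.consume
    -- Right fuse: Conduit + Sink gives a new Sink.
    viaRight <- numbers $$ (increment =$ CL.consume)
    -- Middle fuse: Conduit + Conduit gives a new Conduit.
    viaMiddle <- numbers $$ (increment =$= double) =$ CL.consume
    print viaLeft   -- [2,3,4,5,6,7,8,9,10,11]
    print viaRight  -- [2,3,4,5,6,7,8,9,10,11]
    print viaMiddle -- [4,6,8,10,12,14,16,18,20,22]
```

Left and right fusion differ only in which side the Conduit is attached to, so they give the same result here.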
A number of very common functions are provided in the Data.Conduit.List module. Many of these functions correspond very closely to standard Haskell functions.
In addition to connecting and fusing components together, we can also build up
more sophisticated components through monadic composition. For example, to
create a Sink that ignores the first 3 numbers and returns the sum of the
remaining numbers, we can use:
>>> :load Data.Conduit.List
>>> :module +Prelude
>>> sourceList [1..10] $$ Data.Conduit.List.drop 3 >> fold (+) 0
49
In some cases, we might end up consuming more input than we needed, and want to
provide that input to the next component in our monadic chain. We refer to this
as leftovers. The simplest example of this is peek.
>>> :load Data.Conduit.List
>>> :set -XNoMonomorphismRestriction
>>> :module +Prelude
>>> let sink = do { first <- peek; total <- fold (+) 0; return (first, total) }
>>> sourceList [1..10] $$ sink
(Just 1,55)
Notice that, although we "consumed" the first value from the stream via
peek, it was still available to fold. This idea becomes even more important
when dealing with chunked data such as ByteStrings or Text.
Final note: Notice in the types below that Source, Sink, and Conduit
are just type aliases. This will be explained later. Another important aspect
is resource finalization, which will also be covered below.
- type Source m o = ConduitM () o m ()
- type Conduit i m o = ConduitM i o m ()
- type Sink i m r = ConduitM i Void m r
- ($$) :: Monad m => Source m a -> Sink a m b -> m b
- ($=) :: Monad m => Source m a -> Conduit a m b -> Source m b
- (=$) :: Monad m => Conduit a m b -> Sink b m c -> Sink a m c
- (=$=) :: Monad m => Conduit a m b -> ConduitM b c m r -> ConduitM a c m r
- await :: Monad m => Consumer i m (Maybe i)
- awaitForever :: Monad m => (i -> ConduitM i o m r) -> ConduitM i o m ()
- yield :: Monad m => o -> ConduitM i o m ()
- yieldOr :: Monad m => o -> m () -> ConduitM i o m ()
- leftover :: i -> ConduitM i o m ()
- bracketP :: MonadResource m => IO a -> (a -> IO ()) -> (a -> ConduitM i o m r) -> ConduitM i o m r
- addCleanup :: Monad m => (Bool -> m ()) -> ConduitM i o m r -> ConduitM i o m r
- data ResumableSource m o
- ($$+) :: Monad m => Source m a -> Sink a m b -> m (ResumableSource m a, b)
- ($$++) :: Monad m => ResumableSource m a -> Sink a m b -> m (ResumableSource m a, b)
- ($$+-) :: Monad m => ResumableSource m a -> Sink a m b -> m b
- unwrapResumable :: MonadIO m => ResumableSource m o -> m (Source m o, m ())
- transPipe :: Monad m => (forall a. m a -> n a) -> ConduitM i o m r -> ConduitM i o n r
- mapOutput :: Monad m => (o1 -> o2) -> ConduitM i o1 m r -> ConduitM i o2 m r
- mapOutputMaybe :: Monad m => (o1 -> Maybe o2) -> ConduitM i o1 m r -> ConduitM i o2 m r
- mapInput :: Monad m => (i1 -> i2) -> (i2 -> Maybe i1) -> ConduitM i2 o m r -> ConduitM i1 o m r
- type Producer m o = forall i. ConduitM i o m ()
- type Consumer i m r = forall o. ConduitM i o m r
- toProducer :: Monad m => Source m a -> Producer m a
- toConsumer :: Monad m => Sink a m b -> Consumer a m b
- data Flush a
- data ResourceT m a
- class (MonadThrow m, MonadUnsafeIO m, MonadIO m, Applicative m) => MonadResource m
- class Monad m => MonadThrow m where
- monadThrow :: Exception e => e -> m a
- class Monad m => MonadUnsafeIO m where
- unsafeLiftIO :: IO a -> m a
- runResourceT :: MonadBaseControl IO m => ResourceT m a -> m a
- newtype ExceptionT m a = ExceptionT { runExceptionT :: m (Either SomeException a) }
- runExceptionT_ :: Monad m => ExceptionT m a -> m a
- runException :: ExceptionT Identity a -> Either SomeException a
- runException_ :: ExceptionT Identity a -> a
- class MonadBase b m => MonadBaseControl b m | m -> b
Conduit interface
type Source m o = ConduitM () o m ()
Provides a stream of output values, without consuming any input or producing a final result.
Since 0.5.0
type Conduit i m o = ConduitM i o m ()
Consumes a stream of input values and produces a stream of output values, without producing a final result.
Since 0.5.0
type Sink i m r = ConduitM i Void m r
Consumes a stream of input values and produces a final result, without producing any output.
Since 0.5.0
Connect/fuse
It is important to understand the lifecycle of our components. Notice that we can connect or fuse two components together. When we do this, the component providing output is called upstream, and the component consuming this input is called downstream. We can have arbitrarily long chains of such fusion, so a single component can simultaneously function as upstream and downstream.
Each component can be in one of four states of operation at any given time:
- It hasn't yet started operating.
- It is providing output downstream.
- It is waiting for input from upstream.
- It has completed processing.
Let's use sourceFile and sinkFile as an example. When we run sourceFile
input $$ sinkFile output, both components begin in the "not started"
state. Next, we start running sinkFile (note: we always begin processing on
the downstream component). sinkFile will open up the file, and then wait for
input from upstream.
Next, we'll start running sourceFile, which will open the file, read some
data from it, and provide it as output downstream. This will be fed to
sinkFile (which was already waiting). sinkFile will write the data to a
file, then ask for more input. This process will continue until sourceFile
reaches the end of the input. It will close the file handle and switch to the
completed state. When this happens, sinkFile is sent a signal that no more
input is available. It will then close its file and return a result.
Now let's change things up a bit. Suppose we were instead connecting
sourceFile to take 1. We start by running take 1, which will wait for
some input. We'll then start sourceFile, which will open the file, read a
chunk, and send it downstream. take 1 will take that single chunk and return
it as a result. Once it does this, it has transitioned to the complete state.
We don't want to pull any more data from sourceFile, as we do not need it.
So instead, we call sourceFile's finalizer. Each time upstream provides
output, it also provides a finalizer to be run if downstream finishes
processing.
One final case: suppose we connect sourceFile to return (). The latter does
nothing: it immediately switches to the complete state. In this case, we never
even start running sourceFile (it stays in the "not yet started" state),
and so no finalization occurs.
So here are the takeaways from the above discussion:
- When upstream completes before downstream, it cleans up all of its resources and sends some termination signal. We never think about upstream again. This can only occur while downstream is in the "waiting for input" state, since that is the only time that upstream is called.
- When downstream completes before upstream, we finalize upstream immediately. This can only occur when upstream produces output, because that's the only time when control is passed back to downstream.
- If downstream never awaits input before it terminates, upstream is never started, and therefore does not need to be finalized.
Note that all of the discussion above applies equally well to chains of components. If you have an upstream, middle, and downstream component, and downstream terminates, then the middle component will be finalized, which in turn will trigger upstream to be finalized. This setup ensures that we always have prompt resource finalization.
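The second and third takeaways can be observed with a source that counts how many values it has produced. This is only a sketch: countingSource and producedFor are hypothetical helpers written for this illustration, and the counter lives in an IORef purely so it can be inspected afterwards.

```haskell
import Control.Monad.IO.Class (liftIO)
import Data.Conduit
import qualified Data.Conduit.List as CL
import Data.IORef

-- An endless source that records in the given counter how many values it
-- has produced so far. (Hypothetical helper for this illustration.)
countingSource :: IORef Int -> Source IO Int
countingSource ref = go 1
  where
    go i = do
        liftIO (modifyIORef ref (+ 1))
        yield i
        go (i + 1)

-- Connect a fresh counting source to the given sink and report the sink's
-- result together with the number of values the source produced.
producedFor :: Sink Int IO r -> IO (r, Int)
producedFor sink = do
    ref <- newIORef 0
    r <- countingSource ref $$ sink
    n <- readIORef ref
    return (r, n)

main :: IO ()
main = do
    -- Downstream completes first: after three values it stops awaiting, so
    -- upstream is never resumed to produce a fourth.
    producedFor (CL.take 3) >>= print   -- ([1,2,3],3)
    -- Downstream never awaits: upstream is never started at all.
    producedFor (return ()) >>= print   -- ((),0)
```

Although countingSource loops forever when read on its own, both pipelines terminate because control only passes upstream when downstream asks for input.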
($$) :: Monad m => Source m a -> Sink a m b -> m b
The connect operator, which pulls data from a source and pushes to a sink.
When either side closes, the other side will immediately be closed as well.
If you would like to keep the Source open to be used for other
operations, use the connect-and-resume operator $$+.
Since 0.4.0
($=) :: Monad m => Source m a -> Conduit a m b -> Source m b
Left fuse, combining a source and a conduit together into a new source.
Both the Source and Conduit will be closed when the newly-created
Source is closed.
Leftover data from the Conduit will be discarded.
Since 0.4.0
(=$) :: Monad m => Conduit a m b -> Sink b m c -> Sink a m c
Right fuse, combining a conduit and a sink together into a new sink.
Both the Conduit and Sink will be closed when the newly-created Sink
is closed.
Leftover data returned from the Sink will be discarded.
Since 0.4.0
(=$=) :: Monad m => Conduit a m b -> ConduitM b c m r -> ConduitM a c m r
Fusion operator, combining two Conduits together into a new Conduit.
Both Conduits will be closed when the newly-created Conduit is closed.
Leftover data returned from the right Conduit will be discarded.
Since 0.4.0
Primitives
While conduit provides a number of built-in Sources, Sinks, and
Conduits, you will almost certainly want to construct some of your own.
Previous versions recommended using the constructors directly. Beginning with
0.5, the recommended approach is to compose existing Pipes into larger ones.
It is certainly possible (and advisable!) to leverage existing Pipes, like
those in Data.Conduit.List. However, you will often need to go to a
lower-level set of Pipes to start your composition. The following few functions
should be sufficient for expressing all constructs besides finalization. Adding
in bracketP and addCleanup, you should be able to create any Pipe you
need. (In fact, that's precisely how the remainder of this package is written.)
The three basic operations are awaiting, yielding, and leftovers.
Awaiting asks for a new value from upstream, or returns Nothing if upstream
is done. For example:
>>> :load Data.Conduit.List
>>> sourceList [1..10] $$ await
Just 1
>>> :load Data.Conduit.List
>>> sourceList [] $$ await
Nothing
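Calling await in a loop is enough to write a complete Sink. As a minimal sketch (sumSink is a name made up for this example; Data.Conduit.List.fold already covers this case in practice):

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Sum the whole input stream by awaiting values until upstream is done.
-- (Illustrative helper, not part of the library.)
sumSink :: Monad m => Sink Int m Int
sumSink = go 0
  where
    go acc = do
        mx <- await
        case mx of
            Nothing -> return acc      -- upstream is exhausted
            Just x  -> go $! acc + x   -- force the accumulator at each step

main :: IO ()
main = do
    total <- CL.sourceList [1 .. 10] $$ sumSink
    print total     -- 55
    nothing <- CL.sourceList [] $$ sumSink
    print nothing   -- 0
```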
Similarly, we have a yield function, which provides a value to the downstream
Pipe. yield features auto-termination: if the downstream Pipe has
already completed processing, the upstream Pipe will stop processing when it
tries to yield.
The upshot of this is that you can write code that appears to loop infinitely, and yet will terminate.
>>> :set -XNoMonomorphismRestriction
>>> let infinite = yield () >> infinite
>>> infinite $$ await
Just ()
Or for something a bit more sophisticated:
>>> let enumFrom' i = yield i >> enumFrom' (succ i)
>>> enumFrom' 1 $$ take 5
[1,2,3,4,5]
The final primitive Pipe is leftover. This allows you to return unused
input to be used by the next Pipe in the monadic chain. A simple use case
would be implementing the peek function:
>>> let peek = await >>= maybe (return Nothing) (\x -> leftover x >> return (Just x))
>>> enumFrom' 1 $$ do { mx <- peek; my <- await; mz <- await; return (mx, my, mz) }
(Just 1,Just 1,Just 2)
Note that you should only return leftovers that were previously yielded from upstream.
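Beyond peek, leftover is useful whenever a component has to look at a value before deciding that it does not want it. The sketch below uses a hypothetical takeWhileSink, written for this illustration, which hands the first rejected value back so that the next Sink in the monadic chain still sees it.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Collect values while the predicate holds. The first value that fails the
-- predicate has already been awaited, so it is returned with leftover.
-- (Illustrative helper, not part of the library.)
takeWhileSink :: Monad m => (a -> Bool) -> Sink a m [a]
takeWhileSink p = go id
  where
    go front = do
        mx <- await
        case mx of
            Nothing -> return (front [])
            Just x
                | p x -> go (front . (x :))
                | otherwise -> leftover x >> return (front [])

-- Split a list into the leading values below 4 and everything after them.
splitAtFour :: IO ([Int], [Int])
splitAtFour = CL.sourceList [1 .. 10] $$ do
    small <- takeWhileSink (< 4)
    rest <- CL.consume
    return (small, rest)

main :: IO ()
main = splitAtFour >>= print   -- ([1,2,3],[4,5,6,7,8,9,10])
```

Without the call to leftover, the value 4 would be lost between the two Sinks.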
awaitForever :: Monad m => (i -> ConduitM i o m r) -> ConduitM i o m ()
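awaitForever repeatedly awaits input and runs the supplied function on each value until upstream is exhausted, which makes it a convenient way to write a Conduit. A small sketch, with duplicateEvens being a name invented for this example:

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Emit every even input twice and drop every odd input.
-- (Illustrative helper, not part of the library.)
duplicateEvens :: Monad m => Conduit Int m Int
duplicateEvens = awaitForever $ \x ->
    if even x
        then yield x >> yield x
        else return ()

main :: IO ()
main = do
    out <- CL.sourceList [1 .. 6] $$ duplicateEvens =$ CL.consume
    print out   -- [2,2,4,4,6,6]
```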
Finalization
bracketP :: MonadResource m => IO a -> (a -> IO ()) -> (a -> ConduitM i o m r) -> ConduitM i o m r
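bracketP takes an acquire action, a release action, and a function that builds the component from the acquired resource. The following sketch stands in a log entry for a real resource; loggedSource and runLogged are hypothetical names, and the example only checks that the release action has run by the time runResourceT returns, even though downstream stops early.

```haskell
import Control.Monad.Trans.Resource (MonadResource, runResourceT)
import Data.Conduit
import qualified Data.Conduit.List as CL
import Data.IORef

-- A source whose "resource" is only a log entry: acquisition and release are
-- recorded in the given IORef so that they can be inspected afterwards.
-- (Illustrative helper, not part of the library.)
loggedSource :: MonadResource m => IORef [String] -> Source m Int
loggedSource logRef =
    bracketP
        (modifyIORef logRef ("acquire" :))
        (\() -> modifyIORef logRef ("release" :))
        (\() -> CL.sourceList [1 .. 100])

-- Run the source against a sink that stops after two values, long before
-- the source is exhausted, and return the values plus the recorded events.
runLogged :: IO ([Int], [String])
runLogged = do
    logRef <- newIORef []
    xs <- runResourceT $ loggedSource logRef $$ CL.take 2
    events <- readIORef logRef
    return (xs, reverse events)

main :: IO ()
main = runLogged >>= print   -- ([1,2],["acquire","release"])
```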
Connect-and-resume
Sometimes, we do not want to force our entire application to live inside the
Pipe monad. It can be convenient to keep normal control flow of our program,
and incrementally apply data from a Source to various Sinks. A strong
motivating example for this use case is interleaving multiple Sources, such
as combining a conduit-powered HTTP server and client into an HTTP proxy.
Normally, when we run a Pipe, we get a result and can never run it again.
Connect-and-resume allows us to connect a Source to a Sink until the latter
completes, and then return the current state of the Source to be applied
later. To do so, we introduce three new operators. Let's start off by
demonstrating them:
>>> :load Data.Conduit.List
>>> (next, x) <- sourceList [1..10] $$+ take 5
>>> Prelude.print x
[1,2,3,4,5]
>>> (next, y) <- next $$++ (isolate 4 =$ fold (Prelude.+) 0)
>>> Prelude.print y
30
>>> next $$+- consume
[10]
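The same pattern works in a compiled program, where ordinary IO code runs between the individual connections. This is a sketch only; splitStream and the header/body/trailer naming are invented for the example.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Read one stream in three stages, returning to plain IO between stages.
-- (Illustrative helper, not part of the library.)
splitStream :: IO ([Int], [Int], [Int])
splitStream = do
    -- Start the source and take the first two values.
    (rest1, header) <- CL.sourceList [1 .. 10] $$+ CL.take 2
    -- Resume where the previous sink stopped.
    (rest2, body) <- rest1 $$++ CL.take 3
    -- Finish the stream; this also finalizes the source.
    trailer <- rest2 $$+- CL.consume
    return (header, body, trailer)

main :: IO ()
main = splitStream >>= print   -- ([1,2],[3,4,5],[6,7,8,9,10])
```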
data ResumableSource m o
A Source which has been started, but has not yet completed.
This type contains both the current state of the Source, and the finalizer
to be run to close it.
Since 0.5.0
($$+) :: Monad m => Source m a -> Sink a m b -> m (ResumableSource m a, b)
The connect-and-resume operator. This does not close the Source, but
instead returns it to be used again. This allows a Source to be used
incrementally in a large program, without forcing the entire program to live
in the Sink monad.
Mnemonic: connect + do more.
Since 0.5.0
($$++) :: Monad m => ResumableSource m a -> Sink a m b -> m (ResumableSource m a, b)
Continue processing after usage of $$+.
Since 0.5.0
($$+-) :: Monad m => ResumableSource m a -> Sink a m b -> m b
Complete processing of a ResumableSource. This will run the finalizer
associated with the ResumableSource. In order to guarantee prompt resource
finalization, you must use this operator after using $$+ and $$++.
Since 0.5.0
unwrapResumable :: MonadIO m => ResumableSource m o -> m (Source m o, m ())
Unwraps a ResumableSource into a Source and a finalizer.
A ResumableSource represents a Source which has already been run, and
therefore has a finalizer registered. As a result, if we want to turn it
into a regular Source, we need to ensure that the finalizer will be run
appropriately. By appropriately, I mean:
- If a new finalizer is registered, the old one should not be called.
- If the old one is called, it should not be called again.
This function returns both a Source and a finalizer which ensures that the
above two conditions hold. Once you call that finalizer, the Source is
invalidated and cannot be used.
Since 0.5.2
Utility functions
Generalized conduit types
It's recommended to keep your type signatures as general as possible to
encourage reuse. For example, a theoretical signature for the head function
would be:
head :: Sink a m (Maybe a)
However, doing so would prevent usage of head from inside a Conduit, since
a Sink sets its output type parameter to Void. The most general type
signature would instead be:
head :: Pipe l a o u m (Maybe a)
However, that signature is much more confusing. To bridge this gap, we also provide some generalized conduit types. They follow a simple naming convention:
- They have the same name as their non-generalized types, with a G prepended.
- If they have leftovers, we add an L.
- If they consume the entirety of their input stream and return the upstream result, we add Inf to indicate infinite consumption.
type Consumer i m r = forall o. ConduitM i o m r
Generalized Sink without leftovers.
Since 0.5.0
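To see what the more general type buys, the sketch below writes a head-like function against Consumer and then reuses it both as a Sink and from inside a Conduit. The names firstValue and pairs are hypothetical, and the RankNTypes pragma is included as a precaution since Consumer expands to a forall type.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Written against Consumer, so the output type is left open.
-- (Illustrative helper, not part of the library.)
firstValue :: Monad m => Consumer a m (Maybe a)
firstValue = await

-- Because of that, firstValue can be reused inside a Conduit, which has a
-- real output type, to group the input into pairs. A trailing unpaired
-- value is dropped.
pairs :: Monad m => Conduit a m (a, a)
pairs = do
    mx <- firstValue
    my <- firstValue
    case (mx, my) of
        (Just x, Just y) -> yield (x, y) >> pairs
        _ -> return ()

main :: IO ()
main = do
    -- Used directly as a Sink.
    h <- CL.sourceList "abc" $$ firstValue
    print h    -- Just 'a'
    -- Used from within a Conduit.
    ps <- CL.sourceList [1 .. 5 :: Int] $$ pairs =$ CL.consume
    print ps   -- [(1,2),(3,4)]
```

Had firstValue been given the type Sink a m (Maybe a), the definition of pairs would be rejected, because its output type is a pair rather than Void.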
toProducer :: Monad m => Source m a -> Producer m a
toConsumer :: Monad m => Sink a m b -> Consumer a m b
Flushing
Provide for a stream of data that can be flushed.
A number of Conduits (e.g., zlib compression) need the ability to flush
the stream at some point. This provides a single wrapper datatype to be used
in all such circumstances.
Since 0.3.0
Convenience re-exports
data ResourceT m a
The Resource transformer. This transformer keeps track of all registered
actions, and calls them upon exit (via runResourceT). Actions may be
registered via register, or resources may be allocated atomically via
allocate. allocate corresponds closely to bracket.
Releasing may be performed before exit via the release function. This is a
highly recommended optimization, as it will ensure that scarce resources are
freed early. Note that calling release will deregister the action, so that
a release action will only ever be called once.
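The interplay between allocate, release, and runResourceT can be sketched with log entries standing in for real resources. This assumes allocate and release are imported from Control.Monad.Trans.Resource in the resourcet package; releaseOrder is a name made up for the example.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (allocate, release, runResourceT)
import Data.IORef

-- Allocate two logged "resources", release the first one early, and return
-- the sequence of events in the order they happened.
-- (Illustrative helper, not part of the library.)
releaseOrder :: IO [String]
releaseOrder = do
    logRef <- newIORef []
    let record msg = modifyIORef logRef (msg :)
    runResourceT $ do
        (keyA, ()) <- allocate (record "acquire A") (\() -> record "release A")
        (_keyB, ()) <- allocate (record "acquire B") (\() -> record "release B")
        -- Free A early; this also deregisters its release action.
        release keyA
        liftIO (record "work done")
    -- B was never released explicitly, so runResourceT releases it on exit.
    fmap reverse (readIORef logRef)

main :: IO ()
main = releaseOrder >>= mapM_ putStrLn
```

Each release action appears exactly once in the log: A when release is called, B when runResourceT exits.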
Since 0.3.0
Instances
| MFunctor ResourceT | Since 0.4.7 |
| MMonad ResourceT | Since 0.4.7 |
| MonadTrans ResourceT | |
| MonadTransControl ResourceT | |
| MonadRWS r w s m => MonadRWS r w s (ResourceT m) | |
| MonadBase b m => MonadBase b (ResourceT m) | |
| MonadBaseControl b m => MonadBaseControl b (ResourceT m) | |
| MonadState s m => MonadState s (ResourceT m) | |
| MonadWriter w m => MonadWriter w (ResourceT m) | |
| MonadReader r m => MonadReader r (ResourceT m) | |
| MonadError e m => MonadError e (ResourceT m) | |
| Monad m => Monad (ResourceT m) | |
| Functor m => Functor (ResourceT m) | |
| Typeable1 m => Typeable1 (ResourceT m) | |
| Applicative m => Applicative (ResourceT m) | |
| (MonadIO m, MonadActive m) => MonadActive (ResourceT m) | |
| (MonadThrow m, MonadUnsafeIO m, MonadIO m, Applicative m) => MonadResource (ResourceT m) | |
| MonadThrow m => MonadThrow (ResourceT m) | |
| MonadIO m => MonadIO (ResourceT m) | |
| MonadCont m => MonadCont (ResourceT m) | |
class (MonadThrow m, MonadUnsafeIO m, MonadIO m, Applicative m) => MonadResource m
A Monad which allows for safe resource allocation. In theory, any monad
transformer stack that includes a ResourceT can be an instance of
MonadResource.
Note: runResourceT has a requirement for a MonadBaseControl IO m monad,
which allows control operations to be lifted. A MonadResource does not
have this requirement. This means that transformers such as ContT can be
an instance of MonadResource. However, the ContT wrapper will need to be
unwrapped before calling runResourceT.
Since 0.3.0
Instances
| (MonadThrow m, MonadUnsafeIO m, MonadIO m, Applicative m) => MonadResource (ResourceT m) | |
| MonadResource m => MonadResource (ExceptionT m) | |
| MonadResource m => MonadResource (MaybeT m) | |
| MonadResource m => MonadResource (ListT m) | |
| MonadResource m => MonadResource (IdentityT m) | |
| (Monoid w, MonadResource m) => MonadResource (WriterT w m) | |
| (Monoid w, MonadResource m) => MonadResource (WriterT w m) | |
| MonadResource m => MonadResource (StateT s m) | |
| MonadResource m => MonadResource (StateT s m) | |
| MonadResource m => MonadResource (ReaderT r m) | |
| (Error e, MonadResource m) => MonadResource (ErrorT e m) | |
| MonadResource m => MonadResource (ContT r m) | |
| MonadResource m => MonadResource (ConduitM i o m) | |
| (Monoid w, MonadResource m) => MonadResource (RWST r w s m) | |
| (Monoid w, MonadResource m) => MonadResource (RWST r w s m) | |
| MonadResource m => MonadResource (Pipe l i o u m) | |
class Monad m => MonadThrow m where
A Monad which can throw exceptions. Note that this does not work in a
vanilla ST or Identity monad. Instead, you should use the ExceptionT
transformer in your stack if you are dealing with a non-IO base monad.
Since 0.3.0
Methods
monadThrow :: Exception e => e -> m a
Instances
| MonadThrow [] | |
| MonadThrow IO | |
| MonadThrow Maybe | |
| MonadThrow (Either SomeException) | |
| MonadThrow m => MonadThrow (ResourceT m) | |
| Monad m => MonadThrow (ExceptionT m) | |
| MonadThrow m => MonadThrow (MaybeT m) | |
| MonadThrow m => MonadThrow (ListT m) | |
| MonadThrow m => MonadThrow (IdentityT m) | |
| (Monoid w, MonadThrow m) => MonadThrow (WriterT w m) | |
| (Monoid w, MonadThrow m) => MonadThrow (WriterT w m) | |
| MonadThrow m => MonadThrow (StateT s m) | |
| MonadThrow m => MonadThrow (StateT s m) | |
| MonadThrow m => MonadThrow (ReaderT r m) | |
| (Error e, MonadThrow m) => MonadThrow (ErrorT e m) | |
| MonadThrow m => MonadThrow (ContT r m) | |
| MonadThrow m => MonadThrow (ConduitM i o m) | |
| (Monoid w, MonadThrow m) => MonadThrow (RWST r w s m) | |
| (Monoid w, MonadThrow m) => MonadThrow (RWST r w s m) | |
| MonadThrow m => MonadThrow (Pipe l i o u m) | |
class Monad m => MonadUnsafeIO m where
A Monad based on some monad which allows running of some IO actions,
via unsafe calls. This applies to IO and ST, for instance.
Since 0.3.0
Methods
unsafeLiftIO :: IO a -> m a
Instances
| MonadUnsafeIO IO | |
| (MonadTrans t, MonadUnsafeIO m, Monad (t m)) => MonadUnsafeIO (t m) | |
| MonadUnsafeIO (ST s) | |
| MonadUnsafeIO (ST s) | |
runResourceT :: MonadBaseControl IO m => ResourceT m a -> m a
Unwrap a ResourceT transformer, and call all registered release actions.
Note that there is some reference counting involved due to resourceForkIO.
If multiple threads are sharing the same collection of resources, only the
last call to runResourceT will deallocate the resources.
Since 0.3.0
newtype ExceptionT m a
The express purpose of this transformer is to allow non-IO-based monad
stacks to catch exceptions via the MonadThrow typeclass.
Since 0.3.0
Constructors
| ExceptionT | |
Fields
| runExceptionT :: m (Either SomeException a) | |
Instances
| MonadTrans ExceptionT | |
| MonadTransControl ExceptionT | |
| MonadRWS r w s m => MonadRWS r w s (ExceptionT m) | |
| MonadBase b m => MonadBase b (ExceptionT m) | |
| MonadBaseControl b m => MonadBaseControl b (ExceptionT m) | |
| MonadState s m => MonadState s (ExceptionT m) | |
| MonadWriter w m => MonadWriter w (ExceptionT m) | |
| MonadReader r m => MonadReader r (ExceptionT m) | |
| MonadError e m => MonadError e (ExceptionT m) | |
| Monad m => Monad (ExceptionT m) | |
| Monad m => Functor (ExceptionT m) | |
| Monad m => Applicative (ExceptionT m) | |
| MonadResource m => MonadResource (ExceptionT m) | |
| Monad m => MonadThrow (ExceptionT m) | |
| MonadIO m => MonadIO (ExceptionT m) | |
| MonadCont m => MonadCont (ExceptionT m) | |
runExceptionT_ :: Monad m => ExceptionT m a -> m a
Same as runExceptionT, but immediately throw any exception returned.
Since 0.3.0
runException :: ExceptionT Identity a -> Either SomeException a
Run an ExceptionT Identity stack.
Since 0.4.2
runException_ :: ExceptionT Identity a -> a
Run an ExceptionT Identity stack, but immediately throw any exception returned.
Since 0.4.2
class MonadBase b m => MonadBaseControl b m | m -> b
Instances