conduit-1.0.0.1: Streaming data processing library.

Safe HaskellNone

Data.Conduit

Contents

Description

Let's start off with a few simple examples of conduit usage. First, a file copy utility:

>>> :load Data.Conduit.Binary
>>> runResourceT $ sourceFile "input.txt" $$ sinkFile "output.txt"

runResourceT is a function provided by the resourcet package, and ensures that resources are properly cleaned up, even in the presence of exceptions. The type system will enforce that runResourceT is called as needed. The remainder of this tutorial will not discuss runResourceT; please see the documentation in resourcet for more information.

Looking at the rest of our example, there are three components to understand: sourceFile, sinkFile, and the $$ operator (called "connect"). These represent the most basic building blocks in conduit: a Source produces a stream of values, a Sink consumes such a stream, and $$ will combine these together.

In the case of file copying, there was no value produced by the Sink. However, often times a Sink will produce some result value. For example:

>>> :load Data.Conduit.List
>>> :module +Prelude
>>> sourceList [1..10] $$ fold (+) 0
55

sourceList is a convenience function for turning a list into a Source. fold implements a strict left fold for consuming the input stream.

The next major aspect to the conduit library is the Conduit type. This type represents a stream transformer. In order to use a Conduit, we must fuse it with either a Source or Sink. For example:

>>> :load Data.Conduit.List
>>> :module +Prelude
>>> sourceList [1..10] $= Data.Conduit.List.map (+1) $$ consume
[2,3,4,5,6,7,8,9,10,11]

Notice the addition of the $=, or left fuse operator. This combines a Source and a Conduit into a new Source, which can then be connected to a Sink (in this case, consume). We can similarly perform right fusion to combine a Conduit and Sink, or middle fusion to combine two Conduits.

A number of very common functions are provided in the Data.Conduit.List module. Many of these functions correspond very closely to standard Haskell functions.

In addition to connecting and fusing components together, we can also build up more sophisticated components through monadic composition. For example, to create a Sink that ignores the first 3 numbers and returns the sum of the remaining numbers, we can use:

>>> :load Data.Conduit.List
>>> :module +Prelude
>>> sourceList [1..10] $$ Data.Conduit.List.drop 3 >> fold (+) 0
49

In some cases, we might end up consuming more input than we needed, and want to provide that input to the next component in our monadic chain. We refer to this as leftovers. The simplest example of this is peek.

>>> :load Data.Conduit.List
>>> :set -XNoMonomorphismRestriction
>>> :module +Prelude
>>> let sink = do { first <- peek; total <- fold (+) 0; return (first, total) }
>>> sourceList [1..10] $$ sink
(Just 1,55)

Notice that, although we "consumed" the first value from the stream via peek, it was still available to fold. This idea becomes even more important when dealing with chunked data such as ByteStrings or Text.

Final note: Notice in the types below that Source, Sink, and Conduit are just type aliases. This will be explained later. Another important aspect is resource finalization, which will also be covered below.

Synopsis

Conduit interface

type Source m o = ConduitM () o m ()Source

Provides a stream of output values, without consuming any input or producing a final result.

Since 0.5.0

type Conduit i m o = ConduitM i o m ()Source

Consumes a stream of input values and produces a stream of output values, without producing a final result.

Since 0.5.0

type Sink i m r = ConduitM i Void m rSource

Consumes a stream of input values and produces a final result, without producing any output.

Since 0.5.0

Connect/fuse

It is important to understand the lifecycle of our components. Notice that we can connect or fuse two components together. When we do this, the component providing output is called upstream, and the component consuming this input is called downstream. We can have arbitrarily long chains of such fusion, so a single component can simultaneously function as upstream and downstream.

Each component can be in one of four states of operation at any given time:

  • It hasn't yet started operating.
  • It is providing output downstream.
  • It is waiting for input from upstream.
  • It has completed processing.

Let's use sourceFile and sinkFile as an example. When we run sourceFile input $$ sinkFile output, both components begin in the "not started" state. Next, we start running sinkFile (note: we always begin processing on the downstream component). sinkFile will open up the file, and then wait for input from upstream.

Next, we'll start running sourceFile, which will open the file, read some data from it, and provide it as output downstream. This will be fed to sinkFile (which was already waiting). sinkFile will write the data to a file, then ask for more input. This process will continue until sourceFile reaches the end of the input. It will close the file handle and switch to the completed state. When this happens, sinkFile is sent a signal that no more input is available. It will then close its file and return a result.

Now let's change things up a bit. Suppose we were instead connecting sourceFile to take 1. We start by running take 1, which will wait for some input. We'll then start sourceFile, which will open the file, read a chunk, and send it downstream. take 1 will take that single chunk and return it as a result. Once it does this, it has transitioned to the complete state.

We don't want to pull any more data from sourceFile, as we do not need it. So instead, we call sourceFile's finalizer. Each time upstream provides output, it also provides a finalizer to be run if downstream finishes processing.

One final case: suppose we connect sourceFile to return (). The latter does nothing: it immediately switches to the complete state. In this case, we never even start running sourceFile (it stays in the "not yet started" state), and so no finalization occurs.

So here are the takeaways from the above discussion:

  • When upstream completes before downstream, it cleans up all of its resources and sends some termination signal. We never think about upstream again. This can only occur while downstream is in the "waiting for input" state, since that is the only time that upstream is called.
  • When downstream completes before upstream, we finalize upstream immediately. This can only occur when upstream produces output, because that's the only time when control is passed back to downstream.
  • If downstream never awaits for input before it terminates, upstream was never started, and therefore it does not need to be finalized.

Note that all of the discussion above applies equally well to chains of components. If you have an upstream, middle, and downstream component, and downstream terminates, then the middle component will be finalized, which in turn will trigger upstream to be finalized. This setup ensures that we always have prompt resource finalization.

($$) :: Monad m => Source m a -> Sink a m b -> m bSource

The connect operator, which pulls data from a source and pushes to a sink. When either side closes, the other side will immediately be closed as well. If you would like to keep the Source open to be used for another operations, use the connect-and-resume operator $$+.

Since 0.4.0

($=) :: Monad m => Source m a -> Conduit a m b -> Source m bSource

Left fuse, combining a source and a conduit together into a new source.

Both the Source and Conduit will be closed when the newly-created Source is closed.

Leftover data from the Conduit will be discarded.

Since 0.4.0

(=$) :: Monad m => Conduit a m b -> Sink b m c -> Sink a m cSource

Right fuse, combining a conduit and a sink together into a new sink.

Both the Conduit and Sink will be closed when the newly-created Sink is closed.

Leftover data returned from the Sink will be discarded.

Since 0.4.0

(=$=) :: Monad m => Conduit a m b -> ConduitM b c m r -> ConduitM a c m rSource

Fusion operator, combining two Conduits together into a new Conduit.

Both Conduits will be closed when the newly-created Conduit is closed.

Leftover data returned from the right Conduit will be discarded.

Since 0.4.0

Primitives

While conduit provides a number of built-in Sources, Sinks, and Conduits, you will almost certainly want to construct some of your own. Previous versions recommended using the constructors directly. Beginning with 0.5, the recommended approach is to compose existing Pipes into larger ones.

It is certainly possible (and advisable!) to leverage existing Pipes- like those in Data.Conduit.List. However, you will often need to go to a lower level set of Pipes to start your composition. The following few functions should be sufficient for expressing all constructs besides finalization. Adding in bracketP and addCleanup, you should be able to create any Pipe you need. (In fact, that's precisely how the remainder of this package is written.)

The three basic operations are awaiting, yielding, and leftovers. Awaiting asks for a new value from upstream, or returns Nothing if upstream is done. For example:

>>> :load Data.Conduit.List
>>> sourceList [1..10] $$ await
Just 1
>>> :load Data.Conduit.List
>>> sourceList [] $$ await
Nothing

Similarly, we have a yield function, which provides a value to the downstream Pipe. yield features auto-termination: if the downstream Pipe has already completed processing, the upstream Pipe will stop processing when it tries to yield.

The upshot of this is that you can write code that appears to loop infinitely, and yet will terminate.

>>> :set -XNoMonomorphismRestriction
>>> let infinite = yield () >> infinite
>>> infinite $$ await
Just ()

Or for something a bit more sophisticated:

>>> let enumFrom' i = yield i >> enumFrom' (succ i)
>>> enumFrom' 1 $$ take 5
[1,2,3,4,5]

The final primitive Pipe is leftover. This allows you to return unused input to be used by the next Pipe in the monadic chain. A simple use case would be implementing the peek function:

>>> let peek = await >>= maybe (return Nothing) (\x -> leftover x >> return (Just x))
>>> enumFrom' 1 $$ do { mx <- peek; my <- await; mz <- await; return (mx, my, mz) }
(Just 1,Just 1,Just 2)

Note that you should only return leftovers that were previously yielded from upstream.

awaitForever :: Monad m => (i -> ConduitM i o m r) -> ConduitM i o m ()Source

yieldSource

Arguments

:: Monad m 
=> o

output value

-> ConduitM i o m () 

yieldOrSource

Arguments

:: Monad m 
=> o 
-> m ()

finalizer

-> ConduitM i o m () 

leftover :: i -> ConduitM i o m ()Source

Finalization

bracketP :: MonadResource m => IO a -> (a -> IO ()) -> (a -> ConduitM i o m r) -> ConduitM i o m rSource

addCleanupSource

Arguments

:: Monad m 
=> (Bool -> m ())

True if Pipe ran to completion, False for early termination.

-> ConduitM i o m r 
-> ConduitM i o m r 

Connect-and-resume

Sometimes, we do not want to force our entire application to live inside the Pipe monad. It can be convenient to keep normal control flow of our program, and incrementally apply data from a Source to various Sinks. A strong motivating example for this use case is interleaving multiple Sources, such as combining a conduit-powered HTTP server and client into an HTTP proxy.

Normally, when we run a Pipe, we get a result and can never run it again. Connect-and-resume allows us to connect a Source to a Sink until the latter completes, and then return the current state of the Source to be applied later. To do so, we introduce three new operators. Let' start off by demonstrating them:

>>> :load Data.Conduit.List
>>> (next, x) <- sourceList [1..10] $$+ take 5
>>> Prelude.print x
[1,2,3,4,5]
>>> (next, y) <- next $$++ (isolate 4 =$ fold (Prelude.+) 0)
>>> Prelude.print y
30
>>> next $$+- consume
[10]

data ResumableSource m o Source

A Source which has been started, but has not yet completed.

This type contains both the current state of the Source, and the finalizer to be run to close it.

Since 0.5.0

($$+) :: Monad m => Source m a -> Sink a m b -> m (ResumableSource m a, b)Source

The connect-and-resume operator. This does not close the Source, but instead returns it to be used again. This allows a Source to be used incrementally in a large program, without forcing the entire program to live in the Sink monad.

Mnemonic: connect + do more.

Since 0.5.0

($$++) :: Monad m => ResumableSource m a -> Sink a m b -> m (ResumableSource m a, b)Source

Continue processing after usage of $$+.

Since 0.5.0

($$+-) :: Monad m => ResumableSource m a -> Sink a m b -> m bSource

Complete processing of a ResumableSource. This will run the finalizer associated with the ResumableSource. In order to guarantee process resource finalization, you must use this operator after using $$+ and $$++.

Since 0.5.0

unwrapResumable :: MonadIO m => ResumableSource m o -> m (Source m o, m ())Source

Unwraps a ResumableSource into a Source and a finalizer.

A ResumableSource represents a Source which has already been run, and therefore has a finalizer registered. As a result, if we want to turn it into a regular Source, we need to ensure that the finalizer will be run appropriately. By appropriately, I mean:

  • If a new finalizer is registered, the old one should not be called. * If the old one is called, it should not be called again.

This function returns both a Source and a finalizer which ensures that the above two conditions hold. Once you call that finalizer, the Source is invalidated and cannot be used.

Since 0.5.2

Utility functions

transPipe :: Monad m => (forall a. m a -> n a) -> ConduitM i o m r -> ConduitM i o n rSource

mapOutput :: Monad m => (o1 -> o2) -> ConduitM i o1 m r -> ConduitM i o2 m rSource

mapOutputMaybe :: Monad m => (o1 -> Maybe o2) -> ConduitM i o1 m r -> ConduitM i o2 m rSource

mapInputSource

Arguments

:: Monad m 
=> (i1 -> i2)

map initial input to new input

-> (i2 -> Maybe i1)

map new leftovers to initial leftovers

-> ConduitM i2 o m r 
-> ConduitM i1 o m r 

Generalized conduit types

It's recommended to keep your type signatures as general as possible to encourage reuse. For example, a theoretical signature for the head function would be:

 head :: Sink a m (Maybe a)

However, doing so would prevent usage of head from inside a Conduit, since a Sink sets its output type parameter to Void. The most general type signature would instead be:

 head :: Pipe l a o u m (Maybe a)

However, that signature is much more confusing. To bridge this gap, we also provide some generalized conduit types. They follow a simple naming convention:

  • They have the same name as their non-generalized types, with a G prepended.
  • If they have leftovers, we add an L.
  • If they consume the entirety of their input stream and return the upstream result, we add Inf to indicate infinite consumption.

type Producer m o = forall i. ConduitM i o m ()Source

Generalized Source.

Since 0.5.0

type Consumer i m r = forall o. ConduitM i o m rSource

Generalized Sink without leftovers.

Since 0.5.0

toProducer :: Monad m => Source m a -> Producer m aSource

Convert a Source into a Producer.

Since 1.0.0

toConsumer :: Monad m => Sink a m b -> Consumer a m bSource

Convert a Sink into a Consumer.

Since 1.0.0

Flushing

data Flush a Source

Provide for a stream of data that can be flushed.

A number of Conduits (e.g., zlib compression) need the ability to flush the stream at some point. This provides a single wrapper datatype to be used in all such circumstances.

Since 0.3.0

Constructors

Chunk a 
Flush 

Instances

Functor Flush 
Eq a => Eq (Flush a) 
Ord a => Ord (Flush a) 
Show a => Show (Flush a) 

Convenience re-exports

data ResourceT m a

The Resource transformer. This transformer keeps track of all registered actions, and calls them upon exit (via runResourceT). Actions may be registered via register, or resources may be allocated atomically via allocate. allocate corresponds closely to bracket.

Releasing may be performed before exit via the release function. This is a highly recommended optimization, as it will ensure that scarce resources are freed early. Note that calling release will deregister the action, so that a release action will only ever be called once.

Since 0.3.0

class (MonadThrow m, MonadUnsafeIO m, MonadIO m, Applicative m) => MonadResource m

A Monad which allows for safe resource allocation. In theory, any monad transformer stack included a ResourceT can be an instance of MonadResource.

Note: runResourceT has a requirement for a MonadBaseControl IO m monad, which allows control operations to be lifted. A MonadResource does not have this requirement. This means that transformers such as ContT can be an instance of MonadResource. However, the ContT wrapper will need to be unwrapped before calling runResourceT.

Since 0.3.0

class Monad m => MonadThrow m where

A Monad which can throw exceptions. Note that this does not work in a vanilla ST or Identity monad. Instead, you should use the ExceptionT transformer in your stack if you are dealing with a non-IO base monad.

Since 0.3.0

Methods

monadThrow :: Exception e => e -> m a

class Monad m => MonadUnsafeIO m where

A Monad based on some monad which allows running of some IO actions, via unsafe calls. This applies to IO and ST, for instance.

Since 0.3.0

Methods

unsafeLiftIO :: IO a -> m a

runResourceT :: MonadBaseControl IO m => ResourceT m a -> m a

Unwrap a ResourceT transformer, and call all registered release actions.

Note that there is some reference counting involved due to resourceForkIO. If multiple threads are sharing the same collection of resources, only the last call to runResourceT will deallocate the resources.

Since 0.3.0

newtype ExceptionT m a

The express purpose of this transformer is to allow non-IO-based monad stacks to catch exceptions via the MonadThrow typeclass.

Since 0.3.0

Constructors

ExceptionT 

runExceptionT_ :: Monad m => ExceptionT m a -> m a

Same as runExceptionT, but immediately throw any exception returned.

Since 0.3.0

runException :: ExceptionT Identity a -> Either SomeException a

Run an ExceptionT Identity stack.

Since 0.4.2

runException_ :: ExceptionT Identity a -> a

Run an ExceptionT Identity stack, but immediately throw any exception returned.

Since 0.4.2