conduit-0.5.1: Streaming data processing library.

Safe HaskellSafe-Infered

Data.Conduit

Contents

Synopsis

Conduit interface

Let's start off with a few simple examples of conduit usage. First, a file copy utility:

>>> :load Data.Conduit.Binary
>>> runResourceT $ sourceFile "input.txt" $$ sinkFile "output.txt"

runResourceT is a function provided by the resourcet package, and ensures that resources are properly cleaned up, even in the presence of exceptions. The type system will enforce that runResourceT is called as needed. The remainder of this tutorial will not discuss runResourceT; please see the documentation in resourcet for more information.

Looking at the rest of our example, there are three components to understand: sourceFile, sinkFile, and the $$ operator (called "connect"). These represent the most basic building blocks in conduit: a Source produces a stream of values, a Sink consumes such a stream, and $$ will combine these together.

In the case of file copying, there was no value produced by the Sink. However, often times a Sink will produce some result value. For example:

>>> :load Data.Conduit.List
>>> :module +Prelude
>>> sourceList [1..10] $$ fold (+) 0
55

sourceList is a convenience function for turning a list into a Source. fold implements a strict left fold for consuming the input stream.

The next major aspect to the conduit library: the Conduit type. This type represents a stream transformer. In order to use a Conduit, we must fuse it with either a Source or Sink. For example:

>>> :load Data.Conduit.List
>>> :module +Prelude
>>> sourceList [1..10] $= Data.Conduit.List.map (+1) $$ consume
[2,3,4,5,6,7,8,9,10,11]

Notice the addition of the $=, or left fuse operator. This combines a Source and a Conduit into a new Source, which can then be connected to a Sink (in this case, consume). We can similarly perform right fusion to combine a Conduit and Sink, or middle fusion to combine two Conduits.

A number of very common functions are provided in the Data.Conduit.List module. Many of these functions correspond very closely to standard Haskell functions.

In addition to connecting and fusing components together, we can also build up more sophisticated components through monadic composition. For example, to create a Sink that ignores the first 3 numbers and returns the sum of the remaining numbers, we can use:

>>> :load Data.Conduit.List
>>> :module +Prelude
>>> sourceList [1..10] $$ Data.Conduit.List.drop 3 >> fold (+) 0
49

In some cases, we might end up consuming more input than we needed, and want to provide that input to the next component in our monadic chain. We refer to this as leftovers. The simplest example of this is peek.

>>> :load Data.Conduit.List
>>> :set -XNoMonomorphismRestriction
>>> :module +Prelude
>>> let sink = do { first <- peek; total <- fold (+) 0; return (first, total) }
>>> sourceList [1..10] $$ sink
(Just 1,55)

Notice that, although we "consumed" the first value from the stream via peek, it was still available to fold. This idea becomes even more important when dealing with chunked data such as ByteStrings or Text.

Final note: Notice in the types below that Source, Sink, and Conduit are just type aliases. This will be explained later. Another important aspect is resource finalization, which will also be covered below.

type Source m o = Pipe () () o () m ()Source

Provides a stream of output values, without consuming any input or producing a final result.

Since 0.5.0

type Conduit i m o = Pipe i i o () m ()Source

Consumes a stream of input values and produces a stream of output values, without producing a final result.

Since 0.5.0

type Sink i m r = Pipe i i Void () m rSource

Consumes a stream of input values and produces a final result, without producing any output.

Since 0.5.0

Connect/fuse

It is important to understand the lifecycle of our components. Notice that we can connect or fuse two components together. When we do this, the component providing output is called upstream, and the component consuming this input is called downstream. We can have arbitrarily long chains of such fusion, so a single component can simultaneously function as upstream and downstream.

Each component can be in one of four states of operation at any given time:

  • It hasn't yet started operating.
  • It is providing output downstream.
  • It is waiting for input from upstream.
  • It has completed processing.

Let's use sourceFile and sinkFile as an example. When we run sourceFile input $$ sinkFile output, both components begin in the "not started" state. Next, we start running sinkFile (note: we always begin processing on the downstream component). sinkFile will open up the file, and then wait for input from upstream.

Next, we'll start running sourceFile, which will open the file, read some data from it, and provide it as output downstream. This will be fed to sinkFile (which was already waiting). sinkFile will write the data to a file, then ask for more input. This process will continue until sourceFile reaches the end of the input. It will close the file handle and switch to the completed state. When this happens, sinkFile is sent a signal that no more input is available. It will then close its file and return a result.

Now let's change things up a bit. Suppose we were instead connecting sourceFile to take 1. We start by running take 1, which will wait for some input. We'll then start sourceFile, which will open the file, read a chunk, and send it downstream. take 1 will take that single chunk and return it as a result. Once it does this, it has transitioned to the complete state.

We don't want to pull any more data from sourceFile, as we do not need it. So instead, we call sourceFile's finalizer. Each time upstream provides output, it also provides a finalizer to be run if downstream finishes processing.

One final case: suppose we connect sourceFile to return (). The latter does nothing: it immediately switches to the complete state. In this case, we never even start running sourceFile (it stays in the "not yet started" state), and so no finalization occurs.

So here are the takeaways from the above discussion:

  • When upstream completes before downstream, it cleans up all of its resources and sends some termination signal. We never think about upstream again. This can only occur while downstream is in the "waiting for input" state, since that is the only time that upstream is called.
  • When downstream completes before upstream, we finalize upstream immediately. This can only occur when upstream produces output, because that's the only time when control is passed back to downstream.
  • If downstream never awaits for input before it terminates, upstream was never started, and therefore it does not need to be finalized.

Note that all of the discussion above applies equally well to chains of components. If you have an upstream, middle, and downstream component, and downstream terminates, then the middle component will be finalized, which in turn will trigger upstream to be finalized. This setup ensures that we always have prompt resource finalization.

($$) :: Monad m => Source m a -> Sink a m b -> m bSource

The connect operator, which pulls data from a source and pushes to a sink. When either side closes, the other side will immediately be closed as well. If you would like to keep the Source open to be used for another operations, use the connect-and-resume operator $$+.

Since 0.4.0

($=) :: Monad m => Source m a -> Conduit a m b -> Source m bSource

Left fuse, combining a source and a conduit together into a new source.

Both the Source and Conduit will be closed when the newly-created Source is closed.

Leftover data from the Conduit will be discarded.

Since 0.4.0

(=$) :: Monad m => Conduit a m b -> Sink b m c -> Sink a m cSource

Right fuse, combining a conduit and a sink together into a new sink.

Both the Conduit and Sink will be closed when the newly-created Sink is closed.

Leftover data returned from the Sink will be discarded.

Since 0.4.0

(=$=) :: Monad m => Conduit a m b -> Conduit b m c -> Conduit a m cSource

Fusion operator, combining two Conduits together into a new Conduit.

Both Conduits will be closed when the newly-created Conduit is closed.

Leftover data returned from the right Conduit will be discarded.

Since 0.4.0

Pipe interface

We discussed three main types in the conduit package: Source, Sink, and Conduit. In fact, these are all unified into a single type, Pipe. This greatly simplifies the internal workings of this package, and makes it much easier to build more powerful components from simpler ones. For example, it is easy to combine a number of simple Sinks together to produce a more powerful Conduit. To create a Conduit which drops 3 input elements and doubles the rest, we could use:

>>> :load Data.Conduit.List
>>> :set -XNoMonomorphismRestriction
>>> let conduit = do { drop 3; map (Prelude.* 2) }
>>> sourceList [1..10] $$ conduit =$ consume
[8,10,12,14,16,18,20]

If we look again at our examples from above, we'll see a few different aspects to Pipes:

  • Sinks and Conduits can consume a stream of input values. Both map and fold took a stream of Ints, while sinkFile took a stream of ByteStrings.
  • Sources and Conduits can produce a stream of output values. sourceFile produced a stream of ByteStrings, which was then consumed by sinkFile. This is an important point in conduit: the output of the left-hand pipe (a.k.a., upstream) must match the input of the right-hand pipe (a.k.a., downstream).
  • All Pipes have some underlying Monad. The sourceFile and sinkFile functions needed to use MonadResource from resourcet to get exception handling, but our other functions could live in any monad. Since Pipe provides a MonadTrans instance, you can actually lift any action from the underlying Monad into your Pipe.
  • Sinks can provide a result type. Our fold returned a final Int, while sinkFile returned ().

A Pipe also exposes two other features as well, not covered by the above three types:

  • Each Pipe has some leftover value. Above, we described a situation where the leftover would be identical to the input type. However, Pipe provides a type parameter for this instead, so that you can alternatively set the leftover type to Void, thereby ensuring that a Pipe does not provide any leftover values. This is important for ensuring that leftover values aren't accidentally discarded.
  • Above, we described a situation where only Sinks could return results. However, sometimes it's advantageous to allow stream producers to also produce a result type. We call this the upstream result.

Putting this all together, a Pipe has six type parameters: Pipe l i o u m r, corresponding to each of the bullets above. Source, Conduit, and Sink are simply type aliases that restrict one or more of these type parameters to specific types. For example, both Source and Conduit have r restricted to (), since neither may return a result.

There are two ways that Pipes can be composed: via the Monad instance, and via fusion. (Note: connecting is just a special case of fusion, where the Pipe is then run. We'll discuss that more later on.) In the pipes package, these are referred to as vertical and horizontal composition, respectively. Let's clarify the distinction between these two:

Monadic composition takes two Pipes with the same input and output types, and combines them into a single Pipe. These two Pipes will be run one after the other, and they will share the same input and output streams. Essentially, the second Pipe will continue consuming input where the first left off, and the output streams of each will be concatenated. Any leftover values from the first Pipe will be fed to the second Pipe. Let's see a simple example:

>>> :load Data.Conduit.List
>>> sourceList [1..10] $$ do { x <- take 3; y <- take 2; return (x, y) }
([1,2,3],[4,5])

Fusion, on the other hand, will connect the output from an upstream Pipe to the input of a downstream Pipe. The upstream Pipe is required to have a result type of (), since any results it produces are thrown out. This form of composition produces a new Pipe with the input parameter of the upstream Pipe and the output and result parameters of the downstream Pipe. (For examples, see the initial examples on this page. Every usage of the connect or fusion operators is fusion composition.)

Note: If you are building a library of conduit functions, it is best to keep the type signatures as general as possible. For example, even though the simplest type signature for the drop function would be Int -> Sink i m (), this would prevent it from being used in construction of Conduits. Instead, we give it a type signature of Int -> Pipe l i o u m ().

data Pipe l i o u m r Source

The underlying datatype for all the types in this package. In has six type parameters:

  • l is the type of values that may be left over from this Pipe. A Pipe with no leftovers would use Void here, and one with leftovers would use the same type as the i parameter. Leftovers are automatically provided to the next Pipe in the monadic chain.
  • i is the type of values for this Pipe's input stream.
  • o is the type of values for this Pipe's output stream.
  • u is the result type from the upstream Pipe.
  • m is the underlying monad.
  • r is the result type.

A basic intuition is that every Pipe produces a stream of output values (o), and eventually indicates that this stream is terminated by sending a result (r). On the receiving end of a Pipe, these become the i and u parameters.

Since 0.5.0

Instances

MonadBase base m => MonadBase base (Pipe l i o u m) 
MonadTrans (Pipe l i o u) 
Monad m => Monad (Pipe l i o u m) 
Monad m => Functor (Pipe l i o u m) 
Monad m => Applicative (Pipe l i o u m) 
MonadIO m => MonadIO (Pipe l i o u m) 
Monad m => Monoid (Pipe l i o u m ()) 

(>+>) :: Monad m => Pipe l a b r0 m r1 -> Pipe Void b c r1 m r2 -> Pipe l a c r0 m r2Source

Fuse together two Pipes, connecting the output from the left to the input of the right.

Notice that the leftover parameter for the Pipes must be Void. This ensures that there is no accidental data loss of leftovers during fusion. If you have a Pipe with leftovers, you must first call injectLeftovers. For example:

>>> :load Data.Conduit.List
>>> :set -XNoMonomorphismRestriction
>>> let pipe = peek >>= \x -> fold (Prelude.+) 0 >>= \y -> return (x, y)
>>> runPipe $ sourceList [1..10] >+> injectLeftovers pipe
(Just 1,55)

Since 0.5.0

(<+<) :: Monad m => Pipe Void b c r1 m r2 -> Pipe l a b r0 m r1 -> Pipe l a c r0 m r2Source

Same as >+>, but reverse the order of the arguments.

Since 0.5.0

runPipe :: Monad m => Pipe Void () Void () m r -> m rSource

Run a pipeline until processing completes.

Since 0.5.0

injectLeftovers :: Monad m => Pipe i i o u m r -> Pipe l i o u m rSource

Transforms a Pipe that provides leftovers to one which does not, allowing it to be composed.

This function will provide any leftover values within this Pipe to any calls to await. If there are more leftover values than are demanded, the remainder are discarded.

Since 0.5.0

Primitives

While conduit provides a number of built-in Sources, Sinks, and Conduits, you will almost certainly want to construct some of your own. Previous versions recommended using the constructors directly. Beginning with 0.5, the recommended approach is to compose existing Pipes into larger ones.

It is certainly possible (and advisable!) to leverage existing Pipes- like those in Data.Conduit.List. However, you will often need to go to a lower level set of Pipes to start your composition. The following few functions should be sufficient for expressing all constructs besides finalization. Adding in bracketP and addCleanup, you should be able to create any Pipe you need. (In fact, that's precisely how the remainder of this package is written.)

The three basic operations are awaiting, yielding, and leftovers. Awaiting asks for a new value from upstream, or returns Nothing if upstream is done. For example:

>>> :load Data.Conduit.List
>>> sourceList [1..10] $$ await
Just 1
>>> :load Data.Conduit.List
>>> sourceList [] $$ await
Nothing

Similarly, we have a yield function, which provides a value to the downstream Pipe. yield features auto-termination: if the downstream Pipe has already completed processing, the upstream Pipe will stop processing when it tries to yield.

The upshot of this is that you can write code that appears to loop infinitely, and yet will terminate.

>>> :set -XNoMonomorphismRestriction
>>> let infinite = yield () >> infinite
>>> infinite $$ await
Just ()

Or for something a bit more sophisticated:

>>> let enumFrom' i = yield i >> enumFrom' (succ i)
>>> enumFrom' 1 $$ take 5
[1,2,3,4,5]

The final primitive Pipe is leftover. This allows you to return unused input to be used by the next Pipe in the monadic chain. A simple use case would be implementing the peek function:

>>> let peek = await >>= maybe (return Nothing) (\x -> leftover x >> return (Just x))
>>> enumFrom' 1 $$ do { mx <- peek; my <- await; mz <- await; return (mx, my, mz) }
(Just 1,Just 1,Just 2)

Note that you should only return leftovers that were previously yielded from upstream.

await :: Pipe l i o u m (Maybe i)Source

Wait for a single input value from upstream, terminating immediately if no data is available.

Since 0.5.0

awaitE :: Pipe l i o u m (Either u i)Source

This is similar to await, but will return the upstream result value as Left if available.

Since 0.5.0

awaitForever :: Monad m => (i -> Pipe l i o r m r') -> Pipe l i o r m rSource

Wait for input forever, calling the given inner Pipe for each piece of new input. Returns the upstream result type.

Since 0.5.0

yieldSource

Arguments

:: Monad m 
=> o

output value

-> Pipe l i o u m () 

Send a single output value downstream. If the downstream Pipe terminates, this Pipe will terminate as well.

Since 0.5.0

yieldOrSource

Arguments

:: Monad m 
=> o 
-> m ()

finalizer

-> Pipe l i o u m () 

Similar to yield, but additionally takes a finalizer to be run if the downstream Pipe terminates.

Since 0.5.0

leftover :: l -> Pipe l i o u m ()Source

Provide a single piece of leftover input to be consumed by the next pipe in the current monadic binding.

Note: it is highly encouraged to only return leftover values from input already consumed from upstream.

Since 0.5.0

Finalization

bracketP :: MonadResource m => IO a -> (a -> IO ()) -> (a -> Pipe l i o u m r) -> Pipe l i o u m rSource

Perform some allocation and run an inner Pipe. Two guarantees are given about resource finalization:

  1. It will be prompt. The finalization will be run as early as possible.
  2. It is exception safe. Due to usage of resourcet, the finalization will be run in the event of any exceptions.

Since 0.5.0

addCleanupSource

Arguments

:: Monad m 
=> (Bool -> m ())

True if Pipe ran to completion, False for early termination.

-> Pipe l i o u m r 
-> Pipe l i o u m r 

Add some code to be run when the given Pipe cleans up.

Since 0.4.1

Connect-and-resume

Sometimes, we do not want to force our entire application to live inside the Pipe monad. It can be convenient to keep normal control flow of our program, and incrementally apply data from a Source to various Sinks. A strong motivating example for this use case is interleaving multiple Sources, such as combining a conduit-powered HTTP server and client into an HTTP proxy.

Normally, when we run a Pipe, we get a result and can never run it again. Connect-and-resume allows us to connect a Source to a Sink until the latter completes, and then return the current state of the Source to be applied later. To do so, we introduce three new operators. Let' start off by demonstrating them:

>>> :load Data.Conduit.List
>>> (next, x) <- sourceList [1..10] $$+ take 5
>>> Prelude.print x
[1,2,3,4,5]
>>> (next, y) <- next $$++ (isolate 4 =$ fold (Prelude.+) 0)
>>> Prelude.print y
30
>>> next $$+- consume
[10]

data ResumableSource m o Source

A Source which has been started, but has not yet completed.

This type contains both the current state of the Source, and the finalizer to be run to close it.

Since 0.5.0

($$+) :: Monad m => Source m a -> Sink a m b -> m (ResumableSource m a, b)Source

The connect-and-resume operator. This does not close the Source, but instead returns it to be used again. This allows a Source to be used incrementally in a large program, without forcing the entire program to live in the Sink monad.

Mnemonic: connect + do more.

Since 0.5.0

($$++) :: Monad m => ResumableSource m a -> Sink a m b -> m (ResumableSource m a, b)Source

Continue processing after usage of $$+.

Since 0.5.0

($$+-) :: Monad m => ResumableSource m a -> Sink a m b -> m bSource

Complete processing of a ResumableSource. This will run the finalizer associated with the ResumableSource. In order to guarantee process resource finalization, you must use this operator after using $$+ and $$++.

Since 0.5.0

Utility functions

transPipe :: Monad m => (forall a. m a -> n a) -> Pipe l i o u m r -> Pipe l i o u n rSource

Transform the monad that a Pipe lives in.

Since 0.4.0

mapOutput :: Monad m => (o1 -> o2) -> Pipe l i o1 u m r -> Pipe l i o2 u m rSource

Apply a function to all the output values of a Pipe.

This mimics the behavior of fmap for a Source and Conduit in pre-0.4 days.

Since 0.4.1

mapOutputMaybe :: Monad m => (o1 -> Maybe o2) -> Pipe l i o1 u m r -> Pipe l i o2 u m rSource

Same as mapOutput, but use a function that returns Maybe values.

Since 0.5.0

mapInputSource

Arguments

:: Monad m 
=> (i1 -> i2)

map initial input to new input

-> (l2 -> Maybe l1)

map new leftovers to initial leftovers

-> Pipe l2 i2 o u m r 
-> Pipe l1 i1 o u m r 

Apply a function to all the input values of a Pipe.

Since 0.5.0

withUpstream :: Monad m => Pipe l i o u m r -> Pipe l i o u m (u, r)Source

Returns a tuple of the upstream and downstream results. Note that this will force consumption of the entire input stream.

Since 0.5.0

Generalized conduit types

It's recommended to keep your type signatures as general as possible to encourage reuse. For example, a theoretical signature for the head function would be:

 head :: Sink a m (Maybe a)

However, doing so would prevent usage of head from inside a Conduit, since a Sink sets its output type parameter to Void. The most general type signature would instead be:

 head :: Pipe l a o u m (Maybe a)

However, that signature is much more confusing. To bridge this gap, we also provide some generalized conduit types. They follow a simple naming convention:

  • They have the same name as their non-generalized types, with a G prepended.
  • If they have leftovers, we add an L.
  • If they consume the entirety of their input stream and return the upstream result, we add Inf to indicate infinite consumption.

type GSource m o = forall l i u. Pipe l i o u m ()Source

Generalized Source.

Since 0.5.0

type GSink i m r = forall l o u. Pipe l i o u m rSource

Generalized Sink without leftovers.

Since 0.5.0

type GLSink i m r = forall o u. Pipe i i o u m rSource

Generalized Sink with leftovers.

Since 0.5.0

type GInfSink i m = forall l o r. Pipe l i o r m rSource

Generalized Sink without leftovers returning upstream result.

Since 0.5.0

type GLInfSink i m = forall o r. Pipe i i o r m rSource

Generalized Sink with leftovers returning upstream result.

Since 0.5.0

type GConduit i m o = forall l u. Pipe l i o u m ()Source

Generalized conduit without leftovers.

Since 0.5.0

type GLConduit i m o = forall u. Pipe i i o u m ()Source

Generalized conduit with leftovers.

Since 0.5.0

type GInfConduit i m o = forall l r. Pipe l i o r m rSource

Generalized conduit without leftovers returning upstream result.

Since 0.5.0

type GLInfConduit i m o = forall r. Pipe i i o r m rSource

Generalized conduit with leftovers returning upstream result.

Since 0.5.0

Flushing

data Flush a Source

Provide for a stream of data that can be flushed.

A number of Conduits (e.g., zlib compression) need the ability to flush the stream at some point. This provides a single wrapper datatype to be used in all such circumstances.

Since 0.3.0

Constructors

Chunk a 
Flush 

Instances

Functor Flush 
Eq a => Eq (Flush a) 
Ord a => Ord (Flush a) 
Show a => Show (Flush a) 

Convenience re-exports

data ResourceT m a

The Resource transformer. This transformer keeps track of all registered actions, and calls them upon exit (via runResourceT). Actions may be registered via register, or resources may be allocated atomically via allocate. allocate corresponds closely to bracket.

Releasing may be performed before exit via the release function. This is a highly recommended optimization, as it will ensure that scarce resources are freed early. Note that calling release will deregister the action, so that a release action will only ever be called once.

Since 0.3.0

class (MonadThrow m, MonadUnsafeIO m, MonadIO m, Applicative m) => MonadResource m

A Monad which allows for safe resource allocation. In theory, any monad transformer stack included a ResourceT can be an instance of MonadResource.

Note: runResourceT has a requirement for a MonadBaseControl IO m monad, which allows control operations to be lifted. A MonadResource does not have this requirement. This means that transformers such as ContT can be an instance of MonadResource. However, the ContT wrapper will need to be unwrapped before calling runResourceT.

Since 0.3.0

class Monad m => MonadThrow m where

A Monad which can throw exceptions. Note that this does not work in a vanilla ST or Identity monad. Instead, you should use the ExceptionT transformer in your stack if you are dealing with a non-IO base monad.

Since 0.3.0

Methods

monadThrow :: Exception e => e -> m a

class Monad m => MonadUnsafeIO m where

A Monad based on some monad which allows running of some IO actions, via unsafe calls. This applies to IO and ST, for instance.

Since 0.3.0

Methods

unsafeLiftIO :: IO a -> m a

runResourceT :: MonadBaseControl IO m => ResourceT m a -> m a

Unwrap a ResourceT transformer, and call all registered release actions.

Note that there is some reference counting involved due to resourceForkIO. If multiple threads are sharing the same collection of resources, only the last call to runResourceT will deallocate the resources.

Since 0.3.0