\section{Types and instances}

\subsection{Input streams}

A {\tt Stream} is a sequence of chunks generated by an enumerator or
enumeratee. Chunks might be composite values, such as a string, or atomic,
such as a parser event. Allowing a stream to support multiple chunks
slightly complicates iteratee and enumeratee implementation, but greatly
simplifies handling of leftover inputs.

{\tt (Chunks [])} is a legal value, used when a stream is still active but
no data is currently available. Iteratees and enumeratees often special-case
empty chunks for performance reasons, though they're not required to.

:d types and instances
|apidoc Data.Enumerator.Stream|
data Stream a
	= Chunks [a]
	| EOF
	deriving (Show, Eq)

instance Monad Stream where
	return = Chunks . return
	Chunks xs >>= f = mconcat (fmap f xs)
	EOF >>= _ = EOF
:

The {\tt Monoid} instance deserves some special attention, because it has
the unexpected behavior that {\tt mappend EOF (Chunks []) == EOF}. Although
it's reasonable that appending chunks to an {\sc eof} stream should provide
a valid stream, such behavior would violate the monoid laws.

:d types and instances
instance Monoid (Stream a) where
	mempty = Chunks mempty
	mappend (Chunks xs) (Chunks ys) = Chunks (xs ++ ys)
	mappend _ _ = EOF
:

\subsection{Iteratees}

The primary data type for this library is {\tt Iteratee}, which consumes
input until it either generates a value or encounters an error. Rather
than requiring all input at once, an iteratee will return {\tt Continue}
when it is capable of processing more data.

In general, iteratees begin in the {\tt Continue} state. As each chunk is
passed to the continuation, the iteratee may return the next step, which is
one of:

\begin{itemize}
\item {\tt Continue}: The iteratee is capable of accepting more input. Note
that more input is not required; the iteratee might be able to generate a
value immediately if the stream ends.

\item {\tt Yield}: The iteratee has received enough input to generate a
result. Included in this value is left-over input, which can be passed to
the next iteratee.

\item {\tt Error}: The iteratee encountered an error which prevents it from
proceeding further.
\end{itemize}

:d types and instances
data Step a m b
	|apidoc Data.Enumerator.Continue|
	= Continue (Stream a -> Iteratee a m b)
	
	|apidoc Data.Enumerator.Yield|
	| Yield b (Stream a)
	
	|apidoc Data.Enumerator.Error|
	| Error Exc.SomeException

|apidoc Data.Enumerator.Iteratee|
newtype Iteratee a m b = Iteratee
	{ runIteratee :: m (Step a m b)
	}
:

The pattern {\tt Iteratee (return (} \ldots{\tt ))} shows up a lot, so I define a
couple simple wrappers to save typing:

:d primitives
|apidoc Data.Enumerator.returnI|
returnI :: Monad m => Step a m b -> Iteratee a m b
returnI step = Iteratee (return step)

|apidoc Data.Enumerator.yield|
yield :: Monad m => b -> Stream a -> Iteratee a m b
yield x extra = returnI (Yield x extra)

|apidoc Data.Enumerator.continue|
continue :: Monad m => (Stream a -> Iteratee a m b) -> Iteratee a m b
continue k = returnI (Continue k)
:

\subsection*{Monad instances}

Iteratees are monads; by sequencing iteratees, very complex processing may
be applied to arbitrary input streams. Iteratees are also applicative
functors and monad transformers.

:d types and instances
instance Monad m => Monad (Iteratee a m) where
	return x = yield x (Chunks [])
	|optimized iteratee bind|
:

Because iteratees are often used for high-performance software, it is
important that the {\tt (>>=)} method be very efficient. We use a couple
magic-ish features here:

\begin{itemize}
\item First, the whole {\tt (>>=)} is worker-wrapper transformed so {\tt f}
can be cached when working with {\tt Continue}. This prevents a potential
space leak when working with infinite streams.

\item Second, the worker is rendered anonymous with {\tt fix}, so it doesn't
incur any additional time overhead.
\end{itemize}

The end result is a bind implementation with time performance equivalent to
the standard definition, but with significantly reduced memory allocation
rates.

:d optimized iteratee bind
m0 >>= f = ($ m0) $ fix $
	\bind m -> Iteratee $ runIteratee m >>= \r1 ->
		case r1 of
			Continue k -> return (Continue (bind . k))
			Error err -> return (Error err)
			Yield x (Chunks []) -> runIteratee (f x)
			Yield x extra -> runIteratee (f x) >>= \r2 ->
				case r2 of
					Continue k -> runIteratee (k extra)
					Error err -> return (Error err)
					Yield x' _ -> return (Yield x' extra)
:

Most iteratees are used to wrap \io{} operations, so it's sensible to define
instances for typeclasses from {\tt transformers}.

:d types and instances
instance MonadTrans (Iteratee a) where
	lift m = Iteratee (m >>= runIteratee . return)

instance MonadIO m => MonadIO (Iteratee a m) where
	liftIO = lift . liftIO
:

\subsection{Enumerators}

Enumerators typically read from an external source (parser, handle, random
number generator, etc). They feed chunks into an iteratee until the source
runs out of data (triggering {\tt EOF}) or the iteratee finishes processing
(yields a value).

Since {\tt Iteratee} is an alias for {\tt m (Step a m b)}, enumerators can
also be considered step transformers of type
{\tt Step a m b -> m (Step a m b)}.

:d types and instances
|apidoc Data.Enumerator.Enumerator|
type Enumerator a m b = Step a m b -> Iteratee a m b
:

Although enumerators can be encoded as a simple step transformer with the
type {\tt Step a m b -> Step a m b}, encoding as a computation allows easier
reasoning about the order of side effects. Consider the case of enumerating
two files:

:d enumerator example
let iterFoo = enumFile "foo.txt" iterWhatever
let iterBar = enumFile "bar.txt" iterFoo
:

It's impossible to determine, merely by looking at these lines, which file
will be opened first. In fact, depending on the implementation of
{\tt enumFile}, both files might be open at the same time. If enumerators
return monadic values, the order of events is more clear:

:d enumerator example
iterFoo <- enumFile "foo.txt" iterWhatever
iterBar <- enumFile "bar.txt" iterFoo
:

\subsection{Enumeratees}

In cases where an enumerator acts as both a source and sink, the resulting
type is named an {\tt Enumeratee}. Enumeratees have two input types,
``outer a'' ({\tt ao}) and ``inner a'' ({\tt ai}).

Enumeratees are encoded as an iteratee stack. The outer iteratee reads from
a stream of \emph{ao} values, transforms them into \emph{ai}, and passes them
to an inner iteratee. This model allows a single outer input to generate many
inner inputs, and vice-versa.

:d types and instances
|apidoc Data.Enumerator.Enumeratee|
type Enumeratee ao ai m b = Step ai m b -> Iteratee ao m (Step ai m b)
: