\section{Types and instances} \subsection{Input streams} A {\tt Stream} is a sequence of chunks generated by an enumerator or enumeratee. Chunks might be composite values, such as a string, or atomic, such as a parser event. Allowing a stream to support multiple chunks slightly complicates iteratee and enumeratee implementation, but greatly simplifies handling of leftover inputs. {\tt (Chunks [])} is a legal value, used when a stream is still active but no data is currently available. Iteratees and enumeratees often special-case empty chunks for performance reasons, though they're not required to. :d types and instances |apidoc Data.Enumerator.Stream| data Stream a = Chunks [a] | EOF deriving (Show, Eq) instance Monad Stream where return = Chunks . return Chunks xs >>= f = mconcat (fmap f xs) EOF >>= _ = EOF : The {\tt Monoid} instance deserves some special attention, because it has the unexpected behavior that {\tt mappend EOF (Chunks []) == EOF}. Although it's reasonable that appending chunks to an {\sc eof} stream should provide a valid stream, such behavior would violate the monoid laws. :d types and instances instance Monoid (Stream a) where mempty = Chunks mempty mappend (Chunks xs) (Chunks ys) = Chunks (xs ++ ys) mappend _ _ = EOF : \subsection{Iteratees} The primary data type for this library is {\tt Iteratee}, which consumes input until it either generates a value or encounters an error. Rather than requiring all input at once, an iteratee will return {\tt Continue} when it is capable of processing more data. In general, iteratees begin in the {\tt Continue} state. As each chunk is passed to the continuation, the iteratee may return the next step, which is one of: \begin{itemize} \item {\tt Continue}: The iteratee is capable of accepting more input. Note that more input is not required; the iteratee might be able to generate a value immediately if the stream ends. \item {\tt Yield}: The iteratee has received enough input to generate a result. Included in this value is left-over input, which can be passed to the next iteratee. \item {\tt Error}: The iteratee encountered an error which prevents it from proceeding further. \end{itemize} :d types and instances data Step a m b |apidoc Data.Enumerator.Continue| = Continue (Stream a -> Iteratee a m b) |apidoc Data.Enumerator.Yield| | Yield b (Stream a) |apidoc Data.Enumerator.Error| | Error Exc.SomeException |apidoc Data.Enumerator.Iteratee| newtype Iteratee a m b = Iteratee { runIteratee :: m (Step a m b) } : The pattern {\tt Iteratee (return (} \ldots{\tt ))} shows up a lot, so I define a couple simple wrappers to save typing: :d primitives |apidoc Data.Enumerator.returnI| returnI :: Monad m => Step a m b -> Iteratee a m b returnI step = Iteratee (return step) |apidoc Data.Enumerator.yield| yield :: Monad m => b -> Stream a -> Iteratee a m b yield x extra = returnI (Yield x extra) |apidoc Data.Enumerator.continue| continue :: Monad m => (Stream a -> Iteratee a m b) -> Iteratee a m b continue k = returnI (Continue k) : \subsection*{Monad instances} Iteratees are monads; by sequencing iteratees, very complex processing may be applied to arbitrary input streams. Iteratees are also applicative functors and monad transformers. :d types and instances instance Monad m => Monad (Iteratee a m) where return x = yield x (Chunks []) |optimized iteratee bind| : Because iteratees are often used for high-performance software, it is important that the {\tt (>>=)} method be very efficient. We use a couple magic-ish features here: \begin{itemize} \item First, the whole {\tt (>>=)} is worker-wrapper transformed so {\tt f} can be cached when working with {\tt Continue}. This prevents a potential space leak when working with infinite streams. \item Second, the worker is rendered anonymous with {\tt fix}, so it doesn't incur any additional time overhead. \end{itemize} The end result is a bind implementation with time performance equivalent to the standard definition, but with significantly reduced memory allocation rates. :d optimized iteratee bind m0 >>= f = ($ m0) $ fix $ \bind m -> Iteratee $ runIteratee m >>= \r1 -> case r1 of Continue k -> return (Continue (bind . k)) Error err -> return (Error err) Yield x (Chunks []) -> runIteratee (f x) Yield x extra -> runIteratee (f x) >>= \r2 -> case r2 of Continue k -> runIteratee (k extra) Error err -> return (Error err) Yield x' _ -> return (Yield x' extra) : Most iteratees are used to wrap \io{} operations, so it's sensible to define instances for typeclasses from {\tt transformers}. :d types and instances instance MonadTrans (Iteratee a) where lift m = Iteratee (m >>= runIteratee . return) instance MonadIO m => MonadIO (Iteratee a m) where liftIO = lift . liftIO : \subsection{Enumerators} Enumerators typically read from an external source (parser, handle, random number generator, etc). They feed chunks into an iteratee until the source runs out of data (triggering {\tt EOF}) or the iteratee finishes processing (yields a value). Since {\tt Iteratee} is an alias for {\tt m (Step a m b)}, enumerators can also be considered step transformers of type {\tt Step a m b -> m (Step a m b)}. :d types and instances |apidoc Data.Enumerator.Enumerator| type Enumerator a m b = Step a m b -> Iteratee a m b : Although enumerators can be encoded as a simple step transformer with the type {\tt Step a m b -> Step a m b}, encoding as a computation allows easier reasoning about the order of side effects. Consider the case of enumerating two files: :d enumerator example let iterFoo = enumFile "foo.txt" iterWhatever let iterBar = enumFile "bar.txt" iterFoo : It's impossible to determine, merely by looking at these lines, which file will be opened first. In fact, depending on the implementation of {\tt enumFile}, both files might be open at the same time. If enumerators return monadic values, the order of events is more clear: :d enumerator example iterFoo <- enumFile "foo.txt" iterWhatever iterBar <- enumFile "bar.txt" iterFoo : \subsection{Enumeratees} In cases where an enumerator acts as both a source and sink, the resulting type is named an {\tt Enumeratee}. Enumeratees have two input types, ``outer a'' ({\tt ao}) and ``inner a'' ({\tt ai}). Enumeratees are encoded as an iteratee stack. The outer iteratee reads from a stream of \emph{ao} values, transforms them into \emph{ai}, and passes them to an inner iteratee. This model allows a single outer input to generate many inner inputs, and vice-versa. :d types and instances |apidoc Data.Enumerator.Enumeratee| type Enumeratee ao ai m b = Step ai m b -> Iteratee ao m (Step ai m b) :