Monadic and General Iteratees: incremental input parsers, processors and transformers
The running example, parts 1 and 2 Part 1 is reading the headers, the sequence of lines terminated by an empty line. Each line is terminated by CR, LF, or CRLF. We should return the headers in order. In the case of error, we should return the headers read so far and the description of the error. Part 2 is reading the headers and reading all the lines from the HTTP-chunk-encoded content that follows the headers. Part 2 thus verifies layering of streams, and processing of one stream embedded (chunk encoded) into another stream.
- data StreamG a
- type Stream = StreamG Char
- data IterateeG el m a
- = IE_done a (StreamG el)
- | IE_cont (StreamG el -> IterateeGM el m a)
- newtype IterateeGM el m a = IM {}
- type Iteratee m a = IterateeG Char m a
- type IterateeM m a = IterateeGM Char m a
- liftI :: Monad m => IterateeG el m a -> IterateeGM el m a
- (>>==) :: Monad m => IterateeGM el m a -> (IterateeG el m a -> IterateeGM el' m b) -> IterateeGM el' m b
- (==<<) :: Monad m => (IterateeG el m a -> IterateeGM el' m b) -> IterateeGM el m a -> IterateeGM el' m b
- joinI :: Monad m => IterateeGM el m (IterateeG el' m a) -> IterateeGM el m a
- stream2list :: Monad m => IterateeGM el m [el]
- iter_report_err :: Monad m => IterateeGM el m (Maybe String)
- sbreak :: Monad m => (el -> Bool) -> IterateeGM el m ([el], Maybe el)
- sdropWhile :: Monad m => (el -> Bool) -> IterateeGM el m ()
- snext :: Monad m => IterateeGM el m (Maybe el)
- speek :: Monad m => IterateeGM el m (Maybe el)
- skip_till_eof :: Monad m => IterateeGM el m ()
- sdrop :: Monad m => Int -> IterateeGM el m ()
- type EnumeratorN el_outer el_inner m a = IterateeG el_inner m a -> IterateeGM el_outer m (IterateeG el_inner m a)
- stake :: Monad m => Int -> EnumeratorN el el m a
- map_stream :: Monad m => (el -> el') -> EnumeratorN el el' m a
- conv_stream :: Monad m => IterateeGM el m (Maybe [el']) -> EnumeratorN el el' m a
- type Line = String
- line :: Monad m => IterateeM m (Either Line Line)
- print_lines :: IterateeGM Line IO ()
- enum_lines :: Monad m => EnumeratorN Char Line m a
- enum_words :: Monad m => EnumeratorN Char String m a
- type EnumeratorGM el m a = IterateeG el m a -> IterateeGM el m a
- type EnumeratorM m a = EnumeratorGM Char m a
- enum_eof :: Monad m => EnumeratorGM el m a
- enum_err :: Monad m => String -> EnumeratorGM el m a
- (>.) :: Monad m => EnumeratorGM el m a -> EnumeratorGM el m a -> EnumeratorGM el m a
- enum_pure_1chunk :: Monad m => [el] -> EnumeratorGM el m a
- enum_pure_nchunk :: Monad m => [el] -> Int -> EnumeratorGM el m a
- enum_fd :: Fd -> EnumeratorM IO a
- enum_file :: FilePath -> EnumeratorM IO a
- enum_chunk_decoded :: Monad m => Iteratee m a -> IterateeM m a
Documentation
A stream is a (continuing) sequence of elements bundled in Chunks. The first two variants indicate termination of the stream. Chunk [a] gives the currently available part of the stream. The stream is not terminated yet. The case (Chunk []) signifies a stream with no currently available data but which is still continuing. A stream processor should, informally speaking, ``suspend itself'' and wait for more data to arrive. Later on, we can add another variant: IE_block (Ptr CChar) CSize so we could parse right from the buffer.
type Stream = StreamG CharSource
A particular instance of StreamG: the stream of characters. This stream is used by many input parsers.
Iteratee -- a generic stream processor, what is being folded over
a stream
When Iteratee is in the done
state, it contains the computed
result and the remaining part of the stream.
In the cont
state, the iteratee has not finished the computation
and needs more input.
We assume that all iteratees are good
-- given bounded input,
they do the bounded amount of computation and take the bounded amount
of resources. The monad m describes the sort of computations done
by the iteratee as it processes the stream. The monad m could be
the identity monad (for pure computations) or the IO monad
(to let the iteratee store the stream processing results as they
are computed).
We also assume that given a terminated stream, an iteratee
moves to the done state, so the results computed so far could be returned.
We could have used existentials instead, by doing the closure conversion
IE_done a (StreamG el) | |
IE_cont (StreamG el -> IterateeGM el m a) |
newtype IterateeGM el m a Source
MonadTrans (IterateeGM el) | |
Monad m => Monad (IterateeGM el m) | It turns out, IterateeGM form a monad. We can use the familiar do notation for composing Iteratees |
type IterateeM m a = IterateeGM Char m aSource
liftI :: Monad m => IterateeG el m a -> IterateeGM el m aSource
Useful combinators for implementing iteratees and enumerators
(>>==) :: Monad m => IterateeGM el m a -> (IterateeG el m a -> IterateeGM el' m b) -> IterateeGM el' m bSource
Just like bind (at run-time, this is indeed exactly bind)
(==<<) :: Monad m => (IterateeG el m a -> IterateeGM el' m b) -> IterateeGM el m a -> IterateeGM el' m bSource
Just like an application -- a call-by-value-like application
joinI :: Monad m => IterateeGM el m (IterateeG el' m a) -> IterateeGM el m aSource
The following is a variant
of join in the IterateeGM el m monad.
When el' is the same as el, the type of joinI is indeed that of
true monadic join. However, joinI is subtly different: since
generally el' is different from el, it makes no sense to
continue using the internal, IterateeG el' m a: we no longer
have elements of the type el' to feed to that iteratee.
We thus send EOF to the internal Iteratee and propagate its result.
This join function is useful when dealing with `derived iteratees'
for embedded/nested streams. In particular, joinI is useful to
process the result of stake, map_stream, or conv_stream below.
stream2list :: Monad m => IterateeGM el m [el]Source
Read a stream to the end and return all of its elements as a list
iter_report_err :: Monad m => IterateeGM el m (Maybe String)Source
Check to see if the stream is in error
sbreak :: Monad m => (el -> Bool) -> IterateeGM el m ([el], Maybe el)Source
The analogue of List.break
It takes an element predicate and returns a pair:
(str, Just c) -- the element c
is the first element of the stream
satisfying the break predicate;
The list str is the prefix of the stream up
to but including c
(str,Nothing) -- The stream is terminated with EOF or error before
any element satisfying the break predicate was found.
str is the scanned part of the stream.
None of the element in str satisfy the break predicate.
sdropWhile :: Monad m => (el -> Bool) -> IterateeGM el m ()Source
A particular optimized case of the above: skip all elements of the stream satisfying the given predicate -- until the first element that does not satisfy the predicate, or the end of the stream. This is the analogue of List.dropWhile
snext :: Monad m => IterateeGM el m (Maybe el)Source
Attempt to read the next element of the stream Return (Just c) if successful, return Nothing if the stream is terminated (by EOF or an error)
speek :: Monad m => IterateeGM el m (Maybe el)Source
Look ahead at the next element of the stream, without removing it from the stream. Return (Just c) if successful, return Nothing if the stream is terminated (by EOF or an error)
skip_till_eof :: Monad m => IterateeGM el m ()Source
Skip the rest of the stream
sdrop :: Monad m => Int -> IterateeGM el m ()Source
Skip n elements of the stream, if there are that many This is the analogue of List.drop
type EnumeratorN el_outer el_inner m a = IterateeG el_inner m a -> IterateeGM el_outer m (IterateeG el_inner m a)Source
Iteratee converters for stream embedding
The converters show a different way of composing two iteratees:
vertical
rather than horizontal
The type of the converter from the stream with elements el_outer to the stream with element el_inner. The result is the iteratee for the outer stream that uses an `IterateeG el_inner m a' to process the embedded, inner stream as it reads the outer stream.
stake :: Monad m => Int -> EnumeratorN el el m aSource
Read n elements from a stream and apply the given iteratee to the stream of the read elements. Unless the stream is terminated early, we read exactly n elements (even if the iteratee has accepted fewer).
map_stream :: Monad m => (el -> el') -> EnumeratorN el el' m aSource
Map the stream: yet another iteratee transformer Given the stream of elements of the type el and the function el->el', build a nested stream of elements of the type el' and apply the given iteratee to it. Note the contravariance
conv_stream :: Monad m => IterateeGM el m (Maybe [el']) -> EnumeratorN el el' m aSource
Convert one stream into another, not necessarily in lockstep
The transformer map_stream maps one element of the outer stream
to one element of the nested stream. The transformer below is more
general: it may take several elements of the outer stream to produce
one element of the inner stream, or the other way around.
The transformation from one stream to the other is specified as
IterateeGM el m (Maybe [el']). The Maybe
type reflects the
possibility of the conversion error.
Combining the primitive iteratees to solve the running problem: Reading headers and the content from an HTTP-like stream
line :: Monad m => IterateeM m (Either Line Line)Source
Read the line of text from the stream The line can be terminated by CR, LF or CRLF. Return (Right Line) if successful. Return (Left Line) if EOF or a stream error were encountered before the terminator is seen. The returned line is the string read so far.
The code is the same as that of pure Iteratee, only the signature has changed. Compare the code below with GHCBufferIO.line_lazy
print_lines :: IterateeGM Line IO ()Source
Line iteratees: processors of a stream whose elements are made of Lines
Collect all read lines and return them as a list see stream2list
Print lines as they are received. This is the first impure
iteratee
with non-trivial actions during chunk processing
enum_lines :: Monad m => EnumeratorN Char Line m aSource
Convert the stream of characters to the stream of lines, and apply the given iteratee to enumerate the latter. The stream of lines is normally terminated by the empty line. When the stream of characters is terminated, the stream of lines is also terminated, abnormally. This is the first proper iteratee-enumerator: it is the iteratee of the character stream and the enumerator of the line stream. More generally, we could have used conv_stream to implement enum_lines.
enum_words :: Monad m => EnumeratorN Char String m aSource
Convert the stream of characters to the stream of words, and apply the given iteratee to enumerate the latter. Words are delimited by white space. This is the analogue of List.words It is instructive to compare the code below with the code of List.words, which is:
words :: String -> [String] words s = case dropWhile isSpace s of "" -> [] s' -> w : words s'' where (w, s'') = break isSpace s'
One should keep in mind that enum_words is a more general, monadic function. More generally, we could have used conv_stream to implement enum_words.
type EnumeratorGM el m a = IterateeG el m a -> IterateeGM el m aSource
Enumerators Each enumerator takes an iteratee and returns an iteratee an Enumerator is an iteratee transformer. The enumerator normally stops when the stream is terminated or when the iteratee moves to the done state, whichever comes first. When to stop is of course up to the enumerator...
We have two choices of composition: compose iteratees or compose enumerators. The latter is useful when one iteratee reads from the concatenation of two data sources.
type EnumeratorM m a = EnumeratorGM Char m aSource
enum_eof :: Monad m => EnumeratorGM el m aSource
The most primitive enumerator: applies the iteratee to the terminated stream. The result is the iteratee usually in the done state.
enum_err :: Monad m => String -> EnumeratorGM el m aSource
Another primitive enumerator: report an error
(>.) :: Monad m => EnumeratorGM el m a -> EnumeratorGM el m a -> EnumeratorGM el m aSource
The composition of two enumerators: essentially the functional composition It is convenient to flip the order of the arguments of the composition though: in e1 >. e2, e1 is executed first
enum_pure_1chunk :: Monad m => [el] -> EnumeratorGM el m aSource
The pure 1-chunk enumerator It passes a given list of elements to the iteratee in one chunk This enumerator does no IO and is useful for testing of base parsing
enum_pure_nchunk :: Monad m => [el] -> Int -> EnumeratorGM el m aSource
The pure n-chunk enumerator It passes a given lift of elements to the iteratee in n chunks This enumerator does no IO and is useful for testing of base parsing and handling of chunk boundaries
enum_fd :: Fd -> EnumeratorM IO aSource
The enumerator of a POSIX Fd Unlike fdRead (which allocates a new buffer on each invocation), we use the same buffer all throughout
enum_file :: FilePath -> EnumeratorM IO aSource
enum_chunk_decoded :: Monad m => Iteratee m a -> IterateeM m aSource
HTTP chunk decoding Each chunk has the following format:
<chunk-size> CRLF <chunk-data> CRLF
where chunk-size is the hexadecimal number; chunk-data is a sequence of chunk-size bytes. The last chunk (so-called EOF chunk) has the format 0 CRLF CRLF (where 0 is an ASCII zero, a character with the decimal code 48). For more detail, see Chunked Transfer Coding, Sec 3.6.1 of the HTTP/1.1 standard: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1
The following enum_chunk_decoded has the signature of the enumerator of the nested (encapsulated and chunk-encoded) stream. It receives an iteratee for the embedded stream and returns the iteratee for the base, embedding stream. Thus what is an enumerator and what is an iteratee may be a matter of perspective.
We have a decision to make: Suppose an iteratee has finished (either because it obtained all needed data or encountered an error that makes further processing meaningless). While skipping the rest of the stream/the trailer, we encountered a framing error (e.g., missing CRLF after chunk data). What do we do? We chose to disregard the latter problem. Rationale: when the iteratee has finished, we are in the process of skipping up to the EOF (draining the source). Disregarding the errors seems OK then. Also, the iteratee may have found an error and decided to abort further processing. Flushing the remainder of the input is reasonable then. One can make a different choice...