# tokenizer-streaming

**Motivation**: You might have stumbled upon the package tokenizer-monad, another project of mine for writing tokenizers that act on pure text/strings. However, there are situations in which you cannot keep all of the text in memory, e.g. when you want to tokenize text from network streams or from large corpus files.

**Main idea**: A monad transformer called `TokenizerT` implements exactly the same methods as `Tokenizer` from tokenizer-monad, so that all tokenizers can be ported without code changes (provided you used `MonadTokenizer` in their type signatures). A sketch of this follows the list below.

## Supported text types

- streams of Char lists can be tokenized into streams of Char lists
- streams of strict Text can be tokenized into streams of strict Text
- streams of lazy Text can be tokenized into streams of lazy Text
- streams of strict ASCII ByteStrings can be tokenized into streams of strict ASCII ByteStrings
- streams of lazy ASCII ByteStrings can be tokenized into streams of lazy ASCII ByteStrings
- bytestring streams (from streaming-bytestring) with Unicode encodings (UTF-8, UTF-16 LE & BE, UTF-32 LE & BE) can be tokenized into streams of strict Text
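As a rough sketch of the portability claim above: a whitespace tokenizer written once against the `MonadTokenizer` class, assuming tokenizer-monad's documented primitives (`untilEOT`, `peek`, `walk`, `walkWhile`, `emit`, `discard`, `runTokenizer`). The streaming runner shown in the trailing comment is a guess at this package's API, not a verified signature.

```haskell
import Control.Monad.Tokenizer  -- from tokenizer-monad
import Data.Char (isSpace)
import qualified Data.Text as T

-- Polymorphic over MonadTokenizer, not tied to a concrete monad:
-- this is what lets the same tokenizer run in both the pure and
-- the streaming setting.
whitespaceTokenizer :: MonadTokenizer m => m ()
whitespaceTokenizer = untilEOT $ do
  c <- peek
  walk
  if isSpace c
    then discard                            -- drop the whitespace
    else walkWhile (not . isSpace) >> emit  -- emit the whole word

-- Pure usage via tokenizer-monad:
pureTokens :: T.Text -> [T.Text]
pureTokens = runTokenizer whitespaceTokenizer

-- Streaming usage would plug the very same tokenizer into TokenizerT's
-- runner; the name and signature below are assumptions for illustration
-- (Stream/Of come from the streaming package):
-- streamTokens :: Monad m => Stream (Of T.Text) m r -> Stream (Of T.Text) m r
-- streamTokens = runTokenizerT whitespaceTokenizer
```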