# tokenizer-streaming

**Motivation**: You might have stumbled upon the package tokenizer-monad, another project of mine for writing tokenizers that act on pure text/strings. However, there are situations in which you cannot keep all of the text in memory, e.g. when you want to tokenize text from network streams or from large corpus files.

**Main idea**: A monad transformer called `TokenizerT` implements exactly the same methods as `Tokenizer` from tokenizer-monad, so that all tokenizers can be ported without code changes (provided you used `MonadTokenizer` in their type signatures). A sketch of this follows the list below.

## Supported text types

- streams of Char lists can be tokenized into streams of Char lists
- streams of strict Text can be tokenized into streams of strict Text
- streams of lazy Text can be tokenized into streams of lazy Text
- streams of strict ASCII ByteStrings can be tokenized into streams of strict ASCII ByteStrings
- streams of lazy ASCII ByteStrings can be tokenized into streams of lazy ASCII ByteStrings
- bytestring streams (from streaming-bytestring) with Unicode encodings (UTF-8, UTF-16 LE & BE, UTF-32 LE & BE) can be tokenized into streams of strict Text
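As a rough sketch of the portability claim above: a whitespace tokenizer written once against the `MonadTokenizer` class, assuming tokenizer-monad's documented primitives (`untilEOT`, `peek`, `walk`, `walkWhile`, `emit`, `discard`, `runTokenizer`). The streaming runner shown in the trailing comment is a guess at this package's API, not a verified signature.

```haskell
import Control.Monad.Tokenizer  -- from tokenizer-monad
import Data.Char (isSpace)
import qualified Data.Text as T

-- Polymorphic over MonadTokenizer, not tied to a concrete monad:
-- this is what lets the same tokenizer run in both the pure and
-- the streaming setting.
whitespaceTokenizer :: MonadTokenizer m => m ()
whitespaceTokenizer = untilEOT $ do
  c <- peek
  walk
  if isSpace c
    then discard                            -- drop the whitespace
    else walkWhile (not . isSpace) >> emit  -- emit the whole word

-- Pure usage via tokenizer-monad:
pureTokens :: T.Text -> [T.Text]
pureTokens = runTokenizer whitespaceTokenizer

-- Streaming usage would plug the very same tokenizer into TokenizerT's
-- runner; the name and signature below are assumptions for illustration
-- (Stream/Of come from the streaming package):
-- streamTokens :: Monad m => Stream (Of T.Text) m r -> Stream (Of T.Text) m r
-- streamTokens = runTokenizerT whitespaceTokenizer
```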