UTF-8-encoded Unicode characters can be parsed both forwards and backwards,
since the start of each Char is clearly marked. This Monoid accumulates
information about the characters represented and reduces that information
using a CharReducer, which is a Reducer Monoid that additionally knows
what to do with an invalidChar: a run of Word8 values that does not form
a valid UTF-8 character.
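As a rough illustration of this division of labor, here is a minimal sketch of what a CharReducer-like interface and one instance could look like. The class shape, the fromChar method, and the Replaced type are assumptions for illustration, not the package's actual API; only invalidChar is named in the text above.

```haskell
import Data.Word (Word8)

-- Hypothetical sketch of a CharReducer: a Monoid that consumes decoded
-- characters and decides what to do with invalid byte runs.
class Monoid m => CharReducer m where
  fromChar    :: Char -> m          -- assumed name, for illustration
  invalidChar :: [Word8] -> m       -- handle bytes that form no character

-- Example reducer: collect a String, replacing each invalid run
-- with the Unicode replacement character U+FFFD.
newtype Replaced = Replaced { getReplaced :: String }

instance Semigroup Replaced where
  Replaced a <> Replaced b = Replaced (a ++ b)

instance Monoid Replaced where
  mempty = Replaced ""

instance CharReducer Replaced where
  fromChar c    = Replaced [c]
  invalidChar _ = Replaced ['\xFFFD']
```

Because the reducer is itself a Monoid, results from separately parsed pieces can be combined with (<>) in any association.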
As this Monoid parses characters, it feeds them to the underlying
CharReducer. Efficient left-to-right and right-to-left traversals are
supplied so that a lazy ByteString can be parsed efficiently: it is split
into strict chunks, the traversal is batched over each chunk, and the
edges are stitched together.
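The reason both traversal directions work is that UTF-8 continuation bytes all carry the bit pattern 10xxxxxx, so a character boundary is recognizable from either side. A small sketch (the splitTrailing helper is hypothetical, not part of the package) of how a right-to-left traversal might peel off the continuation bytes dangling at a chunk edge:

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- UTF-8 continuation bytes have the form 10xxxxxx (high bits 10),
-- so the start of a character is visible scanning in either direction.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- Hypothetical helper: given a chunk's bytes in reverse order, split off
-- the trailing continuation bytes from the rest, as a backwards
-- traversal must do when a character straddles a chunk boundary.
splitTrailing :: [Word8] -> ([Word8], [Word8])
splitTrailing = span isContinuation
```

For example, the euro sign is encoded as 0xE2 0x82 0xAC; scanning its bytes in reverse, the two continuation bytes are skipped before the leader 0xE2 is found.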
Because this needs to be a Monoid and must return exactly the same result
whether it parses forwards or backwards, it accepts only canonical UTF-8,
unlike most Haskell UTF-8 parsers, which blissfully accept illegal
overlong encodings of a character.
This also closes off a class of security issues in some scenarios:
overlong encodings such as 0xC0 0xAF for '/' have been used to sneak
characters past byte-oriented validation filters.
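Canonical decoding means rejecting any sequence that encodes a code point expressible in fewer bytes. A minimal sketch for the two-byte case (decode2 is an illustrative helper, not the package's API): a two-byte sequence must encode at least U+0080, so the overlong 0xC0 0xAF is rejected rather than decoded as '/'.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Word (Word8)

-- Hypothetical canonical-only decoder for two-byte UTF-8 sequences.
decode2 :: Word8 -> Word8 -> Maybe Char
decode2 b1 b2
  | b1 .&. 0xE0 == 0xC0        -- leader of a two-byte sequence (110xxxxx)
  , b2 .&. 0xC0 == 0x80        -- continuation byte (10xxxxxx)
  , cp >= 0x80                 -- reject overlong encodings
  = Just (toEnum cp)
  | otherwise = Nothing
  where
    cp = (fromIntegral (b1 .&. 0x1F) `shiftL` 6)
         .|. fromIntegral (b2 .&. 0x3F)
```

A lenient decoder would map 0xC0 0xAF to '/' as well, which is exactly the behavior that lets attacker-supplied bytes evade a filter that only checks for the canonical 0x2F byte.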
NB: Due to the naive use of a list to track the tail of an unfinished
character, this may exhibit O(n^2) behavior when parsing backwards over an
invalid sequence consisting of a large number of bytes that all claim to
be in the tail of a character.