Z-Data-0.7.2.0: Array, vector and text
Copyright(c) Dong Han 2017-2018
LicenseBSD
Maintainerwinterland1989@gmail.com
Stabilityexperimental
Portabilitynon-portable
Safe HaskellNone
LanguageHaskell2010

Z.Data.Text.UTF8Codec

Description

UTF-8 codecs and helpers.

Synopsis

Documentation

encodeCharLength :: Char -> Int Source #

Return a codepoint's encoded length in bytes

If the codepoint is invalid, we return 3(encoded bytes length of replacement char U+FFFD).

encodeChar :: PrimMonad m => MutablePrimArray (PrimState m) Word8 -> Int -> Char -> m Int Source #

Encode a Char into bytes, write replacementChar for invalid unicode codepoint.

This function assumed there're enough space for encoded bytes, and return the advanced index.

encodeChar# :: MutableByteArray# s -> Int# -> Char# -> State# s -> (# State# s, Int# #) Source #

The unboxed version of encodeChar.

This function is marked as NOINLINE to reduce code size, and stop messing up simplifier due to too much branches.

encodeCharModifiedUTF8 :: PrimMonad m => MutablePrimArray (PrimState m) Word8 -> Int -> Char -> m Int Source #

Encode a Char into bytes with non-standard UTF-8 encoding(Used in Data.CBytes).

\NUL is encoded as two bytes C0 80 , \xD800 ~ \xDFFF is encoded as a three bytes normal UTF-8 codepoint. This function assumed there're enough space for encoded bytes, and return the advanced index.

decodeChar :: PrimArray Word8 -> Int -> (# Char, Int #) Source #

Decode a Char from bytes

This function assumed all bytes are UTF-8 encoded, and the index param point to the beginning of a codepoint, the decoded character and the advancing offset are returned.

It's annoying to use unboxed tuple here but we really don't want allocation even if GHC can't optimize it away.

decodeChar# :: ByteArray# -> Int# -> (# Char#, Int# #) Source #

The unboxed version of decodeChar

This function is marked as NOINLINE to reduce code size, and stop messing up simplifier due to too much branches.

decodeCharLen :: PrimArray Word8 -> Int -> Int Source #

Decode a codepoint's length in bytes

This function assumed all bytes are UTF-8 encoded, and the index param point to the beginning of a codepoint.

decodeCharLen# :: ByteArray# -> Int# -> Int# Source #

The unboxed version of decodeCharLen

This function is marked as NOINLINE to reduce code size, and stop messing up simplifier due to too much branches.

decodeCharReverse :: PrimArray Word8 -> Int -> (# Char, Int #) Source #

Decode a Char from bytes in rerverse order.

This function assumed all bytes are UTF-8 encoded, and the index param point to the end of a codepoint, the decoded character and the backward advancing offset are returned.

decodeCharReverse# :: ByteArray# -> Int# -> (# Char#, Int# #) Source #

The unboxed version of decodeCharReverse

This function is marked as NOINLINE to reduce code size, and stop messing up simplifier due to too much branches.

decodeCharLenReverse :: PrimArray Word8 -> Int -> Int Source #

Decode a codepoint's length in bytes in reverse order.

This function assumed all bytes are UTF-8 encoded, and the index param point to the end of a codepoint.

decodeCharLenReverse# :: ByteArray# -> Int# -> Int# Source #

The unboxed version of decodeCharLenReverse

This function is marked as NOINLINE to reduce code size, and stop messing up simplifier due to too much branches.

copyChar :: Int -> MutablePrimArray s Word8 -> Int -> PrimArray Word8 -> Int -> ST s () Source #

Unrolled copy loop for copying a utf8-encoded codepoint from source array to target array.

copyChar' :: Int -> MutablePrimArray s Word8 -> Int -> MutablePrimArray s Word8 -> Int -> ST s () Source #

Unrolled copy loop for copying a utf8-encoded codepoint from source array to target array.

replacementChar :: Char Source #

xFFFD, which will be encoded as 0xEF 0xBF 0xBD 3 bytes.