Support for using Text data with native code via the Haskell
foreign function interface.
- data I16
- fromPtr :: Ptr Word16 -> I16 -> IO Text
- useAsPtr :: Text -> (Ptr Word16 -> I16 -> IO a) -> IO a
- asForeignPtr :: Text -> IO (ForeignPtr Word16, I16)
- peekCStringLen :: CStringLen -> IO Text
- withCStringLen :: Text -> (CStringLen -> IO a) -> IO a
- lengthWord16 :: Text -> Int
- unsafeCopyToPtr :: Text -> Ptr Word16 -> IO ()
- dropWord16 :: I16 -> Text -> Text
- takeWord16 :: I16 -> Text -> Text
Interoperability with native code
The Text type is implemented using arrays that are not guaranteed
to have a fixed address in the Haskell heap. All communication with
native code must thus occur by copying data back and forth.
The Text type's internal representation is UTF-16, using the
platform's native endianness. This makes copied data suitable for
use with native libraries that use a similar representation, such
as ICU. To interoperate with native libraries that use different
internal representations, such as UTF-8 or UTF-32, consider using
the functions in the Data.Text.Encoding module.
A type representing a number of UTF-16 code units.
Safe conversion functions
O(n) Perform an action on a temporary, mutable copy of a
Text. The copy is freed as soon as the action returns.
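As a minimal sketch of the copy-then-act pattern, the helper below reads the UTF-16 code units out of the temporary copy and returns them as an ordinary Haskell list, so nothing that points into the freed buffer escapes the callback. The name codeUnits is hypothetical, and the sketch assumes a text release earlier than 2.0, where the signatures match the synopsis above.

```haskell
import qualified Data.Text as T
import Data.Text.Foreign (useAsPtr)
import Data.Word (Word16)
import Foreign.Marshal.Array (peekArray)

-- Hypothetical helper (not part of the library): copy the code units
-- of a Text out of the temporary buffer before it is freed.
-- Assumes text < 2.0, where useAsPtr hands over a Ptr Word16 and an I16.
codeUnits :: T.Text -> IO [Word16]
codeUnits t =
  useAsPtr t $ \ptr len ->
    -- I16 is an Integral type, so fromIntegral converts it to the Int
    -- element count that peekArray expects.
    peekArray (fromIntegral len) ptr
```

Under those assumptions, codeUnits (T.pack "hi") should yield [104,105]. The pointer itself must not be returned from the callback, since the copy is gone once the action finishes.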
Encoding as UTF-8
O(n) Decode a C string with explicit length, which is assumed
to have been encoded as UTF-8. If decoding fails, a
UnicodeException is thrown.
O(n) Marshal a Text into a C string encoded as UTF-8 in temporary
storage, with explicit length information. The encoded string may
contain NUL bytes, and is not followed by a trailing NUL byte.
The temporary storage is freed when the subcomputation terminates (either normally or via an exception), so the pointer to the temporary storage must not be used after this function returns.
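The two UTF-8 functions can be combined as in the sketch below, which encodes a Text into temporary storage and decodes it again while the buffer is still alive. The helper names roundTrip and utf8Length are hypothetical and exist only for illustration.

```haskell
import qualified Data.Text as T
import Data.Text.Foreign (peekCStringLen, withCStringLen)

-- Hypothetical helper: encode to UTF-8 in temporary storage, then
-- decode straight back. peekCStringLen copies the bytes into a new
-- Text, so the result stays valid after the buffer is freed.
roundTrip :: T.Text -> IO T.Text
roundTrip t = withCStringLen t peekCStringLen

-- Hypothetical helper: report the UTF-8 byte count that accompanies
-- the pointer. Only the Int leaves the callback, never the pointer.
utf8Length :: T.Text -> IO Int
utf8Length t = withCStringLen t (\(_, n) -> pure n)
```

Because the encoded string carries no trailing NUL byte, a C function receiving it would need the explicit length as well as the pointer. A character such as U+00E9 occupies two bytes in UTF-8, so utf8Length can differ from the character count.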
Unsafe conversion code
O(1) Return the length of a Text in units of Word16. This
is useful for sizing a target array appropriately before using
unsafeCopyToPtr.
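The sizing step might look like the following sketch, which pairs lengthWord16 with unsafeCopyToPtr from the synopsis: allocate exactly as many Word16 slots as the Text occupies, copy into them, and read the result back. The name copyOut is hypothetical, and the code assumes a text release earlier than 2.0, where these functions carry the names listed above.

```haskell
import qualified Data.Text as T
import Data.Text.Foreign (lengthWord16, unsafeCopyToPtr)
import Data.Word (Word16)
import Foreign.Marshal.Array (allocaArray, peekArray)

-- Hypothetical helper: size a scratch buffer with lengthWord16, fill
-- it with unsafeCopyToPtr, and return its contents as a list.
-- Assumes text < 2.0 (UTF-16 representation).
copyOut :: T.Text -> IO [Word16]
copyOut t =
  let n = lengthWord16 t            -- code units, not characters
  in allocaArray n $ \buf -> do
       unsafeCopyToPtr t buf        -- buf must hold at least n units
       peekArray n buf
```

The length in code units can exceed the character count: a character outside the Basic Multilingual Plane, such as U+1D11E, is stored as a surrogate pair and so needs two slots. Sizing the buffer with T.length instead would under-allocate in that case.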
Foreign functions that use UTF-16 internally may return indices in
units of Word16 instead of characters. dropWord16 and takeWord16 may
safely be used with such indices, as they will adjust offsets if
necessary to preserve the validity of a Unicode string.
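A small sketch of splitting a Text at a code-unit index, as a foreign library might report it, is shown below. The helper splitAtUnit is hypothetical, and the code assumes a text release earlier than 2.0, where I16, takeWord16 and dropWord16 are exported as in the synopsis.

```haskell
import qualified Data.Text as T
import Data.Text.Foreign (I16, dropWord16, takeWord16)

-- Hypothetical helper: split at an index measured in UTF-16 code
-- units rather than characters. Assumes text < 2.0.
splitAtUnit :: I16 -> T.Text -> (T.Text, T.Text)
splitAtUnit i t = (takeWord16 i t, dropWord16 i t)
```

Since both functions adjust the offset rather than cut through a surrogate pair, the two halves are expected to concatenate back to the original Text even when the index falls in the middle of a pair. I16 has a Num instance, so a numeric literal such as 2 can be passed directly as the index.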