text-short-0.1.1: Memory-efficient representation of Unicode text strings

Copyright: © Herbert Valerio Riedel 2017
License: BSD3
Maintainer: hvr@gnu.org
Stability: stable
Safe Haskell: Trustworthy
Language: Haskell2010

Data.Text.Short

Description

Memory-efficient representation of Unicode text strings.

The ShortText type

data ShortText

A compact representation of Unicode strings.

This type relates to Text as ShortByteString relates to ByteString by providing a more compact type. Please consult the documentation of Data.ByteString.Short for more information.

Currently, a boxed unshared Text has a memory footprint of 6 words (i.e. 48 bytes on 64-bit systems) plus 2 or 4 bytes per code-point (due to the internal UTF-16 representation). Each Text value which can share its payload with another Text requires only 4 additional words. Unlike ByteString, Text uses unpinned memory.

In comparison, the footprint of a boxed ShortText is only 4 words (i.e. 32 bytes on 64-bit systems) plus 1, 2, 3, or 4 bytes per code-point (due to the internal UTF-8 representation). It can be shown that for realistic data UTF-16 has a space overhead of 50% over UTF-8.
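The arithmetic in the two paragraphs above can be sketched with base alone. The function names below are hypothetical helpers, not part of this package, and the estimate assumes a 64-bit system and ignores any padding of the payload to a word boundary.

```haskell
import Data.Char (ord)

-- | Bytes needed to encode one code-point in UTF-8 (1 to 4).
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- | Bytes needed to encode one code-point in UTF-16 (2 or 4).
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- | Assumed word size in bytes (64-bit system).
wordSize :: Int
wordSize = 8

-- | Estimated footprint of a boxed, unshared Text: 6 words plus payload.
textFootprint :: String -> Int
textFootprint s = 6 * wordSize + sum (map utf16Bytes s)

-- | Estimated footprint of a boxed ShortText: 4 words plus payload.
shortTextFootprint :: String -> Int
shortTextFootprint s = 4 * wordSize + sum (map utf8Bytes s)
```

Under these assumptions, `textFootprint "hello"` evaluates to 58 and `shortTextFootprint "hello"` to 37.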

Instances

Eq ShortText
Ord ShortText
Read ShortText
Show ShortText
IsString ShortText

Behaviour for [U+D800 .. U+DFFF] matches the IsString instance for Text

Semigroup ShortText
Monoid ShortText
Binary ShortText

The Binary encoding matches the one for Text

NFData ShortText

Methods

rnf :: ShortText -> ()

Hashable ShortText

Basic operations

null :: ShortText -> Bool

\(\mathcal{O}(1)\) Test whether a ShortText is empty.

length :: ShortText -> Int

\(\mathcal{O}(n)\) Count the number of Unicode code-points in a ShortText.

isAscii :: ShortText -> Bool

\(\mathcal{O}(n)\) Test whether ShortText contains only ASCII code-points (i.e. only U+0000 through U+007F).
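A minimal usage sketch of the three basic operations follows. The helper `describe` is hypothetical; the import is qualified because `null` and `length` would otherwise clash with the Prelude names.

```haskell
import qualified Data.Text.Short as TS

-- | Summarise a ShortText as
-- (is it empty?, number of code-points, ASCII only?).
describe :: TS.ShortText -> (Bool, Int, Bool)
describe st = (TS.null st, TS.length st, TS.isAscii st)
```

For example, `describe (TS.fromString "abc")` is `(False, 3, True)`, while `describe (TS.fromString "caf\xE9")` is `(False, 4, False)` because U+00E9 lies outside the ASCII range.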

Conversions

String

fromString :: String -> ShortText

\(\mathcal{O}(n)\) Construct/pack from String

Note: This function is total because it replaces the (invalid) code-points U+D800 through U+DFFF with the replacement character U+FFFD.
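The replacement behaviour described in the note can be observed by round-tripping a String. This is a small sketch; `roundTrip` is a hypothetical helper name.

```haskell
import qualified Data.Text.Short as TS

-- | Convert a String to ShortText and back again.
-- Surrogate code-points (U+D800 through U+DFFF) do not survive the trip;
-- fromString replaces each of them with U+FFFD.
roundTrip :: String -> String
roundTrip = TS.toString . TS.fromString
```

With this definition, `roundTrip "abc" == "abc"` holds, whereas `roundTrip "a\xD800z"` equals `"a\xFFFDz"` rather than the input.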

toString :: ShortText -> String

\(\mathcal{O}(n)\) Convert to String

Text

fromText :: Text -> ShortText

\(\mathcal{O}(n)\) Construct ShortText from Text

This is currently not \(\mathcal{O}(1)\) because Text uses UTF-16 as its internal representation. Should Text change its internal representation to UTF-8, this operation will become \(\mathcal{O}(1)\).

toText :: ShortText -> Text

\(\mathcal{O}(n)\) Convert to Text

This is currently not \(\mathcal{O}(1)\) because Text uses UTF-16 as its internal representation. Should Text change its internal representation to UTF-8, this operation will become \(\mathcal{O}(1)\).
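One plausible usage pattern, sketched here under the assumption that values are stored compactly and only converted at API boundaries, is to keep ShortText in long-lived data and go through Text when text-processing functions are needed. The helper `viaShort` is hypothetical; each direction re-encodes the payload and so costs \(\mathcal{O}(n)\).

```haskell
import qualified Data.Text as T
import qualified Data.Text.Short as TS

-- | Convert a Text to ShortText and back.
-- Both conversions transcode between UTF-16 and UTF-8.
viaShort :: T.Text -> T.Text
viaShort = TS.toText . TS.fromText
```

For any valid Text `t`, `viaShort t == t` is expected to hold.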

ByteString

fromShortByteString :: ShortByteString -> Maybe ShortText

\(\mathcal{O}(n)\) Construct ShortText from UTF-8 encoded ShortByteString

This operation doesn't copy the input ShortByteString but it cannot be \(\mathcal{O}(1)\) because we need to validate the UTF-8 encoding.

Returns Nothing in case of invalid UTF-8 encoding.
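The validation step can be illustrated with raw bytes. In this sketch, `decodeUtf8Bytes` is a hypothetical helper that packs a list of bytes into a ShortByteString and attempts the conversion.

```haskell
import qualified Data.ByteString.Short as SBS
import qualified Data.Text.Short as TS
import Data.Word (Word8)

-- | Try to interpret raw bytes as UTF-8 and return the decoded String.
-- Yields Nothing when the bytes are not valid UTF-8.
decodeUtf8Bytes :: [Word8] -> Maybe String
decodeUtf8Bytes = fmap TS.toString . TS.fromShortByteString . SBS.pack
```

Here `decodeUtf8Bytes [0x68, 0x69]` is `Just "hi"`, while `decodeUtf8Bytes [0xFF]` is `Nothing` because the byte 0xFF never occurs in valid UTF-8.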

toShortByteString :: ShortText -> ShortByteString

\(\mathcal{O}(0)\) Converts to UTF-8 encoded ShortByteString

This operation has effectively no overhead, as it's currently merely a newtype-cast.
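Because the result is the UTF-8 payload itself, one use of this conversion is to obtain the size in bytes, as opposed to the number of code-points reported by length. The helper `utf8Size` below is a hypothetical name for this sketch.

```haskell
import qualified Data.ByteString.Short as SBS
import qualified Data.Text.Short as TS

-- | Number of bytes in the UTF-8 encoding of a ShortText.
utf8Size :: TS.ShortText -> Int
utf8Size = SBS.length . TS.toShortByteString
```

For `TS.fromString "caf\xE9"`, `utf8Size` gives 5 while `TS.length` gives 4, since U+00E9 occupies two bytes in UTF-8.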

fromByteString :: ByteString -> Maybe ShortText

\(\mathcal{O}(n)\) Construct ShortText from UTF-8 encoded ByteString

Returns Nothing in case of invalid UTF-8 encoding.

toByteString :: ShortText -> ByteString

\(\mathcal{O}(n)\) Converts to UTF-8 encoded ByteString

toBuilder :: ShortText -> Builder

Construct a Builder that encodes ShortText as UTF-8.
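As a sketch of how the Builder might be used, several ShortText values can be concatenated into one lazy ByteString without materialising intermediate strict ByteStrings. The helper `renderAll` is hypothetical, and the example assumes the Builder in question is the one from Data.ByteString.Builder.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Short as TS

-- | Render several ShortText values, in order, as one UTF-8 encoded
-- lazy ByteString.
renderAll :: [TS.ShortText] -> BL.ByteString
renderAll = B.toLazyByteString . foldMap TS.toBuilder
```

For instance, `BL.unpack (renderAll [TS.fromString "a", TS.fromString "\xE9"])` is `[0x61, 0xC3, 0xA9]`.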