compact-string-0.3.1: Fast, packed and strict strings with Unicode support, based on bytestrings.

Portabilityuntested
Stabilityexperimental
Maintainertwanvl@gmail.com

Data.CompactString.UTF8

Contents

Description

CompactString specialized to UTF-8.

This module can be used to reduce the need for type signatures, since in most cases only a single encoding is used.

Synopsis

The CompactString type

type CompactString = CompactString UTF8Source

CompactString specialized to UTF-8.

Introducing and eliminating CompactStrings

pack :: String -> CompactStringSource

O(n) Convert a String into a CompactString.

Basic interface

cons :: Char -> CompactString -> CompactStringSource

O(n) cons is analogous to (:) for lists, but of different complexity, as it requires a memcpy.

snoc :: CompactString -> Char -> CompactStringSource

O(n) Append a byte to the end of a CompactString

append :: CompactString -> CompactString -> CompactStringSource

O(n) Append two CompactStrings

head :: CompactString -> CharSource

O(1) Extract the first element of a CompactString, which must be non-empty. An exception will be thrown in the case of an empty CompactString.

last :: CompactString -> CharSource

O(1) Extract the last element of a ByteString, which must be finite and non-empty. An exception will be thrown in the case of an empty ByteString.

tail :: CompactString -> CompactStringSource

O(1) Extract the elements after the head of a CompactString, which must be non-empty. An exception will be thrown in the case of an empty CompactString.

init :: CompactString -> CompactStringSource

O(1) Return all the elements of a CompactString except the last one. An exception will be thrown in the case of an empty ByteString.

headView :: CompactString -> Maybe (Char, CompactString)Source

O(1) A view of the front of a CompactString.

 headView s = if null s then Nothing else Just (head s, tail s)

lastView :: CompactString -> Maybe (CompactString, Char)Source

O(1) A view of the back of a CompactString.

 lastView s = if null s then Nothing else Just (init s, last s)

null :: CompactString -> BoolSource

O(1) Test whether a CompactString is empty.

length :: CompactString -> IntSource

O(n) length returns the length of a CompactString as an Int.

Transforming CompactStrings

map :: (Char -> Char) -> CompactString -> CompactStringSource

O(n) map f xs is the CompactString obtained by applying f to each element of xs. This function is subject to array fusion.

intersperse :: Char -> CompactString -> CompactStringSource

O(n) The intersperse function takes a Char and a CompactString and `intersperses' that character between the elements of the CompactString. It is analogous to the intersperse function on Lists.

intercalate :: CompactString -> [CompactString] -> CompactStringSource

O(n) The intercalate function takes a CompactString and a list of CompactStrings and concatenates the list after interspersing the first argument between each element of the list.

transpose :: [CompactString] -> [CompactString]Source

The transpose function transposes the rows and columns of its CompactString argument.

Reducing CompactStrings (folds)

foldl :: (acc -> Char -> acc) -> acc -> CompactString -> accSource

foldl, applied to a binary operator, a starting value (typically the left-identity of the operator), and a CompactString, reduces the CompactString using the binary operator, from left to right. This function is subject to array fusion.

foldl' :: (acc -> Char -> acc) -> acc -> CompactString -> accSource

'foldl\'' is like foldl, but strict in the accumulator. Though actually foldl is also strict in the accumulator.

foldl1 :: (Char -> Char -> Char) -> CompactString -> CharSource

foldl1 is a variant of foldl that has no starting value argument, and thus must be applied to non-empty CompactString. This function is subject to array fusion. An exception will be thrown in the case of an empty CompactString.

foldl1' :: (Char -> Char -> Char) -> CompactString -> CharSource

'foldl1\'' is like foldl1, but strict in the accumulator. An exception will be thrown in the case of an empty CompactString.

foldr :: (Char -> acc -> acc) -> acc -> CompactString -> accSource

foldr, applied to a binary operator, a starting value (typically the right-identity of the operator), and a CompactString, reduces the CompactString using the binary operator, from right to left.

foldr' :: (Char -> acc -> acc) -> acc -> CompactString -> accSource

foldr, applied to a binary operator, a starting value (typically the right-identity of the operator), and a CompactString, reduces the CompactString using the binary operator, from right to left.

foldr1 :: (Char -> Char -> Char) -> CompactString -> CharSource

foldr1 is a variant of foldr that has no starting value argument, and thus must be applied to non-empty CompactStrings An exception will be thrown in the case of an empty CompactString.

foldr1' :: (Char -> Char -> Char) -> CompactString -> CharSource

'foldr1\'' is a variant of foldr1, but is strict in the accumulator. An exception will be thrown in the case of an empty CompactString.

Special folds

concat :: [CompactString] -> CompactStringSource

O(n) Concatenate a list of CompactStrings.

concatMap :: (Char -> CompactString) -> CompactString -> CompactStringSource

Map a function over a CompactString and concatenate the results

any :: (Char -> Bool) -> CompactString -> BoolSource

O(n) Applied to a predicate and a CompactString, any determines if any element of the CompactString satisfies the predicate.

all :: (Char -> Bool) -> CompactString -> BoolSource

O(n) Applied to a predicate and a CompactString, any determines if all elements of the CompactString satisfy the predicate.

maximum :: CompactString -> CharSource

O(n) maximum returns the maximum value from a CompactString An exception will be thrown in the case of an empty CompactString.

minimum :: CompactString -> CharSource

O(n) minimum returns the minimum value from a CompactString An exception will be thrown in the case of an empty CompactString.

Building CompactStrings

Scans

scanl :: (Char -> Char -> Char) -> Char -> CompactString -> CompactStringSource

scanl is similar to foldl, but returns a list of successive reduced values from the left. This function will fuse.

 scanl f z [x1, x2, ...] == [z, z `f` x1, (z `f` x1) `f` x2, ...]

Note that

 last (scanl f z xs) == foldl f z xs.

scanl1 :: (Char -> Char -> Char) -> CompactString -> CompactStringSource

scanl1 is a variant of scanl that has no starting value argument. This function will fuse.

 scanl1 f [x1, x2, ...] == [x1, x1 `f` x2, ...]

scanr :: (Char -> Char -> Char) -> Char -> CompactString -> CompactStringSource

scanr is the right-to-left dual of scanl.

scanr1 :: (Char -> Char -> Char) -> CompactString -> CompactStringSource

scanr1 is a variant of scanr that has no starting value argument.

Accumulating maps

mapAccumL :: (acc -> Char -> (acc, Char)) -> acc -> CompactString -> (acc, CompactString)Source

The mapAccumL function behaves like a combination of map and foldl; it applies a function to each element of a CompactString, passing an accumulating parameter from left to right, and returning a final value of this accumulator together with the new CompactString.

mapAccumR :: (acc -> Char -> (acc, Char)) -> acc -> CompactString -> (acc, CompactString)Source

The mapAccumR function behaves like a combination of map and foldr; it applies a function to each element of a CompactString, passing an accumulating parameter from right to left, and returning a final value of this accumulator together with the new CompactString.

mapIndexed :: (Int -> Char -> Char) -> CompactString -> CompactStringSource

O(n) map Char functions, provided with the index at each position.

Unfolding CompactStrings

replicate :: Int -> Char -> CompactStringSource

O(n) replicate n x is a CompactString of length n with x the value of every element. The following holds:

 replicate w c = unfoldr w (\u -> Just (u,u)) c

unfoldr :: (acc -> Maybe (Char, acc)) -> acc -> CompactStringSource

O(n), where n is the length of the result. The unfoldr function is analogous to the List 'unfoldr'. unfoldr builds a ByteString from a seed value. The function takes the element and returns Nothing if it is done producing the CompactString or returns Just (a,b), in which case, a is the next byte in the string, and b is the seed value for further production.

Examples:

    unfoldr (\x -> if x <= 5 then Just (x, x + 1) else Nothing) 0
 == pack [0, 1, 2, 3, 4, 5]

unfoldrN :: Int -> (acc -> Maybe (Char, acc)) -> acc -> (CompactString, Maybe acc)Source

O(n) Like unfoldr, unfoldrN builds a ByteString from a seed value. However, the length of the result is limited by the first argument to unfoldrN. This function is more efficient than unfoldr when the maximum length of the result is known.

The following equation relates unfoldrN and unfoldr:

 fst (unfoldrN n f s) == take n (unfoldr f s)

Substrings

Breaking strings

take :: Int -> CompactString -> CompactStringSource

O(n) take n, applied to a CompactString xs, returns the prefix of xs of length n, or xs itself if n > length xs.

drop :: Int -> CompactString -> CompactStringSource

O(n) drop n xs returns the suffix of xs after the first n elements, or empty if n > length xs.

splitAt :: Int -> CompactString -> (CompactString, CompactString)Source

O(n) splitAt n xs is equivalent to (take n xs, drop n xs).

takeWhile :: (Char -> Bool) -> CompactString -> CompactStringSource

takeWhile, applied to a predicate p and a CompactString xs, returns the longest prefix (possibly empty) of xs of elements that satisfy p.

dropWhile :: (Char -> Bool) -> CompactString -> CompactStringSource

dropWhile p xs returns the suffix remaining after takeWhile p xs.

span :: (Char -> Bool) -> CompactString -> (CompactString, CompactString)Source

span p xs breaks the ByteString into two segments. It is equivalent to (takeWhile p xs, dropWhile p xs)

spanEnd :: (Char -> Bool) -> CompactString -> (CompactString, CompactString)Source

spanEnd behaves like span but from the end of the CompactString

We have

 spanEnd (not.isSpace) "x y z" == ("x y ","z")

and

 spanEnd (not . isSpace) cs
    == 
 let (x,y) = span (not.isSpace) (reverse cs) in (reverse y, reverse x)

break :: (Char -> Bool) -> CompactString -> (CompactString, CompactString)Source

break p is equivalent to span (not . p).

breakEnd :: (Char -> Bool) -> CompactString -> (CompactString, CompactString)Source

breakEnd behaves like break but from the end of the CompactString

 breakEnd p == spanEnd (not.p)

group :: CompactString -> [CompactString]Source

The group function takes a CompactString and returns a list of CompactStrings such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,

 group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]

It is a special case of groupBy, which allows the programmer to supply their own equality test.

groupBy :: (Char -> Char -> Bool) -> CompactString -> [CompactString]Source

The groupBy function is the non-overloaded version of group.

inits :: CompactString -> [CompactString]Source

O(n) Return all initial segments of the given CompactString, shortest first.

tails :: CompactString -> [CompactString]Source

O(n) Return all final segments of the given CompactString, longest first.

Breaking into many substrings

split :: Char -> CompactString -> [CompactString]Source

O(n) Break a ByteString into pieces separated by the byte argument, consuming the delimiter. I.e.

 split '\n' "a\nb\nd\ne" == ["a","b","d","e"]
 split 'a'  "aXaXaXa"    == ["","X","X","X",""]
 split 'x'  "x"          == ["",""]

and

 intercalate [c] . split c == id
 split == splitWith . (==)

As for all splitting functions in this library, this function does not copy the substrings, it just constructs new CompactString that are slices of the original.

splitWith :: (Char -> Bool) -> CompactString -> [CompactString]Source

O(n) Splits a CompactString into components delimited by separators, where the predicate returns True for a separator element. The resulting components do not contain the separators. Two adjacent separators result in an empty component in the output. eg.

 splitWith (=='a') "aabbaca" == ["","","bb","c",""]
 splitWith (=='a') []        == []

Breaking into lines and words

lines :: CompactString -> [CompactString]Source

lines breaks a CompactString up into a list of CompactStrings at newline Chars. The resulting strings do not contain newlines.

words :: CompactString -> [CompactString]Source

words breaks a ByteString up into a list of words, which were delimited by Chars representing white space. And

 words = filter (not . null) . splitWith isSpace

unlines :: [CompactString] -> CompactStringSource

unlines is an inverse operation to lines. It joins lines, after appending a terminating newline to each.

unwords :: [CompactString] -> CompactStringSource

The unwords function is analogous to the unlines function, on words.

Predicates

isPrefixOf :: CompactString -> CompactString -> BoolSource

O(n) The isPrefixOf function takes two CompactString and returns True iff the first is a prefix of the second.

isSuffixOf :: CompactString -> CompactString -> BoolSource

O(n) The isSuffixOf function takes two CompactString and returns True iff the first is a suffix of the second.

The following holds:

 isSuffixOf x y == reverse x `isPrefixOf` reverse y

isInfixOfSource

Arguments

:: CompactString

String to search for.

-> CompactString

String to search in.

-> Bool 

Check whether one string is a substring of another. isInfixOf p s is equivalent to not (null (findSubstrings p s)).

Search for arbitrary substrings

findSubstringSource

Arguments

:: CompactString

String to search for.

-> CompactString

String to seach in.

-> Maybe Int 

Get the first index of a substring in another string, or Nothing if the string is not found. findSubstring p s is equivalent to listToMaybe (findSubstrings p s).

findSubstringsSource

Arguments

:: CompactString

String to search for.

-> CompactString

String to seach in.

-> [Int] 

Find the indexes of all (possibly overlapping) occurances of a substring in a string. This function uses the Knuth-Morris-Pratt string matching algorithm.

Searching CompactStrings

Searching by equality

elem :: Char -> CompactString -> BoolSource

O(n) elem is the CompactString membership predicate.

notElem :: Char -> CompactString -> BoolSource

O(n) notElem is the inverse of elem

Searching with a predicate

find :: (Char -> Bool) -> CompactString -> Maybe CharSource

O(n) The find function takes a predicate and a CompactString, and returns the first element in matching the predicate, or Nothing if there is no such element.

 find f p = case findIndex f p of Just n -> Just (p `index` n) ; _ -> Nothing

filter :: (Char -> Bool) -> CompactString -> CompactStringSource

O(n) filter, applied to a predicate and a CompactString, returns a CompactString containing those characters that satisfy the predicate. This function is subject to array fusion.

partition :: (Char -> Bool) -> CompactString -> (CompactString, CompactString)Source

O(n) partition, applied to a predicate and a CompactString, returns a pair of CompactStrings. The first containing those characters that satisfy the predicate, the second containg those that don't.

Indexing CompactStrings

index :: CompactString -> Int -> CharSource

O(n) CompactString index (subscript) operator, starting from 0.

elemIndex :: Char -> CompactString -> Maybe IntSource

O(n) The elemIndex function returns the index of the first element in the given ByteString which is equal to the query element, or Nothing if there is no such element.

elemIndices :: Char -> CompactString -> [Int]Source

O(n) The elemIndices function extends elemIndex, by returning the indices of all elements equal to the query element, in ascending order.

elemIndexEnd :: Char -> CompactString -> Maybe IntSource

O(n) The elemIndexEnd function returns the last index of the element in the given CompactString which is equal to the query element, or Nothing if there is no such element. The following holds:

 elemIndexEnd c xs == 
 (-) (length xs - 1) `fmap` elemIndex c (reverse xs)

findIndex :: (Char -> Bool) -> CompactString -> Maybe IntSource

The findIndex function takes a predicate and a CompactString and returns the index of the first element in the CompactString satisfying the predicate.

findIndexEnd :: (Char -> Bool) -> CompactString -> Maybe IntSource

O(n) The findIndexEnd function returns the last index of the element in the given CompactString which satisfies the predicate, or Nothing if there is no such element. The following holds:

 findIndexEnd c xs == 
 (-) (length xs - 1) `fmap` findIndex c (reverse xs)

findIndices :: (Char -> Bool) -> CompactString -> [Int]Source

The findIndices function extends findIndex, by returning the indices of all elements satisfying the predicate, in ascending order.

count :: Char -> CompactString -> IntSource

count returns the number of times its argument appears in the CompactString

 count c = length . elemIndices c

Zipping and unzipping CompactStrings

zip :: CompactString -> CompactString -> [(Char, Char)]Source

O(n) zip takes two ByteStrings and returns a list of corresponding pairs of bytes. If one input ByteString is short, excess elements of the longer ByteString are discarded. This is equivalent to a pair of unpack operations.

zipWith :: (Char -> Char -> b) -> CompactString -> CompactString -> [b]Source

zipWith generalises zip by zipping with the function given as the first argument, instead of a tupling function. For example, zipWith (+) is applied to two ByteStrings to produce the list of corresponding sums.

zipWith' :: (Char -> Char -> Char) -> CompactString -> CompactString -> CompactStringSource

A specialised version of zipWith for the common case of a simultaneous map over two CompactStrings, to build a 3rd. Rewrite rules are used to automatically covert zipWith into zipWith' when a pack is performed on the result of zipWith, but we also export it for convenience.

unzip :: [(Char, Char)] -> (CompactString, CompactString)Source

O(n) unzip transforms a list of pairs of bytes into a pair of CompactStrings. Note that this performs two pack operations.

Ordered CompactStrings

sort :: CompactString -> CompactStringSource

O(n log n) Sort a CompactString

Encoding

toByteString :: CompactString -> ByteStringSource

Convert a CompactString to a ByteString

fromByteString :: MonadPlus m => ByteString -> m CompactStringSource

Convert a ByteString to a CompactString. Fails if the ByteString is not a valid encoded string.

fromByteString_ :: ByteString -> CompactStringSource

Convert a ByteString to a CompactString. Raises an error if the ByteString is not a valid encoded string.

validate :: MonadPlus m => CompactString -> m CompactStringSource

Validates a CompactString. If the string is invalid, fails, otherwise returns the input.

validate_ :: CompactString -> CompactStringSource

Validates a CompactString. If the string is invalid, throws an error, otherwise returns the input.

Encoding conversion

encode :: (Encoding e, MonadPlus m) => e -> CompactString -> m ByteStringSource

Encode a CompactString to a ByteString using the given encoding.

 encode e = liftM toByteString . recode

But it might be faster for some combinations of encodings.

Fails if the string is cannot be encoded in the target encoding.

encode_ :: Encoding e => e -> CompactString -> ByteStringSource

Encode a CompactString to a ByteString using the given encoding.

 encode_ e = toByteString . recode

But it might be faster for some combinations of encodings.

Raises an error if the string is cannot be encoded in the target encoding.

decode :: (Encoding e, MonadPlus m) => e -> ByteString -> m CompactStringSource

Decode a ByteString to a CompactString using the given encoding.

 decode e = recode =<< fromByteString

but it might be faster for some combinations of encodings.

Fails if the ByteString is not a valid encoded string

decode_ :: Encoding e => e -> ByteString -> CompactStringSource

Decode a ByteString to a CompactString using the given encoding.

 decode_ e = recode_ . fromByteString_

but it might be faster for some combinations of encodings.

Raises an error if the ByteString is not a valid encoded string

encodeBOM :: (Encoding e, MonadPlus m) => e -> CompactString -> m ByteStringSource

Encode a CompactString using the given encoding, and add a Byte Order Mark. Byte Order Marks are common on Windows, but not on other platforms.

Fails if the string is cannot be encoded in the target encoding.

encodeBOM_ :: Encoding e => e -> CompactString -> ByteStringSource

Encode a CompactString using the given encoding, and add a Byte Order Mark. Byte Order Marks are common on Windows, but not on other platforms.

Raises an error if the string is cannot be encoded in the target encoding.

decodeBOM :: MonadPlus m => ByteString -> m CompactStringSource

Decode a ByteString into a CompactString, by investigating the Byte Order Mark. If there is no BOM assumes UTF-8. Fails if the input is not a valid encoded string

For portability, this function should be prefered over decode UTF8 when reading files.

decodeBOM_ :: ByteString -> CompactStringSource

Decode a ByteString into a CompactString, by investigating the Byte Order Mark. If there is no BOM assumes UTF-8. Raises an error if the input is not a valid encoded string

For portability, this function should be prefered over decode UTF8 when reading files.

I/O with CompactStrings

Standard input and output

getLine :: IO CompactStringSource

Read a line from stdin.

getContents :: IO CompactStringSource

getContents. Equivalent to hGetContents stdin

Input is assumed to be in UTF-8, this may not be appropriate.

putStr :: CompactString -> IO ()Source

Write a CompactString to stdout.

Output is written in UTF-8, this may not be appropriate.

putStrLn :: CompactString -> IO ()Source

Write a CompactString to stdout, appending a newline character.

Output is written in UTF-8, this may not be appropriate.

interact :: (CompactString -> CompactString) -> IO ()Source

The interact function takes a function of type CompactString -> CompactString as its argument. The entire input from the standard input device is passed to this function as its argument, and the resulting string is output on the standard output device. It's great for writing one line programs!

Files

readFile :: FilePath -> IO CompactStringSource

Read an entire file strictly into a CompactString. This is far more efficient than reading the characters into a String and then using pack. Files are read using 'text mode' on Windows.

Files are assumed to be in UTF-8.

readFile' :: FilePath -> IO CompactStringSource

Read an entire file strictly into a CompactString. This is far more efficient than reading the characters into a String and then using pack. Files are read using 'text mode' on Windows.

The encoding of the file is determined based on a Byte Order Mark, see decodeBOM.

writeFile :: FilePath -> CompactString -> IO ()Source

Write a CompactString to a file.

Files are written using UTF-8.

writeFile' :: FilePath -> CompactString -> IO ()Source

Write a CompactString to a file.

Files are written using UTF-8. A Byte Order Mark is also written.

appendFile :: FilePath -> CompactString -> IO ()Source

Append a CompactString to a file.

Files are written using UTF-8.

appendFile' :: FilePath -> CompactString -> IO ()Source

Append a CompactString to a file.

The encoding of the file is determined based on a Byte Order Mark. If the file is empty, it is written using UTF-8 with a Byte Order Mark. If the encoding can not be determined the file is assumed to be UTF-8.

I/O with Handles

hGetLine :: Handle -> IO CompactStringSource

Read a line from a handle

hGetContents :: Handle -> IO CompactStringSource

Read entire handle contents into a CompactString.

The handle is interpreted as UTF-8.

hGetContents' :: Handle -> IO CompactStringSource

Read entire handle contents into a CompactString.

The encoding is determined based on a Byte Order Mark, see decodeBOM.

hGet :: Handle -> Int -> IO CompactStringSource

Read a CompactString directly from the specified Handle.

The handle is interpreted as UTF-8.

hGetNonBlocking :: Handle -> Int -> IO CompactStringSource

hGetNonBlocking is identical to hGet, except that it will never block waiting for data to become available, instead it returns only whatever data is available.

The handle is interpreted as UTF-8.

hPut :: Handle -> CompactString -> IO ()Source

Outputs a CompactString to the specified Handle.

Output is written in UTF-8.

hPutStr :: Handle -> CompactString -> IO ()Source

A synonym for hPut, for compatibility

hPutStrLn :: Handle -> CompactString -> IO ()Source

Write a CompactString to a handle, appending a newline byte

Output is written in UTF-8.