License | BSD-style |
---|---|
Maintainer | Vincent Hanquez <vincent@snarc.org> |
Stability | experimental |
Portability | portable |
Safe Haskell | None |
Language | Haskell2010 |
Opaque packed String encoded in UTF8.
The type is an instance of IsString and IsList, which allow OverloadedStrings
for string literal, and fromList
to convert a [Char] (Prelude String) to a packed
representation
{-# LANGUAGE OverloadedStrings #-} s = "Hello World" :: String
s = fromList ("Hello World" :: Prelude.String) :: String
Each unicode code point is represented by a variable encoding of 1 to 4 bytes,
For more information about UTF8: https://en.wikipedia.org/wiki/UTF-8
- data String
- data Encoding
- = ASCII7
- | UTF8
- | UTF16
- | UTF32
- | ISO_8859_1
- fromBytes :: Encoding -> UArray Word8 -> (String, Maybe ValidationFailure, UArray Word8)
- fromBytesLenient :: UArray Word8 -> (String, UArray Word8)
- fromBytesUnsafe :: UArray Word8 -> String
- toBytes :: Encoding -> String -> UArray Word8
- data ValidationFailure
- lines :: String -> [String]
- words :: String -> [String]
- upper :: String -> String
- lower :: String -> String
- replace :: String -> String -> String -> String
- indices :: String -> String -> [Offset8]
- toBase64 :: String -> String
- toBase64URL :: Bool -> String -> String
- toBase64OpenBSD :: String -> String
- breakLine :: String -> Either Bool (String, String)
Documentation
Opaque packed array of characters in the UTF8 encoding
IsList String Source # | |
Eq String Source # | |
Data String Source # | |
Ord String Source # | |
Show String Source # | |
IsString String Source # | |
Monoid String Source # | |
NormalForm String Source # | |
Copy String Source # | |
Buildable String Source # | |
Collection String Source # | |
IndexedCollection String Source # | |
InnerFunctor String Source # | |
Sequential String Source # | |
Zippable String Source # | |
ParserSource String Source # | |
Hashable String Source # | |
Arbitrary String Source # | |
IsProperty (String, Bool) Source # | |
type Item String Source # | |
type Element String Source # | |
type Mutable String Source # | |
type Step String Source # | |
type Chunk String Source # | |
Various String Encoding that can be use to convert to and from bytes
fromBytes :: Encoding -> UArray Word8 -> (String, Maybe ValidationFailure, UArray Word8) Source #
Convert a ByteArray to a string assuming a specific encoding.
It returns a 3-tuple of:
- The string that has been succesfully converted without any error
- An optional validation error
- The remaining buffer that hasn't been processed (either as a result of an error, or because the encoded sequence is not fully available)
Considering a stream of data that is fetched chunk by chunk, it's valid to assume that some sequence might fall in a chunk boundary. When converting chunks, if the error is Nothing and the remaining buffer is not empty, then this buffer need to be prepended to the next chunk
fromBytesLenient :: UArray Word8 -> (String, UArray Word8) Source #
Convert a UTF8 array of bytes to a String.
If there's any error in the stream, it will automatically insert replacement bytes to replace invalid sequences.
In the case of sequence that fall in the middle of 2 chunks, the remaining buffer is supposed to be preprended to the next chunk, and resume the parsing.
fromBytesUnsafe :: UArray Word8 -> String Source #
Convert a Byte Array representing UTF8 data directly to a string without checking for UTF8 validity
If the input contains invalid sequences, it will trigger runtime async errors when processing data.
In doubt, use fromBytes
toBytes :: Encoding -> String -> UArray Word8 Source #
Convert a String to a bytearray in a specific encoding
if the encoding is UTF8, the underlying buffer is returned without extra allocation or any processing
In any other encoding, some allocation and processing are done to convert.
data ValidationFailure Source #
Possible failure related to validating bytes of UTF8 sequences.
lines :: String -> [String] Source #
Split lines in a string using newline as separation.
Note that carriage return preceding a newline are also strip for maximum compatibility between Windows and Unix system.
upper :: String -> String Source #
Convert a String
to the upper-case equivalent.
Does not properly support multicharacter Unicode conversions.
lower :: String -> String Source #
Convert a String
to the upper-case equivalent.
Does not properly support multicharacter Unicode conversions.
replace :: String -> String -> String -> String Source #
Replace all the occurrencies of needle
with replacement
in
the haystack
string.
toBase64URL :: Bool -> String -> String Source #
Transform string src
to URL-safe base64 binary representation.
The result will be either padded or unpadded, depending on the boolean
padded
argument.
toBase64OpenBSD :: String -> String Source #
Transform string src
to OpenBSD base64 binary representation.
breakLine :: String -> Either Bool (String, String) Source #
Same as break but cut on a line feed with an optional carriage return.
This is the same operation as 'breakElem LF' dropping the last character of the string if it's a CR.
Also for efficiency reason (streaming), it returns if the last character was a CR character.