Portability | GHC |
---|---|
Stability | experimental |
Maintainer | bos@serpentine.com |
Safe Haskell | None |
String collation functions for Unicode, implemented as bindings to the International Components for Unicode (ICU) libraries.
- data MCollator
- data Attribute
- data AlternateHandling
- = NonIgnorable
- | Shifted
- data CaseFirst
- = UpperFirst
- | LowerFirst
- data Strength
- = Primary
- | Secondary
- | Tertiary
- | Quaternary
- | Identical
- open :: LocaleName -> IO MCollator
- collate :: MCollator -> Text -> Text -> IO Ordering
- collateIter :: MCollator -> CharIterator -> CharIterator -> IO Ordering
- equals :: MCollator -> MCollator -> IO Bool
- getAttribute :: MCollator -> Attribute -> IO Attribute
- setAttribute :: MCollator -> Attribute -> IO ()
- sortKey :: MCollator -> Text -> IO ByteString
- clone :: MCollator -> IO MCollator
- freeze :: MCollator -> IO Collator
Unicode collation API
data Attribute
Constructor | Description |
---|---|
French Bool | Direction of secondary weights, used in French: when True, secondary (accent) differences are compared from the end of the string backwards. |
AlternateHandling AlternateHandling | For handling variable elements. |
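CaseFirst (Maybe CaseFirst) | Control the ordering of upper and lower case letters. If Nothing (the default), upper and lower case letters are ordered according to their tertiary weights. |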
CaseLevel Bool | Controls whether an extra case level (positioned before the third level) is generated. When False (the default), no case level is generated; when True, it is. |
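NormalizationMode Bool | Controls whether the normalization check and any necessary normalizations are performed. When False (the default for most locales), no check is performed; when True, the input is checked incrementally and normalized where necessary. |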
Strength Strength | Sets the comparison strength (see Strength). |
HiraganaQuaternaryMode Bool | When turned on, this attribute positions Hiragana before all non-ignorables on quaternary level. This is a sneaky way to produce JIS sort order. |
Numeric Bool | When enabled, this attribute generates a collation key for the numeric value of substrings of digits. This is a way to get '100' to sort after '2'. |
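A minimal sketch of setting and reading attributes, assuming the text-icu package is installed and that `LocaleName(..)` (used here as `Locale "en"`) is imported from `Data.Text.ICU`. The helper `compareWithNumeric` is hypothetical; it toggles the `Numeric` attribute to reproduce the '100' versus '2' behaviour described above. The `getAttribute` call assumes that the value inside its argument only selects which attribute to read.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming the text-icu package; LocaleName(..) is assumed to be
-- importable from Data.Text.ICU.
import qualified Data.Text as T
import Data.Text.ICU (LocaleName (..))
import qualified Data.Text.ICU.Collate as C

-- Hypothetical helper (not part of text-icu): compare two strings with an
-- English collator, with the Numeric attribute switched on or off.
compareWithNumeric :: Bool -> T.Text -> T.Text -> IO Ordering
compareWithNumeric numeric a b = do
  c <- C.open (Locale "en")
  C.setAttribute c (C.Numeric numeric)
  C.collate c a b

main :: IO ()
main = do
  plain <- compareWithNumeric False "2" "100"  -- digit by digit: '2' > '1'
  numeric <- compareWithNumeric True "2" "100" -- by value: 2 < 100
  print (plain, numeric)                       -- expected: (GT,LT)
  -- Assumption: getAttribute uses its argument only as a key, so the Bool
  -- passed here is ignored and the collator's current setting comes back.
  c <- C.open (Locale "en")
  current <- C.getAttribute c (C.Numeric False)
  case current of
    C.Numeric on -> putStrLn ("Numeric is " ++ show on)
    _ -> putStrLn "unexpected attribute"
```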
data AlternateHandling
Control the handling of variable weight elements.
Constructor | Description |
---|---|
NonIgnorable | Treat all codepoints with non-ignorable primary weights in the same way. |
Shifted | Cause codepoints with primary weights that are equal to or below the variable top value to be ignored on primary level and moved to the quaternary level. |
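The difference between the two constructors is easiest to see with punctuation. This sketch assumes the text-icu package and ICU's default variable top, under which the hyphen is a variable element; `compareAlternate` is a hypothetical helper that leaves every other attribute, including the default tertiary strength, untouched.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming the text-icu package and ICU's default variable top.
import qualified Data.Text as T
import Data.Text.ICU (LocaleName (..))
import qualified Data.Text.ICU.Collate as C

-- Hypothetical helper: compare two strings under the given alternate
-- handling, keeping every other attribute at the locale default.
compareAlternate :: C.AlternateHandling -> T.Text -> T.Text -> IO Ordering
compareAlternate handling a b = do
  c <- C.open (Locale "en")
  C.setAttribute c (C.AlternateHandling handling)
  C.collate c a b

main :: IO ()
main = do
  -- NonIgnorable: the hyphen carries a primary weight, so the strings differ.
  nonIgnorable <- compareAlternate C.NonIgnorable "e-mail" "email"
  -- Shifted: the hyphen is moved to the quaternary level, which the default
  -- tertiary strength never examines, so the strings compare equal.
  shifted <- compareAlternate C.Shifted "e-mail" "email"
  print (nonIgnorable, shifted)
```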
data CaseFirst
Control the ordering of upper and lower case letters.
Constructor | Description |
---|---|
UpperFirst | Force upper case letters to sort before lower case. |
LowerFirst | Force lower case letters to sort before upper case. |
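A short sketch of the CaseFirst attribute, assuming the text-icu package; `compareCaseFirst` is a hypothetical helper that wraps the chosen value in `Just`, or passes `Nothing` to restore the default ordering.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming the text-icu package.
import qualified Data.Text as T
import Data.Text.ICU (LocaleName (..))
import qualified Data.Text.ICU.Collate as C

-- Hypothetical helper: compare two strings with an explicit case ordering.
compareCaseFirst :: Maybe C.CaseFirst -> T.Text -> T.Text -> IO Ordering
compareCaseFirst cf a b = do
  c <- C.open (Locale "en")
  C.setAttribute c (C.CaseFirst cf)
  C.collate c a b

main :: IO ()
main = do
  upper <- compareCaseFirst (Just C.UpperFirst) "A" "a"  -- "A" sorts first
  lower <- compareCaseFirst (Just C.LowerFirst) "A" "a"  -- "a" sorts first
  print (upper, lower)                                   -- expected: (LT,GT)
```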
data Strength
The strength attribute. The usual strength for most locales (except Japanese) is tertiary. Quaternary strength is useful when combined with the Shifted setting of the AlternateHandling attribute, and for JIS X 4061 collation, where it is used to distinguish between Katakana and Hiragana (this is achieved by setting HiraganaQuaternaryMode to True). Otherwise, the quaternary level is affected only by the number of non-ignorable code points in the string. Identical strength is rarely useful, as it amounts to comparing the code points of the NFD form of the string.
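Each strength level decides which differences count. In this sketch (assuming the text-icu package; `compareAtStrength` is a hypothetical helper), Primary looks only at base letters, so case and accents are ignored, while Secondary already distinguishes accents.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming the text-icu package; the source file is UTF-8 encoded.
import qualified Data.Text as T
import Data.Text.ICU (LocaleName (..))
import qualified Data.Text.ICU.Collate as C

-- Hypothetical helper: compare two strings at the given strength.
compareAtStrength :: C.Strength -> T.Text -> T.Text -> IO Ordering
compareAtStrength s a b = do
  c <- C.open (Locale "en")
  C.setAttribute c (C.Strength s)
  C.collate c a b

main :: IO ()
main = do
  -- Primary: base letters only, so case and accents are both ignored.
  primary <- compareAtStrength C.Primary "resume" "Résumé"
  -- Secondary: the accents now make a difference.
  secondary <- compareAtStrength C.Secondary "resume" "Résumé"
  -- Tertiary: accents and case both count.
  tertiary <- compareAtStrength C.Tertiary "resume" "Résumé"
  print (primary, secondary, tertiary)
```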
Functions
open
:: LocaleName | The locale containing the required collation rules. |
-> IO MCollator |
Open a mutable collator (MCollator) for comparing strings.
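A minimal sketch of opening collators for different locales and comparing with `collate` (listed in the synopsis above). It assumes the text-icu package, with `LocaleName(..)` and the pure `collate :: Collator -> Text -> Text -> Ordering` taken from `Data.Text.ICU`. The helpers `collateIn` and `sortIn` are hypothetical; `sortIn` uses `freeze` because `sortBy` needs a pure comparison function.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming the text-icu package.
import Data.List (sortBy)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.ICU (LocaleName (..))
import qualified Data.Text.ICU as ICU
import qualified Data.Text.ICU.Collate as C

-- Hypothetical helper: open a collator for a locale and compare two strings.
collateIn :: LocaleName -> T.Text -> T.Text -> IO Ordering
collateIn loc a b = do
  c <- C.open loc
  C.collate c a b

-- Hypothetical helper: attributes could be set on the mutable collator here;
-- freezing it yields an immutable Collator usable from pure code.
sortIn :: LocaleName -> [T.Text] -> IO [T.Text]
sortIn loc ws = do
  mc <- C.open loc
  frozen <- C.freeze mc
  pure (sortBy (ICU.collate frozen) ws)

main :: IO ()
main = do
  -- Swedish sorts "ä" after "z"; English treats it as an accented "a".
  sv <- collateIn (Locale "sv") "zebra" "ärla"
  en <- collateIn (Locale "en") "zebra" "ärla"
  print (sv, en) -- expected: (LT,GT)
  sorted <- sortIn (Locale "en") ["cherry", "apple", "banana"]
  mapM_ TIO.putStrLn sorted
```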
collateIter :: MCollator -> CharIterator -> CharIterator -> IO Ordering
Compare two CharIterators.
If either iterator was constructed from a ByteString, it does not need to be copied or converted internally, so this function can be quite cheap.
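A sketch of comparing raw UTF-8 bytes against a Text value without decoding the bytes first. It assumes the text-icu package and that `fromUtf8 :: ByteString -> CharIterator` and `fromText :: Text -> CharIterator` are exported from `Data.Text.ICU`; `compareUtf8` is a hypothetical helper, and the encoded literal stands in for bytes read from a file or socket.

```haskell
-- Sketch assuming the text-icu package re-exports fromText and fromUtf8
-- from Data.Text.ICU.
import Data.ByteString (ByteString)
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
import Data.Text.ICU (LocaleName (..), fromText, fromUtf8)
import qualified Data.Text.ICU.Collate as C

-- Hypothetical helper: compare UTF-8 bytes with a Text value directly.
compareUtf8 :: C.MCollator -> ByteString -> T.Text -> IO Ordering
compareUtf8 c bytes t = C.collateIter c (fromUtf8 bytes) (fromText t)

main :: IO ()
main = do
  c <- C.open (Locale "en")
  let bytes = encodeUtf8 (T.pack "caf\233") -- "café" encoded as UTF-8
  o <- compareUtf8 c bytes (T.pack "cafe")
  -- The accent is a secondary difference, so "café" sorts after "cafe".
  print o
```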
Utility functions
equals :: MCollator -> MCollator -> IO Bool
MCollators are considered equal if they will sort strings identically. This means that both the current attributes and the rules must be equivalent.
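Because equality covers attributes as well as rules, a clone starts out equal to its original and stops being equal once one of them is reconfigured. This sketch assumes the text-icu package and that the default strength for "en" is tertiary; `cloneThenChange` is a hypothetical helper.

```haskell
-- Sketch assuming the text-icu package.
import Data.Text.ICU (LocaleName (..))
import qualified Data.Text.ICU.Collate as C

-- Hypothetical helper: returns (equal right after cloning,
-- equal after changing the clone's strength).
cloneThenChange :: IO (Bool, Bool)
cloneThenChange = do
  original <- C.open (Locale "en")
  copy <- C.clone original
  before <- C.equals original copy            -- same rules and attributes
  C.setAttribute copy (C.Strength C.Primary)  -- modifies only the copy
  after <- C.equals original copy
  pure (before, after)

main :: IO ()
main = cloneThenChange >>= print -- expected: (True,False)
```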
sortKey :: MCollator -> Text -> IO ByteString
Create a key for sorting the Text using the given MCollator.
The result of comparing two ByteStrings that have been transformed with sortKey will be the same as the result of collate on the two untransformed Texts.
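Sort keys pay off when the same strings are compared many times: each key is computed once, and every later comparison is a plain byte comparison. A decorate-sort-undecorate sketch, assuming the text-icu package; `sortWithKeys` is a hypothetical helper.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming the text-icu package.
import Data.ByteString (ByteString)
import Data.List (sortOn)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.ICU (LocaleName (..))
import qualified Data.Text.ICU.Collate as C

-- Hypothetical helper: compute one key per string, sort on the keys
-- (ByteString's Ord is lexicographic on bytes), then drop the keys.
sortWithKeys :: C.MCollator -> [T.Text] -> IO [T.Text]
sortWithKeys c ws = do
  keys <- mapM (C.sortKey c) ws
  pure (map snd (sortOn fst (zip keys ws)))

main :: IO ()
main = do
  c <- C.open (Locale "en")
  sorted <- sortWithKeys c ["cherry", "Apple", "banana", "apple"]
  mapM_ TIO.putStrLn sorted
  -- Comparing two keys should agree with collate on the original strings.
  k1 <- C.sortKey c "apple"
  k2 <- C.sortKey c "Apple"
  direct <- C.collate c "apple" "Apple"
  print (compare (k1 :: ByteString) k2 == direct)
```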