text-icu-0.6.3.2: Bindings to the ICU library

Portability: GHC
Stability: experimental
Maintainer: bos@serpentine.com

Data.Text.ICU.Collate

Description

String collation functions for Unicode, implemented as bindings to the International Components for Unicode (ICU) libraries.

Unicode collation API

 

data MCollator

String collator type.

data Attribute

Constructors

French Bool

Direction of secondary weights, used in French. True causes secondary weights to be considered backwards, while False treats secondary weights in the order in which they appear.

AlternateHandling AlternateHandling

Controls the handling of variable elements. NonIgnorable is the default.

CaseFirst (Maybe CaseFirst)

Controls the ordering of upper and lower case letters. Nothing (the default) orders upper and lower case letters in accordance with their tertiary weights.

CaseLevel Bool

Controls whether an extra case level (positioned before the third level) is generated or not. When False (default), case level is not generated; when True, the case level is generated. Contents of the case level are affected by the value of the CaseFirst attribute. A simple way to ignore accent differences in a string is to set the strength to Primary and enable case level.

NormalizationMode Bool

Controls whether the normalization check and any necessary normalization are performed. When False (default), no normalization check is performed, and the correctness of the result is guaranteed only if the input data is in so-called FCD form (see the ICU user's manual for more information). When True, an incremental check is performed to see whether the input data is in FCD form. If the data is not in FCD form, incremental NFD normalization is performed.

Strength Strength

HiraganaQuaternaryMode Bool

When turned on, this attribute positions Hiragana before all non-ignorables on the quaternary level. This is a sneaky way to produce JIS sort order.

Numeric Bool

When enabled, this attribute generates a collation key for the numeric value of substrings of digits. This is a way to get '100' to sort after '2'.
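As a rough illustration of the Numeric attribute, the sketch below compares "100" with "2" before and after enabling it. It assumes that LocaleName and its Root constructor are imported from Data.Text.ICU; the helper name compareNumeric is made up for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text.ICU (LocaleName(..))
import Data.Text.ICU.Collate (Attribute(..), collate, open, setAttribute)

-- Hypothetical helper: compare "100" with "2" before and after
-- enabling the Numeric attribute on the same collator.
compareNumeric :: IO (Ordering, Ordering)
compareNumeric = do
  col <- open Root                 -- root locale: default collation rules
  before <- collate col "100" "2"  -- digit by digit: '1' sorts before '2'
  setAttribute col (Numeric True)
  after <- collate col "100" "2"   -- by numeric value: 100 sorts after 2
  return (before, after)

main :: IO ()
main = compareNumeric >>= print    -- prints (LT,GT)
```

Because an MCollator is mutable, the second comparison sees the attribute change made between the two calls.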

data AlternateHandling

Control the handling of variable weight elements.

Constructors

NonIgnorable

Treat all codepoints with non-ignorable primary weights in the same way.

Shifted

Cause codepoints with primary weights that are equal to or below the variable top value to be ignored on primary level and moved to the quaternary level.
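A minimal sketch of the Shifted setting, assuming the root locale and the default strength. With variable elements such as spaces shifted to the quaternary level, the two strings below should no longer be distinguished by the space. The helper name compareShifted is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text.ICU (LocaleName(..))
import Data.Text.ICU.Collate
  (AlternateHandling(..), Attribute(..), collate, open, setAttribute)

-- Hypothetical helper: compare two strings that differ only by a space,
-- first with the default NonIgnorable handling, then with Shifted.
compareShifted :: IO (Ordering, Ordering)
compareShifted = do
  col <- open Root
  nonIgnorable <- collate col "black bird" "blackbird"
  setAttribute col (AlternateHandling Shifted)
  shifted <- collate col "black bird" "blackbird"
  return (nonIgnorable, shifted)

main :: IO ()
main = compareShifted >>= print
```

To make the shifted elements count again as a final tie-breaker, the strength would need to be raised to Quaternary.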

data CaseFirst

Control the ordering of upper and lower case letters.

Constructors

UpperFirst

Force upper case letters to sort before lower case.

LowerFirst

Force lower case letters to sort before upper case.

data Strength

The strength attribute. The usual strength for most locales (except Japanese) is tertiary. Quaternary strength is useful when combined with the Shifted setting for the alternate handling attribute, and for JIS X 4061 collation, where it is used to distinguish between Katakana and Hiragana (this is achieved by setting HiraganaQuaternaryMode to True). Otherwise, the quaternary level is affected only by the number of non-ignorable code points in the string. Identical strength is rarely useful, as it amounts to comparing the code points of the NFD form of the string.
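The effect of lowering the strength can be sketched as follows. The example assumes the Strength type has a Primary constructor (the constructor list is not shown in this page, though Primary is mentioned under CaseLevel) and uses the root locale. The helper name compareStrengths is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text.ICU (LocaleName(..))
import Data.Text.ICU.Collate
  (Attribute(..), Strength(..), collate, open, setAttribute)

-- Hypothetical helper: compare strings that differ only in case or
-- accents, at the default strength and then at Primary strength.
compareStrengths :: IO (Ordering, Ordering, Ordering)
compareStrengths = do
  col <- open Root
  caseDefault <- collate col "role" "Role"     -- case matters by default
  setAttribute col (Strength Primary)
  casePrimary <- collate col "role" "Role"     -- case ignored
  accentPrimary <- collate col "resume" "r\233sum\233"  -- accents ignored
  return (caseDefault, casePrimary, accentPrimary)

main :: IO ()
main = compareStrengths >>= print   -- prints (LT,EQ,EQ)
```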

Functions

open

Arguments

:: LocaleName

The locale containing the required collation rules.

-> IO MCollator 

Open an MCollator for comparing strings.

collate :: MCollator -> Text -> Text -> IO Ordering

Compare two strings.
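A minimal usage sketch of open and collate together, assuming LocaleName and its Root constructor come from Data.Text.ICU. The helper name compareFruit is made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text.ICU (LocaleName(..))
import Data.Text.ICU.Collate (collate, open)

-- Hypothetical helper: open a collator for the root locale and
-- compare two Text values with it.
compareFruit :: IO Ordering
compareFruit = do
  col <- open Root
  collate col "apple" "banana"

main :: IO ()
main = compareFruit >>= print   -- prints LT
```

A locale-specific collator could be requested instead of Root by passing a different LocaleName to open.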

collateIter :: MCollator -> CharIterator -> CharIterator -> IO Ordering

Compare two CharIterators.

If either iterator was constructed from a ByteString, it does not need to be copied or converted internally, so this function can be quite cheap.
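The following sketch compares a ByteString-backed iterator with a Text-backed one. It assumes that Data.Text.ICU.Iterator exports fromUtf8 and fromText for building CharIterators; neither function is documented on this page, so check that module before relying on them. The helper name compareMixed is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as B
import Data.Text.ICU (LocaleName(..))
import Data.Text.ICU.Collate (collateIter, open)
import Data.Text.ICU.Iterator (fromText, fromUtf8)  -- assumed exports

-- Hypothetical helper: compare UTF-8 bytes against a Text value
-- without first decoding the bytes into Text.
compareMixed :: IO Ordering
compareMixed = do
  col <- open Root
  let left  = fromUtf8 (B.pack "apple")  -- ASCII is valid UTF-8
      right = fromText "banana"
  collateIter col left right

main :: IO ()
main = compareMixed >>= print   -- prints LT
```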

Utility functions

equals :: MCollator -> MCollator -> IO Bool

MCollators are considered equal if they will sort strings identically. This means that both the current attributes and the rules must be equivalent.
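As a sketch of how equals might be used, the example below opens two collators for the same locale, compares them, then changes an attribute on one and compares again. Going by the description above, the second result would be expected to differ from the first, but no output is asserted here. The helper name compareCollators is hypothetical.

```haskell
import Data.Text.ICU (LocaleName(..))
import Data.Text.ICU.Collate (Attribute(..), equals, open, setAttribute)

-- Hypothetical helper: test collator equality before and after
-- changing an attribute on one of the two collators.
compareCollators :: IO (Bool, Bool)
compareCollators = do
  a <- open Root
  b <- open Root
  sameBefore <- equals a b         -- same rules, same attributes
  setAttribute b (Numeric True)    -- attributes now differ
  sameAfter <- equals a b
  return (sameBefore, sameAfter)

main :: IO ()
main = compareCollators >>= print
```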

getAttribute :: MCollator -> Attribute -> IO Attribute

Get the value of an MCollator attribute.

It is safe to provide a dummy argument to an Attribute constructor when using this function, so the following will work:

 getAttribute mcol (NormalizationMode undefined)
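Since getAttribute returns an Attribute, the current value has to be extracted by pattern matching on the constructor. A small sketch, with the hypothetical helper name normalizationEnabled:

```haskell
import Data.Text.ICU (LocaleName(..))
import Data.Text.ICU.Collate
  (Attribute(..), MCollator, getAttribute, open, setAttribute)

-- Hypothetical helper: read the NormalizationMode flag of a collator.
-- The dummy argument is never inspected, as described above.
normalizationEnabled :: MCollator -> IO Bool
normalizationEnabled col = do
  attr <- getAttribute col (NormalizationMode undefined)
  case attr of
    NormalizationMode on -> return on
    _ -> ioError (userError "unexpected attribute constructor")

main :: IO ()
main = do
  col <- open Root
  setAttribute col (NormalizationMode True)
  normalizationEnabled col >>= print
```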

setAttribute :: MCollator -> Attribute -> IO ()

Set the value of an MCollator attribute.

sortKey :: MCollator -> Text -> IO ByteString

Create a key for sorting the Text using the given MCollator. The result of comparing two ByteStrings that have been transformed with sortKey will be the same as the result of collate on the two untransformed Texts.
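Sort keys are useful when the same strings take part in many comparisons, because each key is computed once and then compared as plain bytes. A sketch under that assumption, with the hypothetical helper name sortWithKeys:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List (sortBy)
import Data.Ord (comparing)
import Data.Text (Text)
import Data.Text.ICU (LocaleName(..))
import Data.Text.ICU.Collate (MCollator, open, sortKey)

-- Hypothetical helper: compute one sort key per string, then order the
-- strings by comparing their keys with the ByteString Ord instance.
sortWithKeys :: MCollator -> [Text] -> IO [Text]
sortWithKeys col ts = do
  keys <- mapM (sortKey col) ts
  return (map snd (sortBy (comparing fst) (zip keys ts)))

main :: IO ()
main = do
  col <- open Root
  sortWithKeys col ["b", "A", "a"] >>= print   -- prints ["a","A","b"]
```

Keys are only comparable with other keys produced under the same collator settings, so they should be regenerated after any setAttribute call.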

freeze :: MCollator -> IO Collator

Make a safe copy of a mutable MCollator for use in pure code. Subsequent changes to the MCollator will not affect the state of the returned Collator.
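A sketch of one way to configure a collator in IO and then use it from pure code. It assumes that Data.Text.ICU exports a pure collate :: Collator -> Text -> Text -> Ordering, which is not documented on this page. The helper name numericSorted is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List (sortBy)
import Data.Text (Text)
import qualified Data.Text.ICU as ICU   -- assumed to export a pure collate
import Data.Text.ICU.Collate (Attribute(..), freeze, open, setAttribute)

-- Hypothetical helper: configure a mutable collator, freeze it, and
-- use the immutable copy as a pure comparison function.
numericSorted :: [Text] -> IO [Text]
numericSorted items = do
  mcol <- open ICU.Root
  setAttribute mcol (Numeric True)
  col <- freeze mcol                    -- later changes to mcol do not affect col
  return (sortBy (ICU.collate col) items)

main :: IO ()
main = numericSorted ["item10", "item2", "item1"] >>= print
-- prints ["item1","item2","item10"]
```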