Data.Char.Properties.Misc

Documentation

getCombiningClass :: Char -> Word8

getDecimalDigit :: Char -> Maybe Word8

getNumber :: Char -> Maybe Rational

Space characters and those format control characters (such as TAB, CR and LF) which should be treated by programming languages as "white space" for the purpose of parsing elements. Note: ZERO WIDTH SPACE and ZERO WIDTH NO-BREAK SPACE are not included, since their functions are restricted to line-break control. Their names are unfortunately misleading in this respect. Note: There are other senses of "whitespace" that encompass a different set of characters.

isBidiControl :: Char -> Bool

Those format control characters which have specific functions in the Bidirectional Algorithm.

isJoinControl :: Char -> Bool

Those format control characters which have specific functions for control of cursive joining and ligation.

isDash :: Char -> Bool

Those punctuation characters explicitly called out as dashes in the Unicode Standard, plus compatibility equivalents to those. Most of these have the Pd General Category, but some have the Sm General Category because of their use in mathematics.

isHyphen :: Char -> Bool

Those dashes used to mark connections between pieces of words, plus the Katakana middle dot. The Katakana middle dot functions like a hyphen, but is shaped like a dot rather than a dash.

isQuotationMark :: Char -> Bool

Those punctuation characters that function as quotation marks.

isTerminalPunctuation :: Char -> Bool

Those punctuation characters that generally mark the end of textual units.

isOtherMath :: Char -> Bool

Used in deriving the Math property.

isHexDigit :: Char -> Bool

Characters commonly used for the representation of hexadecimal numbers, plus their compatibility equivalents.

isASCIIHexDigit :: Char -> Bool

ASCII characters commonly used for the representation of hexadecimal numbers.
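As a rough illustration of the difference between the two hexadecimal properties, the sketch below spells out the ranges by hand. The names `between`, `asciiHexDigitSketch` and `hexDigitSketch` are hypothetical and not part of this module; the fullwidth ranges are taken to be the compatibility equivalents mentioned above.

```haskell
-- | True when c lies in the inclusive range [lo, hi]. (Hypothetical helper.)
between :: Char -> Char -> Char -> Bool
between lo hi c = lo <= c && c <= hi

-- | Hand-written approximation of the ASCII_Hex_Digit property.
asciiHexDigitSketch :: Char -> Bool
asciiHexDigitSketch c =
  any (\(lo, hi) -> between lo hi c) [('0', '9'), ('A', 'F'), ('a', 'f')]

-- | Hand-written approximation of the Hex_Digit property:
-- the ASCII digits plus their fullwidth compatibility forms.
hexDigitSketch :: Char -> Bool
hexDigitSketch c =
  asciiHexDigitSketch c
    || any (\(lo, hi) -> between lo hi c)
         [('\xFF10', '\xFF19'), ('\xFF21', '\xFF26'), ('\xFF41', '\xFF46')]
```

Under these assumptions, U+FF21 FULLWIDTH LATIN CAPITAL LETTER A satisfies `hexDigitSketch` but not `asciiHexDigitSketch`.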

isOtherAlphabetic :: Char -> Bool

Used in deriving the Alphabetic property.

isIdeographic :: Char -> Bool

Characters considered to be CJKV (Chinese, Japanese, Korean, and Vietnamese) ideographs.

isDiacritic :: Char -> Bool

Characters that linguistically modify the meaning of another character to which they apply. Some diacritics are not combining characters, and some combining characters are not diacritics.

isExtender :: Char -> Bool

Characters whose principal function is to extend the value or shape of a preceding alphabetic character. Typical of these are length and iteration marks.

isOtherLowercase :: Char -> Bool

Used in deriving the Lowercase property.

isOtherUppercase :: Char -> Bool

Used in deriving the Uppercase property.

isNoncharacterCodePoint :: Char -> Bool

Code points that are explicitly defined as illegal for the encoding of characters. See Unicode 3.1 for more information.
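For orientation, the noncharacter set can be described arithmetically: the block U+FDD0..U+FDEF, plus the last two code points of every plane (those ending in FFFE or FFFF). The following is a minimal sketch of that rule using only `base`; `noncharacterSketch` is a hypothetical name and is not how this module is necessarily implemented.

```haskell
import Data.Bits ((.&.))
import Data.Char (ord)

-- | Hand-written approximation of Noncharacter_Code_Point.
noncharacterSketch :: Char -> Bool
noncharacterSketch c =
  (cp >= 0xFDD0 && cp <= 0xFDEF)   -- the contiguous block in the BMP
    || (cp .&. 0xFFFE) == 0xFFFE   -- xxFFFE and xxFFFF in every plane
  where
    cp = ord c
```

Masking with 0xFFFE ignores the lowest bit, so both of the last two code points of each plane are matched by a single comparison.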

isOtherGraphemeExtend :: Char -> Bool

Used in deriving the Grapheme_Extend property.

isGraphemeLink :: Char -> Bool

Used in determining default grapheme cluster boundaries. For more information, see UTR #29: Text Boundaries (in proposed draft status at publication of Unicode 3.2).

isIDSBinaryOperator :: Char -> Bool

For a machine-readable list of Ideographic Description Sequences. For more information, see Unicode 3.2.

isIDSTrinaryOperator :: Char -> Bool

For a machine-readable list of Ideographic Description Sequences. For more information, see Unicode 3.2.

isRadical :: Char -> Bool

For a machine-readable list of Ideographic Description Sequences. For more information, see Unicode 3.2.

isUnifiedIdeograph :: Char -> Bool

For a machine-readable list of Ideographic Description Sequences. For more information, see Unicode 3.2.

isOtherDefaultIgnorableCodePoint :: Char -> Bool

Used in deriving the Default_Ignorable_Code_Point property.

isDeprecated :: Char -> Bool

For a machine-readable list of deprecated characters. No characters will ever be removed from the standard, but the usage of deprecated characters is strongly discouraged. For more information, see Unicode 3.2.

isSoftDotted :: Char -> Bool

Characters with a "soft dot", like i or j. An accent placed on these characters causes the dot to disappear. An explicit dot above can be added where required, such as in Lithuanian. For more information, see Unicode 3.0, Chapter 7, Diacritics on i and j.

isLogicalOrderException :: Char -> Bool

There are a small number of characters that do not use logical order. These characters require special handling in most processing. For more information, see Unicode 3.2.

isCGJ :: Char -> Bool

Combining Grapheme Joiner character.

isMath :: Char -> Bool

Characters with the Math property. For more information, see Chapter 4, Character Properties.

Math = Sm + Other_Math.
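The derivation formula reads as a set union: a character has the derived property if it is in the named general category or in the corresponding "Other" list. A minimal sketch of that shape, using `generalCategory` from `base`; the `otherMath` argument is a hypothetical caller-supplied predicate standing in for Other_Math, and `mathSketch` is not part of this module.

```haskell
import Data.Char (GeneralCategory (MathSymbol), generalCategory)

-- | Math = Sm + Other_Math, with Other_Math passed in as a predicate.
-- The general category comes from base's tables, which may follow a
-- different Unicode version than this module.
mathSketch :: (Char -> Bool) -> Char -> Bool
mathSketch otherMath c = generalCategory c == MathSymbol || otherMath c
```

For example, `mathSketch (const False) '+'` holds because `+` has category Sm, while `^` (category Sk) only qualifies if the supplied `otherMath` predicate accepts it. The Lowercase and Uppercase derivations follow the same pattern with Ll and Lu.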

isAlphabetic :: Char -> Bool

Characters with the Alphabetic property. For more information, see Chapter 4, Character Properties.

Alphabetic = Lu + Ll + Lt + Lm + Lo + Other_Alphabetic.

isLowercase :: Char -> Bool

Characters with the Lowercase property. For more information, see Chapter 4, Character Properties and UAX #21: Case Mappings.

Lowercase = Ll + Other_Lowercase.

isUppercase :: Char -> Bool

Characters with the Uppercase property. For more information, see Chapter 4, Character Properties and UAX #21: Case Mappings.

Uppercase = Lu + Other_Uppercase.

isIDStart :: Char -> Bool

Characters that can start an identifier.

ID_Start = Lu + Ll + Lt + Lm + Lo + Nl.

isIDContinue :: Char -> Bool

Characters that can continue an identifier. See Cf Note.

ID_Continue = ID_Start + Mn + Mc + Nd + Pc.
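The two identifier formulas can be approximated directly from general categories. The sketch below uses only `base`; the names `idStartSketch`, `idContinueSketch` and `isIdentifierSketch` are hypothetical, and results may differ from this module wherever base's Unicode tables differ from the data this module was generated from.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- | ID_Start = Lu + Ll + Lt + Lm + Lo + Nl.
idStartSketch :: Char -> Bool
idStartSketch c =
  generalCategory c
    `elem` [ UppercaseLetter, LowercaseLetter, TitlecaseLetter
           , ModifierLetter, OtherLetter, LetterNumber ]

-- | ID_Continue = ID_Start + Mn + Mc + Nd + Pc.
idContinueSketch :: Char -> Bool
idContinueSketch c =
  idStartSketch c
    || generalCategory c
         `elem` [ NonSpacingMark, SpacingCombiningMark
                , DecimalNumber, ConnectorPunctuation ]

-- | A string is an identifier if it starts with an ID_Start character
-- and every following character is ID_Continue.
isIdentifierSketch :: String -> Bool
isIdentifierSketch []       = False
isIdentifierSketch (c : cs) = idStartSketch c && all idContinueSketch cs
```

Note that under these formulas an underscore (category Pc) may continue an identifier but may not start one.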

isDefaultIgnorableCodePoint :: Char -> Bool

For programmatic determination of default-ignorable code points. New characters that should be ignored in processing (unless explicitly supported) will be assigned in these ranges, permitting programs to correctly handle the default behavior of such characters when not otherwise supported. For more information, see UTR #29: Text Boundaries (in proposed draft status at release time for Unicode 3.2).

Default_Ignorable_Code_Point = Other_Default_Ignorable_Code_Point + Cf + Cc + Cs - White_Space.
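Here `+` is set union and `-` is set difference, applied left to right, so White_Space is removed from the whole union. A minimal sketch of that reading follows; both predicate arguments are hypothetical stand-ins (for Other_Default_Ignorable_Code_Point and White_Space), and `defaultIgnorableSketch` is not part of this module.

```haskell
import Data.Char (GeneralCategory (Control, Format, Surrogate), generalCategory)

-- | (Other_Default_Ignorable_Code_Point + Cf + Cc + Cs) - White_Space,
-- with the two listed properties supplied by the caller.
defaultIgnorableSketch
  :: (Char -> Bool)  -- ^ stand-in for Other_Default_Ignorable_Code_Point
  -> (Char -> Bool)  -- ^ stand-in for White_Space
  -> Char
  -> Bool
defaultIgnorableSketch otherDefaultIgnorable whiteSpace c =
  (otherDefaultIgnorable c || generalCategory c `elem` [Format, Control, Surrogate])
    && not (whiteSpace c)
```

As a rough check, passing `Data.Char.isSpace` as an approximation of White_Space excludes TAB (a Cc character that is white space) while keeping U+00AD SOFT HYPHEN (Cf). The Grapheme_Base and Grapheme_Extend formulas can be read the same way.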

isGraphemeBase :: Char -> Bool

For programmatic determination of grapheme cluster boundaries. For more information, see UTR #29: Text Boundaries (in proposed draft status at publication of Unicode 3.2).

Grapheme_Base = [0..10FFFF] - Cc - Cf - Cs - Co - Cn - Zl - Zp - Grapheme_Extend - Grapheme_Link - CGJ.

isGraphemeExtend :: Char -> Bool

For programmatic determination of grapheme cluster boundaries. For more information, see UTR #29: Text Boundaries (in proposed draft status at publication of Unicode 3.2).

Grapheme_Extend = Me + Mn + Mc + Other_Grapheme_Extend - Grapheme_Link - CGJ.

isTitlecase :: Char -> Bool

Returns true if the general category is Lt.
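An equivalent check can be written against `base` alone, which may help when comparing results; `titlecaseSketch` is a hypothetical name, and base's category tables may follow a different Unicode version than this module.

```haskell
import Data.Char (GeneralCategory (TitlecaseLetter), generalCategory)

-- | True when the general category is Lt.
titlecaseSketch :: Char -> Bool
titlecaseSketch c = generalCategory c == TitlecaseLetter
```

U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON is one of the few characters in this category; ordinary capitals such as `A` are Lu, not Lt.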

isLineBreak :: Char -> Bool