| Copyright | (c) 2020 Composewell Technologies and Contributors | 
|---|---|
| License | Apache-2.0 | 
| Maintainer | streamly@composewell.com | 
| Stability | experimental | 
| Safe Haskell | Safe-Inferred | 
| Language | Haskell2010 | 
Unicode.Char.General
Description
General character property related functions.
Synopsis
- data GeneralCategory- = UppercaseLetter
- | LowercaseLetter
- | TitlecaseLetter
- | ModifierLetter
- | OtherLetter
- | NonSpacingMark
- | SpacingCombiningMark
- | EnclosingMark
- | DecimalNumber
- | LetterNumber
- | OtherNumber
- | ConnectorPunctuation
- | DashPunctuation
- | OpenPunctuation
- | ClosePunctuation
- | InitialQuote
- | FinalQuote
- | OtherPunctuation
- | MathSymbol
- | CurrencySymbol
- | ModifierSymbol
- | OtherSymbol
- | Space
- | LineSeparator
- | ParagraphSeparator
- | Control
- | Format
- | Surrogate
- | PrivateUse
- | NotAssigned
 
- generalCategoryAbbr :: GeneralCategory -> String
- generalCategory :: Char -> GeneralCategory
- isAlphabetic :: Char -> Bool
- isAlphaNum :: Char -> Bool
- isControl :: Char -> Bool
- isMark :: Char -> Bool
- isPrint :: Char -> Bool
- isPunctuation :: Char -> Bool
- isSeparator :: Char -> Bool
- isSymbol :: Char -> Bool
- isWhiteSpace :: Char -> Bool
- isLetter :: Char -> Bool
- isSpace :: Char -> Bool
- isAscii :: Char -> Bool
- isLatin1 :: Char -> Bool
- isAsciiUpper :: Char -> Bool
- isAsciiLower :: Char -> Bool
- isJamo :: Char -> Bool
- jamoNCount :: Int
- jamoLFirst :: Int
- jamoLCount :: Int
- jamoLIndex :: Char -> Maybe Int
- jamoLLast :: Int
- jamoVFirst :: Int
- jamoVCount :: Int
- jamoVIndex :: Char -> Maybe Int
- jamoVLast :: Int
- jamoTFirst :: Int
- jamoTCount :: Int
- jamoTIndex :: Char -> Maybe Int
- jamoTLast :: Int
- hangulFirst :: Int
- hangulLast :: Int
- isHangul :: Char -> Bool
- isHangulLV :: Char -> Bool
Unicode general categories
data GeneralCategory Source #
Unicode General Categories.
These classes are defined in the [Unicode Character Database](http:/www.unicode.orgreportstr44tr44-14.html#GC_Values_Table), part of the Unicode standard
Note: the classes must be in the same order they are listed in the Unicode Standard,
because some functions (e.g. generalCategory) rely on the Enum instance.
Since: 0.3.0
Constructors
| UppercaseLetter | 
 | 
| LowercaseLetter | 
 | 
| TitlecaseLetter | 
 | 
| ModifierLetter | 
 | 
| OtherLetter | 
 | 
| NonSpacingMark | 
 | 
| SpacingCombiningMark | 
 | 
| EnclosingMark | 
 | 
| DecimalNumber | 
 | 
| LetterNumber | 
 | 
| OtherNumber | 
 | 
| ConnectorPunctuation | 
 | 
| DashPunctuation | 
 | 
| OpenPunctuation | 
 | 
| ClosePunctuation | 
 | 
| InitialQuote | 
 | 
| FinalQuote | 
 | 
| OtherPunctuation | 
 | 
| MathSymbol | 
 | 
| CurrencySymbol | 
 | 
| ModifierSymbol | 
 | 
| OtherSymbol | 
 | 
| Space | 
 | 
| LineSeparator | 
 | 
| ParagraphSeparator | 
 | 
| Control | 
 | 
| Format | 
 | 
| Surrogate | 
 | 
| PrivateUse | 
 | 
| NotAssigned | 
 | 
Instances
generalCategoryAbbr :: GeneralCategory -> String Source #
Abbreviation of GeneralCategory used in the Unicode standard.
Since: 0.3.0
generalCategory :: Char -> GeneralCategory Source #
The Unicode general category of the character.
This property is defined in the column 2 of the UnicodeData table.
This relies on the Enum instance of GeneralCategory, which must remain in the
same order as the categories are presented in the Unicode standard.
show (generalCategory c) == show (Data.Char.generalCategory c)
Since: 0.3.0
Character classification
isAlphabetic :: Char -> Bool Source #
Returns True for alphabetic Unicode characters (lower-case, upper-case
and title-case letters, plus letters of caseless scripts and modifiers
letters).
Note: this function is not equivalent to
isAlpha / isLetter:
- isAlphamatches the following general categories:- UppercaseLetter(- Lu)
- LowercaseLetter(- Ll)
- TitlecaseLetter(- Lt)
- ModifierLetter(- Lm)
- OtherLetter(- Lo)
 
- whereas - isAlphabeticmatches:- Uppercaseproperty
- Lowercaseproperty
- TitlecaseLetter(- Lt)
- ModifierLetter(- Lm)
- OtherLetter(- Lo)
- LetterNumber(- Nl)
- Other_Alphabeticproperty
 
Since: 0.3.0
isAlphaNum :: Char -> Bool Source #
Selects alphabetic or numeric Unicode characters.
This function returns True if its argument has one of the
following GeneralCategorys, or False otherwise:
- UppercaseLetter
- LowercaseLetter
- TitlecaseLetter
- ModifierLetter
- OtherLetter
- DecimalNumber
- LetterNumber
- OtherNumber
isAlphaNum c == Data.Char.isAlphaNum c
Since: 0.3.0
isControl :: Char -> Bool Source #
Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.
This function returns True if its argument has the GeneralCategory Control.
isControl c == Data.Char.isControl c
Since: 0.3.0
isMark :: Char -> Bool Source #
Selects Unicode mark characters, for example accents and the like, which combine with preceding characters.
This function returns True if its argument has one of the
following GeneralCategorys, or False otherwise:
isMark c == Data.Char.isMark c
Since: 0.3.0
isPrint :: Char -> Bool Source #
Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).
This function returns False if its argument has one of the
following GeneralCategorys, or True otherwise:
isPrint c == Data.Char.isPrint c
Since: 0.3.0
isPunctuation :: Char -> Bool Source #
Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.
This function returns True if its argument has one of the
following GeneralCategorys, or False otherwise:
- ConnectorPunctuation
- DashPunctuation
- OpenPunctuation
- ClosePunctuation
- InitialQuote
- FinalQuote
- OtherPunctuation
isPunctuation c == Data.Char.isPunctuation c
Since: 0.3.0
isSeparator :: Char -> Bool Source #
Selects Unicode space and separator characters.
This function returns True if its argument has one of the
following GeneralCategorys, or False otherwise:
isSeparator c == Data.Char.isSeparator c
Since: 0.3.0
isSymbol :: Char -> Bool Source #
Selects Unicode symbol characters, including mathematical and currency symbols.
This function returns True if its argument has one of the
following GeneralCategorys, or False otherwise:
* MathSymbol
* CurrencySymbol
* ModifierSymbol
* OtherSymbol
isSymbol c == Data.Char.isSymbol c
Since: 0.3.0
isWhiteSpace :: Char -> Bool Source #
Returns True for any whitespace characters, and the control
characters \t, \n, \r, \f, \v.
See: Unicode White_Space.
Note: isWhiteSpace is not equivalent to isSpace.
isWhiteSpace selects the same characters from isSpace plus the following:
- U+0085NEXT LINE (NEL)
- U+2028LINE SEPARATOR
- U+2029PARAGRAPH SEPARATOR
Since: 0.3.0
isLetter :: Char -> Bool Source #
Deprecated: Use isAlphabetic instead. Note that the behavior of this function does not match base:Data.Char.isLetter. See Unicode.Char.General.Compat for behavior compatible with base:Data.Char.
Returns True for alphabetic Unicode characters (lower-case, upper-case
 and title-case letters, plus letters of caseless scripts and modifiers
 letters).
Since: 0.1.0
isSpace :: Char -> Bool Source #
Deprecated: Use isWhiteSpace instead. Note that the behavior of this function does not match base:Data.Char.isSpace. See Unicode.Char.General.Compat for behavior compatible with base:Data.Char.
Returns True for any whitespace characters, and the control
 characters \t, \n, \r, \f, \v.
Since: 0.1.0
Re-export
Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.
Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.
isAsciiUpper :: Char -> Bool #
isAsciiLower :: Char -> Bool #
Korean Hangul Characters
The Hangul script used in the Korean writing system consists of individual consonant and vowel letters (jamo) that are visually combined into square display cells to form entire syllable blocks. Hangul syllables may be encoded directly as precomposed combinations of individual jamo or as decomposed sequences of conjoining jamo. Modern Hangul syllable blocks can be expressed with either two or three jamo, either in the form consonant + vowel or in the form consonant + vowel + consonant. The leading consonant is represented as L, the vowel as V and the trailing consonant as T.
The Unicode Standard contains both a large set of precomposed modern Hangul syllables and a set of conjoining Hangul jamo, which can be used to encode archaic Korean syllable blocks as well as modern Korean syllable blocks.
Hangul characters can be composed or decomposed algorithmically instead of via mappings. These APIs are used mainly for Unicode normalization of Hangul text.
Please refer to the following resources for more information:
- The Hangulsection of theEast Asiachapter of the Unicode Standard
- Conformance chapter of the Unicode Standard
- Unicode® Standard Annex #15 - Unicode Normalization Forms
- UCD file HangulSyllableType.txt
- https://en.wikipedia.org/wiki/Hangul_Jamo_(Unicode_block)
- https://en.wikipedia.org/wiki/List_of_Hangul_jamo
Conjoining Jamo
Jamo L, V and T letters.
isJamo :: Char -> Bool Source #
Determine whether a character is a jamo L, V or T character.
Since: 0.1.0
jamoNCount :: Int Source #
Total count of all jamo characters.
jamoNCount = jamoVCount * jamoTCount
Since: 0.1.0
Jamo Leading (L)
jamoLFirst :: Int Source #
First leading consonant jamo.
Since: 0.1.0
jamoLCount :: Int Source #
Total count of leading consonant jamo.
Since: 0.3.0
jamoLIndex :: Char -> Maybe Int Source #
Given a Unicode character, if it is a leading jamo, return its index in
 the list of leading jamo consonants, otherwise return Nothing.
Since: 0.1.0
Jamo Vowel (V)
jamoVFirst :: Int Source #
First vowel jamo.
Since: 0.1.0
jamoVCount :: Int Source #
Total count of vowel jamo.
Since: 0.1.0
jamoVIndex :: Char -> Maybe Int Source #
Given a Unicode character, if it is a vowel jamo, return its index in the
 list of vowel jamo, otherwise return Nothing.
Since: 0.1.0
Jamo Trailing (T)
jamoTFirst :: Int Source #
The first trailing consonant jamo.
Note that jamoTFirst does not represent a valid T, it represents a missing
 T i.e. LV without a T. See comments under jamoTIndex .
Since: 0.1.0
jamoTCount :: Int Source #
Total count of trailing consonant jamo.
Since: 0.1.0
jamoTIndex :: Char -> Maybe Int Source #
Given a Unicode character, if it is a trailing jamo consonant, return its
 index in the list of trailing jamo consonants, otherwise return Nothing.
Note that index 0 is not a valid index for a trailing consonant. Index 0 corresponds to an LV syllable, without a T. See "Hangul Syllable Decomposition" in the Conformance chapter of the Unicode standard for more details.
Since: 0.1.0
Hangul Syllables
Precomposed Hangul syllables.
hangulFirst :: Int Source #
Codepoint of the first pre-composed Hangul character.
Since: 0.1.0
hangulLast :: Int Source #
Codepoint of the last Hangul character.
Since: 0.1.0
isHangul :: Char -> Bool Source #
Determine if the given character is a precomposed Hangul syllable.
Since: 0.1.0
isHangulLV :: Char -> Bool Source #
Determine if the given character is a Hangul LV syllable.
Note: this function requires a precomposed Hangul syllable but does not
 check it. Use isHangul to check the input character before passing it to
 isHangulLV.
Since: 0.1.0