unicode-data-0.3.1: Access Unicode Character Database (UCD)
Copyright(c) 2020 Composewell Technologies and Contributors
LicenseApache-2.0
Maintainerstreamly@composewell.com
Stabilityexperimental
Safe HaskellSafe-Inferred
LanguageHaskell2010

Unicode.Char.General

Description

General character property related functions.

Synopsis

Unicode general categories

data GeneralCategory Source #

Unicode General Categories.

These classes are defined in the [Unicode Character Database](http:/www.unicode.orgreportstr44tr44-14.html#GC_Values_Table), part of the Unicode standard

Note: the classes must be in the same order they are listed in the Unicode Standard, because some functions (e.g. generalCategory) rely on the Enum instance.

Since: 0.3.0

Constructors

UppercaseLetter

Lu: Letter, Uppercase

LowercaseLetter

Ll: Letter, Lowercase

TitlecaseLetter

Lt: Letter, Titlecase

ModifierLetter

Lm: Letter, Modifier

OtherLetter

Lo: Letter, Other

NonSpacingMark

Mn: Mark, Non-Spacing

SpacingCombiningMark

Mc: Mark, Spacing Combining

EnclosingMark

Me: Mark, Enclosing

DecimalNumber

Nd: Number, Decimal

LetterNumber

Nl: Number, Letter

OtherNumber

No: Number, Other

ConnectorPunctuation

Pc: Punctuation, Connector

DashPunctuation

Pd: Punctuation, Dash

OpenPunctuation

Ps: Punctuation, Open

ClosePunctuation

Pe: Punctuation, Close

InitialQuote

Pi: Punctuation, Initial quote

FinalQuote

Pf: Punctuation, Final quote

OtherPunctuation

Po: Punctuation, Other

MathSymbol

Sm: Symbol, Math

CurrencySymbol

Sc: Symbol, Currency

ModifierSymbol

Sk: Symbol, Modifier

OtherSymbol

So: Symbol, Other

Space

Zs: Separator, Space

LineSeparator

Zl: Separator, Line

ParagraphSeparator

Zp: Separator, Paragraph

Control

Cc: Other, Control

Format

Cf: Other, Format

Surrogate

Cs: Other, Surrogate

PrivateUse

Co: Other, Private Use

NotAssigned

Cn: Other, Not Assigned

Instances

Instances details
Bounded GeneralCategory Source # 
Instance details

Defined in Unicode.Char.General

Enum GeneralCategory Source # 
Instance details

Defined in Unicode.Char.General

Ix GeneralCategory Source # 
Instance details

Defined in Unicode.Char.General

Show GeneralCategory Source # 
Instance details

Defined in Unicode.Char.General

Eq GeneralCategory Source # 
Instance details

Defined in Unicode.Char.General

Ord GeneralCategory Source # 
Instance details

Defined in Unicode.Char.General

generalCategoryAbbr :: GeneralCategory -> String Source #

Abbreviation of GeneralCategory used in the Unicode standard.

Since: 0.3.0

generalCategory :: Char -> GeneralCategory Source #

The Unicode general category of the character.

This property is defined in the column 2 of the UnicodeData table.

This relies on the Enum instance of GeneralCategory, which must remain in the same order as the categories are presented in the Unicode standard.

show (generalCategory c) == show (Data.Char.generalCategory c)

Since: 0.3.0

Character classification

isAlphabetic :: Char -> Bool Source #

Returns True for alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifiers letters).

Note: this function is not equivalent to isAlpha / isLetter:

Since: 0.3.0

isAlphaNum :: Char -> Bool Source #

Selects alphabetic or numeric Unicode characters.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:

isAlphaNum c == Data.Char.isAlphaNum c

Since: 0.3.0

isControl :: Char -> Bool Source #

Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.

This function returns True if its argument has the GeneralCategory Control.

isControl c == Data.Char.isControl c

Since: 0.3.0

isMark :: Char -> Bool Source #

Selects Unicode mark characters, for example accents and the like, which combine with preceding characters.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:

isMark c == Data.Char.isMark c

Since: 0.3.0

isPrint :: Char -> Bool Source #

Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).

This function returns False if its argument has one of the following GeneralCategorys, or True otherwise:

isPrint c == Data.Char.isPrint c

Since: 0.3.0

isPunctuation :: Char -> Bool Source #

Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:

isPunctuation c == Data.Char.isPunctuation c

Since: 0.3.0

isSeparator :: Char -> Bool Source #

Selects Unicode space and separator characters.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:

isSeparator c == Data.Char.isSeparator c

Since: 0.3.0

isSymbol :: Char -> Bool Source #

Selects Unicode symbol characters, including mathematical and currency symbols.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise: * MathSymbol * CurrencySymbol * ModifierSymbol * OtherSymbol

isSymbol c == Data.Char.isSymbol c

Since: 0.3.0

isWhiteSpace :: Char -> Bool Source #

Returns True for any whitespace characters, and the control characters \t, \n, \r, \f, \v.

See: Unicode White_Space.

Note: isWhiteSpace is not equivalent to isSpace. isWhiteSpace selects the same characters from isSpace plus the following:

  • U+0085 NEXT LINE (NEL)
  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR

Since: 0.3.0

isLetter :: Char -> Bool Source #

Deprecated: Use isAlphabetic instead. Note that the behavior of this function does not match base:Data.Char.isLetter. See Unicode.Char.General.Compat for behavior compatible with base:Data.Char.

Returns True for alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifiers letters).

Since: 0.1.0

isSpace :: Char -> Bool Source #

Deprecated: Use isWhiteSpace instead. Note that the behavior of this function does not match base:Data.Char.isSpace. See Unicode.Char.General.Compat for behavior compatible with base:Data.Char.

Returns True for any whitespace characters, and the control characters \t, \n, \r, \f, \v.

Since: 0.1.0

Re-export

isAscii :: Char -> Bool #

Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

isLatin1 :: Char -> Bool #

Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.

isAsciiUpper :: Char -> Bool #

Selects ASCII upper-case letters, i.e. characters satisfying both isAscii and isUpper.

isAsciiLower :: Char -> Bool #

Selects ASCII lower-case letters, i.e. characters satisfying both isAscii and isLower.

Korean Hangul Characters

The Hangul script used in the Korean writing system consists of individual consonant and vowel letters (jamo) that are visually combined into square display cells to form entire syllable blocks. Hangul syllables may be encoded directly as precomposed combinations of individual jamo or as decomposed sequences of conjoining jamo. Modern Hangul syllable blocks can be expressed with either two or three jamo, either in the form consonant + vowel or in the form consonant + vowel + consonant. The leading consonant is represented as L, the vowel as V and the trailing consonant as T.

The Unicode Standard contains both a large set of precomposed modern Hangul syllables and a set of conjoining Hangul jamo, which can be used to encode archaic Korean syllable blocks as well as modern Korean syllable blocks.

Hangul characters can be composed or decomposed algorithmically instead of via mappings. These APIs are used mainly for Unicode normalization of Hangul text.

Please refer to the following resources for more information:

Conjoining Jamo

Jamo L, V and T letters.

isJamo :: Char -> Bool Source #

Determine whether a character is a jamo L, V or T character.

Since: 0.1.0

jamoNCount :: Int Source #

Total count of all jamo characters.

jamoNCount = jamoVCount * jamoTCount

Since: 0.1.0

Jamo Leading (L)

jamoLFirst :: Int Source #

First leading consonant jamo.

Since: 0.1.0

jamoLCount :: Int Source #

Total count of leading consonant jamo.

Since: 0.3.0

jamoLIndex :: Char -> Maybe Int Source #

Given a Unicode character, if it is a leading jamo, return its index in the list of leading jamo consonants, otherwise return Nothing.

Since: 0.1.0

jamoLLast :: Int Source #

Last leading consonant jamo.

Since: 0.1.0

Jamo Vowel (V)

jamoVFirst :: Int Source #

First vowel jamo.

Since: 0.1.0

jamoVCount :: Int Source #

Total count of vowel jamo.

Since: 0.1.0

jamoVIndex :: Char -> Maybe Int Source #

Given a Unicode character, if it is a vowel jamo, return its index in the list of vowel jamo, otherwise return Nothing.

Since: 0.1.0

jamoVLast :: Int Source #

Last vowel jamo.

Since: 0.1.0

Jamo Trailing (T)

jamoTFirst :: Int Source #

The first trailing consonant jamo.

Note that jamoTFirst does not represent a valid T, it represents a missing T i.e. LV without a T. See comments under jamoTIndex .

Since: 0.1.0

jamoTCount :: Int Source #

Total count of trailing consonant jamo.

Since: 0.1.0

jamoTIndex :: Char -> Maybe Int Source #

Given a Unicode character, if it is a trailing jamo consonant, return its index in the list of trailing jamo consonants, otherwise return Nothing.

Note that index 0 is not a valid index for a trailing consonant. Index 0 corresponds to an LV syllable, without a T. See "Hangul Syllable Decomposition" in the Conformance chapter of the Unicode standard for more details.

Since: 0.1.0

jamoTLast :: Int Source #

Last trailing consonant jamo.

Since: 0.1.0

Hangul Syllables

Precomposed Hangul syllables.

hangulFirst :: Int Source #

Codepoint of the first pre-composed Hangul character.

Since: 0.1.0

hangulLast :: Int Source #

Codepoint of the last Hangul character.

Since: 0.1.0

isHangul :: Char -> Bool Source #

Determine if the given character is a precomposed Hangul syllable.

Since: 0.1.0

isHangulLV :: Char -> Bool Source #

Determine if the given character is a Hangul LV syllable.

Note: this function requires a precomposed Hangul syllable but does not check it. Use isHangul to check the input character before passing it to isHangulLV.

Since: 0.1.0