Portability | portable |
---|---|
Stability | stable |
Maintainer | libraries@haskell.org |
Safe Haskell | Trustworthy |
The Char type and associated operations.
- data Char
- isControl :: Char -> Bool
- isSpace :: Char -> Bool
- isLower :: Char -> Bool
- isUpper :: Char -> Bool
- isAlpha :: Char -> Bool
- isAlphaNum :: Char -> Bool
- isPrint :: Char -> Bool
- isDigit :: Char -> Bool
- isOctDigit :: Char -> Bool
- isHexDigit :: Char -> Bool
- isLetter :: Char -> Bool
- isMark :: Char -> Bool
- isNumber :: Char -> Bool
- isPunctuation :: Char -> Bool
- isSymbol :: Char -> Bool
- isSeparator :: Char -> Bool
- isAscii :: Char -> Bool
- isLatin1 :: Char -> Bool
- isAsciiUpper :: Char -> Bool
- isAsciiLower :: Char -> Bool
- data GeneralCategory
- = UppercaseLetter
- | LowercaseLetter
- | TitlecaseLetter
- | ModifierLetter
- | OtherLetter
- | NonSpacingMark
- | SpacingCombiningMark
- | EnclosingMark
- | DecimalNumber
- | LetterNumber
- | OtherNumber
- | ConnectorPunctuation
- | DashPunctuation
- | OpenPunctuation
- | ClosePunctuation
- | InitialQuote
- | FinalQuote
- | OtherPunctuation
- | MathSymbol
- | CurrencySymbol
- | ModifierSymbol
- | OtherSymbol
- | Space
- | LineSeparator
- | ParagraphSeparator
- | Control
- | Format
- | Surrogate
- | PrivateUse
- | NotAssigned
- generalCategory :: Char -> GeneralCategory
- toUpper :: Char -> Char
- toLower :: Char -> Char
- toTitle :: Char -> Char
- digitToInt :: Char -> Int
- intToDigit :: Int -> Char
- ord :: Char -> Int
- chr :: Int -> Char
- showLitChar :: Char -> ShowS
- lexLitChar :: ReadS String
- readLitChar :: ReadS Char
Documentation
The character type Char
is an enumeration whose values represent
Unicode (or equivalently ISO/IEC 10646) characters (see
http://www.unicode.org/ for details). This set extends the ISO 8859-1
(Latin-1) character set (the first 256 characters), which is itself an extension
of the ASCII character set (the first 128 characters). A character literal in
Haskell has type Char
.
To convert a Char
to or from the corresponding Int
value defined
by Unicode, use toEnum
and fromEnum
from the
Enum
class respectively (or equivalently ord
and chr
).
Character classification
Unicode characters are divided into letters, numbers, marks, punctuation, symbols, separators (including spaces) and others (including control characters).
isControl :: Char -> BoolSource
Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.
Returns True
for any Unicode space character, and the control
characters \t
, \n
, \r
, \f
, \v
.
Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.
Selects alphabetic Unicode characters (lower-case, upper-case and
title-case letters, plus letters of caseless scripts and modifiers letters).
This function is equivalent to isLetter
.
isAlphaNum :: Char -> BoolSource
Selects alphabetic or numeric digit Unicode characters.
Note that numeric digits outside the ASCII range are selected by this
function but not by isDigit
. Such digits may be part of identifiers
but are not used by the printer and reader to represent numbers.
Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).
isOctDigit :: Char -> BoolSource
Selects ASCII octal digits, i.e. '0'
..'7'
.
isHexDigit :: Char -> BoolSource
Selects ASCII hexadecimal digits,
i.e. '0'
..'9'
, 'a'
..'f'
, 'A'
..'F'
.
isLetter :: Char -> BoolSource
Selects alphabetic Unicode characters (lower-case, upper-case and
title-case letters, plus letters of caseless scripts and modifiers letters).
This function is equivalent to isAlpha
.
Selects Unicode mark characters, e.g. accents and the like, which combine with preceding letters.
isNumber :: Char -> BoolSource
Selects Unicode numeric characters, including digits from various scripts, Roman numerals, etc.
isPunctuation :: Char -> BoolSource
Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.
isSymbol :: Char -> BoolSource
Selects Unicode symbol characters, including mathematical and currency symbols.
isSeparator :: Char -> BoolSource
Selects Unicode space and separator characters.
Subranges
Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.
isLatin1 :: Char -> BoolSource
Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.
isAsciiUpper :: Char -> BoolSource
isAsciiLower :: Char -> BoolSource
Unicode general categories
data GeneralCategory Source
Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard.
UppercaseLetter | Lu: Letter, Uppercase |
LowercaseLetter | Ll: Letter, Lowercase |
TitlecaseLetter | Lt: Letter, Titlecase |
ModifierLetter | Lm: Letter, Modifier |
OtherLetter | Lo: Letter, Other |
NonSpacingMark | Mn: Mark, Non-Spacing |
SpacingCombiningMark | Mc: Mark, Spacing Combining |
EnclosingMark | Me: Mark, Enclosing |
DecimalNumber | Nd: Number, Decimal |
LetterNumber | Nl: Number, Letter |
OtherNumber | No: Number, Other |
ConnectorPunctuation | Pc: Punctuation, Connector |
DashPunctuation | Pd: Punctuation, Dash |
OpenPunctuation | Ps: Punctuation, Open |
ClosePunctuation | Pe: Punctuation, Close |
InitialQuote | Pi: Punctuation, Initial quote |
FinalQuote | Pf: Punctuation, Final quote |
OtherPunctuation | Po: Punctuation, Other |
MathSymbol | Sm: Symbol, Math |
CurrencySymbol | Sc: Symbol, Currency |
ModifierSymbol | Sk: Symbol, Modifier |
OtherSymbol | So: Symbol, Other |
Space | Zs: Separator, Space |
LineSeparator | Zl: Separator, Line |
ParagraphSeparator | Zp: Separator, Paragraph |
Control | Cc: Other, Control |
Format | Cf: Other, Format |
Surrogate | Cs: Other, Surrogate |
PrivateUse | Co: Other, Private Use |
NotAssigned | Cn: Other, Not Assigned |
generalCategory :: Char -> GeneralCategorySource
The Unicode general category of the character.
Case conversion
Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.
Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.
Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.
Single digit characters
digitToInt :: Char -> IntSource
Convert a single digit Char
to the corresponding Int
.
This function fails unless its argument satisfies isHexDigit
,
but recognises both upper and lower-case hexadecimal digits
(i.e. '0'
..'9'
, 'a'
..'f'
, 'A'
..'F'
).
intToDigit :: Int -> CharSource
Numeric representations
String representations
showLitChar :: Char -> ShowSSource
Convert a character to a string using only printable characters, using Haskell source-language escape conventions. For example:
showLitChar '\n' s = "\\n" ++ s
lexLitChar :: ReadS StringSource
Read a string representation of a character, using Haskell source-language escape conventions. For example:
lexLitChar "\\nHello" = [("\\n", "Hello")]
readLitChar :: ReadS CharSource
Read a string representation of a character, using Haskell source-language escape conventions, and convert it to the character that it encodes. For example:
readLitChar "\\nHello" = [('\n', "Hello")]