cjk-0.1.0.1: Data about Chinese, Japanese and Korean characters and languages

Safe Haskell: None

CJK.Data.Unihan.Readings

Documentation

definition :: Char -> [CharDefinition]

An English definition for this character. Definitions are for modern written Chinese and are usually (but not always) the same as the definitions in other Chinese dialects or non-Chinese languages. In some cases, synonyms are indicated. Fuller variant information can be found using the various variant fields.

Definitions specific to non-Chinese languages or Chinese dialects other than modern Mandarin are marked, e.g., (Cant.) or (J). Minor definitions are separated by commas.
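
As a rough illustration of the comma convention, the sketch below splits one definition into its minor definitions. It assumes the definition is available as a plain String, which may not match the actual CharDefinition type; splitMinor and trim are hypothetical helpers, not part of this module, and the splitter does not handle commas inside parentheses.

```haskell
import Data.Char (isSpace)

-- Hypothetical helper: strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- Hypothetical helper: split a definition string on commas.
-- Assumes the definition is a plain String (an assumption, not the
-- documented CharDefinition type) and ignores nesting in parentheses.
splitMinor :: String -> [String]
splitMinor s = case break (== ',') s of
  (chunk, [])       -> [trim chunk]
  (chunk, _ : rest) -> trim chunk : splitMinor rest
```

For example, splitMinor "to go, to walk" evaluates to ["to go", "to walk"].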

Mandarin

type IsHDZSubstitution = Bool

Whether this reference had an encoded variant substituted for an unencoded character used by the Hànyǔ Dà Zìdiǎn.

mandarinBestEffort :: Char -> [Phone]

Returns how to pronounce an ideograph in Mandarin, making a best effort to use all of the available Unihan reading data to get a good answer. Readings are returned in approximate frequency order.

This algorithm is based on the Unihan FAQ (http://www.unicode.org/faq/han_cjk.html), which states that the best way is to use the kHanyuPinlu, kXHC1983, and kHanyuPinyin fields, in that order. The kMandarin field may have some readings that the other three do not, but it should be used with caution. The kHanyuPinlu field lists the most common readings for ideographs in order of frequency of use and is the most useful for most purposes. The kXHC1983 field contains the most important readings for characters in modern use. The kHanyuPinyin field contains an exhaustive set of readings for a large set of characters, but includes obscure readings of historic interest only.
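
The field priority described above can be sketched as a simple ordered merge. This is only an illustration under stated assumptions, not the library's actual implementation: Phone is replaced by a String stand-in, FieldReadings and bestEffort are hypothetical names, and placing kMandarin last is one reading of the "use with caution" advice.

```haskell
import Data.List (nub)

-- Stand-in for the library's Phone type, whose definition is not shown here.
type Phone = String

-- Hypothetical container for the per-field readings of one character.
data FieldReadings = FieldReadings
  { pinlu         :: [Phone]  -- kHanyuPinlu: common readings, by frequency
  , xhc           :: [Phone]  -- kXHC1983: important modern readings
  , hanyuPinyin   :: [Phone]  -- kHanyuPinyin: exhaustive, includes obscure readings
  , mandarinField :: [Phone]  -- kMandarin: consulted last (an assumption)
  }

-- Concatenate the fields in priority order and keep only the first
-- occurrence of each reading, so higher-priority fields decide the order.
bestEffort :: FieldReadings -> [Phone]
bestEffort r = nub (pinlu r ++ xhc r ++ hanyuPinyin r ++ mandarinField r)
```

With made-up readings (not real Unihan data), bestEffort (FieldReadings ["xing2", "hang2"] ["hang2", "xing2"] ["xing2", "hang2", "heng2"] ["hang4"]) evaluates to ["xing2", "hang2", "heng2", "hang4"].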

mandarin :: Char -> Maybe (Phone, Phone)

The most customary pinyin reading for this character; that is, the reading most commonly used in modern text, with some preference given to readings most likely to be in sorted lists.

The first value returned is preferred for zh-Hans (CN) and the second is preferred for zh-Hant (TW). Commonly, they will be exactly the same.
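
A caller that cares about only one of the two variants might select it as in the sketch below. Variant and preferredReading are hypothetical names introduced for illustration, and Phone is again a String stand-in for the library's type.

```haskell
-- Stand-in for the library's Phone type.
type Phone = String

-- Hypothetical tag for the two variants; not part of the documented API.
data Variant = Simplified | Traditional

-- Select the reading for one variant from a result shaped like that of
-- `mandarin`: the first component is for zh-Hans, the second for zh-Hant.
preferredReading :: Variant -> Maybe (Phone, Phone) -> Maybe Phone
preferredReading Simplified  = fmap fst
preferredReading Traditional = fmap snd
```

A Nothing result is passed through unchanged, so the caller still has to handle characters with no reading.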

You may want to use mandarinBestEffort instead of this function.

xhc1983 :: Char -> [([(HDZEntry, IsHDZSubstitution)], [Phone])]

One or more Hànyǔ Pīnyīn readings as given in the Xiàndài Hànyǔ Cídiǎn.
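
When only the readings matter, the nested result can be flattened as in the sketch below. xhcReadings is a hypothetical helper; both component types are left polymorphic so the example does not depend on the definitions of HDZEntry or Phone.

```haskell
import Data.List (nub)

-- Collect every reading from a result shaped like that of xhc1983,
-- discarding the dictionary references and dropping repeated readings.
-- The first occurrence of each reading determines its position.
xhcReadings :: Eq phone => [(refs, [phone])] -> [phone]
xhcReadings = nub . concatMap snd
```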

You may want to use mandarinBestEffort instead of this function.

Cantonese

Ancient Chinese

type CommonTangCharacter = Bool

Whether the word or morpheme represented in toto or in part by the given character with the given reading occurs more than four times in the seven hundred poems covered by T’ang Poetic Vocabulary by Hugh M. Stimson (Far Eastern Publications, Yale Univ., 1976).

tang :: Char -> [(CommonTangCharacter, Text)]

The Tang dynasty pronunciation(s) of this character, in an undefined romanization.
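
To keep only the readings flagged as common, a result of this shape can be filtered on its Bool component. commonTangReadings is a hypothetical helper; the reading type is left polymorphic so the sketch does not depend on Text.

```haskell
-- Keep only the readings whose flag is True. The Bool plays the role of
-- CommonTangCharacter; entries that fail the pattern are skipped.
commonTangReadings :: [(Bool, a)] -> [a]
commonTangReadings entries = [reading | (True, reading) <- entries]
```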

Korean

hangul :: Char -> [Phone]

The modern Korean pronunciation(s) for this character in Hangul.

korean :: Char -> [Phone]

The Korean pronunciation(s) of this character, using the Yale romanization system.

Japanese

japaneseKun :: Char -> [Text]

The Japanese kun'yomi pronunciation(s) of this character, in an undefined romanization system. It is recommended that you use kanjidic2 (http://www.csse.monash.edu.au/~jwb/kanjidic2/) instead of this data.

japaneseOn :: Char -> [Text]

The Japanese on'yomi pronunciation(s) of this character, in an undefined romanization system. It is recommended that you use kanjidic2 (http://www.csse.monash.edu.au/~jwb/kanjidic2/) instead of this data.

Vietnamese

vietnamese :: Char -> [Phone]

The character’s pronunciation(s) in Quốc ngữ.