Copyright | (c) Colin Woodbury 2015 - 2019 |
---|---|
License | GPL3 |
Maintainer | Colin Woodbury <colin@fosskers.ca> |
Safe Haskell | None |
Language | Haskell2010 |
A library for analysing the density of Kanji in given texts, according to their Level classification, as defined by the Japan Kanji Aptitude Testing Foundation (日本漢字能力検定協会).
Synopsis
- data Kanji
- kanji :: Char -> Maybe Kanji
- _kanji :: Kanji -> Char
- allKanji :: Map Level (Set Kanji)
- isKanji :: Char -> Bool
- isHiragana :: Char -> Bool
- isKatakana :: Char -> Bool
- data CharCat
- = Hanzi
- | Hiragana
- | Katakana
- | Numeral
- | RomanLetter
- | Punctuation
- | Other
- category :: Char -> CharCat
- data Level
- level :: Kanji -> Level
- percentSpread :: [Kanji] -> Map Kanji Float
- levelDist :: [Kanji] -> Map Level Float
- uniques :: [Kanji] -> Map Level (Set Kanji)
- densities :: Text -> Map CharCat Float
- elementaryDen :: Map Level Float -> Float
- middleDen :: Map Level Float -> Float
- highDen :: Map Level Float -> Float
Kanji
A single symbol of Kanji. Japanese Kanji were borrowed from China over several waves during the last 1,500 years. Japan names 2,136 of these as their standard set, with rarer characters being the domain of academia and esoteric writers.
Japanese has several Japan-only Kanji, including:
- 畑 (a type of rice field)
- 峠 (a narrow mountain pass)
- 働 (to do physical labour)
Instances
Eq Kanji Source # | |
Ord Kanji Source # | |
Show Kanji Source # | |
Generic Kanji Source # | |
Hashable Kanji Source # | |
Defined in Data.Kanji.Types | |
ToJSON Kanji Source # | |
Defined in Data.Kanji.Types | |
FromJSON Kanji Source # | |
NFData Kanji Source # | |
Defined in Data.Kanji.Types | |
type Rep Kanji Source # | |
Defined in Data.Kanji.Types |
isHiragana :: Char -> Bool Source #
あ to ん.
isKatakana :: Char -> Bool Source #
ア to ン.
Character Categories
General categories for characters, at least as is useful for thinking about Japanese.
Japanese "full-width" numbers and letters will be counted as Numeral
and RomanLetter
respectively, alongside their usual ASCII forms.
Instances
Eq CharCat Source # | |
Ord CharCat Source # | |
Show CharCat Source # | |
Generic CharCat Source # | |
Hashable CharCat Source # | |
Defined in Data.Kanji.Types | |
ToJSON CharCat Source # | |
Defined in Data.Kanji.Types | |
ToJSONKey CharCat Source # | |
Defined in Data.Kanji.Types | |
FromJSON CharCat Source # | |
NFData CharCat Source # | |
Defined in Data.Kanji.Types | |
type Rep CharCat Source # | |
Defined in Data.Kanji.Types type Rep CharCat = D1 (MetaData "CharCat" "Data.Kanji.Types" "kanji-3.4.0.2-EeavXsj13naCjEbT6ezLoZ" False) ((C1 (MetaCons "Hanzi" PrefixI False) (U1 :: Type -> Type) :+: (C1 (MetaCons "Hiragana" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "Katakana" PrefixI False) (U1 :: Type -> Type))) :+: ((C1 (MetaCons "Numeral" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "RomanLetter" PrefixI False) (U1 :: Type -> Type)) :+: (C1 (MetaCons "Punctuation" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "Other" PrefixI False) (U1 :: Type -> Type)))) |
Levels
A Level or Kyuu (級) of Japanese Kanji ranking. There are 12 of these, from 10 to 1, including intermediate levels between 3 and 2, and 2 and 1.
Japanese students will typically have Level-5 ability by the time they finish elementary school. Level-5 accounts for 1,006 characters.
By the end of middle school, they would have covered up to Level-3 (1607 Kanji) in their Japanese class curriculum.
While Level-2 (2,136 Kanji) is considered "standard adult" ability, many adults could not pass the Level-2, or even the Level-Pre2 (1940 Kanji) exam without considerable study.
Level data for Kanji above Level-2 is currently not provided by this library.
Instances
Analysis
percentSpread :: [Kanji] -> Map Kanji Float Source #
The distribution of each Kanji
in a set of them.
The distribution values must sum to 1.
levelDist :: [Kanji] -> Map Level Float Source #
How much of each Level
is represented by a group of Kanji?
The distribution values will sum to 1.
uniques :: [Kanji] -> Map Level (Set Kanji) Source #
Which Kanji appeared from each Level in the text?
Densities
densities :: Text -> Map CharCat Float Source #
Percentage of appearance of each CharCat
in the source text.
The percentages will sum to 1.0.
elementaryDen :: Map Level Float -> Float Source #
How much of the Kanji found are learnt in elementary school in Japan?
elementaryDen . levelDist :: [Kanji] -> Float