kanji-3.4.0.2: Perform 漢字検定 (Japan Kanji Aptitude Test) level analysis on Japanese Kanji

Copyright: (c) Colin Woodbury 2015 - 2019
License: GPL3
Maintainer: Colin Woodbury <colin@fosskers.ca>
Safe Haskell: None
Language: Haskell2010

Data.Kanji

Description

A library for analysing the density of Kanji in given texts, according to their Level classification, as defined by the Japan Kanji Aptitude Testing Foundation (日本漢字能力検定協会).

Kanji

data Kanji Source #

A single symbol of Kanji. Japanese Kanji were borrowed from China in several waves during the last 1,500 years. Japan designates 2,136 of these as its standard set, with rarer characters being the domain of academia and esoteric writers.

Japanese has several Japan-only Kanji, including:

  • 畑 (a type of rice field)
  • 峠 (a narrow mountain pass)
  • 働 (to do physical labour)
Instances
Eq Kanji Source # 
Instance details

Defined in Data.Kanji.Types

Methods

(==) :: Kanji -> Kanji -> Bool #

(/=) :: Kanji -> Kanji -> Bool #

Ord Kanji Source # 
Instance details

Defined in Data.Kanji.Types

Methods

compare :: Kanji -> Kanji -> Ordering #

(<) :: Kanji -> Kanji -> Bool #

(<=) :: Kanji -> Kanji -> Bool #

(>) :: Kanji -> Kanji -> Bool #

(>=) :: Kanji -> Kanji -> Bool #

max :: Kanji -> Kanji -> Kanji #

min :: Kanji -> Kanji -> Kanji #

Show Kanji Source # 
Instance details

Defined in Data.Kanji.Types

Methods

showsPrec :: Int -> Kanji -> ShowS #

show :: Kanji -> String #

showList :: [Kanji] -> ShowS #

Generic Kanji Source # 
Instance details

Defined in Data.Kanji.Types

Associated Types

type Rep Kanji :: Type -> Type #

Methods

from :: Kanji -> Rep Kanji x #

to :: Rep Kanji x -> Kanji #

Hashable Kanji Source # 
Instance details

Defined in Data.Kanji.Types

Methods

hashWithSalt :: Int -> Kanji -> Int #

hash :: Kanji -> Int #

ToJSON Kanji Source # 
Instance details

Defined in Data.Kanji.Types

FromJSON Kanji Source # 
Instance details

Defined in Data.Kanji.Types

NFData Kanji Source # 
Instance details

Defined in Data.Kanji.Types

Methods

rnf :: Kanji -> () #

type Rep Kanji Source # 
Instance details

Defined in Data.Kanji.Types

type Rep Kanji = D1 (MetaData "Kanji" "Data.Kanji.Types" "kanji-3.4.0.2-EeavXsj13naCjEbT6ezLoZ" True) (C1 (MetaCons "Kanji" PrefixI False) (S1 (MetaSel (Nothing :: Maybe Symbol) NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Char)))

kanji :: Char -> Maybe Kanji Source #

Construct a Kanji value from some Char if it falls in the correct Unicode range.

_kanji :: Kanji -> Char Source #

The original Char of a Kanji.
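The pairing of kanji and _kanji follows the smart-constructor pattern: construction may fail, while unwrapping always succeeds. The following is a minimal sketch of that pattern and not the library's source. KanjiSketch, mkKanji and kanjiOnly are hypothetical names, and the inclusive bounds are an assumption taken from the range quoted for isKanji.

```haskell
import Data.Char (ord)
import Data.Maybe (mapMaybe)

-- Hypothetical stand-in for the library's Kanji type: a wrapper around Char.
newtype KanjiSketch = KanjiSketch { unKanji :: Char }
  deriving (Eq, Ord, Show)

-- Smart constructor: succeeds only inside the assumed code point range
-- (19968 to 40959, treated here as inclusive).
mkKanji :: Char -> Maybe KanjiSketch
mkKanji c
  | n >= 19968 && n <= 40959 = Just (KanjiSketch c)
  | otherwise                = Nothing
  where
    n = ord c

-- Keep only the Kanji of a String, dropping kana, Latin letters and so on.
kanjiOnly :: String -> [KanjiSketch]
kanjiOnly = mapMaybe mkKanji
```

Returning Maybe forces callers to handle non-Kanji input explicitly, which is why filtering a whole text is usually written with mapMaybe.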

allKanji :: Map Level (Set Kanji) Source #

All Japanese Kanji, grouped by their Level (級).

isKanji :: Char -> Bool Source #

Legal Kanji fall between Unicode code points 19968 (U+4E00) and 40959 (U+9FFF).
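The decimal bounds 19968 and 40959 are easier to recognise in hexadecimal, where they are the first and last code points of the CJK Unified Ideographs block. A small sketch of the range check and the conversion follows. isKanjiSketch and codePoint are hypothetical helper names, and treating both bounds as inclusive is an assumption.

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- Range check on the numeric code point; both bounds assumed inclusive.
isKanjiSketch :: Char -> Bool
isKanjiSketch c = n >= 19968 && n <= 40959
  where
    n = ord c

-- Render a character's code point in the usual U+XXXX notation.
codePoint :: Char -> String
codePoint c = "U+" ++ map toUpper (showHex (ord c) "")

-- The two bounds as characters, for inspection with codePoint.
lowerBound, upperBound :: Char
lowerBound = toEnum 19968  -- codePoint lowerBound == "U+4E00"
upperBound = toEnum 40959  -- codePoint upperBound == "U+9FFF"
```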

isHiragana :: Char -> Bool Source #

あ to ん.

isKatakana :: Char -> Bool Source #

ア to ン.
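Read literally, the two kana ranges translate into simple Char comparisons. The sketch below encodes exactly that literal reading and may differ from the library's real bounds. The names isHiraganaSketch and isKatakanaSketch are hypothetical.

```haskell
-- Literal reading of "あ to ん": U+3042 through U+3093.
isHiraganaSketch :: Char -> Bool
isHiraganaSketch c = c >= '\x3042' && c <= '\x3093'

-- Literal reading of "ア to ン": U+30A2 through U+30F3.
isKatakanaSketch :: Char -> Bool
isKatakanaSketch c = c >= '\x30A2' && c <= '\x30F3'
```

Under this reading the small ぁ (U+3041) and ァ (U+30A1) and the long-vowel mark ー (U+30FC) fall outside both ranges, so code that cares about them should test the real functions rather than rely on this sketch.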

Character Categories

data CharCat Source #

General categories for characters, at least as far as is useful for thinking about Japanese.

Japanese "full-width" numbers and letters will be counted as Numeral and RomanLetter respectively, alongside their usual ASCII forms.
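To make the full-width rule concrete, here is a rough sketch of a classifier that folds full-width digits and letters into the same categories as their ASCII forms. Cat and categorise are hypothetical names; the constructor names mirror those visible in the Rep CharCat instance, but the ranges and the order of the checks are assumptions, not the library's implementation.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, isPunctuation, ord)

-- Local stand-in for CharCat, reusing its constructor names.
data Cat = Hanzi | Hiragana | Katakana | Numeral | RomanLetter | Punctuation | Other
  deriving (Eq, Ord, Show)

-- Inclusive code point range test.
between :: Int -> Int -> Char -> Bool
between lo hi c = n >= lo && n <= hi
  where
    n = ord c

categorise :: Char -> Cat
categorise c
  | between 0x4E00 0x9FFF c              = Hanzi
  | between 0x3042 0x3093 c              = Hiragana
  | between 0x30A2 0x30F3 c              = Katakana
  -- ASCII digits, plus full-width digits U+FF10 to U+FF19.
  | isDigit c || between 0xFF10 0xFF19 c = Numeral
  -- ASCII letters, plus full-width A-Z (U+FF21..) and a-z (U+FF41..).
  | isAsciiUpper c || isAsciiLower c
      || between 0xFF21 0xFF3A c
      || between 0xFF41 0xFF5A c         = RomanLetter
  | isPunctuation c                      = Punctuation
  | otherwise                            = Other
```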

Instances
Eq CharCat Source # 
Instance details

Defined in Data.Kanji.Types

Methods

(==) :: CharCat -> CharCat -> Bool #

(/=) :: CharCat -> CharCat -> Bool #

Ord CharCat Source # 
Instance details

Defined in Data.Kanji.Types

Show CharCat Source # 
Instance details

Defined in Data.Kanji.Types

Generic CharCat Source # 
Instance details

Defined in Data.Kanji.Types

Associated Types

type Rep CharCat :: Type -> Type #

Methods

from :: CharCat -> Rep CharCat x #

to :: Rep CharCat x -> CharCat #

Hashable CharCat Source # 
Instance details

Defined in Data.Kanji.Types

Methods

hashWithSalt :: Int -> CharCat -> Int #

hash :: CharCat -> Int #

ToJSON CharCat Source # 
Instance details

Defined in Data.Kanji.Types

ToJSONKey CharCat Source # 
Instance details

Defined in Data.Kanji.Types

FromJSON CharCat Source # 
Instance details

Defined in Data.Kanji.Types

NFData CharCat Source # 
Instance details

Defined in Data.Kanji.Types

Methods

rnf :: CharCat -> () #

type Rep CharCat Source # 
Instance details

Defined in Data.Kanji.Types

type Rep CharCat = D1 (MetaData "CharCat" "Data.Kanji.Types" "kanji-3.4.0.2-EeavXsj13naCjEbT6ezLoZ" False) ((C1 (MetaCons "Hanzi" PrefixI False) (U1 :: Type -> Type) :+: (C1 (MetaCons "Hiragana" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "Katakana" PrefixI False) (U1 :: Type -> Type))) :+: ((C1 (MetaCons "Numeral" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "RomanLetter" PrefixI False) (U1 :: Type -> Type)) :+: (C1 (MetaCons "Punctuation" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "Other" PrefixI False) (U1 :: Type -> Type))))

Levels

data Level Source #

A Level or Kyuu (級) of Japanese Kanji ranking. There are 12 of these, from 10 to 1, including intermediate levels between 3 and 2, and 2 and 1.

Japanese students will typically have Level-5 ability by the time they finish elementary school. Level-5 accounts for 1,006 characters.

By the end of middle school, they would have covered up to Level-3 (1,607 Kanji) in their Japanese class curriculum.

While Level-2 (2,136 Kanji) is considered "standard adult" ability, many adults could not pass the Level-2, or even the Level-Pre2 (1,940 Kanji) exam without considerable study.

Level data for Kanji above Level-2 is currently not provided by this library.
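The Kanji counts quoted above read as cumulative totals, since the Level-2 figure equals the 2,136 characters of the standard set. Under that assumption, the following sketch works out how many characters each milestone adds and what share of the standard set it covers. All names here are illustrative.

```haskell
-- Cumulative Kanji counts at each milestone, as quoted in the text above.
milestones :: [(String, Int)]
milestones =
  [ ("Level-5", 1006)
  , ("Level-3", 1607)
  , ("Level-Pre2", 1940)
  , ("Level-2", 2136)
  ]

-- Characters added between one milestone and the next.
newAtEach :: [Int]
newAtEach = zipWith (-) totals (0 : totals)  -- [1006, 601, 333, 196]
  where
    totals = map snd milestones

-- Share of the 2,136-character standard set covered at each milestone.
coverage :: [(String, Double)]
coverage = [ (name, fromIntegral n / 2136) | (name, n) <- milestones ]
```

By this arithmetic a student finishing elementary school has met roughly 47% of the standard set, and one finishing middle school roughly 75%.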

Instances
Enum Level Source # 
Instance details

Defined in Data.Kanji.Types

Eq Level Source # 
Instance details

Defined in Data.Kanji.Types

Methods

(==) :: Level -> Level -> Bool #

(/=) :: Level -> Level -> Bool #

Ord Level Source # 
Instance details

Defined in Data.Kanji.Types

Methods

compare :: Level -> Level -> Ordering #

(<) :: Level -> Level -> Bool #

(<=) :: Level -> Level -> Bool #

(>) :: Level -> Level -> Bool #

(>=) :: Level -> Level -> Bool #

max :: Level -> Level -> Level #

min :: Level -> Level -> Level #

Show Level Source # 
Instance details

Defined in Data.Kanji.Types

Methods

showsPrec :: Int -> Level -> ShowS #

show :: Level -> String #

showList :: [Level] -> ShowS #

Generic Level Source # 
Instance details

Defined in Data.Kanji.Types

Associated Types

type Rep Level :: Type -> Type #

Methods

from :: Level -> Rep Level x #

to :: Rep Level x -> Level #

Hashable Level Source # 
Instance details

Defined in Data.Kanji.Types

Methods

hashWithSalt :: Int -> Level -> Int #

hash :: Level -> Int #

ToJSON Level Source # 
Instance details

Defined in Data.Kanji.Types

ToJSONKey Level Source # 
Instance details

Defined in Data.Kanji.Types

FromJSON Level Source # 
Instance details

Defined in Data.Kanji.Types

NFData Level Source # 
Instance details

Defined in Data.Kanji.Types

Methods

rnf :: Level -> () #

type Rep Level Source # 
Instance details

Defined in Data.Kanji.Types

type Rep Level = D1 (MetaData "Level" "Data.Kanji.Types" "kanji-3.4.0.2-EeavXsj13naCjEbT6ezLoZ" False) (((C1 (MetaCons "Ten" PrefixI False) (U1 :: Type -> Type) :+: (C1 (MetaCons "Nine" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "Eight" PrefixI False) (U1 :: Type -> Type))) :+: (C1 (MetaCons "Seven" PrefixI False) (U1 :: Type -> Type) :+: (C1 (MetaCons "Six" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "Five" PrefixI False) (U1 :: Type -> Type)))) :+: ((C1 (MetaCons "Four" PrefixI False) (U1 :: Type -> Type) :+: (C1 (MetaCons "Three" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "PreTwo" PrefixI False) (U1 :: Type -> Type))) :+: ((C1 (MetaCons "Two" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "PreOne" PrefixI False) (U1 :: Type -> Type)) :+: (C1 (MetaCons "One" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "Unknown" PrefixI False) (U1 :: Type -> Type)))))

level :: Kanji -> Level Source #

What Level does a Kanji belong to? Unknown for Kanji above level Two.
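One way to picture level is as a lookup into a table shaped like allKanji, falling back to Unknown when no set contains the character. The sketch below assumes that approach; the library may implement it differently. Lvl, toyTable and levelOf are hypothetical names, the constructor names mirror those in the Rep Level instance, and the table holds only three illustrative entries rather than the library's data.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Local stand-in for Level, reusing its constructor names and order.
data Lvl
  = Ten | Nine | Eight | Seven | Six | Five | Four | Three
  | PreTwo | Two | PreOne | One | Unknown
  deriving (Eq, Ord, Enum, Show)

-- Toy table: U+4E00 and U+65E5 under Ten, U+6771 under Nine.
toyTable :: M.Map Lvl (S.Set Char)
toyTable = M.fromList
  [ (Ten,  S.fromList ['\x4E00', '\x65E5'])
  , (Nine, S.fromList ['\x6771'])
  ]

-- First level whose set contains the character, else Unknown.
levelOf :: M.Map Lvl (S.Set Char) -> Char -> Lvl
levelOf table k =
  case [ l | (l, ks) <- M.toList table, S.member k ks ] of
    (l : _) -> l
    []      -> Unknown
```

A linear scan over at most a dozen sets is cheap, though inverting the table into a Map from character to level would make repeated lookups faster.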

Analysis

percentSpread :: [Kanji] -> Map Kanji Float Source #

The share of each Kanji within a given list of them. The distribution values will sum to 1.

levelDist :: [Kanji] -> Map Level Float Source #

How much of each Level is represented by a group of Kanji? The distribution values will sum to 1.
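Both percentSpread and levelDist can be viewed as the same computation: count occurrences under some key, then divide by the total. The following sketch shows that idea with a generic helper. distBy, percentSpreadSketch and levelDistSketch are hypothetical names, the classifier is passed in explicitly so the block stays self-contained, and nothing here is taken from the library's source.

```haskell
import qualified Data.Map.Strict as M

-- Share of the input that falls under each key produced by `classify`.
-- An empty input yields an empty map.
distBy :: Ord key => (a -> key) -> [a] -> M.Map key Float
distBy classify xs = M.map (/ total) counts
  where
    counts = M.fromListWith (+) [ (classify x, 1) | x <- xs ]
    total  = fromIntegral (length xs)

-- Per-item spread: every item is its own key.
percentSpreadSketch :: Ord a => [a] -> M.Map a Float
percentSpreadSketch = distBy id

-- Per-level spread: items are grouped by whatever `classify` returns.
levelDistSketch :: Ord l => (a -> l) -> [a] -> M.Map l Float
levelDistSketch = distBy
```

Because every item contributes exactly once to the counts, the values of a non-empty result add up to 1, up to floating-point rounding.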

uniques :: [Kanji] -> Map Level (Set Kanji) Source #

Which Kanji from each Level appeared in the text?
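Grouping with deduplication of this kind can be expressed as a union of singleton sets per key. Here is a minimal sketch under that assumption. uniquesSketch is a hypothetical name and the classifier is an explicit parameter, standing in for the library's level function.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Group items by key; repeated items collapse because each group is a Set.
uniquesSketch :: (Ord l, Ord a) => (a -> l) -> [a] -> M.Map l (S.Set a)
uniquesSketch classify xs =
  M.fromListWith S.union [ (classify x, S.singleton x) | x <- xs ]
```

Keys with no matching items are simply absent from the result, so callers that expect every level to be present should use a lookup with a default of the empty set.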

Densities

densities :: Text -> Map CharCat Float Source #

The proportion of the source text taken up by each CharCat. The proportions will sum to 1.0.

elementaryDen :: Map Level Float -> Float Source #

What proportion of the Kanji found are learnt in elementary school in Japan?

elementaryDen . levelDist :: [Kanji] -> Float

middleDen :: Map Level Float -> Float Source #

What proportion of the Kanji found are learnt by the end of middle school?

middleDen . levelDist :: [Kanji] -> Float

highDen :: Map Level Float -> Float Source #

What proportion of the Kanji found are learnt by the end of high school?

highDen . levelDist :: [Kanji] -> Float
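The three density functions can be thought of as cumulative sums over a level distribution, each cut off at a different level. The sketch below assumes cut-offs of Five for elementary school and Three for middle school, as described under Level, and assumes Two for high school, which this page does not state. Lvl, upTo and the three Sketch functions are hypothetical names.

```haskell
import qualified Data.Map.Strict as M

-- Local stand-in for Level; the derived Ord follows constructor order,
-- so Ten is the smallest value and Unknown the largest.
data Lvl
  = Ten | Nine | Eight | Seven | Six | Five | Four | Three
  | PreTwo | Two | PreOne | One | Unknown
  deriving (Eq, Ord, Enum, Show)

-- Sum the shares of every level from Ten up to and including `top`.
upTo :: Lvl -> M.Map Lvl Float -> Float
upTo top = sum . M.elems . M.filterWithKey (\l _ -> l <= top)

elementarySketch, middleSketch, highSketch :: M.Map Lvl Float -> Float
elementarySketch = upTo Five
middleSketch     = upTo Three
highSketch       = upTo Two  -- assumed cut-off, not stated on this page
```

Since each cut-off includes all easier levels, the three results are non-decreasing for any distribution with non-negative shares.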