language-guess-0.1.2: Guess at which language a text is written in using trigrams.

Safe Haskell: None



Example usage:

>>> dat <- loadData'
>>> head $ guess dat "this is a teststring"
>>> take 2 $ guess dat "dette er en teststreng"
>>> head $ guess dat "lorem ipsum dolor sit amet"



loadData :: FilePath -> IO (Map Language (Map Trigram Rank))

Load trigram rank data from a cerealized file, i.e. one serialized with the cereal library.

loadData' :: IO (Map Language (Map Trigram Rank))

Load the default cerealized file.

guess :: Map Language (Map Trigram Rank) -> String -> [(Language, Double)]

Guess the language of a string.
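The type suggests that guess scores the input against every language profile and returns the candidates with their scores. A minimal sketch of that shape follows, assuming the score is a distance and that lower is better. The name rankCandidates, the String language key and the distance function passed as an argument are hypothetical; this is not the package's implementation.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (sortOn)

-- Hypothetical sketch: score a sample profile against every reference
-- profile and list the candidates, closest first.
-- Assumption: a smaller distance means a better match.
rankCandidates
  :: (profile -> profile -> Double)  -- distance between two profiles
  -> Map.Map String profile          -- reference profile per language
  -> profile                         -- profile of the text to classify
  -> [(String, Double)]
rankCandidates dist profiles sample =
  sortOn snd [ (lang, dist ref sample) | (lang, ref) <- Map.toList profiles ]
```

With this ordering, taking the head of the result gives the single best guess, which matches how the usage examples call guess.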

distance :: Map Trigram Rank -> Map Trigram Rank -> Double

Calculate the distance between two ranked trigram sets, following Cavnar & Trenkle (1994).
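Cavnar & Trenkle describe an "out-of-place" measure: for every trigram in the sample, take the absolute difference between its rank in the sample and its rank in the reference, and charge a fixed penalty when the reference lacks the trigram. The sketch below illustrates that measure only. The name outOfPlace, the explicit penalty argument and the Int result are assumptions; the package's distance returns a Double and may normalize or penalize differently.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical sketch of the out-of-place measure.
-- 'penalty' is charged for each sample key missing from the reference.
outOfPlace :: Ord k => Int -> Map.Map k Int -> Map.Map k Int -> Int
outOfPlace penalty reference sample =
  sum [ maybe penalty (\r' -> abs (r - r')) (Map.lookup k reference)
      | (k, r) <- Map.toList sample ]
```

A smaller total means the two rankings agree more closely.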

rank :: Map Trigram Frequency -> Map Trigram Rank

Convert a map of trigram frequencies to ranks. At most threshold trigrams are kept, and ties in frequency are broken alphabetically.
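One way to read this description: sort by descending frequency, break ties by key order, keep the first few entries, and number them. The sketch below does that under stated assumptions: the name rankSketch is hypothetical, the limit is passed explicitly rather than fixed, frequencies and ranks are Ints, and ranks start at 1. None of these details are confirmed by the documentation.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- Hypothetical sketch: most frequent key gets rank 1; equal
-- frequencies are ordered by the key's own ordering (alphabetical
-- for strings); only the first 'limit' keys are kept.
rankSketch :: Ord k => Int -> Map.Map k Int -> Map.Map k Int
rankSketch limit freqs = Map.fromList (zip ordered [1 ..])
  where
    ordered = take limit
            . map fst
            . sortBy (comparing (\(k, n) -> (Down n, k)))
            $ Map.toList freqs
```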

parse :: String -> Map (Char, Char, Char) Frequency

Make a trigram frequency map out of a string.
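Counting trigrams can be sketched by sliding a three-character window over the string. In the sketch below the name trigramCounts is hypothetical and Frequency is assumed to be an Int count; whether parse cleans or pads its input first is not stated here, so the sketch does neither.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (tails)

-- Hypothetical sketch: count every run of three consecutive characters.
-- Suffixes shorter than three characters fail the pattern and are skipped.
trigramCounts :: String -> Map.Map (Char, Char, Char) Int
trigramCounts s =
  Map.fromListWith (+) [ ((a, b, c), 1) | (a : b : c : _) <- tails s ]
```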

clean :: String -> String

Clean a string by removing punctuation, converting to lowercase, and collapsing adjacent spaces.
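The three steps can be sketched with functions from Data.Char and the Prelude. The name cleanSketch is hypothetical, and two details are assumptions rather than documented behaviour: punctuation is whatever isPunctuation accepts, and unwords . words also trims leading and trailing whitespace.

```haskell
import Data.Char (isPunctuation, toLower)

-- Hypothetical sketch: drop punctuation, lowercase the rest, then
-- collapse runs of whitespace into single spaces.
cleanSketch :: String -> String
cleanSketch = unwords . words . map toLower . filter (not . isPunctuation)
```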