spelling-suggest-0.5.0.1: Spelling suggestion tool with library and command-line interfaces.

Text.SpellingSuggest.LowLevel

Description

Implementation-level interface for spelling suggestion.


Documentation

nearbyWordFilter :: String -> String -> Bool

Return True if the editDistance from the target word to the given word is small enough.

anyWordFilter :: String -> String -> Bool

Always returns True.

editDistance :: String -> String -> Int

The weighted edit distance between a pair of strings, with weights for insertion, deletion, transposition, and substitution chosen to approximate common spelling errors.
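For illustration, here is a minimal sketch of a weighted edit distance in the same spirit, using the optimal-string-alignment variant of Damerau-Levenshtein (adjacent transpositions allowed). The name weightedDistance and the four cost constants are hypothetical; the package's actual weights are not stated here.

```haskell
import Data.Array (Array, array, listArray, (!))

-- Hypothetical costs for illustration only; these are not the
-- package's actual weights.
insCost, delCost, subCost, transCost :: Int
insCost = 2   -- inserting a missing letter
delCost = 2   -- deleting an extra letter
subCost = 3   -- replacing one letter with another
transCost = 1 -- swapping two adjacent letters, a very common typo

-- Weighted optimal-string-alignment distance from s to t.
weightedDistance :: String -> String -> Int
weightedDistance s t = table ! (m, n)
  where
    m = length s
    n = length t
    sa = listArray (1, m) s :: Array Int Char
    ta = listArray (1, n) t :: Array Int Char
    -- Lazily filled DP table: table ! (i, j) is the distance between
    -- the first i letters of s and the first j letters of t.
    table :: Array (Int, Int) Int
    table = array ((0, 0), (m, n))
      [ ((i, j), cell i j) | i <- [0 .. m], j <- [0 .. n] ]
    cell :: Int -> Int -> Int
    cell 0 j = j * insCost
    cell i 0 = i * delCost
    cell i j = minimum (steps ++ swap)
      where
        steps =
          [ table ! (i - 1, j) + delCost
          , table ! (i, j - 1) + insCost
          , table ! (i - 1, j - 1) + (if sa ! i == ta ! j then 0 else subCost)
          ]
        -- Adjacent transposition, e.g. "teh" -> "the".
        swap =
          [ table ! (i - 2, j - 2) + transCost
          | i > 1, j > 1
          , sa ! i == ta ! (j - 1)
          , sa ! (i - 1) == ta ! j
          ]
```

With these costs, weightedDistance "teh" "the" is 1 while weightedDistance "cat" "cut" is 3, so a swapped pair of letters counts as a smaller error than a wrong letter.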

soundex :: Bool -> String -> String

Compute a full soundex code; i.e., do not drop any encodable characters from the result. The leading character of the code will be folded to uppercase. Non-alphabetics are not encoded. If no alphabetics are present, the soundex code will be 0.

The two commonly encountered forms of Soundex are Simplified Soundex and a variant known variously as American, Miracode, NARA, or Knuth Soundex. This function can compute either: passing True selects NARA, and passing False selects Simplified.
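A minimal sketch of how a full (untruncated, unpadded) Soundex in both variants might be computed, following the commonly published rules: in the NARA variant H and W are transparent, so same-coded consonants on either side of them are coded once, while in the Simplified variant H and W separate them just as vowels do. The name soundexSketch is hypothetical and this is not the package's implementation; discarding non-letters before coding is also an assumption.

```haskell
import Data.Char (isAlpha, toLower, toUpper)

-- Soundex digit for a lowercase letter; vowels, y, h and w have none.
soundexDigit :: Char -> Maybe Char
soundexDigit c
  | c `elem` "bfpv"     = Just '1'
  | c `elem` "cgjkqsxz" = Just '2'
  | c `elem` "dt"       = Just '3'
  | c == 'l'            = Just '4'
  | c `elem` "mn"       = Just '5'
  | c == 'r'            = Just '6'
  | otherwise           = Nothing

-- Full Soundex: no truncation to four characters and no zero padding.
-- True selects the NARA variant, False the Simplified variant.
soundexSketch :: Bool -> String -> String
soundexSketch nara word =
  case map toLower (filter isAlpha word) of
    [] -> "0"
    (first : rest) -> toUpper first : go (soundexDigit first) rest
  where
    -- prev is the digit a following letter must differ from to be coded.
    go :: Maybe Char -> String -> String
    go _ [] = []
    go prev (c : cs)
      | nara && c `elem` "hw" = go prev cs -- NARA: h and w are transparent
      | otherwise =
          case soundexDigit c of
            Nothing -> go Nothing cs -- vowel (or h/w in Simplified) separates
            Just d
              | Just d == prev -> go prev cs -- same digit as before: code once
              | otherwise -> d : go (Just d) cs
```

For example, soundexSketch True "Ashcraft" yields "A2613", because the s and c on either side of the h are coded once, whereas soundexSketch False "Ashcraft" yields "A22613".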

phonix :: String -> String

Compute a full phonix code; i.e., do not drop any encodable characters from the result. The leading character of the code will be folded to uppercase. Non-alphabetics are not encoded. If no alphabetics are present, the phonix code will be 0.

There appear to be many, many variants of phonix implemented on the web, and I'm too cheap and lazy to go find the original paper by Gadd (1990) that actually describes the original algorithm. Thus, I am taking some big guesses on intent here as I implement. Corrections, especially those involving getting me a copy of the article, are welcome.

Dropping the trailing sound seems to be an integral part of Gadd's technique, but I'm not sure how it is supposed to be done. I am currently compressing runs of vowels, and then dropping the trailing digit or vowel from the code.

Another area of confusion is whether to compress runs of the same code, as in Soundex, or merely runs of the same consonant. I have chosen the former.
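The difference between the two compression choices shows up on a toy example. This sketch uses a hypothetical coder, toyCode, in which c, k and q share the code 2 as they do in Soundex; it is not the Phonix code table, and compressByCode and compressByLetter are illustrative names only.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Hypothetical toy coder for illustration, not the Phonix table:
-- c, k and q share the code 2 here, as they do in Soundex.
toyCode :: Char -> Char
toyCode c
  | c `elem` "ckq" = '2'
  | c `elem` "dt" = '3'
  | otherwise = c

-- Collapse runs of adjacent letters that have the same code.
compressByCode :: (Char -> Char) -> String -> String
compressByCode code = concatMap (map code . take 1) . groupBy ((==) `on` code)

-- Collapse only runs of the same letter, then encode each run.
compressByLetter :: (Char -> Char) -> String -> String
compressByLetter code = concatMap (map code . take 1) . groupBy (==)
```

For "tack", compressByCode toyCode gives "3a2" while compressByLetter toyCode gives "3a22": only compressing by code merges the c and the k.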

trivialPhoneticCode :: String -> String

Map any given word to a constant phonetic code. In other words, suppress phonetic coding.

tryWord :: SpellingWordFilter -> SpellingWordCoder -> String -> [String] -> [String]

Core algorithm for spelling suggestion. Takes a prefiltering function, a phonetic coding function, a target word, and a list of candidate words, and returns an ordered list of suggested candidates.
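One plausible shape for this pipeline, offered only as a sketch: keep the candidates that pass the filter and share the target's phonetic code, then order them by edit distance. The names tryWordSketch and levenshtein are hypothetical, a plain unit-cost Levenshtein distance stands in for editDistance, and the filter is assumed to take the target word as its first argument.

```haskell
import Data.List (sortOn)

-- Plain unit-cost Levenshtein distance, standing in for the package's
-- weighted editDistance.
levenshtein :: String -> String -> Int
levenshtein s t = last (foldl step [0 .. length t] s)
  where
    -- Turn the DP row for a prefix of s into the row for one more letter c.
    step [] _ = [] -- unreachable: a row always has length t + 1 entries
    step prev@(p : ps) c = scanl compute (p + 1) (zip3 t prev ps)
      where
        compute left (tc, diag, up) =
          minimum [up + 1, left + 1, diag + (if tc == c then 0 else 1)]

-- Keep candidates that pass the filter and share the target's phonetic
-- code, then order them by distance (sortOn is stable, so ties keep
-- their input order).
tryWordSketch :: (String -> String -> Bool) -> (String -> String) -> String -> [String] -> [String]
tryWordSketch wordFilter coder target candidates =
  sortOn (levenshtein target) matches
  where
    targetCode = coder target
    matches =
      [ w
      | w <- candidates
      , wordFilter target w
      , coder w == targetCode
      ]
```

Passing a filter that always returns True and a coder that maps every word to the same string (the roles played by anyWordFilter and trivialPhoneticCode) reduces this sketch to ranking every candidate by edit distance.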