Implementation-level interface for spelling suggestion.
- type SpellingWordFilter = String -> String -> Bool
- type SpellingWordCoder = String -> String
- nearbyWordFilter :: String -> String -> Bool
- anyWordFilter :: String -> String -> Bool
- editDistance :: String -> String -> Int
- soundex :: Bool -> String -> String
- phonix :: String -> String
- trivialPhoneticCode :: String -> String
- tryWord :: SpellingWordFilter -> SpellingWordCoder -> String -> [String] -> [String]
The weighted edit distance between a pair of strings, with weights for insertion, deletion, transposition and substitution chosen to try to mimic spelling errors.
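As a rough illustration of what a weighted edit distance of this kind looks like, here is a minimal sketch using the restricted Damerau-Levenshtein (optimal string alignment) recurrence. The names `Weights`, `exampleWeights` and `weightedDistance` are hypothetical, and the weight values are invented for the example; the text above does not state the weights that editDistance actually uses, nor which recurrence it implements.

```haskell
import Data.Array

-- Hypothetical weight record; the library's real weights are not given here.
data Weights = Weights
  { insertCost, deleteCost, transposeCost, substituteCost :: Int }

-- Invented values: transposition is cheapest, substitution dearest.
exampleWeights :: Weights
exampleWeights = Weights
  { insertCost = 2, deleteCost = 2, transposeCost = 1, substituteCost = 3 }

-- Cost of turning the first string into the second, allowing insertion,
-- deletion, substitution and transposition of adjacent characters.
weightedDistance :: Weights -> String -> String -> Int
weightedDistance w s t = table ! (m, n)
  where
    m = length s
    n = length t
    sa = listArray (1, m) s
    ta = listArray (1, n) t
    -- Lazily filled dynamic-programming table; cell (i, j) holds the cost
    -- of turning the first i characters of s into the first j of t.
    table = array ((0, 0), (m, n))
      [ ((i, j), cell i j) | i <- [0 .. m], j <- [0 .. n] ]
    cell 0 j = j * insertCost w
    cell i 0 = i * deleteCost w
    cell i j = minimum (substitution : deletion : insertion : transposition)
      where
        substitution =
          table ! (i - 1, j - 1)
            + (if sa ! i == ta ! j then 0 else substituteCost w)
        deletion = table ! (i - 1, j) + deleteCost w
        insertion = table ! (i, j - 1) + insertCost w
        -- Only available when the last two characters are swapped.
        transposition =
          [ table ! (i - 2, j - 2) + transposeCost w
          | i > 1, j > 1, sa ! i == ta ! (j - 1), sa ! (i - 1) == ta ! j ]
```

With these example weights, `weightedDistance exampleWeights "ab" "ba"` is 1 (one transposition) and `weightedDistance exampleWeights "cat" "cart"` is 2 (one insertion).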
Compute a full soundex code; i.e., do not drop any encodable characters from the result. The leading character of the code will be folded to uppercase. Non-alphabetics are not encoded. If no alphabetics are present, the soundex code will be 0.
The two commonly encountered forms of soundex are Simplified and another known as American, Miracode, NARA or Knuth. This code will calculate either: passing True gets NARA, and passing False gets Simplified.
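The difference between the two forms can be sketched as follows. This is an independent illustration, not the library's source: `digitFor` and `soundexSketch` are hypothetical names, and the sketch assumes that the only difference is the treatment of H and W (under NARA they do not separate two consonants with the same code; under Simplified they act as separators, like vowels). It also assumes that only ASCII letters count as alphabetics.

```haskell
import Data.Char (isAlpha, isAscii, toUpper)

-- Standard soundex digit classes; vowels and H, W, Y have no digit.
digitFor :: Char -> Maybe Char
digitFor c
  | u `elem` "BFPV"     = Just '1'
  | u `elem` "CGJKQSXZ" = Just '2'
  | u `elem` "DT"       = Just '3'
  | u == 'L'            = Just '4'
  | u `elem` "MN"       = Just '5'
  | u == 'R'            = Just '6'
  | otherwise           = Nothing
  where
    u = toUpper c

-- Full (untruncated, unpadded) soundex code. True selects the NARA
-- treatment of H and W, False the Simplified one.
soundexSketch :: Bool -> String -> String
soundexSketch nara word =
  case filter (\c -> isAscii c && isAlpha c) word of
    [] -> "0"                                   -- no alphabetics at all
    (c : cs) -> toUpper c : go (digitFor c) cs  -- keep the leading letter
  where
    -- prev is the digit of the previous significant letter, if any.
    go _ [] = []
    go prev (x : xs)
      | nara && toUpper x `elem` "HW" = go prev xs  -- H/W do not break a run
      | code == prev = go prev xs                   -- same code: emit nothing
      | otherwise = maybe id (:) code (go code xs)  -- emit digit, if there is one
      where
        code = digitFor x
```

For example, `soundexSketch True "Ashcraft"` gives `"A2613"`, while `soundexSketch False "Ashcraft"` gives `"A22613"`; truncated to four characters these are the familiar A261 and A226.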
Compute a full phonix code; i.e., do not drop any encodable characters from the result. The leading character of the code will be folded to uppercase. Non-alphabetics are not encoded. If no alphabetics are present, the phonix code will be 0.
There appear to be many, many variants of phonix implemented on the web, and I'm too cheap and lazy to go find the original paper by Gadd (1990) that actually describes the original algorithm. Thus, I am taking some big guesses on intent here as I implement. Corrections, especially those involving getting me a copy of the article, are welcome.
Dropping the trailing sound seems to be an integral part of Gadd's technique, but I'm not sure how it is supposed to be done. I am currently compressing runs of vowels, and then dropping the trailing digit or vowel from the code.
Another area of confusion is whether to compress strings of the same code, as in Soundex, or merely strings of the same consonant. I have chosen the former.
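The two compression choices can be contrasted with a toy example. The code table below is invented purely for illustration and is not Gadd's table; `toyCode`, `squeeze`, `compressByCode` and `compressByConsonant` are hypothetical names.

```haskell
import Data.List (group)

-- Toy code table, made up for this illustration only.
toyCode :: Char -> Char
toyCode c
  | c `elem` "ckq" = '2'
  | c `elem` "dt"  = '3'
  | otherwise      = '?'

-- Collapse each run of identical elements down to a single element.
squeeze :: Eq a => [a] -> [a]
squeeze = concatMap (take 1) . group

-- Encode first, then collapse runs of the same code (as in Soundex).
compressByCode :: String -> String
compressByCode = squeeze . map toyCode

-- Collapse runs of the same consonant first, then encode.
compressByConsonant :: String -> String
compressByConsonant = map toyCode . squeeze
```

On the input `"ckkt"`, `compressByCode` yields `"23"`, because c and k share a code and so form one run, whereas `compressByConsonant` yields `"223"`, because only the doubled k is collapsed.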
Map any given word to a constant phonetic code. In other words, suppress phonetic coding.
Core algorithm for spelling suggestion. Takes a prefiltering function, a phonetic coding function, a target word, and a list of candidate words. Returns an ordered list of suggested candidates.
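One plausible reading of this description is sketched below: keep the candidates that pass the prefilter and share the target's phonetic code, then order them by distance from the target. This is an assumption about the shape of the algorithm, not the library's source. `suggestSketch` and `crudeDistance` are hypothetical names, the distance function is passed in explicitly to keep the example self-contained, and the argument order of the filter (target first, candidate second) is also assumed.

```haskell
import Data.List (sortOn)

type SpellingWordFilter = String -> String -> Bool
type SpellingWordCoder = String -> String

-- Stand-in distance for the example: mismatched positions plus the
-- difference in length. It is not the weighted edit distance.
crudeDistance :: String -> String -> Int
crudeDistance a b =
  length (filter id (zipWith (/=) a b)) + abs (length a - length b)

-- Hypothetical sketch: filter, match on phonetic code, sort by distance.
suggestSketch
  :: (String -> String -> Int)  -- distance used for ordering
  -> SpellingWordFilter         -- prefilter: target, then candidate
  -> SpellingWordCoder          -- phonetic coder
  -> String                     -- target word
  -> [String]                   -- candidate words
  -> [String]
suggestSketch dist keep code target candidates =
  sortOn (dist target)
    [ c | c <- candidates, keep target c, code c == targetCode ]
  where
    targetCode = code target
```

Passing `\_ _ -> True` as the filter and `const ""` as the coder disables both stages, which is the role anyWordFilter and trivialPhoneticCode appear to play; with those arguments, `suggestSketch crudeDistance (\_ _ -> True) (const "") "cat" ["dog", "cot", "cat"]` returns `["cat", "cot", "dog"]`.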