phonetic-code: Phonetic codes: Soundex and Phonix

Safe Haskell: Safe-Inferred



Soundex is a phonetic coding algorithm. It transforms a word into a similarity hash based on an approximation of its sounds. Thus, similar-sounding words tend to have the same hash.

This implementation is based on a number of sources, including an online description of soundex and the one in Knuth's "The Art of Computer Programming", 2nd ed., v1, pp. 394-395. A very helpful reference on the details and differences among soundex algorithms is "Soundex: The True Story", accessed 11 September 2008.

This code was originally written for the "thimk" spelling suggestion application in Nickle in July 2002, based on an online description that has since moved. The code was ported in September 2008; the Soundex variants were also added at that time.



soundex :: Bool -> String -> String

Compute a "full" soundex code; i.e., do not drop any encodable characters from the result. The leading character of the code will be folded to uppercase. Non-alphabetics are not encoded. If no alphabetics are present, the soundex code will be "0".

The two commonly encountered forms of soundex are Simplified and another known variously as American, Miracode, NARA or Knuth. This code will calculate either: passing True gets NARA, and passing False gets Simplified.
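The difference between the two forms can be illustrated with a minimal, independent sketch. This is not the package's implementation: the names soundexSketch and letterCode are hypothetical, and the sketch assumes the usual statement of the rules, in which NARA soundex ignores H and W when merging adjacent letters with the same code, while Simplified soundex treats H and W like vowels. As in the description above, no digits are dropped, the leading letter is uppercased, and input without alphabetics yields "0".

```haskell
import Data.Char (isAlpha, isAscii, toUpper)
import Data.List (group)

-- Hypothetical helper: digit class of an uppercase letter, or '0'
-- for letters that carry no code (vowels, H, W and Y).
letterCode :: Char -> Char
letterCode c
  | c `elem` "BFPV"     = '1'
  | c `elem` "CGJKQSXZ" = '2'
  | c `elem` "DT"       = '3'
  | c == 'L'            = '4'
  | c `elem` "MN"       = '5'
  | c == 'R'            = '6'
  | otherwise           = '0'

-- Sketch of a "full" (untruncated) soundex code.
-- True selects the assumed NARA rules, False the Simplified rules.
soundexSketch :: Bool -> String -> String
soundexSketch nara word =
  case [toUpper ch | ch <- word, isAscii ch, isAlpha ch] of
    [] -> "0"
    (first : rest) ->
      let -- NARA: H and W after the first letter do not separate codes.
          rest' = if nara then filter (`notElem` "HW") rest else rest
          codes = map letterCode (first : rest')
          -- Merge runs of identical codes, including the first letter's.
          merged = [c | (c : _) <- group codes]
      in first : filter (/= '0') (drop 1 merged)
```

Under these assumptions, soundexSketch True "Ashcraft" gives "A2613" while soundexSketch False "Ashcraft" gives "A22613", because only the Simplified form lets the H separate the S from the C. Words without H or W between same-coded letters, such as "Robert" ("R163"), code identically in both forms.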

soundexCodes :: Array Char Char

Array of soundex codes for single characters. The array maps uppercase letters (only) to a character representing a code in the range ['1'..'7'] or '?'. Code '7' is returned as a coding convenience for American/Miracode/NARA/Knuth soundex.
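A table of this shape could be built roughly as follows. This is a sketch under stated assumptions rather than the package's actual definition: the names codeTable and lookupCode are hypothetical, and the assignment of '7' to H and W (the letters NARA soundex treats specially) is an assumption, as is the use of '?' for every letter without a code.

```haskell
import Data.Array (Array, accumArray, bounds, inRange, (!))

-- Hypothetical table indexed by 'A'..'Z'. Letters not listed in any
-- group keep the default entry '?'.
codeTable :: Array Char Char
codeTable =
  accumArray (\_ new -> new) '?' ('A', 'Z')
    [ (letter, digit) | (digit, letters) <- groups, letter <- letters ]
  where
    groups =
      [ ('1', "BFPV")
      , ('2', "CGJKQSXZ")
      , ('3', "DT")
      , ('4', "L")
      , ('5', "MN")
      , ('6', "R")
      , ('7', "HW")  -- assumed: marker for the NARA H/W rule
      ]

-- Bounds-checked lookup: the table covers uppercase letters only, so
-- any other character yields Nothing instead of an indexing error.
lookupCode :: Char -> Maybe Char
lookupCode c
  | inRange (bounds codeTable) c = Just (codeTable ! c)
  | otherwise                    = Nothing
```

Giving H and W their own code lets a caller tell them apart from vowels with a single lookup, which is the distinction the NARA form depends on. Indexing an Array outside its bounds is an error, hence the guarded lookupCode wrapper in this sketch.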