phonetic-code-0.1: Phonetic codes: Soundex and Phonix



Soundex is a phonetic coding algorithm. It transforms a word into a similarity hash based on an approximation of its sounds, so similar-sounding words tend to have the same hash.

This implementation is based on a number of sources, including a description of soundex at http://wikipedia.org/wiki/Soundex and in Knuth's "The Art of Computer Programming", 2nd ed., vol. 3, pp. 394-395. A very helpful reference on the details and differences among soundex algorithms is "Soundex: The True Story", accessed 11 September 2008.

This code was originally written for the thimk spelling suggestion application in Nickle in July 2002, based on a description from http://www.geocities.com/Heartland/Hills/3916/soundex.html which is now [...]. The code was ported in September 2008; the Soundex variants were also added at that time.



soundex :: Bool -> String -> String

Compute a full soundex code; i.e., do not drop any encodable characters from the result. The leading character of the code will be folded to uppercase. Non-alphabetics are not encoded. If no alphabetics are present, the soundex code will be 0.

The two commonly encountered forms of soundex are Simplified and another known variously as American, Miracode, NARA, or Knuth. This code calculates either: passing True selects NARA, and passing False selects Simplified.
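The difference between the two variants can be sketched in a few lines. The following is a minimal, self-contained illustration and not this package's implementation: `codeOf` and `sketchSoundex` are hypothetical names, the letter classes are the conventional Soundex ones, and it is an assumption of the sketch that only ASCII letters are encoded. Under the NARA rules H and W are skipped, so equal codes on either side of them merge; under the Simplified rules H and W act like vowels and keep such codes apart.

```haskell
import Data.Char (isAsciiUpper, toUpper)
import Data.List (group)

-- Code class for one uppercase ASCII letter. '?' marks vowels and Y (never
-- emitted); '7' marks H and W so the two variants can treat them differently.
codeOf :: Char -> Char
codeOf c
  | c `elem` "BFPV"     = '1'
  | c `elem` "CGJKQSXZ" = '2'
  | c `elem` "DT"       = '3'
  | c == 'L'            = '4'
  | c `elem` "MN"       = '5'
  | c == 'R'            = '6'
  | c `elem` "HW"       = '7'
  | otherwise           = '?'

-- Hypothetical stand-in for the documented function: True selects the
-- American/NARA rules, False the Simplified rules. Nothing is truncated.
sketchSoundex :: Bool -> String -> String
sketchSoundex nara word =
  case filter isAsciiUpper (map toUpper word) of
    [] -> "0"  -- no alphabetic characters at all
    first : rest ->
      let prepare
            -- NARA: drop H/W first, so they cannot separate equal codes.
            | nara      = filter (/= '7')
            -- Simplified: H/W behave like vowels and do separate equal codes.
            | otherwise = map (\d -> if d == '7' then '?' else d)
          codes     = codeOf first : prepare (map codeOf rest)
          -- Collapse each run of identical codes to a single code.
          collapsed = [d | (d : _) <- group codes]
      in first : filter (`elem` "123456") (drop 1 collapsed)
```

With this sketch, `sketchSoundex True "Ashcraft"` evaluates to `"A2613"` and `sketchSoundex False "Ashcraft"` to `"A22613"`, while "Robert" and "Rupert" both give `"R163"` under either setting. These are the outputs of the sketch only; the library's results may differ in details.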

soundexCodes :: Array Char Char

Array of soundex codes for single characters. The array maps uppercase letters (only) to a character representing a code in the range ['1'..'7'] or '?'. Code '7' is returned as a coding convenience for American/Miracode/NARA/Knuth soundex.
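A table of this shape could be built as follows. This is only a sketch under stated assumptions: `sketchCodes` is a hypothetical name, the digit classes are the conventional Soundex ones, and assigning '7' to H and W and '?' to the remaining letters is a guess at the layout described above rather than something taken from the package source.

```haskell
import Data.Array (Array, accumArray, (!))

-- Hypothetical lookup table indexed by 'A'..'Z'. Letters not listed in any
-- class (the vowels and Y) keep the default '?'.
sketchCodes :: Array Char Char
sketchCodes =
  accumArray (\_ new -> new) '?' ('A', 'Z')
    [ (letter, digit) | (digit, letters) <- classes, letter <- letters ]
  where
    classes =
      [ ('1', "BFPV")
      , ('2', "CGJKQSXZ")
      , ('3', "DT")
      , ('4', "L")
      , ('5', "MN")
      , ('6', "R")
      , ('7', "HW")  -- assumed: the "coding convenience" class
      ]
```

Here `sketchCodes ! 'R'` is `'6'` and `sketchCodes ! 'E'` is `'?'`. Because the bounds are 'A' to 'Z', indexing with a lowercase letter raises an out-of-bounds error, which matches the "uppercase letters (only)" restriction.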