-- Hoogle documentation, generated by Haddock
-- See Hoogle, http://www.haskell.org/hoogle/
-- | Phonetic codes: Soundex and Phonix
--
-- This package implements the phonetic coding algorithms Soundex
-- and Phonix. A phonetic coding algorithm transforms a word into a
-- similarity hash based on an approximation of its sounds. Thus,
-- similar-sounding words tend to have the same hash.
@package phonetic-code
@version 0.1
-- | Phonix codes (Gadd 1990) combine a slightly improved variant of
-- Soundex coding with a preprocessing step that cleans up certain
-- n-grams. Because the preprocessing step applies more than 150 rules
-- using a slow custom-written scanner, this implementation is not fast.
--
-- This code was based on a number of sources, including the CPAN Phonix
-- code calculator Text::Phonetic::Phonix. Because the paper
-- describing the codes is not freely available and I'm lazy, I did not
-- use it as a reference. Also, because Phonix involves over 150
-- substitution rules, I transformed the Perl ones, which was easier than
-- generating them from scratch.
module Text.PhoneticCode.Phonix
-- | Compute a full phonix code; i.e., do not drop any encodable
-- characters from the result. The leading character of the code will be
-- folded to uppercase. Non-alphabetic characters are not encoded. If no
-- alphabetic characters are present, the phonix code will be "0".
--
-- There appear to be many, many variants of phonix implemented on the
-- web, and I'm too cheap and lazy to go find the original paper by Gadd
-- (1990) that actually describes the original algorithm. Thus, I am
-- taking some big guesses on intent here as I implement. Corrections,
-- especially those involving getting me a copy of the article, are
-- welcome.
--
-- Dropping the trailing sound seems to be an integral part of
-- Gadd's technique, but I'm not sure how it is supposed to be done. I am
-- currently compressing runs of vowels, and then dropping the trailing
-- digit or vowel from the code.
--
-- Another area of confusion is whether to compress strings of the same
-- code, as in Soundex, or merely strings of the same consonant. I have
-- chosen the former.
phonix :: String -> String
-- | Array of phonix codes for single characters. The array maps uppercase
-- letters (only) to a character representing a code in the range
-- ['1'..'8'], or to '?'.
phonixCodes :: Array Char Char
-- | Substitution rules for Phonix canonicalization. ^ anchors a pattern
-- to the beginning of the word, and $ anchors it to the end. A c, v,
-- or . at the beginning or end of a pattern matches a consonant, a
-- vowel, or an arbitrary character, respectively. A character matched
-- in this fashion is automatically tacked onto the beginning (end) of
-- the pattern.
phonixRules :: [(String, String)]
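-- A minimal sketch of how the pattern notation described for phonixRules
-- could be interpreted; it is not the package's scanner. It assumes words
-- are already uppercase, that the vowels are A, E, I, O and U, and that
-- every pattern character other than the markers is a literal. The names
-- Token, tokenize, accepts, stripTokens and matchesRule, and the sample
-- patterns in the note below, are hypothetical. Only matching is shown;
-- substitution is left out.

```haskell
import Data.Char (isAsciiUpper)
import Data.List (tails)

-- One element of a pattern body.
data Token = Lit Char | Consonant | Vowel | AnyChar

-- Assumption: only A, E, I, O and U count as vowels.
isVowel :: Char -> Bool
isVowel = (`elem` "AEIOU")

-- 'c', 'v' and '.' are class markers only in the first or last position
-- of the body; everywhere else a character stands for itself.
tokenize :: String -> [Token]
tokenize body = zipWith tok [0 ..] body
  where
    lastIx = length body - 1
    tok :: Int -> Char -> Token
    tok i ch
      | i == 0 || i == lastIx = edge ch
      | otherwise             = Lit ch
    edge 'c' = Consonant
    edge 'v' = Vowel
    edge '.' = AnyChar
    edge ch  = Lit ch

accepts :: Token -> Char -> Bool
accepts (Lit l)   ch = l == ch
accepts Consonant ch = isAsciiUpper ch && not (isVowel ch)
accepts Vowel     ch = isVowel ch
accepts AnyChar   _  = True

-- If the tokens match a prefix of the input, return what follows it.
stripTokens :: [Token] -> String -> Maybe String
stripTokens [] rest = Just rest
stripTokens (t : ts) (ch : rest)
  | accepts t ch = stripTokens ts rest
stripTokens _ _ = Nothing

-- Does the pattern match anywhere in the (uppercase) word?
matchesRule :: String -> String -> Bool
matchesRule pat word = any ok starts
  where
    -- A leading '^' anchors the match to the start of the word.
    (atStart, p1) = case pat of
      '^' : rest -> (True, rest)
      _          -> (False, pat)
    -- A trailing '$' anchors the match to the end of the word.
    (atEnd, body)
      | not (null p1) && last p1 == '$' = (True, init p1)
      | otherwise                       = (False, p1)
    tokens = tokenize body
    starts = if atStart then [word] else tails word
    ok s = case stripTokens tokens s of
      Just rest -> not atEnd || null rest
      Nothing   -> False
```

-- Under these assumptions, matchesRule "MB$" "LAMB" holds and
-- matchesRule "MB$" "AMBER" does not, since $ requires the match to end
-- the word.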
-- | Soundex is a phonetic coding algorithm. It transforms a word into a
-- similarity hash based on an approximation of its sounds. Thus,
-- similar-sounding words tend to have the same hash.
--
-- This implementation is based on a number of sources, including a
-- description of soundex at http://wikipedia.org/wiki/Soundex
-- and in Knuth's The Art of Computer Programming 2nd ed v1
-- pp394-395. A very helpful reference on the details and differences
-- among soundex algorithms is Soundex: The True Story,
-- http://west-penwith.org.uk/misc/soundex.htm accessed 11
-- September 2008.
--
-- This code was originally written for the thimk spelling
-- suggestion application in Nickle (http://nickle.org) in July 2002
-- based on a description from
-- http://www.geocities.com/Heartland/Hills/3916/soundex.html
-- which is now http://www.searchforancestors.com/soundex.html. The
-- code was ported in September 2008; the Soundex variants were also
-- added at this time.
module Text.PhoneticCode.Soundex
-- | Compute a full soundex code; i.e., do not drop any encodable
-- characters from the result. The leading character of the code will be
-- folded to uppercase. Non-alphabetic characters are not encoded. If no
-- alphabetic characters are present, the soundex code will be "0".
--
-- The two commonly encountered forms of soundex are Simplified and
-- another known as American, Miracode, NARA or Knuth. This code will
-- calculate either: passing True gets NARA, and passing False gets
-- Simplified.
soundex :: Bool -> String -> String
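-- | Simplified soundex (compare 'soundex', where passing False selects
-- this form).
soundexSimple :: String -> String
-- | NARA soundex (compare 'soundex', where passing True selects this
-- form).
soundexNARA :: String -> String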
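-- A minimal, self-contained sketch of an untruncated Simplified-style
-- soundex that follows the conventions described above: the leading
-- character is uppercased, non-alphabetic characters are skipped, and "0"
-- is returned when nothing is encodable. It is not the package's code and
-- may differ from it in detail. The names codeOf, squeeze and
-- fullSoundexSketch are hypothetical, the digit groups are the classic
-- Soundex ones, and treating H and W like vowels is an assumption about
-- the Simplified form.

```haskell
import Data.Char (isAsciiUpper, toUpper)

-- Classic Soundex digit for an uppercase letter; '?' marks letters that
-- carry no digit (vowels, H, W and Y in this sketch).
codeOf :: Char -> Char
codeOf c
  | c `elem` "BFPV"     = '1'
  | c `elem` "CGJKQSXZ" = '2'
  | c `elem` "DT"       = '3'
  | c == 'L'            = '4'
  | c `elem` "MN"       = '5'
  | c == 'R'            = '6'
  | otherwise           = '?'

-- Collapse each run of equal adjacent elements to a single element.
squeeze :: Eq a => [a] -> [a]
squeeze (x : y : rest)
  | x == y    = squeeze (y : rest)
  | otherwise = x : squeeze (y : rest)
squeeze xs = xs

-- Untruncated code: leading letter, then the digits of what follows.
fullSoundexSketch :: String -> String
fullSoundexSketch word =
  case filter isAsciiUpper (map toUpper word) of
    [] -> "0"
    letters@(first : _) ->
      -- Collapse runs of the same code first, so that a letter sharing
      -- the leading letter's code is absorbed; then drop the leading
      -- letter's own code and every uncoded position.
      let codes = squeeze (map codeOf letters)
      in first : filter (/= '?') (drop 1 codes)
```

-- With this sketch, fullSoundexSketch "Robert" gives "R163" and
-- fullSoundexSketch "Ashcraft" gives "A22613", since nothing is
-- truncated to the usual four characters.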
-- | Array of soundex codes for single characters. The array maps uppercase
-- letters (only) to a character representing a code in the range
-- ['1'..'7'], or to '?'. Code '7' is returned as a coding convenience
-- for American/Miracode/NARA/Knuth soundex.
soundexCodes :: Array Char Char
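-- A minimal sketch of how a table with the shape of soundexCodes could
-- be built with Data.Array. The table below is hypothetical and is not
-- taken from the package: the digit groups '1' to '6' are the classic
-- Soundex ones, while assigning '7' to H and W is an assumption about
-- which letters receive the convenience code. The name sketchCodes is
-- also hypothetical.

```haskell
import Data.Array (Array, accumArray, (!))

-- Map every uppercase letter to a code character; letters not listed in
-- any group keep the default '?'.
sketchCodes :: Array Char Char
sketchCodes =
  accumArray (\_ new -> new) '?' ('A', 'Z')
    [ (letter, digit) | (letters, digit) <- groups, letter <- letters ]
  where
    groups =
      [ ("BFPV", '1')
      , ("CGJKQSXZ", '2')
      , ("DT", '3')
      , ("L", '4')
      , ("MN", '5')
      , ("R", '6')
      , ("HW", '7') -- assumption: H and W take the convenience code
      ]

-- Look up the code for a letter that is already uppercase A..Z;
-- indexing outside that range is an error, as with any Array.
lookupCode :: Char -> Char
lookupCode letter = sketchCodes ! letter
```

-- In this sketch lookupCode 'B' is '1' and lookupCode 'A' is '?'.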