-- Hoogle documentation, generated by Haddock
-- See Hoogle, http://www.haskell.org/hoogle/
-- | Phonetic codes: Soundex and Phonix
--
@package phonetic-code
@version 0.1.1.1
-- | Phonix codes (Gadd 1990) augment a slightly improved Soundex code with
-- a preprocessing step that cleans up certain n-grams. Because the
-- preprocessing step consists of around 90 rules processed by a slow
-- custom-written scanner, this implementation is not fast.
--
-- This code was based on a number of sources, including the CPAN Phonix
-- code calculator `Text::Phonetic::Phonix.pm`. Because the paper
-- describing the codes is not freely available and I'm lazy, I did not
-- use it as a reference. Also, because Phonix involves around 90
-- substitution rules, I transformed the Perl ones, which was easier than
-- generating them from scratch.
module Text.PhoneticCode.Phonix
-- | Compute a "full" phonix code; i.e., do not drop any encodable
-- characters from the result. The leading character of the code will be
-- folded to uppercase. Non-alphabetics are not encoded. If no
-- alphabetics are present, the phonix code will be "0".
--
-- There appear to be many, many variants of phonix implemented on the
-- web, and I'm too cheap and lazy to go find the original paper by Gadd
-- (1990) that actually describes the original algorithm. Thus, I am
-- taking some big guesses on intent here as I implement. Corrections,
-- especially those involving getting me a copy of the article, are
-- welcome.
--
-- Dropping the "trailing sound" seems to be an integral part of Gadd's
-- technique, but I'm not sure how it is supposed to be done. I am
-- currently compressing runs of vowels, and then dropping the trailing
-- digit or vowel from the code.
--
-- Another area of confusion is whether to compress strings of the same
-- code, as in Soundex, or merely strings of the same consonant. I have
-- chosen the former.
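--
-- A hedged usage sketch, assuming the phonetic-code package is installed.
-- The helper name soundsAlike is hypothetical and not part of this
-- module. No expected output is shown, because the result depends on the
-- implementation choices described above.
--
-- ```haskell
-- import Text.PhoneticCode.Phonix (phonix)
--
-- -- Hypothetical helper: treat two words as similar when their
-- -- phonix codes are equal.
-- soundsAlike :: String -> String -> Bool
-- soundsAlike a b = phonix a == phonix b
--
-- main :: IO ()
-- main = do
--   putStrLn (phonix "Smith")            -- print the code for one word
--   print (soundsAlike "Smith" "Smyth")  -- compare two spellings
-- ```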
phonix :: String -> String
-- | Array of phonix codes for single characters. The array maps uppercase
-- letters (only) to a character representing a code in the range
-- ['1'..'8'] or '?'.
phonixCodes :: Array Char Char
-- | Substitution rules for Phonix canonicalization. "^" anchors a pattern
-- to the beginning of the word, and "$" anchors it to the end. A "c",
-- "v", or "." at the beginning or end of a pattern matches a consonant,
-- a vowel, or an arbitrary character, respectively. A character matched
-- in this fashion is automatically tacked onto the beginning (end) of
-- the pattern.
phonixRules :: [(String, String)]
-- | List of pattern/substitution pairs built from the phonixRules.
phonixRulesPatSubsts :: [(String, String)]
-- | Apply each of the Phonix preprocessing rules in turn to the target
-- word, returning the accumulated result of the substitutions.
applyPhonixRules :: String -> String
-- | Soundex is a phonetic coding algorithm. It transforms a word into a
-- similarity hash based on an approximation of its sounds. Thus,
-- similar-sounding words tend to have the same hash.
--
-- This implementation is based on a number of sources, including the
-- description of soundex at http://wikipedia.org/wiki/Soundex and the
-- one in Knuth's "The Art of Computer Programming" 2nd ed v3 pp394-395. A
-- very helpful reference on the details and differences among soundex
-- algorithms is "Soundex: The True Story",
-- http://west-penwith.org.uk/misc/soundex.htm accessed 11
-- September 2008.
--
-- This code was originally written for the "thimk" spelling suggestion
-- application in Nickle (http://nickle.org) in July 2002 based on a
-- description from
-- http://www.geocities.com/Heartland/Hills/3916/soundex.html,
-- which is now http://www.searchforancestors.com/soundex.html. The
-- code was ported in September 2008; the Soundex variants were also
-- added at this time.
module Text.PhoneticCode.Soundex
-- | Compute a "full" soundex code; i.e., do not drop any encodable
-- characters from the result. The leading character of the code will be
-- folded to uppercase. Non-alphabetics are not encoded. If no
-- alphabetics are present, the soundex code will be "0".
--
-- The two commonly encountered forms of soundex are Simplified and
-- another known as American, Miracode, NARA or Knuth. This code will
-- calculate either: passing True gets NARA, and False gets Simplified.
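--
-- A hedged usage sketch, assuming the phonetic-code package is installed.
-- That soundexNARA and soundexSimple correspond to soundex True and
-- soundex False is an assumption based on their names and signatures. No
-- expected output is shown, because it depends on the variant details.
--
-- ```haskell
-- import Text.PhoneticCode.Soundex (soundex, soundexNARA, soundexSimple)
--
-- main :: IO ()
-- main = do
--   let w = "Ashcraft"
--   putStrLn (soundex True w)   -- NARA variant, per the description above
--   putStrLn (soundex False w)  -- Simplified variant
--   putStrLn (soundexNARA w)    -- assumed to match soundex True
--   putStrLn (soundexSimple w)  -- assumed to match soundex False
-- ```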
soundex :: Bool -> String -> String
soundexSimple :: String -> String
soundexNARA :: String -> String
-- | Array of soundex codes for single characters. The array maps uppercase
-- letters (only) to a character representing a code in the range
-- ['1'..'7'] or '?'. Code '7' is returned as a coding convenience
-- for American/Miracode/NARA/Knuth soundex.
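--
-- A hedged lookup sketch, assuming the phonetic-code package is installed.
-- The helper name letterCode is hypothetical. It folds the character to
-- uppercase and checks the array bounds before indexing, since the array
-- is documented to cover uppercase letters only.
--
-- ```haskell
-- import Data.Array (bounds, inRange, (!))
-- import Data.Char (toUpper)
-- import Text.PhoneticCode.Soundex (soundexCodes)
--
-- -- Hypothetical helper: Nothing for characters outside the array's
-- -- index range, otherwise the stored code character.
-- letterCode :: Char -> Maybe Char
-- letterCode c
--   | inRange (bounds soundexCodes) u = Just (soundexCodes ! u)
--   | otherwise                       = Nothing
--   where
--     u = toUpper c
--
-- main :: IO ()
-- main = mapM_ (print . letterCode) "bH3"
-- ```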
soundexCodes :: Array Char Char