fuzzyset-0.1.0.5: Fuzzy set for approximate string matching

Safe Haskell: None
Language: Haskell2010

Data.FuzzySet.Internal

Documentation

getMatch :: GetContext -> Size -> [(Double, Text)]

results :: GetContext -> Size -> [(Double, Text)]

matches :: FuzzySet -> HashMap Text Int -> HashMap Int Int

gramMap

Arguments

:: Text

An input string

-> Size

The gram size n, which must be at least 2

-> HashMap Text Int

A mapping from n-gram keys to the number of occurrences of each key in the list returned by grams (i.e., the list of all n-length substrings of the normalized input enclosed in hyphens).

Normalize the input string, call grams on the normalized input, and then convert the result to a HashMap whose keys are the n-grams and whose Int values are the number of occurrences of each key in the generated gram list.

>>> gramMap "xxxx" 2
fromList [("-x",1), ("xx",3), ("x-",1)]
>>> Data.HashMap.Strict.lookup "nts" (gramMap "intrent'srestaurantsomeoftrent'saunt'santswantsamtorentsomepants" 3)
Just 8
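The counting step can be sketched in a few lines. This is a minimal, hypothetical reimplementation, not the library's code. It assumes that normalization lowercases the input and keeps only alphanumeric and whitespace characters (consistent with the examples on this page), and it uses Data.Map.Strict in place of HashMap so that it needs only the containers and text packages bundled with GHC. The names normalizeSketch, gramsSketch and gramMapSketch are illustrative.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isAlphaNum, isSpace)
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as Text

-- Hypothetical normalization (an assumption, not the library's function):
-- lowercase, then keep only alphanumeric and whitespace characters.
normalizeSketch :: Text -> Text
normalizeSketch = Text.filter (\c -> isAlphaNum c || isSpace c) . Text.toLower

-- All n-length substrings of the normalized input enclosed in hyphens.
gramsSketch :: Text -> Int -> [Text]
gramsSketch input n
  | n < 2     = error "gramsSketch: gram size must be at least 2"
  | otherwise = [ Text.take n (Text.drop i padded) | i <- [0 .. Text.length padded - n] ]
  where
    padded = Text.concat ["-", normalizeSketch input, "-"]

-- Count how many times each n-gram occurs in the gram list.
gramMapSketch :: Text -> Int -> Map.Map Text Int
gramMapSketch input n = Map.fromListWith (+) [ (g, 1) | g <- gramsSketch input n ]
```

With these definitions, gramMapSketch "xxxx" 2 evaluates to fromList [("-x",1),("x-",1),("xx",3)]. The counts agree with the gramMap example above; only the key order differs, because Data.Map keeps keys sorted while HashMap iteration order is unspecified.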

grams

Arguments

:: Text

An input string

-> Size

The gram size n, which must be at least 2

-> [Text]

A k-length list of grams of size n, with \(k = s - n + 3\), where s is the length of the normalized input

Break apart the normalized input string into a list of n-grams. For instance, the string "Destroido Corp." is first normalized into the form "destroido corp", and then enclosed in hyphens, so that it becomes "-destroido corp-". The 3-grams generated from this normalized string are

"-de", "des", "est", "str", "tro", "roi", "oid", "ido", "do ", "o c", " co", "cor", "orp", "rp-"

Given a normalized string of length \(s\), we take all substrings of length \(n\) of the hyphen-enclosed string, letting the offset range from \(0\) to \(s + 2 - n\). The number of n-grams for a normalized string of length \(s\) is thus \(s + 2 - n + 1 = s - n + 3\), which holds for \(2 \le n \le s + 2\), since the enclosed string has length \(s + 2\).
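The offset arithmetic above can be sketched directly. Again, this is a hypothetical illustration rather than the library's implementation. It uses the same assumed normalization (lowercase, keep alphanumerics and whitespace), and the names normalizeSketch, gramsSketch and expectedGramCount are introduced here for illustration only.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isAlphaNum, isSpace)
import Data.Text (Text)
import qualified Data.Text as Text

-- Assumed normalization (hypothetical), inferred from the "Destroido Corp." example.
normalizeSketch :: Text -> Text
normalizeSketch = Text.filter (\c -> isAlphaNum c || isSpace c) . Text.toLower

-- Take every n-length substring of "-" <> normalized <> "-", letting the
-- offset i range over 0 .. s + 2 - n, where s is the normalized length.
gramsSketch :: Text -> Int -> [Text]
gramsSketch input n
  | n < 2     = error "gramsSketch: gram size must be at least 2"
  | otherwise = [ Text.take n (Text.drop i padded) | i <- [0 .. s + 2 - n] ]
  where
    normal = normalizeSketch input
    s      = Text.length normal
    padded = "-" <> normal <> "-"

-- Closed-form count s - n + 3, meaningful for 2 <= n <= s + 2.
expectedGramCount :: Text -> Int -> Int
expectedGramCount input n = Text.length (normalizeSketch input) - n + 3
```

For "Destroido Corp." the normalized length is \(s = 14\), so expectedGramCount "Destroido Corp." 3 is 14, matching the fourteen 3-grams listed above. Iterating over offsets rather than splitting the text recursively keeps the correspondence with the formula explicit.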