Safe Haskell | None |
---|---|
Language | Haskell2010 |
Documentation
getMatch :: GetContext -> Size -> [(Double, Text)]
results :: GetContext -> Size -> [(Double, Text)]
gramMap
:: Text | An input string |
-> Size | The gram size n, which must be at least 2 |
-> HashMap Text Int | A mapping from n-gram keys to the number of occurrences of the key in the list returned by grams |
Normalize the input string, call grams on the normalized input, and then translate the result to a HashMap with the n-grams as keys and Int values corresponding to the number of occurrences of each key in the generated gram list.
>>> gramMap "xxxx" 2
fromList [("-x",1), ("xx",3), ("x-",1)]
>>> Data.HashMap.Strict.lookup "nts" (gramMap "intrent'srestaurantsomeoftrent'saunt'santswantsamtorentsomepants" 3)
Just 8
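The counting step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `countGrams` and `exampleCounts` are hypothetical names, the gram list is written out by hand rather than produced by grams, and `Data.Map.Strict` stands in for HashMap so that the sketch depends only on the `text` and `containers` packages.

```haskell
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical helper: count how often each gram occurs in a gram list.
-- Assumption: an ordered Map is used here in place of HashMap.
countGrams :: [Text] -> Map.Map Text Int
countGrams gs = Map.fromListWith (+) [ (g, 1) | g <- gs ]

-- The 2-grams of "-xxxx-", written out by hand for the first example above.
exampleCounts :: Map.Map Text Int
exampleCounts = countGrams (map T.pack ["-x", "xx", "xx", "xx", "x-"])
```

Looking up "xx" in `exampleCounts` gives `Just 3`, which agrees with the first example above. An ordered map lists its keys in sorted order, so the printed order of entries may differ from what a HashMap shows.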
grams
:: Text | An input string |
-> Size | The gram size n, which must be at least 2 |
-> [Text] | A k-length list of grams of size n, with k = s − n + 3, where s is the length of the normalized input |
Break apart the normalized input string into a list of n-grams. For instance, the string "Destroido Corp." is first normalized into the form "destroido corp", and then enclosed in hyphens, so that it becomes "-destroido corp-". The 3-grams generated from this normalized string are
"-de", "des", "est", "str", "tro", "roi", "oid", "ido", "do ", "o c", " co", "cor", "orp", "rp-"
Given a normalized string of length s, we take all substrings of length n from the hyphen-enclosed string, letting the offset range from 0 to s + 2 − n. The number of n-grams for a normalized string of length s is thus s + 2 − n + 1 = s − n + 3, where 0 < n < s − 2.
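The procedure above can be sketched as below. This is a rough illustration under stated assumptions, not the library's own code: `normalizeSketch` and `gramsSketch` are hypothetical names, the gram size is a plain `Int` rather than Size, and the normalization rule (lowercase, keep only letters, digits and whitespace) is a guess that happens to reproduce the "Destroido Corp." example.

```haskell
import Data.Char (isAlphaNum, isSpace)
import Data.Text (Text)
import qualified Data.Text as T

-- Assumed normalization: lowercase the input and keep only letters,
-- digits and whitespace. The library's actual rules may differ.
normalizeSketch :: Text -> Text
normalizeSketch = T.filter (\c -> isAlphaNum c || isSpace c) . T.toLower

-- Enclose the normalized string in hyphens, then take the substring of
-- length n at every offset from 0 to s + 2 - n, where s is the length
-- of the normalized string. The gram size n is assumed to be at least 2.
gramsSketch :: Text -> Int -> [Text]
gramsSketch input n =
  [ T.take n (T.drop i padded) | i <- [0 .. T.length padded - n] ]
  where
    padded = T.cons '-' (T.snoc (normalizeSketch input) '-')
```

For the input "Destroido Corp." and n = 3, the normalized string has length s = 14, so the sketch yields 14 − 3 + 3 = 14 grams, from "-de" through "rp-".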