Copyright | (c) 2017-present Heikki Johannes Hildén |
---|---|
License | BSD3 |
Maintainer | hildenjohannes@gmail.com |
Stability | experimental |
Portability | GHC |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Synopsis
- data FuzzySet
- type FuzzyMatch = (Double, Text)
- emptySet :: Int -> Int -> Bool -> FuzzySet
- defaultSet :: FuzzySet
- fromList :: [Text] -> FuzzySet
- addToSet :: Text -> FuzzySet -> (Bool, FuzzySet)
- add :: Text -> FuzzySet -> FuzzySet
- addManyToSet :: [Text] -> FuzzySet -> ([Text], FuzzySet)
- addMany :: [Text] -> FuzzySet -> FuzzySet
- (>+<) :: FuzzySet -> Text -> FuzzySet
- findMin :: Double -> Text -> FuzzySet -> [FuzzyMatch]
- findOneMin :: Double -> Text -> FuzzySet -> Maybe FuzzyMatch
- closestMatchMin :: Double -> Text -> FuzzySet -> Maybe Text
- find :: Text -> FuzzySet -> [FuzzyMatch]
- findOne :: Text -> FuzzySet -> Maybe FuzzyMatch
- closestMatch :: Text -> FuzzySet -> Maybe Text
- values :: FuzzySet -> [Text]
- size :: FuzzySet -> Int
- isEmpty :: FuzzySet -> Bool
A note about the simple API
This module exposes a pure, simpler API for working with fuzzy sets. If you anticipate using the fuzzy search functionality in multiple places of your application, consider using the default monadic interface in FuzzySet.
How to use this module
Make sure the OverloadedStrings pragma is enabled and import the module:
import Data.FuzzySet.Simple
After that, three steps are typically involved:
- Create a set using one of defaultSet, emptySet, or fromList.
- To add entries, use add, addToSet, or addMany.
- Query the set with find, closestMatch, findMin, or closestMatchMin.
>>> closestMatch "percolator" (defaultSet >+< "Jurassic Park" >+< "Terminator" >+< "The Matrix")
Just "Terminator"
>>> find "Shaggy Jones" (defaultSet >+< "Shaggy Rogers" >+< "Fred Jones" >+< "Daphne Blake" >+< "Velma Dinkley")
[(0.7692307692307693,"Shaggy Rogers"),(0.5,"Fred Jones")]
There are also a few functions to inspect a set: size, isEmpty, and values.
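Putting the three steps and the inspection functions together, a complete program might look like the sketch below. It reuses the movie titles from the first example; the names `movies` and `guess` are illustrative only.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple
import Data.Text (Text)

-- Steps 1 and 2: start from the default (empty) set and add entries with (>+<).
movies :: FuzzySet
movies = defaultSet >+< "Jurassic Park" >+< "Terminator" >+< "The Matrix"

-- Step 3: query the set; closestMatch uses a minimum score of 0.33.
guess :: Maybe Text
guess = closestMatch "percolator" movies

main :: IO ()
main = do
  print guess              -- the example above gives Just "Terminator"
  print (size movies)      -- number of entries in the set
  print (isEmpty movies)   -- False, since three entries were added
  print (values movies)    -- no particular order is guaranteed
```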
More examples
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple

states =
  [ "Alabama", "Alaska", "American Samoa", "Arizona", "Arkansas"
  , "California", "Colorado", "Connecticut"
  , "Delaware", "District of Columbia"
  , "Florida", "Georgia", "Guam", "Hawaii"
  , "Idaho", "Illinois", "Indiana", "Iowa"
  , "Kansas", "Kentucky", "Louisiana"
  , "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota"
  , "Mississippi", "Missouri", "Montana"
  , "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico"
  , "New York", "North Carolina", "North Dakota", "Northern Marianas Islands"
  , "Ohio", "Oklahoma", "Oregon"
  , "Pennsylvania", "Puerto Rico", "Rhode Island"
  , "South Carolina", "South Dakota"
  , "Tennessee", "Texas", "Utah"
  , "Vermont", "Virginia", "Virgin Islands"
  , "Washington", "West Virginia", "Wisconsin", "Wyoming"
  ]

statesSet = fromList states

main = mapM_ print (find "Burger Islands" statesSet)
The output of this program is:
(0.7142857142857143,"Virgin Islands")
(0.5714285714285714,"Rhode Island")
(0.44,"Northern Marianas Islands")
(0.35714285714285715,"Maryland")
Using the definition of statesSet from the previous example:
>>> find "Why-oh-me-ing" statesSet
[(0.5384615384615384,"Wyoming")]
>>> find "Connect a cat" statesSet
[(0.6923076923076923,"Connecticut")]
>>> find "Transylvania" statesSet
[(0.75,"Pennsylvania"),(0.3333333333333333,"California"),(0.3333333333333333,"Arkansas"),(0.3333333333333333,"Kansas")]
>>> find "CanOfSauce" statesSet
[(0.4,"Kansas")]
>>> find "Alaska" statesSet
[(1.0,"Alaska")]
>>> find "Alaskanbraskansas" statesSet
[(0.47058823529411764,"Arkansas"),(0.35294117647058826,"Kansas"),(0.35294117647058826,"Alaska"),(0.35294117647058826,"Alabama"),(0.35294117647058826,"Nebraska")]
Types
data FuzzySet
Main fuzzy string set data type.
Instances
- Show FuzzySet
- Eq FuzzySet
- Monad m => MonadState FuzzySet (FuzzySearchT m), defined in Data.FuzzySet.Monad, with methods get :: FuzzySearchT m FuzzySet, put :: FuzzySet -> FuzzySearchT m (), and state :: (FuzzySet -> (a, FuzzySet)) -> FuzzySearchT m a
- MonadFuzzySearch m => MonadFuzzySearch (StateT FuzzySet m)
type FuzzyMatch = (Double, Text)
An individual result when looking up a string in the set, consisting of
- a similarity score in the range \([0, 1]\), and
- the matching string.
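Because FuzzyMatch is a plain pair, results can be taken apart by pattern matching. The helper below, `formatMatch`, is hypothetical (not part of the library) and renders the score as a rounded percentage.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple (FuzzyMatch)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical helper: "<score as a rounded percentage>% <matched string>".
formatMatch :: FuzzyMatch -> Text
formatMatch (score, match) =
  T.pack (show (round (score * 100) :: Int)) <> "% " <> match

main :: IO ()
main =
  -- The two results of the "Shaggy Jones" query shown earlier.
  mapM_ (TIO.putStrLn . formatMatch)
    [(0.7692307692307693, "Shaggy Rogers"), (0.5, "Fred Jones")]
```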
Initialization
emptySet
:: Int | Lower bound on gram sizes to use (inclusive) |
-> Int | Upper bound on gram sizes to use (inclusive) |
-> Bool | Whether or not to use the Levenshtein distance to determine the score |
-> FuzzySet | An empty fuzzy string set |
Initialize an empty FuzzySet.
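This page does not spell out how the Levenshtein option turns an edit distance into a score. One common normalization, assumed here rather than taken from the library, is \(1 - d(a, b) / \max(|a|, |b|)\). It is consistent with several scores shown above: turning "why-oh-me-ing" into "wyoming" takes six deletions, and \(1 - 6/13 \approx 0.538\) matches the score reported for Wyoming. The helpers `levenshtein` and `levenshteinScore` are hypothetical and use only `base`; the library may also normalize strings first and only rescore candidates that share grams with the query.

```haskell
import Data.List (foldl')

-- Edit distance (insertions, deletions, substitutions), computed one
-- dynamic-programming row at a time. Each row holds the distances from
-- every prefix of `a` to the prefix of `b` consumed so far.
levenshtein :: String -> String -> Int
levenshtein a b = last (foldl' step [0 .. length a] b)
  where
    step row@(d0 : rest) c = scanl cell (d0 + 1) (zip3 a row rest)
      where
        -- left: cell just computed in this row; diag/up: cells of the previous row
        cell left (x, diag, up) =
          minimum [up + 1, left + 1, diag + (if x == c then 0 else 1)]
    step [] _ = []  -- unreachable: rows are never empty

-- Assumed normalization: 1 for identical strings, 0 when they share no characters.
levenshteinScore :: String -> String -> Double
levenshteinScore a b
  | longest == 0 = 1
  | otherwise    = 1 - fromIntegral (levenshtein a b) / fromIntegral longest
  where
    longest = max (length a) (length b)

main :: IO ()
main = do
  print (levenshtein "why-oh-me-ing" "wyoming")  -- 6
  print (levenshteinScore "burger islands" "virgin islands")
```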
defaultSet :: FuzzySet
An empty FuzzySet with the following defaults:
- Gram size lower: 2
- Gram size upper: 3
- Use Levenshtein distance: True
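The gram-size bounds determine which substrings of an entry are compared. The hypothetical `gramsInRange` below lists, for each size from the lower to the upper bound, every contiguous substring of that length. Lowercasing and padding with `-` on both ends (so the first and last characters also begin or end a gram) are assumptions made for illustration, not documented behavior of this module.

```haskell
import Data.Char (toLower)
import Data.List (tails)

-- Every contiguous substring of length n, left to right.
gramsOf :: Int -> String -> [String]
gramsOf n s = [g | t <- tails s, let g = take n t, length g == n]

-- Grams for each size in [lo, hi], after an assumed normalization step.
gramsInRange :: Int -> Int -> String -> [(Int, [String])]
gramsInRange lo hi s = [(n, gramsOf n padded) | n <- [lo .. hi]]
  where
    padded = "-" ++ map toLower s ++ "-"

main :: IO ()
main = mapM_ print (gramsInRange 2 3 "Utah")  -- the default sizes are 2 and 3
```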
fromList :: [Text] -> FuzzySet
Create a new set from a list of entries, using the default settings.
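The sketch below builds two sets from the same entries: one with fromList, which uses the defaults listed above, and one starting from emptySet with only 3-grams and no Levenshtein scoring. The query "gren" and the names `colors`, `withDefaults`, and `custom` are illustrative; the two sets may score the query differently, so no particular scores are claimed.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple
import Data.Text (Text)

colors :: [Text]
colors = ["red", "green", "blue"]

-- Default settings: gram sizes 2 to 3, Levenshtein distance enabled.
withDefaults :: FuzzySet
withDefaults = fromList colors

-- Custom settings: only 3-grams, scores computed without Levenshtein distance.
custom :: FuzzySet
custom = addMany colors (emptySet 3 3 False)

main :: IO ()
main = do
  print (find "gren" withDefaults)
  print (find "gren" custom)
```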
Insertion
addToSet
:: Text | The new entry |
-> FuzzySet | Fuzzy string set to add the entry to |
-> (Bool, FuzzySet) | A flag to indicate if the value was added (i.e., did not already exist in the set), and the updated set. |
Add a string to the set, unless it is already present. A pair is returned consisting of a boolean which denotes whether or not anything was inserted, and the updated set.
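The boolean flag makes it easy to tell a fresh insertion from a duplicate. In this sketch, the second attempt to add "Hawaii" reports False and leaves the size unchanged; `firstTry` and `secondTry` are illustrative names.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple

-- Adding a new entry to the empty default set.
firstTry :: (Bool, FuzzySet)
firstTry = addToSet "Hawaii" defaultSet

-- Adding the same entry again: nothing is inserted.
secondTry :: (Bool, FuzzySet)
secondTry = addToSet "Hawaii" (snd firstTry)

main :: IO ()
main = do
  print (fst firstTry, size (snd firstTry))    -- (True,1)
  print (fst secondTry, size (snd secondTry))  -- (False,1)
```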
add :: Text -> FuzzySet -> FuzzySet
Add a string to the set, or do nothing if a key that matches the string already exists.
addManyToSet
:: [Text] | A list of strings to add to the set |
-> FuzzySet | The set to add the strings to |
-> ([Text], FuzzySet) | A pair where the first component is a list of all values that were inserted, and the second is the updated set. |
Add a list of strings to the set, all at once.
Unless you need to know the subset of values that were actually inserted, use addMany instead.
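To see which entries were actually new, inspect the first component of the result. In the sketch below (names illustrative), "Ohio" is already present, so only the other two strings should be reported as inserted.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple
import Data.Text (Text)

existing :: FuzzySet
existing = fromList ["Ohio", "Utah"]

-- "Ohio" is already in the set, so only "Texas" and "Iowa" are new.
result :: ([Text], FuzzySet)
result = addManyToSet ["Ohio", "Texas", "Iowa"] existing

main :: IO ()
main = do
  let (inserted, updated) = result
  print inserted        -- the order of this list is not specified here
  print (size updated)  -- 4
```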
addMany
:: [Text] | A list of strings to add to the set |
-> FuzzySet | The set to add the strings to |
-> FuzzySet | The updated set |
Add a list of strings to the set, all at once.
This function is identical to addManyToSet, except that it only returns the set itself. If you need to know what values were inserted, then use the latter instead.
(>+<) :: FuzzySet -> Text -> FuzzySet
Infix operator to add entries to a FuzzySet, defined as flip add.
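Because (>+<) is flip add, a list of entries can be folded into a set from the left. The sketch rebuilds the set from the "Shaggy Jones" example this way; `gang` and `gang'` are illustrative names, and both definitions add the same entries in the same order, so they should produce equal sets.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple
import Data.List (foldl')
import Data.Text (Text)

names :: [Text]
names = ["Shaggy Rogers", "Fred Jones", "Daphne Blake", "Velma Dinkley"]

-- Left fold with the infix operator: defaultSet >+< n1 >+< n2 >+< ...
gang :: FuzzySet
gang = foldl' (>+<) defaultSet names

-- The same fold written with add directly.
gang' :: FuzzySet
gang' = foldl' (flip add) defaultSet names

main :: IO ()
main = do
  print (gang == gang')  -- True
  print (find "Shaggy Jones" gang)
```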
Lookup
findMin
:: Double | A minimum score |
-> Text | The string to search for |
-> FuzzySet | The fuzzy string set to compare the string against |
-> [FuzzyMatch] | A list of results (score and matched value) |
Try to match a string against the entries in the set, and return a list of all results with a score greater than or equal to the specified minimum score (i.e., the first argument). The results are ordered by similarity, with the closest match first.
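Raising the threshold trims weak matches. The states example scores "Virgin Islands" at about 0.71, "Rhode Island" at about 0.57, and the other two results below 0.5, so a minimum of 0.5 should drop the weaker ones. The sketch uses a four-entry subset of that list, assuming the score of each pair does not depend on the other entries in the set.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple

-- Four entries taken from the states example.
someStates :: FuzzySet
someStates =
  fromList ["Virgin Islands", "Rhode Island", "Northern Marianas Islands", "Maryland"]

-- Keep only results scoring at least 0.5, best match first.
strongMatches :: [FuzzyMatch]
strongMatches = findMin 0.5 "Burger Islands" someStates

main :: IO ()
main = mapM_ print strongMatches
```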
findOneMin
:: Double | A minimum score |
-> Text | The string to search for |
-> FuzzySet | The fuzzy string set to compare the string against |
-> Maybe FuzzyMatch | The closest match, if one is found |
Try to match the given string against the entries in the set using the specified minimum score and return the closest match, if one is found.
closestMatchMin
:: Double | A minimum score |
-> Text | The string to search for |
-> FuzzySet | The fuzzy string set to compare the string against |
-> Maybe Text | The string most closely matching the input, if a match is found |
Try to match the given string against the entries in the set using the specified minimum score and return the string that most closely matches the input, if a match is found.
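The *Min variants take the threshold as their first argument. Using two entries from the states example (an illustrative subset), a threshold above the best score of about 0.71 should yield Nothing, while a lower one should yield the match.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple
import Data.Text (Text)

islands :: FuzzySet
islands = fromList ["Virgin Islands", "Rhode Island"]

-- The best score for this query is about 0.71 in the states example,
-- so a 0.9 threshold rejects every candidate.
strict :: Maybe Text
strict = closestMatchMin 0.9 "Burger Islands" islands

-- A 0.7 threshold still admits the best candidate.
lenient :: Maybe FuzzyMatch
lenient = findOneMin 0.7 "Burger Islands" islands

main :: IO ()
main = do
  print strict   -- Nothing
  print lenient
```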
find
:: Text | The string to search for |
-> FuzzySet | The fuzzy string set to compare the string against |
-> [FuzzyMatch] | A list of results (score and matched value) |
Try to match the given string against the entries in the set, using a minimum score of 0.33. Return a list of results ordered by similarity score, with the closest match first. Use findMin if you need to specify a custom threshold value.
findOne
:: Text | The string to search for |
-> FuzzySet | The fuzzy string set to compare the string against |
-> Maybe FuzzyMatch | The closest match, if one is found |
Try to match the given string against the entries in the set, and return the closest match, if one is found. A minimum score of 0.33 is used. To specify a custom threshold value, use findOneMin instead.
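findOne is convenient when only the best candidate matters. On an empty set both lookups come back empty, and an exact entry scores 1.0, as in the "Alaska" example above. The three-entry set is illustrative.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple

-- A hypothetical three-entry set.
aStates :: FuzzySet
aStates = fromList ["Alaska", "Alabama", "Arkansas"]

main :: IO ()
main = do
  print (find "Alaska" defaultSet)     -- []: an empty set has nothing to match
  print (findOne "Alaska" defaultSet)  -- Nothing
  print (findOne "Alaska" aStates)     -- Just (1.0,"Alaska")
```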
closestMatch
:: Text | The string to search for |
-> FuzzySet | The fuzzy string set to compare the string against |
-> Maybe Text | The string most closely matching the input, if a match is found |
Try to match the given string against the entries in the set, and return the string that most closely matches the input, if a match is found. A minimum score of 0.33 is used. To specify a custom threshold value, use closestMatchMin instead.
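A typical use of closestMatch is a "did you mean" hint. The command names and the `suggest` helper below are hypothetical; the default threshold of 0.33 keeps unrelated input such as "xyzzy" from producing a suggestion.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple
import Data.Text (Text)
import qualified Data.Text.IO as TIO

-- Hypothetical list of commands for a command-line tool.
commands :: FuzzySet
commands = fromList ["install", "uninstall", "update", "search"]

-- Suggest the closest known command for an unrecognized one.
suggest :: Text -> Text
suggest typed = case closestMatch typed commands of
  Just cmd -> "Unknown command '" <> typed <> "'. Did you mean '" <> cmd <> "'?"
  Nothing  -> "Unknown command '" <> typed <> "'."

main :: IO ()
main = mapM_ (TIO.putStrLn . suggest) ["instal", "serch", "xyzzy"]
```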
Inspection
values :: FuzzySet -> [Text]
Return the elements of the set. No particular order is guaranteed.
>>> values (fromList ["bass", "craze", "space", "lace", "daze", "haze", "ace", "maze"])
["space","daze","bass","maze","ace","craze","lace","haze"]
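Since values makes no ordering promise, sort the result when a stable order is needed, for example in tests or user-facing listings. The sketch imports only `sort` from Data.List, which avoids an ambiguity between Data.List.find and this module's find.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.FuzzySet.Simple
import Data.List (sort)  -- import only sort; Data.List also exports a find
import Data.Text (Text)

rhymes :: FuzzySet
rhymes = fromList ["bass", "craze", "space", "lace", "daze", "haze", "ace", "maze"]

-- values gives no ordering guarantee, so sort for deterministic output.
sortedValues :: [Text]
sortedValues = sort (values rhymes)

main :: IO ()
main = print sortedValues
-- ["ace","bass","craze","daze","haze","lace","maze","space"]
```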