| Copyright | (c) 2017-present Heikki Johannes Hildén |
|---|---|
| License | BSD3 |
| Maintainer | hildenjohannes@gmail.com |
| Stability | experimental |
| Portability | GHC |
| Safe Haskell | Safe-Inferred |
| Language | Haskell2010 |
Data.FuzzySet.Simple
Description
Synopsis
- data FuzzySet
- type FuzzyMatch = (Double, Text)
- emptySet :: Int -> Int -> Bool -> FuzzySet
- defaultSet :: FuzzySet
- fromList :: [Text] -> FuzzySet
- addToSet :: Text -> FuzzySet -> (Bool, FuzzySet)
- add :: Text -> FuzzySet -> FuzzySet
- addManyToSet :: [Text] -> FuzzySet -> ([Text], FuzzySet)
- addMany :: [Text] -> FuzzySet -> FuzzySet
- (>+<) :: FuzzySet -> Text -> FuzzySet
- findMin :: Double -> Text -> FuzzySet -> [FuzzyMatch]
- findOneMin :: Double -> Text -> FuzzySet -> Maybe FuzzyMatch
- closestMatchMin :: Double -> Text -> FuzzySet -> Maybe Text
- find :: Text -> FuzzySet -> [FuzzyMatch]
- findOne :: Text -> FuzzySet -> Maybe FuzzyMatch
- closestMatch :: Text -> FuzzySet -> Maybe Text
- values :: FuzzySet -> [Text]
- size :: FuzzySet -> Int
- isEmpty :: FuzzySet -> Bool
A note about the simple API
This module exposes a pure, simpler API for working with fuzzy sets.
If you anticipate using the fuzzy search functionality in multiple
places of your application, consider using the default monadic
interface in FuzzySet.
How to use this module
Make sure the OverloadedStrings pragma is enabled and import the module:
import Data.FuzzySet.Simple
After that, three steps are typically involved:
- Create a set using one of defaultSet, emptySet, or fromList.
- To add entries, use add, addToSet, or addMany.
- Query the set with find, closestMatch, findMin, or closestMatchMin.
>>> closestMatch "percolator" (defaultSet >+< "Jurassic Park" >+< "Terminator" >+< "The Matrix")
Just "Terminator"
>>> find "Shaggy Jones" (defaultSet >+< "Shaggy Rogers" >+< "Fred Jones" >+< "Daphne Blake" >+< "Velma Dinkley")
[(0.7692307692307693,"Shaggy Rogers"),(0.5,"Fred Jones")]
There are also a few functions to inspect a set: size, isEmpty, and values.
More examples
{-# LANGUAGE OverloadedStrings #-}
import Data.FuzzySet.Simple
import Data.Text (Text)
states :: [Text]
states = [ "Alabama" , "Alaska" , "American Samoa" , "Arizona" , "Arkansas"
, "California" , "Colorado" , "Connecticut" , "Delaware" , "District of Columbia"
, "Florida" , "Georgia" , "Guam" , "Hawaii" , "Idaho"
, "Illinois" , "Indiana" , "Iowa" , "Kansas" , "Kentucky"
, "Louisiana" , "Maine" , "Maryland" , "Massachusetts" , "Michigan"
, "Minnesota" , "Mississippi" , "Missouri" , "Montana" , "Nebraska"
, "Nevada" , "New Hampshire" , "New Jersey" , "New Mexico" , "New York"
, "North Carolina" , "North Dakota" , "Northern Marianas Islands" , "Ohio" , "Oklahoma"
, "Oregon" , "Pennsylvania" , "Puerto Rico" , "Rhode Island" , "South Carolina"
, "South Dakota" , "Tennessee" , "Texas" , "Utah" , "Vermont"
, "Virginia" , "Virgin Islands" , "Washington" , "West Virginia" , "Wisconsin"
, "Wyoming" ]
statesSet :: FuzzySet
statesSet = fromList states
main :: IO ()
main = mapM_ print (find "Burger Islands" statesSet)
The output of this program is:
(0.7142857142857143,"Virgin Islands")
(0.5714285714285714,"Rhode Island")
(0.44,"Northern Marianas Islands")
(0.35714285714285715,"Maryland")
Using the definition of statesSet from the previous example:
>>> find "Why-oh-me-ing" statesSet
[(0.5384615384615384,"Wyoming")]
>>> find "Connect a cat" statesSet
[(0.6923076923076923,"Connecticut")]
>>> find "Transylvania" statesSet
[(0.75,"Pennsylvania"),(0.3333333333333333,"California"),(0.3333333333333333,"Arkansas"),(0.3333333333333333,"Kansas")]
>>> find "CanOfSauce" statesSet
[(0.4,"Kansas")]
>>> find "Alaska" statesSet
[(1.0,"Alaska")]
>>> find "Alaskanbraskansas" statesSet
[(0.47058823529411764,"Arkansas"),(0.35294117647058826,"Kansas"),(0.35294117647058826,"Alaska"),(0.35294117647058826,"Alabama"),(0.35294117647058826,"Nebraska")]
Types
data FuzzySet
Main fuzzy string set data type.
Instances
| Instance | Notes |
|---|---|
| Show FuzzySet | |
| Eq FuzzySet | |
| Monad m => MonadState FuzzySet (FuzzySearchT m) | Defined in Data.FuzzySet.Monad. Methods: get :: FuzzySearchT m FuzzySet; put :: FuzzySet -> FuzzySearchT m (); state :: (FuzzySet -> (a, FuzzySet)) -> FuzzySearchT m a |
| MonadFuzzySearch m => MonadFuzzySearch (StateT FuzzySet m) | |
type FuzzyMatch = (Double, Text)
An individual result when looking up a string in the set, consisting of
- a similarity score in the range [0, 1], and
- the matching string.
Initialization
emptySet
Arguments
| Type | Description |
|---|---|
| :: Int | Lower bound on gram sizes to use (inclusive) |
| -> Int | Upper bound on gram sizes to use (inclusive) |
| -> Bool | Whether or not to use the Levenshtein distance to determine the score |
| -> FuzzySet | An empty fuzzy string set |
Initialize an empty FuzzySet.
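The two bounds select which gram sizes are used. As a rough illustration, the sketch below splits a string into character n-grams for every size in such a range. It is a toy model, not the library's code: the normalization (lowercasing, then padding with '-' on both sides) is an assumption, and grams and gramsInRange are hypothetical names.

```haskell
import Data.Char (toLower)

-- | Hypothetical helper: all contiguous substrings of length n, after an
-- assumed normalization (lowercase, then pad with '-' on both ends).
grams :: Int -> String -> [String]
grams n s = [take n (drop i padded) | i <- [0 .. length padded - n]]
  where
    base = "-" ++ map toLower s ++ "-"
    -- Extra padding so that inputs shorter than n still yield one gram.
    padded = base ++ replicate (n - length base) '-'

-- | Hypothetical helper: grams for every size from lo to hi, both inclusive,
-- mirroring the lower and upper bound arguments of emptySet.
gramsInRange :: Int -> Int -> String -> [(Int, [String])]
gramsInRange lo hi s = [(n, grams n s) | n <- [lo .. hi]]
```

With these definitions, gramsInRange 2 3 "Ab" evaluates to [(2,["-a","ab","b-"]),(3,["-ab","ab-"])].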
defaultSet :: FuzzySet
An empty FuzzySet with the following defaults:
- Gram size lower: 2
- Gram size upper: 3
- Use Levenshtein distance: True
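The scores in the examples above agree with a simple Levenshtein-based rule: lowercase both strings, compute the edit distance d, and report 1 - d / n, where n is the length of the longer string. For example, "why-oh-me-ing" becomes "wyoming" after six deletions, and 1 - 6/13 ≈ 0.5385, the score reported for Wyoming. The sketch below implements that rule; it is inferred from the examples and is an assumption rather than the library's source, and levenshtein and levenshteinScore are hypothetical names.

```haskell
import Data.Char (toLower)

-- | Classic Levenshtein edit distance (insertions, deletions, substitutions),
-- computed one row of the dynamic-programming table at a time.
levenshtein :: String -> String -> Int
levenshtein a b = last (foldl step [0 .. length a] b)
  where
    -- prev is the previous row; c is the current character of b.
    step [] _ = [] -- unreachable: every row has length (length a + 1)
    step prev@(p : ps) c = scanl cell (p + 1) (zip3 a prev ps)
      where
        cell left (x, diag, up) =
          minimum [up + 1, left + 1, diag + (if x == c then 0 else 1)]

-- | Assumed scoring rule: 1 - distance / length of the longer string,
-- after lowercasing both inputs.
levenshteinScore :: String -> String -> Double
levenshteinScore s t
  | longest == 0 = 1
  | otherwise = 1 - fromIntegral (levenshtein s' t') / fromIntegral longest
  where
    s' = map toLower s
    t' = map toLower t
    longest = max (length s') (length t')
```

For instance, levenshtein "kitten" "sitting" is 3, and levenshteinScore "Transylvania" "Pennsylvania" is 0.75, matching the Pennsylvania score above.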
fromList :: [Text] -> FuzzySet
Create a new set from a list of entries, using the default settings.
Insertion
addToSet
Arguments
| Type | Description |
|---|---|
| :: Text | The new entry |
| -> FuzzySet | Fuzzy string set to add the entry to |
| -> (Bool, FuzzySet) | A flag to indicate if the value was added (i.e., did not already exist in the set), and the updated set. |
Add a string to the set, unless it is already present. A pair is returned consisting of a boolean which denotes whether or not anything was inserted, and the updated set.
add :: Text -> FuzzySet -> FuzzySet
Add a string to the set, or do nothing if a key that matches the string already exists.
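A toy model of the insert-unless-present behaviour of addToSet and add. For illustration only, it assumes that entries are keyed by their lowercased form and keeps just a Data.Map from key to original string, whereas the real FuzzySet also stores gram data. ToySet, toyAddToSet and toyAdd are hypothetical names, and String stands in for Text to keep the sketch short.

```haskell
import Data.Char (toLower)
import qualified Data.Map.Strict as Map

-- | Hypothetical stand-in for FuzzySet: normalized key -> original entry.
type ToySet = Map.Map String String

-- | Insert unless an entry with the same (assumed lowercase) key exists;
-- the Bool reports whether anything was inserted.
toyAddToSet :: String -> ToySet -> (Bool, ToySet)
toyAddToSet s set
  | Map.member key set = (False, set)
  | otherwise = (True, Map.insert key s set)
  where
    key = map toLower s

-- | Same as toyAddToSet, but only the updated set is returned.
toyAdd :: String -> ToySet -> ToySet
toyAdd s = snd . toyAddToSet s
```

Under this assumption, fst (toyAddToSet "alaska" (toyAdd "Alaska" Map.empty)) is False, because the lowercased key is already taken.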
addManyToSet
Arguments
| Type | Description |
|---|---|
| :: [Text] | A list of strings to add to the set |
| -> FuzzySet | The set to add the strings to |
| -> ([Text], FuzzySet) | A pair where the first component is a list of all values that were inserted, and the second is the updated set. |
Add a list of strings to the set, all at once.
Unless you need to know the subset of values that were actually inserted,
use addMany instead.
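A toy model of addManyToSet and addMany as a left fold of a single-entry insert that remembers which entries were new. As an assumption for illustration, entries are keyed by their lowercased form in a plain Data.Map; the library's real FuzzySet also stores gram data, and ToySet, toyAddToSet, toyAddManyToSet and toyAddMany are hypothetical names. The sketch reports inserted values in input order, which is a choice made here rather than documented library behaviour.

```haskell
import Data.Char (toLower)
import qualified Data.Map.Strict as Map

-- | Hypothetical stand-in for FuzzySet: normalized key -> original entry.
type ToySet = Map.Map String String

-- | Insert unless the (assumed lowercase) key exists; report whether inserted.
toyAddToSet :: String -> ToySet -> (Bool, ToySet)
toyAddToSet s set
  | Map.member key set = (False, set)
  | otherwise = (True, Map.insert key s set)
  where
    key = map toLower s

-- | Fold the entries in, collecting (in input order) those actually inserted.
-- A later duplicate in the same list counts as already present.
toyAddManyToSet :: [String] -> ToySet -> ([String], ToySet)
toyAddManyToSet xs set0 = (reverse inserted, final)
  where
    (inserted, final) = foldl step ([], set0) xs
    step (acc, set) x = case toyAddToSet x set of
      (True, set') -> (x : acc, set')
      (False, _) -> (acc, set)

-- | Like toyAddManyToSet, but only the updated set is returned.
toyAddMany :: [String] -> ToySet -> ToySet
toyAddMany xs = snd . toyAddManyToSet xs
```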
addMany
Arguments
| Type | Description |
|---|---|
| :: [Text] | A list of strings to add to the set |
| -> FuzzySet | The set to add the strings to |
| -> FuzzySet | The updated set |
Add a list of strings to the set, all at once.
This function is identical to addManyToSet, except that it only returns
the set itself. If you need to know what values were inserted, then use the
latter instead.
(>+<) :: FuzzySet -> Text -> FuzzySet
Infix operator to add entries to a FuzzySet, defined as flip add.
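To see why chains such as defaultSet >+< "Jurassic Park" >+< "Terminator" work, here is the flip add definition applied to a hypothetical toy set (a Data.Map keyed by the lowercased string, an assumption for illustration). In Haskell, an operator without a fixity declaration defaults to infixl 9, so such a chain groups to the left and each step adds one entry to the set built so far. The library's own fixity declaration, if it has one, is not shown in this documentation.

```haskell
import Data.Char (toLower)
import qualified Data.Map.Strict as Map

-- | Hypothetical stand-in for FuzzySet: lowercased key -> original entry.
type ToySet = Map.Map String String

-- | Add an entry unless its (assumed lowercase) key is already present.
toyAdd :: String -> ToySet -> ToySet
toyAdd s set
  | Map.member key set = set
  | otherwise = Map.insert key s set
  where
    key = map toLower s

-- | Set on the left, entry on the right: flip toyAdd. No fixity declaration,
-- so the default infixl 9 makes a >+< b >+< c mean (a >+< b) >+< c.
(>+<) :: ToySet -> String -> ToySet
(>+<) = flip toyAdd
```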
Lookup
findMin
Arguments
| Type | Description |
|---|---|
| :: Double | A minimum score |
| -> Text | The string to search for |
| -> FuzzySet | The fuzzy string set to compare the string against |
| -> [FuzzyMatch] | A list of results (score and matched value) |
Try to match a string against the entries in the set, and return a list of all results with a score greater than or equal to the specified minimum score (i.e., the first argument). The results are ordered by similarity, with the closest match first.
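The filtering and ordering described for findMin can be sketched independently of how scores are computed. In the sketch below the scoring function is a parameter (a hypothetical Scorer, standing in for the library's own scoring) and the set is modelled as a plain list of entries; toyFindMin is not a library function.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- | Hypothetical scoring function: query -> entry -> similarity in [0, 1].
type Scorer = String -> String -> Double

-- | Score every entry, keep those at or above the minimum score, and sort
-- best first. sortBy is stable, so ties keep their input order.
toyFindMin :: Scorer -> Double -> String -> [String] -> [(Double, String)]
toyFindMin score minScore query entries =
  sortBy (comparing (Down . fst))
    [(s, e) | e <- entries, let s = score query e, s >= minScore]
```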
findOneMin
Arguments
| Type | Description |
|---|---|
| :: Double | A minimum score |
| -> Text | The string to search for |
| -> FuzzySet | The fuzzy string set to compare the string against |
| -> Maybe FuzzyMatch | The closest match, if one is found |
Try to match the given string against the entries in the set using the specified minimum score and return the closest match, if one is found.
closestMatchMin
Arguments
| Type | Description |
|---|---|
| :: Double | A minimum score |
| -> Text | The string to search for |
| -> FuzzySet | The fuzzy string set to compare the string against |
| -> Maybe Text | The string most closely matching the input, if a match is found |
Try to match the given string against the entries in the set using the specified minimum score and return the string that most closely matches the input, if a match is found.
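Because findMin returns its results best first, one way to picture findOneMin and closestMatchMin is as the head of that list, optionally keeping only the matched string. The sketch below encodes that relationship on a hypothetical model in which the scoring function is a parameter and the set is a plain list of entries; it is an assumption about how the three functions relate, not the library's implementation, and all toy* names are hypothetical.

```haskell
import Data.List (sortBy)
import Data.Maybe (listToMaybe)
import Data.Ord (Down (..), comparing)

-- | Hypothetical scoring function: query -> entry -> similarity in [0, 1].
type Scorer = String -> String -> Double

-- | All matches at or above the minimum score, best first.
toyFindMin :: Scorer -> Double -> String -> [String] -> [(Double, String)]
toyFindMin score minScore query entries =
  sortBy (comparing (Down . fst))
    [(s, e) | e <- entries, let s = score query e, s >= minScore]

-- | The best match, if any: the head of the sorted list.
toyFindOneMin :: Scorer -> Double -> String -> [String] -> Maybe (Double, String)
toyFindOneMin score minScore query = listToMaybe . toyFindMin score minScore query

-- | Only the matched string of the best match.
toyClosestMatchMin :: Scorer -> Double -> String -> [String] -> Maybe String
toyClosestMatchMin score minScore query = fmap snd . toyFindOneMin score minScore query
```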
find
Arguments
| Type | Description |
|---|---|
| :: Text | The string to search for |
| -> FuzzySet | The fuzzy string set to compare the string against |
| -> [FuzzyMatch] | A list of results (score and matched value) |
Try to match the given string against the entries in the set, using a
minimum score of 0.33. Return a list of results ordered by similarity
score, with the closest match first. Use findMin if you need to specify
a custom threshold value.
findOne
Arguments
| Type | Description |
|---|---|
| :: Text | The string to search for |
| -> FuzzySet | The fuzzy string set to compare the string against |
| -> Maybe FuzzyMatch | The closest match, if one is found |
Try to match the given string against the entries in the set, and return
the closest match, if one is found. A minimum score of 0.33 is used. To
specify a custom threshold value, instead use findOneMin.
closestMatch
Arguments
| Type | Description |
|---|---|
| :: Text | The string to search for |
| -> FuzzySet | The fuzzy string set to compare the string against |
| -> Maybe Text | The string most closely matching the input, if a match is found |
Try to match the given string against the entries in the set, and return
the string that most closely matches the input, if a match is found. A
minimum score of 0.33 is used. To specify a custom threshold value,
instead use closestMatchMin.
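find, findOne and closestMatch are described as using a fixed minimum score of 0.33, so each one can be pictured as its ...Min counterpart with the threshold partially applied. The sketch below shows this on a hypothetical model where the scorer is a parameter and the set is a plain list of entries; toyFind, toyFindOne, toyClosestMatch and the other toy* names are not library functions.

```haskell
import Data.List (sortBy)
import Data.Maybe (listToMaybe)
import Data.Ord (Down (..), comparing)

-- | Hypothetical scoring function: query -> entry -> similarity in [0, 1].
type Scorer = String -> String -> Double

-- | The documented default threshold used by find, findOne and closestMatch.
defaultMinScore :: Double
defaultMinScore = 0.33

-- | All matches at or above the minimum score, best first.
toyFindMin :: Scorer -> Double -> String -> [String] -> [(Double, String)]
toyFindMin score minScore query entries =
  sortBy (comparing (Down . fst))
    [(s, e) | e <- entries, let s = score query e, s >= minScore]

-- | Counterparts of find, findOne and closestMatch with the 0.33 threshold.
toyFind :: Scorer -> String -> [String] -> [(Double, String)]
toyFind score = toyFindMin score defaultMinScore

toyFindOne :: Scorer -> String -> [String] -> Maybe (Double, String)
toyFindOne score query = listToMaybe . toyFind score query

toyClosestMatch :: Scorer -> String -> [String] -> Maybe String
toyClosestMatch score query = fmap snd . toyFindOne score query
```

A score of exactly 0.33 is kept and anything lower is dropped, matching "greater than or equal to" in the findMin description.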
Inspection
values :: FuzzySet -> [Text]
Return the elements of the set. No particular order is guaranteed.
>>> values (fromList ["bass", "craze", "space", "lace", "daze", "haze", "ace", "maze"])
["space","daze","bass","maze","ace","craze","lace","haze"]