Copyright | (c) 2017-present Heikki Johannes Hildén |
---|---|
License | BSD3 |
Maintainer | hildenjohannes@gmail.com |
Stability | experimental |
Portability | GHC |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Synopsis
- type FuzzySearch = FuzzySearchT Identity
- class MonadState FuzzySet m => MonadFuzzySearch m
- runFuzzySearch :: FuzzySearch a -> Int -> Int -> Bool -> a
- runDefaultFuzzySearch :: FuzzySearch a -> a
- data FuzzySearchT m a
- runFuzzySearchT :: Monad m => FuzzySearchT m a -> Int -> Int -> Bool -> m a
- runDefaultFuzzySearchT :: Monad m => FuzzySearchT m a -> m a
- add :: MonadFuzzySearch m => Text -> m Bool
- add_ :: MonadFuzzySearch m => Text -> m ()
- addMany :: MonadFuzzySearch m => [Text] -> m [Text]
- addMany_ :: MonadFuzzySearch m => [Text] -> m ()
- find :: MonadFuzzySearch m => Text -> m [FuzzyMatch]
- findMin :: MonadFuzzySearch m => Double -> Text -> m [FuzzyMatch]
- findOne :: MonadFuzzySearch m => Text -> m (Maybe FuzzyMatch)
- findOneMin :: MonadFuzzySearch m => Double -> Text -> m (Maybe FuzzyMatch)
- closestMatchMin :: MonadFuzzySearch m => Double -> Text -> m (Maybe Text)
- closestMatch :: MonadFuzzySearch m => Text -> m (Maybe Text)
- values :: MonadFuzzySearch m => m [Text]
- size :: MonadFuzzySearch m => m Int
- isEmpty :: MonadFuzzySearch m => m Bool
Getting started
This library provides two similar, but independent APIs. The Data.FuzzySet.Simple module offers a simpler (pure) interface for working with the FuzzySet data structure directly (similar to earlier versions of the library). A disadvantage of this approach is that it scales poorly when the code involves IO, and possibly other effects. For most real-world use cases, it is therefore recommended to use the default API and the FuzzySearch monad exposed by Data.FuzzySet (see below for more examples).
findPlanet :: (MonadIO m, MonadFuzzySearch m) => Text -> m ()
findPlanet planetName = do
  addMany_ [ "Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune" ]
  findOne planetName >>= liftIO . print

>>> runDefaultFuzzySearchT (findPlanet "Joopiter")
Just (0.75,"Jupiter")
Note that all strings are represented as text values. Examples on this page use the OverloadedStrings language extension to allow string literals to be translated into this form.
Import the main module:
import Data.FuzzySet
Fuzzy search involves three types of operations:
- Insertion: For adding entries to the set, see add, add_, addMany, and addMany_.
- Lookup: To match a string against all values of the set, use find, findMin, findOne, findOneMin, closestMatchMin, and closestMatch.
- Inspection: The function values returns all strings currently in the set. size and isEmpty are mostly self-explanatory.
Finally, use runFuzzySearch, runDefaultFuzzySearch, runFuzzySearchT, or runDefaultFuzzySearchT to get the result of the computation from the monad.
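As a minimal sketch of how the three kinds of operations and a run function fit together, the following combines one call from each group. The helper name lookupColour and the sample data are made up for illustration and are not part of the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import Data.FuzzySet (FuzzySearch, addMany_, closestMatch, size, runDefaultFuzzySearch)

-- Hypothetical helper: insert a few entries, look one up, and report
-- how many entries the set holds afterwards.
lookupColour :: Text -> FuzzySearch (Maybe Text, Int)
lookupColour query = do
  addMany_ ["Red", "Green", "Blue"]  -- insertion
  match <- closestMatch query        -- lookup
  n <- size                          -- inspection
  pure (match, n)

main :: IO ()
main = print (runDefaultFuzzySearch (lookupColour "Gren"))
```

Because the computation is pure, runDefaultFuzzySearch returns the pair directly, without going through IO.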
Simple search example
The following is a simple program to serve as a 'Hello World' example:
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import Data.FuzzySet (FuzzySearch, add_, closestMatch, runDefaultFuzzySearch)

findMovie :: Text -> FuzzySearch (Maybe Text)
findMovie title = do
  add_ "Jurassic Park"
  add_ "Terminator"
  add_ "The Matrix"
  closestMatch title

main :: IO ()
main = do
  let result = runDefaultFuzzySearch (findMovie "The Percolator")
  print result
The output of this program is:
Just "Terminator"
Adding IO
Changing the previous example to instead use the FuzzySearchT transformer, we can combine the search monad with IO and other effects.
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad.Trans.Class (lift)
import Data.Text (Text)
import Data.FuzzySet (FuzzySearchT, add_, closestMatch, runDefaultFuzzySearchT)

findMovie :: Text -> FuzzySearchT IO (Maybe Text)
findMovie = closestMatch

prog :: FuzzySearchT IO ()
prog = do
  add_ "Jurassic Park"
  add_ "Terminator"
  add_ "The Matrix"
  result <- findMovie "The Percolator"
  lift (print result)

main :: IO ()
main = runDefaultFuzzySearchT prog
To make the search more restrictive, we can set a custom min score:
findMovie :: Text -> FuzzySearchT IO (Maybe Text)
findMovie = closestMatchMin 0.8
The output is now:
Nothing
Another example: Favorite fruit
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad (when)
import Control.Monad.IO.Class (liftIO)
import Data.FuzzySet
import Data.Text (Text, pack, unpack)
import qualified Data.Text as Text
import System.IO (hFlush, stdout)

-- Read a fruit name from the user, look it up, and repeat until ".exit".
repl :: FuzzySearchT IO ()
repl = do
  str <- liftIO $ do
    putStrLn "Enter your favorite fruit below, or type \".exit\"."
    putStr "> "
    hFlush stdout  -- show the prompt before blocking on input
    getLine
  when (str /= ".exit") $ do
    result <- findOneMin 0.6 (pack str)
    liftIO $ case result of
      Nothing ->
        putStrLn "I don't know that fruit."
      -- A score of 1 means the input matched an entry exactly
      Just (1, match) ->
        putStrLn ("You like " <> unpack (Text.toLower match) <> ". Me too!")
      Just (_, match) ->
        putStrLn ("Did you mean \"" <> unpack match <> "\"?")
    repl

main :: IO ()
main = runDefaultFuzzySearchT $ do
  addMany_ fruits
  repl

fruits :: [Text]
fruits =
  [ "Apple", "Apricot", "Avocado", "Banana", "Bilberry", "Blackberry"
  , "Blackcurrant", "Blueberry", "Boysenberry", "Currant", "Cherry"
  , "Cherimoya", "Chico fruit", "Cloudberry", "Coconut", "Cranberry"
  , "Cucumber", "Custard apple", "Damson", "Date", "Dragonfruit", "Durian"
  , "Elderberry", "Feijoa", "Fig", "Goji berry", "Gooseberry", "Grape"
  , "Raisin", "Grapefruit", "Guava", "Honeyberry", "Huckleberry"
  , "Jabuticaba", "Jackfruit", "Jambul", "Jujube", "Juniper berry", "Kiwano"
  , "Kiwifruit", "Kumquat", "Lemon", "Lime", "Loquat", "Longan", "Lychee"
  , "Mango", "Mangosteen", "Marionberry", "Melon", "Cantaloupe", "Honeydew"
  , "Watermelon", "Miracle fruit", "Mulberry", "Nectarine", "Nance", "Olive"
  , "Orange", "Blood orange", "Clementine", "Mandarine", "Tangerine"
  , "Papaya", "Passionfruit", "Peach", "Pear", "Persimmon", "Physalis"
  , "Plantain", "Plum", "Prune", "Pineapple", "Plumcot", "Pomegranate"
  , "Pomelo", "Purple mangosteen", "Quince", "Raspberry", "Salmonberry"
  , "Rambutan", "Redcurrant", "Salal berry", "Salak", "Satsuma", "Soursop"
  , "Star fruit", "Solanum quitoense", "Strawberry", "Tamarillo", "Tamarind"
  , "Ugli fruit", "Yuzu"
  ]
FuzzySearch monad
type FuzzySearch = FuzzySearchT Identity
FuzzySearch monad
class MonadState FuzzySet m => MonadFuzzySearch m
Instances
- Monad m => MonadFuzzySearch (FuzzySearchT m). Defined in Data.FuzzySet.Monad, with the methods add :: Text -> FuzzySearchT m Bool and findMin :: Double -> Text -> FuzzySearchT m [FuzzyMatch].
- MonadFuzzySearch m => MonadFuzzySearch (MaybeT m)
- MonadFuzzySearch m => MonadFuzzySearch (ExceptT e m)
- MonadFuzzySearch m => MonadFuzzySearch (ReaderT r m)
- (MonadFuzzySearch m, MonadState FuzzySet (SelectT s m)) => MonadFuzzySearch (SelectT s m)
- MonadFuzzySearch m => MonadFuzzySearch (StateT FuzzySet m)
- (MonadFuzzySearch m, Monoid w) => MonadFuzzySearch (WriterT w m)
- MonadFuzzySearch m => MonadFuzzySearch (ContT r m)
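The lifting instances mean that the search operations can be called directly from inside a larger transformer stack. The sketch below assumes the mtl package for ReaderT; the App type, the search helper, and the idea of keeping the minimum score in the Reader environment are illustrative choices, not something the library prescribes.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad.IO.Class (liftIO)
import Control.Monad.Reader (ReaderT, ask, runReaderT)
import Data.Text (Text)
import Data.FuzzySet (FuzzySearchT, addMany_, closestMatchMin, runDefaultFuzzySearchT)

-- Hypothetical application stack: a Reader environment holding the
-- minimum score, layered on top of the search transformer.
type App = ReaderT Double (FuzzySearchT IO)

-- Look up a string using the threshold stored in the environment.
-- No explicit lift is needed, thanks to the ReaderT instance.
search :: Text -> App (Maybe Text)
search query = do
  minScore <- ask
  closestMatchMin minScore query

prog :: App ()
prog = do
  addMany_ ["Jurassic Park", "Terminator", "The Matrix"]
  result <- search "The Percolator"
  liftIO (print result)

main :: IO ()
main = runDefaultFuzzySearchT (runReaderT prog 0.8)
```

The layers are unwrapped from the outside in: runReaderT supplies the threshold, and runDefaultFuzzySearchT then runs the search layer in IO.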
runFuzzySearch
:: FuzzySearch a | |
-> Int | Lower bound on gram sizes to use (inclusive) |
-> Int | Upper bound on gram sizes to use (inclusive) |
-> Bool | Whether or not to use the Levenshtein distance to determine the score |
-> a | The result of running the computation |
Evaluate a FuzzySearch computation with the given options.
runDefaultFuzzySearch :: FuzzySearch a -> a
Evaluate a FuzzySearch computation with the following defaults:
- Gram size lower: 2
- Gram size upper: 3
- Use Levenshtein distance: True
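To illustrate the relationship between the two run functions, here is a small sketch that evaluates the same computation with the defaults, with those defaults spelled out, and with Levenshtein scoring turned off. The search helper and its sample data are invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import Data.FuzzySet (FuzzySearch, addMany_, closestMatch, runDefaultFuzzySearch, runFuzzySearch)

-- Hypothetical computation: fill the set and look up one string.
search :: Text -> FuzzySearch (Maybe Text)
search query = do
  addMany_ ["Red", "Green", "Blue"]
  closestMatch query

main :: IO ()
main = do
  -- Default options.
  print (runDefaultFuzzySearch (search "Gren"))
  -- The documented defaults written explicitly: gram sizes 2 to 3,
  -- Levenshtein distance enabled.
  print (runFuzzySearch (search "Gren") 2 3 True)
  -- A variation for comparison: same gram sizes, Levenshtein disabled.
  print (runFuzzySearch (search "Gren") 2 3 False)
```

Given the defaults listed above, the first two lines are expected to print the same result; the third may differ, since the scoring method changes.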
FuzzySearch monad transformer
data FuzzySearchT m a
FuzzySearch monad transformer
runFuzzySearchT
:: Monad m | |
=> FuzzySearchT m a | |
-> Int | Lower bound on gram sizes to use (inclusive) |
-> Int | Upper bound on gram sizes to use (inclusive) |
-> Bool | Whether or not to use the Levenshtein distance to determine the score |
-> m a | The result of running the computation in the inner monad |
Evaluate a FuzzySearchT computation with the given options.
runDefaultFuzzySearchT :: Monad m => FuzzySearchT m a -> m a
Evaluate a FuzzySearchT computation with the following defaults:
- Gram size lower: 2
- Gram size upper: 3
- Use Levenshtein distance: True
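With the transformer, the result is delivered in the inner monad, so it can be bound like any other IO value. A minimal sketch, in which the planets computation is a made-up example:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import Data.FuzzySet (FuzzySearchT, addMany_, closestMatch, runDefaultFuzzySearchT, runFuzzySearchT)

-- Hypothetical computation over IO: fill the set and look up one name.
planets :: FuzzySearchT IO (Maybe Text)
planets = do
  addMany_ ["Mercury", "Venus", "Earth", "Mars"]
  closestMatch "Venis"

main :: IO ()
main = do
  -- Run with the default options; the result is an ordinary IO value.
  withDefaults <- runDefaultFuzzySearchT planets
  -- Run again with explicit options (here: Levenshtein disabled).
  withoutLevenshtein <- runFuzzySearchT planets 2 3 False
  print (withDefaults, withoutLevenshtein)
```

Each run starts from an empty set, so the two evaluations do not share any entries.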
Insertion
add
:: MonadFuzzySearch m | |
=> Text | The new entry |
-> m Bool | A flag to indicate whether the value was added (i.e., did not already exist in the set) |
Add a string to the set. A boolean is returned which is True if the string was inserted, or False if it already existed in the set.
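The returned flag can be observed by adding the same string twice. This is a small sketch; the name addTwice is invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.FuzzySet (FuzzySearch, add, runDefaultFuzzySearch)

-- Hypothetical helper: insert the same entry twice and keep both flags.
addTwice :: FuzzySearch (Bool, Bool)
addTwice = do
  first <- add "Jupiter"   -- the set is empty, so this inserts
  second <- add "Jupiter"  -- the entry already exists
  pure (first, second)

main :: IO ()
main = print (runDefaultFuzzySearch addTwice)
```

Going by the description above, the first flag should be True and the second False.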
add_ :: MonadFuzzySearch m => Text -> m ()
Add a string to the set, or do nothing if a key that matches the string already exists.
This function is identical to add, except that the latter returns a boolean to indicate whether any new value was added.
addMany
:: MonadFuzzySearch m | |
=> [Text] | A list of strings to add to the set |
-> m [Text] | A list of values that were inserted |
Add a list of strings to the set, all at once.
Unless you need to know the subset of values that were actually inserted, use addMany_ instead.
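The following sketch shows one situation where the returned list is useful: finding out which entries of a batch were new. The helper name newPlanets and the data are assumptions made for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import Data.FuzzySet (FuzzySearch, add_, addMany, runDefaultFuzzySearch)

-- Hypothetical helper: seed the set with one entry, then add a batch
-- that contains that entry again, and return what addMany reports.
newPlanets :: FuzzySearch [Text]
newPlanets = do
  add_ "Mars"
  addMany ["Mars", "Venus", "Earth"]

main :: IO ()
main = print (runDefaultFuzzySearch newPlanets)
```

Since "Mars" is already present when the batch is added, the result is expected to contain only the other two strings.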
addMany_
:: MonadFuzzySearch m | |
=> [Text] | A list of strings to add to the set |
-> m () |
Add a list of strings to the set, all at once.
This function is identical to addMany, except that the latter returns a list of all values that were inserted.
Lookup
find
:: MonadFuzzySearch m | |
=> Text | The string to search for |
-> m [FuzzyMatch] | A list of results (score and matched value) |
Try to match the given string against the entries in the set, using a minimum score of 0.33. Return a list of results ordered by similarity score, with the closest match first. Use findMin if you need to specify a custom threshold value.
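findMin
:: MonadFuzzySearch m | |
=> Double | A minimum score |
-> Text | The string to search for |
-> m [FuzzyMatch] | A list of results (score and matched value) |
Try to match a string against the entries in the set, and return a list of all results with a score greater than or equal to the specified minimum score (i.e., the first argument). The results are ordered by similarity, with the closest match first.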
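To make the effect of the threshold concrete, this sketch runs the same query with the default minimum score and with a stricter one. It assumes, as the examples on this page suggest, that each result is a pair of a score and the matched Text; the helper name compareThresholds and the value 0.8 are arbitrary.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import Data.FuzzySet (FuzzySearch, addMany_, find, findMin, runDefaultFuzzySearch)

-- Hypothetical helper: return the matched strings for the default
-- threshold (0.33) and for a stricter threshold of 0.8.
compareThresholds :: Text -> FuzzySearch ([Text], [Text])
compareThresholds query = do
  addMany_ ["Blackberry", "Blackcurrant", "Blueberry", "Bilberry"]
  loose <- find query         -- minimum score 0.33
  strict <- findMin 0.8 query -- custom minimum score
  pure (map snd loose, map snd strict)

main :: IO ()
main = print (runDefaultFuzzySearch (compareThresholds "Blakberry"))
```

Raising the minimum score can only remove results, so the second list should never be longer than the first.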
findOne
:: MonadFuzzySearch m | |
=> Text | The string to search for |
-> m (Maybe FuzzyMatch) | The closest match, if one is found |
Try to match the given string against the entries in the set, and return the closest match, if one is found. A minimum score of 0.33 is used. To specify a custom threshold value, instead use findOneMin.
findOneMin
:: MonadFuzzySearch m | |
=> Double | A minimum score |
-> Text | The string to search for |
-> m (Maybe FuzzyMatch) | The closest match, if one is found |
Try to match the given string against the entries in the set using the specified minimum score and return the closest match, if one is found.
closestMatchMin
:: MonadFuzzySearch m | |
=> Double | A minimum score |
-> Text | The string to search for |
-> m (Maybe Text) | The string most closely matching the input, if a match is found |
Try to match the given string against the entries in the set using the specified minimum score and return the string that most closely matches the input, if a match is found.
closestMatch
:: MonadFuzzySearch m | |
=> Text | The string to search for |
-> m (Maybe Text) | The string most closely matching the input, if a match is found |
Try to match the given string against the entries in the set, and return the string that most closely matches the input, if a match is found. A minimum score of 0.33 is used. To specify a custom threshold value, instead use closestMatchMin.
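The lookup functions differ mainly in how much of the result they return. As a rough sketch, assuming a match is a pair of score and Text as in the examples on this page, closestMatch can be thought of as findOne with the score dropped. The helper name bothWays is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import Data.FuzzySet (FuzzySearch, addMany_, closestMatch, findOne, runDefaultFuzzySearch)

-- Hypothetical helper: look up the same query in two ways.
bothWays :: Text -> FuzzySearch (Maybe Text, Maybe Text)
bothWays query = do
  addMany_ ["Mercury", "Venus", "Earth", "Mars", "Jupiter"]
  viaFindOne <- fmap snd <$> findOne query  -- keep the string, drop the score
  viaClosest <- closestMatch query
  pure (viaFindOne, viaClosest)

main :: IO ()
main = print (runDefaultFuzzySearch (bothWays "Joopiter"))
```

Based on the descriptions above, both components are expected to agree; use findOne when the score itself is needed.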
Inspection
values :: MonadFuzzySearch m => m [Text]
Return the elements of the set. No particular order is guaranteed.
size :: MonadFuzzySearch m => m Int
Return the number of entries in the set.
isEmpty :: MonadFuzzySearch m => m Bool
Return a boolean indicating whether the set is empty.
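A short sketch of the inspection functions, checking the set before and after an insertion. The helper name inspect is made up; the values are sorted only because no order is guaranteed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.List (sort)
import Data.Text (Text)
import Data.FuzzySet (FuzzySearch, addMany_, isEmpty, size, values, runDefaultFuzzySearch)

-- Hypothetical helper: observe the set before and after adding entries.
inspect :: FuzzySearch (Bool, Int, [Text], Bool)
inspect = do
  emptyBefore <- isEmpty
  addMany_ ["Red", "Green", "Blue"]
  n <- size
  vs <- values
  emptyAfter <- isEmpty
  -- Sort for a stable presentation, since values promises no order.
  pure (emptyBefore, n, sort vs, emptyAfter)

main :: IO ()
main = print (runDefaultFuzzySearch inspect)
```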