fuzzyset: Fuzzy set for approximate string matching

[ bsd3, data, library ] [ Propose Tags ]

This library is based on the Python and JavaScript libraries with the same name.

[Skip to Readme]
Versions [RSS] [faq],,,,,,,,, 0.1.1, 0.2.0, 0.2.1, 0.2.2
Dependencies base (>=4.7 && <5), base-unicode-symbols (==, data-default (==, lens (==4.15.4), text (>=, text-metrics (==0.3.0), unordered-containers (==, vector (>= [details]
License BSD-3-Clause
Copyright 2017 Johannes Hildén
Author Johannes Hildén
Maintainer hildenjohannes@gmail.com
Category Data
Home page https://github.com/laserpants/fuzzyset-haskell
Source repo head: git clone https://github.com/laserpants/fuzzyset-haskell
Uploaded by arbelos at 2018-01-07T08:54:57Z
Distributions LTSHaskell:0.2.2, NixOS:0.2.1, Stackage:0.2.2
Downloads 6094 total (228 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Hackage Matrix CI
Docs uploaded by user
Build status unknown [no reports yet]




Maintainer's Corner

For package maintainers and hackage trustees


Readme for fuzzyset-

[back to package description]

fuzzyset-haskell Build Status License Language Hackage

A fuzzy string set data structure for approximate string matching. This implementation is based on the Python and JavaScript libraries with the same name:


cabal install fuzzyset

For details, see Hackage docs. This library is also available on Stackage. To install using Stack:

stack install fuzzyset

How to use

Make sure the OverloadedStrings pragma is enabled. Then there are just three steps:

  1. Create a set using one of defaultSet, mkSet, or fromList.
  2. To add entries, use add, addToSet, or addMany.
  3. Then query the set with get, getOne, or getWithMinScore.
>>> defaultSet `add` "Jurassic Park" `add` "Terminator" `add` "The Matrix" `getOne` "percolator"
Just "Terminator"
>>> defaultSet `add` "Shaggy Rogers" `add` "Fred Jones" `add` "Daphne Blake" `add` "Velma Dinkley" `get` "Shaggy Jones"
[(0.7692307692307693,"Shaggy Rogers"),(0.5,"Fred Jones")]

There are also a few functions to inspect the set.

More examples

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.FuzzySet

states = [ "Alabama"        , "Alaska"         , "American Samoa"            , "Arizona"       , "Arkansas"
         , "California"     , "Colorado"       , "Connecticut"               , "Delaware"      , "District of Columbia"
         , "Florida"        , "Georgia"        , "Guam"                      , "Hawaii"        , "Idaho"
         , "Illinois"       , "Indiana"        , "Iowa"                      , "Kansas"        , "Kentucky"
         , "Louisiana"      , "Maine"          , "Maryland"                  , "Massachusetts" , "Michigan"
         , "Minnesota"      , "Mississippi"    , "Missouri"                  , "Montana"       , "Nebraska"
         , "Nevada"         , "New Hampshire"  , "New Jersey"                , "New Mexico"    , "New York"
         , "North Carolina" , "North Dakota"   , "Northern Marianas Islands" , "Ohio"          , "Oklahoma"
         , "Oregon"         , "Pennsylvania"   , "Puerto Rico"               , "Rhode Island"  , "South Carolina"
         , "South Dakota"   , "Tennessee"      , "Texas"                     , "Utah"          , "Vermont"
         , "Virginia"       , "Virgin Islands" , "Washington"                , "West Virginia" , "Wisconsin"
         , "Wyoming" ]

statesSet = fromList states

main = mapM_ print (get statesSet "Burger Islands")

The output of this program is:

(0.7142857142857143,"Virgin Islands")
(0.5714285714285714,"Rhode Island")
(0.44,"Northern Marianas Islands")

Using the definition of statesSet from previous example:

>>> get statesSet "Why-oh-me-ing"

>>> get statesSet "Connect a cat"

>>> get statesSet "Transylvania"

>>> get statesSet "CanOfSauce"

>>> get statesSet "Alaska"

>>> get statesSet "Alaskanbraskansas"