bktrees-0.1.3: A set data structure with approximate searching

Portability portable Alpha quality. Interface may change without notice. josef.svenningsson@gmail.com

Data.Set.BKTree

Description

Burhard-Keller trees provide an implementation of sets which apart from the ordinary operations also has an approximate member search, allowing you to search for elements that are of a distance `n` from the element you are searching for. The distance is determined using a metric on the type of elements. Therefore all elements must implement the `Metric` type class, rather than the more usual `Ord`.

Useful metrics include the manhattan distance between two points, the Levenshtein edit distance between two strings, the number of edges in the shortest path between two nodes in a undirected graph and the Hamming distance between two binary strings. Any euclidean space also has a metric. However, in this module we use int-valued metrics and that doesn't quite with the metrics of euclidean spaces which are real-values.

The worst case complexity of many of these operations is quite bad, but the expected behavior varies greatly with the metric. For example, the discrete metric (```distance x y | y == x = 0 | otherwise = 1```) makes BK-trees behave abysmally. The metrics mentioned above should give good performance characteristics.

Synopsis

# Documentation

data BKTree a Source

The type of Burhard-Keller trees.

class Eq a => Metric a whereSource

A type is `Metric` if is has a function `distance` which has the following properties:

• ``distance` x y >= 0`
• `distance x y == 0` if and only if `x == y`
• ``distance` x y == `distance` y x`
• ``distance` x z <= `distance` x y + `distance` y z`

All types of elements to `BKTree` must implement `Metric`.

This definition of a metric deviates from the mathematical one in that it returns an integer instead of a real number. The reason for choosing integers is that I wanted to avoid the rather unpredictable rounding of floating point numbers.

Methods

distance :: a -> a -> IntSource

Instances

 Metric Char Metric Int Metric Integer Eq a => Metric [a]

null :: BKTree a -> BoolSource

Test if the tree is empty.

The empty tree.

fromList :: Metric a => [a] -> BKTree aSource

Constructs a tree from a list

singleton :: a -> BKTree aSource

The tree with a single element

insert :: Metric a => a -> BKTree a -> BKTree aSource

Inserts an element into the tree. If an element is inserted several times it will be stored several times.

member :: Metric a => a -> BKTree a -> BoolSource

Checks whether an element is in the tree.

memberDistance :: Metric a => Int -> a -> BKTree a -> BoolSource

Approximate searching. `memberDistance n a tree` will return true if there is an element in `tree` which has a `distance` less than or equal to `n` from `a`.

delete :: Metric a => a -> BKTree a -> BKTree aSource

Removes an element from the tree. If an element occurs several times in the only the first occurrence will be deleted.

union :: Metric a => BKTree a -> BKTree a -> BKTree aSource

Merges two trees

unions :: Metric a => [BKTree a] -> BKTree aSource

Merges several trees

elems :: BKTree a -> [a]Source

Returns all the elements of the tree

elemsDistance :: Metric a => Int -> a -> BKTree a -> [a]Source

`elemsDistance n a tree` returns all the elements in `tree` which are at a `distance` less than or equal to `n` from the element `a`.

closest :: Metric a => a -> BKTree a -> Maybe (a, Int)Source

`closest a tree` returns the element in `tree` which is closest to `a` together with the distance. Returns `Nothing` if the tree is empty.