bktrees-0.3.1: A set data structure with approximate searching

Portability portable Alpha quality. Interface may change without notice. josef.svenningsson@gmail.com

Data.Set.BKTree

Description

Burkhard-Keller trees provide an implementation of sets which apart from the ordinary operations also has an approximate member search, allowing you to search for elements that are of a distance `n` from the element you are searching for. The distance is determined using a metric on the type of elements. Therefore all elements must implement the `Metric` type class, rather than the more usual `Ord`.

Useful metrics include the manhattan distance between two points, the Levenshtein edit distance between two strings, the number of edges in the shortest path between two nodes in an undirected graph and the Hamming distance between two binary strings. Any euclidean space also has a metric. However, in this module we use int-valued metrics and that's not compatible with the metrics of euclidean spaces which are real-values.

The worst case complexity of many of these operations is quite bad, but the expected behavior varies greatly with the metric. For example, the discrete metric (```distance x y | y == x = 0 | otherwise = 1```) makes BK-trees behave abysmally. The metrics mentioned above should give good performance characteristics.

Synopsis

# Documentation

data BKTree a Source

The type of Burkhard-Keller trees.

class Eq a => Metric a whereSource

A type is `Metric` if is has a function `distance` which has the following properties:

• ``distance` x y >= 0`
• `distance x y == 0` if and only if `x == y`
• ``distance` x y == `distance` y x`
• ``distance` x z <= `distance` x y + `distance` y z`

All types of elements to `BKTree` must implement `Metric`.

This definition of a metric deviates from the mathematical one in that it returns an integer instead of a real number. The reason for choosing integers is that I wanted to avoid the rather unpredictable rounding of floating point numbers.

Methods

distance :: a -> a -> IntSource

Instances

 Metric Char Metric Int Metric Integer Eq a => Metric [a]

null :: BKTree a -> BoolSource

Test if the tree is empty.

size :: BKTree a -> IntSource

Size of the tree.

The empty tree.

fromList :: Metric a => [a] -> BKTree aSource

Constructs a tree from a list

singleton :: a -> BKTree aSource

The tree with a single element

insert :: Metric a => a -> BKTree a -> BKTree aSource

Inserts an element into the tree. If an element is inserted several times it will be stored several times.

member :: Metric a => a -> BKTree a -> BoolSource

Checks whether an element is in the tree.

memberDistance :: Metric a => Int -> a -> BKTree a -> BoolSource

Approximate searching. `memberDistance n a tree` will return true if there is an element in `tree` which has a `distance` less than or equal to `n` from `a`.

delete :: Metric a => a -> BKTree a -> BKTree aSource

Removes an element from the tree. If an element occurs several times in the tree then only one occurrence will be deleted.

union :: Metric a => BKTree a -> BKTree a -> BKTree aSource

Merges two trees

unions :: Metric a => [BKTree a] -> BKTree aSource

Merges several trees

elems :: BKTree a -> [a]Source

Returns all the elements of the tree

elemsDistance :: Metric a => Int -> a -> BKTree a -> [a]Source

`elemsDistance n a tree` returns all the elements in `tree` which are at a `distance` less than or equal to `n` from the element `a`.

closest :: Metric a => a -> BKTree a -> Maybe (a, Int)Source

`closest a tree` returns the element in `tree` which is closest to `a` together with the distance. Returns `Nothing` if the tree is empty.