Safe Haskell | Safe-Inferred |
---|---|

Language | Haskell2010 |

- type PointAsListFn a p = p -> [a]
- type SquaredDistanceFn a p = p -> p -> a
- data KdTree a p
- empty :: Real a => PointAsListFn a p -> KdTree a p
- emptyWithDist :: Real a => PointAsListFn a p -> SquaredDistanceFn a p -> KdTree a p
- singleton :: Real a => PointAsListFn a p -> p -> KdTree a p
- singletonWithDist :: Real a => PointAsListFn a p -> SquaredDistanceFn a p -> p -> KdTree a p
- build :: Real a => PointAsListFn a p -> [p] -> KdTree a p
- buildWithDist :: Real a => PointAsListFn a p -> SquaredDistanceFn a p -> [p] -> KdTree a p
- insertUnbalanced :: Real a => KdTree a p -> p -> KdTree a p
- batchInsertUnbalanced :: Real a => KdTree a p -> [p] -> KdTree a p
- nearest :: Real a => KdTree a p -> p -> p
- inRadius :: Real a => KdTree a p -> a -> p -> [p]
- kNearest :: Real a => KdTree a p -> Int -> p -> [p]
- inRange :: Real a => KdTree a p -> p -> p -> [p]
- toList :: KdTree a p -> [p]
- null :: KdTree a p -> Bool
- size :: KdTree a p -> Int
- defaultSqrDist :: Num a => PointAsListFn a p -> SquaredDistanceFn a p

# Introduction

Let's say you have a large set of 3D points called *data points*,
and you'd like to be able to quickly perform *point queries* on the
data points. One example of a point query is the *nearest neighbor*
query: given a set of data points `points`

and a query point `p`

,
which point in `points`

is closest to `p`

?

We can efficiently solve the nearest neighbor query (along with
many other types of point queries) if we appropriately organize the
data points. One such method of organization is called the *k*-d
tree algorithm, which is implemented in this module.

# Usage

Let's say you have a list of 3D data points, and each point is of
type `Point3d`

:

data Point3d = Point3d { x :: Double , y :: Double , z :: Double } deriving Show

We call a point's individual values *axis values* (i.e., `x`

, `y`

,
and `z`

in the case of `Point3d`

).

In order to generate a *k*-d tree of `Point3d`

's, we need to define
a `PointAsListFn`

that expresses the point's axis values as a list:

point3dAsList :: Point3d -> [Double] point3dAsList (Point3d x y z) = [x, y, z]

Now we can build a `KdTree`

structure from a list of data points
and perform a nearest neighbor query as follows:

>>> let dataPoints = [(Point3d 0.0 0.0 0.0), (Point3d 1.0 1.0 1.0)] >>> let kdt =`build`

point3dAsList dataPoints >>> let queryPoint = Point3d 0.1 0.1 0.1 >>>`nearest`

kdt queryPoint Point3d {x = 0.0, y = 0.0, z = 0.0}

# Variants

## Dynamic *k*-d trees

The `KdTree`

structure is meant for static sets of data points. If
you need to insert points into an existing *k*-d tree, check out
`Data.KdTree.Dynamic.`

`KdTree`

.

*k*-d maps

If you need to associate additional data with each point in the
tree (i.e., points are *keys* associated with *values*), check out
`Data.KdMap.Static.`

`KdMap`

and
`Data.KdMap.Dynamic.`

`KdMap`

for static and dynamic
variants of this functionality. Please *do not* try to fake this
functionality with a `KdTree`

by augmenting your point type with
the extra data; you're gonna have a bad time.

# Advanced

## Custom distance functions

You may have noticed in the previous use case that we never
specified what "nearest" means for our points. By default,
`build`

uses a Euclidean distance function that is sufficient
in most cases. However, point queries are typically faster on a
`KdTree`

built with a user-specified custom distance
function. Let's generate a `KdTree`

using a custom distance
function.

One idiosyncrasy about `KdTree`

is that custom distance functions
are actually specified as *squared distance* functions
(`SquaredDistanceFn`

). This means that your custom distance
function must return the *square* of the actual distance between
two points. This is for efficiency: regular distance functions
often require expensive square root computations, whereas in our
case, the squared distance works fine and doesn't require computing
any square roots. Here's an example of a squared distance function
for `Point3d`

:

point3dSquaredDistance :: Point3d -> Point3d -> Double point3dSquaredDistance (Point3d x1 y1 z1) (Point3d x2 y2 z2) = let dx = x1 - x2 dy = y1 - y2 dz = z1 - z2 in dx * dx + dy * dy + dz * dz

We can build a `KdTree`

using our custom distance function as follows:

`>>> let kdt = ``buildWithDist`

point3dAsList point3dSquaredDistance points

## Axis value types

In the above examples, we used a point type with axis values of
type `Double`

. We can in fact use axis values of any type that is
an instance of the `Real`

typeclass. This means you can use points
that are composed of `Double`

s, `Int`

s, `Float`

s, and so on:

```
data Point2i = Point2i Int Int
point2iAsList :: Point2i -> [Int]
point2iAsList (Point2i x y) = [x, y]
kdt :: [Point2i] -> KdTree Int Point2i
kdt dataPoints =
````build`

point2iAsList dataPoints

# Reference

## Types

type PointAsListFn a p = p -> [a] Source

Converts a point of type `p`

with axis values of type
`a`

into a list of axis values [a].

type SquaredDistanceFn a p = p -> p -> a Source

Returns the squared distance between two points of type
`p`

with axis values of type `a`

.

A *k*-d tree structure that stores points of type `p`

with axis
values of type `a`

.

*k*-d tree construction

emptyWithDist :: Real a => PointAsListFn a p -> SquaredDistanceFn a p -> KdTree a p Source

Builds an empty `KdTree`

using a user-specified squared distance
function.

singleton :: Real a => PointAsListFn a p -> p -> KdTree a p Source

Builds a `KdTree`

with a single point.

singletonWithDist :: Real a => PointAsListFn a p -> SquaredDistanceFn a p -> p -> KdTree a p Source

Builds a `KdTree`

with a single point using a user-specified
squared distance function.

:: Real a | |

=> PointAsListFn a p | |

-> [p] | non-empty list of data points to be stored in the |

-> KdTree a p |

Builds a `KdTree`

from a list of data points using a default
squared distance function `defaultSqrDist`

.

Average complexity: *O(n * log(n))* for *n* data points.

Worst case time complexity: *O(n^2)* for *n* data points.

Worst case space complexity: *O(n)* for *n* data points.

buildWithDist :: Real a => PointAsListFn a p -> SquaredDistanceFn a p -> [p] -> KdTree a p Source

Builds a `KdTree`

from a list of data points using a
user-specified squared distance function.

Average time complexity: *O(n * log(n))* for *n* data points.

Worst case time complexity: *O(n^2)* for *n* data points.

Worst case space complexity: *O(n)* for *n* data points.

insertUnbalanced :: Real a => KdTree a p -> p -> KdTree a p Source

Inserts a point into a `KdTree`

. This can potentially
cause the internal tree structure to become unbalanced. If the tree
becomes too unbalanced, point queries will be very inefficient. If
you need to perform lots of point insertions on an already existing
*k*-d tree, check out
`Data.KdTree.Dynamic.`

`KdTree`

.

Average complexity: *O(log(n))* for *n* data points.

Worse case time complexity: *O(n)* for *n* data points.

batchInsertUnbalanced :: Real a => KdTree a p -> [p] -> KdTree a p Source

Inserts a list of points into a `KdTree`

. This can potentially
cause the internal tree structure to become unbalanced, which leads
to inefficient point queries.

Average complexity: *O(n * log(n))* for *n* data points.

Worst case time complexity: *O(n^2)* for *n* data points.

## Query

kNearest :: Real a => KdTree a p -> Int -> p -> [p] Source

Given a `KdTree`

, a query point, and a number `k`

, returns the
`k`

nearest points in the `KdTree`

to the query point.

Neighbors are returned in order of increasing distance from query point.

Average time complexity: *log(k) * log(n)* for *k* nearest
neighbors on a structure with *n* data points.

Worst case time complexity: *n * log(k)* for *k* nearest
neighbors on a structure with *n* data points.

:: Real a | |

=> KdTree a p | |

-> p | lower bounds of range |

-> p | upper bounds of range |

-> [p] | all points within given range |

Finds all points in a `KdTree`

with points within a given range,
where the range is specified as a set of lower and upper bounds.

Points are not returned in any particular order.

Worst case time complexity: *O(n)* for n data points and a range
that spans all the points.

toList :: KdTree a p -> [p] Source

Returns a list of all the points in the `KdTree`

.

Time complexity: *O(n)* for *n* data points.

## Utilities

defaultSqrDist :: Num a => PointAsListFn a p -> SquaredDistanceFn a p Source

A default implementation of squared distance given two points and
a `PointAsListFn`

.