Safe Haskell | None |
---|---|
Language | Haskell2010 |
Documentation
:: (RealFrac p, Floating d, Ord d, Eq a) | |
=> (a -> a -> d) | distance function |
-> p | proportion of remaining dataset to sample at each level, \(0 < p <= 1 \) |
-> Vector a | dataset used for constructing the index |
-> VPTree d a |
Build a VPTree
The supplied distance function d
must satisfy the definition of a metric, i.e.
- identity of indiscernible elements : \( d(x, y) = 0 \leftrightarrow x \equiv y \)
- symmetry : \( d(x, y) = d(y, x) \)
- triangle inequality : \( d(x, y) + d(y, z) >= d(x, z) \)
The current implementation makes multiple passes over the whole dataset, which is why the entire indexing dataset must be present in memory (packed as a Vector
).
Implementation detail : construction of a VP-tree requires a randomized algorithm, but we run that in the ST monad so the result is pure.