rp-tree-0.5: Random projection trees
Safe HaskellNone
LanguageHaskell2010

Data.RPTree

Description

Random projection trees for approximate nearest neighbor search in high-dimensional vector spaces.

Introduction

Similarity search is a common problem in many fields (imaging, natural language processing, ..), and is often one building block of a larger data processing system.

There are many ways to embed data in a vector space such that similarity search can be recast as a geometrical nearest neighbor lookup.

In turn, the efficiency and effectiveness of querying such a vector database strongly depends on how internally the data index is represented, graphs and trees being two common approaches.

The naive, all-pairs exact search becomes impractical even at moderate data sizes, which motivated research into approximate indexing methods.

Overview

This library provides a tree-based approach to approximate nearest neighbor search. The database is recursively partitioned according to a series of random projections, and this partitioning is logically arranged as a tree which allows for rapid lookup.

Internally, a single random projection vector is sampled per tree level, as proposed in [1]. The projection vectors in turn can be sparse with a tunable sparsity parameter, which can help compressing the database at a small accuracy cost.

Retrieval accuracy can be improved by populating multiple trees (i.e. a random forest), and intersecting the results of the same query against each of them.

Quick Start

1) Build an index with forest

2) Lookup the \(k\) nearest neighbors to a query point with knn

3) The database can be serialised and restored with serialiseRPForest and deserialiseRPForest, respectively.

References

1) Hyvonen, V., et al., Fast Nearest Neighbor Search through Sparse Random Projections and Voting, https://www.cs.helsinki.fi/u/ttonteri/pub/bigdata2016.pdf

Synopsis

Construction

tree Source #

Arguments

:: (Monad m, Inner SVector v) 
=> Word64

random seed

-> Int

max tree depth

-> Int

min leaf size

-> Int

data chunk size

-> Double

nonzero density of projection vectors

-> Int

dimension of projection vectors

-> ConduitT () (Embed v Double x) m ()

data source

-> m (RPTree Double () (Vector (Embed v Double x))) 

Populate a tree from a data stream

Assumptions on the data source:

  • non-empty : contains at least one value
  • stationary : each chunk is representative of the whole dataset
  • bounded : we wait until the end of the stream to produce a result

forest Source #

Arguments

:: (Monad m, Inner SVector v) 
=> Word64

random seed

-> Int

max tree depth, \(l > 1\)

-> Int

min leaf size, \(m_{leaf} > 1\)

-> Int

number of trees, \(n_t > 1\)

-> Int

data chunk size, \(n_{chunk} > 3\)

-> Double

nonzero density of projection vectors, \(p_{nz} \in (0, 1)\)

-> Int

dimension of projection vectors, \(d > 1\)

-> ConduitT () (Embed v Double x) m ()

data source

-> m (RPForest Double (Vector (Embed v Double x))) 

Populate a forest from a data stream

Assumptions on the data source:

  • non-empty : contains at least one value
  • stationary : each chunk is representative of the whole dataset
  • bounded : we wait until the end of the stream to produce a result

Parameters

rpTreeCfg Source #

Arguments

:: Int

min leaf size

-> Int

number of points in the dataset

-> Int

vector dimension of the data points

-> RPTreeConfig 

Configure the rp-tree tree construction process with some natural defaults

data RPTreeConfig Source #

Constructors

RPCfg 

Fields

Instances

Instances details
Show RPTreeConfig Source # 
Instance details

Defined in Data.RPTree.Conduit

Query

knn Source #

Arguments

:: (Ord p, Inner SVector v, Unbox d, Real d) 
=> (u d -> v d -> p)

distance function

-> Int

k neighbors

-> RPForest d (Vector (Embed u d x))

random projection forest

-> v d

query point

-> Vector (p, Embed u d x)

ordered in increasing distance order to the query point

Look up the \(k\) nearest neighbors to a query point

The supplied distance function d must satisfy the definition of a metric, i.e.

  • identity of indiscernible elements : \( d(x, y) = 0 \leftrightarrow x \equiv y \)
  • symmetry : \( d(x, y) = d(y, x) \)
  • triangle inequality : \( d(x, y) + d(y, z) \geq d(x, z) \)

I/O

serialiseRPForest Source #

Arguments

:: (Serialise d, Serialise a, Unbox d) 
=> RPForest d a 
-> [ByteString]

the order is undefined

Serialise each tree in the RPForest as a separate bytestring

deserialiseRPForest Source #

Arguments

:: (Serialise d, Serialise a, Unbox d) 
=> [ByteString] 
-> Either String (RPForest d a)

the order is undefined

Deserialise each tree in the RPForest from a separate bytestring and reconstruct

Statistics

recallWith Source #

Arguments

:: (Inner SVector v, Unbox d, Fractional b, Ord d, Ord a, Ord x, Ord (u d), Num d) 
=> (u d -> v d -> a)

distance function

-> RPForest d (Vector (Embed u d x)) 
-> Int

k : number of nearest neighbors to consider

-> v d

query point

-> b 

Average recall-at-k, computed over a set of trees

The supplied distance function d must satisfy the definition of a metric, i.e.

  • identity of indiscernible elements : \( d(x, y) = 0 \leftrightarrow x \equiv y \)
  • symmetry : \( d(x, y) = d(y, x) \)
  • triangle inequality : \( d(x, y) + d(y, z) \geq d(x, z) \)

Access

leaves :: RPTree d l a -> [a] Source #

All data buckets stored at the leaves of the tree

levels :: RPTree d l a -> Int Source #

Number of tree levels

points :: Monoid m => RPTree d l m -> m Source #

Set of data points used to construct the index

candidates Source #

Arguments

:: (Inner SVector v, Unbox d, Ord d, Num d, Semigroup xs) 
=> RPTree d l xs 
-> v d

query point

-> xs 

Retrieve points nearest to the query

in case of a narrow margin, collect both branches of the tree

Validation

treeSize :: Foldable t => RPTree d l (t a) -> Int Source #

How many data items are stored in the RPTree

leafSizes :: Foldable t => RPTree d l (t a) -> RPT d l Int Source #

How many data items are stored in each leaf of the RPTree

data RPTreeStats Source #

Instances

Instances details
Eq RPTreeStats Source # 
Instance details

Defined in Data.RPTree

Show RPTreeStats Source # 
Instance details

Defined in Data.RPTree

Types

data Embed v e a Source #

Pairing of a data item with its vector embedding

The vector is used internally for indexing

Constructors

Embed 

Fields

Instances

Instances details
Functor (Embed v e) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

fmap :: (a -> b) -> Embed v e a -> Embed v e b #

(<$) :: a -> Embed v e b -> Embed v e a #

(Eq a, Eq (v e)) => Eq (Embed v e a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

(==) :: Embed v e a -> Embed v e a -> Bool #

(/=) :: Embed v e a -> Embed v e a -> Bool #

(Ord a, Ord (v e)) => Ord (Embed v e a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

compare :: Embed v e a -> Embed v e a -> Ordering #

(<) :: Embed v e a -> Embed v e a -> Bool #

(<=) :: Embed v e a -> Embed v e a -> Bool #

(>) :: Embed v e a -> Embed v e a -> Bool #

(>=) :: Embed v e a -> Embed v e a -> Bool #

max :: Embed v e a -> Embed v e a -> Embed v e a #

min :: Embed v e a -> Embed v e a -> Embed v e a #

(Show (v e), Show e, Show a) => Show (Embed v e a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

showsPrec :: Int -> Embed v e a -> ShowS #

show :: Embed v e a -> String #

showList :: [Embed v e a] -> ShowS #

Generic (Embed v e a) Source # 
Instance details

Defined in Data.RPTree.Internal

Associated Types

type Rep (Embed v e a) :: Type -> Type #

Methods

from :: Embed v e a -> Rep (Embed v e a) x #

to :: Rep (Embed v e a) x -> Embed v e a #

(NFData (v e), NFData a) => NFData (Embed v e a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

rnf :: Embed v e a -> () #

(Serialise (v e), Serialise a) => Serialise (Embed v e a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

encode :: Embed v e a -> Encoding #

decode :: Decoder s (Embed v e a) #

encodeList :: [Embed v e a] -> Encoding #

decodeList :: Decoder s [Embed v e a] #

type Rep (Embed v e a) Source # 
Instance details

Defined in Data.RPTree.Internal

type Rep (Embed v e a) = D1 ('MetaData "Embed" "Data.RPTree.Internal" "rp-tree-0.5-K2Y2GUP9awAdaXVV0xDQV" 'False) (C1 ('MetaCons "Embed" 'PrefixI 'True) (S1 ('MetaSel ('Just "eEmbed") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 (v e)) :*: S1 ('MetaSel ('Just "eData") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 a)))

RPTree

data RPTree d l a Source #

Random projection trees

The first type parameter corresponds to a floating point scalar value, the second labels every tree part (e.g. for visual rendering) and the third is the type of the data collected at the leaves of the tree (e.g. lists of vectors).

We keep them separate to leverage the (Bi-)Functor instance for postprocessing and visualization.

This implementation uses one projection vector per tree level (as suggested in https://www.cs.helsinki.fi/u/ttonteri/pub/bigdata2016.pdf ).

Instances

Instances details
Functor (RPTree d l) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

fmap :: (a -> b) -> RPTree d l a -> RPTree d l b #

(<$) :: a -> RPTree d l b -> RPTree d l a #

Foldable (RPTree d l) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

fold :: Monoid m => RPTree d l m -> m #

foldMap :: Monoid m => (a -> m) -> RPTree d l a -> m #

foldMap' :: Monoid m => (a -> m) -> RPTree d l a -> m #

foldr :: (a -> b -> b) -> b -> RPTree d l a -> b #

foldr' :: (a -> b -> b) -> b -> RPTree d l a -> b #

foldl :: (b -> a -> b) -> b -> RPTree d l a -> b #

foldl' :: (b -> a -> b) -> b -> RPTree d l a -> b #

foldr1 :: (a -> a -> a) -> RPTree d l a -> a #

foldl1 :: (a -> a -> a) -> RPTree d l a -> a #

toList :: RPTree d l a -> [a] #

null :: RPTree d l a -> Bool #

length :: RPTree d l a -> Int #

elem :: Eq a => a -> RPTree d l a -> Bool #

maximum :: Ord a => RPTree d l a -> a #

minimum :: Ord a => RPTree d l a -> a #

sum :: Num a => RPTree d l a -> a #

product :: Num a => RPTree d l a -> a #

Traversable (RPTree d l) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

traverse :: Applicative f => (a -> f b) -> RPTree d l a -> f (RPTree d l b) #

sequenceA :: Applicative f => RPTree d l (f a) -> f (RPTree d l a) #

mapM :: Monad m => (a -> m b) -> RPTree d l a -> m (RPTree d l b) #

sequence :: Monad m => RPTree d l (m a) -> m (RPTree d l a) #

(Unbox d, Eq d, Eq l, Eq a) => Eq (RPTree d l a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

(==) :: RPTree d l a -> RPTree d l a -> Bool #

(/=) :: RPTree d l a -> RPTree d l a -> Bool #

(Unbox d, Show d, Show l, Show a) => Show (RPTree d l a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

showsPrec :: Int -> RPTree d l a -> ShowS #

show :: RPTree d l a -> String #

showList :: [RPTree d l a] -> ShowS #

Generic (RPTree d l a) Source # 
Instance details

Defined in Data.RPTree.Internal

Associated Types

type Rep (RPTree d l a) :: Type -> Type #

Methods

from :: RPTree d l a -> Rep (RPTree d l a) x #

to :: Rep (RPTree d l a) x -> RPTree d l a #

(NFData a, NFData l, NFData d) => NFData (RPTree d l a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

rnf :: RPTree d l a -> () #

(Serialise d, Serialise l, Serialise a, Unbox d) => Serialise (RPTree d l a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

encode :: RPTree d l a -> Encoding #

decode :: Decoder s (RPTree d l a) #

encodeList :: [RPTree d l a] -> Encoding #

decodeList :: Decoder s [RPTree d l a] #

type Rep (RPTree d l a) Source # 
Instance details

Defined in Data.RPTree.Internal

type Rep (RPTree d l a)

type RPForest d a = IntMap (RPTree d () a) Source #

A random projection forest is an ordered set of RPTrees

This supports efficient updates of the ensemble in the streaming/online setting.

Vector types

Sparse

data SVector a Source #

Sparse vectors with unboxed components

Instances

Instances details
Scale SVector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

(.*) :: (Unbox a, Num a) => a -> SVector a -> SVector a Source #

Inner SVector Vector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

inner :: (Unbox a, Num a) => SVector a -> Vector a -> a Source #

metricL2 :: (Unbox a, Floating a) => SVector a -> Vector a -> a Source #

(^+^) :: (Unbox a, Num a) => SVector a -> Vector a -> Vector a Source #

(^-^) :: (Unbox a, Num a) => SVector a -> Vector a -> Vector a Source #

Inner SVector DVector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

inner :: (Unbox a, Num a) => SVector a -> DVector a -> a Source #

metricL2 :: (Unbox a, Floating a) => SVector a -> DVector a -> a Source #

(^+^) :: (Unbox a, Num a) => SVector a -> DVector a -> DVector a Source #

(^-^) :: (Unbox a, Num a) => SVector a -> DVector a -> DVector a Source #

Inner SVector SVector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

inner :: (Unbox a, Num a) => SVector a -> SVector a -> a Source #

metricL2 :: (Unbox a, Floating a) => SVector a -> SVector a -> a Source #

(^+^) :: (Unbox a, Num a) => SVector a -> SVector a -> SVector a Source #

(^-^) :: (Unbox a, Num a) => SVector a -> SVector a -> SVector a Source #

(Unbox a, Eq a) => Eq (SVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

(==) :: SVector a -> SVector a -> Bool #

(/=) :: SVector a -> SVector a -> Bool #

(Unbox a, Ord a) => Ord (SVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

compare :: SVector a -> SVector a -> Ordering #

(<) :: SVector a -> SVector a -> Bool #

(<=) :: SVector a -> SVector a -> Bool #

(>) :: SVector a -> SVector a -> Bool #

(>=) :: SVector a -> SVector a -> Bool #

max :: SVector a -> SVector a -> SVector a #

min :: SVector a -> SVector a -> SVector a #

(Unbox a, Show a) => Show (SVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

showsPrec :: Int -> SVector a -> ShowS #

show :: SVector a -> String #

showList :: [SVector a] -> ShowS #

Generic (SVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

Associated Types

type Rep (SVector a) :: Type -> Type #

Methods

from :: SVector a -> Rep (SVector a) x #

to :: Rep (SVector a) x -> SVector a #

NFData (SVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

rnf :: SVector a -> () #

(Unbox a, Serialise a) => Serialise (SVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

type Rep (SVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

type Rep (SVector a) = D1 ('MetaData "SVector" "Data.RPTree.Internal" "rp-tree-0.5-K2Y2GUP9awAdaXVV0xDQV" 'False) (C1 ('MetaCons "SV" 'PrefixI 'True) (S1 ('MetaSel ('Just "svDim") 'SourceUnpack 'SourceStrict 'DecidedStrict) (Rec0 Int) :*: S1 ('MetaSel ('Just "svVec") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 (Vector (Int, a)))))

fromListSv :: Unbox a => Int -> [(Int, a)] -> SVector a Source #

(Unsafe) Pack a SVector from its vector dimension and components

Note : the relevant invariants are not checked :

  • vector components are _assumed_ to be in increasing order
  • vector dimension is larger than any component index

fromVectorSv Source #

Arguments

:: Int

vector dimension

-> Vector (Int, a)

vector components (in increasing order)

-> SVector a 

(Unsafe) Pack a SVector from its vector dimension and components

Note : the relevant invariants are not checked :

  • vector components are _assumed_ to be in increasing order
  • vector dimension is larger than any component index

Dense

data DVector a Source #

Dense vectors with unboxed components

Instances

Instances details
Scale DVector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

(.*) :: (Unbox a, Num a) => a -> DVector a -> DVector a Source #

Inner DVector DVector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

inner :: (Unbox a, Num a) => DVector a -> DVector a -> a Source #

metricL2 :: (Unbox a, Floating a) => DVector a -> DVector a -> a Source #

(^+^) :: (Unbox a, Num a) => DVector a -> DVector a -> DVector a Source #

(^-^) :: (Unbox a, Num a) => DVector a -> DVector a -> DVector a Source #

Inner SVector DVector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

inner :: (Unbox a, Num a) => SVector a -> DVector a -> a Source #

metricL2 :: (Unbox a, Floating a) => SVector a -> DVector a -> a Source #

(^+^) :: (Unbox a, Num a) => SVector a -> DVector a -> DVector a Source #

(^-^) :: (Unbox a, Num a) => SVector a -> DVector a -> DVector a Source #

(Unbox a, Eq a) => Eq (DVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

(==) :: DVector a -> DVector a -> Bool #

(/=) :: DVector a -> DVector a -> Bool #

(Unbox a, Ord a) => Ord (DVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

compare :: DVector a -> DVector a -> Ordering #

(<) :: DVector a -> DVector a -> Bool #

(<=) :: DVector a -> DVector a -> Bool #

(>) :: DVector a -> DVector a -> Bool #

(>=) :: DVector a -> DVector a -> Bool #

max :: DVector a -> DVector a -> DVector a #

min :: DVector a -> DVector a -> DVector a #

(Unbox a, Show a) => Show (DVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

showsPrec :: Int -> DVector a -> ShowS #

show :: DVector a -> String #

showList :: [DVector a] -> ShowS #

Generic (DVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

Associated Types

type Rep (DVector a) :: Type -> Type #

Methods

from :: DVector a -> Rep (DVector a) x #

to :: Rep (DVector a) x -> DVector a #

NFData (DVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

rnf :: DVector a -> () #

(Unbox a, Serialise a) => Serialise (DVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

type Rep (DVector a) Source # 
Instance details

Defined in Data.RPTree.Internal

type Rep (DVector a) = D1 ('MetaData "DVector" "Data.RPTree.Internal" "rp-tree-0.5-K2Y2GUP9awAdaXVV0xDQV" 'True) (C1 ('MetaCons "DV" 'PrefixI 'True) (S1 ('MetaSel ('Just "dvVec") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 (Vector a))))

fromListDv :: Unbox a => [a] -> DVector a Source #

Vector space typeclasses

class (Scale u, Scale v) => Inner u v where Source #

Inner product spaces

This typeclass is provided as a convenience for library users to interface their own vector types.

Methods

inner :: (Unbox a, Num a) => u a -> v a -> a Source #

metricL2 :: (Unbox a, Floating a) => u a -> v a -> a Source #

(^+^) :: (Unbox a, Num a) => u a -> v a -> v a Source #

(^-^) :: (Unbox a, Num a) => u a -> v a -> v a Source #

Instances

Instances details
Inner DVector DVector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

inner :: (Unbox a, Num a) => DVector a -> DVector a -> a Source #

metricL2 :: (Unbox a, Floating a) => DVector a -> DVector a -> a Source #

(^+^) :: (Unbox a, Num a) => DVector a -> DVector a -> DVector a Source #

(^-^) :: (Unbox a, Num a) => DVector a -> DVector a -> DVector a Source #

Inner SVector Vector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

inner :: (Unbox a, Num a) => SVector a -> Vector a -> a Source #

metricL2 :: (Unbox a, Floating a) => SVector a -> Vector a -> a Source #

(^+^) :: (Unbox a, Num a) => SVector a -> Vector a -> Vector a Source #

(^-^) :: (Unbox a, Num a) => SVector a -> Vector a -> Vector a Source #

Inner SVector DVector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

inner :: (Unbox a, Num a) => SVector a -> DVector a -> a Source #

metricL2 :: (Unbox a, Floating a) => SVector a -> DVector a -> a Source #

(^+^) :: (Unbox a, Num a) => SVector a -> DVector a -> DVector a Source #

(^-^) :: (Unbox a, Num a) => SVector a -> DVector a -> DVector a Source #

Inner SVector SVector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

inner :: (Unbox a, Num a) => SVector a -> SVector a -> a Source #

metricL2 :: (Unbox a, Floating a) => SVector a -> SVector a -> a Source #

(^+^) :: (Unbox a, Num a) => SVector a -> SVector a -> SVector a Source #

(^-^) :: (Unbox a, Num a) => SVector a -> SVector a -> SVector a Source #

class Scale v where Source #

Scale a vector

Methods

(.*) :: (Unbox a, Num a) => a -> v a -> v a Source #

Instances

Instances details
Scale Vector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

(.*) :: (Unbox a, Num a) => a -> Vector a -> Vector a Source #

Scale DVector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

(.*) :: (Unbox a, Num a) => a -> DVector a -> DVector a Source #

Scale SVector Source # 
Instance details

Defined in Data.RPTree.Internal

Methods

(.*) :: (Unbox a, Num a) => a -> SVector a -> SVector a Source #

Helpers for implementing Inner instances

Inner product

innerSS :: (Vector u (Int, a), Vector v (Int, a), Unbox a, Num a) => u (Int, a) -> v (Int, a) -> a Source #

sparse-sparse inner product

innerSD :: (Num a, Vector u (Int, a), Vector v a, Unbox a) => u (Int, a) -> v a -> a Source #

sparse-dense inner product

innerDD :: (Vector v a, Num a) => v a -> v a -> a Source #

L2 distance

metricSSL2 :: (Floating a, Vector u a, Unbox a, Vector u (Int, a), Vector v (Int, a)) => u (Int, a) -> v (Int, a) -> a Source #

Vector distance induced by the L2 norm (sparse-sparse)

metricSDL2 :: (Floating a, Unbox a, Vector u (Int, a), Vector v a) => u (Int, a) -> v a -> a Source #

Vector distance induced by the L2 norm (sparse-dense)

Scale

scaleS :: (Vector v (a, b), Num b) => b -> v (a, b) -> v (a, b) Source #

scaleD :: (Vector v b, Num b) => b -> v b -> v b Source #

Rendering

CSV

writeCsv Source #

Arguments

:: (Foldable t, Unbox a, Show a, Show b) 
=> FilePath

path of output file

-> t (Vector (DVector a, b))

data point, label

-> IO () 

Encode dataset as CSV and save into file

GraphViz dot

writeDot Source #

Arguments

:: Ord t 
=> (t -> String)

how to render the node content

-> FilePath

path of output file

-> String

graph name

-> RPTree d x t 
-> IO () 

tree to graphviz dot format

Testing

liftC :: (Monad m, MonadTrans t) => ConduitT i o m r -> ConduitT i o (t m) r Source #

Random generation

Conduit

dataSource Source #

Arguments

:: Monad m 
=> Int

number of vectors to generate

-> GenT m a

random generator for the vector components

-> ConduitT i a (GenT m) () 

Source of random data points

datS Source #

Arguments

:: Monad m 
=> Int

number of data points

-> Int

vector dimension

-> Double

nonzero density

-> ConduitT i (SVector Double) (GenT m) () 

binary mixture of isotropic Gaussian rvs with sparse components

datD Source #

Arguments

:: Monad m 
=> Int

number of data points

-> Int

vector dimension

-> ConduitT i (DVector Double) (GenT m) () 

binary mixture of isotropic Gaussian rvs

Vector data

sparse Source #

Arguments

:: (Monad m, Unbox a) 
=> Double

nonzero density

-> Int

vector dimension

-> GenT m a

random generator of vector components

-> GenT m (SVector a) 

Generate a sparse random vector with a given nonzero density and components sampled from the supplied random generator

dense Source #

Arguments

:: (Monad m, Vector Vector a) 
=> Int

vector dimension

-> GenT m a

random generator of vector components

-> GenT m (DVector a) 

Generate a dense random vector with components sampled from the supplied random generator