Copyright | (c) Alp Mestanogullari, Ville Tirronen, 2011-2015 |
---|---|
License | BSD3 |
Maintainer | Alp Mestanogullari <alpmestan@gmail.com> |
Stability | experimental |
Safe Haskell | None |
Language | Haskell98 |
Math.KMeans
Description
An implementation of the k-means clustering algorithm based on the vector package.
The core functions of this module are kmeans
and kmeansWith
. See some examples
on github.
- kmeans :: (a -> Vector Double) -> Distance -> Int -> [a] -> Clusters a
- kmeansWith :: Monad m => (Int -> [a] -> m (Clusters a)) -> (a -> Vector Double) -> Distance -> Int -> [a] -> m (Clusters a)
- type Distance = Vector Double -> Vector Double -> Double
- type Clusters a = Vector (Cluster a)
- newtype Cluster a = Cluster {
- elements :: [a]
- type Centroids = Vector (Vector Double)
- partition :: Int -> [a] -> Clusters a
- euclidSq :: Distance
- l1dist :: Distance
- linfdist :: Distance
The meat of this package: kmeans
Arguments
:: (a -> Vector Double) | feature extraction |
-> Distance | distance function |
-> Int | the |
-> [a] | input list of |
-> Clusters a | result, hopefully |
Run the kmeans clustering algorithm.
kmeans f distance k points
will run the algorithm using f
to extract features from your type,
using distance
to measure the distance between vectors,
trying to separate points
in k
clusters.
Extracting features just means getting a Vector
with Double
coordinates that will represent your type
in the space in which kmeans
will run.
Arguments
:: Monad m | |
=> (Int -> [a] -> m (Clusters a)) | how should we partition the points? |
-> (a -> Vector Double) | get the coordinates of a "point" |
-> Distance | what distance do we use |
-> Int | number of desired clusters |
-> [a] | list of points |
-> m (Clusters a) | resulting clustering |
Same as kmeans
, except that instead of using partition
, you supply your own
function for choosing the initial clustering. Two important things to note:
- If you don't need any kind of effect and just have a
partition
-like function you want to use,m
will can just beIdentity
here. If that's too obnoxious to work with, please let me know and I may just provide a separatekmeansWith
function with no there. But most of the time, you'll probably just be interested in the following scenario. - Most likely, you want to have something smarter than our simple
partition
function. A couple of papers I have read claim very decent results by using some precise probabilistic schemas for the initial partitionning. In this case, yourm
would probably beIO
orST
(e.g using my probable package) and you could fine-tune the way the initial clusters are picked so that the algorithm may give better results. Of course, if your initialization is monadic, so is the result.
Types
A Cluster
of points is just a list of points
Misc.
partition :: Int -> [a] -> Clusters a Source
This is the current partitionning strategy used
by kmeans
. If we want k
clusters, we just
try to regroup consecutive elements in k
buckets
The euclidean distance without taking the final square root This would waste cycles without changing the behavior of the algorithm