| Copyright | (c) Alp Mestanogullari, Ville Tirronen, 2011-2015 |
|---|---|
| License | BSD3 |
| Maintainer | Alp Mestanogullari <alpmestan@gmail.com> |
| Stability | experimental |
| Safe Haskell | None |
| Language | Haskell98 |
Math.KMeans
Description
An implementation of the k-means clustering algorithm based on the vector package.
The core functions of this module are kmeans and kmeansWith. See some examples
on github.
- kmeans :: (a -> Vector Double) -> Distance -> Int -> [a] -> Clusters a
- kmeansWith :: Monad m => (Int -> [a] -> m (Clusters a)) -> (a -> Vector Double) -> Distance -> Int -> [a] -> m (Clusters a)
- type Distance = Vector Double -> Vector Double -> Double
- type Clusters a = Vector (Cluster a)
- newtype Cluster a = Cluster {
- elements :: [a]
- type Centroids = Vector (Vector Double)
- partition :: Int -> [a] -> Clusters a
- euclidSq :: Distance
- l1dist :: Distance
- linfdist :: Distance
The meat of this package: kmeans
Arguments
| :: (a -> Vector Double) | feature extraction |
| -> Distance | distance function |
| -> Int | the |
| -> [a] | input list of |
| -> Clusters a | result, hopefully |
Run the kmeans clustering algorithm.
kmeans f distance k points
will run the algorithm using f to extract features from your type,
using distance to measure the distance between vectors,
trying to separate points in k clusters.
Extracting features just means getting a Vector
with Double coordinates that will represent your type
in the space in which kmeans will run.
Arguments
| :: Monad m | |
| => (Int -> [a] -> m (Clusters a)) | how should we partition the points? |
| -> (a -> Vector Double) | get the coordinates of a "point" |
| -> Distance | what distance do we use |
| -> Int | number of desired clusters |
| -> [a] | list of points |
| -> m (Clusters a) | resulting clustering |
Same as kmeans, except that instead of using partition, you supply your own
function for choosing the initial clustering. Two important things to note:
- If you don't need any kind of effect and just have a
partition-like function you want to use,mwill can just beIdentityhere. If that's too obnoxious to work with, please let me know and I may just provide a separatekmeansWithfunction with no there. But most of the time, you'll probably just be interested in the following scenario. - Most likely, you want to have something smarter than our simple
partitionfunction. A couple of papers I have read claim very decent results by using some precise probabilistic schemas for the initial partitionning. In this case, yourmwould probably beIOorST(e.g using my probable package) and you could fine-tune the way the initial clusters are picked so that the algorithm may give better results. Of course, if your initialization is monadic, so is the result.
Types
A Cluster of points is just a list of points
Misc.
partition :: Int -> [a] -> Clusters a Source
This is the current partitionning strategy used
by kmeans. If we want k clusters, we just
try to regroup consecutive elements in k buckets
The euclidean distance without taking the final square root This would waste cycles without changing the behavior of the algorithm