Safe Haskell	None
Language	Haskell2010

AI.Clustering.KMeans

Contents

Initialization methods
References

Synopsis

data KMeans a = KMeans {
- membership :: Vector Int
- centers :: Matrix Double
- clusters :: Maybe [[a]]
- sse :: Double
}
data KMeansOpts = KMeansOpts {
- kmeansMethod :: Method
- kmeansSeed :: Vector Word32
- kmeansClusters :: Bool
- kmeansMaxIter :: Int
}
defaultKMeansOpts :: KMeansOpts
kmeans :: Int -> Matrix Double -> KMeansOpts -> KMeans (Vector Double)
kmeansBy :: Vector v a => Int -> v a -> (a -> Vector Double) -> KMeansOpts -> KMeans a
data Method
- = Forgy
- | KMeansPP
- | Centers (Matrix Double)
decode :: Vector Int -> [a] -> [[a]]

Documentation

data KMeans a

Results from running kmeans

Constructors

KMeans
Fields
membership :: Vector Int	A vector of integers (0 ~ k-1) indicating the cluster to which each point is allocated.
centers :: Matrix Double	A matrix of cluster centers.
clusters :: Maybe [[a]]
sse :: Double	The sum of squared error (SSE).
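The relationship between membership, centers and sse can be illustrated with a small sketch. It assumes that SSE means the sum of squared Euclidean distances from each point to the center of its assigned cluster, which the documentation above does not spell out. The names sqDist and sseSketch are hypothetical and not part of this module; plain lists stand in for the Vector and Matrix types.

```haskell
-- Hypothetical helpers for illustration; not part of AI.Clustering.KMeans.
-- Assumption: SSE is the sum of squared Euclidean distances between each
-- point and the center of the cluster it is assigned to.

-- | Squared Euclidean distance between two points given as lists.
sqDist :: [Double] -> [Double] -> Double
sqDist xs ys = sum [ (x - y) ^ (2 :: Int) | (x, y) <- zip xs ys ]

-- | SSE from a list of centers, a membership list (cluster index per
-- point, 0-based) and the points themselves.
sseSketch :: [[Double]] -> [Int] -> [[Double]] -> Double
sseSketch cs member points =
    sum [ sqDist p (cs !! i) | (i, p) <- zip member points ]

main :: IO ()
main = do
    let points = [[0, 0], [0, 2], [10, 0], [10, 2]]
        cs     = [[0, 1], [10, 1]]
        member = [0, 0, 1, 1]
    -- Every point lies at squared distance 1 from its center.
    print (sseSketch cs member points)  -- prints 4.0
```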

Instances

Show a => Show (KMeans a)
Instance details
Defined in AI.Clustering.KMeans.Types
Methods
showsPrec :: Int -> KMeans a -> ShowS
show :: KMeans a -> String
showList :: [KMeans a] -> ShowS

data KMeansOpts

Constructors

KMeansOpts
Fields
kmeansMethod :: Method
kmeansSeed :: Vector Word32	Seed for random number generation
kmeansClusters :: Bool	Whether to return clusters; may use a lot of memory
kmeansMaxIter :: Int	Maximum number of iterations

defaultKMeansOpts :: KMeansOpts

Default options.

> defaultKMeansOpts = KMeansOpts
>     { kmeansMethod = KMeansPP
>     , kmeansSeed = U.fromList [1,2,3,4,5,6,7]
>     , kmeansClusters = True
>     , kmeansMaxIter = 10
>     }

kmeans

Arguments

:: Int	The number of clusters
-> Matrix Double	Input data stored as rows in a matrix
-> KMeansOpts
-> KMeans (Vector Double)

Perform K-means clustering
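A minimal usage sketch follows. It assumes that the Matrix type comes from the matrices package and can be built with fromLists from Data.Matrix.Unboxed; this page does not name the module, so treat that import as an assumption. The data values are made up, and the options are derived from defaultKMeansOpts with record update syntax.

```haskell
import AI.Clustering.KMeans
-- Assumption: Matrix is the unboxed matrix type from the `matrices` package.
import qualified Data.Matrix.Unboxed as MU

-- Start from the defaults and override only what differs.
opts :: KMeansOpts
opts = defaultKMeansOpts
    { kmeansMethod  = Forgy
    , kmeansMaxIter = 100
    }

-- Four made-up 2-D points, one per row, grouped into two clusters.
result = kmeans 2 dat opts
  where
    dat = MU.fromLists [[1.0, 1.0], [1.2, 0.8], [9.0, 9.0], [9.1, 8.8]]

main :: IO ()
main = do
    print (membership result)  -- cluster index of each row
    print (sse result)         -- sum of squared error of the final clustering
```

Which index each group receives depends on the initialization and the seed, so no particular output is shown here.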

kmeansBy

Arguments

:: Vector v a
=> Int	The number of clusters
-> v a	Input data
-> (a -> Vector Double)	Feature extraction function
-> KMeansOpts
-> KMeans a

Perform K-means clustering, using a feature extraction function
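As a rough illustration of kmeansBy, the sketch below clusters records of a hypothetical City type by two numeric fields. It assumes that a boxed Data.Vector is an acceptable input container and that the Vector Double in the signature is the unboxed vector type; neither is stated on this page.

```haskell
import AI.Clustering.KMeans
import qualified Data.Vector as V          -- assumed input container
import qualified Data.Vector.Unboxed as U  -- assumed type of the feature vector

-- Hypothetical record type, for illustration only.
data City = City { name :: String, lat :: Double, lon :: Double }
    deriving Show

-- Feature extraction: map each record to the coordinates used for clustering.
features :: City -> U.Vector Double
features c = U.fromList [lat c, lon c]

cities :: V.Vector City
cities = V.fromList
    [ City "A" 0 0, City "B" 0.5 0.2, City "C" 40 40, City "D" 41 39.5 ]

result :: KMeans City
result = kmeansBy 2 cities features defaultKMeansOpts

main :: IO ()
main = do
    print (membership result)
    -- clusters is a Maybe; it is only populated when kmeansClusters is True.
    print (fmap (map (map name)) (clusters result))
```

The benefit over kmeans is that the clusters field holds the original records rather than bare feature vectors.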

Initialization methods

data Method

Different initialization methods

Constructors

Forgy	The Forgy method randomly chooses k unique observations from the data set and uses these as the initial means.
KMeansPP	K-means++ algorithm.
Centers (Matrix Double)	Provide a set of k centroids
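To give an idea of how k-means++ differs from Forgy, the sketch below computes the selection probabilities used by the seeding step described in the Arthur and Vassilvitskii paper: each point is chosen as the next center with probability proportional to its squared distance from the nearest center chosen so far. The helper names are hypothetical and this is not the library's implementation; it also leaves out the random draw itself.

```haskell
-- Hypothetical helpers for illustration; not part of AI.Clustering.KMeans.

-- | Squared Euclidean distance between two points given as lists.
sqDist :: [Double] -> [Double] -> Double
sqDist xs ys = sum [ (x - y) ^ (2 :: Int) | (x, y) <- zip xs ys ]

-- | Probability of each point being picked as the next center, given the
-- centers chosen so far. Assumes at least one center has been chosen and
-- that not every point coincides with a chosen center.
nextCenterProbs :: [[Double]] -> [[Double]] -> [Double]
nextCenterProbs chosen points = map (/ total) weights
  where
    weights = [ minimum [ sqDist p c | c <- chosen ] | p <- points ]
    total   = sum weights

main :: IO ()
main =
    -- Squared distances to the single chosen center are 0, 1 and 9.
    print (nextCenterProbs [[0, 0]] [[0, 0], [1, 0], [3, 0]])
    -- prints [0.0,0.1,0.9]
```

Forgy, by contrast, gives every observation the same chance of being picked.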

decode :: Vector Int -> [a] -> [[a]]

Assign data to clusters based on KMeans result
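The grouping that decode performs can be sketched with plain lists. This is a hypothetical re-implementation under the assumption that element i of the data goes to the cluster named by element i of the membership vector; the order of elements inside each cluster may differ in the library's own decode.

```haskell
-- Hypothetical re-implementation for illustration; not the library's code.
-- Takes a plain list of cluster indices instead of a Vector Int.
decodeSketch :: [Int] -> [a] -> [[a]]
decodeSketch member xs =
    [ [ x | (i, x) <- zip member xs, i == c ] | c <- [0 .. k - 1] ]
  where
    -- Number of clusters inferred from the largest index seen.
    k = if null member then 0 else maximum member + 1

main :: IO ()
main =
    -- 'a' and 'c' belong to cluster 0, 'b' to cluster 1, 'd' to cluster 2.
    print (decodeSketch [0, 1, 0, 2] "abcd")  -- prints ["ac","b","d"]
```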

References

Arthur, D. and Vassilvitskii, S. (2007). k-means++: the advantages of careful seeding. Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 1027–1035.