swift-lda-0.4.0: Online sampler for Latent Dirichlet Allocation

Safe Haskell: Safe-Inferred




Latent Dirichlet Allocation

Imperative implementation of a collapsed Gibbs sampler for LDA. This library uses topic-modeling terminology (documents, words, topics), even though it is generic. For example, when used for word class induction, replace documents with word types, words with features, and topics with word classes.



pass :: Int -> LDA s -> Vector Doc -> ST s (Vector Doc)

pass batch runs one pass of Gibbs sampling on the documents in batch.

passOne :: Int -> LDA s -> Doc -> ST s Doc

Run one pass of Gibbs sampling on a single document.


data LDA s

Abstract type holding the settings and the state of the sampler

type Doc = (D, Vector (W, Maybe Z))

type D = Int

type W = Int

type Z = Int
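Given the aliases above, a Doc pairs a document id with a vector of words, each carrying an optional topic assignment (presumably D, W, and Z identify documents, words, and topics, respectively; that reading is inferred from context). A minimal sketch of constructing a not-yet-sampled document:

```haskell
import qualified Data.Vector as V

-- Hypothetical illustration: document 0 containing word ids 3, 1, 3,
-- each with no topic assignment yet (Nothing), matching
--   type Doc = (D, Vector (W, Maybe Z))
freshDoc :: (Int, V.Vector (Int, Maybe Int))
freshDoc = (0, V.fromList [(3, Nothing), (1, Nothing), (3, Nothing)])
```

After sampling, the `Maybe Z` slots would hold `Just` the assigned topic ids.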

Access model information

data Finalized




docTopics :: !Table2D

Document topic counts

wordTopics :: !Table2D

Word topic counts

topics :: !Table1D

Topic counts

topicDocs :: !Table2D

Inverse document-topic counts

topicWords :: !Table2D

Inverse word-topic counts

alphasum :: !Double

alpha * K, the Dirichlet hyperparameter controlling topic sparseness

beta :: !Double

beta Dirichlet parameter (word sparseness)

topicNum :: !Int

Number of topics K

wSize :: !Int

Number of unique words

exponent :: !(Maybe Double)

Learning rate exponent


Initialization and finalization

initial :: Vector Word32 -> Int -> Double -> Double -> Maybe Double -> ST s (LDA s)

initial s k a b e initializes the model with random seed s, k topics, alpha hyperparameter a/k, beta hyperparameter b, and learning-rate exponent e.

finalize :: LDA s -> ST s Finalized

Create a transparent, immutable object holding model information from the opaque internal representation.
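As a sketch of how these pieces fit together, one could initialize a sampler, run a few passes, and finalize it inside runST. The module name, seed value, and document contents below are hypothetical placeholders:

```haskell
import Control.Monad (foldM)
import Control.Monad.ST (runST)
import qualified Data.Vector as V
-- Module name assumed; check the package's actual exposed module.
import NLP.SwiftLDA

-- Hypothetical end-to-end sketch: two tiny documents, 10 topics,
-- three passes of Gibbs sampling, then a finalized snapshot.
trainSketch :: Finalized
trainSketch = runST $ do
  let seed = V.singleton 42          -- Vector Word32 random seed
      docs = V.fromList
        [ (0, V.fromList [(1, Nothing), (2, Nothing)])
        , (1, V.fromList [(2, Nothing), (3, Nothing)]) ]
  -- 10 topics, alpha = 0.1 / 10, beta = 0.01, no learning-rate exponent
  m <- initial seed 10 0.1 0.01 Nothing
  -- thread the (re-sampled) documents through three passes
  _ <- foldM (\ds i -> pass i m ds) docs [1 .. 3]
  finalize m
```

Because Finalized is immutable, the result can safely escape runST for later querying.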

Querying evolving model

Querying finalized model

docTopicWeights :: Finalized -> Doc -> Vector Double

docTopicWeights m doc returns the unnormalized topic probabilities for document doc given LDA model m.

wordTopicWeights :: Finalized -> D -> W -> Vector Double

wordTopicWeights m d w returns the unnormalized probabilities of topics for word w in document d given LDA model m.
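Since both query functions return unnormalized weights, a caller wanting a proper distribution has to normalize. A small sketch, assuming the returned Vector is the boxed Data.Vector type and that `model` is a Finalized value obtained from finalize:

```haskell
import qualified Data.Vector as V

-- Hypothetical helper: turn the unnormalized weights from
-- docTopicWeights into a probability distribution over topics.
docTopicDist :: Finalized -> Doc -> V.Vector Double
docTopicDist model doc =
  let ws = docTopicWeights model doc   -- unnormalized topic weights
      z  = V.sum ws                    -- normalizing constant
  in V.map (/ z) ws                    -- entries now sum to 1
```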