Safe Haskell | None |
---|
NLP.SwiftLDA
Contents
Description
Latent Dirichlet Allocation
Imperative implementation of a collapsed Gibbs sampler for LDA. This library uses the topic modeling terminology (documents, words, topics), even though it is generic. For example if used for word class induction, replace documents with word types, words with features and topics with word classes.
Usage example:
import qualified Data.Vector as V import qualified Data.Vector.Unboxed as U import Control.Monad.ST import NLP.SwiftLDA main = do -- Initialize model. let river = 1 money = 2 bank = 3 m <- stToIO $ initial (U.singleton 42) 2 10 0.1 let docs = [ (1, U.fromList [river, river, bank]) , (2, U.fromList [money, money, bank]) , (3, U.fromList [river, money, bank]) ] -- Run 10 iterations of sampling on this batch of documents. result <- stToIO $ run m 10 docs -- Display topic assignments. print result -- Run one iteration of sampling on another batch and display result print =<< (stToIO $ run m 1 [(4, U.fromList [bank, bank])]) -- Retrieve and display word-topic weights. fm <- stToIO $ finalize m print $ wordTopics fm
- initial :: Vector Word32 -> Int -> Double -> Double -> ST s (LDA s)
- finalize :: LDA s -> ST s Finalized
- run :: Traversable f => LDA s -> Int -> f Doc -> ST s (f (Vector Z))
- data LDA s
- type Doc = (D, Vector W)
- type D = Int
- type W = Int
- type Z = Int
- type Table2D = IntMap Table1D
- type Table1D = IntMap Double
- data Finalized = Finalized {}
- docTopicWeights_ :: LDA s -> Doc -> ST s (Vector Double)
- priorDocTopicWeights_ :: LDA s -> D -> ST s (Vector Double)
- docTopicWeights :: Finalized -> Doc -> Vector Double
- wordTopicWeights :: Finalized -> D -> W -> Vector Double
- docCounts :: Finalized -> Table1D
Initialization and finalization
initial :: Vector Word32 -> Int -> Double -> Double -> ST s (LDA s)Source
initial s k a b
initializes model with k
topics, a/k
alpha hyperparameter, b
beta hyperparameter and random seed s
.
finalize :: LDA s -> ST s FinalizedSource
Create transparent immutable object holding model information from opaque internal representation.
Running sampler
run :: Traversable f => LDA s -> Int -> f Doc -> ST s (f (Vector Z))Source
run m i batch
runs an outer loop of i
passes of Gibbs
sampling over documents in batch
using the model m
and returns
the topic assignments for words in the documents of the batch.
Datatypes
Access model information
Constructors
Finalized | |
Fields
|
Querying evolving model
Querying finalized model
docTopicWeights :: Finalized -> Doc -> Vector DoubleSource
docTopicWeights m doc
returns unnormalized topic probabilities
for document doc given LDA model m
.