Safe Haskell: None
Word Class induction with LDA
This module provides functions which implement word class induction using the generic algorithm implemented in Colada.LDA.
You can access and set options in the Options record using lenses.
Example:

    import Data.Label

    let options = set passes 5
                . set beta 0.01
                . set topicNum 100
                $ defaultOptions
    in learn options sentences
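The lens-based option setting in the example can be illustrated without fclabels. This is a minimal stand-in, not Colada's API: `Lens'`, `Opts`, `defaultOpts`, `passesL`, `betaL` and `topicNumL` are hypothetical names, the default values are made up, and the real lenses are polymorphic over an Arrow rather than plain getter/setter pairs. The point is only that each `set` returns an updated copy, so setters compose with `.` and apply to a default record with `$`.

```haskell
-- Hypothetical plain getter/setter pair standing in for an fclabels lens.
data Lens' s a = Lens' { getL :: s -> a, setL :: a -> s -> s }

-- Hypothetical subset of the Options record.
data Opts = Opts
  { optPasses   :: Int
  , optBeta     :: Double
  , optTopicNum :: Int
  } deriving (Show, Eq)

-- Made-up defaults for illustration only (not Colada's defaultOptions).
defaultOpts :: Opts
defaultOpts = Opts { optPasses = 10, optBeta = 0.1, optTopicNum = 10 }

passesL :: Lens' Opts Int
passesL = Lens' optPasses (\v o -> o { optPasses = v })

betaL :: Lens' Opts Double
betaL = Lens' optBeta (\v o -> o { optBeta = v })

topicNumL :: Lens' Opts Int
topicNumL = Lens' optTopicNum (\v o -> o { optTopicNum = v })

-- Setting through a lens returns a modified copy of the record.
set :: Lens' s a -> a -> s -> s
set = setL

-- Same shape as the documented example: setters compose right to left.
myOpts :: Opts
myOpts = set passesL 5 . set betaL 0.01 . set topicNumL 100 $ defaultOpts
```

Because records are immutable, `defaultOpts` itself is unchanged after building `myOpts`.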
- learn :: Options -> [Sentence] -> (WordClass, [Vector D])
- defaultOptions :: Options
- summary :: WordClass -> Text
- summarize :: Bool -> WordClass -> Text
- wordTypeClasses :: WordClass -> Map Text (IntMap Double)
- label :: Bool -> WordClass -> Sentence -> Vector (Vector Double)
- predict :: WordClass -> Sentence -> [Vector (Double, Text)]
- data WordClass
- ldaModel :: forall (~>). Arrow ~> => Lens ~> WordClass Finalized
- wordTypeTable :: forall (~>). Arrow ~> => Lens ~> WordClass (AtomTable (Vector Char))
- featureTable :: forall (~>). Arrow ~> => Lens ~> WordClass (AtomTable (Vector Char))
- options :: forall (~>). Arrow ~> => Lens ~> WordClass Options
- data Finalized
- docTopics :: Finalized -> Table2D
- wordTopics :: Finalized -> Table2D
- topics :: Finalized -> Table1D
- topicDocs :: Finalized -> Table2D
- topicWords :: Finalized -> Table2D
- data Options
- featIds :: forall (~>). Arrow ~> => Lens ~> Options [Int]
- topicNum :: forall (~>). Arrow ~> => Lens ~> Options Int
- alphasum :: forall (~>). Arrow ~> => Lens ~> Options Double
- beta :: forall (~>). Arrow ~> => Lens ~> Options Double
- passes :: forall (~>). Arrow ~> => Lens ~> Options Int
- repeats :: forall (~>). Arrow ~> => Lens ~> Options Int
- batchSize :: forall (~>). Arrow ~> => Lens ~> Options Int
- seed :: forall (~>). Arrow ~> => Lens ~> Options Word32
- topn :: forall (~>). Arrow ~> => Lens ~> Options Int
- initSize :: forall (~>). Arrow ~> => Lens ~> Options Int
- initPasses :: forall (~>). Arrow ~> => Lens ~> Options Int
- exponent :: forall (~>). Arrow ~> => Lens ~> Options (Maybe Double)
- progressive :: forall (~>). Arrow ~> => Lens ~> Options Bool
- lambda :: forall (~>). Arrow ~> => Lens ~> Options Double
Running the sampler
learn :: Options -> [Sentence] -> (WordClass, [Vector D])
learn options xs runs the LDA Gibbs sampler for word classes on sentences xs using options, and returns the resulting model together with the class assignments.
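For reference, the textbook collapsed Gibbs update for LDA resamples each token's topic from weights proportional to (n_{d,k} + α)(n_{k,w} + β) / (n_k + Vβ), with all counts excluding the token being resampled. The sketch below computes those weights and draws an index from them, assuming α = alphasum / K as suggested by the alphasum option's description. Whether Colada's sampler uses exactly this update, and which units play the roles of "documents" and "words" in the word-class setting, is not stated here; `gibbsWeights` and `sampleIndex` are hypothetical names.

```haskell
-- Textbook collapsed Gibbs weights for LDA (an illustration, not Colada's code).
-- All counts are assumed to already exclude the token being resampled.
gibbsWeights
  :: Double   -- alphasum, i.e. alpha * K
  -> Double   -- beta
  -> Int      -- vocabulary size V
  -> [Double] -- n_{d,k}: tokens of the current document assigned to topic k
  -> [Double] -- n_{k,w}: occurrences of the current word assigned to topic k
  -> [Double] -- n_k: all tokens assigned to topic k
  -> [Double] -- one unnormalized weight per topic
gibbsWeights alphasum beta v ndk nkw nk = zipWith3 weight ndk nkw nk
  where
    alpha = alphasum / fromIntegral (length ndk) -- per-topic alpha = alphasum / K
    vBeta = fromIntegral v * beta
    weight d w t = (d + alpha) * (w + beta) / (t + vBeta)

-- Pick a topic index from unnormalized weights, given u drawn uniformly from [0, 1).
sampleIndex :: [Double] -> Double -> Int
sampleIndex ws u = go 0 (u * sum ws) ws
  where
    go i _ [] = i            -- empty input: fall back to index 0
    go i _ [_] = i           -- last topic absorbs rounding error
    go i r (x : xs)
      | r < x = i
      | otherwise = go (i + 1) (r - x) xs
```

Passing u as an argument keeps the sketch pure; a real sampler would draw it from a generator seeded by something like the seed option.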
Extracting information
summary :: WordClass -> Text
summary m returns a textual summary of the word classes found in model m.
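One plausible way to build such a summary is to list the highest-count words of each class, cut off at a limit such as the topn option. This sketches the idea only and is not Colada's summary: the `ClassWords` representation (class id to word counts), `topWordsPerClass` and `renderSummary` are hypothetical, and String stands in for Text.

```haskell
import qualified Data.IntMap as IntMap
import qualified Data.Map as Map
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- Hypothetical representation: class id -> (word -> count).
type ClassWords = IntMap.IntMap (Map.Map String Double)

-- The n highest-count words of each class, most frequent first.
-- sortBy is stable, so equal counts keep alphabetical order.
topWordsPerClass :: Int -> ClassWords -> [(Int, [String])]
topWordsPerClass n table =
  [ (k, map fst (take n (sortBy (comparing (Down . snd)) (Map.toList ws))))
  | (k, ws) <- IntMap.toList table ]

-- One line per class: the class id followed by its top words.
renderSummary :: Int -> ClassWords -> String
renderSummary n table =
  unlines [ show k ++ ": " ++ unwords ws | (k, ws) <- topWordsPerClass n table ]
```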
wordTypeClasses :: WordClass -> Map Text (IntMap Double)
wordTypeClasses m returns a Map from word types to unnormalized distributions over word classes.
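Since the per-type distributions are unnormalized, they typically need rescaling before being read as probabilities, or reducing to a single best class. A minimal sketch, assuming String keys in place of Text to avoid a dependency on the text package; `normalize` and `bestClass` are hypothetical helpers, not part of the module.

```haskell
import qualified Data.IntMap as IntMap
import qualified Data.Map as Map

-- Rescale an unnormalized class distribution so it sums to one.
normalize :: IntMap.IntMap Double -> IntMap.IntMap Double
normalize m
  | total <= 0 = m              -- leave empty or all-zero rows untouched
  | otherwise  = IntMap.map (/ total) m
  where
    total = sum (IntMap.elems m)

-- Most likely class for each word type; types with no classes are dropped.
-- Ties resolve to the lower class id.
bestClass :: Map.Map String (IntMap.IntMap Double) -> Map.Map String Int
bestClass = Map.mapMaybe argmax
  where
    argmax m = case IntMap.toList m of
      [] -> Nothing
      kvs -> Just (fst (foldr1 pick kvs))
    pick a b = if snd b > snd a then b else a
```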
Class and word prediction
label :: Bool -> WordClass -> Sentence -> Vector (Vector Double)
label m s returns, for each word in sentence s, unnormalized probabilities of word classes.
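Because label returns unnormalized scores, a caller that wants probabilities or one hard class per word has to rescale each row or take its argmax. The sketch assumes plain lists (`[[Double]]`) in place of `Vector (Vector Double)`; `classProbs` and `hardLabels` are hypothetical helpers.

```haskell
-- Hypothetical helpers; lists stand in for Vector (Vector Double).
-- Each row holds one unnormalized score per word class for one word.

-- Rescale a row of scores so that it sums to one.
classProbs :: [Double] -> [Double]
classProbs row = map (/ total) row
  where
    total = sum row

-- Index of the highest-scoring class for each word.
-- Rows must be non-empty; ties resolve to the higher class index.
hardLabels :: [[Double]] -> [Int]
hardLabels = map argmaxIndex
  where
    argmaxIndex row = snd (maximum (zip row [0 ..]))
```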
predict :: WordClass -> Sentence -> [Vector (Double, Text)]
predict m s returns, for each word in sentence s, unnormalized probabilities of words given the predicted word class.
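The documentation does not spell out how class predictions are turned into word predictions. One common reading, sketched here purely as an assumption, mixes each class's word distribution weighted by the word's class scores, score(w) = Σ_k score(k) · p(w | k), and keeps the n best pairs in the same (score, word) order as predict's result. `predictWords` and its argument layout are hypothetical, and String stands in for Text.

```haskell
import qualified Data.Map as Map
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- Mix per-class word distributions, weighted by one token's class scores,
-- and return the n highest-scoring (score, word) pairs.
predictWords :: Int -> [Double] -> [Map.Map String Double] -> [(Double, String)]
predictWords n classScores classWords =
  take n
    . sortBy (comparing (Down . fst))
    . map (\(w, p) -> (p, w))
    . Map.toList
    $ Map.unionsWith (+) (zipWith scale classScores classWords)
  where
    -- Weight every word probability of one class by that class's score.
    scale s = Map.map (* s)
```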
Data types and associated lenses
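data WordClass
- ldaModel: LDA model
- wordTypeTable: Conversion tables between word type strings and atoms, in both directions
- featureTable: Conversion tables between feature strings and atoms, in both directions
- options: Options for Gibbs sampling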
data Finalized
wordTopics :: Finalized -> Table2D
Word-topic counts
topicWords :: Finalized -> Table2D
Inverse word-topic counts
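topicWords is described as the inverse of wordTopics, presumably the same counts indexed the other way round (topic first, then word). Assuming a nested IntMap stands in for Table2D, whose actual definition is not shown here, the inversion is a key swap; `Table2` and `invert` are hypothetical names.

```haskell
import qualified Data.IntMap as IntMap

-- Hypothetical stand-in for Table2D: outer key -> inner key -> count.
type Table2 = IntMap.IntMap (IntMap.IntMap Double)

-- Swap the two keys, e.g. word -> topic -> count into topic -> word -> count.
-- Each (outer, inner) pair occurs once, so the union never merges clashing keys.
invert :: Table2 -> Table2
invert t =
  IntMap.fromListWith IntMap.union
    [ (inner, IntMap.singleton outer c)
    | (outer, row) <- IntMap.toList t
    , (inner, c) <- IntMap.toList row ]
```

Rows that are empty in the input disappear from the output, so `invert . invert` is the identity only on tables without empty rows.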
data Options
- featIds: Feature ids
- topicNum: Number of topics K
- alphasum: Dirichlet parameter alpha*K, which controls topic sparseness
- beta: Dirichlet parameter beta, which controls word sparseness
- passes: Number of sampling passes per batch
- repeats: Number of repeats per sentence
- batchSize: Number of sentences per batch
- seed: Random seed for the sampler
- topn: Number of most probable words to return
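The alphasum and beta options correspond to the symmetric Dirichlet priors of standard LDA. Assuming Colada follows the usual parameterization, with per-topic α = alphasum / K, the priors and the smoothed estimates obtained from the count tables are given below. The symbols n_{d,k}, n_{k,w}, n_d = Σ_k n_{d,k}, n_k and the vocabulary size V are notation introduced here. Smaller alphasum or beta pushes probability mass onto fewer topics or words, which is the sparseness the option descriptions refer to; the second form of the θ̂ estimate shows why specifying alpha*K directly is convenient.

```latex
\begin{aligned}
\theta_d &\sim \mathrm{Dir}(\alpha, \ldots, \alpha), \quad \alpha = \mathrm{alphasum}/K \\
\phi_k &\sim \mathrm{Dir}(\beta, \ldots, \beta) \\
\hat{\theta}_{d,k} &= \frac{n_{d,k} + \alpha}{n_d + K\alpha} = \frac{n_{d,k} + \alpha}{n_d + \mathrm{alphasum}} \\
\hat{\phi}_{k,w} &= \frac{n_{k,w} + \beta}{n_k + V\beta}
\end{aligned}
```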