Safe Haskell: None
Word Class induction with LDA
This module provides functions that implement word class induction using the generic algorithm implemented in Colada.LDA.
You can access and set options in the Options
record using lenses.
Example:
    import Data.Label

    let options = set passes 5 . set beta 0.01 . set topicNum 100 $ defaultOptions
    in learn options sentences
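To make the composition in the example concrete, here is a minimal, self-contained sketch of how getter/setter lenses let several option updates be chained with `.`. It does not use the library: the `Options` record, its three fields, the default values, and the `Lens` type below are all hypothetical stand-ins, and the real lenses come from Data.Label and have the more general types shown in the synopsis.

```haskell
-- Hypothetical stand-in for the library's Options record. Only three
-- fields are modelled, and the default values are invented for this sketch.
data Options = Options
  { _topicNum :: Int
  , _beta     :: Double
  , _passes   :: Int
  } deriving (Show, Eq)

-- A minimal lens: a getter paired with a setter (assumption: much simpler
-- than the Arrow-polymorphic lenses the module actually exports).
data Lens s a = Lens
  { get :: s -> a
  , set :: a -> s -> s
  }

topicNum :: Lens Options Int
topicNum = Lens _topicNum (\v o -> o { _topicNum = v })

beta :: Lens Options Double
beta = Lens _beta (\v o -> o { _beta = v })

passes :: Lens Options Int
passes = Lens _passes (\v o -> o { _passes = v })

-- Invented defaults, for illustration only.
defaultOptions :: Options
defaultOptions = Options { _topicNum = 10, _beta = 0.1, _passes = 1 }

-- Each `set lens value` is a function Options -> Options, so updates
-- compose right to left with (.).
customOptions :: Options
customOptions = set passes 5 . set beta 0.01 . set topicNum 100 $ defaultOptions
```

Loaded in GHCi, `get topicNum customOptions` reads back the value written by the rightmost `set`, and fields that were not touched keep their defaults.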
- learn :: Options -> [Sentence] -> (WordClass, [Vector (Vector Double)])
- defaultOptions :: Options
- summary :: WordClass -> Text
- summarize :: Bool -> WordClass -> Text
- wordTypeClasses :: WordClass -> Map Text (IntMap Double)
- label :: Bool -> WordClass -> Sentence -> Vector (Vector Double)
- predict :: WordClass -> Sentence -> [Vector (Double, Text)]
- data WordClass
- ldaModel :: forall arr. Arrow arr => Lens arr WordClass Finalized
- wordTypeTable :: forall arr. Arrow arr => Lens arr WordClass (AtomTable (Vector Char))
- featureTable :: forall arr. Arrow arr => Lens arr WordClass (AtomTable (Vector Char))
- options :: forall arr. Arrow arr => Lens arr WordClass Options
- data Finalized
- docTopics :: Finalized -> Table2D
- wordTopics :: Finalized -> Table2D
- topics :: Finalized -> Table1D
- topicDocs :: Finalized -> Table2D
- topicWords :: Finalized -> Table2D
- data Options
- featIds :: forall arr. Arrow arr => Lens arr Options [Int]
- topicNum :: forall arr. Arrow arr => Lens arr Options Int
- alphasum :: forall arr. Arrow arr => Lens arr Options Double
- beta :: forall arr. Arrow arr => Lens arr Options Double
- passes :: forall arr. Arrow arr => Lens arr Options Int
- repeats :: forall arr. Arrow arr => Lens arr Options Int
- batchSize :: forall arr. Arrow arr => Lens arr Options Int
- seed :: forall arr. Arrow arr => Lens arr Options Word32
- topn :: forall arr. Arrow arr => Lens arr Options Int
- initSize :: forall arr. Arrow arr => Lens arr Options Int
- initPasses :: forall arr. Arrow arr => Lens arr Options Int
- exponent :: forall arr. Arrow arr => Lens arr Options (Maybe Double)
- progressive :: forall arr. Arrow arr => Lens arr Options Bool
- lambda :: forall arr. Arrow arr => Lens arr Options Double
Running the sampler
learn :: Options -> [Sentence] -> (WordClass, [Vector (Vector Double)])
learn options xs runs the LDA Gibbs sampler for word classes with options on sentences xs, and returns the resulting model together with the progressive class assignments.
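For readers unfamiliar with Gibbs sampling for LDA, the sketch below shows the textbook collapsed Gibbs update for a single token. Whether Colada.LDA uses exactly this update is an assumption; the function names, argument order, and list-based count tables are hypothetical and chosen only to keep the example self-contained. The parameters `alphasum` and `beta` play the roles of the Dirichlet options of the same names.

```haskell
-- Unnormalized conditional weights for assigning one token to each of the
-- K topics, following the textbook collapsed Gibbs update for LDA:
--   weight k = (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
-- where alpha = alphasum / K. All counts are assumed to exclude the token
-- that is currently being resampled.
topicWeights
  :: Double    -- alphasum (alpha * K)
  -> Double    -- beta
  -> Int       -- vocabulary size V
  -> [Double]  -- n_dk: per-topic counts in the current document
  -> [Double]  -- n_kw: per-topic counts of the current word
  -> [Double]  -- n_k: per-topic total counts
  -> [Double]
topicWeights alphasum beta vocabSize docCounts wordCounts topicTotals =
  zipWith3 weight docCounts wordCounts topicTotals
  where
    k     = fromIntegral (length docCounts)
    alpha = alphasum / k
    v     = fromIntegral vocabSize
    weight ndk nkw nk = (ndk + alpha) * (nkw + beta) / (nk + v * beta)

-- Pick a topic index from unnormalized weights, given u drawn uniformly
-- from [0, 1). The random draw is left to the caller so that the function
-- stays pure.
sampleIndex :: [Double] -> Double -> Int
sampleIndex ws u = go 0 0 ws
  where
    target = u * sum ws
    go i _ [] = max 0 (i - 1)   -- fall back to the last index
    go i acc (w : rest)
      | acc + w > target = i
      | otherwise        = go (i + 1) (acc + w) rest
```

One sampling pass would apply these two steps to every token in turn, updating the count tables after each reassignment.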
Extracting information
summary :: WordClass -> Text
summary m returns a textual summary of word classes found in model m.
wordTypeClasses :: WordClass -> Map Text (IntMap Double)
wordTypeClasses m returns a Map from word types to unnormalized distributions over word classes.
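Because the distributions are unnormalized, a caller will often want to rescale them or pick the most likely class. The following is a minimal sketch of both steps on a single `IntMap Double`, such as one value of the returned Map. The helper names `normalize` and `bestClass` are hypothetical and are not part of the module.

```haskell
import qualified Data.IntMap as IntMap
import Data.IntMap (IntMap)
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Rescale the weights so that they sum to 1. A map whose weights sum to
-- zero (including the empty map) is returned unchanged.
normalize :: IntMap Double -> IntMap Double
normalize m
  | total > 0 = IntMap.map (/ total) m
  | otherwise = m
  where
    total = sum (IntMap.elems m)

-- The class id with the largest weight, or Nothing for an empty map.
bestClass :: IntMap Double -> Maybe Int
bestClass m
  | IntMap.null m = Nothing
  | otherwise     = Just (fst (maximumBy (comparing snd) (IntMap.toList m)))
```

Applying these to every word type would be a matter of mapping them over the result of wordTypeClasses, for example with `Data.Map.map normalize`.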
Class and word prediction
label :: Bool -> WordClass -> Sentence -> Vector (Vector Double)
label m s returns, for each word in sentence s, unnormalized probabilities of word classes.
predict :: WordClass -> Sentence -> [Vector (Double, Text)]
predict m s returns, for each word in sentence s, unnormalized probabilities of words given the predicted word class.
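A typical way to consume a result of this shape is to keep only the highest-scoring candidates at each position. The sketch below does this on plain lists of `(Double, String)` pairs, which stand in for the `Vector (Double, Text)` values so that the example needs only base. The name `topWords` is hypothetical, and the ordering of the vectors that predict itself returns is not specified here.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- Keep the n highest-scoring (score, word) pairs, best first.
topWords :: Int -> [(Double, String)] -> [(Double, String)]
topWords n = take n . sortBy (comparing (Down . fst))

-- Apply the same cut-off at every position of a sentence.
topWordsPerPosition :: Int -> [[(Double, String)]] -> [[(Double, String)]]
topWordsPerPosition n = map (topWords n)
```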
Data types and associated lenses
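ldaModel :: forall arr. Arrow arr => Lens arr WordClass Finalized
LDA model
wordTypeTable :: forall arr. Arrow arr => Lens arr WordClass (AtomTable (Vector Char))
Word type string to atom and vice versa conversion tables
featureTable :: forall arr. Arrow arr => Lens arr WordClass (AtomTable (Vector Char))
Feature string to atom and vice versa conversion tables
options :: forall arr. Arrow arr => Lens arr WordClass Options
Options for Gibbs sampling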
wordTopics :: Finalized -> Table2D
Word topic counts
topicWords :: Finalized -> Table2D
Inverse word-topic counts
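featIds :: forall arr. Arrow arr => Lens arr Options [Int]
Feature ids
topicNum :: forall arr. Arrow arr => Lens arr Options Int
Number of topics K
alphasum :: forall arr. Arrow arr => Lens arr Options Double
Dirichlet parameter alpha*K which controls topic sparseness
beta :: forall arr. Arrow arr => Lens arr Options Double
Dirichlet parameter beta which controls word sparseness
passes :: forall arr. Arrow arr => Lens arr Options Int
Number of sampling passes per batch
repeats :: forall arr. Arrow arr => Lens arr Options Int
Number of repeats per sentence
batchSize :: forall arr. Arrow arr => Lens arr Options Int
Number of sentences per batch
seed :: forall arr. Arrow arr => Lens arr Options Word32
Random seed for the sampler
topn :: forall arr. Arrow arr => Lens arr Options Int
Number of most probable words to return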