colada-0.4.2: Colada implements incremental word class induction using online LDA

Safe Haskell: None

Colada.WordClass

Description

Word Class induction with LDA

This module provides functions that implement word class induction using the generic algorithm in Colada.LDA.

You can access and set options in the Options record using lenses. Example:

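  import Data.Label (set)
  import Colada.WordClass

  -- Override three fields; all other options keep their default values
  myOptions = set passes 5
            . set beta 0.01
            . set topicNum 100
            $ defaultOptions

  -- Run the sampler on a list of sentences
  train sentences = learn myOptions sentences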
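The composed `set` calls work because each setter returns an updated copy of the record, so setters chain with `(.)`. The sketch below mimics that pattern without fclabels. It is a simplified illustration only: the three-field `Options` record, the `Lens` type, and `defaultOptions` with its values are all hypothetical stand-ins, and the real fclabels `Lens` is more general, being parameterised over an arrow as the signatures further down show.

```haskell
-- Hypothetical, cut-down Options record (the real one has more fields,
-- and its lenses are generated by fclabels).
data Options = Options
  { _passes   :: Int
  , _beta     :: Double
  , _topicNum :: Int
  } deriving (Show, Eq)

-- A simplified lens: a getter paired with a setter.
data Lens f a = Lens { getL :: f -> a, setL :: a -> f -> f }

passes :: Lens Options Int
passes = Lens _passes (\v o -> o { _passes = v })

beta :: Lens Options Double
beta = Lens _beta (\v o -> o { _beta = v })

topicNum :: Lens Options Int
topicNum = Lens _topicNum (\v o -> o { _topicNum = v })

get :: Lens f a -> f -> a
get = getL

-- Returns a modified copy; the original record is left untouched.
set :: Lens f a -> a -> f -> f
set = setL

-- Hypothetical defaults, for illustration only.
defaultOptions :: Options
defaultOptions = Options { _passes = 1, _beta = 0.1, _topicNum = 10 }

-- The setters compose right to left, like the documented example.
myOptions :: Options
myOptions = set passes 5
          . set beta 0.01
          . set topicNum 100
          $ defaultOptions
```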

Running the sampler

learn :: Options -> [Sentence] -> (WordClass, [Vector D])

learn options xs runs the LDA Gibbs sampler for word classes with options on sentences xs, and returns the resulting model together with the progressive class assignments.

Extracting information

summary :: WordClass -> Text

summary m returns a textual summary of word classes found in model m

wordTypeClasses :: WordClass -> Map Text (IntMap Double)

wordTypeClasses m returns a Map from word types to unnormalized distributions over word classes
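Because wordTypeClasses returns unnormalized weights, a caller who wants probabilities, or a single class per word type, has to normalize first. A minimal sketch using the containers package; String keys stand in for Text to keep the example dependency-free, and `normalize`, `bestClass`, and `hardClasses` are hypothetical helper names.

```haskell
import qualified Data.IntMap.Strict as IntMap
import qualified Data.Map.Strict as Map
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Turn unnormalized class weights into a probability distribution.
-- Returns an empty map if the weights sum to zero (or the map is empty).
normalize :: IntMap.IntMap Double -> IntMap.IntMap Double
normalize m
  | total <= 0 = IntMap.empty
  | otherwise  = IntMap.map (/ total) m
  where total = sum (IntMap.elems m)

-- The class with the largest weight, if there is any weight at all.
bestClass :: IntMap.IntMap Double -> Maybe Int
bestClass m
  | IntMap.null m = Nothing
  | otherwise     = Just (fst (maximumBy (comparing snd) (IntMap.toList m)))

-- One hard class per word type (String keys stand in for Text).
hardClasses :: Map.Map String (IntMap.IntMap Double) -> Map.Map String (Maybe Int)
hardClasses = Map.map (bestClass . normalize)
```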

Class and word prediction

label :: Bool -> WordClass -> Sentence -> Vector (Vector Double)

label m s returns, for each word in sentence s, unnormalized probabilities of word classes.
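To turn label's output into one class per word, take the index of the largest score. This sketch uses plain lists instead of Vector and assumes that position i in each inner vector holds the score for class i; `argmax` and `hardLabels` are hypothetical helpers. With the vector package, `Data.Vector.maxIndex` does the same for a non-empty vector.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Index of the largest score; Nothing for an empty score list.
argmax :: [Double] -> Maybe Int
argmax [] = Nothing
argmax xs = Just (fst (maximumBy (comparing snd) (zip [0 ..] xs)))

-- One hard class per word, from per-word unnormalized class scores
-- (lists stand in for the Vector (Vector Double) that label returns).
hardLabels :: [[Double]] -> [Maybe Int]
hardLabels = map argmax
```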

predict :: WordClass -> Sentence -> [Vector (Double, Text)]

predict m s returns, for each word in sentence s, unnormalized probabilities of words given the predicted word class.
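predict's per-position output can be reduced to the n highest-scoring candidate words by sorting on the score. In this sketch, lists of (score, word) pairs with String words stand in for Vector (Double, Text), and `topWords` and `topWordsPerPosition` are hypothetical names.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- The n highest-scoring candidate words for one position.
-- sortBy is stable, so tied scores keep their original order.
topWords :: Int -> [(Double, String)] -> [String]
topWords n = map snd . take n . sortBy (comparing (Down . fst))

-- Apply topWords to every position in the sentence.
topWordsPerPosition :: Int -> [[(Double, String)]] -> [[String]]
topWordsPerPosition n = map (topWords n)
```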

Data types and associated lenses

data WordClass

Container for the Word Class model

ldaModel :: forall (~>). Arrow (~>) => Lens (~>) WordClass Finalized

LDA model

Conversion tables between word type strings and atoms, in both directions

Conversion tables between feature strings and atoms, in both directions

options :: forall (~>). Arrow (~>) => Lens (~>) WordClass Options

Options for Gibbs sampling

docTopics :: Finalized -> Table2D

Document topic counts

wordTopics :: Finalized -> Table2D

Word topic counts

topics :: Finalized -> Table1D

Topic counts

topicDocs :: Finalized -> Table2D

Inverse document-topic counts

topicWords :: Finalized -> Table2D

Inverse word-topic counts
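The documentation does not define "inverse" counts. One plausible reading is a transposed index, so that topicDocs answers "which documents use topic k" as directly as docTopics answers "which topics does document d use". This sketch assumes a hypothetical nested-IntMap representation for Table2D; the real type may differ, and `invert` is a hypothetical helper.

```haskell
import qualified Data.IntMap.Strict as IntMap

-- Hypothetical stand-in for Table2D: outer key -> inner key -> count.
type Table2D = IntMap.IntMap (IntMap.IntMap Double)

-- Swap outer and inner keys, e.g. doc -> topic -> count becomes
-- topic -> doc -> count. Each (outer, inner) pair is unique, so
-- the unions never have to merge colliding keys.
invert :: Table2D -> Table2D
invert t = IntMap.fromListWith IntMap.union
  [ (inner, IntMap.singleton outer c)
  | (outer, row) <- IntMap.toList t
  , (inner, c)   <- IntMap.toList row ]
```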

featIds :: forall (~>). Arrow (~>) => Lens (~>) Options [Int]

Feature ids

topicNum :: forall (~>). Arrow (~>) => Lens (~>) Options Int

Number of topics K

alphasum :: forall (~>). Arrow (~>) => Lens (~>) Options Double

Dirichlet parameter alpha * K, which controls topic sparseness

beta :: forall (~>). Arrow (~>) => Lens (~>) Options Double

Dirichlet parameter beta which controls word sparseness
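For orientation, the standard collapsed Gibbs update for LDA weights topic k, for a token of word w in document d, by (n_dk + α)(n_kw + β) / (n_k + Vβ), where V is the vocabulary size and all counts exclude the token being resampled. Since alphasum is α·K, the per-topic α is alphasum / K. This is the textbook formula, not necessarily the exact form of Colada's sampler; `topicWeight` and its argument order are hypothetical.

```haskell
-- Unnormalized weight of topic k for one token in the standard collapsed
-- Gibbs sampler for LDA. All counts exclude the token being resampled.
topicWeight
  :: Double  -- ^ alphasum (alpha * K)
  -> Int     -- ^ K, the number of topics
  -> Double  -- ^ beta
  -> Int     -- ^ V, the vocabulary size
  -> Double  -- ^ n_dk: tokens in document d assigned to topic k
  -> Double  -- ^ n_kw: tokens of word w assigned to topic k
  -> Double  -- ^ n_k: all tokens assigned to topic k
  -> Double
topicWeight alphaSum numTopics beta vocab nDK nKW nK =
  (nDK + alpha) * (nKW + beta) / (nK + fromIntegral vocab * beta)
  where
    -- alphasum is alpha * K, so the per-topic alpha is alphasum / K
    alpha = alphaSum / fromIntegral numTopics
```

A smaller alphasum or beta makes the existing counts dominate the weight, which is why these parameters control how sparse the topic and word distributions become.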

passes :: forall (~>). Arrow (~>) => Lens (~>) Options Int

Number of sampling passes per batch

repeats :: forall (~>). Arrow (~>) => Lens (~>) Options Int

Number of repeats per sentence

batchSize :: forall (~>). Arrow (~>) => Lens (~>) Options Int

Number of sentences per batch
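Processing sentences in batches means cutting the stream into consecutive chunks of batchSize sentences, each of which then receives passes sampling sweeps. This sketch shows only the chunking step; `batches` is a hypothetical helper, and how Colada actually schedules passes, repeats, initSize and initPasses is not described here.

```haskell
-- Split a stream into consecutive batches of at most n elements.
-- splitAt is lazy, so this also works on an unbounded sentence stream.
batches :: Int -> [a] -> [[a]]
batches n xs
  | n <= 0    = error "batches: batch size must be positive"
  | null xs   = []
  | otherwise = let (b, rest) = splitAt n xs in b : batches n rest
```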

seed :: forall (~>). Arrow (~>) => Lens (~>) Options Word32

Random seed for the sampler

topn :: forall (~>). Arrow (~>) => Lens (~>) Options Int

Number of most probable words to return

initSize :: forall (~>). Arrow (~>) => Lens (~>) Options Int

initPasses :: forall (~>). Arrow (~>) => Lens (~>) Options Int

exponent :: forall (~>). Arrow (~>) => Lens (~>) Options (Maybe Double)

progressive :: forall (~>). Arrow (~>) => Lens (~>) Options Bool

lambda :: forall (~>). Arrow (~>) => Lens (~>) Options Double