colada-0.7.0.0: Colada implements incremental word class induction using online LDA

Safe Haskell: None

Colada.WordClass

Description

Word Class induction with LDA

This module provides functions which implement word class induction using the generic algorithm implemented in Colada.LDA.

You can access and set options in the Options record using lenses. Example:

  import Data.Label
  let options =   set passes 5 
                . set beta 0.01 
                . set topicNum 100 
                $ defaultOptions
  in run options sentences

Running the sampler

learnIO :: Options -> (Vector (Vector Double) -> IO ()) -> [Sentence] -> IO WordClass

learnIO options f xs runs the LDA Gibbs sampler for word classes with options on sentences xs, and returns the resulting model. The progressive class assignments are passed to the handler function f.

Extracting information

summary :: WordClass -> Text

summary m returns a textual summary of the word classes found in model m.

wordTypeClasses :: WordClass -> Map Text (IntMap Double)

wordTypeClasses m returns a Map from word types to unnormalized distributions over word classes.
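Because the distributions are unnormalized, a caller who wants probabilities has to rescale them. The following is a minimal sketch of that step, not part of Colada: the names normalize, exampleClasses and normalized are hypothetical, the data is made up, and String keys stand in for Text so that only the containers package is needed.

```haskell
import qualified Data.IntMap as IntMap
import qualified Data.Map as Map

-- | Rescale unnormalized class weights so that they sum to 1.
-- An empty or all-zero distribution is returned unchanged.
normalize :: IntMap.IntMap Double -> IntMap.IntMap Double
normalize m
  | total > 0 = IntMap.map (/ total) m
  | otherwise = m
  where
    total = sum (IntMap.elems m)

-- Hypothetical stand-in for the result of `wordTypeClasses m`
-- (String keys instead of Text; weights are invented).
exampleClasses :: Map.Map String (IntMap.IntMap Double)
exampleClasses = Map.fromList
  [ ("dog", IntMap.fromList [(0, 6), (3, 2)])
  , ("run", IntMap.fromList [(1, 9), (3, 1)])
  ]

-- Per-word-type probability distributions over class ids.
normalized :: Map.Map String (IntMap.IntMap Double)
normalized = Map.map normalize exampleClasses
```

With the real API, the same Map.map normalize call should apply directly to the Map returned by wordTypeClasses, since only the key type differs.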

Class and word prediction

label :: Bool -> WordClass -> Sentence -> Vector (Vector Double)

label m s returns, for each word in sentence s, unnormalized probabilities of word classes.
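A common follow-up is to turn the per-word class scores into one hard class id per word. The sketch below illustrates this under stated assumptions: plain lists stand in for Vector (Vector Double), the scores are invented, and argmax, hardLabels and exampleScores are hypothetical helpers rather than Colada functions.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- | Index of the largest score in one word's row of class scores.
-- Normalization is unnecessary here: rescaling does not change the argmax.
argmax :: [Double] -> Maybe Int
argmax [] = Nothing
argmax ws = Just (fst (maximumBy (comparing snd) (zip [0 ..] ws)))

-- | One hard class id per word of the sentence.
hardLabels :: [[Double]] -> [Maybe Int]
hardLabels = map argmax

-- Hypothetical scores for a three-word sentence and four classes.
exampleScores :: [[Double]]
exampleScores =
  [ [0.1, 2.5, 0.3, 0.0]
  , [4.0, 0.2, 0.1, 0.7]
  , [0.0, 0.0, 1.5, 1.2]
  ]
```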

predict :: WordClass -> Sentence -> [Vector (Double, Text)]

predict m s returns, for each word in sentence s, unnormalized probabilities of words given the predicted word class.

Data types and associated lenses

data WordClass

Container for the Word Class model

ldaModel :: forall arr. Arrow arr => Lens arr WordClass Finalized

LDA model

Word type string to atom and vice versa conversion tables

Feature string to atom and vice versa conversion tables

options :: forall arr. Arrow arr => Lens arr WordClass Options

Options for Gibbs sampling

docTopics :: Finalized -> Table2D

Document topic counts

wordTopics :: Finalized -> Table2D

Word topic counts

topics :: Finalized -> Table1D

Topic counts

topicDocs :: Finalized -> Table2D

Inverse document-topic counts

topicWords :: Finalized -> Table2D

Inverse word-topic counts

featIds :: forall arr. Arrow arr => Lens arr Options [Int]

Feature ids

topicNum :: forall arr. Arrow arr => Lens arr Options Int

Number of topics K

alphasum :: forall arr. Arrow arr => Lens arr Options Double

Dirichlet parameter alpha*K which controls topic sparseness

beta :: forall arr. Arrow arr => Lens arr Options Double

Dirichlet parameter beta which controls word sparseness
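To make the role of the two Dirichlet parameters concrete, here is a sketch of the textbook smoothed estimates used with collapsed Gibbs sampling for LDA. It is an illustration under assumptions, not Colada's code: the function names are hypothetical, a symmetric prior is assumed, and whether Colada computes exactly these quantities is not confirmed by this page.

```haskell
-- | Symmetric per-topic alpha recovered from alphasum = alpha * K.
alphaPerTopic :: Double -> Int -> Double
alphaPerTopic alphaSum k = alphaSum / fromIntegral k

-- | Smoothed P(topic | document):
-- (n_dk + alpha) / (n_d + alpha * K)
topicGivenDoc
  :: Double  -- ^ alphasum (alpha * K)
  -> Int     -- ^ number of topics K
  -> Double  -- ^ n_dk: tokens in the document assigned to the topic
  -> Double  -- ^ n_d: tokens in the document
  -> Double
topicGivenDoc alphaSum k nDocTopic nDoc =
  (nDocTopic + alphaPerTopic alphaSum k) / (nDoc + alphaSum)

-- | Smoothed P(word | topic):
-- (n_kw + beta) / (n_k + beta * V)
wordGivenTopic
  :: Double  -- ^ beta
  -> Int     -- ^ vocabulary size V
  -> Double  -- ^ n_kw: occurrences of the word assigned to the topic
  -> Double  -- ^ n_k: tokens assigned to the topic
  -> Double
wordGivenTopic betaVal vocabSize nTopicWord nTopic =
  (nTopicWord + betaVal) / (nTopic + betaVal * fromIntegral vocabSize)
```

In these formulas a smaller alpha or beta lets the observed counts dominate, which yields sparser distributions; larger values pull the estimates toward uniform.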

passes :: forall arr. Arrow arr => Lens arr Options Int

Number of sampling passes per batch

batchSize :: forall arr. Arrow arr => Lens arr Options Int

Number of sentences per batch
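The interaction of passes and batchSize can be pictured as follows. This is only a sketch of one plausible schedule, assuming each batch receives all of its passes before the next batch is read; the helpers batches and schedule are hypothetical and the actual ordering inside Colada is not stated on this page.

```haskell
-- | Split a list into consecutive batches of at most n elements.
-- A non-positive n yields the whole input as a single batch.
batches :: Int -> [a] -> [[a]]
batches n xs
  | null xs   = []
  | n <= 0    = [xs]
  | otherwise = let (b, rest) = splitAt n xs in b : batches n rest

-- | Pair every batch with each of its pass numbers, in processing order:
-- all passes over the first batch, then all passes over the second, etc.
schedule :: Int -> Int -> [a] -> [(Int, [a])]
schedule nPasses size xs =
  [ (p, b) | b <- batches size xs, p <- [1 .. nPasses] ]
```

Under this assumption, 5 sentences with a batch size of 2 and 3 passes give 3 batches and 9 sampling sweeps in total.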

seed :: forall arr. Arrow arr => Lens arr Options Word32

Random seed for the sampler

topn :: forall arr. Arrow arr => Lens arr Options Int

Number of most probable words to return

exponent :: forall arr. Arrow arr => Lens arr Options (Maybe Double)

progressive :: forall arr. Arrow arr => Lens arr Options Bool

lambda :: forall arr. Arrow arr => Lens arr Options Double