Readme for delta-h-0.0.3
= DELTA-H
Online entropy-based model of lexical category acquisition.
Grzegorz Chrupala and Afra Alishahi
= INSTALL
Install the Haskell Platform: http://hackage.haskell.org/platform/
On Linux, the following command will install the delta-h executable in the
bin subdirectory of the current directory:
cabal install --prefix=`pwd`
= USAGE
The data directory contains an example input file, data/goat.txt.
The other files are from the CHILDES corpus.
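The input is plain text; assuming the usual CHILDES-style preprocessing of
one tokenized utterance per line, a (hypothetical) input line might look
like:

  do you see the goat ?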
To induce a model (i.e. a set of clusters), execute the following:
> ./bin/delta-h learn '[-12,0,12]' data/goat.txt
The argument '[-12,0,12]' specifies the features to be used (in this case
the preceding bigram, the focus word, and the following bigram). Feature ids
can be inspected in the source file src/Entropy/Features.hs
The model will be stored in data/goat.txt.[-12,0,12].learn.model
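Assuming the other feature ids follow the same pattern (for instance -1 and
1 selecting the preceding and following unigram; check
src/Entropy/Features.hs to be sure), a model restricted to unigram context
would presumably be trained with:

> ./bin/delta-h learn '[-1,0,1]' data/goat.txt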
You can display the model in a human-readable format with:
> ./bin/delta-h display data/goat.txt.[-12,0,12].learn.model
The learned model can also be used to label input data, without
further learning:
> ./bin/delta-h label True True data/goat.txt.[-12,0,12].learn.model < \
data/goat.txt
The first argument specifies whether to use the focus word for labeling;
the second whether to suppress new cluster ids (i.e. ids not present in
the model).
There is also a command which tests the learned model on the word
prediction task:
> ./bin/delta-h eval-mrr True True data/goat.txt.[-12,0,12].learn.model < \
data/goat.txt
The first argument specifies whether to marginalize over all cluster
assignments, the second whether to output detailed information.
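For reference, eval-mrr's metric, mean reciprocal rank (MRR), is the
average of 1/rank of the correct word over all test items (ranks are
1-based). A minimal Haskell illustration, not taken from this codebase:

  -- Mean reciprocal rank over a list of 1-based ranks.
  mrr :: [Int] -> Double
  mrr ranks = sum (map (recip . fromIntegral) ranks)
            / fromIntegral (length ranks)

  -- Example: ranks 1, 2 and 4 give (1 + 0.5 + 0.25) / 3 ~ 0.58.
  main :: IO ()
  main = print (mrr [1, 2, 4])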
The semantic property prediction task can be run with the eval-sem command:
> ./bin/delta-h eval-sem False data/lexicon TRAIN.pos TRAIN.cluster \
TEST.pos TEST.cluster
The arguments to this command are:
False - do not produce verbose output
data/lexicon - semantic property lexicon file (generated from WordNet)
TRAIN.pos - POS-tagged training data
TRAIN.cluster - training data labeled with cluster IDs (use the label
command to generate it, as shown below)
TEST.pos - POS-tagged test data
TEST.cluster - test data labeled with cluster IDs (generated the same way)
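Assuming the label command writes its labeled output to standard output,
the cluster files can be generated along these lines (MODEL and TRAIN.txt
are placeholders for a learned model file and the raw text behind
TRAIN.pos):

> ./bin/delta-h label True True MODEL < TRAIN.txt > TRAIN.cluster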
= SOURCES
There are some other (currently undocumented) commands; see src/Main.hs.
The main part of the model is implemented in src/Entropy/Algorithm.hs.
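As a rough orientation, the core idea of the model is to greedily assign
each word usage to the cluster that minimizes the resulting increase in
entropy (hence delta-h). Below is a minimal hypothetical Haskell sketch of
such an assignment step, using the entropy of the cluster size distribution
as a stand-in for the model's actual objective; it is NOT the code in
src/Entropy/Algorithm.hs:

  import Data.List (minimumBy)
  import Data.Ord (comparing)

  -- Shannon entropy (in bits) of a distribution given by counts.
  entropy :: [Int] -> Double
  entropy counts = negate (sum [p c * logBase 2 (p c) | c <- counts, c > 0])
    where
      total = fromIntegral (sum counts)
      p c   = fromIntegral c / total

  -- Change in entropy caused by adding one item to cluster i.
  deltaH :: [Int] -> Int -> Double
  deltaH counts i = entropy (bump counts) - entropy counts
    where bump cs = [if j == i then c + 1 else c | (j, c) <- zip [0 ..] cs]

  -- Greedy step: pick the cluster whose growth increases entropy least.
  bestCluster :: [Int] -> Int
  bestCluster counts =
    minimumBy (comparing (deltaH counts)) [0 .. length counts - 1]

  -- Example: with sizes [5,3,1], growing cluster 0 (the largest)
  -- lowers entropy the most, so this prints 0.
  main :: IO ()
  main = print (bestCluster [5, 3, 1])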