delta-h: Online entropy-based model of lexical category acquisition.

[ bsd3, library, natural-language-processing, program ]

Implementation of the model described in Grzegorz Chrupała and Afra Alishahi, Online Entropy-based Model of Lexical Category Acquisition, CoNLL 2010 http://www.lsv.uni-saarland.de/personalPages/gchrupala/papers/conll-2010.pdf

Modules

Entropy
- Entropy.Algorithm
- Entropy.Features
ListZipper
Reader

Downloads

delta-h-0.0.3.tar.gz (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

GrzegorzChrupala

Versions	0.0.1, 0.0.2, 0.0.3
Dependencies	base (>=3 && <5), binary, bytestring, containers, monad-atom (>=0.4), nlp-scores, text
License	BSD-3-Clause
Author	Grzegorz Chrupala and Afra Alishahi
Maintainer	pitekus@gmail.com
Uploaded	by GrzegorzChrupala at 2012-02-29T17:27:10Z
Category	Natural Language Processing
Home page	https://bitbucket.org/gchrupala/delta-h
Reverse Dependencies	1 direct, 0 indirect
Executables	delta-h
Downloads	2898 total (14 in the last 30 days)

Readme for delta-h-0.0.3

= DELTA-H

Online entropy-based model of lexical category acquisition.
Grzegorz Chrupala and Afra Alishahi

= INSTALL

Install the Haskell Platform: http://hackage.haskell.org/platform/

On Linux, the following command will install the delta-h executable in the
bin directory under the current directory:

cabal install --prefix=`pwd`

= USAGE

The data directory contains an example input file, data/goat.txt.
The other files are CHILDES data.

To induce a model (i.e. a set of clusters), execute the following:

> ./bin/delta-h learn '[-12,0,12]' data/goat.txt

The argument '[-12,0,12]' specifies the features to be used (in this case
the preceding bigram, the focus word, and the following bigram). Feature ids
can be inspected in the source file src/Entropy/Features.hs.

The model will be stored in data/goat.txt.[-12,0,12].learn.model
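
The naming of the model file can be sketched in shell as follows. This is
only an illustration: the helper name model_path is hypothetical, and the
pattern <input>.<features>.learn.model is an assumption generalized from the
single example above, not something this README states as a rule.

```shell
# Build the model file name that the learn command is described as writing.
# Assumption: the name is <input>.<features>.learn.model, as in the one
# example given in this README.
model_path() {
  # $1 = input file, $2 = feature specification such as '[-12,0,12]'
  printf '%s.%s.learn.model\n' "$1" "$2"
}

model_path data/goat.txt '[-12,0,12]'
# prints: data/goat.txt.[-12,0,12].learn.model
```

Because the file name contains [ and ], an unquoted use of it is a glob
pattern to the shell. Shells that pass unmatched patterns through unchanged
(the default in sh and bash) still hand the literal name to delta-h, but
shells that reject unmatched patterns (zsh by default) need the path quoted.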

You can display the model in a human-readable format with:

> ./bin/delta-h display data/goat.txt.[-12,0,12].learn.model

The learned model can also be used to label input data, without
further learning:

> ./bin/delta-h label True True data/goat.txt.[-12,0,12].learn.model < \
data/goat.txt

The first argument specifies whether to use the focus word for labeling;
the second specifies whether to avoid outputting new cluster ids (ids not
in the model).
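
Since both boolean arguments are positional, they are easy to swap by
mistake. A minimal shell sketch of a wrapper that documents them is given
here. The function name label_with_model and the DELTA_H variable are
hypothetical, and the sketch assumes that the label command writes the
labeled data to standard output, which this README does not state.

```shell
# Path to the executable; set DELTA_H to override the default location.
DELTA_H=${DELTA_H:-./bin/delta-h}

# label_with_model USE_FOCUS NO_NEW_IDS MODEL INPUT OUTPUT
#   USE_FOCUS   True or False: use the focus word for labeling
#   NO_NEW_IDS  True or False: avoid outputting cluster ids not in the model
# Assumption: the labeled data is written to standard output.
label_with_model() {
  if [ "$#" -ne 5 ]; then
    echo "usage: label_with_model USE_FOCUS NO_NEW_IDS MODEL INPUT OUTPUT" >&2
    return 2
  fi
  # Feed INPUT on standard input and capture the result in OUTPUT.
  "$DELTA_H" label "$1" "$2" "$3" < "$4" > "$5"
}
```

For example, with goat.cluster as a hypothetical output file name:

label_with_model True True 'data/goat.txt.[-12,0,12].learn.model' data/goat.txt goat.cluster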

There is also a command that tests the learned model on the word
prediction task:

> ./bin/delta-h eval-mrr True True data/goat.txt.[-12,0,12].learn.model < \
data/goat.txt

The first argument specifies whether to marginalize over all cluster
assignments, the second whether to output detailed information.

The semantic property prediction task can be run with the eval-sem command:

> ./bin/delta-h eval-sem False data/lexicon TRAIN.pos TRAIN.cluster \
       TEST.pos TEST.cluster

The arguments to this command are:
  False         - do not produce verbose output
  data/lexicon  - semantic property lexicon file (generated from WordNet)
  TRAIN.pos     - POS-tagged training data
  TRAIN.cluster - training data labeled with cluster IDs (use the label
                  command to generate it)
  TEST.pos      - POS-tagged test data
  TEST.cluster  - test data labeled with cluster IDs (use the label
                  command to generate it)
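
Because eval-sem takes five file arguments in a fixed order, it can help to
check them before starting a run. The following shell sketch does that. The
function name eval_sem and the DELTA_H variable are hypothetical, and the
readability check is a convenience added here, not a feature of delta-h.

```shell
# Path to the executable; set DELTA_H to override the default location.
DELTA_H=${DELTA_H:-./bin/delta-h}

# eval_sem LEXICON TRAIN_POS TRAIN_CLUSTER TEST_POS TEST_CLUSTER
# Checks that every input file is readable, then runs eval-sem with
# verbose output switched off (the leading False).
eval_sem() {
  if [ "$#" -ne 5 ]; then
    echo "usage: eval_sem LEXICON TRAIN_POS TRAIN_CLUSTER TEST_POS TEST_CLUSTER" >&2
    return 2
  fi
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "eval_sem: cannot read $f" >&2
      return 1
    fi
  done
  # The five file names are passed through in the order given.
  "$DELTA_H" eval-sem False "$@"
}
```

For example:

eval_sem data/lexicon TRAIN.pos TRAIN.cluster TEST.pos TEST.cluster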

= SOURCES

There are some other (currently undocumented) commands; see src/Main.hs.

The main part of the model is implemented in src/Entropy/Algorithm.hs.