NaturalLanguageAlphabets: Alphabet and word representations

[ bsd3, library, natural-language-processing ]

Provides different encodings for characters and words in natural language processing. A character is often encoded as a Unicode text string, because we deal with multi-symbol characters.

IMMC symbols are encoded internally as 0-based integers, which allows them to be stored in unboxed containers.
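The idea of mapping multi-character symbols to dense 0-based integers can be sketched as follows. This is a minimal illustration and not the package's actual API: the names `Table`, `intern`, `internAll`, `symbolWeights`, and `weightOf` are hypothetical, `String` stands in for a Unicode text type, and an unboxed array from the `array` package stands in for whichever unboxed container is used in practice.

```haskell
import qualified Data.Map.Strict as M
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.List (mapAccumL)

-- | Hypothetical interning table: maps each symbol to a dense 0-based id.
newtype Table = Table (M.Map String Int)

emptyTable :: Table
emptyTable = Table M.empty

-- | Return the id of a symbol, assigning the next free id on first sight.
intern :: Table -> String -> (Table, Int)
intern t@(Table m) s = case M.lookup s m of
  Just i  -> (t, i)
  Nothing -> let i = M.size m in (Table (M.insert s i m), i)

-- | Intern a sequence of symbols, threading the table through.
internAll :: Table -> [String] -> (Table, [Int])
internAll = mapAccumL intern

-- | Because ids are dense Ints starting at 0, per-symbol data fits
-- into an unboxed array indexed directly by the id.
symbolWeights :: [Double] -> UArray Int Double
symbolWeights ws = listArray (0, length ws - 1) ws

-- | Look up the weight stored for an interned symbol id.
weightOf :: UArray Int Double -> Int -> Double
weightOf = (!)
```

In GHCi, `snd (internAll emptyTable ["sch", "a", "sch", "tz"])` evaluates to `[0,1,0,2]`: the multi-character symbol "sch" receives one id, and repeated occurrences reuse it.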

A very simple unigram-based scoring scheme and a DSL for writing such schemes are also provided.
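To give a rough feel for what a unigram-based scoring scheme over integer-encoded symbols might look like, here is a small sketch. It is an assumption-laden illustration, not the scheme or DSL shipped with the package: the record `UnigramScores`, its fields, and the functions `scoreUnigram` and `scoreAligned` are all hypothetical, and the match/mismatch fallback is just one plausible design.

```haskell
import qualified Data.Map.Strict as M

-- | Hypothetical unigram scoring scheme over integer symbol ids.
data UnigramScores = UnigramScores
  { pairScores      :: M.Map (Int, Int) Double -- ^ explicit scores for symbol pairs
  , defaultMatch    :: Double                  -- ^ fallback when both symbols are equal
  , defaultMismatch :: Double                  -- ^ fallback when the symbols differ
  }

-- | Score one pair of symbols: use the explicit entry if present,
-- otherwise fall back to the match or mismatch default.
scoreUnigram :: UnigramScores -> Int -> Int -> Double
scoreUnigram s x y = M.findWithDefault fallback (x, y) (pairScores s)
  where
    fallback
      | x == y    = defaultMatch s
      | otherwise = defaultMismatch s

-- | Sum the pairwise scores of two sequences, position by position.
-- Positions beyond the end of the shorter sequence are ignored.
scoreAligned :: UnigramScores -> [Int] -> [Int] -> Double
scoreAligned s xs ys = sum (zipWith (scoreUnigram s) xs ys)
```

With an explicit score of 2 for the pair (0, 1), a match default of 1, and a mismatch default of -1, `scoreAligned` on `[0,1,2]` and `[1,1,0]` yields 2 + 1 - 1 = 2.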

https://github.com/choener/NaturalLanguageAlphabets/blob/master/README.md



Flags

Manual Flags

Name  Description                     Default
llvm  build the benchmark using LLVM  Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Versions 0.0.0.1, 0.0.1.0, 0.0.2.0, 0.1.0.0, 0.1.1.0, 0.2.1.0
Change log changelog.md
Dependencies aeson (>=0.8 && <0.11), array (>=0.5 && <0.6), attoparsec (>=0.10 && <0.14), base (>4.7 && <4.9), bimaps (>=0.0.0.4 && <0.0.1), binary (>=0.7 && <0.8), bytestring (>=0.10.4), cereal (>=0.4 && <0.5), cereal-text (>=0.1 && <0.2), deepseq (>=1.3 && <1.5), file-embed (>=0.0.6 && <0.0.10), hashable (>=1.2 && <1.3), hashtables (>=1.1 && <1.3), intern (>=0.9 && <0.10), QuickCheck (>=2.7 && <2.9), stringable (>=0.1.2 && <0.2), system-filepath (>=0.4.9 && <0.5), text (>=0.11 && <1.3), text-binary (>=0.1 && <0.3), unordered-containers (>=0.2.3 && <0.3), vector (>=0.10 && <0.12), vector-th-unbox (>=0.2 && <0.3)
License BSD-3-Clause
Copyright Christian Hoener zu Siederdissen, 2014-2015
Author Christian Hoener zu Siederdissen
Maintainer choener@bioinf.uni-leipzig.de
Category Natural Language Processing
Home page https://github.com/choener/NaturalLanguageAlphabets
Bug tracker https://github.com/choener/NaturalLanguageAlphabets/issues
Source repo head: git clone git://github.com/choener/NaturalLanguageAlphabets
Uploaded by ChristianHoener at 2015-11-22T19:37:12Z
Reverse Dependencies 2 direct, 0 indirect [details]
Downloads 4276 total (20 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Status Docs available [build log]
Last success reported on 2015-12-08 [all 3 reports]

Readme for NaturalLanguageAlphabets-0.0.2.0

Natural Language Alphabets

Efficient alphabet symbols. The symbols are interned and hashed. This is quite useful for k-gram scoring, where different sets of symbols carry different scores. IMMC symbols are represented internally as Ints in the range [0..], which makes it possible to use unboxed containers when handling them.

Contact

Christian Hoener zu Siederdissen
Leipzig University, Leipzig, Germany
choener@bioinf.uni-leipzig.de
http://www.bioinf.uni-leipzig.de/~choener/