NaturalLanguageAlphabets: Alphabet and word representations

[ bsd3, library, natural-language-processing ]

Provides different encodings for characters and words in natural language processing. A character is often encoded as a Unicode text string, because we deal with multi-symbol characters.

IMMC symbols are encoded internally as 0-based integers, which allows the use of unboxed containers.
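
To illustrate the idea of mapping symbols to 0-based integers, here is a minimal sketch of an interning table. It is not the package's own API: the names `InternTable`, `emptyTable`, `intern`, and `internAll` are hypothetical, and the sketch uses `String` keys and `Data.Map` from the containers package for brevity.

```haskell
import qualified Data.Map.Strict as M

-- | Hypothetical interning table (not the package's own type):
-- maps each symbol to a 0-based Int id, handed out in order of first use.
newtype InternTable = InternTable (M.Map String Int)

emptyTable :: InternTable
emptyTable = InternTable M.empty

-- | Return the id of a symbol, adding the symbol to the table if it is new.
intern :: String -> InternTable -> (Int, InternTable)
intern sym tbl@(InternTable m) =
  case M.lookup sym m of
    Just i  -> (i, tbl)
    Nothing -> let i = M.size m
               in  (i, InternTable (M.insert sym i m))

-- | Intern a word given as a list of (possibly multi-symbol) characters.
internAll :: [String] -> InternTable -> ([Int], InternTable)
internAll []     tbl = ([], tbl)
internAll (s:ss) tbl =
  let (i, tbl')   = intern s tbl
      (is, tbl'') = internAll ss tbl'
  in  (i : is, tbl'')
```

Because a repeated symbol always receives the same id, a word can be stored as a plain sequence of Ints once it has been interned.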

A very simple unigram-based scoring scheme is also provided.
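
The package's scoring module is not shown here, so the following is only a sketch of what a unigram-based scheme could look like, assuming that every pair of single symbols is scored independently of its neighbours. All names (`UnigramScores`, `scorePair`, `scoreAligned`) and the fallback behaviour are assumptions made for illustration.

```haskell
import qualified Data.Map.Strict as M

-- | Hypothetical scoring table over pairs of interned symbol ids.
data UnigramScores = UnigramScores
  { pairScores      :: M.Map (Int, Int) Double -- explicit scores for symbol pairs
  , defaultMatch    :: Double                  -- fallback when both symbols are equal
  , defaultMismatch :: Double                  -- fallback for all other pairs
  }

-- | Score one pair of symbols: explicit entry first, then the fallbacks.
scorePair :: UnigramScores -> Int -> Int -> Double
scorePair s x y =
  case M.lookup (x, y) (pairScores s) of
    Just v  -> v
    Nothing
      | x == y    -> defaultMatch s
      | otherwise -> defaultMismatch s

-- | Score two symbol sequences position by position. Extra symbols in the
-- longer sequence are ignored, because 'zipWith' stops at the shorter one.
scoreAligned :: UnigramScores -> [Int] -> [Int] -> Double
scoreAligned s xs ys = sum (zipWith (scorePair s) xs ys)
```

Keying the table on Int pairs rather than on text keeps lookups cheap once the symbols have been interned.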

Flags

Manual Flags

Name    Description                       Default
llvm    build the benchmark using LLVM    Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Versions: 0.0.0.1, 0.0.1.0, 0.0.2.0, 0.1.0.0, 0.1.1.0, 0.2.1.0
Change log: changelog.md
Dependencies: array (>=0.5 && <0.6), attoparsec (>=0.10 && <0.13), base (>4.7 && <4.9), bimaps (>=0.0.0 && <0.0.1), bytestring (>=0.10.4), deepseq (>=1.3 && <1.5), file-embed (>=0.0.6 && <0.0.9), hashable (>=1.2 && <1.3), hashtables (>=1.1 && <1.3), intern (>=0.9 && <0.10), stringable (>=0.1.2 && <0.2), system-filepath (>=0.4.9 && <0.5), text (>=0.11 && <1.3), unordered-containers (>=0.2.3 && <0.3), vector (>=0.10 && <0.11), vector-th-unbox (>=0.2 && <0.3)
License: BSD-3-Clause
Copyright: Christian Hoener zu Siederdissen, 2014-2015
Author: Christian Hoener zu Siederdissen
Maintainer: choener@bioinf.uni-leipzig.de
Category: Natural Language Processing
Home page: http://www.bioinf.uni-leipzig.de/~choener/
Source repo: head: git clone git://github.com/choener/NaturalLanguageAlphabets
Uploaded: by ChristianHoener at 2015-05-07T23:59:33Z
Reverse Dependencies: 2 direct, 0 indirect
Downloads: 4264 total (16 in the last 30 days)
Status: Docs available
Last success reported on 2015-05-08

Readme for NaturalLanguageAlphabets-0.0.1.0

Natural Language Alphabets

Efficient alphabet symbols. The symbols are interned and hashed. This is useful for k-gram scoring, where different sets of symbols carry different scores. IMMC symbols are represented internally as Ints in the range [0..], which makes it possible to use unboxed containers when handling them.
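
As a rough illustration of why Int-valued symbols fit unboxed containers, the sketch below counts symbol occurrences in a `UArray` from the array package, indexed directly by symbol id. The function names `symbolCounts` and `countOf` are hypothetical and not part of this package.

```haskell
import Data.Array.Unboxed (UArray, accumArray, (!))

-- | Count how often each symbol id occurs. 'n' is the alphabet size, so
-- valid ids are 0 .. n-1; an id outside that range raises an error.
-- The result is an unboxed array of Ints.
symbolCounts :: Int -> [Int] -> UArray Int Int
symbolCounts n ids = accumArray (+) 0 (0, n - 1) [(i, 1) | i <- ids]

-- | Constant-time lookup of a count by symbol id.
countOf :: UArray Int Int -> Int -> Int
countOf counts i = counts ! i
```

An unboxed array stores its elements without per-element pointers, which is only possible because the index and the payload are primitive types such as Int or Double.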

Contact

Christian Hoener zu Siederdissen, choener@bioinf.uni-leipzig.de, Leipzig University, Leipzig, Germany