The NaturalLanguageAlphabets package

[Tags: bsd3, library]

Provides different encodings for characters and words in natural language processing. A character will often be encoded as a Unicode text string, because we deal with multi-symbol characters.

The internal encoding of IMMC symbols is a 0-based integer, which allows for the use of unboxed containers.
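
The idea of mapping multi-symbol characters to 0-based integers can be sketched as follows. This is a minimal illustration, not the package's API: the names `InternTable`, `intern`, `internAll`, and `toUnboxed` are hypothetical, and a plain `Data.Map` stands in for whatever interning machinery the package actually uses.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import Data.Array.Unboxed

-- Hypothetical intern table (not the package's type): maps each
-- distinct symbol to a 0-based Int, handing out ids in order of
-- first appearance.
data InternTable = InternTable
  { nextId  :: !Int
  , symbols :: !(M.Map T.Text Int)
  }

emptyTable :: InternTable
emptyTable = InternTable 0 M.empty

-- Intern one symbol, returning its id and the (possibly extended) table.
intern :: T.Text -> InternTable -> (Int, InternTable)
intern s tbl@(InternTable n m) =
  case M.lookup s m of
    Just i  -> (i, tbl)
    Nothing -> (n, InternTable (n + 1) (M.insert s n m))

-- Intern a whole sequence, threading the table from left to right.
internAll :: [T.Text] -> InternTable -> ([Int], InternTable)
internAll []     tbl = ([], tbl)
internAll (s:ss) tbl =
  let (i, tbl')   = intern s tbl
      (is, tbl'') = internAll ss tbl'
  in  (i : is, tbl'')

-- Because ids are plain Ints, a sequence fits in an unboxed array.
toUnboxed :: [Int] -> UArray Int Int
toUnboxed is = listArray (0, length is - 1) is
```

Loaded in GHCi, `internAll (map T.pack ["sch", "a", "sch"]) emptyTable` assigns the same id to both occurrences of the multi-symbol character `sch`, and the resulting id list can be stored unboxed with `toUnboxed`.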

A very simple unigram-based scoring scheme is also provided.
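
As a rough sketch of what a unigram-based scoring scheme could look like, the following scores pairs of interned symbols by table lookup with a fallback value. Everything here is an assumption made for illustration: the type `UnigramScores`, its fields, and the functions `scorePair` and `scoreAligned` are hypothetical and are not taken from the package.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical scoring table over interned symbol ids (0-based Ints).
data UnigramScores = UnigramScores
  { pairScores   :: M.Map (Int, Int) Double  -- explicit scores for symbol pairs
  , defaultScore :: Double                   -- used when a pair is not listed
  }

-- Score one pair of symbols, falling back to the default score.
scorePair :: UnigramScores -> Int -> Int -> Double
scorePair sc x y = M.findWithDefault (defaultScore sc) (x, y) (pairScores sc)

-- Score two sequences position by position.
-- zipWith stops at the end of the shorter sequence.
scoreAligned :: UnigramScores -> [Int] -> [Int] -> Double
scoreAligned sc xs ys = sum (zipWith (scorePair sc) xs ys)
```

Keeping the default score separate from the table means only the interesting pairs need to be listed explicitly.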



Properties

Versions: 0.0.0.1, 0.0.1.0
Change log: changelog.md
Dependencies: array (==0.5.*), attoparsec (>=0.10 && <0.13), base (>4.7 && <4.9), bimaps (==0.0.0.*), bytestring (>=0.10.4), deepseq (>=1.3 && <1.5), file-embed (>=0.0.6 && <0.0.9), hashable (==1.2.*), hashtables (>=1.1 && <1.3), intern (==0.9.*), stringable (>=0.1.2 && <0.2), system-filepath (>=0.4.9 && <0.5), text (>=0.11 && <1.3), unordered-containers (>=0.2.3 && <0.3), vector (==0.10.*), vector-th-unbox (==0.2.*)
License: BSD3
Copyright: Christian Hoener zu Siederdissen, 2014-2015
Author: Christian Hoener zu Siederdissen
Maintainer: choener@bioinf.uni-leipzig.de
Stability: experimental
Category: Natural Language Processing
Home page: http://www.bioinf.uni-leipzig.de/~choener/
Source repository: head: git clone git://github.com/choener/NaturalLanguageAlphabets
Uploaded: Thu May 7 23:59:33 UTC 2015 by ChristianHoener
Distributions: NixOS:0.0.1.0
Downloads: 309 total (25 in last 30 days)
Status: Docs available
Last success reported on 2015-05-08


Flags

Name  Description                     Default   Type
llvm  build the benchmark using LLVM  Disabled  Manual

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.


Readme for NaturalLanguageAlphabets-0.0.1.0

Natural Language Alphabets


Efficient alphabet symbols. The symbols are interned and hashed. This is quite useful for k-gram scoring, where we have different sets of symbols with different scores. IMMC symbols are internally represented as Ints in the range [0..]. This makes it possible to use unboxed containers when handling IMMC symbols.
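
To illustrate why Int-encoded symbols are convenient for k-gram work, here is a small sketch that extracts and counts k-grams over a sequence of symbol ids. The function names `kgrams` and `kgramCounts` are hypothetical and not part of the package; the sketch assumes the input has already been interned to Ints.

```haskell
import qualified Data.Map.Strict as M
import Data.List (tails)

-- All contiguous k-grams of a sequence of interned symbol ids.
-- Suffixes shorter than k are dropped; a non-positive k yields nothing.
kgrams :: Int -> [Int] -> [[Int]]
kgrams k xs
  | k <= 0    = []
  | otherwise = [ g | t <- tails xs, let g = take k t, length g == k ]

-- Count how often each k-gram occurs in the sequence.
kgramCounts :: Int -> [Int] -> M.Map [Int] Int
kgramCounts k = M.fromListWith (+) . map (\g -> (g, 1)) . kgrams k
```

Since each k-gram is just a short list of Ints, comparing or hashing it is cheap compared with comparing the underlying text strings.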

Contact

Christian Hoener zu Siederdissen
choener@bioinf.uni-leipzig.de
Leipzig University, Leipzig, Germany