Safe Haskell	None
Language	Haskell2010

NLP.Tokenize.Annotations

Documentation

Create a tokenizer that protects the provided terms, so that multi-word terms are kept together as single tokens.
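
The idea of protecting terms can be illustrated with a minimal, self-contained sketch. It does not use this module's API: the `Piece` type and the `protect` function are hypothetical names introduced only for illustration, and the sketch works on `String` with exact, case-sensitive matching, which may differ from what the library does.

```haskell
import Data.List (isPrefixOf, sortOn)
import Data.Ord (Down (..))

-- Hypothetical token type (an assumption, not the library's type):
-- Frozen text is final, Open text may still be split by later tokenizers.
data Piece = Frozen String | Open String
  deriving (Eq, Show)

-- Scan the input and wrap every occurrence of a protected term in Frozen.
protect :: [String] -> String -> [Piece]
protect terms = go ""
  where
    -- Try longer terms first so "New York City" wins over "New York".
    ordered = sortOn (Down . length) (filter (not . null) terms)

    -- acc holds, in reverse, the unprotected text seen since the last match.
    go acc [] = flush acc
    go acc input@(c : rest) =
      case filter (`isPrefixOf` input) ordered of
        (t : _) -> flush acc ++ Frozen t : go "" (drop (length t) input)
        []      -> go (c : acc) rest

    -- Emit the pending unprotected text, if there is any.
    flush acc = [Open (reverse acc) | not (null acc)]
```

For example, `protect ["New York"] "I love New York today"` yields `[Open "I love ", Frozen "New York", Open " today"]`. The sketch also matches terms inside longer words; a real implementation would likely check word boundaries.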

Tokenize on whitespace, as defined by '\ch -> Char.isSeparator ch || Char.isSpace ch'

Split common contractions off and freeze them. Currently handles: 'm, 's, 'd, 've, 'll, and negations (n't).
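
A rough sketch of contraction splitting follows. It is not this module's implementation: `Piece`, `suffixes`, and `splitContraction` are hypothetical names, and the sketch assumes that only the split-off suffix is frozen while the stem stays open. The library may treat the stem differently.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical token type (an assumption, not the library's type):
-- Frozen text is final, Open text may still be split by later tokenizers.
data Piece = Frozen String | Open String
  deriving (Eq, Show)

-- The contraction suffixes listed in the description above.
suffixes :: [String]
suffixes = ["n't", "'m", "'s", "'d", "'ve", "'ll"]

-- Split one word into its stem and a frozen contraction suffix, if any.
-- A word that consists of nothing but the suffix is left untouched.
splitContraction :: String -> [Piece]
splitContraction w =
  case [sfx | sfx <- suffixes, sfx `isSuffixOf` w, length w > length sfx] of
    (sfx : _) -> [Open (take (length w - length sfx) w), Frozen sfx]
    []        -> [Open w]
```

With these definitions, `splitContraction "don't"` gives `[Open "do", Frozen "n't"]` and `splitContraction "cat"` gives `[Open "cat"]`.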

Tokenize on characters that satisfy the provided predicate.
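
Predicate-based splitting can be sketched in a few lines of plain Haskell. The names `splitWhere` and `whitespaceSplit` are hypothetical and not part of this module. The sketch assumes that separator characters are dropped and that empty tokens are never produced, which may not match the library's behaviour.

```haskell
import Data.Char (isSeparator, isSpace)

-- Split a string at every character that satisfies the predicate.
-- Runs of matching characters are collapsed, so no empty tokens appear.
splitWhere :: (Char -> Bool) -> String -> [String]
splitWhere p s =
  case dropWhile p s of
    "" -> []
    s' ->
      let (tok, rest) = break p s'
       in tok : splitWhere p rest

-- The whitespace case, using the same predicate as the description above.
whitespaceSplit :: String -> [String]
whitespaceSplit = splitWhere (\ch -> isSeparator ch || isSpace ch)
```

For instance, `splitWhere (== ',') "a,b,,c"` gives `["a","b","c"]`, and `whitespaceSplit "to be\tor  not"` gives `["to","be","or","not"]`.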