Safe Haskell | None |
Language | Haskell2010 |
Data types representing the POS tags and Chunk tags derived from the Conll2000 training corpus.
Named entity categories defined for the Conll 2003 task.
Phrase chunk tags defined for the Conll task.
tagTxtPatterns :: [(Text, Text)]
Order matters here: the patterns are applied in reverse order when converting in one direction between tags and their text form, and top-to-bottom when converting in the other.
reversePatterns :: [(Text, Text)]
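As a rough illustration of why the order of such a pattern list matters, the sketch below applies a list of (needle, replacement) pairs with `Data.Text.replace`, once top-to-bottom and once bottom-to-top. The names `examplePatterns`, `exampleReversed`, `chained`, `replaceTopToBottom` and `replaceBottomToTop` are hypothetical, and the single `("$", "dollar")` pair is an assumption based on the `PRPdollar` and `WPdollar` constructors; this is not the module's actual implementation.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for a pattern list: (text form, identifier-safe form).
examplePatterns :: [(Text, Text)]
examplePatterns = [("$", "dollar")]

-- Hypothetical stand-in for the reversed list: every pair swapped.
exampleReversed :: [(Text, Text)]
exampleReversed = map (\(from, to) -> (to, from)) examplePatterns

-- Two overlapping patterns, used only to show that order changes the result.
chained :: [(Text, Text)]
chained = [("a", "b"), ("b", "c")]

-- Apply every pair in list order, first element first.
replaceTopToBottom :: [(Text, Text)] -> Text -> Text
replaceTopToBottom pats txt =
  foldl (\acc (from, to) -> T.replace from to acc) txt pats

-- Apply the same pairs starting from the last element of the list.
replaceBottomToTop :: [(Text, Text)] -> Text -> Text
replaceBottomToTop pats = replaceTopToBottom (reverse pats)
```

With these definitions, `replaceTopToBottom examplePatterns "PRP$"` gives `"PRPdollar"` and `replaceTopToBottom exampleReversed "PRPdollar"` gives `"PRP$"`. For the overlapping list, `replaceTopToBottom chained "a"` gives `"c"` while `replaceBottomToTop chained "a"` gives `"b"`, which is why the two directions cannot share an arbitrary ordering.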
These tags may actually be the Penn Treebank tags. But I have not (yet?) seen the punctuation tags added to the Penn set.
This particular list was compiled from the union of:
- All tags used in the Conll2000 training corpus (contributing the punctuation tags).
- The Penn Treebank tags, listed here: (which contributed LS over the items in the corpus).
- The tags: START, END, and Unk, which are used by Chatter.
START | START tag, used in training. |
END | END tag, used in training. |
Hash | # |
Dollar | $ |
CloseDQuote | '' |
OpenDQuote | `` |
Op_Paren | ( |
Cl_Paren | ) |
Comma | , |
Term | . (sentence terminator)
Colon | : |
CC | Coordinating conjunction |
CD | Cardinal number |
DT | Determiner |
EX | Existential there |
FW | Foreign word |
IN | Preposition or subordinating conjunction |
JJ | Adjective |
JJR | Adjective, comparative |
JJS | Adjective, superlative |
LS | List item marker |
MD | Modal |
NN | Noun, singular or mass |
NNS | Noun, plural |
NNP | Proper noun, singular |
NNPS | Proper noun, plural |
PDT | Predeterminer |
POS | Possessive ending |
PRP | Personal pronoun |
PRPdollar | Possessive pronoun |
RB | Adverb |
RBR | Adverb, comparative |
RBS | Adverb, superlative |
RP | Particle |
SYM | Symbol |
TO | to |
UH | Interjection |
VB | Verb, base form |
VBD | Verb, past tense |
VBG | Verb, gerund or present participle |
VBN | Verb, past participle |
VBP | Verb, non-3rd person singular present |
VBZ | Verb, 3rd person singular present |
WDT | Wh-determiner |
WP | Wh-pronoun |
WPdollar | Possessive wh-pronoun |
WRB | Wh-adverb |
Unk |
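The constructor list above can be read as a mapping between Haskell identifiers and corpus tag text. The following is a minimal sketch of that idea over a small subset of the constructors; `MiniTag`, `miniTagText` and `parseMiniTag` are hypothetical names, and falling back to `Unk` for unrecognised text is an assumption rather than documented behaviour of this module.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Maybe (fromMaybe)
import Data.Text (Text)
import qualified Data.Text as T

-- A hypothetical subset of the constructors listed above.
data MiniTag = Hash | Dollar | Comma | Term | NN | PRPdollar | WPdollar | Unk
  deriving (Show, Eq, Enum, Bounded)

-- Render a tag as corpus text. Punctuation constructors are special-cased;
-- the remaining names are shown and "dollar" is rewritten to "$".
miniTagText :: MiniTag -> Text
miniTagText Hash   = "#"
miniTagText Dollar = "$"
miniTagText Comma  = ","
miniTagText Term   = "."
miniTagText t      = T.replace "dollar" "$" (T.pack (show t))

-- Parse corpus text by looking it up among the rendered forms of all
-- constructors. Unrecognised text maps to Unk (an assumption of this sketch).
parseMiniTag :: Text -> MiniTag
parseMiniTag txt = fromMaybe Unk (lookup txt table)
  where
    table = [(miniTagText t, t) | t <- [minBound .. maxBound]]
```

Building the parse table from `miniTagText` keeps the two directions consistent by construction: for every constructor `t` in this subset, `parseMiniTag (miniTagText t)` returns `t`.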