Safe Haskell | None |
---|
Data types representing the POS tags and Chunk tags derived from the Conll2000 training corpus.
- data NERTag
- data Chunk
- readTag :: Text -> Either Error Tag
- tagTxtPatterns :: [(Text, Text)]
- reversePatterns :: [(Text, Text)]
- showTag :: Tag -> Text
- replaceAll :: [(Text, Text)] -> Text -> Text
- data Tag
- = START
- | END
- | Hash
- | Dollar
- | CloseDQuote
- | OpenDQuote
- | Op_Paren
- | Cl_Paren
- | Comma
- | Term
- | Colon
- | CC
- | CD
- | DT
- | EX
- | FW
- | IN
- | JJ
- | JJR
- | JJS
- | LS
- | MD
- | NN
- | NNS
- | NNP
- | NNPS
- | PDT
- | POS
- | PRP
- | PRPdollar
- | RB
- | RBR
- | RBS
- | RP
- | SYM
- | TO
- | UH
- | VB
- | VBD
- | VBG
- | VBN
- | VBP
- | VBZ
- | WDT
- | WP
- | WPdollar
- | WRB
- | Unk
Documentation
data NERTag
Named entity categories defined for the Conll 2003 task.
data Chunk
Phrase chunk tags defined for the Conll task.
tagTxtPatterns :: [(Text, Text)]
Order matters here: the patterns are applied in reverse order in one direction of the tag/text conversion and top-to-bottom in the other.
reversePatterns :: [(Text, Text)]
replaceAll :: [(Text, Text)] -> Text -> Text
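The page gives only the signatures of these three values, so the following is a minimal sketch of how such a pattern table might be used, not the module's actual code. The names `examplePatterns`, `exampleReverse`, and `replaceAllSketch` are hypothetical, and the single `("$", "dollar")` pair is an assumption inferred from constructor names such as `PRPdollar` and `WPdollar`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical pattern table (an assumption, not the library's real list):
-- maps raw corpus text to text that is legal in a constructor name.
examplePatterns :: [(Text, Text)]
examplePatterns = [("$", "dollar")]

-- The same table with each pair swapped, for turning a constructor
-- name back into corpus text.
exampleReverse :: [(Text, Text)]
exampleReverse = [ (safe, raw) | (raw, safe) <- examplePatterns ]

-- Apply every (needle, replacement) pair in list order, top to bottom.
-- This ordering is a choice made for the sketch.
replaceAllSketch :: [(Text, Text)] -> Text -> Text
replaceAllSketch patterns txt =
  foldl (\acc (needle, rep) -> T.replace needle rep acc) txt patterns
```

Under these assumptions, `replaceAllSketch examplePatterns "PRP$"` rewrites the corpus tag into the constructor-style name, and `replaceAllSketch exampleReverse "WPdollar"` goes the other way. With more than one pattern, the order of the list decides which replacement sees the text first, which is why the ordering note above matters.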
data Tag
These tags may actually be the Penn Treebank tags. But I have not (yet?) seen the punctuation tags added to the Penn set.
This particular list was compiled from the union of:
- All tags used in the Conll2000 training corpus (contributing the punctuation tags).
- The PennTreebank tags, listed here: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html (which contributed LS beyond the tags found in the corpus).
- The tags START, END, and Unk, which are used by Chatter.
START | START tag, used in training. |
END | END tag, used in training. |
Hash | # |
Dollar | $ |
CloseDQuote | '' |
OpenDQuote | `` |
Op_Paren | ( |
Cl_Paren | ) |
Comma | , |
Term | . Sentence Terminator |
Colon | : |
CC | Coordinating conjunction |
CD | Cardinal number |
DT | Determiner |
EX | Existential there |
FW | Foreign word |
IN | Preposition or subordinating conjunction |
JJ | Adjective |
JJR | Adjective, comparative |
JJS | Adjective, superlative |
LS | List item marker |
MD | Modal |
NN | Noun, singular or mass |
NNS | Noun, plural |
NNP | Proper noun, singular |
NNPS | Proper noun, plural |
PDT | Predeterminer |
POS | Possessive ending |
PRP | Personal pronoun |
PRPdollar | Possessive pronoun |
RB | Adverb |
RBR | Adverb, comparative |
RBS | Adverb, superlative |
RP | Particle |
SYM | Symbol |
TO | to |
UH | Interjection |
VB | Verb, base form |
VBD | Verb, past tense |
VBG | Verb, gerund or present participle |
VBN | Verb, past participle |
VBP | Verb, non-3rd person singular present |
VBZ | Verb, 3rd person singular present |
WDT | Wh-determiner |
WP | Wh-pronoun |
WPdollar | Possessive wh-pronoun |
WRB | Wh-adverb |
Unk |
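The table above can be read as a mapping between constructors and corpus text. As a rough illustration of how `readTag` and `showTag` might behave, here is a sketch over a hypothetical seven-constructor subset called `MiniTag`. The function names, the `Either Text` error type (the real `Error` type is not shown on this page), and the fallback through `Show`/`Read` are all assumptions rather than the module's implementation.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text (Text)
import qualified Data.Text as T
import           Text.Read (readMaybe)

-- Hypothetical subset of the Tag constructors documented above.
data MiniTag = Hash | Dollar | Comma | Term | NN | PRPdollar | Unk
  deriving (Show, Read, Eq)

-- Render a tag as corpus text. Punctuation tags are special-cased;
-- for the rest, "dollar" in the constructor name is mapped back to "$".
showMiniTag :: MiniTag -> Text
showMiniTag Hash   = "#"
showMiniTag Dollar = "$"
showMiniTag Comma  = ","
showMiniTag Term   = "."
showMiniTag tag    = T.replace "dollar" "$" (T.pack (show tag))

-- Parse corpus text into a tag. Punctuation is matched first so that
-- a bare "$" becomes Dollar instead of being rewritten to "dollar".
-- Unrecognised input is reported as Left (error type assumed to be Text).
readMiniTag :: Text -> Either Text MiniTag
readMiniTag "#" = Right Hash
readMiniTag "$" = Right Dollar
readMiniTag "," = Right Comma
readMiniTag "." = Right Term
readMiniTag txt =
  case readMaybe (T.unpack (T.replace "$" "dollar" txt)) of
    Just tag -> Right tag
    Nothing  -> Left ("unknown tag: " <> txt)
```

In this sketch `readMiniTag (showMiniTag t)` returns `Right t` for every constructor of `MiniTag`, which is the round-trip property one would expect from a read/show pair over tag text.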