This module aims to make tagging text with parts of speech trivially easy.
If you're new to chatter and POS-tagging, then I suggest you simply try:
tagger <- defaultTagger
tagStr tagger "This is a sample sentence."
"This/dt is/bez a/at sample/nn sentence/nn ./."
Note that we used tagStr instead of tag or tagText. Many people don't (yet!) use Data.Text by default, so there is a wrapper around tag that packs and unpacks the String. This is inefficient, but it's just to get you started, and tagStr can be very handy when you're debugging a tagger in ghci (or cabal repl).
- tag :: POSTagger -> Text -> [TaggedSentence]
- tagStr :: POSTagger -> String -> String
- tagText :: POSTagger -> Text -> Text
- train :: POSTagger -> [TaggedSentence] -> IO POSTagger
- trainStr :: POSTagger -> String -> IO POSTagger
- trainText :: POSTagger -> Text -> IO POSTagger
- eval :: POSTagger -> [TaggedSentence] -> Double
- serialize :: POSTagger -> ByteString
- deserialize :: Map ByteString (ByteString -> Maybe POSTagger -> Either String POSTagger) -> ByteString -> Either String POSTagger
- taggerTable :: Map ByteString (ByteString -> Maybe POSTagger -> Either String POSTagger)
- saveTagger :: POSTagger -> FilePath -> IO ()
- loadTagger :: FilePath -> IO POSTagger
- defaultTagger :: IO POSTagger
Tag a chunk of input text with part-of-speech tags, using the
sentence splitter, tokenizer, and tagger contained in the POSTagger.
Tag the tokens in a string.
Returns a space-separated string of tokens, each token suffixed with the part of speech. For example:
tagStr tagger "the dog jumped ."
"the/at dog/nn jumped/vbd ./."
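The output format described above can be sketched with plain lists. This is a minimal illustration using only base, not the library's own code: the `TaggedToken` alias and `renderSentence` are hypothetical names, and the real module works with `TaggedSentence` values rather than pairs of strings.

```haskell
-- Hypothetical stand-in for a tagged token: (token, tag).
type TaggedToken = (String, String)

-- Render one sentence as space-separated "token/tag" items.
renderSentence :: [TaggedToken] -> String
renderSentence = unwords . map (\(tok, tg) -> tok ++ "/" ++ tg)
```

Applied to the pairs for "the dog jumped .", this produces the same shape of string as the example above.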
Train a POSTagger on a corpus of sentences.
This will recurse through the
POSTagger stack, training all the
backoff taggers as well. In order to do that, this function has to
be generic to the kind of taggers used, so it is not possible to
train up a new POSTagger from nothing:
train wouldn't know what
tagger to create.
To get around that restriction, you can use the various mkTagger
implementations, such as
NLP.POS.AvgPerceptronTagger.mkTagger. For example:
import NLP.POS.AvgPerceptronTagger as APT

let newTagger = APT.mkTagger APT.emptyPerceptron Nothing
posTgr <- train newTagger trainingExamples
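To illustrate why training needs an existing tagger and how it can recurse through a backoff stack, here is a toy model using only base. Everything in it is hypothetical (`ToyTagger`, `trainToy`, `tagToken`); the real `train` runs in IO and the real `POSTagger` is more involved, so read this as a sketch of the idea only.

```haskell
-- Hypothetical toy tagger: a token-to-tag lookup list plus an optional backoff.
data ToyTagger = ToyTagger
  { lexicon :: [(String, String)]
  , backoff :: Maybe ToyTagger
  }

-- Training updates this tagger, then recurses into its backoff (if any).
-- It takes an existing tagger because it cannot decide which kind to build.
trainToy :: ToyTagger -> [[(String, String)]] -> ToyTagger
trainToy tgr corpus = tgr
  { lexicon = concat corpus ++ lexicon tgr
  , backoff = fmap (\b -> trainToy b corpus) (backoff tgr)
  }

-- Look a token up, deferring to the backoff when this tagger has no answer.
tagToken :: ToyTagger -> String -> Maybe String
tagToken tgr tok =
  case lookup tok (lexicon tgr) of
    Just tg -> Just tg
    Nothing -> backoff tgr >>= \b -> tagToken b tok
```

The caller builds the empty stack first, for instance `ToyTagger [] (Just (ToyTagger [] Nothing))`, and only then hands it to `trainToy`, which mirrors the mkTagger-then-train pattern shown above.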
Train a tagger on string input in the standard form for POS tagged corpora:
trainStr tagger "the/at dog/nn jumped/vbd ./."
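The "token/tag" corpus form used here can be read back into pairs with a few lines of base Haskell. The sketch below is an assumption about how such input might be parsed, not the library's parser: `parseToken` and `parseSentence` are hypothetical names, and splitting at the last slash is a choice made here so that tokens containing a slash still parse.

```haskell
-- Split one "token/tag" item at its last slash.
-- Returns Nothing when there is no slash or either side is empty.
parseToken :: String -> Maybe (String, String)
parseToken s =
  case break (== '/') (reverse s) of
    (revTag, '/' : revTok)
      | not (null revTag), not (null revTok) ->
          Just (reverse revTok, reverse revTag)
    _ -> Nothing

-- Parse a whitespace-separated sentence; fails if any item is malformed.
parseSentence :: String -> Maybe [(String, String)]
parseSentence = traverse parseToken . words
```

Splitting at the last slash means the period token in "./." comes out as the pair of token "." and tag ".".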
Measures accuracy over all tags in the test corpus.
Accuracy is calculated as:
|tokens tagged correctly| / |all tokens|
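The accuracy formula can be sketched over plain lists as follows. This is an illustration only: `accuracy` is a hypothetical helper, it assumes the gold and predicted corpora share the same tokenization, and returning 0 for an empty corpus is a choice made here rather than documented behavior of eval.

```haskell
-- Hypothetical stand-in for a tagged token: (token, tag).
type TaggedToken = (String, String)

-- Fraction of tokens whose predicted tag matches the gold tag.
-- Assumes both corpora contain the same tokens in the same order.
accuracy :: [[TaggedToken]] -> [[TaggedToken]] -> Double
accuracy gold predicted
  | total == 0 = 0
  | otherwise  = fromIntegral correct / fromIntegral total
  where
    pairs   = zip (concat gold) (concat predicted)
    total   = length pairs
    correct = length (filter (\((_, g), (_, p)) -> g == p) pairs)
```

With four gold tokens and one mismatched prediction, three of four tags agree, so the result is 0.75.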
The default table of tagger IDs to readTagger functions. Each tagger packaged with Chatter should have an entry here. By convention, the IDs used are the fully qualified module name of the tagger package.
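The idea of dispatching on a tagger ID can be sketched with an association list. This is a simplified model, not the library's implementation: the real table is a Map keyed by ByteString whose readers also take a Maybe POSTagger, while `ToyTagger`, `TagReader`, `toyTable`, `readWith`, and the two IDs below are all hypothetical.

```haskell
-- Hypothetical tagger value, just enough to show which reader ran.
newtype ToyTagger = ToyTagger String
  deriving (Eq, Show)

-- A reader turns a serialized payload into a tagger or an error message.
type TagReader = String -> Either String ToyTagger

-- Table from tagger ID (by convention a module name) to its reader.
toyTable :: [(String, TagReader)]
toyTable =
  [ ("Hypothetical.TaggerA", \payload -> Right (ToyTagger ("A:" ++ payload)))
  , ("Hypothetical.TaggerB", \payload -> Right (ToyTagger ("B:" ++ payload)))
  ]

-- Pick the reader for the given ID and apply it to the payload.
readWith :: [(String, TagReader)] -> String -> String -> Either String ToyTagger
readWith table tagId payload =
  case lookup tagId table of
    Nothing     -> Left ("unknown tagger ID: " ++ tagId)
    Just reader -> reader payload
```

Keeping the table as a plain value means a caller can extend it with an entry for a custom tagger before deserializing.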