chatter-0.5.2.0: A library of simple NLP algorithms.

Copyright: Rogan Creswick, 2014
Maintainer: creswick@gmail.com
Stability: experimental
Safe Haskell: None
Language: Haskell2010

NLP.POS

Description

This module aims to make tagging text with parts of speech trivially easy.

If you're new to chatter and POS-tagging, then I suggest you simply try:

>>> tagger <- defaultTagger
>>> tagStr tagger "This is a sample sentence."
"This/dt is/bez a/at sample/nn sentence/nn ./."

Note that we used tagStr instead of tag or tagText. Many people don't (yet!) use Data.Text by default, so there is a wrapper around tag that packs and unpacks the String. This is inefficient, but it's just to get you started, and tagStr can be very handy when you're debugging a tagger in ghci (or cabal repl).

tag exposes more details of the tokenization and tagging, since it returns a list of TaggedSentences, but it doesn't print results as nicely.

Documentation

tag :: Tag t => POSTagger t -> Text -> [TaggedSentence t]

Tag a chunk of input text with part-of-speech tags, using the sentence splitter, tokenizer, and tagger contained in the POSTagger.

tagStr :: Tag t => POSTagger t -> String -> String

Tag the tokens in a string.

Returns a space-separated string of tokens, each token suffixed with the part of speech. For example:

>>> tagStr tagger "the dog jumped ."
"the/at dog/nn jumped/vbd ./."

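To make the output format concrete, here is a minimal sketch of how token and tag pairs map onto that string. It uses only base. renderTagged is a hypothetical helper for illustration, not part of chatter, and plain String pairs stand in for the library's tagged-sentence types.

```haskell
-- | Render (token, tag) pairs in the "token/tag" form shown above.
-- Hypothetical helper; chatter's own rendering may differ in detail.
renderTagged :: [(String, String)] -> String
renderTagged = unwords . map (\(tok, tg) -> tok ++ "/" ++ tg)

main :: IO ()
main = putStrLn (renderTagged [("the", "at"), ("dog", "nn"), ("jumped", "vbd"), (".", ".")])
-- prints: the/at dog/nn jumped/vbd ./.
```
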
tagText :: Tag t => POSTagger t -> Text -> Text

The Text version of tagStr.

train :: Tag t => POSTagger t -> [TaggedSentence t] -> IO (POSTagger t)

Train a POSTagger on a corpus of sentences.

This will recurse through the POSTagger stack, training all the backoff taggers as well. In order to do that, this function has to be generic to the kind of taggers used, so it is not possible to train up a new POSTagger from nothing: train wouldn't know what tagger to create.

To get around that restriction, you can use one of the mkTagger implementations provided by the individual tagger modules, such as NLP.POS.AvgPerceptronTagger.mkTagger. For example:

import NLP.POS.AvgPerceptronTagger as APT

let newTagger = APT.mkTagger APT.emptyPerceptron Nothing
posTgr <- train newTagger trainingExamples

trainStr :: Tag t => POSTagger t -> String -> IO (POSTagger t)

Train a tagger on string input in the standard form for POS tagged corpora:

trainStr tagger "the/at dog/nn jumped/vbd ./."
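
As a rough illustration of that corpus format, the sketch below splits such a string into (token, tag) pairs using only base. parseTagged is a hypothetical helper, not chatter's parser. It assumes that the last slash in each whitespace-separated item separates the token from its tag, and that an item without a slash has no tag. The library may treat these edge cases differently.

```haskell
-- | Split "token/tag" items into (token, tag) pairs.
-- Assumption: the LAST '/' in each item is the separator, so a token
-- that itself contains a slash (e.g. "1/2/cd") keeps it.
parseTagged :: String -> [(String, String)]
parseTagged = map splitLast . words
  where
    splitLast w =
      case break (== '/') (reverse w) of
        (revTag, '/' : revTok) -> (reverse revTok, reverse revTag)
        _                      -> (w, "")  -- no slash: leave the item untagged

main :: IO ()
main = mapM_ print (parseTagged "the/at dog/nn jumped/vbd ./.")
```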

trainText :: Tag t => POSTagger t -> Text -> IO (POSTagger t)

The Text version of trainStr

eval :: Tag t => POSTagger t -> [TaggedSentence t] -> Double

Evaluate a POSTagger.

Measures accuracy over all tags in the test corpus.

Accuracy is calculated as:

|tokens tagged correctly| / |all tokens|
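
The formula can be sketched in a few lines of base-only Haskell. This is an illustration of the metric, not the implementation of eval: accuracy is a hypothetical function, each sentence is represented as a plain list of tags, and the gold and predicted sentences are assumed to be aligned token for token (zip silently drops any unmatched tail).

```haskell
-- | Token-level accuracy: correctly tagged tokens over all tokens.
-- Assumes gold and predicted sentences are aligned token for token.
accuracy :: Eq t => [[t]] -> [[t]] -> Double
accuracy gold predicted
  | total == 0 = 0  -- assumption: report 0 for an empty test corpus
  | otherwise  = fromIntegral correct / fromIntegral total
  where
    pairs   = concat (zipWith zip gold predicted)
    total   = length pairs
    correct = length (filter (uncurry (==)) pairs)

main :: IO ()
main = print (accuracy [["at", "nn", "vbd", "."]] [["at", "nn", "nn", "."]])
-- prints: 0.75
```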

taggerTable :: Tag t => Map ByteString (ByteString -> Maybe (POSTagger t) -> Either String (POSTagger t))

The default table of tagger IDs to readTagger functions. Each tagger packaged with Chatter should have an entry here. By convention, the IDs used are the fully qualified module names of the tagger packages.
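
The shape of such a table can be sketched with toy types. Everything below is hypothetical: ToyTagger, the Reader alias, readWith, and the IDs are made up for illustration, and String stands in for ByteString. The point is only the dispatch pattern, where an ID selects a reader function that receives the serialized payload and an optional backoff tagger.

```haskell
import qualified Data.Map as Map

-- | Hypothetical stand-in for a deserialized tagger: a payload plus an
-- optional backoff tagger.
data ToyTagger = ToyTagger String (Maybe ToyTagger)
  deriving (Eq, Show)

-- | A reader takes the serialized payload and an optional backoff tagger,
-- and either fails with a message or produces a tagger.
type Reader = String -> Maybe ToyTagger -> Either String ToyTagger

-- | Table of tagger IDs to reader functions. The IDs are invented, but
-- follow the fully-qualified-module-name convention.
readerTable :: Map.Map String Reader
readerTable = Map.fromList
  [ ("Example.Tagger.Literal", \payload backoff -> Right (ToyTagger payload backoff))
  , ("Example.Tagger.Strict", \payload backoff ->
        if null payload
          then Left "empty payload"
          else Right (ToyTagger payload backoff))
  ]

-- | Look up the reader for an ID and apply it, or report an unknown ID.
readWith :: String -> String -> Maybe ToyTagger -> Either String ToyTagger
readWith taggerId payload backoff =
  case Map.lookup taggerId readerTable of
    Nothing     -> Left ("unknown tagger ID: " ++ taggerId)
    Just reader -> reader payload backoff

main :: IO ()
main = do
  print (readWith "Example.Tagger.Literal" "model-bytes" Nothing)
  print (readWith "Example.Tagger.Unknown" "model-bytes" Nothing)
```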

saveTagger :: Tag t => POSTagger t -> FilePath -> IO ()

Store a POSTagger to a file.

loadTagger :: Tag t => FilePath -> IO (POSTagger t)

Load a tagger, using the internal taggerTable. If you need to specify your own mappings for new composite taggers, you should use deserialize.

This function checks the filename to determine whether the content should be decompressed. If the filename ends with ".gz", then we assume it is a gzipped model.
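
A filename check of this kind might look like the following base-only sketch. isGzipped is a hypothetical name, and the case-sensitive suffix match is an assumption. The library's actual check may differ.

```haskell
import Data.List (isSuffixOf)

-- | Decide from the filename alone whether a model looks gzipped.
-- Assumption: the match is case-sensitive, so "model.GZ" is not detected.
isGzipped :: FilePath -> Bool
isGzipped path = ".gz" `isSuffixOf` path

main :: IO ()
main = mapM_ (print . isGzipped) ["model.pos.gz", "model.pos"]
-- prints: True, then False
```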

defaultTagger :: IO (POSTagger Tag)

A basic POS tagger.

conllTagger :: IO (POSTagger Tag)

A POS tagger that has been trained on the CoNLL 2000 POS tags.

brownTagger :: IO (POSTagger Tag)

A POS tagger trained on a subset of the Brown corpus.