chatter-0.5.0.1: A library of simple NLP algorithms.

Copyright: Rogan Creswick, 2014
Maintainer: creswick@gmail.com
Stability: experimental
Safe Haskell: None
Language: Haskell2010

NLP.Chunk

Description

NLP.Chunk aims to make phrasal chunking trivially easy; it is the chunking counterpart to NLP.POS.

The simplest way to try out chunking with Chatter is to open a repl after installing chatter and try this:

> :set -XOverloadedStrings
> import NLP.POS
> import NLP.Chunk
> tgr <- defaultTagger
> chk <- defaultChunker
> chunkText tgr chk "Monads are monoids in the category of endofunctors."
"[NP Monads/NNS are/VBP monoids/NNS] [PP in/IN] [NP the/DT category/NN] [PP of/IN] [NP endofunctors/NNS] ./."

Note that the results aren't perfect: phrase chunking is tricky, and the defaultTagger and defaultChunker aren't trained on a particularly large data set (they use the CoNLL 2000 corpus). If you have the training data, you can easily train more taggers and chunkers using the APIs exposed here.
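The GHCi session above loads the models once and then reuses them. A minimal sketch of the same pattern in a Haskell source file might look like this; `chunkAll` is a hypothetical helper name, and the sketch assumes (as the GHCi example implies) that the tagger returned by `defaultTagger` from NLP.POS uses the POS tag type that `defaultChunker` expects.

```haskell
import qualified Data.Text as T
import NLP.Chunk (chunkText, defaultChunker)
import NLP.POS (defaultTagger)

-- Hypothetical helper: load the default tagger and chunker once, then
-- chunk every input separately. Loading the models is likely the
-- expensive step, so sharing them avoids reloading them per input.
chunkAll :: [T.Text] -> IO [T.Text]
chunkAll inputs = do
  tgr <- defaultTagger
  chk <- defaultChunker
  return (map (chunkText tgr chk) inputs)
```

Each element of the result is one formatted string in the same bracketed style as the example above.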

Documentation

defaultChunker :: IO (Chunker Chunk Tag)

A basic phrasal chunker.

conllChunker :: IO (Chunker Chunk Tag)

Convenience function to load the CoNLL 2000 chunker.

train :: (ChunkTag c, Tag t) => Chunker c t -> [ChunkedSentence c t] -> IO (Chunker c t)

Train a chunker on a set of additional examples.
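As a sketch of how train fits together with the other functions on this page, the hypothetical helper below starts from defaultChunker, trains it on extra examples, and saves the result with saveChunker. It assumes you already have examples as ChunkedSentence values for the same chunk and POS tag types that defaultChunker uses; building those values is not covered on this page.

```haskell
import NLP.Chunk (defaultChunker, saveChunker, train)

-- Hypothetical helper: train the bundled default chunker on extra,
-- already-chunked example sentences and save the tuned model to 'path'.
-- Inferred type:
--   [ChunkedSentence Chunk Tag] -> FilePath -> IO (Chunker Chunk Tag)
-- where Chunk and Tag are the chunk and POS tag types of defaultChunker.
retrainDefault examples path = do
  base  <- defaultChunker
  tuned <- train base examples   -- returns a new chunker; 'base' is unchanged
  saveChunker tuned path         -- persist it; reload later with loadChunker
  return tuned
```

When reloading the saved model, note that loadChunker returns a polymorphic `IO (Chunker c t)`, so the caller has to fix `c` and `t`, for example with a type annotation.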

chunk :: (ChunkTag c, Tag t) => Chunker c t -> [TaggedSentence t] -> [ChunkedSentence c t]

Chunk a list of TaggedSentences, as produced by a Chatter tagger, returning a rich representation of the chunks and tags detected.

If you just want to see chunked output from standard text, you probably want chunkText or chunkStr.
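If you want the structured ChunkedSentence values rather than formatted text, you can run the tagging and chunking stages yourself. This sketch assumes NLP.POS exports `tag :: Tag t => POSTagger t -> Text -> [TaggedSentence t]`, which tokenizes and POS-tags raw text; `tagAndChunk` and `chunkedSentenceCount` are hypothetical names.

```haskell
import qualified Data.Text as T
import NLP.Chunk (chunk, defaultChunker)
import NLP.POS (defaultTagger, tag)

-- Run the two stages that chunkText combines, keeping the structured result:
--   tag   turns raw text into a list of TaggedSentences (assumed API),
--   chunk groups the words of each tagged sentence into phrase chunks.
-- Inferred type: T.Text -> IO [ChunkedSentence Chunk Tag]
tagAndChunk txt = do
  tgr <- defaultTagger
  chk <- defaultChunker
  return (chunk chk (tag tgr txt))

-- Number of chunked sentences; assumes chunk yields one ChunkedSentence
-- per TaggedSentence it receives.
chunkedSentenceCount :: T.Text -> IO Int
chunkedSentenceCount txt = fmap length (tagAndChunk txt)
```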

chunkText :: (ChunkTag c, Tag t) => POSTagger t -> Chunker c t -> Text -> Text

Convenience function to tokenize, POS-tag, then chunk the provided text, and format the result in an easy-to-read form.

> tgr <- defaultTagger
> chk <- defaultChunker
> chunkText tgr chk "The brown dog jumped over the lazy cat."
"[NP The/DT brown/NN dog/NN] [VP jumped/VBD] [NP over/IN the/DT lazy/JJ cat/NN] ./."

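The formatted output wraps each chunk in square brackets together with its label and writes every token as word/TAG. If you need to post-process that string, a small parser like the one below can recover the groups. It is a hypothetical helper, not part of chatter, and it assumes words contain no whitespace or square brackets; tokens outside any chunk, such as the final ./., get the label "O" by this sketch's own convention.

```haskell
import Data.Char (isSpace)

-- Hypothetical type for one group of the formatted output.
data Piece = Piece
  { pieceLabel  :: String              -- chunk label, or "O" outside chunks
  , pieceTokens :: [(String, String)]  -- (word, POS tag) pairs
  } deriving (Show, Eq)

-- Parse the bracketed format produced by chunkText / chunkStr.
parseChunked :: String -> [Piece]
parseChunked s = case dropWhile isSpace s of
  "" -> []
  ('[' : rest) ->
    let (inside, after) = break (== ']') rest
        remaining = parseChunked (drop 1 after)  -- skip the closing ']'
    in case words inside of
         (label : toks) -> Piece label (map splitTok toks) : remaining
         []             -> remaining             -- tolerate an empty "[]"
  other ->
    let (tok, after) = break isSpace other
    in Piece "O" [splitTok tok] : parseChunked after

-- Split "dog/NN" at its last '/' into ("dog", "NN"), so that the
-- sentence-final "./." becomes (".", ".").
splitTok :: String -> (String, String)
splitTok t =
  let (revTag, revRest) = break (== '/') (reverse t)
  in (reverse (drop 1 revRest), reverse revTag)
```

Applied to the example output above, the labels come out as NP, VP, NP, O. Since chunkText returns Text, apply `Data.Text.unpack` first, or call chunkStr instead.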
chunkStr :: (ChunkTag c, Tag t) => POSTagger t -> Chunker c t -> String -> String

A wrapper around chunkText that works on Strings: it packs the input into Text and unpacks the result.
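In other words, chunkStr should behave like the composition sketched below. This illustrates the String/Text conversion and is not chatter's source; `chunkStr'` and `agreesWithChunkStr` are hypothetical names.

```haskell
import qualified Data.Text as T
import NLP.Chunk (chunkStr, chunkText, defaultChunker)
import NLP.POS (defaultTagger)

-- Sketch of the String wrapper: pack the input, chunk it, unpack the result.
-- Inferred type:
--   (ChunkTag c, Tag t) => POSTagger t -> Chunker c t -> String -> String
chunkStr' tgr chk = T.unpack . chunkText tgr chk . T.pack

-- Compare the sketch with the real chunkStr using the default models.
agreesWithChunkStr :: String -> IO Bool
agreesWithChunkStr s = do
  tgr <- defaultTagger
  chk <- defaultChunker
  return (chunkStr' tgr chk s == chunkStr tgr chk s)
```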

chunkerTable :: (ChunkTag c, Tag t) => Map ByteString (ByteString -> Either String (Chunker c t))

The default table mapping chunker IDs to chunker deserialization functions. Each chunker packaged with Chatter should have an entry here. By convention, the IDs used are the fully qualified module names of the chunker implementations.
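A sketch of how such a table can be used: look up the deserializer registered under an ID and apply it to serialized bytes. `readChunkerWith` and `emptyIdIsRejected` are hypothetical names, and the sketch deliberately avoids assuming any particular registered ID.

```haskell
import qualified Data.Map as Map
import NLP.Chunk (chunkerTable, defaultChunker)

-- Hypothetical helper: find the deserializer registered under a chunker ID
-- and apply it to serialized model bytes.
-- Inferred type:
--   (ChunkTag c, Tag t) => ByteString -> ByteString -> Either String (Chunker c t)
readChunkerWith ident bytes =
  case Map.lookup ident chunkerTable of
    Nothing   -> Left ("unknown chunker ID: " ++ show ident)
    Just load -> load bytes

-- An empty ID is never registered, so the lookup should fail with Left.
-- chunkerTable is polymorphic in the tag types, so asTypeOf against the
-- default chunker pins them to the types that chunker uses.
emptyIdIsRejected :: IO Bool
emptyIdIsRejected = do
  chk <- defaultChunker
  let result = readChunkerWith mempty mempty `asTypeOf` Right chk
  return (either (const True) (const False) result)
```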

saveChunker :: (ChunkTag c, Tag t) => Chunker c t -> FilePath -> IO ()

Store a Chunker to disk.

loadChunker :: (ChunkTag c, Tag t) => FilePath -> IO (Chunker c t)

Load a Chunker from disk, gunzipping it first if needed (based on the file extension).
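saveChunker and loadChunker are meant to round-trip. The following sketch saves the default chunker, reloads it, and compares the output on one sentence. `roundTripAgrees` is a hypothetical name; the test path avoids a `.gz` extension because this page does not say whether saveChunker compresses its output.

```haskell
import qualified Data.Text as T
import NLP.Chunk (chunkText, defaultChunker, loadChunker, saveChunker)
import NLP.POS (defaultTagger)

-- Save the default chunker, load it back, and check that the reloaded
-- model chunks the given text the same way as the original.
roundTripAgrees :: FilePath -> T.Text -> IO Bool
roundTripAgrees path txt = do
  tgr <- defaultTagger
  chk <- defaultChunker
  saveChunker chk path
  -- loadChunker is polymorphic in its result; asTypeOf pins the reloaded
  -- chunker to the same type as the one that was saved.
  chk' <- fmap (`asTypeOf` chk) (loadChunker path)
  return (chunkText tgr chk txt == chunkText tgr chk' txt)
```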