Copyright | Rogan Creswick, 2014 |
---|---|
Maintainer | creswick@gmail.com |
Stability | experimental |
Safe Haskell | None |
Language | Haskell2010 |
NLP.Chunk aims to make phrasal chunking trivially easy; it is the corollary to NLP.POS.
The simplest way to try out chunking with Chatter is to open a repl after installing chatter and try this:
> import NLP.POS
> import NLP.Chunk
> tgr <- defaultTagger
> chk <- defaultChunker
> chunkText tgr chk "Monads are monoids in the category of endofunctors."
"[NP Monads/NNS are/VBP monoids/NNS] [PP in/IN] [NP the/DT category/NN] [PP of/IN] [NP endofunctors/NNS] ./."
Note that it isn't perfect: phrase chunking is tricky, and the defaultTagger and defaultChunker aren't trained on the largest training set (they use CoNLL-2000). You can easily train more taggers and chunkers using the APIs exposed here if you have the training data to do so.
- defaultChunker :: IO (Chunker Chunk Tag)
- conllChunker :: IO (Chunker Chunk Tag)
- train :: (ChunkTag c, Tag t) => Chunker c t -> [ChunkedSentence c t] -> IO (Chunker c t)
- chunk :: (ChunkTag c, Tag t) => Chunker c t -> [TaggedSentence t] -> [ChunkedSentence c t]
- chunkText :: (ChunkTag c, Tag t) => POSTagger t -> Chunker c t -> Text -> Text
- chunkStr :: (ChunkTag c, Tag t) => POSTagger t -> Chunker c t -> String -> String
- chunkerTable :: (ChunkTag c, Tag t) => Map ByteString (ByteString -> Either String (Chunker c t))
- saveChunker :: (ChunkTag c, Tag t) => Chunker c t -> FilePath -> IO ()
- loadChunker :: (ChunkTag c, Tag t) => FilePath -> IO (Chunker c t)
- serialize :: (ChunkTag c, Tag t) => Chunker c t -> ByteString
- deserialize :: (ChunkTag c, Tag t) => Map ByteString (ByteString -> Either String (Chunker c t)) -> ByteString -> Either String (Chunker c t)
Documentation
train :: (ChunkTag c, Tag t) => Chunker c t -> [ChunkedSentence c t] -> IO (Chunker c t)
Train a chunker on a set of additional examples.
chunk :: (ChunkTag c, Tag t) => Chunker c t -> [TaggedSentence t] -> [ChunkedSentence c t]
Chunk a TaggedSentence that has been produced by a Chatter tagger, producing a rich representation of the Chunks and the Tags detected.
If you just want to see chunked output from standard text, you probably want chunkText or chunkStr.
chunkText :: (ChunkTag c, Tag t) => POSTagger t -> Chunker c t -> Text -> Text
Convenience function to tokenize, POS-tag, then chunk the provided text, and render the result in an easy-to-read format.
> tgr <- defaultTagger
> chk <- defaultChunker
> chunkText tgr chk "The brown dog jumped over the lazy cat."
"[NP The/DT brown/NN dog/NN] [VP jumped/VBD] [NP over/IN the/DT lazy/JJ cat/NN] ./."
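The bracketed output format shown for chunkText can be illustrated with a small standalone sketch. The `Piece` type and the `showToken`, `showPiece` and `showSentence` functions below are hypothetical stand-ins written for this illustration; they are not Chatter's own types or its rendering code, and they assume only that each chunk is printed as `[LABEL token/TAG ...]` while tokens outside any chunk are printed as bare `token/TAG`.

```haskell
-- Hypothetical stand-in for a chunked sentence; NOT Chatter's own types.
data Piece
  = Chunked String [(String, String)] -- chunk label with its (token, POS tag) pairs
  | Loose (String, String)            -- a token that belongs to no chunk

-- Render one token as token/TAG.
showToken :: (String, String) -> String
showToken (tok, tg) = tok ++ "/" ++ tg

-- Render a chunk as [LABEL tok/TAG ...]; a loose token is rendered bare.
showPiece :: Piece -> String
showPiece (Chunked label toks) =
  "[" ++ label ++ " " ++ unwords (map showToken toks) ++ "]"
showPiece (Loose tok) = showToken tok

-- Join the rendered pieces of one sentence with single spaces.
showSentence :: [Piece] -> String
showSentence = unwords . map showPiece

-- The sentence from the example above, written out by hand.
example :: [Piece]
example =
  [ Chunked "NP" [("The", "DT"), ("brown", "NN"), ("dog", "NN")]
  , Chunked "VP" [("jumped", "VBD")]
  , Chunked "NP" [("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("cat", "NN")]
  , Loose (".", ".")
  ]
```

Evaluating `showSentence example` in GHCi reproduces the string in the chunkText example. If you need the structure rather than the text, work with the result of chunk directly instead of parsing this format back.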
chunkStr :: (ChunkTag c, Tag t) => POSTagger t -> Chunker c t -> String -> String
A wrapper around chunkText that packs strings.
chunkerTable :: (ChunkTag c, Tag t) => Map ByteString (ByteString -> Either String (Chunker c t))
The default table mapping chunker IDs to the functions that read a chunker back from a ByteString. Each chunker packaged with Chatter should have an entry here. By convention, the IDs used are the fully qualified module names of the chunker packages.
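The idea of an ID-keyed table of reader functions can be sketched without Chatter. Everything below is an assumption made for illustration: the `Model` type, the `"Example.Chunker.Simple"` ID, the use of `String` and an association list in place of `ByteString` and `Map`, and the "ID on the first line, payload after it" layout. Chatter's actual serialized encoding is not described in this documentation and may well differ.

```haskell
-- Hypothetical model type standing in for a Chunker.
newtype Model = Model String
  deriving (Eq, Show)

-- A reader turns a serialized payload into a model, or reports an error.
type Reader = String -> Either String Model

-- Table of readers keyed by ID. The ID here is made up for the example.
readerTable :: [(String, Reader)]
readerTable = [("Example.Chunker.Simple", Right . Model)]

-- ASSUMED layout: the first line holds the ID, the remainder is the payload.
-- Look the ID up in the table and hand the payload to the matching reader.
deserializeWith :: [(String, Reader)] -> String -> Either String Model
deserializeWith table input =
  case break (== '\n') input of
    (modelId, '\n' : payload) ->
      case lookup modelId table of
        Just reader -> reader payload
        Nothing -> Left ("no reader registered for ID: " ++ modelId)
    _ -> Left "missing ID header"
```

Passing the table as an argument, as the deserialize signature in this module does, lets callers register readers for their own chunker implementations alongside the defaults.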
saveChunker :: (ChunkTag c, Tag t) => Chunker c t -> FilePath -> IO ()
Store a Chunker to disk.
loadChunker :: (ChunkTag c, Tag t) => FilePath -> IO (Chunker c t)
Load a Chunker from disk, optionally gunzipping if needed (based on file extension).
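Choosing whether to decompress from the file extension can be sketched as follows. This is not Chatter's implementation: `Encoding`, `encodingFor` and `readWith` are hypothetical names, the check for a `.gz` suffix is an assumption about which extension triggers decompression, and the decompressor is passed in as a plain function so the sketch needs nothing beyond `base`.

```haskell
import Data.List (isSuffixOf)

-- How the bytes on disk are assumed to be encoded.
data Encoding = Gzipped | Plain
  deriving (Eq, Show)

-- ASSUMPTION: a path ending in ".gz" means the file is gzip-compressed.
encodingFor :: FilePath -> Encoding
encodingFor path
  | ".gz" `isSuffixOf` path = Gzipped
  | otherwise = Plain

-- Apply the supplied decompressor only when the path calls for it.
readWith :: (String -> String) -> FilePath -> String -> String
readWith gunzip path raw =
  case encodingFor path of
    Gzipped -> gunzip raw
    Plain -> raw
```

With a scheme like this, the file name passed to saveChunker determines whether a later loadChunker call needs to decompress, so it is worth keeping the extension consistent between the two calls.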
deserialize :: (ChunkTag c, Tag t) => Map ByteString (ByteString -> Either String (Chunker c t)) -> ByteString -> Either String (Chunker c t)