Safe Haskell	None
Language	Haskell2010

NLP.Extraction.Parsec

Description

This is a very simple wrapper around Parsec for writing Information Extraction patterns.

Because the particular tags and tokens to parse depend on the training corpus (for POS tagging) and on the domain, this module provides only basic extractors. You can, for example, create an extractor that finds noun phrases by combining the components provided here:

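  -- Assumes chatter's RawTag tag type, the POS and Token constructors,
  -- and Data.Text imported qualified as T.
  nounPhrase :: Extractor RawTag (Text, RawTag)
  nounPhrase = do
    nlist <- many1 (try (posTok $ RawTag "NN")
                <|> try (posTok $ RawTag "DT")
                <|> posTok (RawTag "JJ"))
    let term = T.intercalate " " [w | POS _ (Token w) <- nlist]
    return (term, RawTag "n-phr")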
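The pattern can be tried without chatter installed. The self-contained sketch below assumes hypothetical stand-ins for `Token`, `POS`, `TaggedSentence`, and a plain `RawTag` tag type, shaped after the signatures on this page rather than copied from the library. It gives `TaggedSentence` a `Stream` instance, defines a simple `posTok`, and runs the noun-phrase pattern with `parse`; it needs only the `parsec` and `text` packages.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec.Combinator (many1)
import Text.Parsec.Pos (incSourceColumn)
import Text.Parsec.Prim (Parsec, Stream (..), parse, tokenPrim, try, (<|>))

-- Hypothetical stand-ins shaped after this page's signatures (not chatter's own types).
newtype Token = Token Text deriving (Show, Eq)
newtype RawTag = RawTag Text deriving (Show, Eq)
data POS t = POS t Token deriving (Show, Eq)
newtype TaggedSentence t = TaggedSent [POS t] deriving Show

-- Each POS value is one stream element, as in the orphan instance on this page.
instance Monad m => Stream (TaggedSentence t) m (POS t) where
  uncons (TaggedSent [])       = return Nothing
  uncons (TaggedSent (p : ps)) = return (Just (p, TaggedSent ps))

type Extractor t = Parsec (TaggedSentence t) ()

-- Simple posTok: accept the next token only if its tag equals the expected one.
posTok :: (Eq t, Show t) => t -> Extractor t (POS t)
posTok expected = tokenPrim show (\pos _ _ -> incSourceColumn pos 1) accept
  where
    accept p@(POS tag _)
      | tag == expected = Just p
      | otherwise       = Nothing

-- The noun-phrase pattern from the description, specialised to RawTag.
nounPhrase :: Extractor RawTag (Text, RawTag)
nounPhrase = do
  nlist <- many1 (try (posTok (RawTag "NN"))
              <|> try (posTok (RawTag "DT"))
              <|> posTok (RawTag "JJ"))
  let term = T.intercalate " " [w | POS _ (Token w) <- nlist]
  return (term, RawTag "n-phr")

-- A small hand-tagged example sentence.
sentence :: TaggedSentence RawTag
sentence = TaggedSent
  [ POS (RawTag "DT") (Token "the")
  , POS (RawTag "JJ") (Token "red")
  , POS (RawTag "NN") (Token "ball")
  , POS (RawTag "VBD") (Token "rolled")
  ]
```

Running `parse nounPhrase "ghci" sentence` gives `Right ("the red ball",RawTag "n-phr")`: `many1` keeps consuming while the determiner, adjective, and noun alternatives match, then stops at the `VBD` token without consuming it.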

Documentation

type Extractor t = Parsec (TaggedSentence t) ()

A Parsec parser whose input stream is a TaggedSentence t, with no user state. Each token it consumes is a POS t.

Example usage:

> :set -XOverloadedStrings
> import Text.Parsec.Prim
> parse myExtractor "interactive repl" someTaggedSentence

posTok :: Tag t => t -> Extractor t (POS t)

Consume a token with the given POS tag.
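A minimal sketch of `posTok`, assuming hypothetical stand-ins for `Token`, `RawTag`, `POS`, and `TaggedSentence` (shaped after this page's signatures, not the library's definitions) and an `Eq`/`Show` constraint in place of the `Tag` class. It is a single `tokenPrim` call that accepts the next token only when its tag equals the requested one.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, OverloadedStrings #-}
import Data.Text (Text)
import Text.Parsec.Pos (incSourceColumn)
import Text.Parsec.Prim (Parsec, Stream (..), parse, tokenPrim)

-- Hypothetical stand-ins (assumed shapes, not chatter's own definitions).
newtype Token = Token Text deriving (Show, Eq)
newtype RawTag = RawTag Text deriving (Show, Eq)
data POS t = POS t Token deriving (Show, Eq)
newtype TaggedSentence t = TaggedSent [POS t] deriving Show

-- One POS value per stream element.
instance Monad m => Stream (TaggedSentence t) m (POS t) where
  uncons (TaggedSent [])       = return Nothing
  uncons (TaggedSent (p : ps)) = return (Just (p, TaggedSent ps))

type Extractor t = Parsec (TaggedSentence t) ()

-- Accept the next token only if its tag equals the expected tag;
-- on a mismatch tokenPrim fails without consuming input.
posTok :: (Eq t, Show t) => t -> Extractor t (POS t)
posTok expected = tokenPrim show (\pos _ _ -> incSourceColumn pos 1) accept
  where
    accept p@(POS tag _)
      | tag == expected = Just p
      | otherwise       = Nothing
```

In this sketch, because a mismatch consumes nothing, alternatives such as `posTok a <|> posTok b` also work without wrapping each branch in `try`.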

posPrefix :: Tag t => Text -> Extractor t (POS t)

Consume a token whose POS tag begins with the given prefix.

> parse (posPrefix "n") "ghci" (TaggedSent [POS (RawTag "np") (Token "Bob")])
Right (POS {posTag = RawTag "np", posToken = Token "Bob"})
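A sketch of `posPrefix` under stated assumptions: `Token`, `RawTag`, `POS`, and `TaggedSentence` are hypothetical stand-ins, and the minimal `Tag` class below (with a single `fromTag` method rendering a tag as `Text`) stands in for the library's class. The prefix test here is a plain, case-sensitive `T.isPrefixOf`.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec.Pos (incSourceColumn)
import Text.Parsec.Prim (Parsec, Stream (..), parse, tokenPrim)

-- Hypothetical stand-ins (assumed shapes, not chatter's own definitions).
newtype Token = Token Text deriving (Show, Eq)
newtype RawTag = RawTag Text deriving (Show, Eq)
data POS t = POS t Token deriving (Show, Eq)
newtype TaggedSentence t = TaggedSent [POS t] deriving Show

-- Minimal stand-in for the Tag class: a tag can be rendered as Text.
class Show t => Tag t where
  fromTag :: t -> Text

instance Tag RawTag where
  fromTag (RawTag txt) = txt

instance Monad m => Stream (TaggedSentence t) m (POS t) where
  uncons (TaggedSent [])       = return Nothing
  uncons (TaggedSent (p : ps)) = return (Just (p, TaggedSent ps))

type Extractor t = Parsec (TaggedSentence t) ()

-- Accept the next token if its tag text starts with the given prefix
-- (plain, case-sensitive prefix test).
posPrefix :: Tag t => Text -> Extractor t (POS t)
posPrefix prefix = tokenPrim show (\pos _ _ -> incSourceColumn pos 1) accept
  where
    accept p@(POS tag _)
      | prefix `T.isPrefixOf` fromTag tag = Just p
      | otherwise                         = Nothing
```

With this sketch, the prefix `"n"` accepts tags such as `"nn"` and `"np"` but not `"vb"`.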

matches :: CaseSensitive -> Token -> Token -> Bool

Text equality matching with optional case sensitivity.
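A minimal sketch of `matches`, assuming a hypothetical `CaseSensitive` type with constructors `Sensitive` and `Insensitive` and a `Token` newtype over `Text` (both stand-ins, not the library's definitions). Sensitive matching is plain `Text` equality; insensitive matching compares case-folded text.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-ins; the constructor names are assumptions.
data CaseSensitive = Sensitive | Insensitive deriving (Show, Eq)
newtype Token = Token Text deriving (Show, Eq)

-- Compare two tokens' text, folding case first when matching is insensitive.
matches :: CaseSensitive -> Token -> Token -> Bool
matches Sensitive   (Token a) (Token b) = a == b
matches Insensitive (Token a) (Token b) = T.toCaseFold a == T.toCaseFold b
```

`T.toCaseFold` is used rather than `T.toLower` because Unicode case folding is designed for caseless comparison.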

txtTok :: Tag t => CaseSensitive -> Token -> Extractor t (POS t)

Consume a token whose text matches the given token, with the given case sensitivity.
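A sketch of `txtTok` built on a `matches`-style comparison, assuming hypothetical stand-ins for `CaseSensitive`, `Token`, `RawTag`, `POS`, and `TaggedSentence`, and a `Show` constraint in place of the `Tag` class. The tag is ignored; only the token text is compared.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec.Pos (incSourceColumn)
import Text.Parsec.Prim (Parsec, Stream (..), parse, tokenPrim)

-- Hypothetical stand-ins (assumed shapes, not chatter's own definitions).
data CaseSensitive = Sensitive | Insensitive deriving (Show, Eq)
newtype Token = Token Text deriving (Show, Eq)
newtype RawTag = RawTag Text deriving (Show, Eq)
data POS t = POS t Token deriving (Show, Eq)
newtype TaggedSentence t = TaggedSent [POS t] deriving Show

instance Monad m => Stream (TaggedSentence t) m (POS t) where
  uncons (TaggedSent [])       = return Nothing
  uncons (TaggedSent (p : ps)) = return (Just (p, TaggedSent ps))

type Extractor t = Parsec (TaggedSentence t) ()

-- Text equality, optionally ignoring case.
matches :: CaseSensitive -> Token -> Token -> Bool
matches Sensitive   (Token a) (Token b) = a == b
matches Insensitive (Token a) (Token b) = T.toCaseFold a == T.toCaseFold b

-- Accept the next token if its text matches the expected token.
txtTok :: Show t => CaseSensitive -> Token -> Extractor t (POS t)
txtTok cs expected = tokenPrim show (\pos _ _ -> incSourceColumn pos 1) accept
  where
    accept p@(POS _ tok)
      | matches cs expected tok = Just p
      | otherwise               = Nothing
```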

anyToken :: Tag t => Extractor t (POS t)

Consume any one non-empty token.
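A sketch of `anyToken`, assuming hypothetical stand-ins for `Token`, `RawTag`, `POS`, and `TaggedSentence`, and reading "non-empty" as "the token's text is not the empty string" (an interpretation, not confirmed by this page). The tag is not inspected.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec.Pos (incSourceColumn)
import Text.Parsec.Prim (Parsec, Stream (..), parse, tokenPrim)

-- Hypothetical stand-ins (assumed shapes, not chatter's own definitions).
newtype Token = Token Text deriving (Show, Eq)
newtype RawTag = RawTag Text deriving (Show, Eq)
data POS t = POS t Token deriving (Show, Eq)
newtype TaggedSentence t = TaggedSent [POS t] deriving Show

instance Monad m => Stream (TaggedSentence t) m (POS t) where
  uncons (TaggedSent [])       = return Nothing
  uncons (TaggedSent (p : ps)) = return (Just (p, TaggedSent ps))

type Extractor t = Parsec (TaggedSentence t) ()

-- Accept the next token whenever its text is non-empty, whatever its tag.
anyToken :: Show t => Extractor t (POS t)
anyToken = tokenPrim show (\pos _ _ -> incSourceColumn pos 1) accept
  where
    accept p@(POS _ (Token txt))
      | T.null txt = Nothing
      | otherwise  = Just p
```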

oneOf :: Tag t => CaseSensitive -> [Token] -> Extractor t (POS t)

Consume a token whose text matches any of the given tokens, with the given case sensitivity.
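A sketch of `oneOf`, assuming it accepts a token whose text matches any entry in the list (inferred from the signature), with hypothetical stand-ins for `CaseSensitive`, `Token`, `RawTag`, `POS`, and `TaggedSentence`, and a `Show` constraint in place of the `Tag` class.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec.Pos (incSourceColumn)
import Text.Parsec.Prim (Parsec, Stream (..), parse, tokenPrim)

-- Hypothetical stand-ins (assumed shapes, not chatter's own definitions).
data CaseSensitive = Sensitive | Insensitive deriving (Show, Eq)
newtype Token = Token Text deriving (Show, Eq)
newtype RawTag = RawTag Text deriving (Show, Eq)
data POS t = POS t Token deriving (Show, Eq)
newtype TaggedSentence t = TaggedSent [POS t] deriving Show

instance Monad m => Stream (TaggedSentence t) m (POS t) where
  uncons (TaggedSent [])       = return Nothing
  uncons (TaggedSent (p : ps)) = return (Just (p, TaggedSent ps))

type Extractor t = Parsec (TaggedSentence t) ()

-- Text equality, optionally ignoring case.
matches :: CaseSensitive -> Token -> Token -> Bool
matches Sensitive   (Token a) (Token b) = a == b
matches Insensitive (Token a) (Token b) = T.toCaseFold a == T.toCaseFold b

-- Accept the next token if its text matches at least one candidate.
oneOf :: Show t => CaseSensitive -> [Token] -> Extractor t (POS t)
oneOf cs candidates = tokenPrim show (\pos _ _ -> incSourceColumn pos 1) accept
  where
    accept p@(POS _ tok)
      | any (\candidate -> matches cs candidate tok) candidates = Just p
      | otherwise                                               = Nothing
```

For example, `oneOf Insensitive [Token "a", Token "the"]` would accept a determiner written as "The" at the start of a sentence.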

followedBy :: Tag t => Extractor t b -> Extractor t a -> Extractor t a

Skip any number of tokens matched by the first (fill) parser until the second (end) parser succeeds, then return the end parser's result.

This is useful when you know what you are looking for but do not care what comes before it.
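One way to express this behaviour with standard Parsec combinators is `manyTill` plus `lookAhead`. The sketch below assumes hypothetical stand-ins for `Token`, `RawTag`, `POS`, and `TaggedSentence`, and the helpers `satisfyPOS`, `anyPOS`, and `noun` are illustrative only; it is not necessarily how the library implements `followedBy`.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, OverloadedStrings #-}
import Data.Text (Text)
import Text.Parsec.Combinator (manyTill)
import Text.Parsec.Pos (incSourceColumn)
import Text.Parsec.Prim (Parsec, Stream (..), lookAhead, parse, tokenPrim, try)

-- Hypothetical stand-ins (assumed shapes, not chatter's own definitions).
newtype Token = Token Text deriving (Show, Eq)
newtype RawTag = RawTag Text deriving (Show, Eq)
data POS t = POS t Token deriving (Show, Eq)
newtype TaggedSentence t = TaggedSent [POS t] deriving Show

instance Monad m => Stream (TaggedSentence t) m (POS t) where
  uncons (TaggedSent [])       = return Nothing
  uncons (TaggedSent (p : ps)) = return (Just (p, TaggedSent ps))

type Extractor t = Parsec (TaggedSentence t) ()

-- Skip fill tokens until `end` would succeed (checked with lookAhead, so the
-- check consumes nothing), then run `end` for real and return its result.
followedBy :: Extractor t b -> Extractor t a -> Extractor t a
followedBy fill end = do
  _ <- manyTill fill (lookAhead (try end))
  end

-- Hypothetical helpers for the usage example.
satisfyPOS :: Show t => (POS t -> Bool) -> Extractor t (POS t)
satisfyPOS ok = tokenPrim show (\pos _ _ -> incSourceColumn pos 1)
                          (\p -> if ok p then Just p else Nothing)

anyPOS :: Show t => Extractor t (POS t)
anyPOS = satisfyPOS (const True)

noun :: Extractor RawTag (POS RawTag)
noun = satisfyPOS (\(POS tag _) -> tag == RawTag "NN")
```

Here `followedBy anyPOS noun` skips "the big" in "the big dog ran" and returns the `NN` token for "dog". If the end pattern never appears, the parser fails once the fill parser runs out of input.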

Orphan instances

(Monad m, Tag t) => Stream (TaggedSentence t) m (POS t)

Methods: uncons :: TaggedSentence t -> m (Maybe (POS t, TaggedSentence t))

(Monad m, ChunkTag c, Tag t) => Stream (ChunkedSentence c t) m (ChunkOr c t)

Methods: uncons :: ChunkedSentence c t -> m (Maybe (ChunkOr c t, ChunkedSentence c t))
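A minimal sketch of the first instance, using hypothetical stand-in types shaped after the signatures above (not the library's definitions). `uncons` hands Parsec the next tagged token together with the rest of the sentence, or `Nothing` once the sentence is exhausted. Only the `TaggedSentence` instance is sketched; the helper `firstToken` is illustrative.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, OverloadedStrings #-}
import Data.Functor.Identity (Identity (..))
import Data.Text (Text)
import Text.Parsec.Prim (Stream (..))

-- Hypothetical stand-ins (assumed shapes, not chatter's own definitions).
newtype Token = Token Text deriving (Show, Eq)
newtype RawTag = RawTag Text deriving (Show, Eq)
data POS t = POS t Token deriving (Show, Eq)
newtype TaggedSentence t = TaggedSent [POS t] deriving (Show, Eq)

-- Split off the head token; an empty sentence ends the stream.
instance Monad m => Stream (TaggedSentence t) m (POS t) where
  uncons (TaggedSent [])       = return Nothing
  uncons (TaggedSent (p : ps)) = return (Just (p, TaggedSent ps))

-- Hypothetical helper: run uncons in Identity to inspect one step of the stream.
firstToken :: TaggedSentence t -> Maybe (POS t, TaggedSentence t)
firstToken = runIdentity . uncons
```

Because the instance works for any `Monad m`, the same stream type can be used with `Parsec` (which fixes `m` to `Identity`) or with a `ParsecT` over another monad.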