Safe Haskell | None |
---|---|
Language | Haskell2010 |
This is a very simple wrapper around Parsec for writing Information Extraction patterns.
Because the particular tags and tokens to parse depend on the training corpus (for POS tagging) and on the domain, this module provides only basic extractors. You can, for example, create an extractor that finds noun phrases by combining the components provided here:
nounPhrase :: Extractor (Text, Tag)
nounPhrase = do
  nlist <- many1 (try (posTok $ Tag "NN")
              <|> try (posTok $ Tag "DT")
              <|> (posTok $ Tag "JJ"))
  let term = T.intercalate " " (map fst nlist)
  return (term, Tag "n-phr")
- type Extractor t = Parsec (TaggedSentence t) ()
- posTok :: Tag t => t -> Extractor t (POS t)
- posPrefix :: Tag t => Text -> Extractor t (POS t)
- matches :: CaseSensitive -> Token -> Token -> Bool
- txtTok :: Tag t => CaseSensitive -> Token -> Extractor t (POS t)
- anyToken :: Tag t => Extractor t (POS t)
- oneOf :: Tag t => CaseSensitive -> [Token] -> Extractor t (POS t)
- followedBy :: Tag t => Extractor t b -> Extractor t a -> Extractor t a
Documentation
type Extractor t = Parsec (TaggedSentence t) ()
A Parsec parser.
Example usage:
> :set -XOverloadedStrings
> import Text.Parsec.Prim
> parse myExtractor "interactive repl" someTaggedSentence
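To make the idea of a Parsec parser over tagged tokens concrete, here is a minimal sketch that uses only parsec and text. It does not use this module's types: SimpleTag, TaggedTok, SketchExtractor, tagTok, nounPhraseSketch and runSketch are all hypothetical stand-ins, and the token stream is assumed to be a plain list of (text, tag) pairs rather than a TaggedSentence.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec (Parsec, ParseError, many1, parse, tokenPrim, (<|>))

-- Hypothetical stand-ins for the library's tag and tagged-token types.
data SimpleTag = DT | JJ | NN | VB | NounPhrase
  deriving (Eq, Show)

type TaggedTok = (Text, SimpleTag)

-- A parser over a list of tagged tokens, with no user state.
type SketchExtractor = Parsec [TaggedTok] ()

-- Accept exactly one token whose tag equals the requested tag.
-- tokenPrim consumes no input on failure, so no 'try' is needed here.
tagTok :: SimpleTag -> SketchExtractor TaggedTok
tagTok wanted = tokenPrim show nextPos test
  where
    nextPos pos _ _ = pos   -- source positions are not tracked in this sketch
    test tok@(_, tag)
      | tag == wanted = Just tok
      | otherwise     = Nothing

-- One or more consecutive noun, determiner, or adjective tokens,
-- joined with single spaces and labelled as a noun phrase.
nounPhraseSketch :: SketchExtractor (Text, SimpleTag)
nounPhraseSketch = do
  toks <- many1 (tagTok NN <|> tagTok DT <|> tagTok JJ)
  return (T.unwords (map fst toks), NounPhrase)

-- Run the sketch extractor from the start of a token list.
runSketch :: [TaggedTok] -> Either ParseError (Text, SimpleTag)
runSketch = parse nounPhraseSketch "sketch"
```

Applied to the tokens the/DT, quick/JJ, fox/NN, jumps/VB, runSketch yields the phrase "the quick fox" tagged NounPhrase and stops at the verb. On a list that starts with a VB token it returns a Left, because many1 requires at least one match.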
matches :: CaseSensitive -> Token -> Token -> Bool
Text equality matching with optional case sensitivity.
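As an illustration of what equality with optional case sensitivity can look like, the sketch below compares plain Text values. CaseMode and matchesSketch are hypothetical names, not this module's definitions, and lower-casing both sides for the insensitive case is an assumption about the behaviour rather than a description of the library's implementation.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for the library's CaseSensitive flag.
data CaseMode = Sensitive | Insensitive
  deriving (Eq, Show)

-- Compare two token texts. In the insensitive mode both sides are
-- lower-cased first (an assumption made for this sketch).
matchesSketch :: CaseMode -> Text -> Text -> Bool
matchesSketch Sensitive   x y = x == y
matchesSketch Insensitive x y = T.toLower x == T.toLower y
```

With these definitions, "The" and "the" match only in the Insensitive mode, while "the" and "then" match in neither.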