| Safe Haskell | None |
|---|---|
| Language | Haskell2010 |
NLP.Types.Annotations
- prettyShow :: Pretty p => p -> Text
- newtype Index a = Index Int
- fromIndex :: Index a -> Int
- data Annotation dat tag = Annotation {}
- data TokenizedSentence = TokenizedSentence {
- tokText :: Text
- tokAnnotations :: [Annotation Text Token]
- tokens :: TokenizedSentence -> [Token]
- toTextToks :: TokenizedSentence -> [Text]
- data TaggedSentence pos = TaggedSentence {}
- tsLength :: POS pos => TaggedSentence pos -> Int
- tsToPairs :: POS pos => TaggedSentence pos -> [(Token, pos)]
- applyTags :: POS pos => TokenizedSentence -> [pos] -> TaggedSentence pos
- getTags :: POS pos => TaggedSentence pos -> [pos]
- unapplyTags :: POS pos => TaggedSentence pos -> (TokenizedSentence, [pos])
- data ChunkedSentence pos chunk = ChunkedSentence {
- chunkTagSentence :: TaggedSentence pos
- chunkAnnotations :: [Annotation (TaggedSentence pos) chunk]
- data NERedSentence pos chunk ne = NERedSentence {
- neChunkSentence :: ChunkedSentence pos chunk
- neAnnotations :: [Annotation (TaggedSentence pos) ne]
- class AnnotatedText sentence where
- getText :: sentence -> Text
- newtype Token = Token Text
- showTok :: Token -> Text
- suffix :: Token -> Text
- class (Ord a, Eq a, Read a, Show a, Generic a, Serialize a, Hashable a) => POS a where
- serializePOS :: a -> Text
- parsePOS :: Text -> Either Error a
- safeParsePOS :: Text -> a
- tagUNK :: a
- startPOS :: a
- endPOS :: a
- isDt :: a -> Bool
- class (Ord a, Eq a, Read a, Show a, Generic a, Serialize a, Hashable a) => Chunk a where
- serializeChunk :: a -> Text
- parseChunk :: Text -> Either Error a
- notChunk :: a
- class (Ord a, Eq a, Read a, Show a, Generic a, Serialize a, Hashable a) => NamedEntity a where
- serializeNETag :: a -> Text
- parseNETag :: Text -> Either Error a
Documentation
prettyShow :: Pretty p => p -> Text
Convert a pretty-printable value into a text string.
newtype Index a
Safe index type that uses a phantom type to prevent us from indexing into the wrong thing.
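As a rough illustration of the phantom-type idea, the standalone sketch below re-declares `Index` and `fromIndex` as they appear in the synopsis. The marker types `CharPos` and `TokenPos` and the helper `nthToken` are hypothetical names invented for this example; they are not part of the library.

```haskell
-- Standalone sketch; mirrors the documented declarations, not the library source.
newtype Index a = Index Int
  deriving (Eq, Ord, Show)

fromIndex :: Index a -> Int
fromIndex (Index n) = n

-- Hypothetical marker types naming what an index points into.
data CharPos
data TokenPos

-- Hypothetical helper: accepts only an index tagged with TokenPos.
-- Passing an `Index CharPos` here is rejected at compile time.
nthToken :: [String] -> Index TokenPos -> Maybe String
nthToken toks ix
  | n < 0 = Nothing
  | otherwise =
      case drop n toks of
        (t : _) -> Just t
        []      -> Nothing
  where
    n = fromIndex ix
```

Both kinds of index are plain `Int` values at runtime; the type parameter exists only so the compiler can tell them apart.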
data Annotation dat tag
Annotations are the base of all tags (POS tags, Chunks, marked entities, etc.)
The semantics of the particular annotation depend on the type of the value, and these can be wrapped up in a newtype for easier use.
Constructors
| Annotation | |
Fields
| |
Instances
| (Eq dat, Eq tag) => Eq (Annotation dat tag) | |
| (Ord dat, Ord tag) => Ord (Annotation dat tag) | |
| (Read dat, Read tag) => Read (Annotation dat tag) | |
| (Show dat, Show tag) => Show (Annotation dat tag) | |
| Generic (Annotation dat tag) | |
| (Hashable dat, Hashable tag) => Hashable (Annotation dat tag) | |
| AnnotatedText (Annotation Text Token) | |
| type Rep (Annotation dat tag) |
data TokenizedSentence
Wrapper around both the underlying text and the tokenizer results.
Constructors
| TokenizedSentence | |
Fields
| |
Instances
tokens :: TokenizedSentence -> [Token]
Get the raw tokens out of a TokenizedSentence.
toTextToks :: TokenizedSentence -> [Text]
data TaggedSentence pos
Results of the POS tagger, which encompasses a TokenizedSentence.
Constructors
| TaggedSentence | |
Fields | |
Instances
| Eq pos => Eq (TaggedSentence pos) | |
| Ord pos => Ord (TaggedSentence pos) | |
| Read pos => Read (TaggedSentence pos) | |
| Show pos => Show (TaggedSentence pos) | |
| Generic (TaggedSentence pos) | |
| Arbitrary pos => Arbitrary (TaggedSentence pos) | |
| Hashable pos => Hashable (TaggedSentence pos) | |
| POS pos => Pretty (TaggedSentence pos) | |
| AnnotatedText (TaggedSentence pos) | |
| type Rep (TaggedSentence pos) |
tsLength :: POS pos => TaggedSentence pos -> Int
Count the length of the tokens of a TaggedSentence.
Note that this is *probably* the number of annotations also, but it is not necessarily the same.
tsToPairs :: POS pos => TaggedSentence pos -> [(Token, pos)]
Generate a list of Tokens and their corresponding POS tags. Creates a token for each POS tag, just in case any POS tags are annotated over multiple tokens.
applyTags :: POS pos => TokenizedSentence -> [pos] -> TaggedSentence pos
Apply a parallel list of POS tags to a TokenizedSentence.
getTags :: POS pos => TaggedSentence pos -> [pos]
Extract the POS tags from a tagged sentence.
unapplyTags :: POS pos => TaggedSentence pos -> (TokenizedSentence, [pos])
Extract the POS tags from a tagged sentence, returning the tokenized sentence that they applied to.
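The relationship between applying and unapplying tags can be sketched with a deliberately simplified model. The field layout of `TokenizedSentence` and `TaggedSentence` is not shown on this page, so `SimpleTokenized`, `SimpleTagged`, and the `...Sketch` functions below are hypothetical stand-ins that treat a tagged sentence as tokens plus a parallel tag list. The `POS pos` constraint is dropped to keep the example self-contained.

```haskell
import qualified Data.Text as T

newtype Token = Token T.Text
  deriving (Eq, Show)

-- Hypothetical simplified stand-ins for the library's sentence types.
newtype SimpleTokenized = SimpleTokenized [Token]
  deriving (Eq, Show)

data SimpleTagged pos = SimpleTagged SimpleTokenized [pos]
  deriving (Eq, Show)

-- Attach a parallel list of tags to a tokenized sentence.
applyTagsSketch :: SimpleTokenized -> [pos] -> SimpleTagged pos
applyTagsSketch = SimpleTagged

-- Recover only the tags.
getTagsSketch :: SimpleTagged pos -> [pos]
getTagsSketch (SimpleTagged _ tags) = tags

-- Recover both the sentence and the tags that were applied to it.
unapplyTagsSketch :: SimpleTagged pos -> (SimpleTokenized, [pos])
unapplyTagsSketch (SimpleTagged sent tags) = (sent, tags)

-- Pair each token with its tag (assumes one tag per token in this model).
toPairsSketch :: SimpleTagged pos -> [(Token, pos)]
toPairsSketch (SimpleTagged (SimpleTokenized toks) tags) = zip toks tags
```

In this model, unapplying after applying returns the original sentence and tag list unchanged. The real types carry annotations, so a tag may span several tokens there.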
data ChunkedSentence pos chunk
A Chunked sentence, with underlying Part-of-Speech tags and tokens.
Note: This is not a deep tree; a separate parse tree is needed.
Constructors
| ChunkedSentence | |
Fields
| |
Instances
| (Eq pos, Eq chunk) => Eq (ChunkedSentence pos chunk) | |
| (Ord pos, Ord chunk) => Ord (ChunkedSentence pos chunk) | |
| (Read pos, Read chunk) => Read (ChunkedSentence pos chunk) | |
| (Show pos, Show chunk) => Show (ChunkedSentence pos chunk) | |
| Generic (ChunkedSentence pos chunk) | |
| (Hashable pos, Hashable chunk) => Hashable (ChunkedSentence pos chunk) | |
| AnnotatedText (ChunkedSentence pos chunk) | |
| type Rep (ChunkedSentence pos chunk) |
data NERedSentence pos chunk ne
A sentence that has been marked with named entities.
Constructors
| NERedSentence | |
Fields
| |
Instances
| (Eq pos, Eq chunk, Eq ne) => Eq (NERedSentence pos chunk ne) | |
| (Ord pos, Ord chunk, Ord ne) => Ord (NERedSentence pos chunk ne) | |
| (Read pos, Read chunk, Read ne) => Read (NERedSentence pos chunk ne) | |
| (Show pos, Show chunk, Show ne) => Show (NERedSentence pos chunk ne) | |
| Generic (NERedSentence pos chunk ne) | |
| (Hashable pos, Hashable chunk, Hashable ne) => Hashable (NERedSentence pos chunk ne) | |
| AnnotatedText (NERedSentence pos chunk ne) | |
| type Rep (NERedSentence pos chunk ne) |
class AnnotatedText sentence where
Typeclass of things that have underlying text, so it's easy to get the annotated document out of a tagged, tokenized, or chunked result.
Methods
getText :: sentence -> Text
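To show why a shared class is convenient here, the sketch below defines a reduced class with the same `getText` method and two layered sentence types. `AnnotatedTextSketch`, `PlainSentence`, `TaggedPlain`, and `textLength` are hypothetical names for this illustration only.

```haskell
import qualified Data.Text as T

-- Hypothetical reduced version of the class, with the documented method.
class AnnotatedTextSketch sentence where
  getText :: sentence -> T.Text

-- Hypothetical layered sentence types: a tagged sentence wraps a plain one.
newtype PlainSentence = PlainSentence T.Text

data TaggedPlain = TaggedPlain PlainSentence [String]

instance AnnotatedTextSketch PlainSentence where
  getText (PlainSentence txt) = txt

instance AnnotatedTextSketch TaggedPlain where
  -- Delegate to the wrapped layer to reach the underlying text.
  getText (TaggedPlain inner _) = getText inner

-- Works for any layer of the pipeline without unwrapping it by hand.
textLength :: AnnotatedTextSketch s => s -> Int
textLength = T.length . getText
```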
Instances
| AnnotatedText TokenizedSentence | |
| AnnotatedText (TaggedSentence pos) | |
| AnnotatedText (IOBTaggedSentence pos) | |
| AnnotatedText (ChunkedSentence pos chunk) | |
| AnnotatedText (Annotation Text Token) | |
| AnnotatedText (NERedSentence pos chunk ne) |
newtype Token
Tokenization takes in text, produces annotations.
type Tokenizer = Text -> TokenizedSentence
Chunking requires POS-tags (and tokenization) and generates annotations on the tokens.
type Chunker pos chunk = TaggedSentence pos -> ChunkedSentence pos chunk
Named Entity recognition requires POS tags and tokens, and produces annotations with Named Entities marked.
type NERer pos chunk ne = ChunkedSentence pos chunk -> NERedSentence pos chunk ne
Sentinel value for tokens.
Constructors
| Token Text |
showTok :: Token -> Text
suffix :: Token -> Text
Extract the last three characters of a Token if the token is long enough; otherwise return the full token text.
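The suffix behaviour described above can be sketched with `Data.Text.takeEnd`, which already returns the whole text when it is shorter than the requested length. This is a minimal standalone sketch; `suffixSketch` is a hypothetical name and the library's actual implementation may differ.

```haskell
import qualified Data.Text as T

newtype Token = Token T.Text
  deriving (Eq, Show)

-- Unwrap a token to its text.
showTok :: Token -> T.Text
showTok (Token t) = t

-- Last three characters, or the full text for tokens shorter than three.
suffixSketch :: Token -> T.Text
suffixSketch = T.takeEnd 3 . showTok
```

For example, `suffixSketch (Token (T.pack "running"))` yields `"ing"`, while a two-character token such as `"of"` comes back unchanged.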
class (Ord a, Eq a, Read a, Show a, Generic a, Serialize a, Hashable a) => POS a where
The class of POS Tags.
We use a typeclass here because POS tags need a few things beyond equality (they also need to be serializable and human readable). Passing around all the constraints everywhere becomes a hassle, and it's handy to have a uniform interface to the different kinds of tag types.
This typeclass also allows corpus-specific tags to be distinguished; they have different semantics, so they should not be merged. That said, if you wish to create a unifying POS tag set, and mappings into that set, you can use the type system to ensure that this is done correctly.
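One way to picture the point about corpus-specific tag sets is sketched below. The three tag types and the two mapping functions are hypothetical and heavily reduced; they only illustrate that each corpus gets its own type and that the only route into the unified set is an explicit, type-checked mapping.

```haskell
-- Hypothetical, heavily reduced tag sets for two corpora.
data BrownTag = BrownNN | BrownAT
  deriving (Eq, Show)

data ConllTag = ConllNN | ConllDT
  deriving (Eq, Show)

-- Hypothetical unified tag set.
data UnifiedTag = Noun | Determiner
  deriving (Eq, Show)

-- Each corpus has its own explicit mapping into the unified set.
fromBrown :: BrownTag -> UnifiedTag
fromBrown BrownNN = Noun
fromBrown BrownAT = Determiner

fromConll :: ConllTag -> UnifiedTag
fromConll ConllNN = Noun
fromConll ConllDT = Determiner
```

Comparing a `BrownTag` with a `ConllTag` directly does not type-check, so tags from different corpora cannot be merged by accident.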
Methods
serializePOS :: a -> Text
parsePOS :: Text -> Either Error a
Parse a POS tag into a structured POS value (e.g., NN, VB, etc.).
This is the dual of serializePOS.
safeParsePOS :: Text -> a
tagUNK :: a
The value used to represent "unknown".
startPOS :: a
Special marker POS for start of a corpus.
endPOS :: a
Special marker POS for the end of a corpus.
isDt :: a -> Bool
Check if a tag is a determiner tag.
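A tag set that fits this interface might look roughly like the sketch below. To stay self-contained it uses a hypothetical reduced class, `POSSketch`, that omits the `Serialize`, `Hashable`, and `Generic` superclasses and the start and end markers. Two further assumptions: the error type is taken to be `Text`, and `safeParsePOS` is taken to fall back to `tagUNK` on a parse failure. This page states neither.

```haskell
import qualified Data.Text as T

-- Hypothetical reduced class; the real POS class has more superclasses and methods.
class (Eq a, Show a) => POSSketch a where
  serializePOS :: a -> T.Text
  parsePOS     :: T.Text -> Either T.Text a  -- assumption: errors are Text
  tagUNK       :: a
  isDt         :: a -> Bool

  -- Assumption: a failed parse falls back to the "unknown" tag.
  safeParsePOS :: T.Text -> a
  safeParsePOS txt = either (const tagUNK) id (parsePOS txt)

-- Hypothetical tiny tag set.
data TinyTag = NN | VB | DT | UNK
  deriving (Eq, Ord, Show, Read)

instance POSSketch TinyTag where
  serializePOS = T.pack . show
  parsePOS txt =
    case T.unpack txt of
      "NN"  -> Right NN
      "VB"  -> Right VB
      "DT"  -> Right DT
      "UNK" -> Right UNK
      other -> Left (T.pack ("unknown tag: " ++ other))
  tagUNK = UNK
  isDt   = (== DT)
```

In this sketch, parsing the output of `serializePOS` gives back the original tag, which is the duality that the method documentation refers to.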
class (Ord a, Eq a, Read a, Show a, Generic a, Serialize a, Hashable a) => Chunk a where
The class of things that can be regarded as chunks. Chunk tags are much like POS tags, but the two should not be confused. Generally, chunks distinguish between different phrasal categories (e.g., Noun Phrases, Verb Phrases, Prepositional Phrases, etc.).
Methods
serializeChunk :: a -> Text
Serialize a chunk to a text representation (such as NP, VP, etc.)
This is the dual of parseChunk.
parseChunk :: Text -> Either Error a
Parse a chunk from a text representation (such as NP, VP, etc.)
This is the dual of serializeChunk.
notChunk :: a
Special chunk value to indicate something is not in a chunk.
class (Ord a, Eq a, Read a, Show a, Generic a, Serialize a, Hashable a) => NamedEntity a where
The class of named entity sets. This typeclass can be defined entirely in terms of the required class constraints.
Minimal complete definition
Nothing
Methods
serializeNETag :: a -> Text
Serialize a Named Entity to a textual representation (e.g., MISC, PER, ORG, etc.). This is the dual of parseNETag.
parseNETag :: Text -> Either Error a
Parse a Named Entity from a textual representation (e.g., MISC, PER, ORG, etc.). This is the dual of serializeNETag.
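Since the minimal complete definition is empty, both methods presumably have defaults derived from the superclasses. The sketch below shows one plausible way to do that using `Show` and `Read`. It uses a hypothetical reduced class, `NamedEntitySketch`, and assumes the error type is `Text`; the library's actual defaults are not shown on this page.

```haskell
import qualified Data.Text as T
import Text.Read (readEither)

-- Hypothetical reduced class with defaults built from Show and Read.
class (Read a, Show a) => NamedEntitySketch a where
  serializeNETag :: a -> T.Text
  serializeNETag = T.pack . show

  parseNETag :: T.Text -> Either T.Text a  -- assumption: errors are Text
  parseNETag txt = either (Left . T.pack) Right (readEither (T.unpack txt))

-- Hypothetical entity tag set.
data NETag = PER | ORG | LOC | MISC
  deriving (Eq, Read, Show)

-- An empty instance body is enough because every method has a default.
instance NamedEntitySketch NETag
```

With these defaults, `parseNETag (T.pack "ORG")` gives `Right ORG`, and text that names no constructor gives a `Left` carrying the reader's error message.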
Instances