Copyright | © 2018 bruno cuconato |
---|---|
License | LGPL-3 |
Maintainer | bruno cuconato <bcclaro+hackage@gmail.com> |
Stability | experimental |
Portability | non-portable |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
defines types for handling CoNLL-U data.
Synopsis
- type Doc = [Sent]
- data Sent = Sent {}
- type Comment = StringPair
- type StringPair = (String, String)
- data CW a = CW {}
- data AW
- data SW
- data MT
- data EN
- data ID
- type FORM = Maybe String
- type LEMMA = Maybe String
- type UPOS = Maybe POS
- type XPOS = Maybe String
- type FEATS = [Feat]
- type HEAD = ID
- type DEPS = [Rel]
- type MISC = Maybe String
- data Feat = Feat {}
- data Rel = Rel {}
- type Index = Int
- type IxSep = Char
- _dep :: CW SW -> Maybe EP
- depIs :: EP -> CW SW -> Bool
- mkDEP :: String -> EP
- mkUPOS :: String -> POS
- mkAW :: ID -> FORM -> LEMMA -> UPOS -> XPOS -> FEATS -> Maybe Rel -> DEPS -> MISC -> CW AW
- mkSW :: CW AW -> CW SW
type and data declarations
Documents and Sentences
type Comment = StringPair
most comments are (key, value) pairs.
type StringPair = (String, String)
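as an illustration of how a comment line maps onto a StringPair, here is a minimal sketch. parseComment and trim are hypothetical helpers, not part of this module's API. the sketch assumes the usual "# key = value" layout, splits at the first '=' only, and keeps a comment without '=' (e.g. "# newdoc") as a key with an empty value, which is a choice of this sketch.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

type StringPair = (String, String)
type Comment = StringPair

-- strip leading and trailing whitespace
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- hypothetical helper: turn a CoNLL-U comment line such as "# sent_id = 1"
-- into a (key, value) pair; a comment without '=' keeps an empty value
-- (an assumption of this sketch, not necessarily the library's behaviour)
parseComment :: String -> Maybe Comment
parseComment ('#' : rest) =
  case break (== '=') rest of
    (k, '=' : v) -> Just (trim k, trim v) -- split at the first '=' only
    (k, _)       -> Just (trim k, "")
parseComment _ = Nothing -- not a comment line
```

splitting at the first '=' means a line like "# text = a = b" yields ("text", "a = b"), so sentence text containing '=' survives intact.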
Words
represents a word line in a CoNLL-U file. note that we have
collapsed some fields together: HEAD and DEPREL have been combined
as a relation type Rel, accessible by the _rel field accessor; the
DEPS field is merely a list of Rel.

a C(oNLL-U)W(ord) may be a simple word, a multi-word token, or an
empty node. this is captured by the phantom type (the a in the
declaration), which can be instantiated with one of the data types
below in order to write functions that only operate on one of these
word types (see mkSW on how to do this). for example, the _dep
function only operates on simple words, since they are the only ones
that have a DEPREL field.
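the phantom-type technique described above can be sketched in a self-contained way. this is a stripped-down, hypothetical mirror of the word types: the field names cwId, cwForm and cwDeprel and the functions asSimple and deprelOf are invented for illustration. the library's mkSW has type CW AW -> CW SW; this sketch instead returns a Maybe so that the ID can be checked before the tag changes.

```haskell
-- empty data types used only as tags for the phantom parameter
data AW -- any word (kind not yet refined)
data SW -- simple word
data MT -- multi-word token
data EN -- empty node

data ID = SID Int | MID Int Int | EID Int Int
  deriving (Eq, Show)

-- the parameter 'a' never appears in a field: it only tags the word kind
data CW a = CW
  { cwId     :: ID
  , cwForm   :: Maybe String
  , cwDeprel :: Maybe String
  } deriving Show

-- refine any word to a simple word, but only if its ID says it is one
asSimple :: CW AW -> Maybe (CW SW)
asSimple w = case cwId w of
  SID _ -> Just (CW (cwId w) (cwForm w) (cwDeprel w)) -- same fields, new tag
  _     -> Nothing

-- only accepts simple words, like _dep in this module
deprelOf :: CW SW -> Maybe String
deprelOf = cwDeprel
```

given w :: CW AW, writing deprelOf w is a type error; the word has to go through asSimple first, so a multi-word token can never reach deprelOf.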
Word types
Word Fields
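data ID

Constructor | Description |
---|---|
SID Index | word ID is an integer (e.g. 1) |
MID Index Index | multi-word token ID is a range (e.g. 1-2) |
EID Index Index | empty node ID is a decimal (e.g. 5.1), stored as its integer and decimal parts |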
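to make the three ID shapes concrete, here is a minimal sketch of reading and rendering the ID column. showID and readID are hypothetical helpers, not functions exported by this module. the sketch hardcodes the separators '-' and '.', whereas the module's IxSep type suggests the library keeps the separator as a Char.

```haskell
import Data.Char (isDigit)

type Index = Int

data ID = SID Index | MID Index Index | EID Index Index
  deriving (Eq, Show)

-- render an ID as it appears in the first column of a CoNLL-U line
showID :: ID -> String
showID (SID i)   = show i
showID (MID a b) = show a ++ "-" ++ show b
showID (EID a b) = show a ++ "." ++ show b

-- parse the ID column; a failed guard falls through to the next case
readID :: String -> Maybe ID
readID s = case span isDigit s of
  (a@(_ : _), "")         -> Just (SID (read a))
  (a@(_ : _), '-' : rest) | isNum rest -> Just (MID (read a) (read rest))
  (a@(_ : _), '.' : rest) | isNum rest -> Just (EID (read a) (read rest))
  _                       -> Nothing
  where
    isNum t = not (null t) && all isDigit t
```

keeping the two parts of a range or decimal as separate Index values avoids floating-point IDs, so "5.10" and "5.1" stay distinct (EID 5 10 versus EID 5 1).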
feature representation (one entry of the FEATS field).
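as a sketch of what a feature holds, the following parses a FEATS column such as "Case=Nom|Number=Sing" into a list of features. the Feat record here, with its fields featName and featValues, and the helpers splitOn and parseFeats are hypothetical; the library's Feat may carry different or additional fields. "_" is taken to mean no features, and a value list like "Int,Rel" is split on ','.

```haskell
-- hypothetical Feat with invented field names
data Feat = Feat
  { featName   :: String
  , featValues :: [String]
  } deriving (Eq, Show)

-- split a string on every occurrence of a separator
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- parse a FEATS column; any malformed feature makes the whole parse fail
parseFeats :: String -> Maybe [Feat]
parseFeats "_" = Just []
parseFeats s   = mapM parseFeat (splitOn '|' s)
  where
    -- each feature is Name=Value, where Value may list several values with ','
    parseFeat f = case break (== '=') f of
      (k@(_ : _), '=' : v@(_ : _)) -> Just (Feat k (splitOn ',' v))
      _                            -> Nothing
```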
dependency relation representation.
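the relation type described for CW (HEAD and DEPREL combined, DEPS as a list of relations) can be sketched as follows. the Rel record with fields relHead and relLabel, and the helpers readHead, fromColumns and parseDeps, are all hypothetical; the library's Rel may split subtypes such as "nsubj:pass" into separate fields, whereas this sketch keeps the full label as one string. heads are word or empty-node IDs, so the ID type here omits the multi-word range case.

```haskell
import Data.Char (isDigit)

-- the two ID shapes a head can take: a word or an empty node
data ID = SID Int | EID Int Int
  deriving (Eq, Show)

-- hypothetical Rel: a head plus the full relation label
data Rel = Rel
  { relHead  :: ID
  , relLabel :: String
  } deriving (Eq, Show)

-- split a string on every occurrence of a separator
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- parse a head such as "3" or "5.1"; "_" and other text yield Nothing
readHead :: String -> Maybe ID
readHead h = case span isDigit h of
  (a@(_ : _), "") -> Just (SID (read a))
  (a@(_ : _), '.' : b) | not (null b), all isDigit b -> Just (EID (read a) (read b))
  _ -> Nothing

-- combine the HEAD and DEPREL columns of one line into a single Rel
fromColumns :: String -> String -> Maybe Rel
fromColumns _ "_" = Nothing
fromColumns h d   = (\i -> Rel i d) <$> readHead h

-- parse a DEPS column such as "2:nsubj|4:obj"; "_" means no relations
parseDeps :: String -> Maybe [Rel]
parseDeps "_" = Just []
parseDeps s   = mapM parseRel (splitOn '|' s)
  where
    -- split at the first ':' only, so "nsubj:pass" stays one label
    parseRel r = case break (== ':') r of
      (h, ':' : lbl@(_ : _)) -> (\i -> Rel i lbl) <$> readHead h
      _                      -> Nothing
```

fromColumns returns a Maybe Rel, matching the Maybe Rel argument of mkAW: a line whose HEAD or DEPREL is "_" (such as a multi-word token) simply has no basic relation.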