hs-conllu-0.1.2: Conllu validating parser and utils.

Copyright: © 2018 bruno cuconato
License: LGPL-3
Maintainer: bruno cuconato <bcclaro+hackage@gmail.com>
Stability: experimental
Portability: non-portable
Safe Haskell: Safe
Language: Haskell2010

Conllu.Type

Description

Defines types for handling CoNLL-U data.

Type and data declarations

Documents and Sentences

type Doc = [Sent]

data Sent

Constructors

Sent 

Fields

Instances

Eq Sent

Show Sent
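Since a Doc is simply a list of Sent, reading a file largely comes down to cutting its text into sentence blocks. The sketch below is not the library's parser: it assumes the CoNLL-U conventions that sentences are separated by blank lines and that comment lines start with #, and because the fields of Sent are not listed on this page, it uses a hypothetical SentSketch record that just keeps the raw comment and word lines.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Hypothetical stand-in for Sent: the raw comment and word lines of one sentence.
data SentSketch = SentSketch
  { sketchComments  :: [String]
  , sketchWordLines :: [String]
  } deriving (Eq, Show)

-- Mirrors the documented Doc = [Sent].
type DocSketch = [SentSketch]

-- Split CoNLL-U text into sentences: blocks of lines separated by blank lines.
splitDoc :: String -> DocSketch
splitDoc = map toSent . blocks . lines
  where
    blank = all isSpace
    -- skip leading blank lines, then take lines up to the next blank one
    blocks ls = case dropWhile blank ls of
      []  -> []
      ls' -> let (b, rest) = break blank ls' in b : blocks rest
    -- comment lines start with '#'; everything else is treated as a word line
    toSent b = SentSketch
      { sketchComments  = filter isComment b
      , sketchWordLines = filter (not . isComment) b
      }
    isComment = ("#" `isPrefixOf`)
```

Each resulting SentSketch would then be handed to a line-level parser to produce the actual comments and words.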

type Comment = StringPair

Most comments are (key, value) pairs.
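A minimal sketch of how a comment line such as # sent_id = 1 could be turned into a (key, value) pair. It assumes StringPair is a pair of strings and that key and value are separated by the first " = "; giving comments without a value (such as # newdoc) an empty value is a guess, not necessarily what hs-conllu does. CommentSketch, parseComment and breakOn are hypothetical names.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical stand-in, assuming StringPair is a pair of strings.
type CommentSketch = (String, String)

-- Remove leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Split at the first occurrence of sep, dropping sep itself;
-- if sep does not occur, the whole string is returned with an empty rest.
breakOn :: String -> String -> (String, String)
breakOn sep = go []
  where
    n = length sep
    go acc s@(c : cs)
      | take n s == sep = (reverse acc, drop n s)
      | otherwise       = go (c : acc) cs
    go acc [] = (reverse acc, [])

-- "# key = value" -> Just ("key", "value"); a comment with no " = "
-- gets an empty value (an assumption); non-comment lines give Nothing.
parseComment :: String -> Maybe CommentSketch
parseComment ('#' : rest) =
  let (k, v) = breakOn " = " rest
  in  Just (trim k, trim v)
parseComment _ = Nothing
```

Splitting only at the first " = " keeps values such as sentence texts intact even when they themselves contain " = ".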

Words

data CW a

Represents a word line in a CoNLL-U file. Note that we have collapsed some fields together: HEAD and DEPREL are combined into a relation of type Rel, accessible through the _rel function, and the DEPS field is merely a list of Rel.

A C(oNLL-U)W(ord) may be a simple word, a multiword token, or an empty node. This is captured by the phantom type parameter (the a in the declaration), which can be instantiated with one of the data types below to write functions that operate on only one of these word types (see mkSW for how to do this). For example, the _dep function only operates on simple words, since they are the only words that have a DEPREL field.

Constructors

CW 

Fields

Instances

Eq (CW a)

Ord (CW a)

Show (CW a)

Word types

data AW

Phantom type for any kind of word.

data SW

Phantom type for a simple word.

data MT

Phantom type for multiword tokens. Note that in MWTs only the ID, FORM and MISC fields may be non-empty.

data EN

Phantom type for an empty node.
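To illustrate the phantom-type technique described for CW, here is a cut-down, hypothetical CWSketch with only two fields, tagged by empty data types named like the ones above. The tag never appears in the fields, so it costs nothing at runtime, but the type checker uses it to restrict deprelOf (a stand-in for _dep, not the library's function) to simple words.

```haskell
-- Phantom tags: empty data types with no values, used only at the type level.
data AW  -- any kind of word
data SW  -- simple word
data MT  -- multiword token
data EN  -- empty node

-- Hypothetical, cut-down word type; `a` does not occur in any field.
data CWSketch a = CWSketch
  { cwForm   :: String
  , cwDeprel :: Maybe String
  } deriving (Eq, Show)

-- Only simple words have a DEPREL, so only CWSketch SW is accepted here.
deprelOf :: CWSketch SW -> Maybe String
deprelOf = cwDeprel
```

Applying deprelOf to a value of type CWSketch MT is rejected at compile time, which is the point of the design: the restriction is checked statically rather than at runtime.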

Word Fields

data ID

Constructors

SID Index

Word ID is an integer (e.g. 1).

MID Index Index

Multiword token ID is a range (e.g. 1-2).

EID Index Index

Empty node ID is a decimal (e.g. 5.1).

Instances

Eq ID

Ord ID

Show ID
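The three ID constructors correspond to the three textual forms of the ID column in CoNLL-U: an integer such as 7, a range such as 1-2 and a decimal such as 5.1. The following sketch parses those forms into a hypothetical IDSketch type mirroring the constructors above; it is an illustration, not the library's parser, and it does no further validation (for example, it accepts a range whose end precedes its start).

```haskell
import Data.Char (isDigit)

type Index = Int

-- Hypothetical mirror of the documented ID constructors.
data IDSketch
  = SID Index        -- simple word: "7"
  | MID Index Index  -- multiword token range: "1-2"
  | EID Index Index  -- empty node: "5.1"
  deriving (Eq, Show)

-- Parse the ID column; anything else yields Nothing.
parseID :: String -> Maybe IDSketch
parseID s = case span isDigit s of
  (n@(_ : _), "")                -> Just (SID (read n))
  (n@(_ : _), '-' : m) | isNat m -> Just (MID (read n) (read m))
  (n@(_ : _), '.' : m) | isNat m -> Just (EID (read n) (read m))
  _                              -> Nothing
  where
    -- a non-empty run of digits
    isNat t = not (null t) && all isDigit t
```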

type FEATS = [Feat]

type HEAD = ID

type DEPS = [Rel]

data Feat

Feature representation.

Constructors

Feat 

Fields

Instances

Eq Feat

Show Feat
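In CoNLL-U, the FEATS column holds Name=Value pairs separated by |, a feature may carry several comma-separated values (as in PronType=Int,Rel), and _ marks an empty field. Since the fields of Feat are not shown on this page, this sketch parses the column into a hypothetical FeatSketch record with a name and a list of values; the real Feat may be structured differently.

```haskell
-- Hypothetical stand-in for Feat: a feature name and its values.
data FeatSketch = FeatSketch
  { featName   :: String
  , featValues :: [String]
  } deriving (Eq, Show)

-- Split a string on every occurrence of a character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- "Case=Nom|PronType=Int,Rel" -> two features; "_" is the empty FEATS field.
parseFeats :: String -> [FeatSketch]
parseFeats "_" = []
parseFeats s = map toFeat (splitOn '|' s)
  where
    toFeat kv =
      let (k, v) = break (== '=') kv
      in  FeatSketch k (splitOn ',' (drop 1 v))  -- drop the '=' itself
```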

data Rel

Dependency relation representation.

Constructors

Rel 

Fields

Instances

Eq Rel

Show Rel
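Because HEAD and DEPREL are collapsed into a Rel and DEPS is a list of Rel, one relation type serves both columns. As an illustration, the sketch below parses an enhanced-dependencies DEPS column such as 2:nsubj|4:nmod:poss into a list of a hypothetical RelSketch record. It simplifies the head to an Int (the library's HEAD is an ID), so decimal heads such as 5.1 are rejected, and it keeps the label whole, subtype included.

```haskell
import Data.Char (isDigit)

-- Hypothetical stand-in for Rel; the head is simplified to an Int.
data RelSketch = RelSketch
  { relHead  :: Int     -- index of the head word
  , relLabel :: String  -- full label, subtype included (e.g. "nmod:poss")
  } deriving (Eq, Show)

-- Split a string on every occurrence of a character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Parse a DEPS column; "_" means no relations. Only the first ':' separates
-- head from label, so labels with subtypes survive intact.
parseDeps :: String -> Maybe [RelSketch]
parseDeps "_" = Just []
parseDeps s = mapM toRel (splitOn '|' s)
  where
    toRel hr = case break (== ':') hr of
      (h@(_ : _), ':' : lbl)
        | all isDigit h && not (null lbl) -> Just (RelSketch (read h) lbl)
      _ -> Nothing
```

Using mapM in the Maybe monad makes the whole column fail if any single relation is malformed.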

type Index = Int

type IxSep = Char

ID separator in meta words, i.e. multiword tokens (as in 1-2) and empty nodes (as in 5.1).
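Assuming the separators in question are the - of multiword-token ranges and the . of empty-node IDs, rendering an ID back to its CoNLL-U text might look like this sketch. IDSketch, renderID and withSep are hypothetical names, and which character the library actually stores in an IxSep is not stated on this page.

```haskell
type Index = Int
type IxSep = Char

-- Hypothetical mirror of the documented ID constructors.
data IDSketch = SID Index | MID Index Index | EID Index Index
  deriving (Eq, Show)

-- Join two indices with a separator character.
withSep :: IxSep -> Index -> Index -> String
withSep sep a b = show a ++ [sep] ++ show b

-- Render an ID as it appears in the first CoNLL-U column.
renderID :: IDSketch -> String
renderID (SID i)   = show i
renderID (MID a b) = withSep '-' a b  -- multiword token range, e.g. 1-2
renderID (EID a b) = withSep '.' a b  -- empty node, e.g. 5.1
```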

Accessor functions

_dep :: CW SW -> Maybe EP

Get the main DEPREL value (without subtype), if it exists.

depIs :: EP -> CW SW -> Bool

Check whether the word's main DEPREL value is the one provided.
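A UD relation label may carry a subtype after a colon (nsubj:pass has main value nsubj). The following sketch shows what extracting the main value and comparing it could look like. It is a stand-in, not the library's code: it uses a hypothetical SimpleWordSketch that keeps only the DEPREL label, and a plain String in place of the EP type.

```haskell
-- Hypothetical stand-in for a simple word: only the DEPREL label is kept.
newtype SimpleWordSketch = SimpleWordSketch { swDeprel :: Maybe String }

-- Main DEPREL value: everything before the first ':' (the subtype separator).
mainDep :: SimpleWordSketch -> Maybe String
mainDep = fmap (takeWhile (/= ':')) . swDeprel

-- Does the word's main DEPREL equal the given one? False if it has none.
depIsSketch :: String -> SimpleWordSketch -> Bool
depIsSketch ep w = mainDep w == Just ep
```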

Constructor functions

mkDEP :: String -> EP

Read a main DEPREL (no subtype).

mkUPOS :: String -> POS

Read a UPOS tag.
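UD v2 defines 17 universal POS tags. A possible shape for a POS type and a reader is sketched below, using a hypothetical POSSketch enumeration; the library's POS type is defined elsewhere and may differ. Unlike mkUPOS, whose handling of unknown tags is not described here, readUPOS is total and returns Nothing for anything that is not an exact tag name.

```haskell
-- Hypothetical enumeration of the 17 UD v2 universal POS tags.
data POSSketch
  = ADJ | ADP | ADV | AUX | CCONJ | DET | INTJ | NOUN | NUM
  | PART | PRON | PROPN | PUNCT | SCONJ | SYM | VERB | X
  deriving (Eq, Show, Enum, Bounded)

-- Total reader: matches the exact tag name (case-sensitive) or gives Nothing.
readUPOS :: String -> Maybe POSSketch
readUPOS s = lookup s [(show p, p) | p <- [minBound .. maxBound]]
```

Deriving Enum and Bounded lets the lookup table be built from the constructors themselves, so the table cannot drift out of sync with the type.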

mkAW :: ID -> FORM -> LEMMA -> UPOS -> XPOS -> FEATS -> Maybe Rel -> DEPS -> MISC -> CW AW

Make a word from its fields. By default it has phantom type AW (any kind of word).

mkSW :: CW AW -> CW SW

Coerce a word to a simple word.
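The documentation does not say whether mkSW checks that its argument really is a simple word. The sketch below shows a checked alternative on a hypothetical, cut-down CWSketch type (ID and FORM only, instead of mkAW's nine fields): new words start out tagged AW, and toSW retags only words whose ID is an integer, returning Nothing for multiword tokens and empty nodes.

```haskell
-- Phantom tags (empty data types, named like the documented AW and SW).
data AW
data SW

-- Hypothetical mirror of the documented ID constructors.
data IDSketch = SID Int | MID Int Int | EID Int Int
  deriving (Eq, Show)

-- Hypothetical, cut-down word: only ID and FORM.
data CWSketch a = CWSketch { cwID :: IDSketch, cwForm :: String }
  deriving (Eq, Show)

-- Smart constructor: every freshly built word starts out tagged AW.
mkAWSketch :: IDSketch -> String -> CWSketch AW
mkAWSketch = CWSketch

-- Checked coercion: only words with an integer ID are retagged as SW.
toSW :: CWSketch AW -> Maybe (CWSketch SW)
toSW (CWSketch i@(SID _) f) = Just (CWSketch i f)
toSW _                      = Nothing
```

Rebuilding the value with the same fields is what changes the phantom tag; no data is copied beyond the constructor itself.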