hs-conllu-0.1.5: Conllu validating parser and utils.
Copyright: © 2018 bruno cuconato
License: LGPL-3
Maintainer: bruno cuconato <bcclaro+hackage@gmail.com>
Stability: experimental
Portability: non-portable
Safe Haskell: Safe-Inferred
Language: Haskell2010

Conllu.Parse

Description

Parsers for the CoNLL-U format. The CoNLL-U format is based on the deprecated CoNLL format (defined here) and is itself defined here.


Documentation

type Parser = Parsec Void String

Parser type synonym: a Megaparsec parser over String input with no custom error component (Void).

parsers

parseConlluWith :: Parser Sent -> FilePath -> String -> Either String Doc

Parse a CoNLL-U document using a customized sentence parser.

Arguments

:: Parser Sent          the sentence parser to be used.
-> FilePath             the source whose stream is supplied in the next argument (may be "" for no file).
-> String               the stream to be parsed.
-> Either String Doc

parseConllu :: FilePath -> String -> Either String Doc

Parse a CoNLL-U document using the default parser.
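The FilePath / String / Either String shape shared by parseConllu and parseConlluWith matches running a Megaparsec parser and pretty-printing its error bundle. A minimal sketch of that pattern, assuming Megaparsec 9; `parseWith` and `digitLines` are hypothetical names, not part of hs-conllu, and the toy parser only stands in for a sentence parser:

```haskell
import Data.Bifunctor (first)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- Hypothetical helper with the same argument order as parseConlluWith:
-- the parser, a source name used in error messages ("" for none), and
-- the input stream. Any parse error is rendered to a String.
parseWith :: Parser a -> FilePath -> String -> Either String a
parseWith p source input =
  first errorBundlePretty (parse (p <* eof) source input)

-- Toy parser standing in for a sentence parser: one or more
-- newline-terminated lines of digits.
digitLines :: Parser [String]
digitLines = some (some digitChar <* newline)
```

Here `parseWith digitLines "demo.conllu" "12\nx\n"` returns a Left whose message starts with the source name and position, which is why the FilePath argument is useful even though the stream is passed separately.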

customizable parsers

parserC :: ParserC -> Parser Sent

Defines a custom sentence parser. If you only need to customize one field parser (e.g., to parse special comments or a special MISC field), you can do:

parserC ParserC{_commentP = myCommentsParser }
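Swapping a single field parser relies on Haskell record syntax. When a record of default field parsers is at hand, record update replaces one field and keeps the others. A sketch of that general pattern with a hypothetical two-field record (`SentParsers`, `defaultParsers`, `_wordP`, `sentenceFrom` and `customParsers` are made up for illustration; only the `_commentP` field name comes from the text above, and Megaparsec 9 is assumed):

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- Hypothetical record of field parsers, modelled loosely on ParserC.
data SentParsers = SentParsers
  { _commentP :: Parser String -- one comment line, without the '#'
  , _wordP    :: Parser String -- one word line, kept as raw text
  }

defaultParsers :: SentParsers
defaultParsers = SentParsers
  { _commentP = char '#' *> many (noneOf "\n") <* newline
  , _wordP    = some (noneOf "\n") <* newline
  }

-- Build a sentence parser from the record: comment lines, then word
-- lines, then the blank line that ends the sentence.
sentenceFrom :: SentParsers -> Parser ([String], [String])
sentenceFrom ps = (,) <$> many (_commentP ps) <*> some (_wordP ps) <* newline

-- Customize only the comment parser (drop spaces after '#');
-- record update keeps the default word parser.
customParsers :: SentParsers
customParsers = defaultParsers
  { _commentP = char '#' *> many (char ' ') *> many (noneOf "\n") <* newline }
```

With the defaults, the comment in `"# id 1\nfoo\n\n"` is returned as `" id 1"`; with `customParsers` it is `"id 1"`, while word lines are parsed identically.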

default parsers

rawSents :: Parser (RawData String Void)

Parse CoNLL-U sentences with error recovery.
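The text does not say how rawSents recovers from errors. Megaparsec's `withRecovery` is one standard way to record a failed sentence and resume at the next one; a sketch under that assumption, using a toy digits-only grammar and hypothetical names (`sentenceP`, `skipSentence`, `sentencesWithRecovery`) rather than hs-conllu's RawData type:

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- Toy sentence: one or more newline-terminated lines of digits,
-- followed by a blank line.
sentenceP :: Parser [String]
sentenceP = some (some digitChar <* newline) <* newline

-- Skip the rest of a broken sentence, up to and including the next
-- blank line, so parsing can resume at the following sentence.
skipSentence :: Parser ()
skipSentence = () <$ manyTill anySingle (try (string "\n\n"))

-- Parse every sentence; a sentence that fails becomes a Left holding
-- the rendered error instead of aborting the whole document.
sentencesWithRecovery :: Parser [Either String [String]]
sentencesWithRecovery =
  many (withRecovery recover (Right <$> sentenceP)) <* eof
  where
    recover err = Left (parseErrorPretty err) <$ skipSentence
```

On `"12\n34\n\n1x\n\n56\n\n"` this yields a Right for the first sentence, a Left for the broken middle one, and a Right for the last, so one bad sentence does not hide errors or results in the others.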

sentence :: Parser Sent

The default sentence parser.

comment :: Parser Comment

Parse a comment.

word :: Parser (CW AW)

The default word parser.
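A CoNLL-U word line has ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). A simplified sketch of that line structure, assuming Megaparsec 9 and keeping every field as raw text; `fieldP` and `wordLineP` are hypothetical, whereas the real word parser builds a typed `CW AW` value from per-field parsers:

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- One field: any run of characters other than tab or newline;
-- a lone "_" is read as an empty field.
fieldP :: Parser (Maybe String)
fieldP = toField <$> some (noneOf "\t\n")
  where
    toField "_" = Nothing
    toField s   = Just s

-- A word line: exactly ten tab-separated fields, then a newline.
wordLineP :: Parser [Maybe String]
wordLineP = (:) <$> fieldP <*> count 9 (char '\t' *> fieldP) <* newline
```

A line with fewer than ten fields fails as soon as a tab is expected but a newline is found.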

CoNLL-U field parsers

emptyField :: Parser (Maybe a)

Parse an empty field ("_").

idW :: Parser ID

Parse the ID field, which may be an integer, a range, or a decimal.
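The three ID shapes can be told apart after reading the first integer: a `-` introduces a multiword-token range, a `.` introduces an empty-node decimal, and anything else leaves a plain index. A sketch assuming Megaparsec 9, with a hypothetical `TokId` type (the library's own `ID` type may be defined differently):

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical ID type: a plain word index ("3"), a multiword-token
-- range ("1-2"), or an empty-node decimal ("8.1").
data TokId
  = SingleId Int
  | RangeId Int Int
  | EmptyNodeId Int Int
  deriving (Show, Eq)

idP :: Parser TokId
idP = do
  n <- L.decimal
  -- char '-' and char '.' fail without consuming input, so choice
  -- can fall through to the plain-index case.
  choice
    [ RangeId n <$> (char '-' *> L.decimal)
    , EmptyNodeId n <$> (char '.' *> L.decimal)
    , pure (SingleId n)
    ]
```

Parsing the shared integer once and branching afterwards avoids wrapping alternatives in `try`.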

form :: Parser FORM

Parse the FORM field.

lemma :: Parser LEMMA

Parse the LEMMA field.

upos :: Parser UPOS

Parse the UPOS field.
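Universal Dependencies v2 defines a closed set of 17 universal POS tags, so a UPOS parser can be a choice over that set. A sketch assuming Megaparsec 9, with a hypothetical `UposTag` enumeration (the library's own UPOS type may differ); it relies on Megaparsec's `string` backtracking automatically on failure and on no tag being a prefix of another:

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- Hypothetical enumeration of the 17 UD v2 universal POS tags.
data UposTag
  = ADJ | ADP | ADV | AUX | CCONJ | DET | INTJ | NOUN | NUM
  | PART | PRON | PROPN | PUNCT | SCONJ | SYM | VERB | X
  deriving (Show, Eq, Enum, Bounded)

-- Try each tag's spelling in turn; a failed `string` consumes nothing,
-- so e.g. "PROPN" falls through the "PRON" alternative.
uposP :: Parser UposTag
uposP = choice [t <$ string (show t) | t <- [minBound .. maxBound]]
```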

xpos :: Parser XPOS

Parse the XPOS field.

feats :: Parser FEATS

Parse the FEATS field.

deprel :: Parser DEPREL

Parse the DEPREL field.

deps :: Parser DEPS

Parse the DEPS field.

misc :: Parser MISC

Parse the MISC field.

utility parsers

commentPair :: Parser Comment

Parse a comment pair.
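CoNLL-U comment lines commonly carry key-value metadata such as `# sent_id = ...` and `# text = ...`. A sketch of a pair-style comment parser, assuming Megaparsec 9 and that the pair is split at the first `=` with optional surrounding spaces; `commentPairP` is hypothetical, and the library's `Comment` type and whitespace handling may differ:

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- Hypothetical parser for a "# key = value" comment line, returning
-- the key and the rest of the line as the value.
commentPairP :: Parser (String, String)
commentPairP = do
  _ <- char '#' *> many (char ' ')
  key <- some (noneOf " \t=\n")
  _ <- many (char ' ') *> char '=' *> many (char ' ')
  value <- many (noneOf "\n")
  _ <- newline
  pure (key, value)
```

A comment without `=`, such as `# newdoc`, is rejected by this parser, which is why a separate, more general comment parser is still needed.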

listPair :: String -> Parser a -> Parser b -> Parser [(a, b)]

Parse a list of pairs.
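The signature of listPair takes a String plus key and value parsers. A sketch under the assumption that the String separates each key from its value and that pairs are joined by `|`, as in the FEATS (`Case=Nom|Number=Sing`) and DEPS (`2:nsubj|4:obj`) columns. `pairP`, `listPairP`, `featsP` and `depsP` are hypothetical, Megaparsec 9 is assumed, and the key and value parsers are simplified (real feature values can contain other characters, such as commas):

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- Hypothetical: parse "key<sep>value" once.
pairP :: String -> Parser a -> Parser b -> Parser (a, b)
pairP sep k v = (,) <$> k <* string sep <*> v

-- Hypothetical: one or more pairs separated by '|'.
listPairP :: String -> Parser a -> Parser b -> Parser [(a, b)]
listPairP sep k v = pairP sep k v `sepBy1` char '|'

-- Simplified FEATS-style list: names and values are alphanumeric runs.
featsP :: Parser [(String, String)]
featsP = listPairP "=" (some alphaNumChar) (some alphaNumChar)

-- Simplified DEPS-style list: a numeric head, then a relation that may
-- itself contain ':' subtypes.
depsP :: Parser [(String, String)]
depsP = listPairP ":" (some digitChar) (some (noneOf "|\t\n"))
```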

stringNot :: String -> Parser String

Parse a string of any characters except the ones provided.

stringWOSpaces :: Parser String

Parse a string up to a space, a tab, or a newline.

stringWSpaces :: Parser String

Parse a string up to a tab or a newline.
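These three utilities differ only in which characters end the string, so the last two can be expressed through the first. A sketch assuming Megaparsec 9 and that at least one character is required (`some`; the library may use `many`); the `...P` names are hypothetical stand-ins:

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- Hypothetical: one or more characters, none of which is excluded.
stringNotP :: String -> Parser String
stringNotP excluded = some (noneOf excluded)

-- Stops at a space, a tab, or a newline.
stringWOSpacesP :: Parser String
stringWOSpacesP = stringNotP " \t\n"

-- Stops only at a tab or a newline, so the result may contain spaces.
stringWSpacesP :: Parser String
stringWSpacesP = stringNotP "\t\n"
```

On `"foo bar baz"`, `stringWOSpacesP` stops at the first space and returns `"foo"`, while `stringWSpacesP` would read the whole line.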

parser combinators

keyValue :: String -> Parser a -> Parser b -> Parser (a, b)

Parse a (key, value) pair.

maybeEmpty :: Parser a -> Parser (Maybe a)

A parser combinator for parsers that cannot parse "_" (unlike, e.g., the LEMMA parser, which can).

Two combinators are needed to parse the empty field without lookahead. If we did

form <|> emptyField

we would parse "_" as a non-empty FORM field. But if we did

emptyField <|> form

we would consume the "_" in "_something", and the parser would then choke expecting a tab.

orEmpty :: Parser String -> Parser (Maybe String)

A parser combinator for parsers that may parse "_".
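The two strategies described for maybeEmpty and orEmpty can be contrasted directly. A sketch assuming Megaparsec 9 and an empty field consisting of the single character `_`; every name ending in `P` is hypothetical, and the FORM-like and UPOS-like parsers are simplified stand-ins:

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- The empty field: a single underscore.
emptyFieldP :: Parser (Maybe a)
emptyFieldP = Nothing <$ char '_'

-- For parsers that can never consume "_": try the empty field first.
maybeEmptyP :: Parser a -> Parser (Maybe a)
maybeEmptyP p = emptyFieldP <|> Just <$> p

-- For parsers that may consume "_": parse the whole field first, then
-- treat a field that is exactly "_" as empty. No lookahead is needed.
orEmptyP :: Parser String -> Parser (Maybe String)
orEmptyP p = toMaybe <$> p
  where
    toMaybe "_" = Nothing
    toMaybe s   = Just s

-- FORM-like field: may contain "_" and spaces, but not tabs.
formLikeP :: Parser String
formLikeP = some (noneOf "\t\n")

-- UPOS-like field: letters only, so it never consumes "_".
uposLikeP :: Parser String
uposLikeP = some letterChar
```

Followed by a tab, `orEmptyP formLikeP` reads `"_"` as `Nothing` and `"_x"` as `Just "_x"`. The naive `Just <$> formLikeP <|> emptyFieldP` returns `Just "_"` for `"_"`, and `emptyFieldP <|> Just <$> formLikeP` fails on `"_x"` because it stops after the underscore and the tab parser then sees `x`.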

listP :: Parser [a] -> Parser [a]

Parse a list of values that may be an empty field. A parser that returns a possibly empty list, such as one built with sepBy or many, returns the correct result for the empty field ('_'), but reports it the same way as any other syntax error.
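A sketch of handling the empty field explicitly, assuming Megaparsec 9, that the empty field is the single character `_`, and that the wrapped parser requires at least one element (built with `sepBy1`). With plain `sepBy`, `_` yields `[]` without being consumed, so the failure only surfaces at the next parser. `listOrEmptyP` and `itemsP` are hypothetical names:

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- Hypothetical listP-style combinator: "_" means the empty list,
-- otherwise run a parser that requires at least one element.
listOrEmptyP :: Parser [a] -> Parser [a]
listOrEmptyP p = ([] <$ char '_') <|> p

-- A non-empty '|'-separated list of letter runs.
itemsP :: Parser [String]
itemsP = some letterChar `sepBy1` char '|'
```

Followed by a tab, `listOrEmptyP itemsP` reads `"_"` as `[]` and `"a|b"` as `["a", "b"]`, whereas `some letterChar `sepBy` char '|'` leaves the underscore in place and the tab parser fails on it.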