Safe Haskell: None
NLP.Morfeusz
Description
The module provides the analyse wrapper function, which uses the
Morfeusz library for morphosyntactic analysis. The result is represented
as a directed acyclic graph (DAG) with Token-labeled edges.
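The DAG representation is needed when the input word has multiple
correct segmentations. For instance, "miałem" can be read either as a single
noun form or as the verb form "miał" followed by the agglutinate "em", as the
example below shows.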
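To make the encoding concrete, here is a minimal sketch that rebuilds the "miałem" DAG from the example below as a plain list of edges and finds the nodes where alternative segmentations branch. It assumes a hypothetical local Edge type mirroring the from/to/label fields seen in the GHCi output, with each Token label reduced to its orth string; mialemDag and branchPoints are hypothetical names, not part of NLP.Morfeusz.

```haskell
import Data.List (nub, sort)

-- Hypothetical mirror of the library's Edge type (fields from, to, label),
-- defined locally so the sketch runs without Morfeusz installed.
data Edge a = Edge { from :: Int, to :: Int, label :: a }
  deriving (Show, Eq)

-- The "miałem" DAG from the GHCi example, each Token reduced to its orth.
mialemDag :: [Edge String]
mialemDag =
  [ Edge 0 1 "miał"    -- praet form of "mieć"
  , Edge 0 2 "miałem"  -- noun "miał" in the instrumental
  , Edge 1 2 "em"      -- agglutinate of "być"
  ]

-- Hypothetical helper: nodes with more than one outgoing edge, i.e. the
-- points where alternative segmentations diverge.
branchPoints :: [Edge a] -> [Int]
branchPoints dag =
  [ n | n <- sort (nub (map from dag))
      , length (filter ((== n) . from) dag) > 1 ]

main :: IO ()
main = do
  print (branchPoints mialemDag)                              -- [0]
  print (map label (filter ((== 0) . from) mialemDag))        -- ["mia\322","mia\322em"]
```

Both segmentations start at node 0 and end at node 2; they only differ in whether node 1 is visited.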
>>> :m NLP.Morfeusz
>>> :set -XOverloadedStrings
>>> mapM_ print . analyse False $ "miałem"
Edge {from = 0, to = 1, label = Token {orth = "mia\322", interps = [Interp {base = "mie\263", msd = "praet:sg:m1.m2.m3:imperf"}]}}
Edge {from = 0, to = 2, label = Token {orth = "mia\322em", interps = [Interp {base = "mia\322", msd = "subst:sg:inst:m3"}]}}
Edge {from = 1, to = 2, label = Token {orth = "em", interps = [Interp {base = "by\263", msd = "aglt:sg:pri:imperf:wok"}]}}
You can use the paths function to extract all paths from the resulting
DAG. If you are not interested in all possible segmentations, simply
take the first path:
>>> mapM_ print . paths . analyse False $ "miałem"
[Token {orth = "mia\322em", interps = [Interp {base = "mia\322", msd = "subst:sg:inst:m3"}]}]
[Token {orth = "mia\322", interps = [Interp {base = "mie\263", msd = "praet:sg:m1.m2.m3:imperf"}]},Token {orth = "em", interps = [Interp {base = "by\263", msd = "aglt:sg:pri:imperf:wok"}]}]
>>> mapM_ print . head . paths . analyse False $ "miałem"
Token {orth = "mia\322em", interps = [Interp {base = "mia\322", msd = "subst:sg:inst:m3"}]}
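How paths is implemented is not documented here. Purely as an illustration, the following sketch enumerates paths over a hypothetical local Edge type by plain depth-first search from the smallest source node to the largest target node. allPaths is a hypothetical name, not the library's paths, and its ordering differs from the output above: this DFS lists the two-token segmentation first, so head on its result does not match head . paths.

```haskell
-- Hypothetical mirror of the library's Edge type (fields from, to, label).
data Edge a = Edge { from :: Int, to :: Int, label :: a }
  deriving (Show, Eq)

-- Enumerate every path from the smallest source node to the largest target
-- node by depth-first search. Assumes the edge list is acyclic; routes that
-- dead-end before the final node contribute nothing.
allPaths :: [Edge a] -> [[a]]
allPaths [] = []
allPaths dag = go start
  where
    start = minimum (map from dag)
    end   = maximum (map to dag)
    go n
      | n == end  = [[]]
      | otherwise =
          [ label e : rest
          | e <- dag, from e == n  -- outgoing edges, in list order
          , rest <- go (to e) ]

-- The "miałem" DAG from the example, labels reduced to orth strings.
mialemDag :: [Edge String]
mialemDag = [Edge 0 1 "miał", Edge 0 2 "miałem", Edge 1 2 "em"]

main :: IO ()
main = mapM_ print (allPaths mialemDag)
-- ["mia\322","em"]
-- ["mia\322em"]
```

If a specific preference is wanted (for example, fewest tokens first), sorting the result explicitly, e.g. with sortOn length from Data.List, is more robust than relying on enumeration order.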
Types
Edge a: A directed edge with a label of type a between nodes of type Int.
Token: A token with a list of recognized interpretations. If the list of interpretations is empty, the token is unknown to Morfeusz.
Interp: An interpretation of the word, consisting of a base form (base) and a morphosyntactic tag (msd).
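Because an unknown token is signalled only by an empty interpretation list, picking such tokens out is a simple filter. The sketch below is written against hypothetical local mirrors of Token and Interp (field names taken from the GHCi output above, with String standing in for Text so no extra packages are needed); isUnknown, unknownOrths, bases and the token "qwerty" are all hypothetical, not part of NLP.Morfeusz.

```haskell
-- Hypothetical mirrors of the library's Interp and Token records,
-- with String in place of Text.
data Interp = Interp { base :: String, msd :: String }
  deriving (Show, Eq)

data Token = Token { orth :: String, interps :: [Interp] }
  deriving (Show, Eq)

-- A token is unknown to Morfeusz when its interpretation list is empty.
isUnknown :: Token -> Bool
isUnknown = null . interps

-- Orthographic forms of all unknown tokens in a sequence.
unknownOrths :: [Token] -> [String]
unknownOrths = map orth . filter isUnknown

-- Base forms of all interpretations of a token (empty for unknown tokens).
bases :: Token -> [String]
bases = map base . interps

tokens :: [Token]
tokens =
  [ Token "miałem" [Interp "miał" "subst:sg:inst:m3"]  -- from the example above
  , Token "qwerty" []                                  -- hypothetical unknown token
  ]

main :: IO ()
main = do
  print (unknownOrths tokens)  -- ["qwerty"]
  print (map bases tokens)     -- [["mia\322"],[]]
```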
Sentence analysis
type KeepSpaces = Bool
Keep spaces in the analysis output.
analyse :: KeepSpaces -> Text -> DAG Token
Analyse the input sentence and return the result as a DAG of tokens.