Portability: portable
Stability: experimental
Maintainer: Jan Snajder <jan.snajder@fer.hr>
Safe Haskell: Safe-Inferred
Data.MultextEastMsd
Description
Implementation of the MULTEXT-East morphosyntactic descriptors.
MULTEXT-East encodes the values of morphosyntactic attributes in a single string using positional encoding. Each attribute is represented by a single letter at a predefined position, while non-applicable attributes are represented by hyphens. For example, Ncmsg denotes a common noun (Nc) in the masculine singular genitive case (msg). For details, refer to http://nl.ijs.si/ME.
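The positional scheme can be illustrated with a minimal, self-contained sketch. It is not part of this module: decodePositions is a hypothetical helper that works on raw strings only, and it assumes that the first character is the part-of-speech code and that every following character occupies one attribute position.

```haskell
-- | Hypothetical helper (not part of Data.MultextEastMsd): split a
-- positional MSD string into its part-of-speech code and the list of
-- (position, value code) pairs for the attributes that are set.
-- Positions are 1-based and count from the first character after the
-- part-of-speech letter; hyphens mark non-applicable attributes and
-- are dropped from the result.
decodePositions :: String -> Maybe (Char, [(Int, Char)])
decodePositions []       = Nothing
decodePositions (p : cs) = Just (p, [(i, c) | (i, c) <- zip [1 ..] cs, c /= '-'])
```

Under these assumptions, decodePositions "N-msg" keeps positions 2 to 4 and omits position 1, which is how a hyphen leaves an attribute unset without shifting the others.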
Currently, only MULTEXT-East Version 3 is supported. MULTEXT-East Version 3 covers morphosyntactic descriptions for Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian. For details, refer to http://nl.ijs.si/ME/V3/.
Usage example:
>>> let Just d1 = fromString "Ncmsg"
>>> pos d1
Noun
>>> features d1
[NType Common,Gender Masculine,Number Singular,Case Genitive]
>>> let d2 = unset NType d1
>>> toString d2
"N-msg"
>>> d1 == d2
False
>>> d1 =~= d2
True
- data Msd
- msd :: PoS -> [Feature] -> Msd
- data PoS
- = Noun
- | Verb
- | Adjective
- | Adposition
- | Conjunction
- | Numeral
- type Attribute a = a -> Feature
- get :: Enum a => Attribute a -> Msd -> Maybe Feature
- set :: [Feature] -> Msd -> Msd
- unset :: Enum a => Attribute a -> Msd -> Msd
- check :: [Feature] -> Msd -> Bool
- features :: Msd -> [Feature]
- pos :: Msd -> PoS
- (=~=) :: MsdPattern a => a -> a -> Bool
- toString :: Msd -> String
- fromString :: String -> Maybe Msd
- validString :: String -> Bool
- data Feature
- = Animate Bool
- | AType AType
- | Aspect Aspect
- | Case Case
- | Class Class
- | Clitic Bool
- | CliticS Bool
- | CoordType CoordType
- | Courtesy Bool
- | CType CType
- | Definiteness Definiteness
- | Degree Degree
- | Formation Formation
- | Gender Gender
- | MForm MForm
- | MType MType
- | Negative Bool
- | NType NType
- | Number Number
- | OwnedNumber Number
- | OwnerNumber Number
- | OwnerPerson Person
- | Person Person
- | SType SType
- | SubType SubType
- | Tense Tense
- | VForm VForm
- | Voice Voice
- | VType VType
- data AType
- = Qualificative
- | Indefinite
- | Possessive
- | OrdinalT
- data Aspect
- data Case
- = Nominative
- | Genitive
- | Dative
- | Accusative
- | Vocative
- | Locative
- | Instrumental
- | Direct
- | Oblique
- | Partitive
- | Illative
- | Inessive
- | Elative
- | Allative
- | Adessive
- | Ablative
- | Translative
- | Terminative
- | Essive
- | Abessive
- | Komitative
- | Aditive
- | Temporalis
- | Causalis
- | Sublative
- | Delative
- | Sociative
- | Factive
- | Superessive
- | Distributive
- | EssiveFormal
- | Multiplicative
- data Class
- = Definite1
- | Definite2
- | Definite34
- | Definite
- | Demonstrative
- | IndefiniteC
- | Interrogative
- | Relative
- data CoordType
- = CTSimple
- | CTRepetit
- | CTCorrelat
- | CTSentence
- | CTWords
- | Initial
- | NonInitial
- data CType
- data Definiteness
- data Degree
- = Positive
- | Comparative
- | Superlative
- | ElativeD
- | Diminutive
- data Formation
- data Gender
- data MForm
- data MType
- data NType
- data Number
- = Singular
- | Plural
- | Dual
- | Count
- | Collective
- data Person
- data SType
- data SubType
- = STNegative
- | STPositive
- data Tense
- data VForm
- = Indicative
- | Subjunctive
- | Imperative
- | Conditional
- | Infinitive
- | Participle
- | Gerund
- | Supine
- | Transgressive
- | Quotative
- data Voice
- data VType
Datatype constructor
msd :: PoS -> [Feature] -> Msd
Constructs a morphosyntactic descriptor (a value of the abstract Msd datatype) with the specified part-of-speech and the specified features (attribute-value pairs). Duplicated attributes and attributes not applicable to the given part-of-speech are ignored.
data PoS
Constructors
- Noun
- Verb
- Adjective
- Adposition
- Conjunction
- Numeral
Getting and setting values
set :: [Feature] -> Msd -> Msd
Sets the specified features (attribute-value pairs). Duplicated attributes and attributes not applicable to the given part-of-speech are ignored.
check :: [Feature] -> Msd -> Bool
Checks whether the attributes are set to the specified values.
Wildcard matching
(=~=) :: MsdPattern a => a -> a -> Bool
A wildcard-matching operator between two Msd patterns. The relation msd1 =~= msd2 holds iff msd1 and msd2 are of the same part-of-speech and the attributes common to msd1 and msd2 have identical values. Attributes of msd1 that are not set in msd2 (and conversely) are ignored in the comparison. In MULTEXT-East notation, this is tantamount to having the character code - (hyphen) act as a wildcard.
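The hyphen-as-wildcard reading can be sketched directly on MULTEXT-East strings. The function below is a hypothetical string-level analogue, not the module's (=~=) operator, and it assumes that positions omitted at the end of a shorter string count as unset.

```haskell
-- | Hypothetical string-level analogue of (=~=), not the library operator.
-- Two MSD strings match when their part-of-speech letters are equal and
-- every attribute position agrees unless at least one side holds a hyphen.
-- Assumption: the shorter string is padded with hyphens, so omitted
-- trailing positions are treated as "not set".
matchesWildcard :: String -> String -> Bool
matchesWildcard (p : xs) (q : ys) = p == q && and (zipWith agree xs' ys')
  where
    n         = max (length xs) (length ys)
    pad s     = take n (s ++ repeat '-')
    xs'       = pad xs
    ys'       = pad ys
    agree a b = a == '-' || b == '-' || a == b
matchesWildcard _ _ = False
```

With this sketch, matchesWildcard "Ncmsg" "N-msg" holds because the only differing position contains a hyphen, whereas "Ncmsg" and "Ncfsg" do not match because the gender position carries two different values.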
From/to string conversion
fromString :: String -> Maybe Msd
Converts a MULTEXT-East string notation into an Msd datatype. Returns Nothing if the string is not a valid MULTEXT-East string.
validString :: String -> Bool
Checks whether the string conforms to the MULTEXT-East specification. Defined as:
validString = isJust . fromString
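The same pattern, deriving a Boolean validator from a Maybe-returning parser, can be shown in a self-contained sketch. Both parseShape and validShape are hypothetical names; the toy parser checks only the overall shape of a string and does not implement the MULTEXT-East specification.

```haskell
import Data.Char (isAlphaNum, isUpper)
import Data.Maybe (isJust)

-- | Toy stand-in for a parser (hypothetical, not the library's fromString):
-- accepts a string that starts with an uppercase category letter followed
-- only by alphanumeric value codes or hyphens, and returns the two parts.
parseShape :: String -> Maybe (Char, String)
parseShape (p : cs)
  | isUpper p && all ok cs = Just (p, cs)
  where
    ok c = isAlphaNum c || c == '-'
parseShape _ = Nothing

-- | Validator derived from the parser, mirroring the shape of
-- validString = isJust . fromString.
validShape :: String -> Bool
validShape = isJust . parseShape
```

Defining the validator through the parser keeps the two in agreement by construction: a string is valid exactly when parsing it succeeds.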
Morphosyntactic features
data AType
Constructors
- Qualificative
- Indefinite
- Possessive
- OrdinalT
data Aspect
Constructors
- Progressive
- Perfective
data CoordType
Constructors
- CTSimple
- CTRepetit
- CTCorrelat
- CTSentence
- CTWords
- Initial
- NonInitial
data CType
Constructors
- Coordinating
- Subordinating
- Portmanteau
data Definiteness
data Degree
Constructors
- Positive
- Comparative
- Superlative
- ElativeD
- Diminutive
data Number
Constructors
- Singular
- Plural
- Dual
- Count
- Collective
data SType
Constructors
- Preposition
- Postposition
data SubType
Constructors
- STNegative
- STPositive