multext-east-msd-0.1.0.4: MULTEXT-East morphosyntactic descriptors

Portability: portable
Stability: experimental
Maintainer: Jan Snajder <jan.snajder@fer.hr>
Safe Haskell: Safe-Inferred

Data.MultextEastMsd

Description

Implementation of the MULTEXT-East morphosyntactic descriptors.

MULTEXT-East encodes the values of morphosyntactic attributes in a single string, using positional encoding. Each attribute is represented by a single letter at a predefined position, while non-applicable attributes are represented by hyphens. For example, Ncmsg denotes a common noun (Nc) in the masculine singular genitive (msg). For details, refer to http://nl.ijs.si/ME.
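The positional encoding described above can be illustrated with a small standalone sketch. It does not use this package: the function decodeNoun and the table nounAttributes are hypothetical, and the four noun positions are inferred only from the Ncmsg example (the actual specification defines more positions, and they vary by language).

```haskell
-- Hypothetical toy table: names of the first four noun attribute positions,
-- inferred from the "Ncmsg" example only (an assumption, not the full spec).
nounAttributes :: [String]
nounAttributes = ["Type", "Gender", "Number", "Case"]

-- Decode a noun descriptor into (attribute, code letter) pairs.
-- The first character is the part-of-speech; each following character is
-- the code of the attribute at that position, and '-' means "not set".
decodeNoun :: String -> Maybe [(String, Char)]
decodeNoun ('N' : codes)
  | length codes <= length nounAttributes =
      Just [ (attr, c) | (attr, c) <- zip nounAttributes codes, c /= '-' ]
decodeNoun _ = Nothing
```

Under these assumptions, decodeNoun "Ncmsg" yields all four attributes, decodeNoun "N-msg" omits Type, and a string that does not start with N is rejected.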

Currently, only MULTEXT-East Version 3 is supported. MULTEXT-East Version 3 covers morphosyntactic descriptions for Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian. For details, refer to http://nl.ijs.si/ME/V3/.

Usage example:

>>> let Just d1 = fromString "Ncmsg"
>>> pos d1
Noun
>>> features d1
[NType Common,Gender Masculine,Number Singular,Case Genitive]
>>> let d2 = unset NType d1
>>> toString d2
"N-msg"
>>> d1 == d2
False
>>> d1 =~= d2
True

Datatype constructor

data Msd

msd :: PoS -> [Feature] -> Msd

Constructs a morphosyntactic descriptor (an abstract Msd datatype) of a specified part-of-speech and with specified features (attribute-value pairs). Duplicated attributes and attributes not applicable to the given part-of-speech are ignored.
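The rule that duplicated and non-applicable attributes are ignored can be modelled in a few lines. This is a standalone sketch, not the package's code: the PoS and Feature types, the applicable table, and normalize are hypothetical stand-ins. The source does not say which duplicate is kept, so the sketch assumes the first occurrence wins.

```haskell
import Data.Function (on)
import Data.List (nubBy)

-- Hypothetical stand-ins for the library's PoS and Feature types.
data PoS = Noun | Verb deriving (Eq, Show)
type Feature = (String, String)  -- (attribute name, value)

-- Toy applicability table; an assumption for illustration only.
applicable :: PoS -> [String]
applicable Noun = ["NType", "Gender", "Number", "Case"]
applicable Verb = ["VType", "Person", "Number", "Voice"]

-- Drop features whose attribute does not apply to the part-of-speech,
-- then keep only the first occurrence of each remaining attribute
-- (assumed policy; the documentation does not specify which one is kept).
normalize :: PoS -> [Feature] -> [Feature]
normalize p = nubBy ((==) `on` fst) . filter ((`elem` applicable p) . fst)
```

For instance, normalizing a noun's feature list that contains Gender twice and a Voice entry keeps one Gender entry and discards Voice.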

Getting and setting values

type Attribute a = a -> Feature

get :: Enum a => Attribute a -> Msd -> Maybe Feature

Gets the value of a specified attribute.
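Since an Attribute is a function from a value to a Feature, a data constructor such as NType can itself serve as the lookup key. The sketch below shows one possible way such a lookup could work and why an Enum constraint might be useful. It is an assumption for illustration, not the package's implementation: the two-constructor Feature type, tag, and getFeature are hypothetical, and a plain list stands in for Msd.

```haskell
-- Hypothetical cut-down versions of the feature value types.
data NType = Common | Proper deriving (Eq, Show, Enum)
data Gender = Masculine | Feminine | Neuter deriving (Eq, Show, Enum)

-- Hypothetical Feature type with only two attributes.
data Feature = NType NType | Gender Gender deriving (Eq, Show)

type Attribute a = a -> Feature

-- Identify which constructor built a feature, ignoring its value.
tag :: Feature -> Int
tag (NType _)  = 0
tag (Gender _) = 1

-- Find the feature built by the given constructor. The constructor is
-- applied to an arbitrary value (toEnum 0) only to learn its tag, which
-- is where the Enum constraint is used in this sketch.
getFeature :: Enum a => Attribute a -> [Feature] -> Maybe Feature
getFeature attr fs =
  case filter ((== wanted) . tag) fs of
    (f : _) -> Just f
    []      -> Nothing
  where
    wanted = tag (attr (toEnum 0))
```

With this model, getFeature NType applied to a list containing NType Proper returns that feature, and returns Nothing when no NType feature is present.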

set :: [Feature] -> Msd -> Msd

Sets the specified features (attribute-value pairs). Duplicated attributes and attributes not applicable to the given part-of-speech are ignored.

unset :: Enum a => Attribute a -> Msd -> Msd

Unsets the value of a specified attribute.

check :: [Feature] -> Msd -> Bool

Checks whether the attributes are set to the specified values.

features :: Msd -> [Feature]

Returns the features (attribute-value pairs) of an Msd.

pos :: Msd -> PoS

Returns the part-of-speech (PoS value) of an Msd.

Wildcard matching

(=~=) :: MsdPattern a => a -> a -> Bool

A wildcard-matching operator between two Msd patterns. Relation msd1 =~= msd2 holds iff msd1 and msd2 are of the same part-of-speech and the attributes common to msd1 and msd2 have identical values. The attributes of msd1 that are not set in msd2 (and conversely) are ignored in the comparison. In MULTEXT-East notation, this is tantamount to having character code - (hyphen) act as a wildcard.
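The hyphen-as-wildcard reading can be sketched directly on MULTEXT-East strings. The function matches below is a hypothetical string-level model of the relation, not the package's (=~=), and it assumes that positions missing from the shorter string count as unset.

```haskell
-- Hypothetical string-level model of the wildcard relation.
-- Two descriptors match when they share the part-of-speech character and
-- every position set in both strings carries the same code letter.
matches :: String -> String -> Bool
matches (p1 : as) (p2 : bs) = p1 == p2 && and (zipWith compatible as bs)
  where
    -- A hyphen on either side acts as a wildcard for that position.
    compatible a b = a == '-' || b == '-' || a == b
matches _ _ = False
```

In this model "Ncmsg" matches "N-msg", and "N-msg" matches "Npmsg", but "Ncmsg" does not match "Npmsg", so the relation is not transitive and should not be treated as an equivalence.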

From/to string conversion

toString :: Msd -> String

Converts an Msd into MULTEXT-East string notation.

fromString :: String -> Maybe Msd

Converts a string in MULTEXT-East notation into an Msd. Returns Nothing if the string is not a valid MULTEXT-East string.

validString :: String -> Bool

Checks whether the string conforms to the MULTEXT-East specification. Defined as: validString = isJust . fromString

Morphosyntactic features

data MForm

Constructors

Digit 
Roman 
Letter 
Both 
MForm_ 
Approx 

data NType

Constructors

Common 
Proper 

data Person

Constructors

First 
Second 
Third 

data Voice

Constructors

Active 
Passive 

data VType

Constructors

Main 
Auxiliary 
Modal 
Copula 
Base