concraft-hr-0.1.0.2: Part-of-speech tagger for Croatian

Safe Haskell: None
Language: Haskell98

NLP.Concraft.Croatian.Morphosyntax

Description

Morphosyntax data layer for Croatian.

Documentation

packSent :: ListLike a => Tagset -> [a] -> Sent Word Tag

Given a tagset and a list of words, packs them into the Sent data form used by the tagging model. It is assumed that none of the tags has a prior probability. If this function were used on the training set, it would not distinguish correct tags from merely possible ones.

packSentT :: ListLike a => Tagset -> [a] -> Sent Word Tag

Packs the training data into sentences, with the first tag of each word having the highest probability. Suitable for use on the training set.
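The difference between packSent and packSentT can be pictured with a simplified model. The sketch below does not use the real Sent, Word, or Tagset types: SimpleSeg, packPlain, and packTrain are hypothetical names, tags are plain Text values, the input is assumed to be already split into a word and its candidate tags, and the weights 0 and 1 are an assumed encoding of "no prior probability" and "highest probability".

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Map.Strict as M
import           Data.Text (Text)

-- Hypothetical, simplified stand-in for one segment of a sentence:
-- a word form together with its weighted candidate tags.
data SimpleSeg = SimpleSeg
  { form    :: Text
  , weights :: M.Map Text Double
  } deriving (Show, Eq)

-- Assumed behaviour modelled on packSent: every candidate tag gets
-- the same weight (0), so no tag is marked as the correct one.
packPlain :: [(Text, [Text])] -> [SimpleSeg]
packPlain = map pack
  where
    pack (w, ts) = SimpleSeg w (M.fromList [(t, 0) | t <- ts])

-- Assumed behaviour modelled on packSentT: the first listed tag gets
-- the highest weight (1), the remaining candidates get 0.
-- fromListWith max keeps the weight 1 if the first tag is repeated.
packTrain :: [(Text, [Text])] -> [SimpleSeg]
packTrain = map pack
  where
    pack (w, ts) = SimpleSeg w (M.fromListWith max (zip ts (1 : repeat 0)))

main :: IO ()
main = do
  -- Illustrative word and tags only.
  let input = [("pas", ["Nsmnn", "Nsman"])]
  print (packPlain input)
  print (packTrain input)
```

With packPlain the two candidate tags are indistinguishable, which is why a function of this kind would be unsuitable for training data; packTrain keeps the information about which tag was listed first.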

addAnalysis :: Sent Word Tag -> [Set Tag] -> Sent Word Tag

Given a sentence and, for each word, a set of tags, this function adds those tags to the corresponding words.
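A minimal sketch of this kind of merge, under stated assumptions: the sentence is modelled as a list of (word form, weighted tags) pairs rather than the real Sent Word Tag type, addTags is a hypothetical name, and giving newly added tags the weight 0 is an assumption, not something the documentation states.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Map.Strict as M
import qualified Data.Set as S
import           Data.Text (Text)

-- Hypothetical stand-in for a sentence segment: a word form plus
-- weighted candidate tags (not the real concraft types).
type SimpleSeg = (Text, M.Map Text Double)

-- Pair each segment with its set of extra tags and add the new tags
-- with weight 0. M.union is left-biased, so a tag that is already
-- present keeps its existing weight.
addTags :: [SimpleSeg] -> [S.Set Text] -> [SimpleSeg]
addTags = zipWith merge
  where
    merge (w, old) new = (w, M.union old (M.fromSet (const 0) new))

main :: IO ()
main = do
  -- Illustrative words and tags only.
  let sent  = [("pas", M.fromList [("Nsmnn", 1)]), ("laje", M.empty)]
      extra = [S.fromList ["Nsmnn", "Nsman"], S.fromList ["Vmp-sf"]]
  mapM_ print (addTags sent extra)
```

Because zipWith stops at the shorter list, this sketch silently drops segments when the two lists differ in length; a real implementation may treat that case differently.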

extractSentences :: ListLike a => a -> [[a]]

Extracts sentences from the given input. Rarely used, since we cannot always assume that sentences are separated only by two newline characters.
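The separator assumption can be sketched as follows. This is an illustration on strict Text only, not the package's implementation: splitSentences is a hypothetical name, and treating each line of a chunk as one element of the sentence is an assumption based on the result type.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Text as T
import           Data.Text (Text)

-- Hypothetical helper: split the input on two consecutive newline
-- characters, drop empty chunks, and break each remaining chunk
-- into its lines.
splitSentences :: Text -> [[Text]]
splitSentences = map T.lines . filter (not . T.null) . T.splitOn "\n\n"

main :: IO ()
main = print (splitSentences "Pas\nlaje\n\nKonj\ntrci\n")
```

Input that separates sentences differently (for example with Windows line endings or with more than one blank line) would not be split cleanly by such a helper, which matches the caveat above.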

transformToConfig :: ListLike a => a -> a

Transforms a given tag string into the form expected by the model: the characters are separated by ':' and every '-' is replaced with '9'. For example, Nsmnn becomes N:s:m:n:n and Vmp-sf becomes V:m:p:9:s:f.
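The two documented examples are consistent with a character map followed by an intersperse. The sketch below shows that reading on strict Text; toModelTag is a hypothetical name and this is not necessarily how transformToConfig is written.

```haskell
module Main where

import qualified Data.Text as T
import           Data.Text (Text)

-- Hypothetical re-creation of the documented transformation:
-- replace every '-' with '9', then put ':' between all characters.
toModelTag :: Text -> Text
toModelTag = T.intersperse ':' . T.map dashToNine
  where
    dashToNine '-' = '9'
    dashToNine c   = c

main :: IO ()
main = mapM_ (print . toModelTag . T.pack) ["Nsmnn", "Vmp-sf"]
-- prints "N:s:m:n:n" and "V:m:p:9:s:f"
```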

data Word

Representation of a word.

Constructors

Word 

Fields

orth :: Text

Orthographic form, i.e. the word as it is plainly written.

oov :: Bool

Indicates whether a word is out-of-dictionary. A word is assumed to be out-of-dictionary if no tags were provided for it. If additional analysis yields a non-empty set of possible tags, this value should be updated accordingly (as it is in this tagger).
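The rule for the flag can be written down in a few lines. The sketch uses a hypothetical SimpleWord record that mirrors the two documented fields; mkWord and updateOov are illustrative names and not part of this module.

```haskell
module Main where

import qualified Data.Set as S
import qualified Data.Text as T
import           Data.Text (Text)

-- Hypothetical mirror of the two documented fields of Word.
data SimpleWord = SimpleWord
  { orthForm :: Text
  , isOov    :: Bool
  } deriving (Show, Eq)

-- A word starts as out-of-dictionary exactly when no tags were
-- provided for it.
mkWord :: Text -> [Text] -> SimpleWord
mkWord o ts = SimpleWord o (null ts)

-- After additional analysis: a non-empty set of tags clears the
-- flag, an empty set leaves the word unchanged.
updateOov :: S.Set Text -> SimpleWord -> SimpleWord
updateOov found w
  | S.null found = w
  | otherwise    = w { isOov = False }

main :: IO ()
main = do
  let w = mkWord (T.pack "laje") []
  print w
  print (updateOov (S.fromList [T.pack "Vmp-sf"]) w)
```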

Instances

Eq Word 
Ord Word 
Show Word 
Generic Word 
ToJSON Word 
FromJSON Word 
Binary Word 
Word Word

Instance needed for the use of the concraft model.

type Rep Word 

class IsString a => ListLike a where

Allows the same functions to be used on lazy and strict inputs. It is assumed that the functions behave as they do in the strict Text, lazy Text, or String modules.

Methods

tcintersperse :: Char -> a -> a

tcmap :: (Char -> Char) -> a -> a

strict :: a -> Text

tcwords :: a -> [a]

tcsplitOn :: a -> a -> [a]

tcnull :: a -> Bool

tclines :: a -> [a]
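To show how such a class lets one function serve both strict and lazy input, the sketch below re-declares a class with the same method signatures locally and gives it instances for strict and lazy Text. The instances are assumptions: the real module's instances are not shown on this page, and colonSeparated is a hypothetical helper.

```haskell
module Main where

import           Data.String (IsString)
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- Local re-declaration with the documented method signatures.
class IsString a => ListLike a where
  tcintersperse :: Char -> a -> a
  tcmap         :: (Char -> Char) -> a -> a
  strict        :: a -> T.Text
  tcwords       :: a -> [a]
  tcsplitOn     :: a -> a -> [a]
  tcnull        :: a -> Bool
  tclines       :: a -> [a]

-- Assumed instance for strict Text: each method delegates to Data.Text.
instance ListLike T.Text where
  tcintersperse = T.intersperse
  tcmap         = T.map
  strict        = id
  tcwords       = T.words
  tcsplitOn     = T.splitOn
  tcnull        = T.null
  tclines       = T.lines

-- Assumed instance for lazy Text: each method delegates to Data.Text.Lazy.
instance ListLike TL.Text where
  tcintersperse = TL.intersperse
  tcmap         = TL.map
  strict        = TL.toStrict
  tcwords       = TL.words
  tcsplitOn     = TL.splitOn
  tcnull        = TL.null
  tclines       = TL.lines

-- Hypothetical helper written once against the class.
colonSeparated :: ListLike a => a -> T.Text
colonSeparated = strict . tcintersperse ':'

main :: IO ()
main = do
  print (colonSeparated (T.pack "Nsmnn"))
  print (colonSeparated (TL.pack "Nsmnn"))
```

Both calls go through the same function body; only the instance chosen by the argument type differs.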