Safe Haskell | Safe-Inferred |
---|---|
Language | Haskell2010 |
This module contains types and functions for working with the MDF dictionary format, used by programs such as SIL Toolbox. For more on the MDF format, refer to e.g. Coward & Grimes (2000), Making Dictionaries: A guide to lexicography and the Multi-Dictionary Formatter.
Synopsis
- newtype MDF v = MDF {}
- data MDFLanguage
- = English
- | National
- | Regional
- | Vernacular
- | Other
- fieldLangs :: Map String MDFLanguage
- parseMDFRaw :: String -> Either (ParseErrorBundle String Void) (MDF String)
- parseMDFWithTokenisation :: [String] -> String -> Either (ParseErrorBundle String Void) (MDF [Component PWord])
- errorBundlePretty :: (VisualStream s, TraversableStream s, ShowErrorComponent e) => ParseErrorBundle s e -> String
- componentiseMDF :: MDF [Component a] -> [Component a]
- componentiseMDFWordsOnly :: MDF [Component a] -> [Component a]
- duplicateEtymologies :: (v -> String) -> MDF v -> MDF v
MDF files
An MDF (Multi-Dictionary Formatter) file, represented as a list of (field marker, whitespace, field value) tuples. The field marker is stored without its initial backslash; the whitespace after the field marker is also stored, allowing the original MDF file to be recovered exactly. Field values should include all whitespace up to the next marker. All field values are stored as Strings, with the exception of Vernacular fields, which have type v.
For instance, the following MDF file:
\lx kapa
\ps n
\ge parent
\se sakapa
\ge father
Could be stored as:
MDF [ ("lx", " ", Right "kapa\n")
    , ("ps", " ", Left "n\n")
    , ("ge", " ", Left "parent\n")
    , ("se", " ", Right "sakapa\n")
    , ("ge", " ", Left "father")
    ]
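To illustrate why storing the whitespace makes the original file recoverable, here is a minimal sketch that rebuilds the file text from such a list. The type MDF' and the function renderMDF are hypothetical stand-ins inferred from the example above; they are not part of Brassica.MDF, whose record fields are not shown here.

```haskell
-- Hypothetical stand-in for the MDF type, inferred from the example above.
-- Non-vernacular values are 'Left' Strings; vernacular values are 'Right v'.
newtype MDF' v = MDF' [(String, String, Either String v)]

-- Rebuild the file text: backslash, marker, stored whitespace, then value.
renderMDF :: (v -> String) -> MDF' v -> String
renderMDF showV (MDF' fields) = concatMap renderField fields
  where
    renderField (marker, ws, value) =
      '\\' : marker ++ ws ++ either id showV value

-- The example file from above, in the stand-in representation.
example :: MDF' String
example = MDF'
  [ ("lx", " ", Right "kapa\n")
  , ("ps", " ", Left "n\n")
  , ("ge", " ", Left "parent\n")
  , ("se", " ", Right "sakapa\n")
  , ("ge", " ", Left "father")
  ]
```

Under these assumptions, renderMDF id example gives back the five-line file exactly, including the missing newline after the final field.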
data MDFLanguage
The designated language of an MDF field.
Instances
Show MDFLanguage
Defined in Brassica.MDF
showsPrec :: Int -> MDFLanguage -> ShowS
show :: MDFLanguage -> String
showList :: [MDFLanguage] -> ShowS
Eq MDFLanguage
Defined in Brassica.MDF
(==) :: MDFLanguage -> MDFLanguage -> Bool
(/=) :: MDFLanguage -> MDFLanguage -> Bool
fieldLangs :: Map String MDFLanguage
A Map from the most common field markers to the language of their values.
(Note: This is currently hardcoded in the source code, based on the values in the MDF definitions from SIL Toolbox. There’s probably a more principled way of defining this, but hardcoding should suffice for now.)
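As a rough sketch of how such a table can be consulted, the following defines a small map of the same shape and a lookup helper. Everything here is illustrative: Lang mirrors the constructors of MDFLanguage, sampleLangs contains only the four markers from the example file, their classification is an assumption based on the usual meaning of those markers, and falling back to Other for unknown markers is likewise an assumption rather than documented behaviour.

```haskell
import qualified Data.Map as Map

-- Local mirror of the MDFLanguage constructors, so the sketch is standalone.
data Lang = English | National | Regional | Vernacular | Other
  deriving (Show, Eq)

-- Illustrative subset only; NOT the actual contents of fieldLangs.
sampleLangs :: Map.Map String Lang
sampleLangs = Map.fromList
  [ ("lx", Vernacular)  -- lexeme
  , ("se", Vernacular)  -- subentry
  , ("ps", English)     -- part of speech
  , ("ge", English)     -- English gloss
  ]

-- Look up a marker, treating unknown markers as Other (an assumption).
langOf :: String -> Lang
langOf marker = Map.findWithDefault Other marker sampleLangs
```

With the real module, the equivalent query would presumably be Map.lookup marker fieldLangs, which returns Nothing for markers that are not in the table.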
Parsing
parseMDFRaw :: String -> Either (ParseErrorBundle String Void) (MDF String)
Parse an MDF file to an MDF, storing the Vernacular fields as Strings.
parseMDFWithTokenisation :: [String] -> String -> Either (ParseErrorBundle String Void) (MDF [Component PWord])
Parse an MDF file to an MDF, parsing the Vernacular fields into Components in the process.
Re-export
errorBundlePretty
:: (VisualStream s, TraversableStream s, ShowErrorComponent e)
=> ParseErrorBundle s e | Parse error bundle to display
-> String | Textual rendition of the bundle
Pretty-print a ParseErrorBundle. All ParseErrors in the bundle will be pretty-printed in order, together with the corresponding offending lines, by doing a single pass over the input stream. The rendered String always ends with a newline.
Since: megaparsec-7.0.0
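A typical way to combine the parsing functions with errorBundlePretty might look like the sketch below. It relies only on the signatures listed in the synopsis and assumes the brassica package is installed; describeParse is a hypothetical helper name, not something exported by Brassica.MDF.

```haskell
import Brassica.MDF (parseMDFRaw, errorBundlePretty)

-- Parse raw MDF text and describe the outcome as a String.
-- On failure, the megaparsec error bundle is rendered for display;
-- on success, the parsed MDF is discarded and a fixed message is returned.
describeParse :: String -> String
describeParse input =
  case parseMDFRaw input of
    Left bundle -> errorBundlePretty bundle
    Right _     -> "parsed successfully\n"
```

Since both branches end with a newline, the result can be printed directly with putStr (describeParse contents).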
Conversion
componentiseMDFWordsOnly :: MDF [Component a] -> [Component a]
As with componentiseMDF, but the resulting Components contain the contents of Vernacular fields only; all else is discarded. The first parameter specifies the Separator to insert after each vernacular field.
duplicateEtymologies
:: (v -> String) | Function to convert from vernacular field values to strings. Can also be used to preprocess the value of the resulting |
-> MDF v | |
-> MDF v |
Add etymological fields to an MDF by duplicating the values in lx, se and ge fields. For example:
\lx kapa
\ps n
\ge parent
\se sakapa
\ge father
Would become:
\lx kapa
\ps n
\ge parent
\et kapa
\eg parent
\se sakapa
\ge father
\et sakapa
\eg father
This can be helpful when applying sound changes to an MDF file: the vernacular words can be copied as etymologies, and then the sound changes can be applied leaving the etymologies as is.
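The transformation shown in the example can be approximated on a plain list of fields as follows. This is only a sketch of one possible mechanism, not Brassica's implementation: the Field type and addEtymologies are hypothetical, and the choices to remember the most recent lx or se value, to emit et and eg immediately after the next ge field, and to store both new fields as non-vernacular (Left) values are all assumptions made to reproduce the example.

```haskell
-- Hypothetical field representation: (marker, whitespace, value).
type Field v = (String, String, Either String v)

-- Copy each headword (lx or se) and its gloss (ge) into et and eg fields.
addEtymologies :: (v -> String) -> [Field v] -> [Field v]
addEtymologies toStr = go Nothing
  where
    go _ [] = []
    go headword (f@(marker, _, value) : rest)
      -- Remember the latest headword as a plain String.
      | marker `elem` ["lx", "se"] =
          f : go (Just (either id toStr value)) rest
      -- On a gloss with a pending headword, append the et and eg copies.
      | marker == "ge", Just w <- headword =
          f : ("et", " ", Left w)
            : ("eg", " ", Left (either id toStr value))
            : go Nothing rest
      -- Every other field passes through unchanged.
      | otherwise = f : go headword rest
```

Because the copies are stored as Left values in this sketch, a later pass that rewrites only vernacular (Right) values would leave them untouched, which matches the use case described above.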