minimorph-0.1.5.0: English spelling functions with an emphasis on simplicity.

Portability: portable
Stability: experimental
Maintainer: eric.kow@gmail.com
Safe Haskell: None

NLP.Minimorph.English

Description

Simple default rules for English morphology

Punctuation

commas :: Text -> [Text] -> Text

No Oxford commas, alas.

 commas "and" ["foo", "bar"]        == "foo and bar"
 commas "and" ["foo", "bar", "baz"] == "foo, bar and baz"
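
The joining rule shown in these examples can be sketched as follows. This is only an illustration, not the library's implementation: commasSketch is a hypothetical name, it works on String rather than Text so that it needs nothing beyond base, and its behaviour for empty and single-element lists is an assumption.

```haskell
import Data.List (intercalate)

-- | Hypothetical String-based stand-in for 'commas'.
-- All items but the last are separated by ", "; the conjunction
-- goes before the last item, with no Oxford comma.
commasSketch :: String -> [String] -> String
commasSketch _    []  = ""   -- assumption: nothing to join
commasSketch _    [x] = x    -- assumption: a single item is returned as is
commasSketch conj xs  =
  intercalate ", " (init xs) ++ " " ++ conj ++ " " ++ last xs
```

In GHCi, commasSketch "and" ["foo", "bar", "baz"] evaluates to "foo, bar and baz".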

Numbers

cardinal :: Int -> Text

 cardinal 1 == "one"
 cardinal 2 == "two"
 cardinal 3 == "three"
 cardinal 11 == "11"

ordinalNotSpelled :: Int -> Text

 ordinalNotSpelled 1 == "1st"
 ordinalNotSpelled 2 == "2nd"
 ordinalNotSpelled 11 == "11th"
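
A minimal sketch of the usual English suffix rule behind examples like these, assuming non-negative input. ordinalSuffixSketch is a hypothetical name and returns a String; it is not taken from the library source.

```haskell
-- | Hypothetical stand-in for 'ordinalNotSpelled' (non-negative input assumed).
ordinalSuffixSketch :: Int -> String
ordinalSuffixSketch n = show n ++ suffix
  where
    suffix
      | n `mod` 100 `elem` [11, 12, 13] = "th"  -- 11th, 12th, 13th are irregular
      | n `mod` 10 == 1                 = "st"
      | n `mod` 10 == 2                 = "nd"
      | n `mod` 10 == 3                 = "rd"
      | otherwise                       = "th"
```

The check on the last two digits has to come first, otherwise 11 would be rendered as "11st".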

ordinal :: Int -> Text

 ordinal 1 == "first"
 ordinal 2 == "second"
 ordinal 3 == "third"
 ordinal 11 == "11th"
 ordinal 42 == "42nd"

Nouns and verbs

defaultNounPlural :: Text -> Text

Heuristics for English plural for an unknown noun.

 defaultNounPlural "egg"    == "eggs"
 defaultNounPlural "patch"  == "patches"
 defaultNounPlural "boy"    == "boys"
 defaultNounPlural "spy"    == "spies"
 defaultNounPlural "thesis" == "theses"

http://www.paulnoll.com/Books/Clear-English/English-plurals-1.html
http://en.wikipedia.org/wiki/English_plural
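
One way such heuristics might look, as a sketch only. The rule set below is inferred from the five examples above and is not the library's actual code; pluralSketch is a hypothetical name and uses String instead of Text.

```haskell
import Data.List (isSuffixOf)

-- | Hypothetical plural heuristic covering the documented examples.
pluralSketch :: String -> String
pluralSketch w
  | "is" `isSuffixOf` w                              = dropEnd 2 w ++ "es"   -- thesis -> theses
  | any (`isSuffixOf` w) ["s", "x", "z", "ch", "sh"] = w ++ "es"             -- patch -> patches
  | endsConsonantY                                   = dropEnd 1 w ++ "ies"  -- spy -> spies
  | otherwise                                        = w ++ "s"              -- egg -> eggs, boy -> boys
  where
    dropEnd n xs = take (length xs - n) xs
    -- True when the word ends in a non-vowel followed by 'y'.
    endsConsonantY = case reverse w of
      ('y' : c : _) -> c `notElem` "aeiou"
      _             -> False
```

Truly irregular nouns (for example "child") are out of reach of suffix rules like these and would need a lookup table.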

defaultVerbStuff :: Text -> (Text, Text)

Heuristics for the 3rd person singular and past participle of an unknown regular verb. Doubling of final consonants can be handled via a table of (partially) irregular verbs.

 defaultVerbStuff "walk"  == ("walks",  "walked")
 defaultVerbStuff "push"  == ("pushes", "pushed")
 defaultVerbStuff "play"  == ("plays",  "played")
 defaultVerbStuff "cry"   == ("cries",  "cried")
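
The combination of default rules with a table of exceptions could be sketched as below. Everything here is an assumption made for illustration: the names regularVerbForms, irregularVerbs and verbFormsSketch are hypothetical, the rules are inferred from the examples above (plus a guess for verbs ending in "e"), and String is used instead of Text.

```haskell
import Data.List (isSuffixOf)
import Data.Maybe (fromMaybe)

-- | Hypothetical default rules: (3rd person singular, past participle).
regularVerbForms :: String -> (String, String)
regularVerbForms v
  | any (`isSuffixOf` v) ["s", "x", "z", "ch", "sh"] = (v ++ "es", v ++ "ed")          -- push
  | "e" `isSuffixOf` v                               = (v ++ "s", v ++ "d")            -- bake (assumed rule)
  | endsConsonantY                                   = (stem ++ "ies", stem ++ "ied")  -- cry
  | otherwise                                        = (v ++ "s", v ++ "ed")           -- walk, play
  where
    stem = take (length v - 1) v
    -- True when the verb ends in a non-vowel followed by 'y'.
    endsConsonantY = case reverse v of
      ('y' : c : _) -> c `notElem` "aeiou"
      _             -> False

-- | Illustrative exception table; the entries are examples chosen for this sketch.
irregularVerbs :: [(String, (String, String))]
irregularVerbs =
  [ ("stop", ("stops", "stopped"))  -- final consonant doubles
  , ("go",   ("goes",  "gone"))     -- fully irregular
  ]

-- | Consult the table first, then fall back to the default rules.
verbFormsSketch :: String -> (String, String)
verbFormsSketch v = fromMaybe (regularVerbForms v) (lookup v irregularVerbs)
```

Keeping the exceptions in data rather than in the rules means the default function stays small and new irregular verbs can be added without touching it.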

defaultPossesive :: Text -> Text

Heuristics for a possessive form for an unknown noun.

 defaultPossesive "pass"        == "pass'"
 defaultPossesive "SOS"         == "SOS'"
 defaultPossesive "Mr Blinkin'" == "Mr Blinkin's"
 defaultPossesive "cry"         == "cry's"
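
The four examples above are consistent with a rule that looks only at the last character. A sketch under that assumption (possessiveSketch is a hypothetical String-based name, not the library function):

```haskell
-- | Hypothetical possessive heuristic based on the final character.
possessiveSketch :: String -> String
possessiveSketch w = case reverse w of
  ('s'  : _) -> w ++ "'"   -- pass  -> pass'
  ('S'  : _) -> w ++ "'"   -- SOS   -> SOS'
  ('\'' : _) -> w ++ "s"   -- already ends in an apostrophe
  _          -> w ++ "'s"  -- cry   -> cry's
```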

Determiners

indefiniteDet :: Text -> Text

 indefiniteDet "dog"  == "a"
 indefiniteDet "egg"  == "an"
 indefiniteDet "ewe"  == "a"
 indefiniteDet "ewok" == "an"
 indefiniteDet "8th"  == "an"

wantsAn :: Text -> Bool

True if the indefinite determiner for a word would normally be "an" as opposed to "a".

acronymWantsAn :: Text -> Bool

Variant of wantsAn that assumes the input string is pronounced one letter at a time.

 wantsAn        "x-ray" == False
 acronymWantsAn "x-ray" == True

Note that this won't do the right thing for words like SCUBA. You really have to reserve it for those separate-letter acronyms.

Acronyms

looksLikeAcronym :: Text -> Bool

True if the text is all upper case from the second letter onward.

 looksLikeAcronym "DNA"  == True
 looksLikeAcronym "tRNA" == True
 looksLikeAcronym "x"    == False
 looksLikeAcronym "DnA"  == False

startsWithAcronym :: Text -> Bool

True if the first word (separating on either - or space) looks like an acronym.
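
As an illustration, the two acronym checks might be sketched like this. The names are hypothetical, String stands in for Text, and the handling of empty input and of a single upper-case character is an assumption that the documentation above does not settle.

```haskell
import Data.Char (isUpper)

-- | Hypothetical stand-in for 'looksLikeAcronym'.
looksLikeAcronymSketch :: String -> Bool
looksLikeAcronymSketch []       = False         -- assumption: empty text is no acronym
looksLikeAcronymSketch [c]      = isUpper c     -- "x" is rejected; accepting "X" is an assumption
looksLikeAcronymSketch (_ : cs) = all isUpper cs  -- first character may be lower case, as in tRNA

-- | Hypothetical stand-in for 'startsWithAcronym': test only the
-- part of the text before the first '-' or space.
startsWithAcronymSketch :: String -> Bool
startsWithAcronymSketch = looksLikeAcronymSketch . takeWhile (`notElem` "- ")
```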

Sounds

hasSibilantSuffix :: Text -> Bool

Ends with a sh sound.

hasSemivowelPrefix :: Text -> Bool

Starts with a semivowel.

hasVowel_U_Prefix :: Text -> Bool

Starts with a vowel-y U sound.

hasCySuffix :: Text -> Bool

Last two letters are a consonant and y.

hasCoSuffix :: Text -> Bool

Last two letters are a consonant and o.

isVowel :: Char -> Bool

Is a vowel.

isLetterWithInitialVowelSound :: Char -> Bool

Letters that when pronounced independently in English sound like they begin with vowels.

 isLetterWithInitialVowelSound 'r' == True
 isLetterWithInitialVowelSound 'k' == False

(In the above, 'r' is pronounced "are", but 'k' is pronounced "kay".)
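
A sketch of how a letter-name test like this can drive the choice of determiner for acronyms that are spelled out. The letter list is an assumption based on the usual English letter names ("eff", "aitch", "ell", "em", "en", "ar", "ess", "ex", plus the vowels), and both function names are hypothetical.

```haskell
import Data.Char (toLower)

-- | Hypothetical stand-in for 'isLetterWithInitialVowelSound'.
-- The letter set is an assumption, not copied from the library.
letterStartsWithVowelSound :: Char -> Bool
letterStartsWithVowelSound c = toLower c `elem` "aefhilmnorsx"

-- | Hypothetical stand-in for 'acronymWantsAn': only the name of
-- the first letter matters when the text is read letter by letter.
acronymWantsAnSketch :: String -> Bool
acronymWantsAnSketch []      = False  -- assumption for empty input
acronymWantsAnSketch (c : _) = letterStartsWithVowelSound c
```

Under these assumptions "FBI" takes "an" (the letter is read "eff") while "DNA" takes "a".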

isConsonant :: Char -> Bool

Is a consonant.