hyphenation-0.1: Configurable Knuth-Liang hyphenation

Portability: portable
Stability: provisional
Maintainer: Edward Kmett <ekmett@gmail.com>
Safe Haskell: Safe-Inferred

Text.Hyphenation


Description

Hyphenation based on the Knuth-Liang algorithm as used by TeX.

The implementation is based on Ned Batchelder's public domain hyphenate.py and simplified to remove the need for a manual exception list.


Hyphenate with a given set of patterns

hyphenate :: (Char -> Char) -> [String] -> String -> [String]

Builds a hyphenator given a character normalization function and a list of patterns.

Designed to be partially applied to all but the last argument. The resulting function breaks a word into fragments at the points where it would be legal to hyphenate it.

The Knuth-Liang hyphenation algorithm isn't designed to find all such points, but it does find most of them, and in particular it tries to avoid points where the hyphenation varies with how the word is used, for instance as a noun or as a verb.

 do en <- hyphenate toLower <$> readHyphenationPatternFile "en.hyp"
    return $ en "hyphenation"
 ["hy","phen","ation"]
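To make the mechanism concrete, here is a minimal, self-contained sketch of the Knuth-Liang scheme, written with the same (Char -> Char) -> [String] -> String -> [String] shape as hyphenate so it can be partially applied the same way. It is not this package's implementation: hyphenateSketch, parsePattern, splitAtAll, examplePatterns and enSketch are hypothetical names, the nine patterns are hand-picked to cover the single word "hyphenation" rather than taken from en.hyp, and the sketch assumes the normalization function is used only for matching while the returned fragments keep the original characters. Each pattern interleaves letters with digit weights; every gap in the word takes the highest weight of any pattern matching around it, and an odd winning weight marks a legal break.

```haskell
import Data.Char (digitToInt, isDigit, toLower)
import Data.List (isPrefixOf, tails)

-- Split a Liang pattern such as "hen5at" into its letters ("henat") and one
-- weight per gap around those letters; a gap with no digit gets weight 0.
parsePattern :: String -> (String, [Int])
parsePattern = go 0
  where
    go w [] = ([], [w])
    go _ (c : cs) | isDigit c = go (digitToInt c) cs
    go w (c : cs) = let (ls, ws) = go 0 cs in (c : ls, w : ws)

-- Hypothetical stand-in for 'hyphenate'. The patterns are parsed once, so
-- partially applying the first two arguments shares that work across words.
hyphenateSketch :: (Char -> Char) -> [String] -> String -> [String]
hyphenateSketch normalize patterns = \word ->
  let -- normalize only for matching; '.' marks both word boundaries
      padded = "." ++ map normalize word ++ "."
      n = length padded
      -- each match contributes its weights, shifted to the offset where it matched
      hits =
        [ replicate i 0 ++ ws ++ replicate (n + 1 - i - length ws) 0
        | (i, rest) <- zip [0 ..] (tails padded)
        , (letters, ws) <- parsed
        , letters `isPrefixOf` rest
        ]
      -- gap g sits just before padded !! g; keep the highest weight per gap
      scores = foldl (zipWith max) (replicate (n + 1) 0) hits
      -- the gap between word !! (k - 1) and word !! k is scores !! (k + 1)
      breaks = [k | k <- [1 .. length word - 1], odd (scores !! (k + 1))]
   in splitAtAll breaks word
  where
    parsed = map parsePattern patterns

-- Cut a list at the given ascending indices.
splitAtAll :: [Int] -> [a] -> [[a]]
splitAtAll = go 0
  where
    go _ [] rest = [rest]
    go off (k : ks) rest = let (a, b) = splitAt (k - off) rest in a : go k ks b

-- Hand-picked patterns that cover the single word "hyphenation".
examplePatterns :: [String]
examplePatterns =
  ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

-- Partially applied, mirroring `hyphenate toLower patterns`.
enSketch :: String -> [String]
enSketch = hyphenateSketch toLower examplePatterns
```

With these patterns, enSketch "hyphenation" gives ["hy","phen","ation"], and enSketch "Hyphenation" gives ["Hy","phen","ation"] because only the matching copy is lowercased. The sketch tests every pattern at every offset and does not enforce minimum fragment lengths (TeX's \lefthyphenmin and \righthyphenmin); a practical version would index the patterns in a trie and apply such limits.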

Pattern file support

readHyphenationPatternFile :: String -> IO [String]

Loads a file containing whitespace-delimited patterns, stripping out comment lines that start with #.
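The parsing rule described here can be sketched as a pure function plus a thin IO wrapper, which keeps the rule testable without touching the file system. parsePatternText and readPatternsSketch are hypothetical names, not this package's internals; the sketch assumes "start with #" means the very first character of the line, and that pattern files are UTF-8 encoded, which matters for non-ASCII patterns such as the French and Icelandic ones.

```haskell
import Data.List (isPrefixOf)
import System.IO (IOMode (ReadMode), hGetContents, hSetEncoding, openFile, utf8)

-- Drop every line whose first character is '#', then split the remaining
-- text on any run of whitespace, so several patterns may share one line.
parsePatternText :: String -> [String]
parsePatternText = concatMap words . filter (not . isComment) . lines
  where
    isComment = ("#" `isPrefixOf`)

-- Hypothetical file reader: decode as UTF-8 (an assumption about the file
-- format) and hand the contents to the pure parser above.
readPatternsSketch :: FilePath -> IO [String]
readPatternsSketch path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8
  parsePatternText <$> hGetContents h
```

Because the check looks only at the first character, an indented line such as "  # note" would be kept and its words read as patterns; trimming leading whitespace first would treat it as a comment instead.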

Loading installed patterns

hyphenateLanguage :: String -> IO (String -> [String])

Reads a built-in language pattern file from the data directory where Cabal installed this package and returns a hyphenator for that language.

(e.g. hyphenateLanguage "en" opens "/Users/ekmett/.cabal/lib/hyphenation-0.1/ghc-7.4.1/en.hyp" when run on the author's local machine)

Known patterns

hyphenateEnglish :: String -> [String]

 ghci> hyphenateEnglish "supercalifragilisticexpialadocious"
 ["su","per","cal","ifrag","ilis","tic","ex","pi","al","ado","cious"]

hyphenateFrench :: String -> [String]

 ghci> hyphenateFrench "anticonstitutionnellement"
 ["an","ti","cons","ti","tu","tion","nel","le","ment"]

hyphenateIcelandic :: String -> [String]

 ghci> hyphenateIcelandic "vaðlaheiðavegavinnuverkfærageymsluskúr"
 ["va\240la","hei\240a","vega","vinnu","verk","f\230ra","geymslu","sk\250r"]