pronounce: A library for interfacing with the CMU Pronouncing Dictionary
Text.Pronounce is a Haskell library for interfacing with the
CMU Pronouncing Dictionary. It is based on Allison Parrish's
Python library pronouncing, and it exports much of the same
functionality. The underlying data structure that I used for
representing the dictionary is a Map from entries to lists of
their possible phones as represented in the CMU dict. Many
functions rely on access to the CMU dict and may return more than
one result (more on the layout of the CMU dict later), so I
decided to encapsulate this underlying state of the dictionary by
using the ReaderT monad transformer with the list monad embedded
inside it.
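The design described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's actual code: the names Dict, DictM, lookupPhones and sampleDict are hypothetical, String stands in for the Text type the library uses, and the imports come from transformers for brevity.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import qualified Data.Map as Map

-- Hypothetical stand-ins for the library's types (String instead of Text).
type Entry = String
type Phones = String
type Dict = Map.Map Entry [Phones]

-- Computations that read the dictionary and may yield several results.
type DictM = ReaderT Dict []

-- Hypothetical helper: every pronunciation of a word becomes one branch
-- of the list monad; an unknown word yields no results.
lookupPhones :: Entry -> DictM Phones
lookupPhones word = do
  dict <- ask
  lift (Map.findWithDefault [] word dict)

-- A tiny hand-written dictionary used only for illustration.
sampleDict :: Dict
sampleDict =
  Map.fromList [("CONSOLE", ["K AH0 N S OW1 L", "K AA1 N S OW0 L"])]
```

Running such a computation with runReaderT (lookupPhones "CONSOLE") sampleDict collects all branches into an ordinary list, one element per pronunciation.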
In order to properly use this library, a basic understanding of the CMU Pronouncing Dictionary is assumed. Basically, the dictionary maps English words to their pronunciations transcribed using ARPAbet. This transcription reduces each word to a sequence of phones (vowel/consonant sounds) with stresses indicated by numbers at the ends of vowels. In addition, since some words can have multiple pronunciations, there can be multiple entries for a word:
CONSOLE K AH0 N S OW1 L
CONSOLE(1) K AA1 N S OW0 L
Most users need not worry about the actual syntax of the
CMU dict, however, and should merely note that such an
entry in the CMUdict would consist of the mapping from
the Entry "CONSOLE" to some [Phones], a list of possible
sequences of phones for this word (stresses included). For
a better description of the actual CMU Pronouncing
Dictionary, I recommend visiting the official website
or simply looking through the CMU dict itself.
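To make the mapping concrete, here is a minimal sketch of how lines like the CONSOLE entries above could be folded into a map from a word to its pronunciations. It is not the parser in Text.Pronounce.ParseDict: parseLine, buildDict and stresses are hypothetical names, String replaces Text, and the handling of comment lines (assumed to start with ";;;") and variant markers such as "(1)" is deliberately simplified.

```haskell
import Data.Char (isDigit)
import Data.List (isPrefixOf)
import qualified Data.Map as Map

type Entry = String
type Phones = String

-- Split one dictionary line into headword and phones, dropping a
-- variant marker such as "(1)". Comment and blank lines give Nothing.
parseLine :: String -> Maybe (Entry, Phones)
parseLine line
  | ";;;" `isPrefixOf` line = Nothing
  | otherwise = case words line of
      (headword : phones@(_ : _)) ->
        Just (takeWhile (/= '(') headword, unwords phones)
      _ -> Nothing

-- Collect all pronunciations of each word, keeping file order.
-- fromListWith passes the newer value first, hence the flip.
buildDict :: [String] -> Map.Map Entry [Phones]
buildDict ls =
  Map.fromListWith (flip (++)) [(e, [p]) | Just (e, p) <- map parseLine ls]

-- The stress pattern is the sequence of digits attached to the vowels.
stresses :: Phones -> String
stresses = filter isDigit
```

With the two CONSOLE lines as input, buildDict yields a single key "CONSOLE" holding both pronunciations, and stresses "K AH0 N S OW1 L" gives "01".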
When working with this library, the default setting is to load
the dictionary from an included binary file, but the user
has the option to parse the dictionary from a unicode text
file, or encode the text file into binary themselves. For
this last purpose, I included the script I originally used
to encode the dictionary into a binary in the examples
folder.
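As a rough idea of what encoding the dictionary into binary can look like, the sketch below round-trips a map through the binary package, which this package lists as a dependency. It is not the script from the examples folder: serialize and deserialize are hypothetical names, and String is used in place of Text.

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map as Map

-- Hypothetical stand-in for the library's dictionary type.
type Dict = Map.Map String [String]

-- Serialize the dictionary with the Binary instances that the binary
-- package provides for Map, lists and Char.
serialize :: Dict -> BL.ByteString
serialize = encode

-- Read the dictionary back; decode fails with an error on malformed input.
deserialize :: BL.ByteString -> Dict
deserialize = decode
```

Data.Binary also offers encodeFile and decodeFile, which do the same thing directly against a file path.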
Finally, I would like to note that Text.Pronounce.ParseDict
operates on UTF-8 encoded files, for compatibility with Text,
which is UTF encoded, despite the fact that the original CMU
Pronouncing Dictionary uses Latin-1 encoding. Because of this,
if the user wants to use a version of the CMU Dictionary other
than the included one, they must change the encoding to UTF-8
before parsing.
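One way to perform that re-encoding is sketched below, using decodeLatin1 and encodeUtf8 from the text package. The function names latin1ToUtf8 and convertFile are hypothetical and not part of this library; any other tool that converts Latin-1 to UTF-8 would do just as well.

```haskell
import qualified Data.ByteString as B
import Data.Text.Encoding (decodeLatin1, encodeUtf8)

-- Reinterpret Latin-1 bytes as text, then write that text out as UTF-8.
latin1ToUtf8 :: B.ByteString -> B.ByteString
latin1ToUtf8 = encodeUtf8 . decodeLatin1

-- Convert a whole dictionary file; both paths are supplied by the caller.
convertFile :: FilePath -> FilePath -> IO ()
convertFile src dst = B.readFile src >>= B.writeFile dst . latin1ToUtf8
```

Pure ASCII content passes through unchanged; only bytes above 0x7F are rewritten as multi-byte UTF-8 sequences.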
Downloads
- pronounce-1.2.0.0.tar.gz [browse] (Cabal source package)
- Package description (as included in the package)
Versions [RSS] | 1.1.0.1, 1.1.0.2, 1.1.0.3, 1.2.0.0 |
---|---|
Change log | ChangeLog.md |
Dependencies | base (>=4.10 && <4.12), binary (>=0.8.4 && <0.9), containers (>=0.5 && <0.6), filepath (>=1.4 && <1.5), mtl (>=2.2 && <2.3), safe (>=0.3 && <0.4), text (>=1.2 && <1.3) [details] |
License | BSD-3-Clause |
Author | Noah Goodman |
Maintainer | ngoodman@uchicago.edu |
Category | Text |
Home page | https://github.com/buonuomo/Text.Pronounce |
Source repo | head: git clone https://github.com/buonuomo/Text.Pronounce.git |
Uploaded | by NoahGoodman at 2018-08-23T13:04:28Z |
Reverse Dependencies | 1 direct, 0 indirect [details] |
Downloads | 2437 total (19 in the last 30 days) |
Status | Docs available [build log] Last success reported on 2018-08-23 [all 1 reports] |