CMU Pronouncing Dictionary
From Wikipedia, the free encyclopedia
| Developer(s) | Carnegie Mellon University |
|---|---|
| Stable release | 0.7a / 18 February 2008 |
| Available in | English |
| Development status | Maintained |
| License | Public Domain |
| Website | Homepage |
The CMU Pronouncing Dictionary (also known as cmudict) is a public domain pronouncing dictionary created by Carnegie Mellon University (CMU). It serves as the American English lexicon for the Festival Speech Synthesis System and for the CMU Sphinx speech recognition system. The latest release, 0.7a, contains 133,746 entries (from 123,442 baseforms).
Database Format
The database is distributed as a plain text file with one entry per line, in the format word <two spaces> pronunciation. If a word has multiple pronunciations, the word in each entry after the first is followed by an index in parentheses. Pronunciations are encoded in a modified form of the Arpabet system in which vowels carry stress marks at levels 0, 1, and 2 (no stress, primary stress, and secondary stress); not all entries are marked for stress, however. For example, the following pronunciations are available for encyclopedia:
ENCYCLOPEDIA  AH0 N S AY2 K L AH0 P IY1 D IY0 AH0
ENCYCLOPEDIA(2)  AH0 N S AY2 K L OW0 P IY1 D IY0 AH0
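The format described above can be read with a few lines of code. The following Python sketch is illustrative only and is not part of the dictionary distribution: the function names `parse_entry`, `stress_pattern`, and `load_entries` are hypothetical, unindexed entries are treated as variant 1 by assumption, and skipping lines that start with `;;;` assumes that the file marks comments that way.

```python
import re

# Headwords of alternative pronunciations end in an index such as "(2)".
_VARIANT = re.compile(r"^(?P<word>.+)\((?P<index>\d+)\)$")


def parse_entry(line):
    """Split one dictionary line into (word, variant_index, phonemes)."""
    # Format: word <two spaces> pronunciation
    head, _, pron = line.strip().partition("  ")
    match = _VARIANT.match(head)
    if match:
        word, index = match.group("word"), int(match.group("index"))
    else:
        # Assumption: an entry without an index is the first pronunciation.
        word, index = head, 1
    return word, index, pron.split()


def stress_pattern(phonemes):
    """Return the stress digits (0, 1, 2) carried by the vowel symbols."""
    return [int(p[-1]) for p in phonemes if p[-1].isdigit()]


def load_entries(lines):
    """Group pronunciations by headword, in the order they appear."""
    entries = {}
    for line in lines:
        # Assumption: blank lines and lines starting with ";;;" carry no entry.
        if not line.strip() or line.startswith(";;;"):
            continue
        word, _, phonemes = parse_entry(line)
        entries.setdefault(word, []).append(phonemes)
    return entries


sample = [
    "ENCYCLOPEDIA  AH0 N S AY2 K L AH0 P IY1 D IY0 AH0",
    "ENCYCLOPEDIA(2)  AH0 N S AY2 K L OW0 P IY1 D IY0 AH0",
]
entries = load_entries(sample)
for phonemes in entries["ENCYCLOPEDIA"]:
    print(" ".join(phonemes), stress_pattern(phonemes))
```

Stripping the parenthesized index groups both lines under the same headword, so a lookup returns every listed pronunciation. Consonant symbols carry no digit, which is why the stress pattern has one value per vowel.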
History
| Version | Release date [1] |
|---|---|
| 0.1 | 16 September 1993 |
| 0.2 | 10 March 1994 |
| 0.3 | 28 September 1994 |
| 0.4 | 8 November 1995 |
| 0.5 | No public release |
| 0.6 | 11 August 1998 |
| 0.7a | 19 February 2008 [2] |
Applications
The Unifon converter is based on the CMU Pronouncing Dictionary. The Natural Language Toolkit contains an interface to the CMU Pronouncing Dictionary.
References
- [1] ftp://ftp.cs.cmu.edu/project/speech/dict/
- [2] http://sourceforge.net/forum/forum.php?forum_id=787627
External links
- The current version of the dictionary is maintained at SourceForge.
- Homepage - includes database search
- RDF version: the dictionary converted to Resource Description Framework by the open source Texai project.

