The ability to synthetically voice a publication is an important accessibility feature that many readers rely on, regardless of whether human narration is also provided (e.g., many readers prefer the faster playback that TTS engines make possible).
While basic playback is possible as long as a reading system includes TTS technology, or has access to a similarly enabled assistive technology, synthetic speech engines that receive no enhancement typically mispronounce any complex vocabulary.
EPUB 3 adds three new complementary technologies to enable content authors to enhance the quality of TTS playback:
The Pronunciation Lexicon Specification (PLS) defines an XML format for globally applicable pronunciations. When words are encountered in the prose that match the defined entries, the provided pronunciation is used in place of the engine's default rendering. Lexicons provide a simple way to define pronunciations for words whose pronunciation does not change based on context.
The Speech Synthesis Markup Language (SSML) allows pronunciations to be embedded directly in the markup. When SSML attributes are encountered on elements, the provided pronunciation is used in place of either the engine's default rendering or a PLS entry. SSML can be used to define all pronunciations, but is better used as a complement to PLS lexicons (e.g., to disambiguate heteronyms and ambiguous number forms).
The CSS3 Speech module includes a grab-bag of properties that can be used to control playback. From providing control over the spelling out of words and numbers to inserting aural cues and pauses, these properties allow control of playback beyond the traditional enhancement of pronunciation.
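As a rough illustration of how the three technologies might fit together, the following fragments sketch a lexicon and a content document that uses it. The file name `lexicon.pls`, the class names, the sample words, and their IPA transcriptions are hypothetical and chosen only for illustration. The `-epub-` prefix on the speech properties is assumed from the EPUB 3.0 style sheet profile; check the property names and values against the specification version being targeted.

A minimal PLS lexicon defining one context-independent pronunciation:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- lexicon.pls: hypothetical file name -->
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa"
         xml:lang="en">
  <!-- "quay" is pronounced the same way wherever it occurs,
       so a global lexicon entry is sufficient -->
  <lexeme>
    <grapheme>quay</grapheme>
    <phoneme>kiː</phoneme>
  </lexeme>
</lexicon>
```

A content document that links the lexicon, disambiguates a heteronym with SSML attributes, and applies speech properties:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ssml="http://www.w3.org/2001/10/synthesis"
      xml:lang="en">
  <head>
    <title>Enhanced TTS sketch</title>
    <!-- Associate the PLS lexicon with this document -->
    <link rel="pronunciation" type="application/pls+xml"
          hreflang="en" href="lexicon.pls"/>
    <style>
      /* Assumed EPUB 3.0 prefixed forms of the CSS3 Speech properties */
      .code  { -epub-speak-as: spell-out; } /* read letter by letter */
      .scene { -epub-pause: 500ms; }        /* pause before and after */
    </style>
  </head>
  <body>
    <!-- "quay" is covered by the linked lexicon -->
    <p>The boat was tied up at the quay.</p>
    <!-- "bass" is a heteronym, so each occurrence carries
         its own pronunciation via ssml:ph -->
    <p>She played the
      <span ssml:alphabet="ipa" ssml:ph="beɪs">bass</span>
      while he grilled the
      <span ssml:alphabet="ipa" ssml:ph="bæs">bass</span>.</p>
    <p class="scene">The sign read
      <span class="code">EPUB</span>.</p>
  </body>
</html>
```

In this sketch the lexicon handles the word whose pronunciation never varies, while the SSML attributes are reserved for the two occurrences that a global entry could not distinguish.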
The EPUB Samples Project contains the following publications that implement enhanced TTS functionality: