CSS3 Speech

Note

The CSS3 Speech module was not a recommendation at the time that EPUB 3 was finalized, but has since reached Candidate Recommendation status. Until an update to EPUB 3 occurs, all supported properties from the module must still be prefixed with -epub- (reading systems will automatically map the prefixed properties to the new official versions, so there will be no need to revisit old content).

The CSS3 Speech module provides additional text-to-speech (TTS) enhancement functionality. Unlike PLS lexicons and SSML markup, the Speech module properties are not focused on defining the correct pronunciation of words.

The primary property the CSS3 Speech module adds for enhancing TTS playback is speak-as. It controls whether the TTS engine reads out each character in a string individually (the spell-out value) or each digit in a number individually (the digits value). (See Example 1 and Example 2.) TTS engines often rely on unreliable heuristics, such as how word-like an acronym looks, to decide whether to pronounce it as a word or spell it out; this property lets you override that behavior by forcing the text to be spelled out.

The speak-as property also takes the complementary values literal-punctuation and no-punctuation. As their names suggest, these values control whether the TTS engine voices punctuation.

The module also includes the speak property, which provides the ability to control TTS rendering of content, regardless of whether the containing element is visible or not. Setting the none value disables rendering on an element, and setting the normal value enables it.
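
As a minimal sketch of the speak property, the following rules show both directions of control. The class names `tts-skip` and `tts-only` are hypothetical and chosen only for illustration, and the result depends on the reading system supporting the property.

```css
/* Hypothetical class: shown on screen, but skipped by the TTS engine */
.tts-skip {
   -epub-speak: none;
}

/* Hypothetical class: hidden on screen, but still read aloud */
.tts-only {
   display: none;
   -epub-speak: normal;
}
```

The second rule illustrates the point that aural rendering can be enabled even when the element is not visually displayed.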

The following table lists the remaining properties from the Speech module that are supported in EPUB 3. These properties are focused on non-prosodic aspects of TTS playback.

Additional EPUB 3-supported Speech module properties
Property	Description
`pause`	The `pause` property controls the amount of pause that occurs before and after the element that it is applied to. Pauses are typically used to identify transitions between major structures, such as between paragraphs and when new sections are beginning. TTS engines use punctuation to provide pauses within the flow of the narrative. The value of the `pause` property is a time value indicating the pause length. If only a single value is specified: `-epub-pause: 50ms` that time is applied both before and after the associated element. You can individually control the time to pause before and after by including a second time value: `-epub-pause: 50ms 0ms` The amount of pause specified occurs before any aural `cue` and `rest` at the start of the associated element, and after any `rest` and `cue` at the end of the element.
`cue`	The `cue` property provides the ability to identify elements with an aural sound. Cues are helpful in distinguishing new headings, for example, as pauses alone are not a good indicator. Note that the `cue` property will render the associated audio clip both before and after the heading if only a single value is specified: `-epub-cue: url('audio/ping.mp3');` Readers typically only expect a cue to signal the start, so use the `none` value to disable the cue after the associated element has been rendered: `-epub-cue: url('audio/ping.mp3') none;` The aural cue occurs between any `pause` and `rest` at the start of the associated element, and between any `rest` and `pause` at the end of the element.
`rest`	The `rest` property controls the pause that occurs between any aural cues and the rendering of the associated element, both before and after. The value of the `rest` property is a time value indicating the pause length. If only a single value is specified: `-epub-rest: 25ms` that time is applied both before and after the associated element. You can individually control the time to pause before and after by including a second time value: `-epub-rest: 25ms 0ms` The amount of rest specified occurs after any `pause` and `cue` at the start of the associated element, and before any `cue` and `pause` at the end of the element.
`voice-family`	The `voice-family` property provides control over the gender and type of voice used for TTS playback, allowing content producers to create more realistic TTS playback (e.g., alternating gender to match the character). Although it's possible to name the voice to use: `-epub-voice-family: 'Dave';` in practice, with the wide variety of devices an EPUB may be played on, such specificity is only so useful as it requires knowing the names of all voices available on all devices. Instead, it is better to request a voice using the pattern: age?, gender, integer? (where the question mark indicates the field is optional): `.king-lear { -epub-voice-family: old male 1; }` The age value may be `child`, `young` or `old`; the gender `male`, `female` or `neutral`; and, when specified, the integer indicates the ordinal position of the voice to use (i.e., when more than one matching voice is available).
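
The ordering of pauses, cues and rests described in the table can be sketched as an annotated rule. This is only an illustration: the `h2` selector and the `audio/ping.mp3` path are assumptions made for the example, and the timeline in the comment simply restates the sequence given in the table.

```css
/*
 * Expected sequence for each h2, per the ordering in the table:
 *
 *   1. pause before   (50ms)
 *   2. cue before     (audio/ping.mp3)
 *   3. rest before    (10ms)
 *   4. the heading text is spoken
 *   5. rest after     (0ms)
 *   6. cue after      (none, so no sound plays)
 *   7. pause after    (25ms)
 */
h2 {
   -epub-pause: 50ms 25ms;
   -epub-cue: url('audio/ping.mp3') none;
   -epub-rest: 10ms 0ms;
}
```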

Examples

Example 1 — Spelling out letters

<abbr class="spell">IBM</abbr>
<span class="spell">IOU</span>

.spell {
   -epub-speak-as: spell-out
}

Example 2 — Spelling out numbers

<span class="digits">911</span>
<span class="digits">416 555-0123</span>
<span class="digits">90210</span>

.digits {
   -epub-speak-as: digits
}

Example 3 — Voicing punctuation

<p>
   Example one is correctly punctuated as follows:
   <span class="punctuate answer">The Franks, 
   a warlike people of Germany, gave their 
   name to France.</span>
</p>

.punctuate {
   -epub-speak-as: literal-punctuation
}

Example 4 — Ignoring punctuation

<p>The telegram from Dr. King to President Kennedy
   read as follows:</p>
<blockquote>
   <pre class="silent">
   HOWEVER I AM CONVINCED THAT UNLESS SOME STEPS ARE TAKEN BY
   THE FEDERAL GOVERNMENT TO RESTORE A SENSE OF CONFIDENCE IN
   THE PROTECTION OF LIFE, LIMB AND PROPERTY MY PLEAS SHALL FALL
   ON DEAF EARS AND WE SHALL SEE THE WORST RACIAL HOLOCAUST THIS
   NATION HAS EVER SEEN AFTER TODAYS TRAGEDY, INVESTIGATION WILL
   NOT SUFFICE.
   </pre>
   <cite><a 
      href="http://www.jfklibrary.org/Asset-Viewer/-crU2bLgN0CcGkys8dkuHg.aspx"
      >September 15, 1963 Telegram</a></cite>
</blockquote>

.silent {
   -epub-speak-as: no-punctuation
}

Example 5 — Adding pauses, cues and rests to headings

h1 {
   -epub-pause: 50ms 25ms;
   -epub-cue: url('audio/ping.mp3') none;
   -epub-rest: 10ms 0ms
}

Example 6 — Alternating voice gender

.male {
   -epub-voice-family: male
}

.female {
   -epub-voice-family: female
}

<p class="female">
   Alice: But I don't want to go among mad people.
</p>

<p class="male">
   The Cat: Oh, you can't help that.
   We're all mad here. I'm mad. You're mad.
</p>

Example 7 — Using different same-gender voices

.huck {
   -epub-voice-family: child male 1
}

.tom {
   -epub-voice-family: child male 2
}

<p class="tom">
   "Well—I—I—well, that ought to settle it, of course; 
   but I can't somehow seem to understand it no way.  
   Looky here, warn't you ever murdered AT ALL?"
</p>

<p class="huck">
   "No. I warn't ever murdered at all—I played it on 
   them. You come in here and feel of me if you don't 
   believe me."
</p>

Compliance References and Standards

EPUB 3 — CSS 3.0 Speech
CSS3 — Speech Module (working draft referenced by EPUB 3)

Frequently Asked Questions

Are CSS3 Speech properties supported at this time?

At the time of writing, no reading systems have appeared that support the CSS3 Speech properties. Please send a report if the situation changes and this page has not been updated.

Can I force the TTS engine to say acronyms instead of spelling them?

The Speech module does not provide a way to tell an engine it must voice a capitalized term. When including an acronym like EPUB, you would have to use a lexicon or attach an SSML pronunciation attribute to absolutely ensure that it does not get spelled out.

Why do I need to control the voicing of punctuation?

Although most engines will voice significant pause points, such as colons, they will typically not render each punctuation point in a document as it would ruin the reading experience. There are times when it is critical to ensure that the reader is able to hear all the punctuation in a sentence or phrase, such as in grammar textbooks, programming guides and the like. (See Example 3.)

Assistive technologies also enable the pronunciation of all punctuation by default in elements such as pre and code. Although the benefit of reading all punctuation in computer code should be obvious, preformatted text does not always need such detailed rendering. Applying no-punctuation to a pre block of text ensures that it will be read without punctuation being announced. (See Example 4.)