PLS lexicons provide control over the text-to-speech (TTS) playback rendering on conforming reading systems. A lexicon file is like a dictionary or look-up guide, allowing the pronunciations defined in it to be used in place of the default rendering when matching words are encountered. Defining words in a lexicon ensures that readers hear your work played back as expected, not based on the heuristics applied by the TTS engine on their reading system.
Each PLS lexicon is an XML file with a root lexicon element. Lexicons consist of one or more lexeme entries, each of which defines the word(s) to match in grapheme element(s) and the replacement pronunciation to use in a phoneme element. (See Example 1.) The alias element can also be used to replace one word with another. (See Example 5.) The language of the lexicon and the phonetic alphabet used must both be defined on the root lexicon element.
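To make the structure concrete, here is a minimal sketch of a complete lexicon file (the entries and pronunciations are illustrative and are not one of the numbered examples). The root lexicon element declares the language and the phonetic alphabet, one lexeme maps a grapheme to a phoneme, and a second uses an alias to have an abbreviation spoken as its expansion:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa"
         xml:lang="en">
   <!-- replacement pronunciation for a proper name -->
   <lexeme>
      <grapheme>Gaudí</grapheme>
      <phoneme>ɡaʊˈdiː</phoneme>
   </lexeme>
   <!-- alias: speak the abbreviation as the full phrase -->
   <lexeme>
      <grapheme>W3C</grapheme>
      <alias>World Wide Web Consortium</alias>
   </lexeme>
</lexicon>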
PLS entries should be created for any complex word that is important to the publication and that a TTS engine is likely to mispronounce. Such words include, but are not limited to, proper names and nouns, technical, scientific, and legal terms, and complex compound words. The default rendering for heteronyms can also be defined in a PLS lexicon so that only variations need to be handled by SSML tagging.
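As a sketch of that division of labour (the entry and markup below are illustrative, not one of the numbered examples), a lexicon whose root declares alphabet="ipa" could set the default pronunciation of the heteronym "read", leaving only past-tense occurrences to be tagged with EPUB 3's SSML attributes in the content document:

<!-- In the PLS lexicon: default (present-tense) pronunciation -->
<lexeme>
   <grapheme>read</grapheme>
   <phoneme>riːd</phoneme>
</lexeme>

<!-- In the content document: override one past-tense occurrence -->
<p>She had already <span ssml:ph="rɛd" ssml:alphabet="ipa">read</span> the letter.</p>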
Note that PLS lexicons are not activated simply by being included in the EPUB container. You must reference the applicable lexicon(s) from each content document in order for them to be applied to the content. The hreflang attribute should also always be set to the language of the referenced PLS file. (See Example 6.)
Multiple lexicons can be attached to a content document to handle embedded foreign languages. (See Example 7.)
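For instance, a document written primarily in English but containing French passages might attach one lexicon per language (the file names here are hypothetical):

<link rel="pronunciation"
      href="lex/en.pls"
      type="application/pls+xml"
      hreflang="en" />
<link rel="pronunciation"
      href="lex/fr.pls"
      type="application/pls+xml"
      hreflang="fr" />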
Localizations are not possible within a single PLS lexicon file, but you can attach multiple lexicons to voice words differently for different regions. (See the FAQ question below for more information.)
At the time of writing, no reading systems have appeared that support the new TTS enhancements in EPUB 3. Please send a report if the situation changes and this page has not been updated.
Although IPA is arguably the most widely recognized phonetic alphabet, that does not mean that it has full support even in existing synthetic speech engines. Some engines support only their own alphabets, for example. IPA is also less developer-friendly than X-SAMPA because it uses Unicode characters that require modifying most keyboard layouts to input, whereas X-SAMPA is ASCII-based. Internal workflows should be a determining factor at this time. The ultimate answer will depend on what engines are employed in reading systems.
Note that it is possible to translate one alphabet representation to the other, so work in either alphabet shouldn't be lost if there does turn out to be a clear winner and loser.
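As an illustration of the difference, the same entry can be transcribed in either alphabet; the phoneme element's optional alphabet attribute can override the default declared on the lexicon root. The transcriptions below are illustrative, and the "x-sampa" identifier is a common convention rather than a registered value:

<!-- X-SAMPA: ASCII-only transcription -->
<lexeme>
   <grapheme>acetaminophen</grapheme>
   <phoneme alphabet="x-sampa">@"sit@'mIn@f@n</phoneme>
</lexeme>

<!-- the same word transcribed in IPA -->
<lexeme>
   <grapheme>acetaminophen</grapheme>
   <phoneme alphabet="ipa">əˌsiːtəˈmɪnəfən</phoneme>
</lexeme>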
The need to be able to define case-sensitive pronunciations is clear, but how PLS lexicons are processed is less so. The specification itself says nothing about the case sensitivity of graphemes; a requirement for case-sensitive processing appears only in an informative appendix. Until reading systems that support PLS lexicons appear, any answer is speculative, but assume case sensitivity because of the critical role it plays.
Note that you should also consider that certain terms will appear in both lower case and title case in a publication without changing the pronunciation, and add grapheme elements for both cases:
<lexeme>
   <grapheme>acetaminophen</grapheme>
   <grapheme>Acetaminophen</grapheme>
   <phoneme>@"sit@'mIn@f@n</phoneme>
</lexeme>
When case conflicts occur, use SSML in the markup to correct the pronunciation of the less common term. For example, both spellings mobile and Mobile may refer to human mobility in a document that studies age-related health issues in Mobile, Alabama. Defining the pronunciation of Mobile as ˈmoʊbaɪl will cause the city name to be mispronounced (and likewise the other way around).
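One way to handle that case (sketched here with illustrative markup) is to keep the more common adjective pronunciation in the lexicon and tag occurrences of the city name with EPUB 3's SSML attributes in the content document:

<p>The study followed older adults living in
   <span ssml:ph="moʊˈbiːl" ssml:alphabet="ipa">Mobile</span>, Alabama.</p>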
Yes, if the rendering engine does not support voicing the specified language, the reader may get an error or the text may be silently skipped. Error handling in such situations cannot be guaranteed. Language-specific lexicons will typically not be loaded.
Not within a single PLS file. The phoneme element does not allow an xml:lang attribute to be attached to it. Multiple localized lexicons could be attached to a content document that only specifies the stem language code, so that the reader's localization preference setting can be used to determine the proper lexicon to apply (e.g., the content document specifies it is en and the lexicons specify en-US and en-GB).
Care should be taken not to exclude readers by specifying localizations. If a reading system does not include a voice that can handle the localizations, the lexicon will not be loaded.
A better solution is to define one lexicon for all reading systems that can handle the region-independent language. If the publication is written in US English, for example, it would be better to use the default en code for the standard pronunciation lexicon and specify a locale only for targeted regions:
<html … xml:lang="en">
   <head>
      …
      <link rel="pronunciation"
            href="lex/en.pls"
            type="application/pls+xml"
            hreflang="en" />
      <link rel="pronunciation"
            href="lex/en-GB.pls"
            type="application/pls+xml"
            hreflang="en-GB" />
      …
   </head>
   …
</html>
This way any reader with an English-language reading system will at least hear the correct US pronunciations.
These technologies were not included in EPUB 3 to force a choice between them; they are meant to complement each other. PLS lexicons allow you to define a word once and have the TTS engine do the work of replacing it each time it occurs in the prose. SSML, on the other hand, provides the fine-grained control that is simply not possible in a lexicon, at the price of having to tag each instance of a term that needs to be replaced.