This chapter provides an in-depth look at controlling voice characteristics within a DECtalk Software text file or application.
Topics include:
Table 4-1 -- Stress Symbols
Symbol    Name                 Indicates
[ ' ]     apostrophe           primary stress
[ ` ]     grave accent         secondary stress
[ " ]     quotation mark       emphatic stress
[ / ]     slash                pitch rise
[ \ ]     backslash            pitch fall
[ / \ ]   slash and backslash  pitch rise and fall
Table 4-2 -- Syntactic Symbols
Symbol    Name               Indicates
[ - ]     hyphen             syllable boundary
[ * ]     asterisk           morpheme boundary
[ # ]     number sign        compound noun
[ ) ]     close parenthesis  beginning of verb phrase
[ , ]     comma              clause boundary
[ . ]     period             end of sentence
[ ? ]     question mark      end of question
[ ! ]     exclamation point  end of exclamation
[ + ]     plus sign          new paragraph
Valid speaking rates are between 75 and 600 in the [:ra ] command. Rates specified outside this range are limited to the nearest legal value.
[:ra 120] Although the slowest possible rate is 75 wpm, 120 wpm is ideal for situations where material such as a phone number has to be copied down by a listener. Note that it might be frustrating to listen to extended speech at slow rates unless the listener is actually copying down each numeral.
[:ra 160] This rate is moderate (160 wpm). It sounds a little slow, but is sometimes preferred, for example, when DECtalk Software is speaking math equations or long lists of acronyms.
[:ra 180] This rate is the default rate for DECtalk Software (180 wpm). It is ideal for listening to continuous text under optimal conditions.
[:ra 240] This rate is faster (240 wpm). Practiced listeners might prefer to skim material at this rate. Inexperienced listeners might not understand every word at this rate.
[:ra 350] This rate is very fast (350 wpm). In fact, it is too fast to follow, but it does have applications in special circumstances where an individual needs to scan sections of text quickly.
[:ra 550] This rate is the fastest usable rate. It is too fast for many people to follow, but it does have applications for individuals who want to scan text very quickly.
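The range limit described above can be mirrored in application code before a command is sent. The following Python sketch is illustrative only: the helper name rate_command is hypothetical and not part of DECtalk Software, and the only facts taken from this chapter are the [:ra ] command form and the 75 to 600 range.

```python
# Hypothetical helper: build a speaking-rate command, limiting the value to
# the 75-600 range that this chapter gives for the [:ra ] command.
MIN_RATE_WPM = 75
MAX_RATE_WPM = 600


def rate_command(wpm):
    """Return a [:ra N] command string with N limited to the legal range."""
    clamped = max(MIN_RATE_WPM, min(MAX_RATE_WPM, int(wpm)))
    return f"[:ra {clamped}]"


if __name__ == "__main__":
    print(rate_command(120))   # slow rate suggested for dictating numbers
    print(rate_command(1000))  # out-of-range request is limited to 600
```

Limiting the value in the application is optional, because out-of-range rates are already limited to the nearest legal value, but it keeps the text you log identical to what is actually spoken.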
Changes in the speaking rate influence the duration and the number of pauses in
text, as well as the duration of individual phonemes. At rates below 140 wpm,
DECtalk Software inserts pauses at all phrase boundaries and lengthens
phonemes near the ends of phrases. At rates faster than 240 wpm,
DECtalk Software deletes all pauses and shortens phonemes. (Near the beginning
of phrases, phonemes are fairly short at both slow and fast speaking rates.)
At the default speaking rate (180 wpm), DECtalk Software pauses about half a
second after a period in the text and about a sixth of a second after a comma.
However, pause durations are adjusted automatically when you change the
speaking rate.
Adjusting Period and Comma Pause Durations
In some situations, you might prefer a longer pause after a period without changing the speaking rate. For example, to get DECtalk Software to read a list of words at a normal rate with five-second pauses after each word (to allow the listener to write each one down), you can use one of the following commands and end each word with a comma or a period:
[:pp 4500] Adds a period pause of 4500 ms (4.5 seconds) to the standard half-second pause that occurs after a period in text. The total pause between words is about five seconds. The accepted range for a period pause is from -380 to 30000 ms. A negative value shortens the standard period pause.
[:cp 4800] Adds a comma pause of 4800 ms (4.8 seconds) to the standard sixth of a second pause that occurs after a comma in the text at normal speaking rate. The total pause between words separated by a comma is about five seconds. The accepted range for a comma pause is from -40 to 30000 ms. Values specified outside this range are limited to the nearest legal value.
[:pp 0 :cp 0] Resets the period pause and comma pause to their normal
default values.
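The arithmetic behind these commands (the added pause is the desired total pause minus the standard pause) can be sketched in Python. The function names are hypothetical, and the base pauses of 500 ms and 167 ms are assumptions taken from the "about half a second" and "about a sixth of a second" figures above, which apply at the default speaking rate only.

```python
# Sketch, assuming base pauses of ~500 ms (period) and ~167 ms (comma) at the
# default 180 wpm rate. Ranges are the ones stated for [:pp ] and [:cp ].
BASE_PERIOD_MS = 500
BASE_COMMA_MS = 167
PERIOD_RANGE = (-380, 30000)
COMMA_RANGE = (-40, 30000)


def _clamp(value, low, high):
    return max(low, min(high, value))


def period_pause_command(total_ms):
    """Return a [:pp N] command giving roughly total_ms after a period."""
    return f"[:pp {_clamp(total_ms - BASE_PERIOD_MS, *PERIOD_RANGE)}]"


def comma_pause_command(total_ms):
    """Return a [:cp N] command giving roughly total_ms after a comma."""
    return f"[:cp {_clamp(total_ms - BASE_COMMA_MS, *COMMA_RANGE)}]"


def dictation_list(words, total_pause_ms=5000):
    """Speak each word followed by a long period pause, then reset [:pp 0]."""
    body = " ".join(f"{word}." for word in words)
    return f"{period_pause_command(total_pause_ms)} {body} [:pp 0]"


if __name__ == "__main__":
    print(dictation_list(["apple", "pear"]))
```

For a five-second target this yields [:pp 4500], matching the example above, and [:cp 4833] for commas, which is close to the rounded [:cp 4800] value used in the text.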
The text that follows is presented twice, the first time as originally written,
and the second time after phonemic and textual fixes were applied.
A California Shaggy Bear Tale for Seven DECtalk Software Voices
by Dennis Klatt
[:np] Once upon a time, there were three bears.
They lived in the great forest, and tried to adjust to modern times
[:nh] I'm papa bear. I love my family but I love honey best.
[:nb] I'm mama bear. Being a mama bear is a drag.
[:nk] I'm baby bear and I have trouble relating to all of the demands of
older bears.
[:np] One day, the three bears left their condominium to search for honey.
While they were gone, a beautiful young lady snuck into the bedroom through an
open window.
[:nw] My name is Wendy. My purpose in entering this building should be
clear. I am planning to steal the family jewels.
[:np] Hot on her trail was the famous police detective, Frank.
[:nf] Have you seen a lady carrying a laundry bag over her shoulder?
[:np] A woman kneeling with her left ear firmly placed against a large rock
responded.
[:nu] No. No one passed this way. I've been listening for earthquakes all
morning, but have only spotted three bears searching for honey.
Even though DECtalk Software produces natural-sounding text-to-speech
synthesis, the quality of speech can often be enhanced by giving it a more
natural flow. Much of this tuning involves the strategic placement of commas
and periods, which tell the application to pause, as a native speaker of
English does when speaking. Spoken language and written text differ in that
written text generally does not contain information about pausing.
Text-Tuning Example
[:np]
Original Version
[:np]
Add periods to add brief pauses after the title and author.
A California Shaggy Bear Tale for Seven DECtalk Software Voices.
By Dennis Klatt.
[:np] Once upon a time, there were three bears. They lived in the great forest and tried to adjust to modern times.
Add commas to increase pause length and quotation marks for emphatic stress.
[:nh] I'm papa bear. I love my family, but I love ["]honey best.
[:nb] I'm mama bear. Being a mama bear is a drag.
[:nk] I'm baby bear and I have trouble relating to all of the demands of older bears.
[:np] One day, the three bears left their condominium to search for honey. While they were gone, a beautiful young lady snuck into the bedroom through an open window.
[:nw] My name is Wendy. My purpose in entering this building should be clear. I am planning to steal the family jewels.
Use a new paragraph symbol [+] to begin a new paragraph.
[:np] [+] Hot on her trail was the famous police detective, Frank.
[:nf] Have you seen a lady carrying a laundry bag over her shoulder?
Add commas to increase pause length and phrasing.
[:np] A woman, kneeling with her left ear firmly placed against a large rock, responded.
Use pitch rise and fall symbols [/ \] and emphatic stress symbols [ " ] to add pitch control and emphatic stress.
[:nu] ["]No. No [/]one passed this [/ \]way. I've been listening for ["]earthquakes all morning, but have only spotted three bears searching for honey.
Developing an Electronic Mail-Reading Application
You can write an electronic mail preprocessor to make text conversions
before sending the text to DECtalk Software.
For example:
The number is, 1 (800) 5 5 5, 1 2 3 4. [:ra 120]
That is, [_<300>] 1 (800), [_<500>] 5 5 5,
[_<900>] 1 2 3 4. [:ra 180].
The spaces between the numbers ensure that "five five five" is spoken rather than "five hundred fifty five." (You can also use the [:mode spell on] command.) The slower speaking rate, [:ra 120], and the silence phonemes, [_<300>], [_<500>], [_<900>], of specified durations, were carefully selected to allow enough time for the listener to write down the entire number. Silence phonemes were positioned after the commas (that is, [_<300>] 1 (800), [_<500>]), to maintain appropriate intonation.
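A preprocessor could generate this kind of text automatically. The Python sketch below reproduces the phone number example; the function name and its parameters are hypothetical, and the silence durations and speaking rates are simply the ones used in the example, not required values.

```python
# Hypothetical preprocessor step: announce a phone number once, then repeat it
# slowly with spaced digits and silence phonemes placed after the commas.
def phone_number_text(country, area, exchange, line, slow=120, normal=180):
    """Build DECtalk text for a phone number given as digit strings."""
    exchange_digits = " ".join(exchange)  # "555" -> "5 5 5"
    line_digits = " ".join(line)          # "1234" -> "1 2 3 4"
    first = f"The number is, {country} ({area}) {exchange_digits}, {line_digits}."
    repeat = (
        f"[:ra {slow}] That is, [_<300>] {country} ({area}), "
        f"[_<500>] {exchange_digits}, [_<900>] {line_digits}. [:ra {normal}]"
    )
    return f"{first} {repeat}"


if __name__ == "__main__":
    print(phone_number_text("1", "800", "555", "1234"))
```

Keeping the rate change and the restore to the normal rate inside one helper makes it harder to leave the synthesizer at the slow rate by accident.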
As another example, if your application is required to speak sums of money (such as bank balances or item costs), you might code the text to say:
Your balance is $244.05. That is, 2 4 4, [_<400>] point 0 5, [_<400>] dollars.
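The same approach works for sums of money. This Python sketch is an illustration under stated assumptions: the function name is hypothetical, the amount is passed as a non-negative whole number of cents to avoid floating-point rounding, and the 400 ms silences are copied from the example above.

```python
# Hypothetical helper: speak a balance normally, then repeat it digit by digit
# with silence phonemes after the commas.
def balance_text(amount_cents, pause_ms=400):
    """Build DECtalk text for a non-negative amount given in cents."""
    dollars, cents = divmod(amount_cents, 100)
    dollar_digits = " ".join(str(dollars))    # 244 -> "2 4 4"
    cent_digits = " ".join(f"{cents:02d}")    # 5   -> "0 5"
    return (
        f"Your balance is ${dollars}.{cents:02d}. "
        f"That is, {dollar_digits}, [_<{pause_ms}>] point {cent_digits}, "
        f"[_<{pause_ms}>] dollars."
    )


if __name__ == "__main__":
    print(balance_text(24405))
```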
When spelling an item out, your application might need to distinguish the case of letters. Consider using different voices to distinguish between uppercase and lowercase letters. For example:
[:nf]Maynard [:nf]M[:nb]a y n a r d [:nf]Maynard.
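One possible way to generate such a spelling is sketched below in Python. The function name is hypothetical, and the choice of [:nf] for uppercase letters and [:nb] for lowercase letters simply follows the example above.

```python
from itertools import groupby


# Hypothetical helper: say a word, spell it with one voice for uppercase
# letters and another for lowercase letters, then say the word again.
def spell_with_case(word, upper_voice="nf", lower_voice="nb", word_voice="nf"):
    """Build DECtalk text that spells a word and marks letter case by voice."""
    runs = []
    # Group consecutive letters of the same case so that each run needs only
    # one voice command.
    for is_upper, letters in groupby(word, key=str.isupper):
        voice = upper_voice if is_upper else lower_voice
        runs.append(f"[:{voice}]" + " ".join(letters))
    spelled = "".join(runs)
    return f"[:{word_voice}]{word} {spelled} [:{word_voice}]{word}."


if __name__ == "__main__":
    print(spell_with_case("Maynard"))
```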
1. Send the sentence to DECtalk Software and listen to it a number of times, focusing on each word to detect any mispronunciations.
2. Change text to phonemic text for all mispronounced words.
Note
For words that have two pronunciations (homographs), see online help or Appendix B.
DECtalk Software can often choose the correct pronunciation by itself. For example, consider the following sentences:
He produced a lot of REFUSE. He REFUSEd the produce.
He INSERTS 5 INSERTS per minute. He DELIBERATEd DELIBERATEly for a long time.
You can see how some of these words could be pronounced incorrectly. You can correct such mispronunciation by doing one of the following:
Replace the correct spelling of the word with a clever misspelling.
I red yesterday that . . .
Spell the word phonetically.
I [r'ehd] yesterday that . . .
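If the same corrections are needed repeatedly, they can be kept in a small table and applied before the text is sent. The Python sketch below is a minimal illustration: the function name and the table are hypothetical, matching is case-sensitive on whole words, and the caller is assumed to apply it only to text where the listed word is known to be mispronounced (a homograph such as "read" needs a different fix in other contexts). Phonemic replacements also assume that phonemic input has been enabled.

```python
import re


# Hypothetical preprocessor step: replace whole words with a corrected
# spelling or a phonemic string taken from a caller-supplied table.
def apply_pronunciation_fixes(text, fixes):
    """Return text with each whole-word key in fixes replaced by its value."""
    if not fixes:
        return text
    alternatives = "|".join(re.escape(word) for word in fixes)
    pattern = re.compile(rf"\b({alternatives})\b")
    return pattern.sub(lambda match: fixes[match.group(1)], text)


if __name__ == "__main__":
    fixes = {"read": "[r'ehd]"}
    print(apply_pronunciation_fixes("I read yesterday that", fixes))
```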
Additionally, use the following steps to optimize spoken text.
1. If the word is a compound, use a hyphenated spelling to help DECtalk Software see the two parts of the compound.
The slide-show host . . .
2. Replace the text version by a phonemic string. Use the commands and phonemic symbols, but make sure to place the lexical stress pattern correctly.
Note
Sometimes, a word does not sound quite right even when the best
phonemic representation is selected. Usually, such subtle pronunciation defects
are not correctable.
3. Now that each word has been pronounced in the best possible way, listen to the total sentence rhythm and accent pattern. If it is not right, follow these steps.
(a) If it sounds like there should be a short pause in a particular sentence location, but DECtalk Software says the sentence without a pause, insert a comma between the words in question.
(b) If the wrong word is emphasized in the sentence, mark the word that is supposed to take the emphasis with the correct stress symbol.
The ["]younger man is the trouble-maker, not the older one.
(c) Use the pitch control symbols slash [/], backslash [\], and slash and
backslash [/ \] to make final adjustments.
(d) If none of these actions gives you a satisfactory sentence, you can still
specify duration and fundamental frequency motions for all phonemes with the
voice-control commands discussed fully in Chapter 5.
Avoiding Common Errors
When using DECtalk Software, try to avoid the following two common errors:
If you forget to return DECtalk Software to the default voice after using one of the other voices, all future text uses the currently selected voice.
If the [:phoneme arpabet speak] command is entered allowing phonemic input, it is possible for DECtalk Software to enter phonemic mode unintentionally if the text being spoken contains an unexpected left bracket ( [ ), or if you forget to enter a right bracket ( ] ) after a phonemic entry. DECtalk Software is then left in a state where it interprets all remaining text phonemically. For example:
[:phone on] [:ra 220 [:nh] Ladies and Gentlemen
                    ^ right bracket ( ] ) is missing
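Both errors can be guarded against in the application before text reaches DECtalk Software. The Python sketch below is illustrative: the function names are hypothetical, the bracket check is a simple scan that does not handle every case (for example, a literal left bracket inside mail text is reported as unclosed, which is usually what you want), and the use of [:np] as the voice to return to is an assumption based on the narrator voice used in this chapter.

```python
# Hypothetical guards for the two common errors described above.
def find_unclosed_brackets(text):
    """Return the index of every left bracket that has no right bracket
    before the next left bracket or the end of the text."""
    unclosed = []
    open_at = None
    for index, char in enumerate(text):
        if char == "[":
            if open_at is not None:
                unclosed.append(open_at)  # previous bracket was never closed
            open_at = index
        elif char == "]":
            open_at = None
    if open_at is not None:
        unclosed.append(open_at)
    return unclosed


def speak_in_voice(text, voice, default_voice="np"):
    """Wrap text in a voice command and always return to the default voice."""
    return f"[:{voice}] {text} [:{default_voice}]"


if __name__ == "__main__":
    sample = "[:phone on] [:ra 220 [:nh] Ladies and Gentlemen"
    print(find_unclosed_brackets(sample))  # position of the "[:ra 220" bracket
    print(speak_in_voice("Ladies and Gentlemen.", "nh"))
```

An application might refuse to send text for which find_unclosed_brackets returns a non-empty list, or log the reported positions so the source text can be corrected.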