

Chapter 4: Advanced Voice Control Topics

This chapter provides an in-depth look at controlling voice characteristics within a DECtalk Software text file or application.

Topics include:

  • Using advanced voice modification commands

  • Text-tuning example

  • Developing an advanced speech application

DECtalk Software provides several advanced methods of controlling speech output in addition to changing the speaking voices and rearranging the text file format and speaking sequence, as discussed in the previous chapter. These include:

  • Changing rhythm, stress, and intonation with symbols

  • Increasing or decreasing the speaking rate with the Rate Selection command [:ra _]

  • Controlling pause durations with the Comma Pause [:cp _] and Period Pause [:pp _] commands

    Changing Rhythm, Stress, and Intonation

    DECtalk Software uses stress and syntactic symbols to control aspects of rhythm, stress, and intonation patterns within a spoken text file. These symbols include punctuation marks such as commas, periods, and open and close parentheses. Punctuation marks are recognized by DECtalk Software as indicating special phrasing requirements. Table 4-1 and Table 4-2 list these symbols.

    Table 4-1 -- Stress Symbols

     Symbol   Name                 Indicates
     [ ' ]    apostrophe           primary stress
     [ ` ]    grave accent         secondary stress
     [ " ]    quotation mark       emphatic stress
     [ / ]    slash                pitch rise
     [ \ ]    backslash            pitch fall
     [ / \ ]  slash and backslash  pitch rise and fall

    Table 4-2 -- Syntactic Symbols

     Symbol   Name               Indicates                   
     [ - ]    hyphen             syllable boundary            
     [ * ]    asterisk           morpheme boundary            
     [ # ]    number sign        compound noun                
     [ ) ]    close parenthesis  beginning of verb phrase     
     [ , ]    comma              clause boundary
     [ . ]    period             end of sentence
     [ ? ]    question mark      end of question
     [ ! ]    exclamation point  end of exclamation
     [ + ]    plus sign          new paragraph

    Speaking Rate

    The default speaking rate is 180 words per minute (wpm). The [:ra] command accepts rates from 75 to 600 wpm; rates specified outside this range are limited to the nearest legal value. You can make speech very slow, very fast, or anything in between with commands such as the following:

    [:ra 120] Although the slowest possible rate is 75 wpm, 120 wpm is ideal for situations where material such as a phone number has to be copied down by a listener. Note that it might be frustrating to listen to extended speech at slow rates unless the listener is actually copying down each numeral.

    [:ra 160] This rate is moderate (160 wpm). It sounds a little slow but is sometimes preferred, for example when DECtalk Software is speaking math equations or long lists of acronyms.

    [:ra 180] This rate is the default rate for DECtalk Software (180 wpm). It is ideal for listening to continuous text under optimal conditions.

    [:ra 240] This rate is faster (240 wpm). Practiced listeners might prefer to skim material at this rate. Inexperienced listeners might not understand every word at this rate.

    [:ra 350] This rate is very fast (350 wpm), too fast for most listeners to follow, but it is useful in special circumstances where an individual needs to scan sections of text quickly.

    [:ra 550] This rate is the fastest usable rate. It is too fast for many people to follow, but it does have applications for individuals who want to scan text very quickly.
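    The rate commands above are plain strings embedded in the text, so an application can build them programmatically. The following Python sketch uses hypothetical helper names (rate_command, at_rate). It mirrors the 75 to 600 wpm clamping described above on the application side, which mainly keeps the application's record of the current rate accurate; DECtalk Software applies the same limits itself.

```python
# Hypothetical helpers; the limits and the 180 wpm default come from the text above.
MIN_RATE_WPM = 75
MAX_RATE_WPM = 600
DEFAULT_RATE_WPM = 180


def rate_command(wpm: int) -> str:
    """Build a [:ra] command, clamping wpm to the legal range first."""
    clamped = max(MIN_RATE_WPM, min(MAX_RATE_WPM, wpm))
    return f"[:ra {clamped}]"


def at_rate(text: str, wpm: int) -> str:
    """Speak text at the given rate, then return to the default rate."""
    return f"{rate_command(wpm)} {text} {rate_command(DEFAULT_RATE_WPM)}"


print(at_rate("The number is 5 5 5, 1 2 3 4.", 120))
# [:ra 120] The number is 5 5 5, 1 2 3 4. [:ra 180]
```

    Restoring the default rate at the end of each chunk avoids leaving DECtalk Software at a slow or fast rate for text that follows.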

    Changes in the speaking rate affect the number and duration of pauses in the text, as well as the duration of individual phonemes. At rates below 140 wpm, DECtalk Software inserts pauses at all phrase boundaries and lengthens phonemes near the ends of phrases. At rates faster than 240 wpm, DECtalk Software deletes all pauses and shortens phonemes. (Near the beginning of phrases, phonemes are fairly short at both slow and fast speaking rates.)


    Adjusting Period and Comma Pause Durations

    At the default speaking rate (180 wpm), DECtalk Software pauses about half a second after a period in the text and about a sixth of a second after a comma. These pause durations are adjusted automatically when you change the speaking rate.

    In some situations, you might want longer pauses after periods or commas without changing the speaking rate. For example, to have DECtalk Software read a list of words at a normal rate with five-second pauses after each word (to allow the listener to write them down), end each word with a comma or a period and use one of the following commands:

    [:pp 4500] Adds a period pause of 4500 ms (4.5 seconds) to the standard half-second pause that occurs after a period in text. The total pause between words is about five seconds. The accepted range for a period pause is from -380 to 30000 ms. A negative value shortens the standard period pause.

    [:cp 4800] Adds a comma pause of 4800 ms (4.8 seconds) to the standard one-sixth-second pause that occurs after a comma in the text at the default speaking rate. The total pause between words separated by a comma is about five seconds. The accepted range for a comma pause is from -40 to 30000 ms. Values specified outside this range are limited to the nearest legal value.

    [:pp 0 :cp 0] Resets the period pause and comma pause to their normal default values.
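    The arithmetic behind these examples can be captured in a small helper. This is a sketch under stated assumptions: the default pauses are taken as 500 ms after a period and 167 ms after a comma (the approximate values given above for 180 wpm), out-of-range values are clamped to the accepted ranges, and pause_commands is a hypothetical name. Because the default pauses change with the speaking rate, the totals are only approximate at other rates.

```python
# Approximate default pauses at 180 wpm, taken from the text above (assumption).
DEFAULT_PERIOD_PAUSE_MS = 500   # "about half a second"
DEFAULT_COMMA_PAUSE_MS = 167    # "about a sixth of a second"

PP_RANGE = (-380, 30000)        # accepted [:pp] range in ms
CP_RANGE = (-40, 30000)         # accepted [:cp] range in ms


def _clamp(value: int, bounds: tuple[int, int]) -> int:
    low, high = bounds
    return max(low, min(high, value))


def pause_commands(total_ms: int) -> str:
    """Return [:pp]/[:cp] settings giving roughly total_ms of silence after
    each period and each comma at the default speaking rate."""
    pp = _clamp(total_ms - DEFAULT_PERIOD_PAUSE_MS, PP_RANGE)
    cp = _clamp(total_ms - DEFAULT_COMMA_PAUSE_MS, CP_RANGE)
    return f"[:pp {pp} :cp {cp}]"


RESET_PAUSES = "[:pp 0 :cp 0]"

words = ["apple", "river", "lantern"]
print(pause_commands(5000) + " " + ", ".join(words) + ". " + RESET_PAUSES)
# [:pp 4500 :cp 4833] apple, river, lantern. [:pp 0 :cp 0]
```

    The computed comma pause (4833 ms) differs slightly from the rounded 4800 ms in the example above; either gives a total of about five seconds.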


    Text-Tuning Example

    Even though DECtalk Software produces natural-sounding text-to-speech synthesis, the quality of speech can often be improved by giving it a more natural flow. Much of this tuning involves the strategic placement of commas and periods, which tell DECtalk Software to pause where a native speaker of English would. Spoken language and written text differ: written text generally does not indicate where a speaker would pause.

    The text that follows is presented twice, the first time as originally written, and the second time after phonemic and textual fixes were applied.


    Original Version

    [:np]

    A California Shaggy Bear Tale for Seven DECtalk Software Voices

    by Dennis Klatt

    [:np] Once upon a time, there were three bears.

    They lived in the great forest, and tried to adjust to modern times

    [:nh] I'm papa bear. I love my family but I love honey best.

    [:nb] I'm mama bear. Being a mama bear is a drag.

    [:nk] I'm baby bear and I have trouble relating to all of the demands of older bears.

    [:np] One day, the three bears left their condominium to search for honey. While they were gone, a beautiful young lady snuck into the bedroom through an open window.

    [:nw] My name is Wendy. My purpose in entering this building should be clear. I am planning to steal the family jewels.

    [:np] Hot on her trail was the famous police detective, Frank.

    [:nf] Have you seen a lady carrying a laundry bag over her shoulder?

    [:np] A woman kneeling with her left ear firmly placed against a large rock responded.

    [:nu] No. No one passed this way. I've been listening for earthquakes all morning, but have only spotted three bears searching for honey.


    Revised Version

    In this section, text from the original example has been enhanced with DECtalk Software embedded commands.

    [:np]

    Add periods to create brief pauses after the title and author.

    A California Shaggy Bear Tale for Seven DECtalk Software Voices.

    By Dennis Klatt.

    [:np] Once upon a time, there were three bears. They lived in the great forest and tried to adjust to modern times.

    Add commas to create pauses, and quotation marks for emphatic stress.

    [:nh] I'm papa bear. I love my family, but I love ["]honey best.

    [:nb] I'm mama bear. Being a mama bear is a drag.

    [:nk] I'm baby bear and I have trouble relating to all of the demands of older bears.

    [:np] One day, the three bears left their condominium to search for honey. While they were gone, a beautiful young lady snuck into the bedroom through an open window.

    [:nw] My name is Wendy. My purpose in entering this building should be clear. I am planning to steal the family jewels.

    Use a new paragraph symbol [+] to begin a new paragraph.

    [:np] [+] Hot on her trail was the famous police detective, Frank.

    [:nf] Have you seen a lady carrying a laundry bag over her shoulder?

    Add commas to create pauses and improve phrasing.

    [:np] A woman, kneeling with her left ear firmly placed against a large rock, responded.

    Use the pitch rise and fall symbols [/ \] and the emphatic stress symbol [ " ] to add pitch control and emphatic stress.

    [:nu] ["]No. No [/]one passed this [/ \]way. I've been listening for ["]earthquakes all morning, but have only spotted three bears searching for honey.
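    Two of the edits in the revised version are mechanical enough to automate: ending title-like lines with a period, and marking a chosen word with the emphatic stress symbol. The following Python sketch uses hypothetical helper names (end_with_period, emphasize); deciding which word deserves emphasis still requires listening to the result.

```python
import re

SENTENCE_END = (".", "?", "!")


def end_with_period(line: str) -> str:
    """Add a period to a line (such as a title or byline) that lacks final punctuation."""
    stripped = line.rstrip()
    if not stripped or stripped.endswith(SENTENCE_END):
        return stripped
    return stripped + "."


def emphasize(text: str, word: str) -> str:
    """Put the emphatic stress symbol ["] before the first whole-word match of word."""
    pattern = rf"\b{re.escape(word)}\b"
    return re.sub(pattern, lambda m: '["]' + m.group(0), text, count=1)


print(end_with_period("By Dennis Klatt"))
# By Dennis Klatt.
print(emphasize("I love my family, but I love honey best.", "honey"))
# I love my family, but I love ["]honey best.
```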


    Developing an Advanced Speech Application

    The development process described in this guide assumes that your application has full control of the text being spoken. However, if you are developing an application that reads arbitrary text (such as electronic mail messages), your task is more difficult because almost anything can appear in the text. You can put application-specific text filters in the controlling computer, rather than add many additional special cases (and switches to enable and disable them) to DECtalk Software.
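    A text filter of this kind might look like the following Python sketch. It is illustrative only and rests on several assumptions: the message is plain text whose header ends at the first blank line, only the From and Subject lines are worth speaking, and only whole-hour times such as 3:00 are expanded. The function names are hypothetical. The filter drops the rest of the header, inserts the new paragraph symbol [+] where blank lines separated paragraphs, and rewrites whole-hour times in "o'clock" form.

```python
import re

# Assumptions for this sketch: plain-text message, header ends at the first
# blank line, and only these header fields are worth speaking.
KEEP_HEADERS = ("From:", "Subject:")

# Whole-hour times from 1:00 to 12:00 (an assumed, deliberately narrow format).
WHOLE_HOUR = re.compile(r"\b(1[0-2]|[1-9]):00\b")


def strip_header(message: str) -> list[str]:
    """Keep selected header lines (each ending in a period for a pause), then the body."""
    header, _, body = message.partition("\n\n")
    kept = [line.rstrip(".") + "." for line in header.splitlines()
            if line.startswith(KEEP_HEADERS)]
    return kept + [""] + body.splitlines()


def mark_paragraphs(lines: list[str]) -> str:
    """Join lines into one string, putting [+] where blank lines separated paragraphs."""
    out: list[str] = []
    pending_break = False
    for line in lines:
        if not line.strip():
            pending_break = bool(out)  # ignore blank lines before any text
            continue
        if pending_break:
            out.append("[+]")
            pending_break = False
        out.append(line.strip())
    return " ".join(out)


def expand_times(text: str) -> str:
    """Rewrite whole-hour times such as 3:00 as "3 o'clock"."""
    return WHOLE_HOUR.sub(lambda m: f"{m.group(1)} o'clock", text)


def preprocess(message: str) -> str:
    return expand_times(mark_paragraphs(strip_header(message)))


msg = ("From: Pat\nTo: team\nReceived: by mailhost\nSubject: Lunch\n\n"
       "Lunch moved to 1:00 today.\n\nSee you there.")
print(preprocess(msg))
# From: Pat. Subject: Lunch. [+] Lunch moved to 1 o'clock today. [+] See you there.
```

    Keeping each transformation in its own function makes it easy to add further application-specific rules without touching DECtalk Software itself.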


    Developing an Electronic Mail-Reading Application

    You can write an electronic mail preprocessor to make the following text conversions before sending the text to DECtalk Software:

  • Parse the header boilerplate to remove extraneous information.

  • Add the new paragraph symbol [+] to each blank line between paragraphs if DECtalk Software is speaking paragraphs of text.

  • Create your own application-specific dictionary for words, such as proper names, that DECtalk Software mispronounces. If DECtalk Software is connected to a database containing names, consider adding a pronunciation field to the name record or entering phonemic text when appropriate. (DECtalk Software can handle many proper names and addresses using the [:pronounce name] or [:mode name] commands.)

  • Scan the text for strings of numbers in formats that your application understands but DECtalk Software does not. For example, if your application can recognize a time in an electronic mail message, it can expand the time to its "o'clock" form.

  • In many applications, the listener might want to write down number strings (such as prices or telephone numbers). Your application can scan the text for strings of numbers and, when it finds them, send them to DECtalk Software with pauses inserted at critical locations.

    For example:

    The number is, 1 (800) 5 5 5, 1 2 3 4. [:ra 120]

    That is, [_<300>] 1 (800), [_<500>] 5 5 5,

    [_<900>] 1 2 3 4. [:ra 180]

    The spaces between the digits ensure that "five five five" is spoken rather than "five hundred fifty-five." (You can also use the [:mode spell on] command.) The slower speaking rate, [:ra 120], and the silence phonemes of specified durations, [_<300>], [_<500>], and [_<900>], were carefully selected to allow enough time for the listener to write down the entire number. The silence phonemes were positioned after the commas (that is, [_<300>] 1 (800), [_<500>]) to maintain appropriate intonation.
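    An application can generate this kind of text automatically. The following Python sketch reproduces the pattern of the example above for an 11-digit North American number; the function names are hypothetical, the pause lengths and rates are copied from the example, and other numbering plans would need their own grouping rules.

```python
def spaced(digits: str) -> str:
    """Separate digits with spaces so each one is spoken individually."""
    return " ".join(digits)


def speak_phone_number(number: str) -> str:
    """Say the number once, then repeat it slowly with silences after the commas."""
    digits = "".join(ch for ch in number if ch in "0123456789")
    if len(digits) != 11:
        raise ValueError(f"expected 11 digits, got {len(digits)}: {number!r}")
    country, area, exchange, subscriber = digits[0], digits[1:4], digits[4:7], digits[7:]
    first = f"The number is, {country} ({area}) {spaced(exchange)}, {spaced(subscriber)}."
    # The area code stays grouped, so "(800)" is still spoken as "eight hundred".
    repeat = (f"[:ra 120] That is, [_<300>] {country} ({area}), "
              f"[_<500>] {spaced(exchange)}, [_<900>] {spaced(subscriber)}. [:ra 180]")
    return f"{first} {repeat}"


print(speak_phone_number("1-800-555-1234"))
# The number is, 1 (800) 5 5 5, 1 2 3 4. [:ra 120] That is, [_<300>] 1 (800), [_<500>] 5 5 5, [_<900>] 1 2 3 4. [:ra 180]
```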

    As another example, if your application is required to speak sums of money (such as bank balances or item costs), you might code the text to say:

    Your balance is $244.05. That is, 2 4 4, [_<400>] point 0 5, [_<400>] dollars.
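    The balance example follows the same idea. In the following Python sketch (speak_balance is a hypothetical name), the amount is held as a Decimal rather than a float so that cents are never distorted by binary rounding; the 400 ms pauses are copied from the example, and negative amounts are deliberately left out of scope.

```python
from decimal import Decimal


def speak_balance(amount: Decimal) -> str:
    """Say the amount normally, then digit by digit with pauses."""
    if amount < 0:
        raise ValueError("this sketch handles non-negative amounts only")
    text = f"{amount.quantize(Decimal('0.01')):f}"  # always two decimal places
    dollars, _, cents = text.partition(".")
    return (
        f"Your balance is ${text}. That is, {' '.join(dollars)}, "
        f"[_<400>] point {' '.join(cents)}, [_<400>] dollars."
    )


print(speak_balance(Decimal("244.05")))
# Your balance is $244.05. That is, 2 4 4, [_<400>] point 0 5, [_<400>] dollars.
```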

    When spelling out an item, your application might need to distinguish uppercase from lowercase letters. Consider using a different voice for each case. For example:

    [:nf]Maynard [:nf]M[:nb]a y n a r d [:nf]Maynard.
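    The following Python sketch produces the Maynard example for any word. The name spell_with_case is hypothetical, and the choice of [:nf] for uppercase and [:nb] for lowercase is copied from the example. Characters that are not uppercase letters, such as digits, are spoken in the lowercase voice. The sentence ends in the uppercase voice, as in the example, so the application should still return to its own default voice afterward.

```python
from itertools import groupby


def spell_with_case(word: str, upper_voice: str = "[:nf]", lower_voice: str = "[:nb]") -> str:
    """Say the word, spell it with a voice change at each change of case, then say it again."""
    spelled = "".join(
        (upper_voice if is_upper else lower_voice) + " ".join(letters)
        for is_upper, letters in groupby(word, key=str.isupper)
    )
    return f"{upper_voice}{word} {spelled} {upper_voice}{word}."


print(spell_with_case("Maynard"))
# [:nf]Maynard [:nf]M[:nb]a y n a r d [:nf]Maynard.
```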


    Optimizing the Quality of Spoken Text

    In some applications, a few sentences are heard much more often than others, so it might be important for them to sound natural and pleasing to the listener. To improve the quality of a particular sentence, follow these suggested steps:

    1. Send the sentence to DECtalk Software and listen to it a number of times, focusing on each word to detect any mispronunciations.

    2. Change text to phonemic text for all mispronounced words.

    Note
    For words that have two pronunciations (homographs), see online help or Appendix B.

    DECtalk Software can often choose the correct pronunciation of a homograph by itself, but not always. For example, consider the following sentences:

    He produced a lot of REFUSE. He REFUSEd the produce.

    He INSERTS 5 INSERTS per minute. He DELIBERATEd DELIBERATEly for a long time.

    You can see how some of these words could be pronounced incorrectly. You can correct such mispronunciations by doing one of the following:

    Replace the correct spelling of the word with a clever misspelling.

    I red yesterday that . . .

    Spell the word phonetically.

    I [r'ehd] yesterday that . . .
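    Once you have found fixes for a sentence by listening to it, an application can apply them from a table before sending the sentence. In the following Python sketch, apply_fixes is a hypothetical name, and the entries reuse the examples in this section. Because the table is applied to a sentence you have already listened to, a context-dependent fix such as the past-tense "read" is safe here; applying the same table to arbitrary text would not be. Bracketed phonemic text is assumed to be interpreted only when phonemic input is enabled (see Avoiding Common Errors).

```python
import re


def apply_fixes(sentence: str, fixes: dict[str, str]) -> str:
    """Replace whole words or phrases in a tuned sentence with respellings or phonemic text."""
    for word, replacement in fixes.items():
        pattern = rf"\b{re.escape(word)}\b"
        # A function replacement avoids backslash processing in the replacement text.
        sentence = re.sub(pattern, lambda _m: replacement, sentence)
    return sentence


print(apply_fixes("I read yesterday that the slide show host left.",
                  {"read": "[r'ehd]", "slide show": "slide-show"}))
# I [r'ehd] yesterday that the slide-show host left.
```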

    Additionally, use the following steps to optimize spoken text.

    1. If the word is a compound, use a hyphenated spelling to help DECtalk Software see the two parts of the compound.

    The slide-show host . . .

    2. Replace the text with a phonemic string, using the phonemic commands and symbols. Make sure to place the lexical stress marks correctly.

    Note
    Sometimes, a word does not sound quite right even when the best phonemic representation is selected. Usually, such subtle pronunciation defects are not correctable.

    3. Now that each word is pronounced in the best possible way, listen to the rhythm and accent pattern of the whole sentence. If they are not right, follow these steps:

    (a) If it sounds like there should be a short pause in a particular sentence location, but DECtalk Software says the sentence without a pause, insert a comma between the words in question.

    (b) If the wrong word is emphasized in the sentence, mark the word that should take the emphasis with the appropriate stress symbol, such as the emphatic stress symbol [ " ]:

    The ["] younger man is the trouble-maker, not the older one.

    (c) Use the pitch control symbols slash [/], backslash [\], and slash and backslash [/ \] to make final adjustments.

    (d) If none of these actions gives you a satisfactory sentence, you can still specify duration and fundamental frequency motions for all phonemes with the voice-control commands discussed fully in Chapter 5.


    Avoiding Common Errors

    When using DECtalk Software, you can avoid two common errors by doing the following:

  • When making voice-selection changes, always return to the default voice you have chosen.

    If you forget to return DECtalk Software to the default voice after using one of the other voices, all subsequent text is spoken in the currently selected voice.

  • Enter a right bracket ( ] ) at the beginning of your text.

    If phonemic input has been enabled with the [:phoneme arpabet speak] command, DECtalk Software can enter phonemic mode unintentionally if the text being spoken contains an unexpected left bracket ( [ ), or if you forget to enter a right bracket ( ] ) after a phonemic entry. DECtalk Software is then left in a state where it interprets all remaining text phonemically. Starting each piece of text with a right bracket ends any phonemic mode left over from earlier text. For example:

    ] [:phone on] [:ra 220 [:nh] Ladies and Gentlemen
                          ^ right bracket ( ] ) is missing
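    Both errors can be guarded against in the application before text is sent. The following Python sketch uses hypothetical names (find_unclosed_bracket, prepare) and assumes that [:np] is the application's chosen default voice. It rejects text in which a left bracket is not closed before the next left bracket or the end of the text, which catches the missing bracket in the example above. It then prefixes a right bracket, following the advice above, and appends the default-voice command.

```python
from typing import Optional

DEFAULT_VOICE = "[:np]"  # assumption: the application's chosen default voice


def find_unclosed_bracket(text: str) -> Optional[int]:
    """Return the index of a '[' that is not closed before the next '[' or the end."""
    open_at: Optional[int] = None
    for i, ch in enumerate(text):
        if ch == "[":
            if open_at is not None:
                return open_at  # a new '[' appeared before the previous one closed
            open_at = i
        elif ch == "]":
            open_at = None
    return open_at


def prepare(text: str) -> str:
    """Guard a chunk of text against the two common errors described above."""
    problem = find_unclosed_bracket(text)
    if problem is not None:
        raise ValueError(f"unclosed '[' at index {problem}: {text[problem:problem + 12]!r}")
    # Leading ']' per the advice above; trailing command returns to the default voice.
    return f"] {text} {DEFAULT_VOICE}"


print(prepare("[:nh] Ladies and Gentlemen"))
# ] [:nh] Ladies and Gentlemen [:np]
```

    An application that reads arbitrary text, such as electronic mail, might prefer to remove or replace stray brackets instead of rejecting the text.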