

Chapter 2:
Introduction to DECtalk Software

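This chapter provides a general overview of DECtalk Software. Topics include an overview of the product, its features, its components, how the components are used, and how DECtalk Software works.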


Overview

DECtalk Software extends the capabilities of your workstation by turning text files into spoken words. It can accurately read ASCII text from a variety of sources, such as electronic mail and word processors, using a standard audio device for output. Nine different voices are provided, and users can control voice pitch, rate of speech, and word or phrase emphasis.

DECtalk Software Features

DECtalk Software provides the following features:

Latest Version of Digital Speech Synthesis Technology

DECtalk Software contains the latest version of Digital's DECtalk speech synthesis technology. It incorporates a number of improvements over earlier versions of DECtalk and is a software-only implementation of DECtalk offered by Digital.

Letter Mode, Word Mode, and Clause Mode

DECtalk Software can speak single characters immediately, without waiting for an entire clause to be buffered. This feature is useful in applications that must give immediate spoken feedback for what was typed on the keyboard. DECtalk Software also provides normal clause buffering for natural-sounding speech, and it can speak letters, words, phrases, clauses, paragraphs, and whole documents. An application can also stop speech immediately instead of waiting for the buffered text to finish processing.
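The difference between letter mode and clause mode can be illustrated with a small model. The following Python sketch is hypothetical: `SpeechBuffer`, `feed`, and `stop` are illustrative names, not part of the DECtalk Software API, and the set of clause-ending characters is an assumption. It shows characters released one at a time in letter mode, text held until a punctuation mark in clause mode, and a stop request discarding buffered text.

```python
# Hypothetical model of DECtalk Software buffering behavior.
# SpeechBuffer is illustrative only; it is NOT part of the DECtalk API.

CLAUSE_END = set(",.;:!?")  # assumed clause-boundary punctuation


class SpeechBuffer:
    def __init__(self, letter_mode=False):
        self.letter_mode = letter_mode
        self.pending = []  # characters waiting for a clause boundary
        self.spoken = []   # units handed to an imaginary synthesizer

    def feed(self, ch):
        if self.letter_mode:
            # Letter mode: speak each character as soon as it is typed.
            self.spoken.append(ch)
            return
        # Clause mode: buffer until a clause boundary is seen.
        self.pending.append(ch)
        if ch in CLAUSE_END:
            self._flush()

    def _flush(self):
        text = "".join(self.pending).strip()
        if text:
            self.spoken.append(text)
        self.pending.clear()

    def stop(self):
        # Immediate stop: discard buffered text instead of speaking it.
        self.pending.clear()


if __name__ == "__main__":
    buf = SpeechBuffer()
    for ch in "Hello, world. Goodbye":
        buf.feed(ch)
    print(buf.spoken)  # ['Hello,', 'world.']  ("Goodbye" is still buffered)
    buf.stop()         # "Goodbye" is discarded, never spoken
```

Clause buffering trades a short delay for natural phrasing, while letter mode trades phrasing for immediacy, which is why keyboard-echo applications favor it.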

Short Command Strings

Many of the command strings, such as change rate, change voice, start, stop, and index marks can be abbreviated for greater ease of use in applications.

High-Quality Speech and Word Pronunciation

DECtalk Software speech retains its high quality. In addition, a number of improvements have been made in functionality and acoustic phonetic quality.

The accuracy of word pronunciation is higher than in any previous version of DECtalk. The letter-to-phoneme rules have been significantly improved in both accuracy and quality. DECtalk Software also has a large built-in dictionary that is used to pronounce individual words accurately and to make the rhythm of speech sound more natural.


Pronunciation Heuristics

Certain heuristics have been improved and made more intelligent. For example, DECtalk Software is able to recognize and parse unpronounceable sequences such as uppercase initials (FBI, AAA, and so forth) in addition to the normal unpronounceable sequences such as those with no vowels (CBS or NBC, for example).
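A rough sketch of how such a heuristic might decide to spell a token out is shown below, in Python. The two rules (multi-letter all-uppercase tokens, and tokens with no vowel letters, counting y as a vowel) are assumptions drawn from the examples above; the actual DECtalk Software heuristics are not documented here and are more elaborate. For instance, this simple version would also spell out acronyms that are normally spoken as words.

```python
# Hypothetical spell-out heuristic; the real DECtalk rules are not shown here.

VOWELS = set("aeiouy")  # y counted as a vowel so "rhythm" is not spelled


def should_spell_out(token: str) -> bool:
    letters = [c for c in token if c.isalpha()]
    if not letters:
        return False  # digits and punctuation are handled elsewhere
    if len(letters) > 1 and all(c.isupper() for c in letters):
        return True   # uppercase initials such as FBI, AAA, CBS, NBC
    return not any(c.lower() in VOWELS for c in letters)  # e.g. "hmm"


def render(token: str) -> str:
    """Return the token as-is, or as separate letters if it is unpronounceable."""
    if should_spell_out(token):
        return " ".join(c.upper() for c in token if c.isalpha())
    return token


if __name__ == "__main__":
    for t in ["FBI", "NBC", "hmm", "rhythm", "hello"]:
        print(t, "->", render(t))
```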

DECtalk Software API

The Text-To-Speech API is the Digital extension to Multimedia Services for Digital UNIX. You can use this API to write your own applications. To access the API, you need the DECtalk Software Development kit.

The API function set gives you a flexible way to control DECtalk Software from within your application. These functions perform a wide range of tasks associated with the Text-To-Speech system. See the DECtalk Software Programmer's Reference Guide (QA-228AA-WZ.4.2A) for a complete list of API functions.


Voice-Control Commands

DECtalk Software programming aids include Voice-Control commands, also called inline commands. These commands can perform simple voice-control operations, such as changing the speaking rate or speaking voice while DECtalk Software is speaking, or more complex operations, such as modifying the characteristics of each voice, controlling intonation and stress within written text, or creating special effects such as singing. Commands can be inserted into ASCII text files that are displayed in one of the program applets, or embedded directly in application source code and passed to DECtalk Software through the API functions.
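The sketch below separates inline commands from the text that surrounds them. It assumes commands are written in the bracket-and-colon form, for example `[:rate 300]`; the function name `split_inline` is hypothetical, and the code does not validate command names or arguments. Consult the DECtalk Software documentation for the actual command names and syntax rules.

```python
import re

# Assumed inline-command form: "[:" name [args...] "]", e.g. "[:rate 300]".
COMMAND = re.compile(r"\[:([^\]]*)\]")


def split_inline(text):
    """Yield ("text", str) and ("command", (name, args)) items in order."""
    pos = 0
    for m in COMMAND.finditer(text):
        if m.start() > pos:
            yield ("text", text[pos:m.start()])  # plain text before the command
        parts = m.group(1).split()
        name = parts[0].lower() if parts else ""
        yield ("command", (name, parts[1:]))
        pos = m.end()
    if pos < len(text):
        yield ("text", text[pos:])  # trailing plain text


if __name__ == "__main__":
    for item in split_inline("Normal speed. [:rate 300]Much faster now."):
        print(item)
```

Keeping commands and text in one ordered stream matters because a command such as a rate change applies only to the text that follows it.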

Commands have special syntax rules and components that you need to use when you insert them into files.


DECtalk Software Dictionaries

DECtalk Software has two pronunciation dictionaries: a large internal (built-in) dictionary and an optional user-defined dictionary. With the large built-in dictionary, developers can easily use many proper names and normally unpronounceable sequences, such as uppercase initials, in their applications. With the user dictionary build tool, developers can load application-specific words, or cultural- or language-specific terms into the user dictionary. A sample user-dictionary file is installed with the software.
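Conceptually, a word is looked up in the dictionaries and is passed to the letter-to-sound rules only if neither dictionary contains it (steps 3 and 5 in "How DECtalk Software Works" later in this chapter describe the lookup and the fallback). The Python sketch below assumes that user entries take precedence over built-in entries and uses placeholder strings instead of real DECtalk phonemic notation; `make_lookup` is a hypothetical name.

```python
from typing import Callable, Dict


def make_lookup(user: Dict[str, str], builtin: Dict[str, str],
                letter_to_sound: Callable[[str], str]) -> Callable[[str], str]:
    """Return a lookup function: user dictionary, then built-in, then rules."""

    def lookup(word: str) -> str:
        if word in user:              # assumption: user entries win
            return user[word]
        if word in builtin:
            return builtin[word]
        return letter_to_sound(word)  # not in either dictionary

    return lookup


if __name__ == "__main__":
    # Placeholder "pronunciations"; real entries use DECtalk phonemic notation.
    builtin = {"tomato": "BUILTIN(tomato)", "hello": "BUILTIN(hello)"}
    user = {"tomato": "USER(tomato)"}
    lookup = make_lookup(user, builtin, lambda w: f"RULES({w})")
    print(lookup("tomato"), lookup("hello"), lookup("zyxt"))
```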

DECtalk Software Components

The DECtalk Software components now installed on your system include:


Sample Applet

A sample applet called speak is bundled with the DECtalk Software kit. This applet demonstrates the capabilities of DECtalk Software. The next chapter describes how to use speak in detail.

Sample Programs

DECtalk Software comes with several sample programs. These are:


say Program

The say program is a command-line program that produces synthesized speech from input ASCII text. It accepts the following command-line switches:


say [-h] [-s #] [-r #] [-d #] [-e #] [-f filename] [file] [-a "text"]

-a "text"      Speaks the quoted string that follows the switch, then
               returns to the Digital UNIX command prompt.

-d #           Selects the audio output device.
-e #           Selects the output wave file format. Valid values are
               1 to 3:

               1. PCM, 16-bit mono, 11 kHz
               2. PCM, 8-bit mono, 11 kHz
               3. Mu-law, 8-bit mono, 8 kHz

-f <filename>  Name of the output wave file.
-h             Displays the command-line parameter list.
-r #           Speaking rate in words per minute (75 to 650).
-s #           Speaker number (1 to 9).
file           Name of an input ASCII file to synthesize.
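To make the switch ranges concrete, the following Python sketch re-creates the documented say command line with `argparse`. It is an illustration only: the real say is a separate program, and whether it rejects out-of-range values the way this parser does is an assumption. The ranges and format numbers come from the table above.

```python
import argparse


def rate(value: str) -> int:
    """Accept a speaking rate only within the documented 75-650 range."""
    n = int(value)
    if not 75 <= n <= 650:
        raise argparse.ArgumentTypeError("rate must be between 75 and 650")
    return n


def build_parser() -> argparse.ArgumentParser:
    # argparse adds -h automatically, matching the documented -h switch.
    p = argparse.ArgumentParser(prog="say", description="Speak ASCII text (sketch).")
    p.add_argument("-s", type=int, choices=range(1, 10), metavar="#",
                   help="speaker number (1-9)")
    p.add_argument("-r", type=rate, metavar="#", help="speaking rate (75-650)")
    p.add_argument("-d", type=int, metavar="#", help="audio output device")
    p.add_argument("-e", type=int, choices=(1, 2, 3), metavar="#",
                   help="wave format: 1=PCM 16-bit 11 kHz, 2=PCM 8-bit 11 kHz, "
                        "3=mu-law 8-bit 8 kHz")
    p.add_argument("-f", metavar="filename", help="output wave file name")
    p.add_argument("-a", metavar='"text"', help="speak this text, then exit")
    p.add_argument("file", nargs="?", help="input ASCII file to synthesize")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(["-s", "2", "-r", "180", "-a", "Hello"])
    print(args)
```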


mailtalk Program

mailtalk is an applet included with DECtalk Software that announces the arrival of mail messages as they are delivered to your system. Depending on the options you select, mailtalk announces the sender of the message, its subject, or both. A more detailed explanation of this program is presented in the next chapter.
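The following Python sketch shows one way an announcement like mailtalk's could be composed from a message's From and Subject headers using the standard `email` package. The function name, the `sender` and `subject` flags, and the exact wording are hypothetical; see the next chapter for mailtalk's actual options.

```python
from email import message_from_string
from email.utils import parseaddr


def announcement(raw_message: str, sender: bool = True, subject: bool = True) -> str:
    """Build a hypothetical spoken announcement for a new mail message."""
    msg = message_from_string(raw_message)
    parts = ["You have new mail"]
    if sender:
        name, addr = parseaddr(msg.get("From", ""))
        who = name or addr  # prefer the display name when there is one
        if who:
            parts.append(f"from {who}")
    if subject and msg.get("Subject"):
        parts.append(f"about {msg['Subject']}")
    return " ".join(parts) + "."


if __name__ == "__main__":
    raw = "From: Pat Smith <pat@example.com>\nSubject: Lunch\n\nSee you at noon.\n"
    print(announcement(raw))
```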

aclock Program

aclock announces the time of the day. It takes the following command line parameters:

aclock [-h] [ # ]
where # is the interval in minutes
5 - every five minutes
15 - every fifteen minutes
30 - on the hour and half hour
60 - on the hour
-h - Displays the command line parameter list
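The interval argument determines when the next announcement is due. The Python sketch below computes that time, assuming announcements are aligned to clock boundaries (so an interval of 5 fires at :00, :05, :10, and so on, rather than five minutes after aclock starts). `next_announcement` is a hypothetical helper, not part of aclock.

```python
from datetime import datetime, timedelta

VALID_INTERVALS = (5, 15, 30, 60)  # the intervals aclock documents


def next_announcement(now: datetime, interval: int) -> datetime:
    """Return the next clock-aligned announcement time strictly after `now`."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {VALID_INTERVALS}")
    base = now.replace(second=0, microsecond=0)
    # Minutes since the last aligned boundary (for 60, the last full hour).
    minutes_past = base.minute % interval
    return base + timedelta(minutes=interval - minutes_past)


if __name__ == "__main__":
    print(next_announcement(datetime(2024, 1, 1, 10, 7, 30), 15))  # 2024-01-01 10:15:00
```

Because all four intervals divide 60 evenly, aligning on the minute value alone is enough; an interval such as 7 would drift across hour boundaries.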


User Dictionary Program (windict)

The user dictionary program, windict, is used to create special dictionary files. The dictionary file contains words which have special user-specified pronunciation rules. Dictionary work files are compiled into dictionaries that can then be loaded into the speak and say programs. More details of this tool are provided in the next chapter.

Unsupported Applications

The following unsupported applications are shipped with DECtalk Software 4.2A. Unsupported applications are provided to demonstrate the advanced capabilities of DECtalk Software. They are provided for demonstration purposes only and are not fully supported by Digital Equipment Corporation.

DECface

DECface is a computer-generated, synthetic face that synchronizes facial movements to synthesized speech provided by DECtalk. As DECtalk generates speech, DECface displays the facial expressions of a human actually speaking those words.

DECface offers the ability to develop a large variety of new applications by combining the audio functionality of a speech synthesizer with the graphical functionality of a computer-generated face. A synthetic character can give multimedia presentations, or monitor a system and report anomalies as a feedback agent.

DECface enhances DECtalk by providing an obvious and immediate visual feedback mechanism. In particular, multimedia projects involving direct user interaction can be enhanced to better attract and maintain the attention of viewers.

Specific information on how to invoke and use DECface can be found in the documents located in:

/usr/opt/DTKRT420/decface/docs
or by typing:

man DECface


Emacspeak

Emacspeak uses text-to-speech extensively to make Emacs accessible to visually impaired users. Emacspeak is a context-sensitive Emacs extension that intelligently reads the contents of the screen rather than simply scanning the screen and reading the characters literally.

Information on how to use emacspeak is provided in the documents located in the directory:

/usr/opt/DTKRT420/emacspeak/docs
or by typing:

man emacspeak


DECtalk Software: How are the Components Used?

DECtalk Software applications and application-building components are targeted at two specific audiences: the application builder and the application user.

By the Programmer

As a DECtalk Software developer, you can use the DECtalk Software API calls to create a DECtalk Software application. The DECtalk Software API is made available in a separate product, the DECtalk Software Development kit.



By the Application User

The DECtalk application user accesses the application through the Motif windows environment or at the Digital UNIX command line. DECtalk Software also provides a CDE integration subset that can be installed on systems that support CDE. DECtalk Software provides several methods of control. The user can use the abbreviated command set provided with the application to control basic operations, such as the speaking rate or the speaking voice. The user can also use the user dictionary to fine-tune the application's pronunciation and voice characteristics. Finally, the user can embed inline commands in text files to control DECtalk operations. Refer to the specific sections for more information on which method to use.



How DECtalk Software Works

DECtalk Software converts ASCII English language text into speech output through a speech synthesizer. There are two ways to feed text into the speech synthesizer: through the user interface or through the API. The flow of the text-to-speech process is explained below.

Figure: Flow of the DECtalk Software Text-to-Speech Conversion Process

Legend for Figure: Action of the DECtalk Software Module

  1. A sentence parser breaks the input stream into separate words and locates some clause boundaries (indicated by commas and other punctuation marks as well as by special words loaded in the DECtalk Software internal dictionary). The sentence parser also recognizes and deals with phonemic symbols and commands that you might have added to the input text.

  2. A word parser breaks words into their component parts, dividing words by their final pronounceable form. Strings of text that do not form pronounceable English words are spelled out letter by letter. A number formatter is used if the text contains numerals. The number formatter applies the rules for many common number formats and converts the numbers into English words. The number formatter also recognizes many common abbreviations, such as lb for pound.

  3. A dictionary lookup routine searches the pronunciation dictionaries. DECtalk Software has a built-in dictionary of many commonly used words. DECtalk Software also has a user dictionary that can be filled with words specific to an application. This dictionary and how to load it are described in Chapter 3. While this version of DECtalk Software has greater pronunciation accuracy than its predecessors, it may sometimes be necessary to send DECtalk Software the correct phonemics for words important for a particular application. This can be done by using the user dictionary.

  4. A phrase structure module recombines all phonemic output from the dictionary search and other modules. Durations of phonemes and pitch commands are computed for the clause, and appropriate sound variants are selected for those phonemes that can be pronounced in more than one way.

  5. A letter-to-sound module uses a set of English pronunciation rules to assign phonemic form and lexical stress patterns to words not found in the dictionary.

  6. The phoneme-to-voice module processes clauses passed from the phrase structure module and converts them to control signals for the speech synthesizer. This module converts the phonemes and allophones into parameters that determine the natural resonant frequencies of the vocal tract (formants) and the sound source amplitudes. The control parameters are sent to the speech synthesizer for output.

  7. The Digital speech synthesizer computes a speech waveform whose acoustic characteristics are determined by the synthesizer control commands it receives.
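The first stages of this flow can be sketched in a few lines of Python. The sketch below is a simplified, hypothetical model: it splits clauses at punctuation (step 1), expands numerals up to 999 into words and reads longer numbers digit by digit (step 2), and looks each word up in a dictionary with a letter-to-sound fallback (steps 3 and 5). Phrase structure, phoneme-to-voice conversion, and synthesis (steps 4, 6, and 7) are omitted, the output strings are placeholders rather than DECtalk phonemes, and running letter-to-sound immediately after lookup is a simplification of the order shown above.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def sentence_parser(text):
    """Step 1 (simplified): split the input into clauses at punctuation."""
    return [c.strip() for c in re.split(r"[,.;:!?]", text) if c.strip()]


def number_to_words(n):
    """Step 2 number formatter, limited to 0-999 in this sketch."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else "-" + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def word_parser(clause):
    """Step 2 (simplified): tokenize and expand numerals into words."""
    words = []
    for token in clause.split():
        is_number = re.fullmatch(r"[0-9]+", token) is not None
        if is_number and int(token) < 1000:
            words.extend(number_to_words(int(token)).split())
        elif is_number:
            words.extend(ONES[int(d)] for d in token)  # read digit by digit
        else:
            words.append(token.lower())
    return words


def letter_to_sound(word):
    """Step 5 stand-in: a placeholder, not real DECtalk phonemes."""
    return "/" + word + "/"


def text_to_phonemes(text, dictionary):
    """Steps 1, 2, 3, and 5; steps 4, 6, and 7 are not modeled."""
    return [[dictionary.get(w) or letter_to_sound(w) for w in word_parser(c)]
            for c in sentence_parser(text)]


if __name__ == "__main__":
    print(text_to_phonemes("I owe 42, said Pat.", {"said": "SAID"}))
    # [['/i/', '/owe/', '/forty-two/'], ['SAID', '/pat/']]
```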