This chapter provides an introduction to the DECtalk Software Text-To-Speech API services and a discussion of programming text-to-speech applications using the API services.
Table 2-1 -- Functions Listed by Category
Function | Purpose |
---|---|
Core API Functions | |
TextToSpeechStartup() | Initializes and starts up text-to-speech system. |
TextToSpeechSpeak() | Speaks text from a buffer. |
TextToSpeechShutdown() | Shuts down text-to-speech system. |
Audio Output Control Functions | |
TextToSpeechPause() | Pauses output. |
TextToSpeechResume() | Resumes output. |
TextToSpeechReset() | Purges the text-to-speech system and stops output. |
Blocking Synchronization Function | |
TextToSpeechSync() | Synchronizes to the text stream. |
Control and Status Functions | |
TextToSpeechSetSpeaker() | Selects one of nine speaking voices. |
TextToSpeechGetSpeaker() | Returns the last speaking voice to have spoken. |
TextToSpeechSetRate() | Sets the speaking rate of the text-to-speech system. |
TextToSpeechGetRate() | Gets the speaking rate of the text-to-speech system. |
TextToSpeechSetLanguage() | Sets the language to be used. |
TextToSpeechGetLanguage() | Returns the language in use. |
TextToSpeechGetStatus() | Gets status of text-to-speech System. |
TextToSpeechOpenWaveOutFile() | Opens a file for output. TextToSpeechSpeak() writes audio data in wave format to this file. |
TextToSpeechCloseWaveOutFile() | Closes the specified wave file. |
TextToSpeechOpenLogFile() | Opens a log file. |
TextToSpeechCloseLogFile() | Closes a log file. |
TextToSpeechOpenInMemory() | Produces buffered speech samples in shared memory. |
TextToSpeechCloseInMemory() | Returns the text-to-speech system to its normal state. |
TextToSpeechAddBuffer() | Adds a shared-memory buffer to the memory buffer list. |
TextToSpeechReturnBuffer() | Returns the current shared-memory buffer. |
TextToSpeechGetCaps() | Retrieves the capabilities of the text-to-speech system. |
Special Text-To-Speech Modes | |
Loading and Unloading a User Dictionary | |
TextToSpeechLoadUserDictionary() | Loads user dictionary. |
TextToSpeechUnloadUserDictionary() | Unloads user dictionary. |
The simplest application might use only the core API functions.
TextToSpeechSpeak("This will be spoken. ", TTS_NORMAL );
This text is spoken immediately by the system because it is terminated by a period and a space. These last two characters are one way to create a clause boundary.
TextToSpeechSpeak("This will be spok", TTS_NORMAL );
This produces output only after the following line of code executes to complete the phrase.
TextToSpeechSpeak("en. ", TTS_NORMAL );
Finally, a nonphrase string can be forced to be spoken by using the TTS_FORCE flag.
TextToSpeechSpeak("This will be spok", TTS_FORCE );
Note that the word "spoken" is not pronounced correctly in this case, even if its final characters ("en") are queued immediately afterward.
The TTS_FORCE flag causes the queued text to be spoken without taking any subsequently queued characters into account.
Every sentence must be separated from the next by a space character. To ensure this, routinely include a space character after the final punctuation of each sentence. The following example shows what happens when the space is omitted:
TextToSpeechSpeak("They are tired.", TTS_NORMAL );
TextToSpeechSpeak("I am Cold.", TTS_NORMAL );
Because there is no space, the Text-To-Speech system processes the following string:
"They are tired.I am Cold."
The string "tired.I" will be pronounced incorrectly because the system will treat it as one item instead of two words.
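One way to guard against this is to normalize each sentence before queuing it. The following is a minimal sketch, not part of the DECtalk API: the helper name with_sentence_space() is hypothetical, and it only assumes that a sentence ends in a period, exclamation point, or question mark.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of the DECtalk API.
 * Copies `text` into `out` and appends one space when the text ends in
 * sentence punctuation (. ! ?) with no space after it.
 * Returns 0 on success, or -1 if `out` is too small for the result. */
int with_sentence_space(const char *text, char *out, size_t out_size)
{
    size_t len;
    int needs_space;

    assert(text != NULL && out != NULL);

    len = strlen(text);
    needs_space = len > 0 &&
        (text[len - 1] == '.' || text[len - 1] == '!' || text[len - 1] == '?');

    /* Room for the text, the optional space, and the terminating NUL. */
    if (len + (needs_space ? 1u : 0u) + 1u > out_size)
        return -1;

    memcpy(out, text, len);
    if (needs_space)
        out[len++] = ' ';
    out[len] = '\0';
    return 0;
}
```

The normalized string would then be passed to TextToSpeechSpeak() with TTS_NORMAL, so that "They are tired." and "I am Cold." reach the system as two separate sentences.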
Table 2-2 -- Control and Status Functions
Function | Description |
---|---|
TextToSpeechSetSpeaker() | Sets the speaker's voice (which becomes active at the next clause boundary). |
TextToSpeechGetSpeaker() | Returns the value of the last speaker to have spoken. This value might differ from the value most recently set by the TextToSpeechSetSpeaker() function, because a new speaker becomes active only at the next clause boundary. |
TextToSpeechSetRate() | Sets the speaking rate, which becomes active at the next clause boundary. |
TextToSpeechGetRate() | Gets the speaking rate (the current rate setting is returned even if it has not been activated). |
TextToSpeechSetLanguage() | Sets the Text-To-Speech system language. (Currently, this must be TTS_AMERICAN_ENGLISH.) |
TextToSpeechGetLanguage() | Returns the current Text-To-Speech system language. |
TextToSpeechGetStatus() | Returns various Text-To-Speech system parameters, such as the number of characters in the text pipe, the ID of the wave output device, and a Boolean value that indicates whether the system is speaking or silent. |
TextToSpeechGetCaps() | Returns the capabilities of the Text-To-Speech system, which includes the version number of the system, the number of speakers, the maximum and minimum speaking rate, and the supported languages. |
Open | Close |
---|---|
TextToSpeechOpenWaveOutFile() | TextToSpeechCloseWaveOutFile() |
TextToSpeechOpenLogFile() | TextToSpeechCloseLogFile() |
TextToSpeechOpenInMemory() | TextToSpeechCloseInMemory() |
The Text-To-Speech system must be in the startup state before calling any of the Open functions listed above. The corresponding Close functions return the system to the startup state.
When a buffer is completed, the system returns it to the application by sending a message to the callback function that was passed to the TextToSpeechStartup() function. The LPARAM parameter of the message contains a pointer to the returned TTS_BUFFER_T structure. The user is responsible for allocating and freeing the memory for the following elements of the TTS_BUFFER_T structure: lpData, lpPhonemeArray, and lpIndexArray.
The TTS_BUFFER_T structure is considered completed when any one of the following conditions occurs:
o The sample buffer, which is pointed to by element lpData, is filled.
o The phoneme array is filled.
o The index mark array is filled.
o A TTS_FORCE is used in a call to the TextToSpeechSpeak() function.
The application must not modify any buffer passed to the Text-To-Speech system by the TextToSpeechAddBuffer() function until the system returns that buffer in a message. The application then owns the buffer. If no buffers are available, the system blocks. If the application processes relatively long passages of text, it should queue several buffers and then requeue each buffer after finishing with it so that the system is never idle.
A call to the TextToSpeechReset() function returns all buffers to the application. The TextToSpeechReturnBuffer() function forces the return of the current TTS_BUFFER_T structure, whether it is filled or not. Most applications might not need this function. It is included so that an application can obtain the last buffer without forcing that buffer to be sent with the TTS_FORCE flag in a call to the TextToSpeechSpeak() function. This might be required if the application performs its own buffer management.
The TTS_BUFFER_T structure and its elements are defined as follows:
    typedef struct TTS_PHONEME_TAG {
        DWORD dwPhoneme;
        DWORD dwPhonemeSampleNumber;
        DWORD dwPhonemeDuration;
        DWORD dwReserved;
    } TTS_PHONEME_T;
    typedef TTS_PHONEME_T * LPTTS_PHONEME_T;

    typedef struct TTS_INDEX_TAG {
        DWORD dwIndexValue;
        DWORD dwIndexSampleNumber;
        DWORD dwReserved;
    } TTS_INDEX_T;
    typedef TTS_INDEX_T * LPTTS_INDEX_T;

    typedef struct TTS_BUFFER_TAG {
        LPSTR lpData;
        LPTTS_PHONEME_T lpPhonemeArray;
        LPTTS_INDEX_T lpIndexArray;
        DWORD dwMaximumBufferLength;
        DWORD dwMaximumNumberOfPhonemeChanges;
        DWORD dwMaximumNumberOfIndexMarks;
        DWORD dwBufferLength;
        DWORD dwNumberOfPhonemeChanges;
        DWORD dwNumberOfIndexMarks;
        DWORD dwReserved;
    } TTS_BUFFER_T;
    typedef TTS_BUFFER_T * LPTTS_BUFFER_T;
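Because the application owns the memory behind lpData, lpPhonemeArray, and lpIndexArray, it needs some routine to allocate and release them. The following is a minimal sketch under stated assumptions, not the documented procedure: the demo_buffer_* helpers are hypothetical, DWORD and LPSTR are replaced by portable stand-ins so the code compiles without the Windows headers, and the application is assumed to fill in the three dwMaximum... fields while the system fills in the current counts. The check covers only the three capacity conditions; a TTS_FORCE completion cannot be detected from the structure alone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the Windows types, so the sketch compiles without windows.h. */
typedef uint32_t DWORD;
typedef char *LPSTR;

typedef struct TTS_PHONEME_TAG {
    DWORD dwPhoneme, dwPhonemeSampleNumber, dwPhonemeDuration, dwReserved;
} TTS_PHONEME_T;
typedef TTS_PHONEME_T *LPTTS_PHONEME_T;

typedef struct TTS_INDEX_TAG {
    DWORD dwIndexValue, dwIndexSampleNumber, dwReserved;
} TTS_INDEX_T;
typedef TTS_INDEX_T *LPTTS_INDEX_T;

typedef struct TTS_BUFFER_TAG {
    LPSTR lpData;
    LPTTS_PHONEME_T lpPhonemeArray;
    LPTTS_INDEX_T lpIndexArray;
    DWORD dwMaximumBufferLength;
    DWORD dwMaximumNumberOfPhonemeChanges;
    DWORD dwMaximumNumberOfIndexMarks;
    DWORD dwBufferLength;
    DWORD dwNumberOfPhonemeChanges;
    DWORD dwNumberOfIndexMarks;
    DWORD dwReserved;
} TTS_BUFFER_T;

/* Hypothetical helper: frees the three application-owned arrays and
 * clears the structure so stale pointers cannot be reused. */
void demo_buffer_free(TTS_BUFFER_T *buf)
{
    assert(buf != NULL);
    free(buf->lpData);
    free(buf->lpPhonemeArray);
    free(buf->lpIndexArray);
    memset(buf, 0, sizeof *buf);
}

/* Hypothetical helper: allocates the arrays and records their capacities.
 * `phonemes` or `marks` may be 0, in which case that array stays NULL.
 * Returns 0 on success, -1 on a zero-length sample buffer or allocation failure. */
int demo_buffer_alloc(TTS_BUFFER_T *buf, DWORD bytes, DWORD phonemes, DWORD marks)
{
    assert(buf != NULL);
    memset(buf, 0, sizeof *buf);
    if (bytes == 0)
        return -1;

    buf->lpData = (LPSTR)malloc(bytes);
    if (phonemes > 0)
        buf->lpPhonemeArray = (LPTTS_PHONEME_T)calloc(phonemes, sizeof(TTS_PHONEME_T));
    if (marks > 0)
        buf->lpIndexArray = (LPTTS_INDEX_T)calloc(marks, sizeof(TTS_INDEX_T));

    if (buf->lpData == NULL ||
        (phonemes > 0 && buf->lpPhonemeArray == NULL) ||
        (marks > 0 && buf->lpIndexArray == NULL)) {
        demo_buffer_free(buf);
        return -1;
    }

    buf->dwMaximumBufferLength = bytes;
    buf->dwMaximumNumberOfPhonemeChanges = phonemes;
    buf->dwMaximumNumberOfIndexMarks = marks;
    return 0;
}

/* Returns nonzero when any of the three capacity conditions is met. */
int demo_buffer_is_full(const TTS_BUFFER_T *buf)
{
    assert(buf != NULL);
    return buf->dwBufferLength >= buf->dwMaximumBufferLength
        || (buf->dwMaximumNumberOfPhonemeChanges > 0 &&
            buf->dwNumberOfPhonemeChanges >= buf->dwMaximumNumberOfPhonemeChanges)
        || (buf->dwMaximumNumberOfIndexMarks > 0 &&
            buf->dwNumberOfIndexMarks >= buf->dwMaximumNumberOfIndexMarks);
}
```

A buffer prepared this way would be handed to TextToSpeechAddBuffer() and released with demo_buffer_free() only after the system has returned it to the application.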
The index and phoneme arrays each contain a time stamp in the form of a sample number. This sample number is initialized to zero at startup and after each call to the TextToSpeechReset() function. The phoneme array also contains the current phoneme duration in frames. Each frame is approximately 6.4 milliseconds.
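These values can be converted to wall-clock units with simple arithmetic. The sketch below is illustrative only: both helper names are hypothetical, the 6.4 ms figure is the approximate frame length given above, and the sample rate is passed in as a parameter because this section does not state it.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate frame length in milliseconds, as stated in the text. */
#define DEMO_FRAME_MS 6.4

/* Hypothetical helper: converts a dwPhonemeDuration value (in frames)
 * to an approximate duration in milliseconds. */
double demo_frames_to_ms(uint32_t frames)
{
    return (double)frames * DEMO_FRAME_MS;
}

/* Hypothetical helper: converts a sample-number time stamp to seconds.
 * `samples_per_second` is an assumption supplied by the caller. */
double demo_sample_to_seconds(uint32_t sample_number, uint32_t samples_per_second)
{
    assert(samples_per_second > 0);
    return (double)sample_number / (double)samples_per_second;
}
```

For example, a phoneme with a duration of 10 frames lasts roughly 64 milliseconds.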