***********************************************************************
    IBM TEXT-TO-SPEECH RUN TIME KIT 
    Version 6.6.1.0
    Readme (win32.readme.6.6.1.0.txt)
    Copyright IBM Corporation, 2002.  All Rights Reserved 
***********************************************************************


CONTENTS
--------
 1.  Company
 2.  Product
 3.  Version 
 4.  Description
 5.  Contact Information
 6.  Upgrade Path to Full Version
 7.  What's New
 8.  Installation Requirements
 9.  End-User Installation Instructions
10.  ISV Installation Instructions
11.  Working with Concatenative Voices
12.  Uninstall Instructions
13.  General Limitations and Comments 
14.  Known Problems & F.A.Q.
15.  Developer Notes
16.  Memory and Performance Tools
17.  Logging Utilities
18.  Trademark Information



1.  COMPANY
-----------
    International Business Machines Corporation (IBM)


2.  PRODUCT
-----------
    IBM Text-to-Speech Run Time Kit


3.  VERSION
-----------
    IBM Text-to-Speech Run Time Kit, Version 6.6.1.0 


4.  DESCRIPTION 
----------------
IBM Text-to-Speech Run Time Kit provides the speech synthesis engine
and components necessary for applications to produce speech. IBM 
Text-to-Speech Run Time Kit, Version 6.6.1.0, produces speech from 
recordings of units of human speech. These units (possibly phonemes, 
syllables, words, or phrases) are then combined (concatenated) 
according to linguistic rules formulated from analyzed text. When these 
recorded speech units are entire phrases or sentences, the output can 
be very natural, human-sounding speech.

The speech synthesis engine and data support both a concatenative 
voice dataset representation and a computer-synthesized voice 
representation known as formant synthesis. The 
concatenative voice is derived from a professional speaker, speaking a 
particular language and dialect, recorded at a particular sampling 
rate. When a client program changes languages, and it is doing 
concatenative synthesis, a new voice dataset may have to be loaded into 
memory from disk, if it is not already cached in memory from previous 
usage. 

The system will automatically choose concatenative synthesis if a voice 
data set is available for the language, voice, and sample rate that you 
select. For example, if you are using English at 8KHz with voice 1 and 
U.S. English voice 1 at 8KHz has been installed, then the system will 
automatically do concatenative synthesis. Otherwise, the system will do 
formant synthesis. 
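
The selection rule described above can be sketched as follows. This is 
only an illustration written in Python, not the engine's own code: the 
function name choose_synthesis and the layout of the "installed" set 
are assumptions made for this example.

```python
# Illustrative sketch only: the names and data layout are assumptions,
# not part of the IBM Text-to-Speech API.

def choose_synthesis(installed, language, voice, sample_rate_khz):
    """Return "concatenative" if a matching voice dataset is installed,
    otherwise "formant"."""
    key = (language, voice, sample_rate_khz)
    return "concatenative" if key in installed else "formant"


# Hypothetical installation: U.S. English voice 1 at 8 KHz only.
installed = {("US English", 1, 8)}

print(choose_synthesis(installed, "US English", 1, 8))   # matching dataset
print(choose_synthesis(installed, "US English", 1, 11))  # no 11 KHz dataset
```

The second call falls back to formant synthesis because no dataset is 
registered for that sample rate.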

When concatenation is being done, ECI voice selections appear to the 
concatenative engine as requests to switch between already-loaded voice 
datasets, while voice attribute settings appear as changes in the 
phonetic and acoustic data that it receives. 

Generation 3
A Generation 3 voice is built using a significantly different process 
compared to previous releases. The process uses new algorithms for voice 
creation and synthesis.

The Generation 3 voices vary in size from 280 to 330 MB depending on the 
language and speaker. In spite of the size increase, performance is 
comparable to that of the previous release.


5.  CONTACT INFORMATION
-----------------------
Please visit our Web site for enhancements and updates to Text-to-Speech.

    http://www.software.ibm.com/speech/dev


6.  UPGRADE PATH TO FULL VERSION
--------------------------------
The full version is currently included.


7.  WHAT'S NEW
--------------
Improved voice quality of our concatenative text-to-speech is achieved
with the creation of Generation 3 voices and the use of algorithm enhancements.
The API for this release is unchanged from the previous release, Version 6.4.1.0.
Changes have been made to the interface for inilog and inicache.

Generation 3 Voices:
- US English (Male & Female)
- French (Male)
- German (Male)

Concatenative Voices for:
- Italian (Male)
- Norwegian (Male)
- Swedish (Male)
- French (Female)

Also new in this release:
- User Modifiable Prosody
- Enhanced Logging
- Enhancements to Synthesis Algorithms (e.g. pitch smoothing)
- Improved Phonetization
- German & French E-mail Filters
- Enhanced Cache Administration


8.  INSTALLATION REQUIREMENTS
-----------------------------
Hardware: 
Formant
- Processor performance equivalent to Intel Pentium 133MHz with MMX 
  with 256K L2 cache
- 48MB of RAM 
- 10MB available hard disk space
- Compatible 16 bit sound card 
- CD-ROM drive 
Note: Formant functionality is supported under:
      Windows 98
      Windows 2000
      Windows NT 4.0
      Windows Millennium
      Windows XP

Concatenative
- Processor performance equivalent to Intel Pentium III 266MHz
- 48MB of RAM plus additional RAM per Concatenative Voice loaded as specified below.
- 10MB available hard disk space + Voice requirements specified below 
- Concatenative Voice requirements (hard disk space & RAM):
   280MB for United States English (Male) 
   330MB for United States English (Female) 
   320MB for French (Male)
   300MB for German (Male)
   150MB for all other voices
- Compatible 16 bit sound card 
- CD-ROM drive 
Note: Concatenative functionality is only supported under:
      Windows 2000 with Service Pack 1 
      Windows NT 4.0 with Service Pack 6
      Windows XP
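
As a worked example of the concatenative requirements above, the 
following Python sketch adds the base figures to the per-voice figures. 
The sizes are copied from the list above; the function name and the 
assumption that all selected voices are loaded at the same time are 
illustrative only.

```python
# Sizes in MB, copied from the concatenative requirements list.
BASE_RAM_MB = 48
BASE_DISK_MB = 10
VOICE_MB = {
    "US English (Male)": 280,
    "US English (Female)": 330,
    "French (Male)": 320,
    "German (Male)": 300,
}
OTHER_VOICE_MB = 150  # all other voices


def requirements_mb(voices):
    """Return (ram_mb, disk_mb) for the given list of voice names.

    Assumes every listed voice is installed and loaded simultaneously.
    """
    voice_total = sum(VOICE_MB.get(v, OTHER_VOICE_MB) for v in voices)
    return BASE_RAM_MB + voice_total, BASE_DISK_MB + voice_total


ram, disk = requirements_mb(["US English (Male)", "Italian (Male)"])
print(f"RAM: {ram} MB, disk: {disk} MB")
```

For these two voices the sketch reports 478 MB of RAM (48 + 280 + 150) 
and 440 MB of disk space (10 + 280 + 150).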


9.  END-USER INSTALLATION INSTRUCTIONS
--------------------------------------
Run setup.exe from the installation media.
Follow the instructions presented to you.
You may be prompted to install concatenative voices.
Select the voices to be used with concatenative voice synthesis.


10.  ISV INSTALLATION INSTRUCTIONS
----------------------------------
If you are deploying applications using the IBM Text-to-Speech Run Time 
Kit, you must obtain a license from IBM for redistribution.
In addition, you will want to integrate our product installation with 
your product's installation program. You will need to copy the 
redistributable TTS driver to your installation media and invoke 
setup.exe.
The IBM Text-to-Speech Run Time Kit installation program, setup.exe, 
takes the following command line arguments: 

setup.exe [installPath] [/silent] [/hideaddremove] [/nr] [/ns] 
[/nl] [/nk] [-SMS] [/statusnone] [/statusold] [/concatall] 
[/concatnone] -lXXXX 


-l (lowercase L) requires one of the following XXXX language codes:

0003-Catalan                0005-Czech                  0007-German
0008-Greek                  0009-English                000a-Spanish
000b-Finnish                000e-Hungarian              0010-Italian
0011-Japanese               0012-Korean                 0013-Dutch
0014-Norwegian              0015-Polish                 0019-Russian
001a-Croatian               001b-Slovak                 001d-Swedish
001e-Thai                   001f-Turkish                0021-Indonesian
0024-Slovenian              002d-Basque                 0404-Chinese (Taiwan)
040c-French (Standard)      0416-Portuguese (Brazilian) 0804-Chinese (PRC)
0816-Portuguese (Standard)  0c0c-French (Canadian)

**Note: Due to an InstallShield limitation, if you are using DoInstall, 
you must specify the same language as the parent installation.  See IS 
document Q144122.

<Installpath> can contain spaces and is a fully qualified path.  No 
quotes should be placed around the path.  The path will be ignored if 
TTS is already on the system.  If a path is provided on the command 
line, the choose directory dialog will not be shown.

/silent
Prevents everything except the path dialog from appearing.  If voice 
data is detected, the installer still asks which voices to install, 
regardless of this parameter.

/hideaddremove
Deletes the Add/Remove program entry from the control panel.

/nr 
Suppresses the reboot message and the subsequent reboot. If a calling 
application executes our install with a GUI, the calling install may 
perform additional logic.  The calling install should then reboot if 
TTS requests it.  Please see appendix 2 for how to determine whether 
TTS requires a reboot.  TTS functionality will not work until the 
requested reboot is carried out.  If the /silent option is used, /nr 
is redundant.

-SMS 
This switch prevents a network connection and Setup.exe from closing 
before the installation is complete. The switch works with 
installations originating from a Windows NT server over a network. 
This switch is case-sensitive: SMS must be uppercase.

/statusold
By default, the TTS install will show a large progress bar dialog box.
To display the small dialog box, use the /statusold option.

/statusnone
To turn off the status box altogether, use the option /statusnone.

/concatall
Install all concatenative voices. Check return codes for out of space.

/concatnone
Do not install any of the concatenative voices.


[Redundant but still supported for backwards compatibility]
/nk do not hide add remove (now default behavior)
/nl no license (no license now packaged).
/ns (silent install)

*Please note the language parameter is not optional. A minimal amount 
of change is required to make old installations work.
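
A calling installation could assemble the command line from the 
switches described above along the following lines. This Python sketch 
only builds the command text and does not run the installer. The 
function name, its parameters, and the example install path are 
assumptions made for illustration; only the switches themselves come 
from this section.

```python
# Sketch: builds the command line as text; it does not run the installer.

def build_setup_command(language_code, install_path=None, silent=False,
                        no_reboot=False, concat=None):
    """Assemble a setup.exe command line from the documented switches.

    concat may be "all", "none", or None (let the installer ask).
    """
    if not language_code:
        raise ValueError("the -l language code is required")
    if concat not in (None, "all", "none"):
        raise ValueError("concat must be 'all', 'none', or None")

    parts = ["setup.exe"]
    if install_path:
        parts.append(install_path)  # documented as unquoted, even with spaces
    if silent:
        parts.append("/silent")
    if no_reboot:
        parts.append("/nr")
    if concat:
        parts.append("/concat" + concat)
    parts.append("-l" + language_code)
    return " ".join(parts)


# Hypothetical install path, used only for this example.
print(build_setup_command("0009", r"C:\Program Files\ibmtts",
                          silent=True, concat="none"))
```

The language code is checked first because the -l parameter is not 
optional.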


11.  Working with Concatenative Voices
--------------------------------------
During installation you may install concatenative voices from the 
selection presented to you.  Due to disk space issues or for periodic 
updates, you may wish to add, remove, or relocate a concatenative voice.
To add a voice, rerun the installation selecting the voice you wish to 
add. To remove a voice, you must first unregister the voice and then 
manually delete it from the 
<INSTALLATION DIRECTORY>\voices\<LANGUAGE>\<VOICENUMBER> 
directory. 
To relocate a voice or update a voice from a downloaded file you must 
register the location of the voice using the inivoice.exe utility. 

inivoice.exe [-u] <VOICENUMBER> <QUALIFIED PATH TO SYNTHINFO FILE>

For example, to move voice 1 from TTS's default installation path 
to F:\TTSVoices\us\1, move the data files and then invoke the 
following command:

C:>inivoice.exe 1 "F:\TTSVoices\us\1at8000KHz_1_0\synthinfo"

To unregister a voice with the system, use the -u option.

C:>inivoice.exe -u 1 "F:\TTSVoices\us\1at8000KHz_1_0\synthinfo"
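
If a deployment script needs to issue these commands, the argument 
list can be built as in the Python sketch below. The helper name 
inivoice_args is hypothetical, and the sketch only constructs the 
arguments shown in the usage line above; it does not move any voice 
data or run the utility.

```python
def inivoice_args(voice_number, synthinfo_path, unregister=False):
    """Build the argument list for inivoice.exe.

    Follows the usage line: inivoice.exe [-u] <VOICENUMBER> <PATH>.
    """
    args = ["inivoice.exe"]
    if unregister:
        args.append("-u")
    args.append(str(voice_number))
    args.append(synthinfo_path)
    return args


path = r"F:\TTSVoices\us\1at8000KHz_1_0\synthinfo"
print(inivoice_args(1, path))
print(inivoice_args(1, path, unregister=True))
```

Passing the result as a list to a process launcher avoids having to 
quote the path by hand.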


Note: Concatenative voices allow the following parameters to be 
adjusted at run time:
   - Volume
   - Pitch Baseline*
   - Speed
   - Pitch Fluctuation*
* Applies only to some voices. All Generation 3 voices support pitch baseline adjustments.

The following parameters are not changeable for concatenative voices:
   - Gender
   - Sample Rate (see section 4 above)
   - Head Size
   - Roughness
   - Breathiness

If an application attempts to change one of the parameters listed above 
as not changeable, no error occurs and the voice synthesis does not 
change.

Note: The pause annotation can only be executed within a sentence. 

In concatenative TTS, when you change languages, the voice
characteristics are set to the default values for the currently 
active voice.  As a result, if you've modified the speed or volume,
and do a language change, the speed and volume will revert to the
default for the voice.  


12.  UNINSTALL INSTRUCTIONS
---------------------------
To uninstall the Text-to-Speech Run Time Kit: 

  Open Control Panel 
  Select Add/Remove Programs
  Select the entry for IBM Text-to-Speech Runtime (for the appropriate 
  language)

You will be guided through the uninstall process.  


13.  GENERAL LIMITATIONS AND COMMENTS
-------------------------------------
This section contains information that is not specific to any 
particular element of the Text-to-Speech Run Time Kit but is general or 
generic in nature. It is very important to heed these warnings and 
follow the instructions given to avoid abnormal or unpredictable 
results.

*  Currently, only 8 KHz concatenative voices are provided. 
   Application programmers requiring higher quality audio should 
   upgrade their voice datasets.  For more information visit the IBM 
   Text-to-Speech home page.

*  Version 6.6.1.0 supports the following languages with 
   formant voices (Note: languages marked with a * have both formant 
   and concatenative voice support):
   
   Brazilian Portuguese*
   French*
   Canadian French* 
   Finnish
   German*
   Italian*
   Mexican Spanish*
   Spanish*
   United States English*
   United Kingdom English*
   Korean  
   Cantonese
   Chinese - Simplified*
   Chinese - Traditional*
   Japanese*
   Norwegian (Male Only)
   Swedish (Male Only)

*  Version 6.6.1.0 supports all 6.4.1.0 languages and voices. 
   Note: Only the following concatenative voices are shipped with this release:

   United States English (Male and Female)
   French (Male and Female)
   German (Male)
   Italian (Male)
   Norwegian (Male Only)
   Swedish (Male Only)

   To use a concatenative voice not shipped with this release, install the desired
   voice using version 6.4.1.0.

*  Currently, the included e-mail filter is only available for the 
   English, French, and German languages.

*  The e-mail filter included with IBM Text-to-Speech recognizes the 
   following keywords in an e-mail message.

These keywords are recognized by all the e-mail filters:

Keyword                      Action
-------                      ------
Subject:                     Parse out the subject of the message 
                             and return a new subject string in the
                             current language to the client application.
From:                        Parse out the sender of the message 
                             and return a new string in the current
                             language to the client application.
Date:                        Parse out the date in the message 
                             and return a new string in the 
                             current language with that date to 
                             the client application.
                              
In addition, the German e-mail filter recognizes the following keywords:

Betreff:                     Parse out the subject of the message
                             and return a new subject string to the 
                             client application.
Von:                         Parse out the sender of the message 
                             and return a new string to the client 
                             application.
Datum:                       Parse out the date in the message
                             and return a new string with
                             that date to the client application.
Gesendet am:                 Parse out the date in the message
                             and return a new string with
                             that date to the client application.
Sent:                        Parse out the date in the message 
                             and return a new string in the
                             current language with that date to 
                             the client application.



In addition, the French e-mail filter recognizes the following keywords:

Objet:                       Parse out the subject of the message
                             and return a new subject string to the 
                             client application.
De:                          Parse out the sender of the message 
                             and return a new string to the client
                             application.
Reçu:                        Parse out the date in the message 
                             and return a new string with
                             that date to the client application.
Envoyé:                      Parse out the date in the message
                             and return a new string with 
                             that date to the client application.
   
All other keywords are filtered out by default.
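
The keep-or-drop behavior described above can be pictured with the 
short Python sketch below. It is not the IBM filter and does not 
rephrase the fields in the current language as the real filter does; it 
only keeps the header lines whose keyword is recognized by all filters 
and drops the rest. The function name and the sample headers are made 
up for this example.

```python
# Illustrative only: this is not the IBM filter. It mimics the documented
# behavior of keeping Subject:, From: and Date: header lines and dropping
# every other keyword line.

KEPT_KEYWORDS = ("Subject:", "From:", "Date:")


def filter_headers(header_lines):
    """Return the header lines whose keyword is in KEPT_KEYWORDS."""
    return [line for line in header_lines if line.startswith(KEPT_KEYWORDS)]


# Made-up sample headers.
headers = [
    "From: Pat Example",
    "X-Mailer: ExampleMail 1.0",
    "Subject: Status report",
    "Date: 18 Feb 2003",
]
for line in filter_headers(headers):
    print(line)
```

The X-Mailer line is dropped because its keyword is not in the 
recognized set.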

*  The included e-mail filter will also filter the following "emoticons"
   from messages:

(R)   (C)   :-)   :-(   :-]   :)    ;)    :-#|  :(   :->   :-<   :-\\  
(-:   >:-<  :-|   :-o   :-c   |-)   |-O   :-#   :-%   :-&  :-'|  :-)'  
:-)8  :-* :-/   :-:   :-?   :-@   (:I   :-[   *:o)  +-(:-).-)  <:I   
@:I   [:-|] 8-#  8:-)  }(:-( :-{   :-{(  :-}   :-O   :-6   :-8(  :-9  
:-D   :-e   :-i   :-p :-t   :-v   ::-)  8-)   :<|   :=)   :>)   :~)   
;-)  %-)   (-)   (:-)  )8-) *-(   *<|:-)-:-)  ;-\\  =:-)  [:-)  O-)   
8-|   {(:-){:-)  <g>   <G>   

*  The eciUpdateFilter function for the included e-mail filter only 
   supports changing the behavior for the "From:", "Date:", and 
   "Subject:" fields. 
   
*  The Text-to-Speech SDK includes a file "maildict.dct" that contains 
   translations for common e-mail jargon and abbreviations.  For best 
   results when processing e-mail messages, this dictionary file 
   should be used in conjunction with the included e-mail filter.
   

=========
inifilter

The inifilter tool registers and unregisters filters, which are used
as preprocessor add-ins for ECI to modify text.

inifilter [-ul] /filter:[filterNum] /path:[filterPath] /autoload:[y/n]
 /lang:[lang] /ECIINI:[IniPath]

        -u              Disable specified filter
        -l              Display statistics about specified filter
        filter          Filter number
        path            Fully qualified filename of filter
        autoload        Filter is automatically loaded when language
                        selected
                        Valid values are:
                                n   Filter is not automatically loaded
                                y   Filter is automatically loaded
        lang            Language/Dialect for the filter
                        Valid language/dialect values are:
                                 1.0 - US English
                                 1.1 - British English
                                 2.0 - Castilian Spanish
                                 2.1 - Mexican Spanish
                                 3.0 - Standard French
                                 3.1 - Canadian French
                                 4.0 - Standard German
                                 5.0 - Standard Italian
                                 6.0 - Mandarin Chinese
                                 6.1 - Taiwanese Chinese
                                 7.0 - Brazilian Portuguese
                                 8.0 - Standard Japanese
                                 9.0 - Standard Finnish
                                13.0 - Standard Norwegian
                                14.0 - Standard Swedish

        ECIINI          Path to ECIINI file (not used on Windows
                        platforms). The ECIINI environment variable is 
                        used on other platforms if omitted.

NOTE: If -u is specified, only the language, filter and INI file may be 
      specified.

To activate an installed filter, your application must send the new 
text annotation `faN, where N is the filter number.
To deactivate a filter, send the `fdN text annotation, where N is the 
filter number.

Example:  

To register the French e-mail filter as the default filter, enter the following:
        
        inifilter /filter:0 /path:"C:\Program Files\ibmtts\FRAmfilt.dll" /lang:3.0

Note:

The e-mail filter can be registered with any filter number.  However, the filter number that
is chosen with the inifilter command is the filter number you must use to activate it.  For
example, if the e-mail filter is assigned a filter number of 10, you would use that filter number
for the filter annotations or APIs.
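
As an illustration of the filter annotations, the Python sketch below 
wraps a piece of text so that filter N is activated before it and 
deactivated after it. The helper name with_filter is hypothetical. The 
sketch assumes that both annotations begin with a backquote and that a 
space separates each annotation from the text; neither detail is 
confirmed here.

```python
def with_filter(text, filter_number):
    """Wrap text in `faN / `fdN annotations for filter N.

    The backquote prefix on both annotations and the separating spaces
    are assumptions of this sketch.
    """
    return f"`fa{filter_number} {text} `fd{filter_number}"


# Filter number 10 matches the example in the note above.
print(with_filter("Subject: Status report", 10))
```

The resulting string would then be passed to the engine in place of 
the original text.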


14.  KNOWN PROBLEMS & F.A.Q.
----------------------------
The following are known problems in this release:

*  If you are upgrading from TTS version 4.7 to TTS Version 6.6.1.0, 
   you will need to remove TTS version 4.7 prior to installing TTS 
   Version 6.6.1.0. 

*  On Windows XP and Windows 2000, non-administrator users may receive 
   error messages pertaining to the InstallShield engine not being able 
   to register. You will need to have the proper access permissions to 
   install properly.

*  On Windows XP and Windows 2000, you must have proper access 
   permissions to run the command line tools (inicache, inifilter,
   inivoice, and inilog). If you do not have the proper access 
   permissions, there is no error message, and your changes will not
   be made. 

*  Setting the pitch baseline after setting head size may return an 
   error in certain situations.

*  The installation copies a large amount of data from the installation 
   media. During the copy process, very little screen activity is 
   visible. 

*  If multiple versions of TTS are to be installed on the same system, 
   you should install all versions of TTS to the same directory.


F.A.Q
-----
Q: Why is my application still synthesizing with formant synthesis?

A: When you install an 8KHz voice, the system will produce concatenative 
   synthesis for any application which requests synthesis at 8KHz.  By
   default, the system generates audio at 11KHz.  To produce 
   concatenative speech, use eciSetParam to set the sample rate to 
   8KHz. Also, check that version 5.0 was not installed after version 
   6.6.1.0 if both versions reside on the same machine.


15.  DEVELOPER NOTES
--------------------
*  The Text-to-Speech SDK is a good starting point for developing 
   applications.

*  Concatenative Memory Manager (CMM) cmmcmd Utility
   A support utility called cmmcmd was created to interface with the
   Concatenative Memory Manager (CMM).
   Note : This is a support tool and was not intended to be an end 
   user utility.

   Invoke cmmcmd as follows:

   cmmcmd shutdown       -- shuts down the CMM 
   cmmcmd timeout ##     -- sets the CMM timeout to ## seconds

*  User Modifiable Prosody
   This release supports user modifiable prosody in concatenative 
   synthesis when using Generation 3 voices. Users control the prosody 
   via the same API calls and annotations that were implemented in 
   earlier releases, where they applied only to formant synthesis.


16.  Memory and Performance Tools
---------------------------------
Due to the computational complexity and amount of memory required to
produce concatenative speech, IBM Text-to-Speech utilizes shared memory 
and speech caching to reduce the amount of system resources required.

*  The concatenative TTS engine requires more physical memory (to store
   the data required to produce natural speech synthesis) than formant 
   synthesis.  Since many processes on a server may require access to 
   the same data, IBM Text-to-Speech loads and shares one instance 
   between all the processes.  In addition, IBM Text-to-Speech allows 
   configuration of how long voice data will remain loaded after the 
   last access.  By default, each concatenative voice remains loaded 
   for 10 minutes.  To configure the time-out or to stop sharing the 
   memory, the Concatenative Memory Manager (CMM) utility, cmmcmd.exe, 
   is provided:

   cmmcmd { shutdown | timeout [secs] }

      shutdown         - shut down the server immediately.

      timeout [secs]   - get/set the server time-out to the specified 
                         number of seconds.  If secs is 0 or omitted 
                         the current shut down time-out is returned.

*  The concatenative TTS engine requires more computational power than
   the formant TTS engine.  Since the domains of many TTS applications are 
   limited to a small vocabulary, IBM Text-to-Speech now provides a 
   mechanism (speech caching) to bypass complex computations for text 
   which has already been processed.  The concatenative system can be
   configured, per language, to set a number of phrases 'to remember'
   as pre-synthesized phrases.  In addition, the memory can be made 
   persistent (that is, saved on exit and reloaded at voice 
   initialization). This mechanism can be beneficial in environments in 
   which the text that is synthesized is very repetitive.
   By default caching is turned on at installation and the location of the 
   cache file is \Program Files\viavoicetts\<language>\<voicename>\cache 
   on Windows.
   To enable and configure speech caching, the utility inicache.exe is provided:

Usage: inicache -d [-l <language>] [-r <sample rate>] [-v <voice>]
       inicache [-l <language>] [-r <sample rate>] [-p <path>]
       [-v <voice>] [-s <persistent>] [-h <phrases>] [-u <dual language>]
       [-x] [-i <ini>]

where:
  -d displays current values (may only specify language, sample rate, or voice)
  -l <language> is the language dialect:
     1.0 - US English (Default) |  6.0  - Simplified Chinese
     1.1 - British English      |  6.1  - Traditional Chinese
     2.0 - Castilian Spanish    |  7.0  - Brazilian Portuguese
     2.1 - Mexican Spanish      |  8.0  - Standard Japanese
     3.0 - Standard French      |  9.0  - Standard Finnish
     3.1 - Canadian French      |  13.0 - Standard Norwegian
     4.0 - Standard German      |  14.0 - Standard Swedish
     5.0 - Standard Italian     |  
  -r <sample rate> is the sample rate:
     0 = 08 KHz (Default)
     1 = 11 KHz
     2 = 22 KHz
  -p <path> is the path to the cache file.
  -v <voice> is the voice number. (Default: 1.)
  -s <persistent> specifies whether the cache file is persistent (saved):
     0 = No
     1 = Yes (Default)
  -h <phrases> is the maximum number of phrases to cache.  (Default: 100.)
  -u <dual language> is the cache for the dual language (if applicable).
     0 = No (default)
     1 = Yes
  -x removes caching for the specified language.
  -i <ini> is the path to ECIINI file (not used on Windows platforms)
  Examples:
    inicache -p "C:\Program Files\ViaVoiceTTS\voices\En_US\1at08KHz_1_0\cache"
    inicache -l 2.0 -v 2 -r 1
    inicache -h 5000 -u 1 -s 0
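
The idea behind speech caching, as described in this section, can be 
sketched in a few lines of Python. This is not the product's cache: the 
class name, the stand-in synthesize function, and the 
least-recently-used eviction policy are all assumptions chosen for the 
illustration. Only the notion of remembering a limited number of 
already synthesized phrases comes from the text above.

```python
from collections import OrderedDict


class PhraseCache:
    """Remember up to max_phrases synthesized phrases (LRU eviction assumed)."""

    def __init__(self, max_phrases=100):
        self.max_phrases = max_phrases
        self._audio = OrderedDict()

    def get(self, text, synthesize):
        """Return cached audio for text, calling synthesize(text) on a miss."""
        if text in self._audio:
            self._audio.move_to_end(text)      # mark as most recently used
            return self._audio[text]
        audio = synthesize(text)
        self._audio[text] = audio
        if len(self._audio) > self.max_phrases:
            self._audio.popitem(last=False)    # drop least recently used
        return audio


calls = []


def fake_synthesize(text):
    """Stand-in for the expensive synthesis step."""
    calls.append(text)
    return b"audio:" + text.encode("utf-8")


cache = PhraseCache(max_phrases=2)
cache.get("Hello", fake_synthesize)
cache.get("Hello", fake_synthesize)   # served from the cache
print(len(calls))
```

The second request for the same phrase does not call the synthesis 
function again, which is the saving that a repetitive vocabulary 
benefits from.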
 

17.  Logging Utilities
----------------------
A new logging utility, inilog, is included in this release.
Inilog replaces initrace from earlier releases.

Logging is automatically turned on when the product is installed.
By default, this log is \Program Files\viavoicetts\tts.log on Windows.  

Users and developers have the ability to customize the format of the 
log or the location of the log via inilog.
This utility can be used to view the current settings and to modify them.

Invoke inilog as follows:
Usage: inilog -d
       inilog [-f <filter>] [-l <level>] [-p <path>] [-c <components>]
where:
  -d displays current settings (any other options will be ignored)
  -f <filter> is the filter mask:
     D = Date and Time (Default)  |  F = Function/Method name (Default)
     P = Process ID               |  I = Source file name
     T = Thread ID                |  L = Source Line number
     C = Component name (Default) |
  -l <level> is the logging level:
     0 = No logging               |  3 = Trace + Warning + Error
     1 = Error only (Default)     |  4 = Debug + Trace + Warning + Error
     2 = Warning + Error          |
  -p <path> is the path to the tts.log file. (Default: current working dir.)
  -c <components> is a list of components and levels to be logged.
  Note: Paths that include spaces must be enclosed in double-quotes.
  Examples:
    inilog -p "C:\Program Files\ViaVoiceTTS\log"
    inilog -l 2 -f DCF
    inilog -c eci:4,cmm:2,tor:2
    
For example, inilog -f DC produces log files containing the date, 
time and component name.

inilog -c <component>:<loglevel>   -- to modify the log level for certain components.
The following are the valid component names:
	eci, torr, cmm, eti

Multiple components can be specified separated by a comma.
For example, inilog -c eci:4,cmm:2 modifies the log level of eci to 4 
and the log level of cmm to 2.

The default logging is level 1 with format DCF (date, time, component, and function name).
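
A script that prepares inilog settings might validate its values before 
invoking the utility. The Python sketch below decodes a -f filter mask 
and parses a -c component list using the letters, component names, and 
levels listed in this section. The function names are hypothetical, and 
the checks reflect only what is documented here, not how inilog itself 
validates its input.

```python
FILTER_FIELDS = {
    "D": "date and time",
    "P": "process ID",
    "T": "thread ID",
    "C": "component name",
    "F": "function/method name",
    "I": "source file name",
    "L": "source line number",
}
# Component names as given in the list of valid component names above.
VALID_COMPONENTS = {"eci", "torr", "cmm", "eti"}


def describe_filter(mask):
    """Translate a -f mask such as "DCF" into field names."""
    return [FILTER_FIELDS[letter] for letter in mask]


def parse_components(spec):
    """Parse a -c value such as "eci:4,cmm:2" into {name: level}."""
    levels = {}
    for item in spec.split(","):
        name, level = item.split(":")
        if name not in VALID_COMPONENTS:
            raise ValueError(f"unknown component: {name}")
        if not 0 <= int(level) <= 4:
            raise ValueError(f"level out of range: {level}")
        levels[name] = int(level)
    return levels


print(describe_filter("DCF"))
print(parse_components("eci:4,cmm:2"))
```

An unknown filter letter raises KeyError, and an unknown component or 
out-of-range level raises ValueError, so mistakes surface before the 
command is run.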
  

18.  TRADEMARK INFORMATION
--------------------------
IBM is a registered trademark or trademark of International Business
Machines Corporation in the United States and other countries.

Microsoft, Windows, Windows NT, Windows 95, Windows 98, Windows XP, 
and Windows 2000 logo are trademarks or registered trademarks of 
Microsoft Corporation in the United States and/or other countries.

All other names are registered trademarks, trademarks or service marks 
of their respective companies.


Doc Number: win32.readme.6.6.1.0.txt.021803
