                            CHAPTER 6
                       MODIFYING THE VOICES
                                
This  chapter  shows how to change and create voices  spoken  by
DECtalk.  Changing  and  creating  voices  requires  a   certain
knowledge   of  acoustic  phonetics and the  human  voice.  This
chapter   necessarily  goes  into detail  on  speaker-definition
parameters.   Understanding those parameters  is  necessary  for
effective voice  modification and will make the task easier.

VOICE CHARACTERISTICS
DECtalk has a set of simple commands that you can use to change
the speaking rate, or to change the voice to one of nine
built-in voices (male, female, and child) or to a
developer-definable voice, as shown below. You can use other, more
complex   commands to modify the characteristics of each  voice,
or  to  create   a  new  voice or special effects.  The  complex
commands  require  skill and experience to use effectively,  but
the   simple  commands  are   easy  to  use  in  normal  DECtalk
applications.

We  can  usually  tell whether the voice of  a stranger  at  the
other end of a telephone line is that of a man, woman, or child.
Slight differences  in voice quality are characteristic of these
different speakers.  For example, women's and children's  voices
are  usually higher  pitched than men's voices.  The size of the
head  and  length of the vocal tract  account for  some  of  the
differences. We also notice that some  people speak more quickly
or more distinctly than others.

Chapter  5 described ways in which  DECtalk pronunciation  could
be  modified.   This  chapter  shows how  voice  characteristics
themselves  can be changed by selecting the speaking rate,  sex,
and other voice parameters.

DECtalk   has   a   number  of  commands  which   modify   voice
characteristics.  Because  the  commands  are  entered    within
phonemic  brackets [ and ], you must have the [:phoneme  arpabet
speak] command set to ON.  This option is set OFF at power-up.
NOTE: DECtalk interprets text between square brackets as
phonemes only after the [:phoneme arpabet speak on] command has
been sent. For DECtalk to interpret the [ and ] and the
characters between them literally, the [:phoneme arpabet speak
off] command must be sent.

WARNING: If the command [:phoneme arpabet speak on]  is set  and
you  forget the final "]", DECtalk will try to  interpret  ASCII
text  as  phonemes,  skipping over illegal letter  combinations.
The resulting speech will sound garbled. To recover,
close phonemic mode by typing "]".

The commands that modify the voice characteristics are
       1.      Speaking rate [:ra _]
       2.      Comma pause duration [:cp _]
       3.      Period pause duration [:pp _]
       4.      New voice [:n_]
       5.      Design voice [:dv _]
where the "_" represents a variable letter, value, or parameter.

Each  of the first three commands has a single, simple function.
The  fourth  (new  voice) command selects the  standard  DECtalk
voices. The fifth (design voice) command allows you to create  a
completely new voice.

SPEAKING RATE [:RA _]
The  default  speaking  rate  is 180  words  per  minute  (wpm).
Speaking   rate  values  have been calibrated  with  a  300-word
standard   paragraph using Fairbanks, G. Voice and  Articulation
Drill  Book.  Second   Edition. Harper and Row,  1960,  p.  114.
Speaking  rates can be adjusted to be very slow, very  fast,  or
anywhere in between by using the following commands:
[:ra  120]      This rate, 120 wpm, is ideal for situations where
material such as a phone number is to be copied by the listener.

NOTE: It may be frustrating to listen to extended speech at slow
rates unless the listener is actually copying down every word.

[:ra  160]      This rate is moderate, 160 wpm. This rate sounds
a little slow, but may be preferred in certain situations.

[:ra  180]      This rate is normal (moderately fast), 180  wpm.
It  is the default rate for DECtalk, and is ideal for  listening
to continuous text under optimal  conditions.

[:ra   240]        This  rate  is  faster,  240  wpm.  Practiced
listeners  can  skim material at this rate and  prefer  it  when
scanning  text for important sections. Inexperienced   listeners
may not understand every word at this  rate.

[:ra  350]      This rate is very fast,  350 wpm. It is too fast
for  many   people to follow, but it does have  applications  in
special circumstances.

[:ra  550]       This rate is the fastest,  550 wpm. It  is  too
fast  for many  people to follow, but it does have  applications
for  unsighted  individuals who wish to scan text quickly.  This
rate is 200 wpm faster than any previous version of DECtalk.

Any speaking rates between 75 and 600 are permitted in the [:ra
_]  command. Rates specified outside this range are  limited  to
the  nearest legal value.

Changes  in  speaking rate influence the duration and especially
the   number  of  pauses in text, as well  as  the  duration  of
individual   phonemes. At rates below 140 wpm,  DECtalk  inserts
pauses  at  all  phrase boundaries and pauses and phonemes  near
the ends of phrases are lengthened considerably. At rates
faster than 240 wpm, DECtalk deletes  the comma pause, and other
pauses  and  phonemes  are shortened.  (Near  the  beginning  of
phrases,   phonemes  are  fairly short at  both  slow  and  fast
speaking rates.)

PAUSE DURATIONS [:PP _] AND [:CP _]
At  the  normal  speaking rate of 180 words per minute,  DECtalk
pauses about half a second after a period in the text and  about
a   sixth  of a second after a comma. These pause durations  are
adjusted appropriately when you change the speaking rate.

Speech Command Parameters
Command        Minimum         Maximum        Unit per Parameter
:ra            75              650            Words per minute
:cp            -40             30000          Milliseconds
:pp            -380            30000          Milliseconds
:n_            NA              NA             pbhfkrudwv
:dv            --              --             See Appendix D

In  some  situations,  you might like a  pause  after  a  period
without  changing the speaking rate. For example, to get DECtalk
to  read a  list of words at a normal rate with 5-second  pauses
after each word (to allow the listener to write them down),  you
could use one  of the following commands and end each word  with
a  comma   (continuation rise intonation) or a  period  (falling
intonation).

[:pp  4500]      Add a period pause of 4500 ms (4.5 seconds)  to
the standard half-second pause that occurs after a period in the
text. The total pause between words will be about 5  seconds.

[:cp  4800]      Add a comma pause of 4800 ms (4.8  seconds)  to
the  standard sixth of a second pause that occurs after a  comma
in  the   text at normal speaking rate. The total pause  between
words separated by a comma will last about 5  seconds.

[:pp 0 :cp 0]  Reset the period pause and comma pause to  their
normal default values.

The permitted range for a period pause is from -380 to 30000 ms.
A  negative  value  shortens  the  standard  period  pause.  The
permitted   range  for a comma pause is from -40  to  30000  ms.
Values  specified   outside this range will be  limited  to  the
nearest legal value.
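
The pause arithmetic above can be sketched in a few lines of
Python. This is only an illustration: the function name
pause_command and the two tables are hypothetical, the standard
pauses are the approximate half-second and sixth-of-a-second
figures quoted above for the normal speaking rate, and the code
only builds the command text without sending it to DECtalk.

```python
# Approximate standard pauses at the normal 180 wpm rate, from the text
# above (about 1/2 second after a period, about 1/6 second after a comma).
STANDARD_PAUSE_MS = {"pp": 500, "cp": 167}

# Permitted ranges for the added pause, from the text above.
PAUSE_RANGE_MS = {"pp": (-380, 30000), "cp": (-40, 30000)}


def pause_command(kind, total_ms):
    """Build a [:pp _] or [:cp _] command for a desired total pause.

    kind is "pp" (period pause) or "cp" (comma pause). The added value
    is limited to the nearest legal value, as DECtalk itself would do.
    """
    low, high = PAUSE_RANGE_MS[kind]
    added = total_ms - STANDARD_PAUSE_MS[kind]
    added = max(low, min(high, added))
    return f"[:{kind} {added}]"


print(pause_command("pp", 5000))  # [:pp 4500]
print(pause_command("cp", 5000))  # [:cp 4833]
```

The comma-pause result differs slightly from the rounder 4800 used
in the example above; either value gives a total pause of about 5
seconds.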

SELECTING A STANDARD VOICE [:N_]
DECtalk  has  nine  built-in  voices  and  one  voice  that   is
definable.  You  can refer to each voice by  the  command  [:n_]
where   "_" is a letter representing one of the DECtalk  voices.
The values of "_" are p = Paul, h = Harry, f = Frank, d = Dennis,
b = Betty, u = Ursula, r = Rita, w = Wendy, k = Kit, and v = Val.
You  can  change voices with the new voice command  as  in  this
example:

       [:nb] Hello. I'm Betty.

You can also change voices in the middle of a sentence:

       [:np] This is a demo [:nb] of a sudden change in voice.

If  a  voice change request occurs in the middle of a  sentence,
DECtalk will automatically pause very slightly. The pause is the
equivalent    of  inserting  a  comma  before  the  mid-sentence
command.  For example,  you could type the previous sentence  as
follows:

       [:np] This is a demo, [:nb] of a sudden change in voice.

Such   a  pause  in  DECtalk,  however,  is  barely  noticeable.
Nevertheless,  it  is  good practice to always  end  a  sentence
(insert  a  period)   before changing voices.  This  allows  the
listener to prepare for a  new speaker.
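
An application that switches voices often may find it convenient to
generate the [:n_] commands rather than type them. The following
Python sketch is one possible way to do that. The names VOICES, say,
and dialogue are hypothetical; the voice letters are the ones listed
above, and the code only produces text to be sent to DECtalk.

```python
# Voice letters for the [:n_] command, as listed in the text above.
VOICES = {
    "paul": "p", "harry": "h", "frank": "f", "dennis": "d",
    "betty": "b", "ursula": "u", "rita": "r", "wendy": "w",
    "kit": "k", "val": "v",
}


def say(voice, text):
    """Prefix text with the new-voice command for the named voice."""
    letter = VOICES[voice.lower()]
    return f"[:n{letter}] {text}"


def dialogue(lines):
    """Join (voice, text) pairs, ending each sentence before a voice change.

    A period is added when a piece of text lacks final punctuation,
    following the recommendation to end a sentence before changing voices.
    """
    parts = []
    for voice, text in lines:
        text = text.strip()
        if not text.endswith((".", "?", "!")):
            text += "."
        parts.append(say(voice, text))
    return " ".join(parts)


print(say("Betty", "Hello. I'm Betty."))
# [:nb] Hello. I'm Betty.
print(dialogue([("paul", "This is Paul"), ("betty", "And this is Betty.")]))
# [:np] This is Paul. [:nb] And this is Betty.
```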

DESIGNING A VOICE

PARAMETERS [:DV _]
The  DECtalk  voices  provide  an adequate  selection  for  most
developers' applications. However, if you have a special
application  requiring   a monotone or unusual  voice,  you  can
modify  the  parameters defined in this  section on a trial-and-
error basis to get the desired voice.

The  nine built-in voices of DECtalk are distinguished from  one
another by a large set of speaker-definition parameters.

Speakers  can  differ in sex, age, head size and  shape,  larynx
size   and  behavior,  pitch  range, pitch  and  timing  habits,
dialect, and  emotional state. DECtalk cannot approximate all of
these  options.  Therefore, the space of distinguishable  voices
is  quite  limited,   even  though  DECtalk  has  many  speaker-
definition parameters that  can be modified.

The  design  voice  [:dv  _]  command  introduces  the  speaker-
definition  parameters that can be entered as a string or one at
a time.

The following sections discuss speech production, acoustics, and
perception.  Some  of  the information is relatively  technical,
but  the examples should make it possible for all developers  to
effectively modify any parameter and listen to the results.

CHANGING SEX AND HEAD SIZE
Six speaker-definition parameters control the size and shape  of
the head. These parameters are listed below and are described in
the sections that follow.

       sx      Sex, 1 (male) or 0 (female)
       hs      Head size, in percent
       f4      Fourth formant resonance frequency, in Hz
       f5      Fifth formant resonance frequency, in Hz
       b4      Fourth formant bandwidth, in Hz
       b5      Fifth formant bandwidth, in Hz

     Sex, sx
         Male and female voices have many differences, including
head  size,  pharynx length, larynx mass, and  speaking   habits
such  as degree of breathiness, liveliness of pitch, choice   of
articulatory target values, and speed of articulation.  Some  of
these  differences are under the control of a single  parameter,
sx,   the  sex of the speaker. Speakers Paul, Harry, Frank,  and
Dennis   are male (sx = 1), while speakers Betty, Rita,  Ursula,
Wendy, and Kit are female (sx = 0). Actually, Kit the Kid can
be male or female because children younger than 10 years old
have similar voices for both sexes.

Changing  the sx parameter causes DECtalk to access a  different
(male or female) table of target values for formant frequencies,
bandwidths,  and source amplitudes. The male and  female  tables
are   patterned  after two individuals who were judged  to  have
pleasant,   intelligible voices. DECtalk's built-in  voices  are
only  scaled  transformations of Paul and Betty, the  two  basic
voices.

You  can change the sex of any of DECtalk's voices by making the
voice  current and then modifying the sx parameter. For example,
the   following  command  gives  Paul  some  of   the   speaking
characteristics  of a woman. (The sx parameter does  not  change
the  average pitch or breathiness, so a peculiar combination  of
simultaneous  male  and  female  traits  results  from  this  sx
change.)

       [:np :dv sx 0] Am I a man or woman?

The  sx  parameter can also be specified as  m  or  f  with  the
commands  [:dv sx m] or [:dv sx f].

WARNING:  If you change the sex of the voice, some phonemes  may
cause  DECtalk's filters to overload, producing a  squawk.  (The
squawk  is  unpleasant,  but it will not  damage  DECtalk.)  The
modification  of  certain parameters such  as  F4,  F5,  and  G1
(explained below) can help to correct this problem.

      Head Size, hs
         Head size is specified as the average size for an adult
man  (if sx = 1) or an adult woman (if sx = 0). A  head size  of
100  percent is normal or average for a given sex,  but   people
can differ quite a bit in this characteristic. Head size  has  a
strong influence on a person's voice. Large musical  instruments
produce  low  notes, and humans with large heads tend  to   have
low,  resonant voices. For example, to make Paul  sound  like  a
larger  man  with a 15 percent longer vocal tract  (and  formant
frequencies that are scaled down by a factor of about 0.85),
type the following command.

       [:np :dv hs 115] Do I sound more like Huge Harry this way?

Head  size  is one of the best variables to use if you  want  to
make   dramatic voice changes. For example, Paul has a head size
of  100,  while Harry's deep voice is caused in part by  a  head
size  change   to  115,  or  15  percent  greater  than  normal.
Decreasing head size produces a higher voice, such as that of a
child or adolescent. Extreme changes in head size, as in the
following examples, produce speech that is somewhat difficult to
understand.

       [:nh :dv hs 135] Do I have a swelled head?
       [:nk] I am about 10 years old.
       [:nk :dv hs 65] Do I sound like a six year old?

WARNING:  Extreme changes in head size can cause  overloads,  as
well   as   difficulties  in  understanding  the  speech.    The
modification  of certain parameters such as   F4,  F5,   and  G1
(explained below) can help to correct this problem.

      Higher Formants, f4, f5, b4, and b5
 A male voice typically has five prominent resonant peaks in the
spectrum  (over   the  range from 0 to 5 kHz),  a  female  voice
typically  has only four  (due to a smaller head  size),  and  a
child  has  three. If fourth and fifth formant resonances  exist
for  a  particular  voice,  they are   fixed  in  frequency  and
bandwidth  characteristics. These  characteristics are specified
by  the  parameters f4, f5, b4, and  b5, in Hz. Values for  each
predefined voice are given below.

If  a higher formant does not exist, the frequency and bandwidth
of   the speaker definition are set to special values that cause
the   resonance to disappear. To make a resonance disappear, the
frequency is set to 2500 Hz, and the bandwidth to 2048 Hz.  This
is   what has been done to the fourth and fifth formants for Kit
the  Kid.

The  permitted  values  for f4 and f5  have  fairly  complicated
restrictions.  Violating these restrictions can cause  overloads
and   squawks. The restrictions are listed below for cases where
a  higher formant exists.

       1.      f5 must be at least 300 Hz higher than f4.
       2.      If sx is 1 (male), f4 must be at least 3250 Hz.
       3.      If sx is 0 (female), f4 must be at least 3700 Hz.
       4.      If hs is not 100, the above values should be
               multiplied by (hs/100).
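
The restrictions above can be checked before a [:dv _] command is
sent. The Python sketch below is a rough illustration only. The
function name check_higher_formants is hypothetical, and the code
assumes that "the above values" in restriction 4 means all three
limits (the 300 Hz spacing and both f4 minimums); if only the f4
minimums are meant to scale, the spacing line should be changed.

```python
def check_higher_formants(sx, hs, f4, f5):
    """Return a list of violated restrictions for existing f4 and f5.

    sx is 1 (male) or 0 (female), hs is head size in percent, and f4
    and f5 are in Hz. An empty list means no restriction is violated.
    """
    scale = hs / 100.0
    # Assumption: every limit listed above is multiplied by hs/100.
    min_f4 = (3250 if sx == 1 else 3700) * scale
    min_gap = 300 * scale

    problems = []
    if f4 < min_f4:
        problems.append(f"f4 must be at least {min_f4:.0f} Hz")
    if f5 - f4 < min_gap:
        problems.append(f"f5 must be at least {min_gap:.0f} Hz above f4")
    return problems


print(check_higher_formants(1, 100, 3300, 3650))  # []
print(check_higher_formants(0, 100, 3300, 3650))
# ['f4 must be at least 3700 Hz']
```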

These  higher formants produce peaks in the spectrum that become
more  prominent if b4 and b5 are smaller, and if f4 and  f5  are
closer  together. The limits placed on b4 and b5  should  ensure
that  no problems occur. However, smaller values for  bandwidths
may  produce  an overload in the synthesizer. You  can   correct
these overloads by increasing the bandwidths or by  changing the
gain control g1  (below).

     CHANGING VOICE QUALITY
Six  speaker-definition parameters control aspects of the output
of   the  larynx,  which, in turn, control voice quality.  These
parameters are listed below.

       br      Breathiness, in decibels (dB)
       lx      Lax breathiness, in percent
       sm      Smoothness, in percent
       ri      Richness, in percent
       nf      Number of fixed samples of open glottis
       la      Laryngealization, in percent

      Breathiness, br
Some  voices  can be characterized as breathy. The  vocal  folds
vibrate  to  generate voicing and breath  noise  simultaneously.
Breathiness is a characteristic of many  female voices,  but  it
is also common under certain circumstances  for male voices.

The  range of the br parameter is from 0 dB (no breathiness)  to
70   dB  (strong breathiness). By experimenting, you  can  learn
what   intermediate values sound like.  For   example,  to  turn
Paul  into  a  breathy, whispering speaker, type the   following
command.

       [:np :dv br 55 gv 56] Do I sound more like Doctor Dennis now?

This  voice is not as loud as the others due to the simultaneous
decrease in the gain of voicing, gv, but it is intelligible  and
human sounding.

       Lax Breathiness, lx
The  br  parameter  creates  simultaneous  breathiness  whenever
voicing  is turned on.  Another type of breathiness occurs  only
at  the  ends  of  sentences   and when  going  from  voiced  to
voiceless  sounds. This type of "lax"  breathiness is controlled
by the lx parameter in percent.

A  non-breathy,  tense voice would have lx set  to  0,  while  a
maximally  breathy,  lax voice would have lx  set  to  100.  The
difference  between these two voices is not great, but  you  can
hear  it if you listen closely.

      Smoothness, sm
Smoothness refers to how gently the vocal folds close during
vibration. In a smooth voice, the vocal folds meet at the
mid-line, as they do in normal voicing, but they do not slam
together forcefully to create a very sudden cessation of
airflow.

DECtalk uses a variable-cutoff, gradual low-pass filter to model
changes to smoothness. The range of sm is from 0 percent (least
smooth and most brilliant) to 100 percent (most smooth and least
brilliant). The voicing source spectrum is tilted so that energy
at higher frequencies is attenuated by as much as 30 dB when
smoothness is set to a maximum, but is not attenuated at all
when smoothness is set to 0.

Professional  singing voices that are trained to sing  above  an
orchestra  are usually brilliant, while anyone who talks  softly
becomes breathy and smooth. To synthesize a breathy voice, an sm
value  of about 50 or more is good. Changes to sm do not have  a
great effect on perceived voice quality.

      Richness, ri
Richness  is  similar to smoothness and brilliance, except  that
the spectral change occurs at lower  frequencies, and is due  to
a  different  physiological mechanism.  Brilliant,  rich  voices
carry  well  and  are  more intelligible in noisy  environments,
while  smooth  soft  voices sound more friendly.   For  example,
typing the following command produces a soft, smooth  version of
Paul's voice.

       [:np :dv ri 0 sm 70] Do I sound more mellow?

The  following  command produces a maximally rich and  brilliant
(forceful) voice.

       [:np :dv ri 90 sm 0] Do I sound more forceful?

Smoothness and richness are usually negatively correlated when a
speaker  dynamically changes laryngeal output.  The  sm  and  ri
parameters do not influence the speaker's identity very much.

      Nopen Fixed, nf
The  number of samples in the open part of the glottal cycle  is
determined  not only by ri, but also by a second parameter,  nf.
The nf parameter is the number of fixed samples in the open
portion of the glottal cycle.

Most speakers adjust the open phase to be a certain fraction  of
the  period,  and  this  fraction is  determined  by  ri.  Other
speakers  keep the open phase fixed in duration when the overall
period   varies. To simulate this behavior, set ri  to  100  and
adjust  nf  to   the  desired duration of the  open  phase.  The
shortest  possible open  phase is 10 (1 ms), and the longest  is
three  quarters of the  period duration (about  70  for  a  male
voice).

      Laryngealization, la
Many  speakers  turn  voicing  on and  off  irregularly  at  the
beginnings and ends of sentences, which  gives a querulous  tone
to the voice. This departure from perfect  periodicity is called
laryngealization or creaky voice quality.
The la parameter controls the amount of laryngealization in the
voice. A value of 0 results in no laryngealized irregularity,
and a value of 100 (the maximum) produces laryngealization at
all times. For example, to make Betty moderately laryngealized,
type the following command.

       [:nb :dv la 20]

The  la  parameter creates a noticeable difference in the voice,
although it is not altogether a pleasant change.

CHANGING THE PITCH AND INTONATION OF THE VOICE
Seven  speaker-definition  parameters  control  aspects  of  the
fundamental   frequency  (f0)  contour  of  the   voice.   These
parameters are listed below and are described in the sections
that follow.

       bf      Baseline fall, in Hz
       hr      Hat rise, in Hz
       sr      Stress rise, in Hz
       as      Assertiveness, in percent
       qu      Quickness, in percent
       ap      Average pitch, in Hz
       pr      Pitch range, in percent

      Baseline Fall, bf
The bf parameter (baseline fall in Hz) determines one aspect  of
the dynamic fundamental frequency contour for a sentence. If  bf
is  0,  the  reference  baseline   fundamental  frequency  of  a
sentence begins at 115 Hz and ends at  this frequency. All rule-
governed dynamic swings in f0 are  computed with respect to  the
reference baseline.

Some  speakers  begin a sentence at a higher f0,  and  gradually
fall   as  the  sentence  progresses.  This  "falling  baseline"
behavior can  be simulated by setting bf to the desired fall  in
Hz.  For example,  setting bf to 20 Hz will cause the f0 pattern
for a sentence to  begin at 125 Hz (115 Hz plus half of bf), and
fall at a rate of 16  Hz per second until it reaches 105 Hz (115
Hz  minus half of bf).  The baseline remains at this lower value
until  it  is reset  automatically before the beginning  of  the
next  full  sentence  (right after a period, question  mark,  or
exclamation  point). The  rate of fall, 16  Hz  per  second,  is
fixed, no matter what the  extent of the fall.
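
The falling baseline described above can be written as a small
function of time. The Python sketch below is an illustration of the
description, not DECtalk's actual code: the function name
baseline_f0 is hypothetical, and the 115 Hz reference and the fixed
16 Hz per second fall rate are the figures given in the text.

```python
def baseline_f0(t_seconds, bf_hz, reference_hz=115.0, fall_rate_hz_per_s=16.0):
    """Reference baseline f0 (Hz) at t_seconds after a sentence begins.

    The baseline starts half of bf above the reference, falls at the
    fixed rate, and then stays half of bf below the reference.
    """
    start = reference_hz + bf_hz / 2.0
    end = reference_hz - bf_hz / 2.0
    return max(end, start - fall_rate_hz_per_s * t_seconds)


print(baseline_f0(0.0, 20))   # 125.0 at the start of the sentence
print(baseline_f0(0.5, 20))   # 117.0 half a second later
print(baseline_f0(1.25, 20))  # 105.0, the fall is complete
print(baseline_f0(4.0, 20))   # 105.0, the baseline stays low
```

With bf set to 20 Hz, the fall is complete after 20 / 16 = 1.25
seconds, so only the first part of a long sentence is affected.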

Whenever  you include a [+] phoneme in the text to indicate  the
beginning of a paragraph, the baseline is automatically  set  to
begin  slightly higher for the first sentence of the  paragraph.
The   following  sentences of a paragraph are all  identical  in
having a  normal baseline fall.

While baseline fall differs among the speakers, it is not a very
good  cue for differentiating between speakers. As long  as  the
fall   is  not  excessive,  its  presence  or  absence  is   not
particularly  noticeable.

      Hat Rise, hr
The  hr  (nominal hat rise in Hz) and sr (nominal stress impulse
rise  in  Hz)  parameters  determine  aspects   of  the  dynamic
fundamental  frequency contour for a sentence. To  modify  these
values selectively, you should understand how the f0  contour is
computed  as a function of lexical stress pattern and  syntactic
structure of the sentence.

A  sentence  is  first  analyzed and broken  into  clauses  with
punctuation  and  clause-introducing  words  to  determine   the
locations  of  clause  boundaries. Within each  clause,  the  f0
contour  rises on the first stressed syllable, stays at  a  high
level  for  the remainder of the clause up to the last  stressed
syllable, and  falls dramatically on the last stressed syllable.
This  rise-at-the-beginning and fall-at-the-end pattern has been
called   the  "hat pattern" by linguists, using the  analogy  of
jumping from  the brim of a hat to the top of the hat, back down
again.

The  hat  rise  parameter, hr, indicates the nominal  height  in
hertz   of  a pitch rise to a plateau on the first stress  of  a
phrase.  A   corresponding pitch fall is placed by rule  on  the
last  stress of  the phrase. Some speakers use relatively  large
hat  rises  and   falls, while others use a local "impulse-like"
rise  and fall on  each stressed syllable. The default hr  value
for  Paul  is  22  Hz, indicating that the f0  contour  rises  a
nominal 22 Hz when going from the brim to the top of the hat. To
simulate  a speaker who does not use hat rises and falls,  enter
the  command [:dv hr 0].

Other  aspects  of  the  hat pattern are important  for  natural
intonation   but   are   not  accessible  by  speaker-definition
commands.   For  example, the hat fall  becomes  a  weaker  fall
followed by a  slight continuation rise if the clause is  to  be
succeeded  by  more   clauses in the  same  sentence.  Also,  if
unstressed  syllables follow  the last stressed  syllable  in  a
clause,  part  of  the  hat  fall   occurs  on  the  very   last
(unstressed)  syllable of the clause. If  the  clause  is  long,
DECtalk  may  break  it into two hat patterns  by   finding  the
boundary between the noun phrase and verb phrase.

If  DECtalk is in phoneme input mode and you use the pitch  rise
[/]   and  pitch fall [\] symbols, the hr parameter   determines
the actual rise and fall in Hz.

      Stress Rise, sr
The sr parameter indicates the nominal height, in Hz, of a local
pitch  rise and fall on each stressed  syllable. This  rise-fall
is added to any hat rise or fall that may  also be present.  For
example, Paul has sr set to 32 Hz, resulting in an f0 rise-fall
gesture  of 32 Hz over a  span of about 150 ms, which is located
on  the  first  and  succeeding   stressed  syllables.  However,
DECtalk  rules  reduce the actual  height of  successive  stress
rises and falls in each clause, and  cause the last stress pulse
to occur early so that there is time for the hat fall during the
vowel.

If  the  sr parameter is set too low, the speech sounds monotone
within  long  phrases. Great changes to hr  and  sr  from  their
default  values for each speaker are not necessary or desirable,
except in  unusual circumstances.

     Assertiveness, as
Assertive  voices have a dramatic fall in pitch at  the  end  of
utterances. Neutral or meek speakers  often end a sentence  with
a  slight "questioning" rise in pitch to  deflect any challenges
to  their  assertions. The as parameter, in  percent,  indicates
the  degree to which the voice tends to end  statements  with  a
conclusive final fall. A value of 100 is very  assertive,  while
a value of 0 is maximally meek.

      Quickness, qu
The qu parameter, in percent, controls  the speed of response to
a  request  to change the pitch. All hat  rises, hat falls,  and
stress rises can be thought of as suddenly  applied commands  to
change the pitch, but the larynx is sluggish,  and responds only
gradually to each command. A smaller larynx  typically  responds
more  quickly, so while Harry has a quickness  value of 10,  Kit
has a value of 50.

In  engineering  terms, a value of 10 implies  a  time  constant
(time   to get to 70 percent of a suddenly applied step  target)
of  about  100 ms. A value of 90 percent corresponds to  a  time
constant  of  about 50 ms. Lower quickness values may mean  that
the  f0  never   quite  reaches the target value  before  a  new
command comes along to  change the target, but this is perfectly
natural.

      Average Pitch, ap, and Pitch Range, pr
The  ap (average pitch in Hz) and pr (pitch range in percent  of
normal   range)   parameters  modify  the  computed  values   of
fundamental frequency,  f0, according to the formula:

       f0' = ap + (((f0 - 120) * pr) / 100)

If  ap is set to 120 Hz and pr to 100 percent, there will be  no
change to the "normal" f0 contour that is computed for a typical
male  voice.  The  effect  of  a  change  in  ap  is  simply  to
independently  raise  or lower the entire  pitch  contour  by  a
constant  number of Hz, while the effect of pr is to  expand  or
contract the swings in pitch about 120 Hz.

Normally,  a  smaller larynx simultaneously produces  f0  values
that   are higher in average pitch and higher in pitch range  by
about the  same factor (the whole f0 contour is multiplied by  a
constant   factor). Observing the values assigned to ap  and  pr
for  each  of   the voices (Appendix D), you can  see  that  the
voices rank in average pitch from low (Harry) to high (Kit).
Rankings for pr are similar, except that Frank has a flat,
non-expressive pitch range compared with his average pitch.

The best way to determine a good pitch range for a new voice  is
by   trial  and  error. You can create a monotone or  robot-like
voice  by   setting the pitch range to 0. For example,  to  make
Harry  speak in  a monotone at exactly 90 Hz, type the following
command.

        [:nh :dv ap 90 pr 0] I am a robot.

Reducing  the  pitch range reduces the dynamics  of  the  voice,
producing  emotions such as sadness. Increasing the pitch  range
while  leaving the average pitch the same or setting it slightly
higher suggests excitement.

Due  to  constraints involved in pitch-synchronous  updating  of
other    dynamically   changing  parameters,   the   fundamental
frequency contour  that is computed by the above formula is then
checked for values  that are out of bounds with respect  to  the
following limits.

       f0 maximum = 500 Hz
       f0 minimum = 50 Hz

Any  value  outside  this range is limited to  fall  within  the
range.
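
The pitch formula and the limits above can be combined in a short
Python sketch. It is offered as an illustration only: the function
name adjust_f0 is hypothetical, and the code applies just the
formula and the 50 Hz and 500 Hz bounds given in this section, not
any other internal processing DECtalk may perform.

```python
F0_MIN_HZ = 50.0
F0_MAX_HZ = 500.0


def adjust_f0(f0_hz, ap_hz, pr_percent):
    """Apply f0' = ap + (((f0 - 120) * pr) / 100), then limit the result."""
    adjusted = ap_hz + ((f0_hz - 120.0) * pr_percent) / 100.0
    return max(F0_MIN_HZ, min(F0_MAX_HZ, adjusted))


print(adjust_f0(130, 120, 100))  # 130.0, no change to the normal contour
print(adjust_f0(150, 90, 0))     # 90.0, a monotone at exactly ap
print(adjust_f0(130, 240, 200))  # 260.0, higher pitch and wider swings
print(adjust_f0(120, 600, 100))  # 500.0, limited to the f0 maximum
```

The second call corresponds to the robot example above: with pr set
to 0, every computed f0 value collapses to the average pitch.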

To  keep  you from exceeding reasonable limits on the parameters
controlling  pitch,  constraints have  been  placed   on  values
selected.  If  a [:dv _] command requests values outside   these
limits,  the  request  is limited to the  nearest  listed  value
before execution.

     CHANGING RELATIVE GAINS AND AVOIDING OVERLOADS
Nine speaker-definition parameters control the output levels of
various internal resonators. These parameters are listed below.

       gv      Gain of voicing source, in dB
       gh      Gain of aspiration source, in dB
       gf      Gain of frication source, in dB
       gn      Gain of nasalization, in dB
       g1      Gain of cascade formant resonator 1, in dB
       g2      Gain of cascade formant resonator 2, in dB
       g3      Gain of cascade formant resonator 3, in dB
       g4      Gain of cascade formant resonator 4, in dB
       g5      Loudness of the voice, in dB

      Loudness, g5

Each  predefined voice has been adjusted to have about the  same
perceived loudness, a value that is about  optimum for telephone
conversation.  The  value chosen is near  maximum  (if  loudness
were  increased  much, some phonemes would   probably  cause  an
overload squawk). A near maximum value was  selected to maximize
the signal-to-noise ratio of DECtalk.

If  you  want  to decrease the loudness of a voice,  or  make  a
temporary  increase for a phrase that is known not to  overload,
determine the g5 value in dB for the voice in question. Then
adjust the voice by using the following command.

       [:np :dv g5 76] I am speaking at about half my normal level.

Because  the  g5  entry  for Paul is 86, this  command   reduces
loudness by 10 dB. Perceived loudness approximately  doubles (or
halves) for each 10 dB increment (decrement) in g5.
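
The rule of thumb above (a doubling or halving of perceived
loudness for each 10 dB) can be expressed as a one-line formula.
The Python sketch below is only an approximation of perception, as
the text says, and the function name loudness_ratio is
hypothetical.

```python
def loudness_ratio(default_g5_db, new_g5_db):
    """Approximate perceived loudness relative to the default g5 setting.

    Uses the rule that loudness doubles for each 10 dB increase and
    halves for each 10 dB decrease.
    """
    return 2.0 ** ((new_g5_db - default_g5_db) / 10.0)


print(loudness_ratio(86, 76))  # 0.5, Paul at about half his normal level
print(loudness_ratio(86, 86))  # 1.0, unchanged
```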

Software  control  over  loudness is  useful  in  a  loudspeaker
application where the background noise level in the  room  might
change.  For  example,  a  vocally handicapped  wheelchair-bound
person   does  not  want to appear to be  shouting  in  a  quiet
interpersonal  conversation, but may wish to be able to converse
in  a  noisy  room   as  well.  Using  a  software  abbreviation
facility, such a person could type loud to select a command
making the voice maximally loud, and soft to invoke a command
setting g5 to a reduced value.

Note:  DECtalk  comes  with both software  and  hardware  volume
control so that the modification of the g5 parameter should  not
be  necessary.  Using the [:volume ...] command  or  the  volume
control knob on the external loudspeaker is recommended.

     Sound Source Gains, gv, gh, gf, and gn
Several types of sound sources are activated during speech
production: voicing, aspiration, frication, and nasalization.
The relative output levels of these sounds, in dB, are
determined by the gv, gh, gf, and gn parameters respectively.
The default settings for these parameters have been factory
pre-set to maximize the intelligibility of each voice. However,
changing the settings can be useful in debugging the system or
in demonstrating aspects of the acoustic theory of speech
production. You could change the level of one sound source
globally. For example, turn off frication to be able to hear
just the output of the larynx.

These   parameters might have to be reduced to overcome  certain
kinds  of  overloads, but try the procedure in the next  section
first.

     Cascade Vocal Tract Gains, g1, g2, g3, and g4
Changes in head size or other parameters can sometimes produce
overloads in the synthesizer circuits. If this occurs, first
check that f4 and f5 are set to reasonable values. If the
squawk remains, you can adjust several gain controls, g1
through g4, in the cascade of formant resonators of the
synthesizer to attenuate the signal at critical points. The
signal can then be amplified back to the desired output level
later in the synthesis.

Use  the  following procedure to correct an overload  (typically
indicated by a squawk during part of a word).

1.      Synthesize the word or phrase several times to make sure
        the squawk occurs consistently. Use the same test word
        each time you change a gain in the following steps.

2.      Determine the default values for g1 through g4 for the
        voice that overloads.

3.      Reduce g1 in steps of 3 dB until the squawk goes away,
        and note the total reduction that was needed.

        If more than a 10 dB decrement is required, some other
        parameter has probably been changed too much. If the
        squawk does not go away at all, you may need to reduce
        gv instead of g1.

4.      Increment g5 by the g1 decrement to return the output to
        its original level. For example, if g1 was reduced by
        6 dB, add 6 dB to g5 (or g4 if g5 is already maximum).
        If incrementing g5 causes the squawk to return, decrease
        g5 slowly until the squawk goes away.
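
The procedure above can be sketched as a small search loop. This
Python sketch is an illustration under stated assumptions: the
squawks callback is hypothetical (in practice a person listens to
the test word), the gain values in the usage lines are made up
except for Paul's g5 of 86, and the fallback to g4 when g5 is
already at maximum is left out.

```python
def fix_overload(gains, squawks, step=3, max_cut=10):
    """Reduce g1 in `step` dB decrements until `squawks` reports no overload.

    gains   -- dict with at least 'g1' and 'g5' (values in dB)
    squawks -- hypothetical callback: takes a gains dict and returns
               True if the test word still overloads
    Returns the adjusted gains, or None if more than `max_cut` dB would
    be needed (the text suggests another parameter is then at fault).
    """
    trial = dict(gains)
    cut = 0
    while squawks(trial):
        cut += step
        if cut > max_cut:
            return None
        trial["g1"] = gains["g1"] - cut
    # Step 4: restore the output level by adding the cut back to g5.
    trial["g5"] = gains["g5"] + cut
    # If the compensation brings the squawk back, lower g5 slowly.
    while squawks(trial) and trial["g5"] > gains["g5"]:
        trial["g5"] -= 1
    return trial


# Usage with a stand-in listener: pretend any g1 above 62 overloads.
# The g1 value of 68 is illustrative, not a DECtalk default.
result = fix_overload({"g1": 68, "g5": 86}, lambda g: g["g1"] > 62)
print(result)  # {'g1': 62, 'g5': 92}
```

A return value of None corresponds to the case where more than a
10 dB decrement would be required.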

This procedure works in most cases, but using g2 rather than  g1
can  work  better. If you can return g1 to its  factory  pre-set
value  and  reduce g2 instead to make the squawk go  away,  then
the  signal-to-quantization-noise level in g1 remains maximized.
If  you fix the squawk by using g3 or g4 rather than g2, more of
the cascaded resonator system can be made immune to quantization
noise accumulation.

     THE [SAVE ] PARAMETER AND [:NV] VOICE
You  can  save  a modified speaker definition in a buffer  while
synthesizing  speech with one of the other voices. The  Variable
Val   voice  [:nv] is either male or female, depending  on  what
values  are stored in the buffer. If you call Val before storing
any  values in  the buffer, DECtalk uses the Perfect Paul  voice
[:np].  The  following commands store a modified Betty voice  in
Val and then  recall it.

       [:nb :dv sex m save ]
       (Store the modified Betty voice in Val.)
       [:np] I am Paul.
       (Use another voice.)
       [:nv] I am Val.
       (Recall the Val voice.)

The  buffer holds its contents until you power down DECtalk. You
must re-enter new voice characteristics if you turn off DECtalk.

Note: If you wish to use the save command, leave a space between
the command and the trailing bracket, e.g., [:dv save ]
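
When commands are generated by a program, the required space is
easy to lose. The following Python sketch shows one way to build
a save command that always keeps it. The helper name is
hypothetical, and the parameter names passed in are simply
copied into the command without validation.

```python
def dv_save_command(voice, **params):
    """Build a command that modifies `voice` and stores the result in Val.

    Keeps a space between `save` and the closing bracket, as the note
    in the text requires.
    """
    settings = " ".join(f"{name} {value}" for name, value in params.items())
    body = f":{voice} :dv {settings} save".replace("  ", " ")
    return f"[{body} ]"


print(dv_save_command("nb", ap=160, pr=50))  # [:nb :dv ap 160 pr 50 save ]
```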

     SUMMARY OF SPEAKER-DEFINITION PARAMETERS
Of  the 27 parameters, only a few cause dramatic changes in  the
voice. The greatest effects are obtained with changes to hs, ap,
pr,  and sx, while moderate changes occur when modifying la  and
br.   To some extent, DECtalk's nine factory-set speakers  cover
most of  the possible voices, so don't expect to be able to find
a  voice   that  is highly novel and intelligible. However,  you
might  easily  find ways to slightly improve one of the standard
voices.

VOICE COMMAND SYNTAX
DECtalk uses the following voice command syntax:
       1. Begin every command with a bracket and colon ([:).
       2. Separate each command and its parameter(s) from the text
          by a valid word boundary marker such as a space, tab, or
          carriage return.
       3. You can include several commands in the same
          square-bracket set.
               [:ra 150 :nb] Hello. How are you?
       4. You can include several parameters in the same
          square-bracket set if the command allows more than one
          parameter. If you use several parameters, you must give
          them all before a second command in the same
          square-bracket set.
               [:dv ap 160 pr 50 save :nv] Hi there.
               (The parameter group modifies the [:dv _] command.)
               [:dv ap 160 save :nv pr 50] Hi there.
               (Wrong. The parameter group is out of place.)
       5. If you give two conflicting parameters or commands,
          DECtalk will use the last command in the sequence. For
          example, if you type:
               [:nb :np] Hello.
          DECtalk will use Paul's voice.
       6. You can use phonemic symbols in the same square
          brackets with voice commands.
               Now I'm [:dv ap 90 pr 130 r"iyliy] thrilled!
       7. If the value in a [:dv _] command is too low, DECtalk
          will use the minimum valid value. If the value is too
          high, the maximum valid value will be used.
       8. Once you give a command, that command applies to all
          further text until overridden by another command. For
          example, the command [:nk] will make DECtalk use Kit's
          voice until you enter another new voice command.
       9. All [:dv _] commands are lost when you power down
          DECtalk.
      10. Invalid commands are ignored. Set the [:error ...]
          command to receive an audible warning that an invalid
          command has been entered.
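
Rules 5 and 7 above can be modeled in a few lines. The Python
sketch below is a model of the described behavior, not DECtalk
code. The function names are hypothetical, and the limits in the
table are assumptions for illustration; take the real minimum and
maximum for each parameter from the parameter tables.

```python
# Hypothetical limits for illustration only; consult the parameter
# tables for the real minimum and maximum of each [:dv _] parameter.
LIMITS = {"ap": (50, 350), "pr": (0, 250)}


def clamp_dv(name, value, limits=LIMITS):
    """Mimic rule 7: out-of-range values snap to the nearest valid limit."""
    low, high = limits[name]
    return max(low, min(high, value))


def effective_voice(commands):
    """Mimic rule 5: of several voice commands, the last one wins.

    Assumes voice commands are written in the short form, such as ':nb'.
    """
    voices = [c for c in commands if len(c) == 3 and c.startswith(":n")]
    return voices[-1] if voices else None


print(clamp_dv("ap", 10))              # 50
print(effective_voice([":nb", ":np"]))  # :np
```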

TEXT TUNING EXAMPLE
The following is an example of how to tune text. Speech
synthesis technology allows for more natural text-to-speech with
each passing year. However, there are still areas in the speech
that can be "tuned up" for more naturalness. Much of this
involves the strategic placing of commas and periods, which
essentially tell DECtalk to pause as a native speaker of
English would when speaking the same text, because written text
often lacks information about pauses that are normal in speaking.
The text below is presented twice, the first time as originally
written, and the second time after phonemic and textual fixes
were applied.

Original Version
[:np]
     A California Shaggy Bear Tale for Seven DECtalk Voices
                         by Dennis Klatt
[:np]  Once upon a time, there were three bears. They  lived  in
the  great forest, and tried to adjust to modern times
[:nh] I'm papa bear. I love my family but I love honey best.
[:nb] I'm mama bear. Being a mama bear is a drag.
[:nk]  I'm baby bear and I have trouble relating to all  of  the
demands of older bears.
[:np]  One day, the three bears left their condominium to search
for  honey.  While they were gone, a beautiful young lady  snuck
into  the bedroom through an open window.
[:nw]  My name is Whispering Wendy. My purpose in entering  this
building  should  be clear. I am planning to  steal  the  family
jewels.
[:np]  Hot  on her trail was the famous police detective,  Frail
Frank.
[:nf]  Have  you  seen a lady carrying a laundry  bag  over  her
shoulder?
[:np] A woman kneeling with her left ear firmly placed against a
large rock responded.
[:nu]  No.  No  one  passed this way. I've  been  listening  for
earthquakes  all  morning,  but have only  spotted  three  bears
searching for honey.


Changed Version
[:np]
(Add periods after the title and author.)
A California Shaggy Bear Tale for Seven DECtalk Voices.
By Dennis Klatt.
(Make phonemic corrections.)
[:nh]  This  story was used to demonstrate DECtalk at ['aykaesp]
84,  in May of 1984, at San Diego California.
[:np]  Once upon a time, there were three bears. They  lived  in
the  great forest and tried to adjust to modern times.
(Add commas and emphatic stress.)
[:nh] I'm papa bear. I love my family, but I love ["]honey best.
[:nb] I'm mama bear. Being a mama bear is a drag.
[:nk]  I'm baby bear and I have trouble relating to all  of  the
demands of older bears.
(Begin a verb phrase.)
[:np]  One  day,  the three bears [)] left their condominium  to
search  for honey. While they were gone, a beautiful young  lady
snuck into the bedroom through an open window.
[:nw]  My name is Whispering Wendy. My purpose in entering  this
building  should  be clear. I am planning to  steal  the  family
jewels.
(Begin a new paragraph.)
[:np]  [+]  Hot  on  her trail was the famous police  detective,
Frail  Frank.
[:nf]  Have  you  seen a lady carrying a laundry  bag  over  her
shoulder?
(Add commas for phrasing.)
[:np]  A woman, kneeling with her left ear firmly placed against
a  large rock, responded.
(Add pitch control and emphatic stress.)
[:nu]  ["]No. No [/]one passed this [/\]way. I've been listening
for  ["]earthquakes  all morning, but have  only  spotted  three
bears  searching for honey.
                                
                                
End of Chapter 6.                                
                                
        
