Mikropuhe Linux version 5.x 15th January 2004 (partial instructions - Finnish version is complete)


FILES

(The file extension after the period is the important part; the shared library version uses the same files under a different base name, e.g. libmplinux.ini.)

mplinux
The synthesizer, when used through a named pipe. Start it with the command:
mplinux &

mplinux.ini
Settings.

mplinux.tfg
Low-level settings for the text interpreter.

normaali.tul
Default text interpretation file. Files from the Windows version work as well.

*.pu5
Voice files. Files from the Windows version work as well.


GENERAL

All speech parameters are controlled with <tags>. If a parameter's end tag is not used, the parameter changes the synthesizer's state permanently. Settings are not automatically saved to the settings (.ini) file.

If speech is interrupted, the settings changed by that request are not kept. Instead, the synthesizer returns to the state it was in before the interrupted speak request.

All LF characters (ASCII 10) are converted into spaces (ASCII 32).

This document contains instructions only for the shared library interfaces. Only the recommended SAPI5/VoiceXML tagging is described (Mikropuhe's internal tagging may be used but is not recommended).


SAPI5/VOICEXML TAGS

As in XML, a few characters must be converted into entities:

&	&amp;
'	&apos;
>	&gt;
<	&lt;
"	&quot;

Special characters can be used directly or converted into entities. For example:

1&lt;2	(1<2)
&#228;	(ä)
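
As a rough illustration of the entity conversion described above, the following Python sketch escapes plain text before it is placed between tags. The helper name escape_for_speech is hypothetical, and the sketch assumes that Mikropuhe accepts the standard XML entity names for both quote characters.

```python
from xml.sax.saxutils import escape


def escape_for_speech(text: str) -> str:
    """Replace XML-special characters with entities (hypothetical helper)."""
    # escape() handles &, < and > itself; the extra mapping adds the
    # two quote characters. Standard XML entity names are assumed here.
    return escape(text, {"'": "&apos;", '"': "&quot;"})


print(escape_for_speech("1<2"))
```

Only text content should be escaped this way; the tags themselves must be added afterwards, otherwise their angle brackets would be converted as well.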

According to XML, tag names are case-sensitive. Mikropuhe, however, currently ignores the case of tag names.

Tags can either take effect from that point onward (an empty XML tag such as <tag/>) or apply only to the enclosed text (<fast>1 2 3</fast>). For example, using imaginary tags:
<shout/>Shouting from here till the end
<shout/>Shouting from here <whisper>whispering for change</whisper> till the end
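
The difference between the two tag forms can be sketched with two small Python helpers. The function names empty_tag and scoped_tag are hypothetical and are not part of any Mikropuhe interface; they only build the strings shown in the imaginary example above.

```python
def _attributes(attrs: dict) -> str:
    """Format keyword arguments as ' key="value"' pairs."""
    return "".join(f' {key}="{value}"' for key, value in attrs.items())


def empty_tag(name: str, **attrs) -> str:
    """Build an empty tag, which stays in effect from that point onward."""
    return f"<{name}{_attributes(attrs)}/>"


def scoped_tag(name: str, text: str, **attrs) -> str:
    """Build a tag pair that applies only to the enclosed text."""
    return f"<{name}{_attributes(attrs)}>{text}</{name}>"


request = (
    empty_tag("shout")
    + "Shouting from here "
    + scoped_tag("whisper", "whispering for change")
    + " till the end"
)
print(request)
```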

LIST OF SUPPORTED TAGS

<reset/>
Runs the line beginning with "asetus=" from the initial settings file (*.ini). This restores the synthesizer to its initial state (provided that the setting string (asetus=xxx) sets every synthesizer setting that has been changed). Also flushes all cached text interpretation files (*.tul).
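
To check what <reset/> would run, the setting string can be looked up in the settings file. The Python sketch below is only an assumption about the file layout: it takes the first line that starts with "asetus=" and returns the rest of it. The function name find_reset_string and the sample contents are hypothetical.

```python
from typing import Iterable, Optional


def find_reset_string(lines: Iterable[str]) -> Optional[str]:
    """Return the text after "asetus=" on the first matching line, or None."""
    prefix = "asetus="
    for line in lines:
        if line.startswith(prefix):
            # Drop the prefix and the trailing line break.
            return line[len(prefix):].rstrip("\r\n")
    return None


# Hypothetical file contents; a real mplinux.ini may look different.
sample_ini = ["other=1\n", "asetus=xxx\n"]
print(find_reset_string(sample_ini))
```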

<break time="xxx"/>
Empty tag. Inserts a pause whose length depends on the speech rate. The time attribute may contain one of the following values:
<break time="none"/>	No pause but affects synthesizer by breaking continuity of the voice.
<break time="x-small"/>	Very short pause, about 1/5 of the pause between sentences.
<break time="small"/>	Short pause, half of the pause between sentences.
<break time="medium"/>	Medium pause, same as the pause between sentences.
<break time="large"/>	Long pause, double the pause between sentences.
<break time="x-large"/>	Very long pause, four times the pause between sentences.

If the time attribute is invalid, the medium setting is used. Mikropuhe does not support the numeric values defined in VoiceXML/SSML.
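
The pause lengths listed above can be summarized as multiples of the pause between sentences. The Python sketch below is only an illustration: the names BREAK_FACTORS and break_factor are hypothetical, the factor 0.2 reflects the approximate 1/5 given above, and treating a numeric value as invalid is an assumption.

```python
# Pause length relative to the pause between sentences.
BREAK_FACTORS = {
    "none": 0.0,
    "x-small": 0.2,  # "about 1/5", so this value is approximate
    "small": 0.5,
    "medium": 1.0,
    "large": 2.0,
    "x-large": 4.0,
}


def break_factor(time: str) -> float:
    """Return the relative pause length, falling back to medium if invalid."""
    return BREAK_FACTORS.get(time, BREAK_FACTORS["medium"])


print(break_factor("large"))
```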

<audio src="file.wav"/>
<audio src="file.wav">Text to speak if file not found</audio>
Plays the given wav file in the middle of synthesized text. The sample size and sample frequency are converted to suit the audio format currently in use. If the file name does not contain a folder (the characters / and \), the file is loaded from the Mikropuhe folder.
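
The folder rule above can be sketched in Python as follows. This is a model of the described behaviour, not Mikropuhe's own code; the function name resolve_audio_path and the folder /opt/mikropuhe are hypothetical.

```python
def resolve_audio_path(src: str, mikropuhe_dir: str) -> str:
    """Return the path a file name would be loaded from under the rule above."""
    # A name containing / or \ already specifies a folder and is used as given.
    if "/" in src or "\\" in src:
        return src
    # Otherwise the file is looked up in the Mikropuhe folder.
    return mikropuhe_dir.rstrip("/") + "/" + src


print(resolve_audio_path("file.wav", "/opt/mikropuhe"))
```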

<beep freq="frequency" length="milliseconds" volume="percent"/>
Empty tag. Plays a beep. A volume of 100% is full volume.
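
Several beep tags in a row produce a simple sound effect. The Python sketch below generates a linear frequency sweep as a string of beep tags; the function name beep_sweep and its default values are hypothetical and chosen only for illustration.

```python
def beep_sweep(start_freq: int, end_freq: int, steps: int,
               length_ms: int = 100, volume: int = 100) -> str:
    """Return `steps` beep tags with evenly spaced frequencies."""
    if steps < 2:
        raise ValueError("a sweep needs at least two steps")
    tags = []
    for i in range(steps):
        # Linear interpolation between the first and the last frequency.
        freq = round(start_freq + (end_freq - start_freq) * i / (steps - 1))
        tags.append(f'<beep freq="{freq}" length="{length_ms}" volume="{volume}"/>')
    return "".join(tags)


print(beep_sweep(400, 800, 3))
```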

<emph>words to emphasize</emph>
Emphasizes every enclosed word. (The effect is not very strong.)

<bookmark mark="number"/>
Not used in Linux version.

<silence msec="500"/>
Empty tag. Adds the given amount of silence in milliseconds (prefer the break tag, because its length depends on the speech rate).

<pitch absmiddle="value">
Scoped or empty tag. Sets the pitch relative to the voice's default pitch. Value between -10..0..10.

<pitch middle="value">
Scoped or empty tag. Sets the pitch relative to the current pitch. Very useful if you want to <pitch middle="3">lift the pitch</pitch> for a moment. Value between -10..0..10.

<rate absspeed="value">
<rate speed="value">
Like pitch, but sets the speech rate.
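
A program that generates pitch and rate tags may want to keep its values inside the documented -10..10 range. The Python sketch below clamps the value before building a scoped tag. Clamping is a choice made for this sketch (this document does not say how Mikropuhe treats out-of-range values), and the helper names are hypothetical.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the inclusive range low..high."""
    return max(low, min(high, value))


def pitch(text: str, middle: int) -> str:
    """Change the pitch relative to the current pitch for the given text."""
    return f'<pitch middle="{clamp(middle, -10, 10)}">{text}</pitch>'


def rate(text: str, speed: int) -> str:
    """Change the speech rate relative to the current rate for the given text."""
    return f'<rate speed="{clamp(speed, -10, 10)}">{text}</rate>'


print(pitch("lift the pitch", 3))
```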

<spell>
Scoped or empty tag. Spells every character between the tags <spell> and </spell> according to the spelling list in the *.tfg file. For example:
The letters are: <spell>abcdefg etc.</spell>

<volume level="value">
Scoped or empty tag. Sets the volume as a percentage. 100 is the maximum (default) and zero is complete silence. The volume setting does not affect the mixer, only the digital signal.

<voice name="name">
Scoped or empty tag. Selects the voice. The name is one of the voice files (*.pu5) found in the Mikropuhe folder. The Linux version keeps the two most recently used voices in memory, which makes switching between two voices quick. For example:
<voice name="petteri">Petterin ääni </voice><voice name="saga">Sagan ääni </voice>
<voice name="petteri"/>Petteri's voice is now the default.
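
Why alternating between two voices is quick, while a third voice causes a reload, can be modelled with a two-entry cache. The Python sketch below is only a model under that assumption; how Mikropuhe manages voices internally is not described here, and the class name VoiceCache is hypothetical.

```python
from collections import OrderedDict


class VoiceCache:
    """Model of a cache holding the most recently used voices."""

    def __init__(self, capacity: int = 2) -> None:
        self.capacity = capacity
        self._voices: OrderedDict = OrderedDict()

    def select(self, name: str) -> bool:
        """Select a voice; return True if it was already in memory."""
        hit = name in self._voices
        if hit:
            # Mark the voice as the most recently used one.
            self._voices.move_to_end(name)
        else:
            # Placeholder for loading the voice file from disk.
            self._voices[name] = f"{name}.pu5"
            if len(self._voices) > self.capacity:
                # Drop the least recently used voice.
                self._voices.popitem(last=False)
        return hit


cache = VoiceCache()
for voice in ["petteri", "saga", "petteri", "mikko", "saga"]:
    print(voice, "already in memory" if cache.select(voice) else "loaded")
```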

<mikropuhe svt="value"/>
Empty tag. Sets the pause between words. Value between 0..24.

<mikropuhe rau="value"/>
Empty tag. Sets the calmness of the speech (its intonation), like Mikropuhe's -RAU parameter. Value between (changing) -12..0..12 (monotonic).

<mikropuhe ven="value"/>
Empty tag. Sets "stretching" parameter. Value between -50..0..50.

<mikropuhe sat="value"/>
Empty tag. Sets "random" parameter. Value between 0..100.

<mikropuhe aku="value"/>
Empty tag. Sets "donald duck" parameter. Value between 0..1.

<mikropuhe rob="value"/>
Empty tag. Sets "robot" parameter. Value between 0..2.
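
The value ranges of the <mikropuhe .../> parameters above can be collected into one table and checked before a tag is generated. The Python sketch below is an illustration only: the names MIKROPUHE_RANGES, in_range and mikropuhe_tag are hypothetical, and rejecting out-of-range values is a choice of this sketch, since this document does not say how Mikropuhe itself handles them.

```python
# Inclusive value ranges as documented above.
MIKROPUHE_RANGES = {
    "svt": (0, 24),     # pause between words
    "rau": (-12, 12),   # intonation: changing .. monotonic
    "ven": (-50, 50),   # "stretching"
    "sat": (0, 100),    # "random"
    "aku": (0, 1),      # "donald duck"
    "rob": (0, 2),      # "robot"
}


def in_range(param: str, value: int) -> bool:
    """Return True if value lies inside the documented range of param."""
    low, high = MIKROPUHE_RANGES[param]
    return low <= value <= high


def mikropuhe_tag(param: str, value: int) -> str:
    """Build an empty <mikropuhe .../> tag after checking the range."""
    if not in_range(param, value):
        raise ValueError(f"{param}={value} is outside {MIKROPUHE_RANGES[param]}")
    return f'<mikropuhe {param}="{value}"/>'


print(mikropuhe_tag("ven", 15))
```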

<mikropuhetul name="name">
Empty or scoped tag. Changes the text interpretation file (*.tul). If the name is empty, no interpretation is used at all. If the name does not contain a folder (recommended), the file is loaded from Mikropuhe's folder. Do not give the file extension (.tul). For example:
<mikropuhetul name="normaali"/>
<mikropuhetul name=""/>


<mpstereo pan="value">
Empty or scoped tag. Changes the stereo mode depending on the value:
0=(Double) mono
1=Only left
2=Only right
3=Double mono with the other channel in a different phase (useful for headphones?)
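
In program code, the four pan values are easier to read when they have names. The Python sketch below is one possible way to do this; the enum StereoMode, its member names and the helper stereo are hypothetical.

```python
from enum import IntEnum


class StereoMode(IntEnum):
    """Names for the pan values listed above."""
    MONO = 0             # (double) mono
    LEFT = 1             # only left
    RIGHT = 2            # only right
    PHASE_SHIFTED = 3    # double mono, other channel in a different phase


def stereo(text: str, mode: StereoMode) -> str:
    """Wrap text in a scoped mpstereo tag."""
    return f'<mpstereo pan="{int(mode)}">{text}</mpstereo>'


print(stereo("Vasemmalta.", StereoMode.LEFT))
```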


TAGGING SAMPLE TEXT

<mpstereo pan="0">Puhun keskellä.</mpstereo>
<mpstereo pan="1">Vasemmalta.</mpstereo>
<mpstereo pan="2">Oikealta.</mpstereo>
<mpstereo pan="3">Ja toinen kanava eri vaiheessa.</mpstereo>
Esittelen nyt tulkinnan parametreja. Oletustulkinta: A, b, c, <mikropuhetul name="">Ei tulkintaa: A, b, c, </mikropuhetul>Paluu oletukseen: a, b, c, <mikropuhetul name="sanokaik"/>Sanokaik.tul: A, b, c, <reset/>reset tuli: A, b, c. Sitten uusi break-parametri <break time="small"/>lyhyt ja <break time="x-large"/> pitkä.
<beep freq="400" length="100" volume="100"/><beep freq="423" length="100" volume="85"/><beep freq="449" length="100" volume="70"/><beep freq="476" length="100" volume="55"/><beep freq="504" length="100" volume="40"/><beep freq="534" length="100" volume="25"/><beep freq="566" length="100" volume="10"/><beep freq="599" length="100" volume="25"/><beep freq="635" length="100" volume="40"/><beep freq="673" length="100" volume="55"/><beep freq="713" length="100" volume="75"/><beep freq="755" length="100" volume="85"/><beep freq="800" length="100" volume="100"/>
Osaan piipityksen lisäksi myös soittaa wav-tiedostoja.
Tässä stereoni. <audio src="esimerkki.wav">Yritin juuri soittaa esimerkki.wav</audio>

Seuraava <emph>painotus on jo kuuluva</emph>. Pidän sekunnin paussin <silence msec="1000"/> juuri äsken ja lyhyen <break time="small"/> äsken. Asetan <pitch absmiddle="5">korkeuden arvoon 5. </pitch> Palasin hetkeksi peruskorkeuteen, jonka <pitch middle="-4">pudotin</pitch> hetkeksi neljä pykälää alaspäin. Nyt esittelen nopeuteni<rate absspeed="10">tämä on aivan sairaan nopeaa puhetta, 123 123 <rate absspeed="-10"> ja tämä taas todella hidasta.</rate></rate> Äsken käytettiin sisäkkäisiä nopeustageja ja nyt puhun taas perusnopeudella. Tavaan joutessani vähän aakkosten alkua, <spell>abcdef</spell>, jonka jälkeen voisinkin olla <volume level="25">hetken varsin hiljaa.</volume> Niin ja puheääniä löytyy: <voice name="mikko">Mikko, </voice><voice name="petteri">Petteri, </voice><voice name="saga"> ja Saga.</voice> <mikropuhe svt="20"/>Näin lopuksi lisätään sanavälitaukoa<mikropuhe svt="0"/><mikropuhe rau="99"/> ja asetetaan synteesi monotoniseksi. <mikropuhe rau="0"/>Näille Mikropuheen omille säädöille voidaan myös rakentaa lopputagit, jos tarvetta on.

Venytyssäädöllä voi saada miellyttävämmän puheäänen. Esittelen tekijöiden suosikkiäänen:
<voice name="riku">
<mikropuhe ven="15"/>
<mikropuhe rau="-8"/>
<pitch absmiddle="-3">

Tässä on venytys arvossa 15, rauhallisuutta on -8, eli korkeus vaihtelee runsaasti ja peruskorkeus on arvossa -3.

</pitch>
</voice>
<mikropuhe ven="0"/>
<mikropuhe rau="0"/>

<voice name="saga">Kiitos kuuntelemisesta <voice name="petteri">minunkin puolestani.</voice></voice>
