<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<!--

/**
 * Copyright 2003 Sun Microsystems, Inc.
 *
 * See the file "license.terms" for information on usage and
 * redistribution of this file, and for a DISCLAIMER OF ALL
 * WARRANTIES.
 */

-->

<html>
    <head><title>FestVox to FreeTTS</title></head>
    <body>
        <center>
            <table bgcolor="#FFCC66" width="100%">
                <tr>
                    <td align=center width="100%">
                        <h1>FestVox To FreeTTS</h1>
                    </td>
                </tr>
            </table>
        </center>

        <p>As of version 1.2, FreeTTS can import voice data directly
        from FestVox.  The process currently works well for US English
        voices, and we encourage you to help us make it work for other
        locales.  This page describes the overall import process.</p>

        <h3>Creating a Voice</h3>
        <p>You must first create a voice using
        <a href="http://festvox.org">FestVox</a>.  We've had success
        using FestVox 2.0 on both Linux (RedHat 9.0) and Solaris (use
        gcc 3.2.2 to compile FestVox and Festival on Solaris).
        <b>NOTE that we did not create FestVox, nor can we provide
        support for it.</b>  The creators of FestVox, however, did a
        great job, and their documentation explains where to send
        any questions or comments.</p>

        <p>FestVox currently supports creating two types of voices:
        diphone and unit selection.  Diphone voices support general
        domain synthesis (i.e., they try to speak any text you throw
        at them).  They are time-consuming to create and are usually
        not a good first choice when learning how to create voices.
        Unit selection, or limited domain, voices support only a
        limited domain (e.g., telling the time) and generally sound
        very good.</p>

        <p>If you want to experiment with voice creation and
        conversion, we recommend starting with a time-telling
        voice.</p>

        <p>Please refer to the <a href="http://festvox.org/bsv/">
        FestVox Documentation</a> for information on creating a voice.
        <a href="http://www.festvox.org/bsv/bsv-usukdiphone-ch.html">
        Section IV.19</a> of the FestVox documentation provides a
        good tutorial on making a US diphone voice, and
        <a href="http://www.festvox.org/bsv/x1003.html">
        Section II.5.6</a> provides a good tutorial on recording a
        cluster unit voice for the limited domain of telling
        the time.  <a href="http://www.festvox.org/bsv/bsv-ldom-ch.html">
        Section II.5</a> provides a good general explanation of
        creating a limited domain voice.</p>

        <h3>Importing a FestVox Voice into FreeTTS</h3>
        <p>FreeTTS follows many of the same steps that
        <a href="http://cmuflite.org">Flite</a> follows for importing
        voices.  For a more detailed description of the process,
        please read
        <a href="http://www.speech.cs.cmu.edu/flite/doc/flite_8.html#SEC14">
        Section 8</a> of the
        <a href="http://www.speech.cs.cmu.edu/flite/doc/index.html">
        Flite documentation</a>.

        <p>To import a voice into FreeTTS, you first need to do the
        following:
        <ol>
            <li>Compile <a href="http://festvox.org">Festival 1.4.3 and
            FestVox 2.0</a>, as well as the speech tools that come with
            Festival.  Refer to the Festival documentation for details
            on setting this up on your system.  We've only built
            Festival and FestVox on RedHat 9.0 and Solaris.  For both
            systems, we used gcc 3.2.2.

            <li>"festival", "ant", "java", and "javac" must be in your
            PATH.  For example, we used the following command under bash
            on RedHat (adjust the paths for your system):
            <ul>
               <p><code>export
               PATH=/usr/java/j2sdk1.4.2/bin:/home/jim/festival/bin:/usr/java/apache-ant-1.5.4/bin:$PATH</code>
            </ul>

            <li>You must set the ESTDIR environment variable to point
            to the speech tools directory.  For example:
            <ul>
               <p><code>export
               ESTDIR=/home/jim/speech_tools</code>
            </ul>
        </ol>

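        <p>Before running the conversion, you may want to confirm
        that these prerequisites are in place.  The following shell
        function is a minimal sketch and is not part of FreeTTS; the
        name <code>check_prereqs</code> is hypothetical.  It only
        checks that the four commands are on your PATH and that
        ESTDIR names an existing directory; it does not check the
        Festival, FestVox, or gcc versions.</p>
<pre>
# check_prereqs: hypothetical helper, not shipped with FreeTTS.
# Prints one line per problem found; returns non-zero if any were found.
# The ( ) body runs in a subshell, so its variables do not leak.
check_prereqs() (
    status=0
    for cmd in festival ant java javac; do
        if ! command -v "$cmd" >/dev/null; then
            echo "missing from PATH: $cmd"
            status=1
        fi
    done
    if [ -z "${ESTDIR:-}" ]; then
        echo "ESTDIR is not set"
        status=1
    elif [ ! -d "$ESTDIR" ]; then
        echo "ESTDIR is not a directory: $ESTDIR"
        status=1
    fi
    exit "$status"
)

# Example: check_prereqs || echo "fix the problems above first"
</pre>
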
        <p>To convert a voice, run the
        <code>FestVoxToFreeTTS.sh</code> script from a command
        prompt in the <code>tools/FestVoxToFreeTTS</code>
        directory:
        <ul>
           <p><code>FestVoxToFreeTTS.sh &lt;voicedir></code>
        </ul>
        <p>where &lt;voicedir> is the directory in which the FestVox
        voice resides.  The contents of &lt;voicedir> will look
        something like the following:</p>
        <ul>
<pre>
bin/  etc/       FreeTTS/  lpc/     prompt-cep/  recording/  wav/
cep/  f0/        group/    mcep/    prompt-lab/  scratch/    wavn/
dic/  festival/  lab/      pm/      prompt-utt/  sts/        wrd/
emu/  festvox/   lar/      pm_lab/  prompt-wav/  versions/
</pre>
        </ul>

        <p>The script automatically detects whether it is a
        cluster unit voice or a diphone voice by reading the
        &lt;voicedir>/etc/voice.defs file.  If no such file exists,
        you will need to create it.  An example for a time-telling
        voice would be something like the following:

        <ul>
<pre>
FV_INST=sun
FV_LANG=time
FV_NAME=dtv
FV_TYPE=ldom
FV_VOICENAME=$FV_INST"_"$FV_LANG"_"$FV_NAME
FV_FULLVOICENAME=$FV_VOICENAME"_"$FV_TYPE
</pre>
        </ul>

        <p>If possible, let FestVox generate this file for you:  try
        running <code>&lt;festvoxdir>/src/general/guess_voice_defs</code>.

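        <p>Because voice.defs consists of shell variable assignments,
        one way to see what it defines is to source it in a subshell.
        The function below is a hypothetical sketch, not code taken
        from <code>FestVoxToFreeTTS.sh</code>.  It assumes that
        FV_TYPE is "ldom" or "clunits" for cluster unit voices and
        "diphone" for diphone voices; the script's actual detection
        logic may differ.  For the example file above it prints
        <code>sun_time_dtv_ldom clunit</code>.</p>
<pre>
# describe_voice: hypothetical helper; $1 is the FestVox voice directory.
# Prints FV_FULLVOICENAME followed by a guessed voice type.
# The ( ) body runs in a subshell, so sourcing voice.defs does not
# leave FV_* variables behind in the calling shell.
describe_voice() (
    defs="$1/etc/voice.defs"
    if [ ! -f "$defs" ]; then
        echo "missing: $defs"
        exit 1
    fi
    . "$defs"
    # ASSUMPTION: this mapping of FV_TYPE values to voice types.
    case "${FV_TYPE:-}" in
        ldom|clunits) kind=clunit ;;
        diphone)      kind=diphone ;;
        *)            kind=unknown ;;
    esac
    echo "${FV_FULLVOICENAME:-} $kind"
)
</pre>
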
        <p>The script will create a new directory,
        <code>&lt;voicedir>/FreeTTS/</code>.  That directory holds the
        text file containing all the data for the voice (along with a
        few other intermediate files).  The voice file will have a
        name such as <code>sun_time_dtv.txt</code>.

        <p>Individual stages of the conversion process can be run
        directly by passing a second argument to
        <code>FestVoxToFreeTTS.sh</code>, such as "sts" or "mcep".
        Use this option with care.  More information on these
        stages can be found in the Flite documentation.

        <p>If you do not pass a second argument (recommended), the
        conversion tool runs the processing stages in the
        following order: "lpc", "sts", "mcep" (cluster unit
        voices only), "idx", "install", and "compile".  The "install"
        and "compile" stages are specific to FreeTTS and are not
        mentioned in the Flite documentation.  They build the
        framework for the voice within FreeTTS and compile the
        result.

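        <p>The default stage order can be captured in a small helper.
        This is a hypothetical sketch, not part of
        <code>FestVoxToFreeTTS.sh</code>: it only encodes the order
        listed above, with "mcep" included for cluster unit voices
        only.  The commented loop shows how you might run the stages
        one at a time, assuming each stage name is accepted as the
        second argument and that <code>voicedir</code> holds your voice
        directory; as noted above, do this with care.</p>
<pre>
# conversion_stages: hypothetical helper; $1 is "clunit" or "diphone".
# Prints the default stage order, omitting mcep for diphone voices.
conversion_stages() {
    if [ "$1" = clunit ]; then
        echo "lpc sts mcep idx install compile"
    else
        echo "lpc sts idx install compile"
    fi
}

# Example, run from tools/FestVoxToFreeTTS; stops at the first failing stage:
# for stage in $(conversion_stages clunit); do
#     ./FestVoxToFreeTTS.sh "$voicedir" "$stage" || break
# done
</pre>
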
        <p>When the process reaches the install stage, you will
        see a menu.  The install stage only knows how to handle
        US English voices.  If your voice is for any other language
        or locale, you should probably exit at this step.
        Unfortunately, adding new languages or locales is beyond the
        scope of this document.

        <p>The menu allows you to define various features of the
        voice:
        <ul>
            <li><b>Name</b>: The name you want to give this voice,
            for example "kevin", "kevin16", "alan", or "dave".

            <li><b>Gender</b>: The gender of the voice.  Select
            help from the menu for a full listing of genders.

            <li><b>Age</b>: The age of the voice.  Select help
            from the menu for a full listing of ages.

            <li><b>Description</b>: A sentence or so that
            describes this voice for others.

            <li><b>Full Name</b>: The name that will be used to
            name the voice files and directory.  DON'T USE SPACES.
            It must be unique within this installation of FreeTTS,
            as well as within any other copy of FreeTTS with which
            you expect to use this voice.  For consistency with
            other voices, we strongly recommend that you not change
            this property unless it conflicts with an existing
            voice.  The name follows the convention
            <code>&lt;domain>_&lt;locale>_&lt;name></code>.
            The &lt;name> does not have to match the Name
            property.  The domain generally matches an Internet
            domain or some other globally unique identity.  For
            limited domain voices, you might use the limited domain
            name instead of the locale.  Example names include
            <code>cmu_us_kal</code>, <code>cmu_time_awb</code>,
            and <code>sun_us_dtv</code>.

            <li><b>Domain</b>: The domain, if this is a limited
            domain (ldom) voice; otherwise, it must be set to "general".

            <li><b>Organization</b>: The organization that
            recorded the voice, for example "cmu" or "sun".
        </ul>

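        <p>The Full Name rules above can be checked before you reach
        the menu.  The function below is a hypothetical sketch, not
        part of FreeTTS.  It rejects whitespace and assumes the name
        has exactly three non-empty, underscore-separated fields, as
        in the example names above; it does not check uniqueness.</p>
<pre>
# check_full_name: hypothetical helper; $1 is the proposed Full Name.
# Returns 0 for names like sun_time_dtv; otherwise prints why and returns 1.
# ASSUMPTION: exactly three fields (domain_locale_name), none empty.
check_full_name() {
    case "$1" in
        *[[:space:]]*) echo "invalid: contains whitespace" ;;
        *_*_*_*)       echo "invalid: more than three fields" ;;
        ?*_?*_?*)      return 0 ;;
        *)             echo "invalid: expected domain_locale_name" ;;
    esac
    return 1
}
</pre>
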
        <p>If a voice with the same Full Name already exists,
        you are given the option to overwrite it, cancel, or change
        the properties.

        <p>When this is done, the voice is put into the FreeTTS
        directory structure at
        <code>&lt;FreeTTSdir>/com/sun/speech/freetts/en/us/&lt;voice
        Full Name></code>.  We recommend visiting this directory
        to confirm that everything looks correct; there should be
        four files similar to the following:
        <pre>
    README                 - Information about the voice
    sun_time_dtv.txt       - The imported voice data in ASCII format
    voice.Manifest         - The Manifest file with which to create the jar file
    DtvVoiceDirectory.java - The VoiceDirectory for this new voice
        </pre>

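        <p>To automate the directory check described above, you could
        use something like the following hypothetical function (not
        part of FreeTTS).  Because the data file and the VoiceDirectory
        class are named after the voice, it only checks that at least
        one file matches each of <code>*.txt</code> and
        <code>*VoiceDirectory.java</code>.</p>
<pre>
# check_voice_dir: hypothetical helper; $1 is the installed voice directory,
# i.e. FREETTS_DIR/com/sun/speech/freetts/en/us/FULL_NAME.
# Prints one line per missing file and returns non-zero if any are missing.
check_voice_dir() (
    cd "$1" || exit 1
    status=0
    for f in README voice.Manifest; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    for pattern in '*.txt' '*VoiceDirectory.java'; do
        # Unquoted expansion globs; with no match the pattern stays literal.
        set -- $pattern
        if [ ! -f "$1" ]; then
            echo "missing: $pattern"
            status=1
        fi
    done
    exit "$status"
)
</pre>
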
        <p>If this is a limited domain voice for something other
        than the CMU time domain, you will likely have to make some
        changes so that it uses the correct lexicon.

        <p>As part of the import process, the FestVoxToFreeTTS.sh
        script creates the jar file for the voice.  If you wish
        to create the jar file manually, you can run one of the
        following commands, the first for a cluster unit voice and
        the second for a diphone voice (substitute the Full Name of
        the voice you imported):
        <pre>
    ant -Dclunit_voice=sun_time_dtv -find build.xml
    ant -Ddiphone_voice=sun_us_dtv -find build.xml
        </pre>

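        <p>If you rebuild voice jars by hand often, a small wrapper can
        pick the right property for the voice type.  This is a
        hypothetical sketch, not part of FreeTTS: it takes a voice type
        of "clunit" or "diphone" and, like the commands above, relies
        on ant's <code>-find</code> option to locate build.xml.
        Setting the hypothetical variable DRY_RUN=1 prints the command
        instead of running it.</p>
<pre>
# build_voice_jar: hypothetical helper; $1 = voice Full Name,
# $2 = "clunit" (cluster unit voice) or "diphone".
# The ( ) body runs in a subshell, so "prop" does not leak.
build_voice_jar() (
    case "$2" in
        clunit)  prop=clunit_voice ;;
        diphone) prop=diphone_voice ;;
        *)       echo "unknown voice type: $2"; exit 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ant -D$prop=$1 -find build.xml"
    else
        ant "-D$prop=$1" -find build.xml
    fi
)

# Example: DRY_RUN=1 build_voice_jar sun_time_dtv clunit
</pre>
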
        <p>The compiled voice is put in
        <code>&lt;FreeTTSdir>/lib/&lt;voice Full Name>.jar</code>.

        <p>The voice will automatically be added to the list of
        available voices for FreeTTS.

        <p>You can now test your voice with:
        <ul>
            <p><code>java -jar lib/freetts.jar myvoicename</code>
            (general domain)
            <p><code>java -jar bin/JTime.jar myvoicename</code>
            (time domain)
        </ul>
        <p>where myvoicename is the Name property you assigned
        to your voice in the "install" stage.  If you've forgotten
        the name, you can always retrieve it by executing the jar
        file for your voice:
        <ul>
            <p><code>java -jar lib/&lt;voice Full Name>.jar</code>
        </ul>

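        <p>Choosing the test program can also be scripted.  The
        function below is a hypothetical sketch, meant to be run from
        &lt;FreeTTSdir>.  It first checks that the voice jar exists in
        <code>lib/</code>, then prints the matching test command from
        the list above.  It assumes that a time-telling voice was
        given the Domain "time"; other limited domains have no test
        program listed here.</p>
<pre>
# test_voice_command: hypothetical helper, run from the FreeTTS directory.
# $1 = voice Full Name, $2 = Name property, $3 = Domain property.
test_voice_command() {
    if [ ! -f "lib/$1.jar" ]; then
        echo "not found: lib/$1.jar"
        return 1
    fi
    case "$3" in
        general) echo "java -jar lib/freetts.jar $2" ;;
        time)    echo "java -jar bin/JTime.jar $2" ;;
        *)       echo "no test program listed for domain: $3"; return 1 ;;
    esac
}

# Example (the Name "dtv" is hypothetical): test_voice_command sun_time_dtv dtv time
</pre>
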
        <h3>Files in this directory</h3>
        <ul>
            <li><b>FestVoxToFreeTTS.sh</b>: The bash script that
            performs the conversion process.
            <li><b>FestVoxClunitsToFreeTTS.scm</b>: Performs the idx
            stage of the conversion for cluster unit voices.
            <li><b>FestVoxDiphoneToFreeTTS.scm</b>: Performs the idx
            stage of the conversion for diphone voices.
            <li><b>qsort.scm</b>: A simple quicksort implementation in
            Scheme.
            <li><b>FindSTS.java</b>: Generates the sts file for a
            given recording.  Used by FestVoxToFreeTTS.sh.
            <li><b>FindSTS.jar</b>: A compiled version of FindSTS.java
            (automatically generated).
            <li><b>README</b>: This file.
            <li><b>CMU_USDiphoneTemplate.java</b>: A template voice
            directory for en/us diphone voices.
            <li><b>CMU_USTimeTemplate.java</b>: A template voice
            directory for en/us time limited domain cluster unit
            voices.
            <li><b>VoiceMakefileTemplate.txt</b>: A template Makefile for
            both ldom and diphone voices.
        </ul>

        <hr>

        <p>See the <a href="../../license.terms">license terms</a>
        and <a href="../../acknowledgments.txt">acknowledgments</a>.
        <br>
        Copyright 2003 Sun Microsystems, Inc.  All Rights
        Reserved.  Use is subject to license terms.</p>
    </body>
</html>