How to Use Models from SphinxTrain in Sphinx-4

Using a new model is easy: you just need to configure the recognizer properly. This usually involves three steps:

Take an existing configuration file

As a base for building your application config, take any existing configuration. If you trained a small vocabulary task, take the configuration from the HelloWorld demo; if you trained a large vocabulary task, take the config from the Lattice demo.

Define a dictionary and a language model

You can use the same phonetic dictionary and the same language model that were used for training and initial testing. They are located in the folder <your_training_folder>/etc/ and have names like <your_model_name>.dic and <your_model_name>.lm.DMP. If you don't have an LM yet, you can create one with cmuclmtk and later convert it to DMP format with sphinx_lm_convert from the sphinxbase package. Make the following changes in the model and dictionary configuration, just point to the files:
  <component name="trigramModel" type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">
    <property name="unigramWeight" value="0.7"/>
    <property name="maxDepth" value="3"/>
    <property name="logMath" value="logMath"/>
    <property name="dictionary" value="dictionary"/>
    <property name="location" 
	value="the name of the language model file 
	       for example <your_training_folder>/etc/<your_model_name>.lm.DMP"/>
  </component>

  <component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <property name="dictionaryPath"
	 value="the name of the dictionary file 
	       for example <your_training_folder>/etc/<your_model_name>.dic"/>
    <property name="fillerPath" 
	 value="the name of the filler file 
	       for example <your_training_folder>/etc/<your_model_name>.filler"/>
    <property name="addSilEndingPronunciation" value="false"/>
    <property name="allowMissingWords" value="false"/>
    <property name="unitManager" value="unitManager"/>
  </component>
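If you still need to build the language model, the cmuclmtk step mentioned above can be sketched roughly as follows. All file names here (corpus.txt, your_model.lm, and so on) are placeholder examples, and the exact options may differ between cmuclmtk versions:

```shell
# Build a vocabulary and an ARPA-format trigram LM from a plain-text corpus.
text2wfreq < corpus.txt | wfreq2vocab > corpus.vocab
text2idngram -vocab corpus.vocab -idngram corpus.idngram < corpus.txt
idngram2lm -vocab_type 0 -idngram corpus.idngram -vocab corpus.vocab \
           -arpa your_model.lm

# Convert the ARPA LM to the binary DMP format that Sphinx-4 loads.
sphinx_lm_convert -i your_model.lm -o your_model.lm.DMP
```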

Define an acoustic model

Next is the acoustic model. During training several models are created; you need only one of them. For a large vocabulary task the cd (context dependent) model is located in <your_training_folder>/model_parameters/<your_db_name>.cd_cont_<number of senones>.

For a small vocabulary task it's enough to take the ci (context independent) model. It's located in <your_training_folder>/model_parameters/<your_db_name>.ci_cont.

This folder should include several files, such as means, variances, feat.params, and mdef. There will also be folders for different numbers of Gaussians, like _2, _4, _8; these are intermediate models and you don't need them.
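As an illustration, a trained context dependent model folder would look roughly like this (a model trained with 1000 senones is assumed here; the exact file set can vary between SphinxTrain versions):

```
$ ls <your_training_folder>/model_parameters/<your_db_name>.cd_cont_1000
feat.params  mdef  means  mixture_weights  noisedict  transition_matrices  variances
```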

Again, let's define the model in the config file:

  <component name="sphinx3Loader"
             type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
    <property name="location"
	 value="the path to the model folder
	       for example <your_training_folder>/model_parameters/<your_model_name>.cd_cont_<senones>"/>
  </component>

  <component name="acousticModel" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
    <property name="loader" value="sphinx3Loader"/>
    <property name="unitManager" value="unitManager"/>
  </component>

Please note that the path value is just a URI, so it can start with a URI scheme such as http://
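For instance, either of the following would be a valid way to point the loader at a model (the paths are hypothetical):

```
<property name="location" value="file:///home/user/training/model_parameters/my_model.cd_cont_1000"/>
<property name="location" value="http://example.com/models/my_model.cd_cont_1000"/>
```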

Note that for an MLLT model you probably also want to change the vectorLength property. Otherwise it's not needed.
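For example, if your MLLT transform produces 32-dimensional features (32 is only an illustration; use the dimension from your own training setup), add the property to the loader:

```
  <component name="sphinx3Loader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    ...
    <!-- feature dimension after the MLLT transform; 32 is just an example -->
    <property name="vectorLength" value="32"/>
  </component>
```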

Optionally, configure the frontend. Skip this step for the usual 16 kHz model.

If you trained an 8 kHz model or an MLLT model, you need to change the frontend accordingly. Here are the required changes:

  <component name="mfcFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
    <propertylist name="pipeline">
       ....
      <item>melFilterBank</item>
       ....
      <item>lda</item>
    </propertylist>
  </component>
  
  <component name="melFilterBank" type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
    <property name="numberFilters" value="31"/>
    <property name="minimumFrequency" value="200"/>
    <property name="maximumFrequency" value="3500"/>
  </component>

  <component name="lda" type="edu.cmu.sphinx.frontend.feature.LDA">
      <property name="loader" value="sphinx3Loader"/>
  </component>

The melFilterBank parameters here are changed to defaults suitable for 8 kHz frequencies, and the lda component is introduced to transform the feature space with the MLLT matrix.

That's all. Start your application with the new configuration and it will recognize using the new model.

For more information on configuration, see the Javadoc and the Programmer's Documentation.

Models in JAR

Optionally, you can pack the models into a JAR file. The advantage of a JAR file is that it can simply be included in the classpath and referenced in the configuration file for use in a Sphinx-4 application. To configure loading from JARs, Sphinx-4 allows URIs of the form resource:<acoustic or language model path>, which lets XML config files easily reference models in JAR files. The resource:/path scheme causes Sphinx-4 to search the classpath for that path. See our demos for an example of how the WSJ model files are loaded from the WSJ jar.
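For example, if the model folder is packed at the root of a JAR that is on the classpath (e.g. with jar cf my_model.jar -C model_parameters my_model.cd_cont_1000; all names here are hypothetical), the loader can reference it like this:

```
  <component name="sphinx3Loader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
    <!-- resolved on the classpath, so the JAR just needs to be on it -->
    <property name="location" value="resource:/my_model.cd_cont_1000"/>
  </component>
```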