NOTICE: The Processors Wiki will End-of-Life on January 15, 2021. It is recommended to download any files or other content you may need that are hosted on processors.wiki.ti.com. The site is now set to read only.

C5000 TI embedded speech recognizer (TIesr) FAQ

From Texas Instruments Wiki

Where can I find the C5535 ezdsp TIesr TI design?[edit]

The Speech Recognition Reference Design for the C5535 eZdsp can be found at TIDEP-0066.
The source code for the demo is located in the C5000 CSL, which can be downloaded from C5000 CSL. Once installed, the source files are located at C:\ti\c55_lp\c55_csl_3.07\demos\TIesr\c5535

How do you modify the trigger phrase?[edit]

The trigger phrase can be modified as needed. See section 5.3.3 of the TI design guide.

What does _Fill mean?[edit]

The _Fill word is a "filler" model meant to account for ("explain") all speech other than the trigger phrase.

I want to create a dictionary entry for _fil. How do I do that?[edit]

The C55 demo reference design uses the TIesr recognizer with a dictionary and model set that includes the special word "_FILL", whose pronunciation is defined using a special _fil "phone model". The "_FILL" word can be used when developing a user-specific TIesr grammar network text file for recognition with the model and dictionary set supplied with the demo. The documentation describes how to create a TIesr grammar network text file.
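As a hypothetical illustration (the phrase "hello computer" is an assumption, not part of the demo), a grammar network text file that uses the supplied "_Fill" word for out-of-vocabulary rejection might look like:

```
start( WakeGram ).
WakeGram ---> ( [_Fill] Phrase [_Fill] ) | _Fill.
Phrase ---> hello computer.
```

The square brackets mark _Fill as optional before and after the phrase, and the standalone _Fill alternative lets the recognizer explain utterances that contain no trigger phrase at all.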

To create a new _fil "phone model", for example for another language, requires detailed knowledge of the TIesr recognizer and support tools. One must create a new model and dictionary set with support for a _fil model, and train the _fil model over a large speech corpus. Creating such sets is described in the documentation and examples that accompany the open source TIesr recognizer. The exact training procedure and form of the _fil model must be defined by the user. Implementing such training assumes that the TIesr user is familiar with automatic speech recognition processing and the development of such systems.

As per my understanding, _fil is modeled just like any other phone. So should I treat _fil as both a word and a phone in my dictionary, and model it with audio recordings?[edit]

There are a large number of ways to create a filler/garbage model capability. One way that I did this was to add a word, "_Fill" to the dictionary with the pronunciation using a single "_fil" phone. I modeled the _fil phone as an HMM with a single emitting state, and a number of Gaussian mixture components (hence a Gaussian mixture model). To train the _fil phone, I did a special training in HTK where in the training master label file I replaced all phone models with _fil, and then ran several passes of training. The idea was that this would train the _fil model to represent the broad acoustic classes of all speech in the database. Then to perform recognition in TIesr I could place the _Fill word in the grammar network wherever I wanted to perform out-of-vocabulary discrimination.
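The master-label-file relabeling step described above can be sketched in Python. This is a minimal illustration, not part of the TIesr or HTK tools: it assumes a phone-level HTK MLF with one label per line, optionally preceded by start/end times, and the function name is hypothetical:

```python
def relabel_mlf_to_fil(lines):
    """Replace every phone label in an HTK master label file (MLF)
    with the single filler phone "_fil", keeping the MLF header,
    per-utterance label-file names, time stamps, and "." terminators."""
    out = []
    for line in lines:
        stripped = line.strip()
        # Keep the MLF header, utterance label-file names, and end markers.
        if stripped == "#!MLF!#" or stripped.startswith('"') or stripped == ".":
            out.append(stripped)
            continue
        fields = stripped.split()
        if len(fields) >= 3:
            # "start end label" form: keep the times, replace the label.
            out.append(f"{fields[0]} {fields[1]} _fil")
        else:
            # Bare label form.
            out.append("_fil")
    return out
```

Running several passes of HTK re-estimation on the relabeled transcriptions then trains the single _fil model over all speech in the corpus.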

Can I use more than 1 trigger phrase?[edit]

Yes, it is possible to do so. Below is an example of two trigger phrases ("TI voice trigger" and "hello world") as written in build_files.sh.

     "start( WakeGram ).
          WakeGram ---> [_Fill] Phrase [_Fill].
          Phrase ---> (T I voice trigger) | (hello world) | _Fill."
          Data/GramDir \
          Data/filler_model \
          English \
          2 0 1 0 0 0 0

What does each of the flags in build_files.sh mean?[edit]

Dist/LinuxDebugGnu/bin/testtiesrflex \
"start( WakeGram ).
WakeGram ---> ( [_Fill] Phrase [_Fill] ) | _Fill.
Phrase ---> t i voice trigger." \
Data/GramDir \
Data/filler_model \
English \
2 0 1 0 0 0 0

In the above example, 2 0 1 0 0 0 0 is mapped from left to right as follows:

max_pron: Maximum pronunciations per word to include in the output grammar network. The "2" defines the maximum number of different pronunciations of a word from the dictionary to include in the active grammar. For example, the word "the" can be pronounced two ways, "th uh" or "th ee". Both pronunciations are in the dictionary, with the more common pronunciation "th uh" coming first. If you set the parameter to 1, then only the pronunciation "th uh" is used in the recognition grammar; if it is set to 2, then both pronunciations are used. Of course, the number of pronunciations used for a word assumes that the pronunciations are available in the dictionary; if the dictionary contains fewer pronunciations than the parameter requests, all available pronunciations are used. If the word is not in the dictionary at all, then as a backup the TIesrDT module guesses the pronunciation from the word spelling. The number of actual active words during recognition is defined solely by the words required by the grammar network.
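The max_pron selection described above can be sketched as follows. The dictionary structure here is a hypothetical illustration (TIesrFlex uses its own compressed dictionary format):

```python
def select_pronunciations(dictionary, word, max_pron):
    """Return up to max_pron pronunciations for a word, in dictionary
    order (most common first). Returns an empty list for words not in
    the dictionary, which TIesrFlex would instead hand to TIesrDT to
    guess a pronunciation from the spelling."""
    prons = dictionary.get(word, [])
    return prons[:max_pron]

# Illustrative dictionary entry for "the" with two pronunciations.
dictionary = {"the": ["th uh", "th ee"]}
```

With max_pron=1 only "th uh" enters the grammar network; with max_pron=2 (or more) both pronunciations are used.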

inc_rule: Flag indicating to include decision tree rule pronunciation
auto_sil: Flag indicating to include optional silence between words
lit_end: Flag to output files in little-endian format
byte_mean: Output acoustic probability mean vectors as byte data
byte_var: Output acoustic probability variance vectors as byte data
add_close: (optional; default enabled) Add closure phones prior to stop consonants
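Putting the mapping above together, the trailing arguments "2 0 1 0 0 0 0" can be decoded as follows (a minimal sketch; the flag names follow the list above, and the helper name is hypothetical):

```python
def decode_testtiesrflex_flags(args):
    """Map the trailing numeric arguments of a testtiesrflex invocation
    to their flag names, left to right in the documented order."""
    names = ["max_pron", "inc_rule", "auto_sil", "lit_end",
             "byte_mean", "byte_var", "add_close"]
    values = [int(a) for a in args.split()]
    return dict(zip(names, values))
```

For the example invocation, decode_testtiesrflex_flags("2 0 1 0 0 0 0") yields max_pron=2, auto_sil=1, and all other flags 0.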

Why does TIesr have false triggers?[edit]

Acoustically rich keywords are best for response reliability. These are words with multiple syllables and distinctive sounds (e.g. "start recording" or "capture complete"). Operation will not be reliable at very high or very low input gains; you will have to experiment to find a sweet spot. Enabling filters or integrating noise reduction algorithms will also help improve accuracy.
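One simple way to experiment with input gain is to monitor the level of captured audio. The sketch below is a hypothetical illustration (not part of the TIesr demo) that flags blocks of 16-bit PCM samples that are likely clipping or too quiet; the thresholds are assumptions to tune per system:

```python
import math

def check_input_level(samples, clip_thresh=32000, quiet_rms=500):
    """Classify a block of 16-bit PCM samples as "clipping", "too quiet",
    or "ok", as a rough aid when searching for a gain sweet spot.
    The thresholds are illustrative assumptions, not TIesr parameters."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if peak >= clip_thresh:
        return "clipping"
    if rms < quiet_rms:
        return "too quiet"
    return "ok"
```

Adjusting the input gain until typical trigger-phrase utterances land in the "ok" band is one practical way to search for the sweet spot mentioned above.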

Also note that TIesr does not work well when English trigger phrases are spoken with a heavy accent.

Can I use TIesr in my product that needs voice triggering?[edit]

Yes, this is absolutely possible. The code was released with a license that allows customers to use the source code without paying TI any royalties.

Please bear in mind that TIesr is "demo" quality and was not intended to be used in a production environment. It was purely to demonstrate the voice triggering capabilities of the C5000 DSPs.
Sensory has a production quality voice triggering engine which can be run on the C5000 family of DSPs. Please contact Sensory or your local TI representative for more details.

How do I increase the accuracy of TIesr?[edit]

The TIesr recognizer consists of two main parts: TIesr_Flex, which creates word pronunciation networks and model sets from an input grammar, and TIesr_SI, which is the recognizer itself. The accuracy of TIesr depends on the acoustic models that TIesr_Flex uses. These acoustic models must be trained for the language and accents that will be encountered in an application. TIesr comes with acoustic models for general American English only, so new models may need to be trained.

You must be careful if you wish to change sampling rates. This requires that you also specify the proper frequency parameters of the front-end.

To operate well in a keyword spotting capacity, consider creating a unique Filler model that covers general sounds of the language that operates as an optional explanation of the input speech in the recognition grammar.

To obtain good recognition of single words, consider creating acoustic models that are word-based rather than the default TIesr phonetic based models. The documentation and examples that come with the TIesr open source project explain how this can be done.

TIesr includes the capabilities to change from one grammar to another at run time, by switching between different grammar and model set files. This may not be included in the C55 design TIDEP-0066. TIesr can also dynamically create grammars from input grammar text strings if TIesr_Flex is included in the executable, but that does require memory resources to store the acoustic models, dictionary and decision trees as well as the code.

Can TIesr be run on C6000?[edit]

TIesr is only supported on the C5000 DSPs (16-bit, fixed point).
There are currently no plans to extend TIesr to the C6000 line of DSPs.

When I build TIesrDemoC55, I get an error saying "Tag_Memory_Model attribute value of "2" that is different than one". How do I resolve this error?[edit]

This is due to a memory model mismatch between C55XXCSL_LP and TIesrDemoC55. The memory model needs to be HUGE for both.
You can change it by:

1) In the C55XXCSL_LP project, navigate to Properties → Build → C5500 Compiler → Processor Options. Change "Specify memory model" to huge.
2) In Properties → Build → C5500 Compiler → Advanced Options → Runtime Model Options, change "Specify type size to hold results of pointer math" to 32 if not already set.
3) Uncheck "use large memory model" if checked.
4) Rebuild C55XXCSL_LP.
5) Rebuild TIesrDemoC55.

