Accepting Kinect Speech Commands after a specific level of confidence

By | January 23, 2014


In my Kinect for Windows SDK Tips series, the last few posts covered speech recognition using the Kinect for Windows SDK. You have seen how to load and unload multiple grammars, how to use wildcards with the grammar builder, and how to get the list of recognized words from Kinect. This post is about the confidence level of recognized words, which I think matters for every type of speech-enabled Kinect application.

In a speech-enabled Kinect application, whenever speech is recognized, we usually invoke a method to parse the command and perform an action based on it. But before parsing, we have to make sure the speech was recognized with a sufficient level of confidence to be used by the application.

The RecognitionResult class has a property called Confidence that indicates how well the speech was recognized. It is a value assigned by the speech recognizer, and the application can use it to decide whether to accept the speech command. This float value ranges from 0 to 1, with higher values indicating higher confidence.

Related Read : Kinect for Windows SDK Tips and Tricks

The speech recognizer also reports when speech was detected but could not be matched reliably. If the speech does not match the grammar properly, or matches only with a very low confidence level, the SpeechRecognitionRejected event is raised.
Confidence Level Check - Kinect Speech Commands
Here is how you can check the confidence value before invoking the command parser. In the following code block the confidence threshold is set to 0.75; you can change it to suit your needs and the clarity you require.

private void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Minimum confidence (0 to 1) required before the command is parsed.
    const float confidenceThreshold = 0.75f;
    if (e.Result.Confidence > confidenceThreshold)
        Dispatcher.BeginInvoke(new Action<SpeechRecognizedEventArgs>(CommandsParser), e);
}

The CommandsParser method accepts a SpeechRecognizedEventArgs argument, which has a property called Result. In the CommandsParser method you have to parse each word to find a match (refer to my previous few posts on Kinect speech for the details of this parsing).
This ensures that your application accepts only speech recognized with sufficient quality.
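
If you want to try different threshold values without a sensor attached, one option is to pull the comparison out into a small helper of its own. The following is only a minimal sketch: ConfidenceGate and ShouldAccept are hypothetical names of mine, not part of the Kinect for Windows SDK or the Microsoft.Speech API, and the default of 0.75 is just the value used in this post.

```csharp
using System;

// Hypothetical helper (not part of the Kinect SDK or Microsoft.Speech):
// keeps the confidence check separate from the event types so it can be
// exercised with plain float values.
public static class ConfidenceGate
{
    // Returns true when the confidence is strictly above the threshold.
    // Both values are expected to lie in the range 0 to 1.
    public static bool ShouldAccept(float confidence, float threshold = 0.75f)
    {
        if (threshold < 0f || threshold > 1f)
        {
            throw new ArgumentOutOfRangeException(
                "threshold", "Threshold must be between 0 and 1.");
        }

        return confidence > threshold;
    }
}
```

Inside the recognized-speech handler, the check would then read `if (ConfidenceGate.ShouldAccept(e.Result.Confidence))`, and a stricter or looser value can be passed as the second argument for applications that need it.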