c# - Why is the Microsoft Speech Recognition SemanticValue.Confidence value always 1?

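I'm trying to use the SpeechRecognizer with a custom grammar to handle the following pattern:

"Can you open {item}?" where {item} uses DictationGrammar.

I'm using the speech engine built into Vista and .NET 4.0.

I would like to be able to get the confidences for the SemanticValues that are returned. See the example below.

If I simply use "recognizer.AddGrammar( new DictationGrammar() )", I can browse through e.Results.Alternates and view the confidence value of each alternate. That works as long as the DictationGrammar is at the top level.

Made-up example: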

Can you open Firefox? .95
Can you open Fairfax? .93
Can you open file fax? .72
Can you pen Firefox? .85
Can you pin Fairfax? .63
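
The top-level dictation case described above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: it assumes an in-process SpeechRecognitionEngine instead of the shared SpeechRecognizer, a project reference to the System.Speech assembly, and a working default microphone. The class name is made up for illustration.

```csharp
using System;
using System.Speech.Recognition;

class DictationAlternatesSketch
{
    static void Main()
    {
        // Assumption: an in-process engine stands in for the asker's recognizer.
        using (var recognizer = new SpeechRecognitionEngine())
        {
            recognizer.MaxAlternates = 5;                    // ask for up to five alternates
            recognizer.LoadGrammar(new DictationGrammar());  // dictation at the top level
            recognizer.SetInputToDefaultAudioDevice();

            // Recognize() blocks and returns null when nothing is recognized.
            RecognitionResult result = recognizer.Recognize();
            if (result == null)
            {
                Console.WriteLine("Nothing was recognized.");
                return;
            }

            // Each alternate carries its own phrase-level confidence.
            foreach (RecognizedPhrase alternate in result.Alternates)
            {
                Console.WriteLine("{0}  {1:0.00}", alternate.Text, alternate.Confidence);
            }
        }
    }
}
```

The actual alternates and scores depend on the engine, the audio, and the speaker profile, so no output is shown here.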

But if I build a grammar that looks for "Can you open {semanticValue Key='item' GrammarBuilder=new DictationGrammar()}?", then I get this:

Can you open Firefox? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you open Fairfax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you open file fax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you pen Firefox? .85 - Semantics = null
Can you pin Fairfax? .63 - Semantics = null

The .91 shows me how confident the recognizer is that it matched the "Can you open {item}?" pattern, but it doesn't distinguish any further.

However, if I then look at e.Result.Alternates.Semantics.Where( s => s.Key == "item" ) and view the Confidence of each, I get this:

Firefox 1.0
Fairfax 1.0
file fax 1.0

Which doesn't help me much.
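
For reference, one way the pseudo-grammar and the per-alternate lookup above might be written with real System.Speech calls is sketched below. The class and method names are hypothetical, and using GrammarBuilder.AppendDictation for the {item} slot is an assumption about what the asker built, not something the question confirms.

```csharp
using System;
using System.Speech.Recognition;

static class ItemSlotSketch
{
    // "Can you open" followed by free dictation, tagged with the semantic key "item".
    public static Grammar BuildGrammar()
    {
        var dictatedItem = new GrammarBuilder();
        dictatedItem.AppendDictation();

        var request = new GrammarBuilder("Can you open");
        request.Append(new SemanticResultKey("item", dictatedItem));
        return new Grammar(request) { Name = "can you open" };
    }

    // Prints the phrase-level and the "item"-level confidence of every alternate.
    public static void PrintConfidences(RecognitionResult result)
    {
        foreach (RecognizedPhrase alternate in result.Alternates)
        {
            SemanticValue semantics = alternate.Semantics;
            if (semantics == null || !semantics.ContainsKey("item"))
            {
                Console.WriteLine("{0}  phrase={1:0.00}  item=(none)",
                    alternate.Text, alternate.Confidence);
                continue;
            }

            Console.WriteLine("{0}  phrase={1:0.00}  item={2:0.00}",
                alternate.Text, alternate.Confidence, semantics["item"].Confidence);
        }
    }
}
```

Printing both numbers side by side makes it easy to see whether the "item" confidence ever differs from 1.0 while the phrase confidence varies.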

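What I really want is something like this when I view the Confidence of the matching SemanticValues:

Firefox .95
Fairfax .93
file fax .85

It seems like it should work that way...

Am I doing something wrong? Is there even a way to do that within the Speech framework?

I'm hoping there is some built-in mechanism so that I can do it the "right" way.

As for another approach that would probably work...

1. Use the SemanticValue approach to match on the pattern.
2. For anything that matches the pattern, extract the raw audio for {item} (using RecognitionResult.Words and RecognitionResult.GetAudioForWordRange).
3. Run the raw audio for {item} through a SpeechRecognizer with the DictationGrammar to get the confidence.

...but that is more processing than I really want to do.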
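
The three-step workaround above could look something like the sketch below. It is an illustration under stated assumptions, not a tested solution: the helper name and the firstItemWord parameter are hypothetical, and the caller is assumed to know where {item} starts (for example, index 3 after the three words "Can you open").

```csharp
using System.IO;
using System.Speech.Recognition;

static class ItemReRecognitionSketch
{
    // Re-runs only the audio of the {item} words through a dictation-only engine.
    // firstItemWord is the index in result.Words where {item} begins (assumed known).
    public static RecognitionResult ReRecognizeItem(RecognitionResult result, int firstItemWord)
    {
        RecognizedWordUnit first = result.Words[firstItemWord];
        RecognizedWordUnit last = result.Words[result.Words.Count - 1];

        // Step 2: cut out the audio that covers just the {item} words.
        RecognizedAudio itemAudio = result.GetAudioForWordRange(first, last);

        using (var wave = new MemoryStream())
        using (var dictationEngine = new SpeechRecognitionEngine())
        {
            itemAudio.WriteToWaveStream(wave);
            wave.Position = 0; // rewind before handing the stream to the engine

            // Step 3: recognize that audio with dictation at the top level.
            dictationEngine.LoadGrammar(new DictationGrammar());
            dictationEngine.SetInputToWaveStream(wave);

            // May be null; otherwise Confidence and Alternates apply to {item} alone.
            return dictationEngine.Recognize();
        }
    }
}
```

Creating a second engine per utterance is exactly the extra processing the question wants to avoid, so in practice the dictation engine would probably be created once and reused.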

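Best answer

I think a dictation grammar only does transcription. It converts speech to text without extracting semantic meaning because, by definition, a dictation grammar supports all words and has no knowledge of your specific semantic mapping. You need to use a custom grammar to extract semantic meaning. If you supply an SRGS grammar, build one in code, or build one with the SpeechServer tools, you can specify semantic mappings for certain words and phrases. Then the recognizer can extract semantic meaning and give you a semantic confidence.

You should be able to get a confidence value for the recognition as a whole from the recognizer; try System.Speech.Recognition.RecognitionResult.Confidence.

The help file that comes with the Microsoft Server Speech Platform 10.2 SDK has more details. (This is the Microsoft.Speech API for server applications, which is very similar to the System.Speech API for client applications.) See http://www.microsoft.com/downloads/en/details.aspx?FamilyID=1b1604d3-4f66-4241-9a21-90a294a5c9a4 or the Microsoft.Speech documentation at http://msdn.microsoft.com/en-us/library/microsoft.speech.recognition.semanticvalue(v=office.13).aspx

For the SemanticValue class, it says: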

All Speech platform-based recognition engines output provide valid instances of SemanticValue for all recognized output, even phrases with no explicit semantic structure.

The SemanticValue instance for a phrase is obtained using the Semantics property on the RecognizedPhrase object (or objects which inherit from it, such as RecognitionResult).

SemanticValue objects obtained for recognized phrases without semantic structure are characterized by:

Having no children (Count is 0)

The Value property is null.

An artificial confidence level of 1.0 (returned by Confidence)

Typically, applications create instance of SemanticValue indirectly, adding them to Grammar objects by using SemanticResultValue, and SemanticResultKey instances in conjunction with, Choices and GrammarBuilder objects.

Direct construction of an SemanticValue is useful during the creation of strongly typed grammars
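
Based on the characteristics quoted above, a small helper can tell a "real" semantic confidence from the artificial 1.0 and fall back to the phrase-level confidence otherwise. This is only a sketch: the class and method names are hypothetical, and the fallback rule is one possible design choice rather than anything the documentation prescribes.

```csharp
using System.Speech.Recognition;

static class SemanticConfidenceSketch
{
    // Per the quoted documentation, a SemanticValue without semantic structure
    // has no children and a null Value (and an artificial Confidence of 1.0).
    public static bool HasSemanticStructure(SemanticValue value)
    {
        return value != null && (value.Count > 0 || value.Value != null);
    }

    // Returns the confidence of the keyed semantic value when it is meaningful,
    // otherwise the confidence of the recognized phrase as a whole.
    public static float BestConfidence(RecognizedPhrase phrase, string key)
    {
        SemanticValue root = phrase.Semantics;
        if (root != null && root.ContainsKey(key) && HasSemanticStructure(root[key]))
        {
            return root[key].Confidence;
        }
        return phrase.Confidence;
    }
}
```

A caller would pass each entry of result.Alternates together with the key "item".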

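When you use the SemanticValue features in a grammar, you are typically trying to map different phrases to a single meaning. In your case, the phrases "I.E." and "Internet Explorer" should both map to the same semantic meaning. You set up choices in your grammar for each phrase that can map to a specific meaning. Here is a simple WinForms example: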

// Requires: using System; using System.Speech.Recognition;
// WriteTextOuput is a helper defined elsewhere on the form (not shown).
private void btnTest_Click(object sender, EventArgs e)
{
    // The engine holds native resources, so dispose of it when done.
    using (SpeechRecognitionEngine myRecognizer = new SpeechRecognitionEngine())
    {
        Grammar testGrammar = CreateTestGrammar();
        myRecognizer.LoadGrammar(testGrammar);

        try
        {
            // Use the default microphone as the audio source.
            myRecognizer.SetInputToDefaultAudioDevice();
            WriteTextOuput("");

            // Recognize() returns null when nothing is recognized.
            RecognitionResult result = myRecognizer.Recognize();

            if (result != null && result.Semantics.ContainsKey("item"))
            {
                string item = result.Semantics["item"].Value.ToString();
                float confidence = result.Semantics["item"].Confidence;
                WriteTextOuput(String.Format("Item is '{0}' with confidence {1}.", item, confidence));
            }
        }
        catch (InvalidOperationException exception)
        {
            WriteTextOuput(String.Format("Could not recognize input from default audio device. Is a microphone or sound card available?\r\n{0} - {1}.", exception.Source, exception.Message));
            myRecognizer.UnloadAllGrammars();
        }
    }
}

private Grammar CreateTestGrammar()
{                        
    // item
    Choices item = new Choices();
    SemanticResultValue itemSRV;
    itemSRV = new SemanticResultValue("I E", "explorer");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("explorer", "explorer");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("firefox", "firefox");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("mozilla", "firefox");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("chrome", "chrome");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("google chrome", "chrome");
    item.Add(itemSRV);
    SemanticResultKey itemSemKey = new SemanticResultKey("item", item);

    // Wrap the keyed choices in their own builder so they can be made optional below.
    GrammarBuilder gb = new GrammarBuilder();
    gb.Append(itemSemKey);

    // Now build the complete pattern.
    GrammarBuilder itemRequest = new GrammarBuilder();
    // Preamble: "Can you open", "Open", or "Please open".
    itemRequest.Append(new Choices("Can you open", "Open", "Please open"));

    // The item may appear zero or one time.
    itemRequest.Append(gb, 0, 1);

    Grammar TestGrammar = new Grammar(itemRequest);
    return TestGrammar;
}

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/5415262/
