c# - System.Speech.Recognition alternates and confidence values

Tags: c# .net speech-recognition

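I am using the System.Speech.Recognition namespace to recognize spoken sentences. I am interested in the alternate sentences the recognizer provides, along with their confidence scores. From the documentation of the RecognitionResult.Alternates property: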

Recognition Alternates are ordered by the values of their Confidence properties. The confidence value of a given phrase indicates the probability that the phrase matches the input. The phrase with the highest confidence value is the phrase that most likely matches the input.

Each Confidence value should be evaluated individually and without reference to the confidence values of other Alternates.

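However, when I print the recognized text with its confidence, and then each alternate with its confidence, I see two things I can't explain. First, the alternates are not sorted by confidence (although the first one does match the recognized text). Second, and a bigger problem for me, the recognized text is not the highest-scoring alternate, which seems to contradict the documentation quoted above.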

My code sample, from the SpeechRecognized event handler:

using System;
using System.Speech.Recognition;

void Recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("Recognized text =  {0}, score = {1}", e.Result.Text, e.Result.Confidence);
    // Display the recognition alternates for the result.
    foreach (RecognizedPhrase phrase in e.Result.Alternates)
    {
        Console.WriteLine(" alt({0}) {1}", phrase.Confidence, phrase.Text);
    }
}

And the corresponding output:

Recognized text =  She had said that fit and Gracie Wachtel are all year, score = 0.287724
alt(0.287724) She had said that fit and Gracie Wachtel are all year
alt(0.287724) she had said that fit and gracie wachtel are all year
alt(0.2955212) she had said that faith and gracie wachtel are all year
alt(0.287133) she had said that fit and gracie Wachtell are all year
alt(0.1644379) she had said that fit and gracie wachtel earlier
alt(0.3254312) jihad said that fit and gracie wachtel are all year
alt(0.2726361) she had said that fit and gracie wachtel are only are
alt(0.2867217) she had said that fail and gracie wachtel are all year
alt(0.2565451) she had said that fit and gracie watchful are all year
alt(0.2854537) she had said that fate and gracie wachtel are all year
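To check both observations programmatically rather than by eye, a minimal sketch like the following could be used. It works on plain (Text, Confidence) pairs instead of RecognizedPhrase objects so that it runs without a recognizer; in the SpeechRecognized handler the pairs could be built with `e.Result.Alternates.Select(p => (p.Text, p.Confidence))` (this needs `using System.Linq;`). The class and method names are hypothetical, not part of System.Speech.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for inspecting recognition alternates as (Text, Confidence) pairs.
public static class AlternatesCheck
{
    // True if the alternates are in non-increasing confidence order,
    // which is the ordering the RecognitionResult.Alternates documentation describes.
    public static bool IsSortedByConfidence(IReadOnlyList<(string Text, float Confidence)> alternates)
    {
        for (int i = 1; i < alternates.Count; i++)
        {
            if (alternates[i].Confidence > alternates[i - 1].Confidence)
                return false;
        }
        return true;
    }

    // Returns the alternate with the highest confidence (the earliest one wins on ties).
    public static (string Text, float Confidence) TopAlternate(IReadOnlyList<(string Text, float Confidence)> alternates)
    {
        if (alternates.Count == 0)
            throw new ArgumentException("No alternates.", nameof(alternates));

        var best = alternates[0];
        foreach (var alt in alternates)
        {
            if (alt.Confidence > best.Confidence)
                best = alt;
        }
        return best;
    }
}
```

Fed the ten alternates from the output above, `IsSortedByConfidence` returns false (0.2955212 follows 0.287724) and `TopAlternate` returns "jihad said that fit and gracie wachtel are all year" (0.3254312) rather than `e.Result.Text`, which reproduces both observations.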

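EDIT: To clarify what the confidence scores mean, and why my results contradict the documentation, here is the relevant information from the RecognizedPhrase.Confidence Property documentation (emphasis mine):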

Confidence scores do not indicate the absolute likelihood that a phrase was recognized correctly. Instead, confidence scores provide a mechanism for comparing the relative accuracy of multiple recognition alternates for a given input. This facilitates returning the most accurate recognition result. For example, if a recognized phrase has a confidence score of 0.8, this does not mean that the phrase has an 80% chance of being the correct match for the input. It means that the phrase is more likely to be the correct match for the input than other results that have confidence scores less than 0.8.

A confidence score on its own is not meaningful unless you have alternative results to compare against, either from the same recognition operation or from previous recognitions of the same input. The values are used to rank alternative candidate phrases returned by the Alternates property on RecognitionResult objects.

Confidence values are relative and unique to each recognition engine. Confidence values returned by two different recognition engines cannot be meaningfully compared.

A speech recognition engine may assign a low confidence score to spoken input for various reasons, including background interference, inarticulate speech, or unanticipated words or word sequences. If your application is using a SpeechRecognitionEngine instance, you can modify the confidence level at which speech input is accepted or rejected with one of the UpdateRecognizerSetting methods. Confidence thresholds for the shared recognizer, managed by SpeechRecognizer, are associated with a user profile and stored in the Windows registry. Applications should not write changes to the registry for the properties of the shared recognizer.

The Alternates property of the RecognitionResult object contains an ordered collection of RecognizedPhrase objects, each of which is a possible match for the input to the recognizer. The alternates are ordered from highest to lowest confidence.

Best Answer

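I can only give you a general answer (I don't know the code behind Microsoft's speech recognition). Speech recognition uses many algorithms to approximate the best solution. In a perfect world, each algorithm would be able to weight the confidence score of the transcribed sentence. In practice, that is almost never the case: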

Every algorithm has flaws, and working out its exact effect on the confidence of the transcription can be a real headache.

The overall sentence confidence is an arithmetic combination of its parts, usually much simpler than the internal confidence models.

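Some of the algorithms used, such as proper-noun recognition, do not necessarily change the confidence noticeably (especially in a single, isolated sentence).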

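Confidence is measured at several levels (phonetic, word, sentence structure...). What is the confidence of a perfect phonetic recognition within an inconsistent sentence structure?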

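The sorting algorithms that move better recognitions to the top of the list usually do not change the confidence; they only sort or exclude alternates.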

So the documentation is right: confidence values cannot be compared between alternates.
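The points about a simple combined sentence confidence and about re-ranking that leaves confidence untouched can be illustrated with a toy model. Everything here is a hypothetical assumption for illustration, not how System.Speech works internally: the `Candidate` record, the mean-of-word-confidences rule, and the separate `RerankScore` produced by some other algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (NOT Microsoft's implementation): each candidate carries word-level
// confidences plus a separate score assigned by some other re-ranking algorithm.
public sealed record Candidate(string Text, double[] WordConfidences, double RerankScore)
{
    // Sentence confidence as a simple arithmetic combination (here: the mean) of its parts.
    public double Confidence => WordConfidences.Average();
}

public static class ToyRecognizer
{
    // The re-ranker orders candidates by its own score and never touches Confidence,
    // so the returned list is not necessarily in confidence order.
    public static List<Candidate> Rank(IEnumerable<Candidate> candidates) =>
        candidates.OrderByDescending(c => c.RerankScore).ToList();
}
```

In this model a candidate such as "she had said that" can be ranked first by its `RerankScore` while a lower-ranked "jihad said that" still reports a higher `Confidence`, which is the same kind of mismatch seen in the question's output.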

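So what is confidence actually good for (apart from what the authors want to tell us: "we make very complex and approximate techniques easy for you to use")? Almost nothing. You can discard alternates whose confidence is too low (below some threshold), unless none of them reaches that threshold.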
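The thresholding rule from the last paragraph could look like the sketch below, again on (Text, Confidence) pairs. The `ConfidenceFilter` name and any concrete threshold are hypothetical; since confidence values are specific to each recognition engine, a threshold would have to be tuned for the engine in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConfidenceFilter
{
    // Keeps alternates whose confidence is at or above the threshold.
    // If none reaches it, returns all alternates instead of an empty list.
    public static List<(string Text, float Confidence)> Filter(
        IEnumerable<(string Text, float Confidence)> alternates, float threshold)
    {
        var all = alternates.ToList();
        var kept = all.Where(a => a.Confidence >= threshold).ToList();
        return kept.Count > 0 ? kept : all;
    }
}
```

With the question's output and a hypothetical threshold of 0.28, this keeps seven of the ten alternates and drops the ones scoring 0.1644379, 0.2726361 and 0.2565451; with a threshold of 0.5, nothing qualifies, so all ten are returned.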

Original question on Stack Overflow: https://stackoverflow.com/questions/36965176/