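I can get short dictation results from the Bing Speech Recognition REST API. My goal is to get results for audio files longer than 15-30 seconds (a.k.a. long dictation mode). Here is what I currently do to get the short results (I'm developing an HTML UWP app):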
- Generate an ArrayBuffer from the audio file (wav)
- Authenticate with an access token
- Send the audio data to the REST API with the following settings:
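```javascript
// Posts a WAV ArrayBuffer (`data`) to the Bing Speech REST endpoint.
// `accessToken` is the token obtained in the authentication step.
// Requires jQuery ($.param) and WinJS (WinJS.xhr); guid() is assumed to be
// an existing helper (not shown) that returns a new GUID string.
function recognizeShortAudio(data, accessToken) {
    var url = 'https://speech.platform.bing.com/recognize?';
    var params = {
        'version': '3.0',
        'format': 'json',
        'locale': 'en-US',
        'device.os': 'Windows OS',
        'scenarios': 'smd',
        'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
        'requestid': guid(),
        'instanceid': guid()
    };
    var options = {
        url: url + $.param(params),
        type: "POST",
        headers: {
            'Authorization': 'Bearer ' + accessToken,
            'Content-Type': 'audio/wav; samplerate=16000'
        },
        data: data
    };
    // WinJS.xhr returns a promise that completes with the XMLHttpRequest.
    return WinJS.xhr(options);
}
```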
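So this works! But how do I do the same for the long dictation scenario?
Please don't point me to the JavaScript GitHub repository https://github.com/microsoft/Cognitive-Speech-STT-Javascript. It only supports short dictation and doesn't work in the Edge browser.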
Best answer
From the API documentation:
Your application must endpoint the audio to determine start and end of speech, which in turn is used by the service to determine the start and end of the request. You may not upload more than 10 seconds of audio in any one request and the total request duration cannot exceed 14 seconds.
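To find out whether a given file is over that 10-second limit, its duration can be read from the WAV header inside the ArrayBuffer. The sketch below does that and then, as one naive workaround that is not part of the answer, plans consecutive byte ranges of at most 10 seconds each. Both `readWavInfo` and `planChunks` are hypothetical helpers (not part of WinJS or the Bing Speech API), and the code assumes an uncompressed PCM WAV file. Note that it cuts at fixed times rather than at speech pauses, so it is not the "endpointing" the documentation asks for and may split a word in two.

```javascript
// Hypothetical helpers, not part of WinJS or the Bing Speech API.

// Reads the header of an uncompressed PCM WAV file held in an ArrayBuffer.
function readWavInfo(buffer) {
  const view = new DataView(buffer);
  const tag = (o) => String.fromCharCode(
    view.getUint8(o), view.getUint8(o + 1), view.getUint8(o + 2), view.getUint8(o + 3));
  if (view.byteLength < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }
  let fmt = null;
  let offset = 12;
  // Walk the chunk list until "data"; the "fmt " chunk must come before it.
  while (offset + 8 <= view.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true); // sizes are little-endian
    const body = offset + 8;
    if (id === "fmt ") {
      if (view.getUint16(body, true) !== 1) throw new Error("Not PCM (format tag 1)");
      fmt = {
        channels: view.getUint16(body + 2, true),
        sampleRate: view.getUint32(body + 4, true),
        bitsPerSample: view.getUint16(body + 14, true),
      };
    } else if (id === "data") {
      if (!fmt) throw new Error("data chunk before fmt chunk");
      const dataLength = Math.min(size, view.byteLength - body);
      const bytesPerSecond = fmt.sampleRate * fmt.channels * (fmt.bitsPerSample / 8);
      return Object.assign({}, fmt, {
        dataOffset: body,
        dataLength: dataLength,
        durationSeconds: dataLength / bytesPerSecond,
      });
    }
    offset = body + size + (size % 2); // chunks are padded to an even size
  }
  throw new Error("No data chunk found");
}

// Splits the PCM data into consecutive [start, end) byte ranges of the buffer,
// each at most maxSeconds long and aligned to whole sample frames.
function planChunks(info, maxSeconds = 10) {
  const frameBytes = info.channels * (info.bitsPerSample / 8);
  const maxBytes = Math.floor(info.sampleRate * maxSeconds) * frameBytes;
  const usable = info.dataLength - (info.dataLength % frameBytes); // drop a partial frame
  const ranges = [];
  for (let start = 0; start < usable; start += maxBytes) {
    const end = Math.min(start + maxBytes, usable);
    ranges.push({ start: info.dataOffset + start, end: info.dataOffset + end });
  }
  return ranges;
}
```

For a 16 kHz, 16-bit mono file (32,000 bytes per second), a 25-second recording yields three ranges of 10, 10 and 5 seconds. Each `buffer.slice(r.start, r.end)` would still need its own WAV header before being posted as `audio/wav`, one request (with a fresh `requestid`) per range.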
Perhaps you need to implement the client library in order to use the different modes:
ShortPhrase mode: an utterance up to 15 seconds long. As data is sent to the server, the client will receive multiple partial results and one final multiple N-best choice result.
LongDictation mode: an utterance up to 2 minutes long. As data is sent to the server, the client will receive multiple partial results and multiple final results, based on where the server indicates sentence pauses.
Intent detection: The server returns additional structured information about the speech input. To use Intent you will need to first train a model. See details here.
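The duration limits quoted above can be summarized as a small lookup: at most 10 seconds of audio per REST request, up to 15 seconds for ShortPhrase, and up to 2 minutes for LongDictation. The sketch below picks the first option whose limit covers a clip. `chooseRecognitionMode` and the label strings are hypothetical; the mode names are taken from the quoted description, not from identifiers in the client library.

```javascript
// Duration limits (in seconds) taken from the documentation quoted above.
// The labels are illustrative strings, not identifiers from any SDK.
const RECOGNITION_LIMITS = [
  { mode: "REST (single request)", maxSeconds: 10 },
  { mode: "ShortPhrase", maxSeconds: 15 },
  { mode: "LongDictation", maxSeconds: 120 },
];

// Hypothetical helper: returns the first option whose limit covers the clip,
// or null when the clip is longer than every quoted limit.
function chooseRecognitionMode(durationSeconds) {
  for (const { mode, maxSeconds } of RECOGNITION_LIMITS) {
    if (durationSeconds <= maxSeconds) return mode;
  }
  return null;
}
```

Under these limits a 25-second file, like the ones in the question, falls into LongDictation, which the answer suggests is only available through the client library rather than the REST endpoint.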
Regarding "javascript - How do I get long dictation results from the Microsoft Cognitive Services REST speech recognition API?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38309283/