javascript - How do I get long dictation results from the Microsoft Cognitive Services REST speech recognition API?

Tags: javascript speech-recognition azure-cognitive-services

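I can get short dictation results from the Bing Speech Recognition REST API. My goal is to get results for audio files longer than 15-30 seconds (i.e. long dictation mode). Here is what I do to get short results (I'm developing an HTML UWP app):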

  1. Generate an ArrayBuffer from the audio file (WAV)
  2. Authenticate with an access token
  3. Send the audio data to the REST API with the following settings:
```javascript
// Short-dictation request from the question.
// Assumes jQuery ($.param), WinJS (WinJS.xhr) and a guid() helper that returns
// a new GUID string; `data` is the ArrayBuffer built from the WAV file.
function recognizeShortClip(data, accessToken) {
    var url = 'https://speech.platform.bing.com/recognize?';
    var params = {
        'version': '3.0',
        'format': 'json',
        'locale': 'en-US',
        'device.os': 'Windows OS',
        'scenarios': 'smd',
        'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
        'requestid': guid(),
        'instanceid': guid()
    };
    var options = {
        url: url + $.param(params),
        type: 'POST',
        headers: {
            'Authorization': 'Bearer ' + accessToken,
            'Content-Type': 'audio/wav; samplerate=16000'
        },
        data: data
    };
    return WinJS.xhr(options);
}
```
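For step 1, it may help to inspect the WAV header before sending anything, since the request declares `samplerate=16000` and a single REST request only accepts a limited amount of audio. The following is a minimal sketch, assuming an uncompressed PCM file; `parseWav` is a hypothetical helper, not part of any Microsoft SDK. It walks the RIFF chunks of the ArrayBuffer and reports the format fields and the duration.

```javascript
// Minimal RIFF/WAVE header reader (a sketch, not a full parser).
// Assumes uncompressed PCM; walks the chunk list to find "fmt " and "data".
function parseWav(arrayBuffer) {
    var view = new DataView(arrayBuffer);

    function fourCC(offset) {
        return String.fromCharCode(
            view.getUint8(offset), view.getUint8(offset + 1),
            view.getUint8(offset + 2), view.getUint8(offset + 3));
    }

    if (view.byteLength < 12 || fourCC(0) !== 'RIFF' || fourCC(8) !== 'WAVE') {
        throw new Error('Not a RIFF/WAVE file');
    }

    var info = null;
    var offset = 12; // first sub-chunk follows "RIFF" + size + "WAVE"
    while (offset + 8 <= view.byteLength) {
        var id = fourCC(offset);
        var size = view.getUint32(offset + 4, true); // little-endian chunk size
        var body = offset + 8;
        if (id === 'fmt ') {
            info = {
                audioFormat: view.getUint16(body, true),   // 1 = PCM
                channels: view.getUint16(body + 2, true),
                sampleRate: view.getUint32(body + 4, true),
                byteRate: view.getUint32(body + 8, true),
                bitsPerSample: view.getUint16(body + 14, true)
            };
        } else if (id === 'data') {
            if (!info) throw new Error('"data" chunk before "fmt " chunk');
            info.dataOffset = body;
            info.dataLength = Math.min(size, view.byteLength - body);
            info.durationSeconds = info.dataLength / info.byteRate;
            return info;
        }
        offset = body + size + (size % 2); // chunks are padded to even sizes
    }
    throw new Error('No "data" chunk found');
}
```

For example, `parseWav(arrayBuffer).durationSeconds > 10` would flag a file that is too long for a single request under the limit quoted in the answer.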

This works! But how can I do the same for long dictation?

Please don't point me to the JavaScript GitHub repository https://github.com/microsoft/Cognitive-Speech-STT-Javascript. It only supports short dictation and doesn't work in the Edge browser.

Best answer

From the API documentation:

Your application must endpoint the audio to determine start and end of speech, which in turn is used by the service to determine the start and end of the request. You may not upload more than 10 seconds of audio in any one request and the total request duration cannot exceed 14 seconds.

Reference: https://www.microsoft.com/cognitive-services/en-us/Speech-api/documentation/API-Reference-REST/BingVoiceRecognition
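The quoted limits mean the client has to do the endpointing itself. One possible approach, sketched here as an assumption rather than the documented method, is a simple energy-based detector that cuts a mono 16-bit PCM recording at pauses and forces a cut before any piece exceeds 10 seconds. `findSpeechSegments` and its defaults (20 ms frames, an RMS threshold of 500 on the Int16 scale, 300 ms of silence) are hypothetical values that would need tuning on real recordings.

```javascript
// Energy-based endpointing sketch: splits mono 16-bit PCM into speech segments
// separated by pauses, each at most maxSeconds long. Thresholds are assumptions.
function findSpeechSegments(samples, sampleRate, opts) {
    opts = opts || {};
    var frameMs = opts.frameMs || 20;
    var frameLen = Math.max(1, Math.round(sampleRate * frameMs / 1000));
    var threshold = opts.threshold || 500;                  // RMS, Int16 scale
    var minSilenceFrames = Math.ceil((opts.minSilenceMs || 300) / frameMs);
    var maxLen = Math.floor(sampleRate * (opts.maxSeconds || 10));

    var segments = [];
    var start = -1;          // first sample of the open segment, -1 if none
    var lastSpeechEnd = -1;  // end of the last speech frame in that segment
    var silentFrames = 0;    // consecutive quiet frames inside the segment

    for (var f = 0; f < samples.length; f += frameLen) {
        var end = Math.min(f + frameLen, samples.length);
        var sum = 0;
        for (var i = f; i < end; i++) sum += samples[i] * samples[i];
        var isSpeech = Math.sqrt(sum / (end - f)) >= threshold;

        if (isSpeech) {
            if (start < 0) {
                start = f;
            } else if (end - start > maxLen) {
                // Forced cut: close the segment at its last speech frame so
                // it stays within maxLen, then open a new one here.
                segments.push({ start: start, end: lastSpeechEnd });
                start = f;
            }
            lastSpeechEnd = end;
            silentFrames = 0;
        } else if (start >= 0 && ++silentFrames >= minSilenceFrames) {
            // A long enough pause ends the current segment.
            segments.push({ start: start, end: lastSpeechEnd });
            start = -1;
            silentFrames = 0;
        }
    }
    if (start >= 0) segments.push({ start: start, end: lastSpeechEnd });
    return segments;
}
```

Each `{ start, end }` pair is a sample range. Pauses give natural boundaries; a forced cut can split a word, which the service may then misrecognize.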

Perhaps you need to integrate the client library in order to use the different modes:

ShortPhrase mode: an utterance up to 15 seconds long. As data is sent to the server, the client will receive multiple partial results and one final multiple N-best choice result.

LongDictation mode: an utterance up to 2 minutes long. As data is sent to the server, the client will receive multiple partial results and multiple final results, based on where the server indicates sentence pauses.

Intent detection: The server returns additional structured information about the speech input. To use Intent you will need to first train a model.

Reference: https://www.microsoft.com/cognitive-services/en-us/Speech-api/documentation/GetStarted/GetStartedCSharpDesktop
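If the client library is not an option, the LongDictation behaviour (several final results split at pauses) might be approximated over REST, as a client-side workaround rather than a documented feature: wrap each piece of at most 10 seconds in its own WAV header and send the pieces one at a time, joining the texts in order. `encodeWav16`, `recognizeSegments` and the caller-supplied `recognize` callback are hypothetical. `recognize` would typically wrap the `WinJS.xhr` request from the question and resolve to the recognized text; where that text sits in the JSON response depends on the API version, so extracting it is left to the caller. `segments` is assumed to be an array of `{ start, end }` sample ranges.

```javascript
// Wraps mono 16-bit PCM samples in a canonical 44-byte WAV header so a slice
// of a longer recording can be posted on its own (assumes mono PCM).
function encodeWav16(samples, sampleRate) {
    var dataBytes = samples.length * 2;
    var buffer = new ArrayBuffer(44 + dataBytes);
    var view = new DataView(buffer);

    function writeTag(offset, tag) {
        for (var i = 0; i < 4; i++) view.setUint8(offset + i, tag.charCodeAt(i));
    }

    writeTag(0, 'RIFF');
    view.setUint32(4, 36 + dataBytes, true);   // file size minus 8
    writeTag(8, 'WAVE');
    writeTag(12, 'fmt ');
    view.setUint32(16, 16, true);              // fmt chunk size for PCM
    view.setUint16(20, 1, true);               // audio format 1 = PCM
    view.setUint16(22, 1, true);               // mono
    view.setUint32(24, sampleRate, true);
    view.setUint32(28, sampleRate * 2, true);  // byte rate
    view.setUint16(32, 2, true);               // block align
    view.setUint16(34, 16, true);              // bits per sample
    writeTag(36, 'data');
    view.setUint32(40, dataBytes, true);

    for (var i = 0; i < samples.length; i++) {
        view.setInt16(44 + i * 2, samples[i], true);
    }
    return buffer;
}

// Sends the slices strictly one after another and joins the non-empty texts.
// `pcm` is an Int16Array; `recognize(wavArrayBuffer, index)` returns a Promise.
function recognizeSegments(pcm, segments, sampleRate, recognize) {
    var texts = [];
    return segments.reduce(function (previous, seg, index) {
        return previous.then(function () {
            var wav = encodeWav16(pcm.subarray(seg.start, seg.end), sampleRate);
            return recognize(wav, index);
        }).then(function (text) {
            if (text) texts.push(text);
        });
    }, Promise.resolve()).then(function () {
        return texts.join(' ');
    });
}
```

Sending sequentially keeps the results in order and avoids many concurrent requests. The trade-off is latency, and no context is carried from one piece to the next.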

Regarding "javascript - How do I get long dictation results from the Microsoft Cognitive Services REST speech recognition API?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38309283/
