node.js - Converting MediaRecorder blobs to a type that Google Speech-to-Text can transcribe

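I am building an app in which the user's browser records the user speaking and sends the audio to a server, which then passes it on to the Google Speech-to-Text API. I am using MediaRecorder to produce 1-second blobs that are sent to the server. On the server side, I send these blobs to the Google Speech-to-Text API. However, I get an empty transcription back.

I know what the problem is. MediaRecorder's default MIME type is audio/webm;codecs=opus, which is not accepted by Google's Speech-to-Text API. After doing some research, I realized that I need to use ffmpeg to convert the blobs to LINEAR16. However, ffmpeg only accepts audio files, and I want to be able to convert blobs. I could then send the resulting converted blobs to the API.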

server.js

// Presumably `wsserver` is a WebSocket server already listening on port 3002,
// and `livetranscriber` is the module shown below.
wsserver.on('connection', socket => {
    console.log('Client connected');
    socket.on('message', message => {
        // `message` holds one recorded chunk; the API expects base64 content.
        // A fresh object per message avoids sharing state between chunks.
        const audio = { content: message.toString('base64') };
        livetranscriber.createRequest(audio)
            .then(request => livetranscriber.recognizeStream(request))
            .catch(err => console.error('Transcription failed:', err));
    });
});

livetranscriber

// Assumes the official @google-cloud/speech client with credentials configured.
const speech = require('@google-cloud/speech');
const client = new speech.SpeechClient();

module.exports = {
    // Wraps the audio in a request that declares it as 16 kHz LINEAR16 (raw PCM).
    createRequest: function(audio) {
        const request = {
            audio: audio,
            config: {
                encoding: 'LINEAR16',
                sampleRateHertz: 16000,
                languageCode: 'en-US',
            },
            // interimResults only applies to streaming recognition, so it is omitted here.
        };
        // Kept as a promise so callers can still chain .then().
        return Promise.resolve(request);
    },
    // Sends one non-streaming recognize request and logs the transcript.
    recognizeStream: async function(request) {
        const [response] = await client.recognize(request);
        const transcription = response.results
            .map(result => result.alternatives[0].transcript)
            .join('\n');
        console.log(`Transcription: ${transcription}`);
    },
};

client

// Presumably `recorder` is a MediaRecorder started with a 1000 ms timeslice.
recorder.ondataavailable = function(e) {
    console.log('Data', e.data);

    var ws = new WebSocket('ws://localhost:3002/websocket');
    ws.onopen = function() {
        console.log('opening connection');
        // e.data is already a Blob. Relabelling it as WAV does not transcode it:
        // the bytes are still WebM/Opus when they reach the server.
        var blob = new Blob([e.data], { type: 'audio/wav' });
        ws.send(blob);
        console.log('Sent the message');
    };
};

Best answer

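I wrote a similar script a few years ago. However, I used a JS front end and a Python back end rather than NodeJS. I remember using the sox converter to turn the audio input into output that the Google Speech API could use.
Maybe this will be useful to you.
https://github.com/bitnahian/speech-transcriptor/blob/9f186e5416566aa8a6959fc1363d2e398b902822/app.py#L27
TL;DR:
Use ffmpeg and sox to convert from .wav format to .raw format.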
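
For the NodeJS side, the same conversion could probably be done without temporary files, since ffmpeg can read from stdin and write to stdout. The following is only a minimal sketch, not the code from the linked repository: the helper names buildFfmpegArgs and webmToLinear16 are made up for illustration, and it assumes an ffmpeg binary is available on the PATH and that the buffer passed in is a complete WebM/Opus recording.

```javascript
const { spawn } = require('child_process');

// Arguments asking ffmpeg to read any container from stdin and write
// headerless 16-bit little-endian mono PCM (what LINEAR16 expects) to stdout.
function buildFfmpegArgs(sampleRateHertz = 16000) {
  return [
    '-hide_banner', '-loglevel', 'error',
    '-i', 'pipe:0',                 // input comes from stdin instead of a file
    '-f', 's16le',                  // raw PCM output, no WAV header
    '-acodec', 'pcm_s16le',         // signed 16-bit little-endian samples
    '-ac', '1',                     // mono
    '-ar', String(sampleRateHertz), // resample to the rate declared in the request
    'pipe:1',                       // output goes to stdout
  ];
}

// Hypothetical helper: converts an in-memory WebM/Opus recording to raw PCM.
// `ffmpegPath` is an assumption; adjust it if ffmpeg is installed elsewhere.
function webmToLinear16(webmBuffer, sampleRateHertz = 16000, ffmpegPath = 'ffmpeg') {
  return new Promise((resolve, reject) => {
    const ffmpeg = spawn(ffmpegPath, buildFfmpegArgs(sampleRateHertz));
    const chunks = [];
    let stderr = '';

    ffmpeg.stdout.on('data', chunk => chunks.push(chunk));
    ffmpeg.stderr.on('data', chunk => { stderr += chunk; });
    ffmpeg.on('error', reject); // e.g. ffmpeg is not installed
    ffmpeg.on('close', code => {
      if (code === 0) resolve(Buffer.concat(chunks));
      else reject(new Error(`ffmpeg exited with code ${code}: ${stderr}`));
    });

    ffmpeg.stdin.on('error', () => {}); // ignore EPIPE if ffmpeg exits early
    ffmpeg.stdin.end(webmBuffer);       // write the recording and close stdin
  });
}
```

In the message handler this might be used as `const pcm = await webmToLinear16(message);` followed by `audio.content = pcm.toString('base64')`. One caveat: when MediaRecorder is started with a timeslice, only the first chunk carries the WebM header, so later 1-second chunks may not decode on their own. Concatenating the chunks before converting, or feeding them all into one long-running ffmpeg process, is likely to work better.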

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/57159487/
