azure - Azure PullAudioInputStream not working correctly with Twilio Voice

Tags: azure audio twilio azure-cognitive-services

I am integrating Twilio Media Streams with Azure Cognitive Services (speech-to-text). I subclassed speechsdk.audio.PullAudioInputStreamCallback to send audio chunks to the server.

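import azure.cognitiveservices.speech as speechsdk
import queue

class SocketReaderCallback(speechsdk.audio.PullAudioInputStreamCallback):
    """Pull-stream callback: the SDK calls read() whenever it needs more audio."""

    def __init__(self):
        super().__init__()
        self._q = queue.Queue()
        self._pending = b""  # leftover bytes when a chunk is larger than the SDK buffer

    def read(self, buffer: memoryview) -> int:
        # Blocks until audio is queued. Returning 0 tells the SDK the stream has ended.
        if not self._pending:
            chunk = self._q.get()
            if chunk is None:  # queueup(None) marks the end of the stream
                return 0
            self._pending = chunk
        # Copy at most len(buffer) bytes and keep the rest for the next read()
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n

    def has_bytes(self):
        return bool(self._pending) or not self._q.empty()

    def queueup(self, chunk):
        self._q.put(chunk)

    def close(self):
        print("AZ.Callback.Closed")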
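
Below is the code for the transcriber class. Its add_request method puts audio chunks into the queue of the callback class above, and the callback class takes chunks from that queue and uploads them to the Azure server for transcription.
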
import azure.cognitiveservices.speech as speechsdk
from rule_engine.medium.azure_transcribe.azure_calback import SocketReaderCallback

class AzureTranscribe:

    def __init__(self, speech_config, on_response, user_id):
        self._on_response = on_response
        self.callback = SocketReaderCallback()
        # Declares the pulled audio as 8 kHz, 8 bits per sample, mono PCM
        wave_format = speechsdk.audio.AudioStreamFormat(samples_per_second=8000, bits_per_sample=8, channels=1)
        self._stream = speechsdk.audio.PullAudioInputStream(self.callback, wave_format)
        audio_config = speechsdk.audio.AudioConfig(stream=self._stream)
        self._speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, language="en-IN", audio_config=audio_config)
        self._ended = False
        self.user_id = user_id
        self.state = None  # resampler state for the commented-out audioop.ratecv call
        self.initialize_once()

    def initialize_once(self):
        # Connect callbacks to the events fired by the speech recognizer
        self._speech_recognizer.recognizing.connect(lambda evt: print('AZ.RECOGNIZING: {}'.format(evt)))
        self._speech_recognizer.recognized.connect(lambda evt: print('AZ.RECOGNIZED: {}'.format(evt)))
        self._speech_recognizer.session_started.connect(lambda evt: print('AZ.SESSION STARTED: {}'.format(evt)))
        self._speech_recognizer.session_stopped.connect(lambda evt: print('AZ.SESSION STOPPED {}'.format(evt)))
        self._speech_recognizer.canceled.connect(lambda evt: print('AZ.CANCELED {}'.format(evt)))
        # From here on the SDK pulls audio through SocketReaderCallback.read()
        self._speech_recognizer.start_continuous_recognition()

    def add_request(self, buffer):
        # Optional 8 kHz -> 16 kHz resampling; ratecv(fragment, width, nchannels, inrate, outrate, state).
        # Twilio audio is mono, so nchannels is 1.
        # buffer, self.state = audioop.ratecv(bytes(buffer), 2, 1, 8000, 16000, self.state)
        self.callback.queueup(bytes(buffer))

    def terminate(self):
        self._ended = True
        self._speech_recognizer.stop_continuous_recognition()
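
The question does not show the WebSocket side that calls add_request. Twilio Media Streams send JSON messages whose `event` field is `connected`, `start`, `media` or `stop`; audio arrives in `media` events as a base64 string in `media.payload`. The following is a minimal sketch of routing those messages into a transcriber such as the AzureTranscribe class above. The function name `handle_twilio_message` is hypothetical, and the WebSocket server that receives the raw messages is assumed and not shown.

```python
import base64
import json


def handle_twilio_message(raw_message: str, transcriber) -> bool:
    """Route one Twilio Media Streams message to a transcriber (hypothetical helper).

    `transcriber` is assumed to expose add_request(bytes) and terminate(),
    like AzureTranscribe. Returns False once the stream has stopped.
    """
    msg = json.loads(raw_message)
    event = msg.get("event")
    if event == "media":
        # Decode the base64 payload into raw audio bytes (still mu-law encoded)
        audio = base64.b64decode(msg["media"]["payload"])
        transcriber.add_request(audio)
    elif event == "stop":
        transcriber.terminate()
        return False
    # "connected" and "start" events carry only metadata
    return True
```

The bytes are passed through unchanged here, exactly as the question's add_request receives them; any format conversion would happen before or inside add_request.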

  • If I upload audio chunks from an audio file, the transcription is
    accurate.
  • If I upload audio chunks from a Twilio call, the transcription is
    very poor.

  • Twilio's sample rate is 8 kHz while Azure's expected sample rate is 16 kHz. Azure accepts both sample rates, but transcription of the Twilio audio is poor with either.
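
Sample rate is only one part of the format. Twilio documents Media Streams audio as 8 kHz μ-law (audio/x-mulaw), a companded 8-bit encoding, whereas the question's AudioStreamFormat(samples_per_second=8000, bits_per_sample=8, channels=1) describes 8-bit linear PCM. If the SDK reads μ-law bytes as linear PCM, the signal would be distorted; this is a hypothesis worth checking, not something the answer below confirms. A minimal sketch of decoding μ-law (ITU-T G.711) into 16-bit little-endian PCM in pure Python:

```python
import struct


def _ulaw_byte_to_int16(u: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit linear sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored inverted
    mantissa = u & 0x0F
    exponent = (u & 0x70) >> 4
    t = ((mantissa << 3) + 0x84) << exponent   # 0x84 is the G.711 bias
    return 0x84 - t if u & 0x80 else t - 0x84


# All 256 possible bytes decoded once, used as a lookup table
_ULAW_TABLE = [_ulaw_byte_to_int16(b) for b in range(256)]


def ulaw_to_pcm16(data: bytes) -> bytes:
    """Convert mu-law bytes to 16-bit little-endian PCM (2 output bytes per input byte)."""
    return struct.pack("<%dh" % len(data), *(_ULAW_TABLE[b] for b in data))
```

The standard library's audioop.ulaw2lin(chunk, 2) performs the same conversion, but audioop was deprecated in Python 3.11 and removed in 3.13. After decoding, the stream format would have to declare 16-bit samples, e.g. AudioStreamFormat(samples_per_second=8000, bits_per_sample=16, channels=1).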

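    Best answer

    Use the Speech service SDK's compressed audio input stream API to stream compressed audio to the Speech service through a PullStream or PushStream.
    We recommend converting your audio to a supported format.
    • You can use FFmpeg for audio format conversion. The correct format for the audio file is 16 kHz, 16-bit, mono. The command line for the correct target format is:
    ffmpeg.exe -i inputfile.wav -sample_fmt s16 -ac 1 -ar 16000 outputfile.wav
    • The documentation also refers to SoX; see https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-test-data#audio-data-for-testing
    Follow the documentation to use a compressed audio input stream.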
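
The FFmpeg command converts files, but audio from a live call arrives in small chunks, so a 16 kHz, 16-bit, mono target would have to be produced chunk by chunk. The following is a minimal sketch, assuming the input is already 16-bit linear PCM at 8 kHz (i.e. μ-law has been decoded first): it doubles the sample rate by linear interpolation and keeps the last sample so chunk boundaries stay continuous. The class name `Upsampler2x` is hypothetical. Interpolation adds no audio content above the original 4 kHz bandwidth, and whether it improves recognition accuracy is untested here.

```python
import struct
from typing import Optional


class Upsampler2x:
    """Naive 2x upsampler for 16-bit mono little-endian PCM (hypothetical helper).

    For each input sample s, the output gets the midpoint between the previous
    sample and s, followed by s itself. No anti-imaging filter is applied.
    """

    def __init__(self) -> None:
        # Last sample of the previous chunk, carried across process() calls
        self._prev: Optional[int] = None

    def process(self, pcm: bytes) -> bytes:
        n = len(pcm) // 2  # a trailing odd byte, if any, is ignored
        samples = struct.unpack("<%dh" % n, pcm[: n * 2])
        out = []
        for s in samples:
            # The very first sample has no predecessor, so it is repeated
            out.append(s if self._prev is None else (self._prev + s) // 2)
            out.append(s)
            self._prev = s
        return struct.pack("<%dh" % len(out), *out)
```

One Upsampler2x instance would be kept per call (per stream), and the recognizer's stream format would then declare AudioStreamFormat(samples_per_second=16000, bits_per_sample=16, channels=1).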

    The original question, "azure - Azure PullAudioInputStream not working correctly with Twilio Voice", is on Stack Overflow: https://stackoverflow.com/questions/64316086/
