google-speech-api - 400 Specify MP3 encoding to match audio file

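I am trying to use the google-speech2text API. However, even though my code loops over all the available encodings, I still get the message "Specify MP3 encoding to match audio file".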

This is the file I am trying to use.

I should add that if I upload the file to their UI, I do get output, so I do not think there is anything wrong with the source file.
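Since the file is accepted by the web UI, it may help to confirm what container the file really is, independent of its .mp3 extension. The following is a minimal sketch and not part of the original post: `sniff_audio_format` and `sniff_file` are hypothetical helper names, and the check only looks at a few well-known magic bytes, so it is a heuristic rather than a full parser.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess the audio container from the first bytes of a file.

    Returns one of "mp3", "wav", "flac", "ogg" or "unknown".
    """
    if header.startswith(b"ID3"):
        # An ID3v2 tag, which usually precedes MPEG audio frames
        return "mp3"
    if header.startswith(b"RIFF") and header[8:12] == b"WAVE":
        return "wav"
    if header.startswith(b"fLaC"):
        return "flac"
    if header.startswith(b"OggS"):
        return "ogg"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        # 11 set bits form the MPEG frame sync. Layer bits of 00 are
        # reserved in MPEG audio (and are what AAC/ADTS uses), so skip those.
        layer = (header[1] >> 1) & 0b11
        if layer != 0:
            return "mp3"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a file and guess its container."""
    with open(path, "rb") as f:
        return sniff_audio_format(f.read(12))
```

For the file in this question, `sniff_file('chunk7.mp3')` returning "mp3" would suggest that none of the encodings in the list tried here (LINEAR16, FLAC, MULAW, AMR, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE) describes the data.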

import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

client = speech.SpeechClient.from_service_account_json('gcp_credentials.json')

speech_file = 'chunk7.mp3'

# Read the whole audio file into memory
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16, 
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS, 
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]

SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]
for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='en-US')

        # Detects speech in the audio file
        response = []
        try:
            response = client.recognize(config, audio)
        except Exception as exc:
            # Show the API error instead of silently discarding it
            print("error: ", str(exc))
        print("-----------------------------------------------------")
        print(str(rate) + "   " + str(enco))
        print("response: ", str(response))

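Alternatively, there is another file here, in Persian ("fa-IR"), with which I face a similar problem. I originally posted the Obama file because it is easier to understand. I would appreciate it if you also tested your answer with the second file.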
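When sweeping encoding and sample-rate combinations as in the loop above, it can be useful to keep the error message for each combination so the failures can be compared afterwards. The sketch below is an illustration only: `sweep_configs` is a hypothetical helper, and `recognize` is an assumed callable that wraps whatever API call is being tested.

```python
import itertools
from typing import Any, Callable, Dict, Iterable, Tuple

Combo = Tuple[Any, int]


def sweep_configs(
    recognize: Callable[[Any, int], Any],
    encodings: Iterable[Any],
    sample_rates: Iterable[int],
) -> Tuple[Dict[Combo, Any], Dict[Combo, str]]:
    """Try every (encoding, rate) pair and return (successes, errors)."""
    successes: Dict[Combo, Any] = {}
    errors: Dict[Combo, str] = {}
    for encoding, rate in itertools.product(encodings, sample_rates):
        try:
            successes[(encoding, rate)] = recognize(encoding, rate)
        except Exception as exc:
            # Keep the error text so each failure can be inspected later
            errors[(encoding, rate)] = f"{type(exc).__name__}: {exc}"
    return successes, errors
```

With the client from the question, `recognize` could be a small function that builds `types.RecognitionConfig(encoding=enc, sample_rate_hertz=rate, language_code='en-US')` and returns `client.recognize(config, audio)`. If every entry in `errors` carries the same "Specify MP3 encoding" message, that points to the encoding list rather than the sample rate.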

Best answer

You appear to be setting encoding to every possible value the API offers. I found that:

encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED

works for mp3 files. So try this:

from google.cloud import speech_v1
from google.cloud.speech_v1 import enums
import io
speech_file = 'chunk7.mp3'


def sample_recognize(local_file_path):
    """
    Transcribe a short audio file using synchronous speech recognition

    Args:
      local_file_path: Path to local audio file, e.g. /path/audio.mp3
    """

    client = speech_v1.SpeechClient()

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent.
    # If this fails, try one of 8000, 12000, 24000 or 48000 instead.
    sample_rate_hertz = 16000

    # Encoding of audio data sent. This sample sets this explicitly.
    # This field is optional for FLAC and WAV audio formats.
    encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
    config = {
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
        "encoding": encoding,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(config, audio)
    for result in response.results:
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))

sample_recognize(speech_file)

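The code above is a slightly modified version of the sample in the speech-to-text docs. If it does not work, try digging deeper into the encoding documentation and the best practices. Good luck.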

A similar question about "google-speech-api - 400 Specify MP3 encoding to match audio file" can be found on Stack Overflow: https://stackoverflow.com/questions/57501402/
