google-speech-api - 400 Specify MP3 encoding to match audio file

Tags: google-speech-api google-cloud-speech

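I am trying to use the Google Speech-to-Text API, but even though my code loops over all available encodings, I still get the message "Specify MP3 encoding to match audio file".

This is the file I am trying to use.

I should add that if I upload the file to their UI, I do get output, so I don't think there is anything wrong with the source file.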

import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

client = speech.SpeechClient.from_service_account_json('gcp_credentials.json')

speech_file = 'chunk7.mp3'

# Read the whole audio file into memory and wrap it for the API
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16, 
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS, 
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]

SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]
for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='en-US')

        # Try to detect speech with this encoding / sample-rate combination
        response = None
        try:
            response = client.recognize(config, audio)
        except Exception as exc:
            # Report the error instead of silently swallowing it
            print("error: ", str(exc))
        print("-----------------------------------------------------")
        print(str(rate) + "   " + str(enco))
        print("response: ", str(response))

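Alternatively, there is another file here, in Persian ("fa-IR"), with which I face a similar problem. I initially posted the Obama file because it is easier to understand. I would appreciate it if you also tested your answer with the second file.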
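Since the error message is about the declared encoding not matching the file, it can help to confirm what the file actually contains before looping over encodings. The following is a minimal Python 3 sketch that guesses the format from the leading bytes. The helper name `sniff_audio_format` is hypothetical and not part of google-cloud-speech, and the check is only a heuristic based on well-known file signatures.

```python
def sniff_audio_format(data):
    """Guess the audio format of raw bytes from their leading signature.

    Hypothetical helper for illustration; a heuristic, not a full parser.
    """
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:5] == b"#!AMR":
        # Covers both "#!AMR" and "#!AMR-WB" file headers
        return "amr"
    if data[:3] == b"ID3":
        # An ID3v2 tag normally precedes the MP3 audio frames
        return "mp3"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        # 11 set bits mark an MPEG audio frame; layer bits 01 mean Layer III
        if (data[1] >> 1) & 0x03 == 0x01:
            return "mp3"
    return "unknown"
```

Called on the first few bytes of the file, for example `sniff_audio_format(open('chunk7.mp3', 'rb').read(16))`, a result of "mp3" would suggest that none of the encodings in the ENCODING list above describes the file, since that list contains no MP3 entry.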

Best answer

You appear to be setting encoding to every possible value the API offers. I found that:

encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED

works for mp3 files. So try this:

from google.cloud import speech_v1
from google.cloud.speech_v1 import enums
import io
speech_file = 'chunk7.mp3'


def sample_recognize(local_file_path):
    """
    Transcribe a short audio file using synchronous speech recognition

    Args:
      local_file_path: Path to local audio file, e.g. /path/audio.mp3
    """

    client = speech_v1.SpeechClient()

    # local_file_path = 'resources/brooklyn_bridge.raw'

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent
    sample_rate_hertz = 16000
    # If this fails, try another single value, e.g. 8000, 12000, 24000 or 48000


    # Encoding of audio data sent. This sample sets this explicitly.
    # This field is optional for FLAC and WAV audio formats.
    encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
    config = {
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
        "encoding": encoding,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(config, audio)
    for result in response.results:
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))

sample_recognize(speech_file)

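The code above is a slightly modified version of the example in the speech-to-text docs. If this does not work, try digging deeper into the encoding documentation and best practices. Good luck.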

Regarding google-speech-api - 400 Specify MP3 encoding to match audio file, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57501402/
