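I'm trying to use the google-speech2text API. Even though my code iterates over all the available encodings, I still get the message "Specify MP3 encoding to match audio file."
This is the file I'm trying to use.
I should add that if I upload the file to their UI, I do get output, so I assume there is nothing wrong with the source file.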
import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

client = speech.SpeechClient.from_service_account_json('gcp_credentials.json')

speech_file = 'chunk7.mp3'

with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16,
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]

SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]

for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='en-US')

        # Detects speech in the audio file
        response = []

        print(response)
        try:
            response = client.recognize(config, audio)
            print(response)
        except:
            pass
        print("-----------------------------------------------------")
        print(str(rate) + " " + str(enco))
        print("response: ", str(response))
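Alternatively, there is another file here, in Persian ("fa-IR"), with which I face a similar problem. I originally posted the Obama file because it is easier to understand. I would appreciate it if you also tested your answer with the second file.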
Best answer
You seem to be setting encoding, in turn, to each of the possible values the API provides. I found that:
encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
works for mp3 files. So try this:
import io

# Note: the enums module belongs to google-cloud-speech releases before 2.0.
from google.cloud import speech_v1
from google.cloud.speech_v1 import enums

speech_file = 'chunk7.mp3'


def sample_recognize(local_file_path):
    """
    Transcribe a short audio file using synchronous speech recognition

    Args:
      local_file_path Path to local audio file, e.g. /path/audio.mp3
    """
    client = speech_v1.SpeechClient()

    # local_file_path = 'resources/brooklyn_bridge.raw'

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent.
    # If this fails, try one of the other rates: 8000, 12000, 24000, 48000
    sample_rate_hertz = 16000

    # Encoding of the audio data sent. ENCODING_UNSPECIFIED is the value that
    # worked for the mp3 file here. This field is optional for FLAC and WAV
    # audio formats.
    encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
    config = {
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
        "encoding": encoding,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(config, audio)
    for result in response.results:
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))


sample_recognize(speech_file)
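The comment about trying other sample rates can be turned into a small loop. The following is only a sketch and not part of the original answer: `first_successful_rate` and `CANDIDATE_RATES` are hypothetical names, and the helper takes any callable that performs the request, so it makes no assumptions about the client library. Unlike a bare `except: pass`, it keeps the error message for every rate that fails.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

# The rates listed in the question; adjust as needed.
CANDIDATE_RATES = (8000, 12000, 16000, 24000, 48000)


def first_successful_rate(
    recognize: Callable[[int], Any],
    rates: Iterable[int] = CANDIDATE_RATES,
) -> Tuple[Optional[int], Any, Dict[int, str]]:
    """Call recognize(rate) for each rate until one call succeeds.

    Returns (rate, response, errors). rate and response are None when every
    attempt failed; errors maps each failed rate to its error message.
    """
    errors: Dict[int, str] = {}
    for rate in rates:
        try:
            response = recognize(rate)
        except Exception as exc:  # keep the message instead of discarding it
            errors[rate] = str(exc)
            continue
        return rate, response, errors
    return None, None, errors
```

With the client from the answer, `recognize` could be a small wrapper such as `lambda rate: client.recognize({"language_code": "en-US", "sample_rate_hertz": rate, "encoding": encoding}, audio)`. That wiring is an assumption and has not been run against the API here.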
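The sample_recognize function above is a slightly modified version of the sample in the speech-to-text docs. If it doesn't work, try digging deeper into the encoding documentation and the best practices. Good luck.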
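One way to start that digging is to confirm what the file actually contains before picking an `encoding` value, since a `.mp3` extension does not guarantee MP3 data. The sketch below checks a few well-known file signatures using only the standard library. `sniff_audio_format` and `sniff_file` are hypothetical helper names, not part of any Google library, and the MPEG frame-sync test is a heuristic that can misfire on arbitrary binary data.

```python
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess the audio format from the first bytes of a file.

    Returns a short label, or None when no known signature matches.
    """
    if header.startswith(b"RIFF") and header[8:12] == b"WAVE":
        return "wav"
    if header.startswith(b"fLaC"):
        return "flac"
    if header.startswith(b"OggS"):
        return "ogg"
    if header.startswith(b"#!AMR-WB\n"):
        return "amr-wb"
    if header.startswith(b"#!AMR\n"):
        return "amr"
    if header.startswith(b"ID3"):
        return "mp3"  # ID3v2 tag, which normally precedes the MP3 frames
    if (
        len(header) >= 2
        and header[0] == 0xFF
        and (header[1] & 0xE0) == 0xE0  # 11-bit MPEG audio frame sync
        and (header[1] & 0x06) != 0x00  # layer bits 00 are reserved (ADTS AAC uses them)
    ):
        return "mp3"  # MPEG audio frame (Layer I/II/III)
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 16 bytes of path and guess the format."""
    with open(path, "rb") as f:
        return sniff_audio_format(f.read(16))
```

Calling `sniff_file('chunk7.mp3')` on the file from the question would show whether it carries an MP3 signature; if it reports something else, the `encoding` value should follow the real format instead of the extension.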
Regarding google-speech-api - 400 Specify MP3 encoding to match audio file, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57501402/