On March 1, 2021, Google Text-to-Speech released beta features, including SSML support for the <voice> tag with a name or lang attribute.
I would like to use these beta features, but I don't know which channel they were released to or how to access them. I have not found any breadcrumbs in the documentation that lead to them.
I noticed that the demo on the TTS product home page uses v1beta1, but it does not support the <voice> tag.
Screenshot of JSON from the TTS demo stripping out the voice tag
That is, for this SSML:
<speak>
Blah Blah English Text. <voice name="ko-KR-Wavenet-D"> Blah Blah Korean Text.</voice> <break time="400ms" /> Blah Blah English Text.
</speak>
the demo shows the following JSON request body:
{
  "audioConfig": {
    "audioEncoding": "LINEAR16",
    "pitch": 0,
    "speakingRate": 1
  },
  "input": {
    "ssml": "<speak> Blah Blah English Text. Blah Blah Korean Text. <break time=\"400ms\" /> Blah Blah English Text. </speak>"
  },
  "voice": {
    "languageCode": "en-US",
    "name": "en-US-Wavenet-D"
  }
}
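For comparison, a request body that keeps the <voice> tag could be built like this (a sketch only; the field names match the demo's request body above, and JSON.stringify handles escaping the embedded double quotes):

```javascript
// Sketch: a request body that keeps the <voice> tag, using the same
// fields as the demo's JSON above.
const ssml =
  '<speak> Blah Blah English Text. ' +
  '<voice name="ko-KR-Wavenet-D"> Blah Blah Korean Text.</voice> ' +
  '<break time="400ms" /> Blah Blah English Text. </speak>';

const body = {
  audioConfig: {
    audioEncoding: 'LINEAR16',
    pitch: 0,
    speakingRate: 1,
  },
  input: { ssml },
  voice: { languageCode: 'en-US', name: 'en-US-Wavenet-D' },
};

// JSON.stringify escapes the inner double quotes in the SSML string.
console.log(JSON.stringify(body, null, 2));
```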
Our attempt
In our own script, which uses the Google Text-to-Speech API to generate audio from a CSV sheet of prompts, we have been using the general release. When we changed to v1beta1, the script still worked, but the <voice> tag still had no effect. We are using a symbolic link to master of the nodejs-text-to-speech npm package.
Our script uses:
const textToSpeech = require('@google-cloud/text-to-speech');
and, for the general release:
const client = new textToSpeech.TextToSpeechClient();
To access the March 1 beta features, we have been trying:
const client = new textToSpeech.v1beta1.TextToSpeechClient();
Best answer
According to the release notes of the Text-to-Speech API, the <voice> tag works as expected. I tried to reproduce the scenario on my side using the Node.js client library, and it works as expected.
The SSML documentation says that the <voice> tag lets you use multiple voices within a single SSML request. In my code the default voice is an English male, and for the other voice I used <voice name="en-IN-Wavenet-D">, which is a female voice, so my output.mp3 file contains two different voices.
You can refer to the Node.js code and the output audio files below.
tts1.js
// Imports the Google Cloud client library
const textToSpeech = require('@google-cloud/text-to-speech');
// Import other required libraries
const fs = require('fs');
const util = require('util');

// Creates a client using the v1beta1 API surface
const client = new textToSpeech.v1beta1.TextToSpeechClient();

async function quickStart() {
  // The text to synthesize
  const ssml =
    '<speak>And then she asked, <voice name="en-IN-Wavenet-D"> where were you yesterday </voice><break time="250ms"/> in her sweet and gentle voice.</speak>';

  // Construct the request
  const request = {
    input: {ssml: ssml},
    // Select the language and SSML voice gender (optional)
    voice: {languageCode: 'en-US', ssmlGender: 'NEUTRAL'},
    // Select the type of audio encoding
    audioConfig: {audioEncoding: 'MP3'},
  };

  // Performs the text-to-speech request
  const [response] = await client.synthesizeSpeech(request);

  // Write the binary audio content to a local file
  const writeFile = util.promisify(fs.writeFile);
  await writeFile('output.mp3', response.audioContent, 'binary');
  console.log('Audio content written to file: output.mp3');
}

quickStart();
Output mp3 file: output1 (using v1beta1)
I also tried it in Node.js without the v1beta1 version, and it works fine.
tts2.js:
// Imports the Google Cloud client library
const textToSpeech = require('@google-cloud/text-to-speech');
// Import other required libraries
const fs = require('fs');
const util = require('util');

// Creates a client using the general-release API surface
const client = new textToSpeech.TextToSpeechClient();

async function quickStart() {
  // The text to synthesize
  const ssml =
    '<speak>And then she asked, <voice name="en-IN-Wavenet-D"> where were you yesterday </voice><break time="250ms"/> in her sweet and gentle voice.</speak>';

  // Construct the request
  const request = {
    input: {ssml: ssml},
    // Select the language and SSML voice gender (optional)
    voice: {languageCode: 'en-US', ssmlGender: 'NEUTRAL'},
    // Select the type of audio encoding
    audioConfig: {audioEncoding: 'MP3'},
  };

  // Performs the text-to-speech request
  const [response] = await client.synthesizeSpeech(request);

  // Write the binary audio content to a local file
  const writeFile = util.promisify(fs.writeFile);
  await writeFile('output.mp3', response.audioContent, 'binary');
  console.log('Audio content written to file: output.mp3');
}

quickStart();
Output mp3 file: output (without the v1beta1 version)
In addition, I'd like to mention that I also tried the Python client library, and it also works as expected.
file1.py
from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
    ssml='<speak>And then she asked, <voice name="en-IN-Wavenet-D"> where were you yesterday</voice><break time="250ms"/> in her sweet and gentle voice.</speak>'
)

# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)

# Select the type of audio file you want returned
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# The response's audio_content is binary.
with open("output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
Output file: output (using Python)
A similar question about how to access the Google Text-to-Speech beta features (released March 1, 2021) can be found on Stack Overflow: https://stackoverflow.com/questions/66712081/