python - Speechmatics: submit a job without the audio argument

I have implemented a Speechmatics speech-to-text application using the API described in the documentation, with the following code:

from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError 

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": { 
        "language": LANGUAGE 
    }
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')

        # Note that in production, you should set up notifications instead of polling. 
        # Notifications are described here: https://docs.speechmatics.com/features-other/notifications
        transcript = client.wait_for_completion(job_id, transcription_format='txt')
        # To see the full output, try setting transcription_format='json-v2'.
        print(transcript)
    except HTTPStatusError:
        print('Invalid API key - Check your API_KEY at the top of the code!')

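The code passes a local file as the audio argument of submit_job. I want to submit a job that uses fetch_data with a URL instead of a local file.

However, submit_job requires the audio argument.

I only want to use the documented fetch_data option, with no audio argument, like this: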

conf = {
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker"
  },
  "fetch_data": {
    "url": "${URL}/{FILENAME}"
  }
}

How can I use the fetch_data configuration above and call submit_job without passing an audio file?

Best answer

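Unfortunately, I don't think the Speechmatics Python client currently supports fetch_data. I'm a senior software engineer at Speechmatics, and this is a known issue that we are looking into.

You can send fetch_data to the server together with an empty audio file, but the request is rejected with a 400 error because the server cannot accept two inputs at once, so there is currently no workaround that uses the SDK.

However, the SDK is really just a thin wrapper around the RESTful API. You can write a simple Python script that does the same thing with the requests module. I wrote the script below and tested it against a Wikimedia audio file, and it worked fine.

It sends a basic HTTP POST request, then uses the job_id to poll the job status until the job is no longer running. It then fetches the transcript (JSON format by default) and prints it as a raw string rather than parsed JSON; json.loads() will convert it. The code: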

import requests
import json
import time

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
AUDIO_URL = "YOUR_URL"

conf = {
    "type": "transcription",
    "transcription_config": {"language": LANGUAGE, "diarization": "speaker"},
    "fetch_data": {"url": AUDIO_URL},
}

response = requests.post(
    "https://asr.api.speechmatics.com/v2/jobs",
    data={"config": json.dumps(conf).encode()},
    files=dict(config=None),
    headers={"Authorization": f"Bearer {API_KEY}"},
)

print(response.content)
job_id = json.loads(response.content)["id"]

job = requests.get(
    f"https://asr.api.speechmatics.com/v2/jobs/{job_id}",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
status = json.loads(job.content)["job"]["status"]

while status == "running":
    time.sleep(10)
    job = requests.get(
        f"https://asr.api.speechmatics.com/v2/jobs/{job_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    status = json.loads(job.content)["job"]["status"]
    print(status)

transcript = requests.get(
    f"https://asr.api.speechmatics.com/v2/jobs/{job_id}/transcript",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(transcript.content)
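
The polling loop in the script above only checks for "running" and never gives up. As a rough sketch of a slightly more defensive version, the helper below polls until the job reaches a terminal status or a timeout expires. The function name wait_for_job, its parameters, and the set of terminal statuses are assumptions made for illustration; check the Speechmatics API reference for the authoritative list of job statuses.

```python
import time
from typing import Callable

# Assumed terminal statuses (not confirmed by the answer above).
TERMINAL_STATUSES = {"done", "rejected", "deleted", "expired"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal status, then return it.

    get_status is any zero-argument callable that returns the current job
    status string. Raises TimeoutError if no terminal status is seen within
    `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(poll_interval)
```

With the script above, get_status could be a small function that performs the requests.get call on the job URL and returns json.loads(job.content)["job"]["status"]. Fetch the transcript only if the returned status is "done".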

I pass a files dict whose only value is None to force the request to be sent with the multipart/form-data MIME type (in case you are wondering why: the server only accepts multipart/form-data).
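
To see the effect of that trick without contacting the server, the sketch below builds the same POST request with requests.Request(...).prepare() and prints the resulting Content-Type header, once with the files argument and once without. The helper name prepare_job_request, the force_multipart flag, and the example.com audio URL are placeholders introduced for illustration only.

```python
import json

import requests

JOBS_URL = "https://asr.api.speechmatics.com/v2/jobs"


def prepare_job_request(conf, api_key, force_multipart=True):
    """Build (but do not send) the job submission request."""
    return requests.Request(
        "POST",
        JOBS_URL,
        data={"config": json.dumps(conf)},
        # A files entry whose value is None is skipped when the body is
        # encoded, but its presence makes requests use multipart/form-data.
        files={"config": None} if force_multipart else None,
        headers={"Authorization": f"Bearer {api_key}"},
    ).prepare()


conf = {
    "type": "transcription",
    "transcription_config": {"language": "en"},
    "fetch_data": {"url": "https://example.com/audio.wav"},  # placeholder URL
}

multipart = prepare_job_request(conf, "YOUR_API_KEY")
urlencoded = prepare_job_request(conf, "YOUR_API_KEY", force_multipart=False)

print(multipart.headers["Content-Type"])   # multipart/form-data with a boundary
print(urlencoded.headers["Content-Type"])  # form-urlencoded, not multipart
```

Only the first variant matches what the script above sends; the second is shown for contrast.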

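Hopefully the SDK will be fixed soon, but for now this is the best available approach. Hope that helps!

P.S. There is already an unresolved issue on GitHub, open since February, that discusses this, but we haven't had time to get to it yet :(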

Update - the 19th, 2023 (the month is missing from the source)

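We finally got round to fixing this bug and releasing the fix. Hooray! You should now be able to use fetch_data with the Python client as in the example given above; you just need to set audio=None. Here is an example that uses a Wikimedia file: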

from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError 

# Define transcription parameters
conf = {
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker"
  },
  "fetch_data": {
    "url": "https://upload.wikimedia.org/wikipedia/commons/8/83/%28eng%29-%28US%29-Man-of-war.wav"
  }
}

# Open the client using a context manager
with BatchClient() as client:
    try:
        job_id = client.submit_job(
            audio=None,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')
        transcript = client.wait_for_completion(job_id, transcription_format='txt')
        print(transcript)
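    except HTTPStatusError as e:
        # This example has no API_KEY variable; the key comes from the client configuration.
        print(f'Request failed with HTTP {e.response.status_code} - check your configured API key')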

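Note that this example also takes advantage of some other recent changes, which is why it needs fewer configuration steps than the earlier one. The Python client now reads the authentication and URL configuration from a local TOML file, which can be set with CLI commands (for example, speechmatics config set --{arg_name} {arg_value}). You can still supply the configuration the way it was done before.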

For "python - Speechmatics: submit a job without the audio argument", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/75872125/
