python - Unable to download video captions using YouTube API v3 in Python

Tags: python youtube-api google-api-python-client youtube-data-api closed-captions

I am trying to download the closed captions of this public YouTube video (just for testing): https://www.youtube.com/watch?v=Txvud7wPbv4

I am using the code sample below (captions.py), which I got from this link: https://developers.google.com/youtube/v3/docs/captions/download

I have stored client-secrets.json (OAuth2 authentication) and youtube-v3-api-captions.json in the same directory (as the sample code asks).

I run this command in cmd: python Captions.py --videoid='Txvud7wPbv4' --action='download'

I get an error (posted as a screenshot), and I don't know why it doesn't recognize the video ID of this public video.

Has anyone run into a similar problem?

Thanks in advance, everyone.

Code sample:

# Usage example:
# python captions.py --videoid='<video_id>' --name='<name>' --file='<file>' --language='<language>' --action='action'

import httplib2
import os
import sys

from apiclient.discovery import build_from_document
from apiclient.errors import HttpError
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import argparser, run_flow


# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the Google Cloud Console at
# https://cloud.google.com/console.
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
#   https://developers.google.com/youtube/v3/guides/authentication
# For more information about the client_secrets.json file format, see:
#   https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
CLIENT_SECRETS_FILE = "client_secrets.json"

# This OAuth 2.0 access scope allows for full read/write access to the
# authenticated user's account and requires requests to use an SSL connection.
YOUTUBE_READ_WRITE_SSL_SCOPE = "https://www.googleapis.com/auth/youtube.force-ssl"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0

To make this sample run you will need to populate the client_secrets.json file
found at:
   %s
with information from the APIs Console
https://console.developers.google.com

For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
                                   CLIENT_SECRETS_FILE))

# Authorize the request and store authorization credentials.
def get_authenticated_service(args):
  flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, scope=YOUTUBE_READ_WRITE_SSL_SCOPE,
    message=MISSING_CLIENT_SECRETS_MESSAGE)

  storage = Storage("%s-oauth2.json" % sys.argv[0])
  credentials = storage.get()

  if credentials is None or credentials.invalid:
    credentials = run_flow(flow, storage, args)

  # Trusted testers can download this discovery document from the developers page
  # and it should be in the same directory with the code.
  with open("youtube-v3-api-captions.json", "r") as f:
    doc = f.read()
    return build_from_document(doc, http=credentials.authorize(httplib2.Http()))


# Call the API's captions.list method to list the existing caption tracks.
def list_captions(youtube, video_id):
  results = youtube.captions().list(
    part="snippet",
    videoId=video_id
  ).execute()

  for item in results["items"]:
    id = item["id"]
    name = item["snippet"]["name"]
    language = item["snippet"]["language"]
    print "Caption track '%s(%s)' in '%s' language." % (name, id, language)

  return results["items"]


# Call the API's captions.insert method to upload a caption track in draft status.
def upload_caption(youtube, video_id, language, name, file):
  insert_result = youtube.captions().insert(
    part="snippet",
    body=dict(
      snippet=dict(
        videoId=video_id,
        language=language,
        name=name,
        isDraft=True
      )
    ),
    media_body=file
  ).execute()

  id = insert_result["id"]
  name = insert_result["snippet"]["name"]
  language = insert_result["snippet"]["language"]
  status = insert_result["snippet"]["status"]
  print "Uploaded caption track '%s(%s)' in '%s' language, '%s' status." % (name,
      id, language, status)


# Call the API's captions.update method to update an existing caption track's draft status
# and publish it. If a new binary file is present, update the track with the file as well.
def update_caption(youtube, caption_id, file):
  update_result = youtube.captions().update(
    part="snippet",
    body=dict(
      id=caption_id,
      snippet=dict(
        isDraft=False
      )
    ),
    media_body=file
  ).execute()

  name = update_result["snippet"]["name"]
  isDraft = update_result["snippet"]["isDraft"]
  print "Updated caption track '%s' draft status to be: '%s'" % (name, isDraft)
  if file:
    print "and updated the track with the new uploaded file."


# Call the API's captions.download method to download an existing caption track.
def download_caption(youtube, caption_id, tfmt):
  subtitle = youtube.captions().download(
    id=caption_id,
    tfmt=tfmt
  ).execute()

  print "First line of caption track: %s" % (subtitle)

# Call the API's captions.delete method to delete an existing caption track.
def delete_caption(youtube, caption_id):
  youtube.captions().delete(
    id=caption_id
  ).execute()

  print "caption track '%s' deleted successfully" % (caption_id)


if __name__ == "__main__":
  # The "videoid" option specifies the YouTube video ID that uniquely
  # identifies the video for which the caption track will be uploaded.
  argparser.add_argument("--videoid",
    help="Required; ID for video for which the caption track will be uploaded.")
  # The "name" option specifies the name of the caption track to be used.
  argparser.add_argument("--name", help="Caption track name", default="YouTube for Developers")
  # The "file" option specifies the binary file to be uploaded as a caption track.
  argparser.add_argument("--file", help="Captions track file to upload")
  # The "language" option specifies the language of the caption track to be uploaded.
  argparser.add_argument("--language", help="Caption track language", default="en")
  # The "captionid" option specifies the ID of the caption track to be processed.
  argparser.add_argument("--captionid", help="Required; ID of the caption track to be processed")
  # The "action" option specifies the action to be processed.
  argparser.add_argument("--action", help="Action", default="all")


  args = argparser.parse_args()

  if (args.action in ('upload', 'list', 'all')):
    if not args.videoid:
      exit("Please specify videoid using the --videoid= parameter.")

  if (args.action in ('update', 'download', 'delete')):
    if not args.captionid:
      exit("Please specify captionid using the --captionid= parameter.")

  if (args.action in ('upload', 'all')):
    if not args.file:
      exit("Please specify a caption track file using the --file= parameter.")
    if not os.path.exists(args.file):
      exit("Please specify a valid file using the --file= parameter.")

  youtube = get_authenticated_service(args)
  try:
    if args.action == 'upload':
      upload_caption(youtube, args.videoid, args.language, args.name, args.file)
    elif args.action == 'list':
      list_captions(youtube, args.videoid)
    elif args.action == 'update':
      update_caption(youtube, args.captionid, args.file)
    elif args.action == 'download':
      download_caption(youtube, args.captionid, 'srt')
    elif args.action == 'delete':
      delete_caption(youtube, args.captionid)
    else:
      # All the available methods are used in sequence just for the sake of an example.
      upload_caption(youtube, args.videoid, args.language, args.name, args.file)
      captions = list_captions(youtube, args.videoid)

      if captions:
        first_caption_id = captions[0]['id']
        update_caption(youtube, first_caption_id, None)
        download_caption(youtube, first_caption_id, 'srt')
        delete_caption(youtube, first_caption_id)
  except HttpError, e:
    print "An HTTP error %d occurred:\n%s" % (e.resp.status, e.content)
  else:
    print "Created and managed caption tracks."

Best answer

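Your application seems overly complicated... it's structured to do everything that can be done with captions, not just download them. That makes it harder to debug, so I wrote an abridged version (Python 2 or 3) that only downloads and displays the captions: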

Updated example (May 2022) (new Python auth libraries)

from __future__ import print_function
import os

from google.auth.transport.requests import Request
from google.oauth2 import credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient import discovery

creds = None
SCOPES = 'https://www.googleapis.com/auth/youtube.force-ssl'
TOKENS = 'storage.json'
if os.path.exists(TOKENS):
    creds = credentials.Credentials.from_authorized_user_file(TOKENS)
if not (creds and creds.valid):
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file('client_secret.json', SCOPES)
        creds = flow.run_local_server()
with open(TOKENS, 'w') as token:
    token.write(creds.to_json())
YOUTUBE = discovery.build('youtube', 'v3', credentials=creds)

def process(vid):
    caption_info = YOUTUBE.captions().list(part='id',
            videoId=vid).execute().get('items', [])
    if not caption_info:
        print('No caption tracks found for video %s' % vid)
        return
    caption_str = YOUTUBE.captions().download(id=caption_info[0]['id'],
            tfmt='srt').execute().decode('utf-8')
    caption_data = caption_str.split('\n\n')
    for line in caption_data:
        if line.count('\n') > 1:
            i, timecode, caption = line.split('\n', 2)
            print('%02d) [%s] %s' % (
                    int(i), timecode, ' '.join(caption.split())))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 2:
        vid = sys.argv[1]
        process(vid)
    else:
        print('Usage: python captions-download.py VIDEO_ID')
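The updated example always takes `caption_info[0]`, i.e. whichever track the API happens to list first. If a video has several tracks, a small helper can prefer a given language instead. This is a minimal sketch: `pick_caption_id` is a hypothetical helper, not part of any Google library, and it assumes the list call uses `part='snippet'` (as the question's `list_captions` does) so that each item carries `snippet.language`. With `part='id'`, as above, items have no snippet and the helper simply falls back to the first track.

```python
def pick_caption_id(items, language='en'):
    # items: the 'items' list from YOUTUBE.captions().list(...).execute().
    # Hypothetical helper: prefer a track in `language`, otherwise fall back
    # to the first track; return None if the video has no caption tracks.
    for item in items:
        if item.get('snippet', {}).get('language') == language:
            return item['id']
    return items[0]['id'] if items else None


if __name__ == '__main__':
    # Made-up items shaped like a captions.list response with part='snippet'.
    fake_items = [
        {'id': 'track-fr', 'snippet': {'language': 'fr'}},
        {'id': 'track-en', 'snippet': {'language': 'en'}},
    ]
    print(pick_caption_id(fake_items, 'en'))  # prints track-en
```

In `process()`, this would replace `caption_info[0]['id']`, after switching the list call to `part='snippet'`.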

Original example (March 2017)

from __future__ import print_function

from googleapiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/youtube.force-ssl'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
YOUTUBE = discovery.build('youtube', 'v3', http=creds.authorize(Http()))

def process(vid):
    caption_info = YOUTUBE.captions().list(
            part='id', videoId=vid).execute().get('items', [])
    caption_str = YOUTUBE.captions().download(
            id=caption_info[0]['id'], tfmt='srt').execute().decode('utf-8')
    caption_data = caption_str.split('\n\n')
    for line in caption_data:
        if line.count('\n') > 1:
            i, cap_time, caption = line.split('\n', 2)
            print('%02d) [%s] %s' % (
                    int(i), cap_time, ' '.join(caption.split())))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 2:
        vid = sys.argv[1]
        process(vid)
    else:
        print('Usage: python captions-download.py VIDEO_ID')
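On Python 3, `captions().download(...).execute()` returns `bytes` (which is why the updated example calls `.decode('utf-8')`), and calling `split('\n\n')` on `bytes` with a `str` separator raises `TypeError`. A minimal sketch of a normalizing step, using a hypothetical helper `to_text`, handles both `bytes` and `str`. As an extra assumption, it also converts CRLF line endings to `\n`, in case a track uses them, so the blank-line split still finds the caption boundaries.

```python
def to_text(payload, encoding='utf-8'):
    # Hypothetical helper: decode the raw download payload if needed.
    if isinstance(payload, bytes):
        payload = payload.decode(encoding)
    # Assumption: normalize CRLF / lone CR so that a split on a blank
    # line ("\n\n") works regardless of the track's line endings.
    return payload.replace('\r\n', '\n').replace('\r', '\n')


if __name__ == '__main__':
    raw = b'1\r\n00:00:01,000 --> 00:00:02,000\r\nHello\r\n'
    print(to_text(raw).split('\n'))
```

It would wrap the result of `.execute()` before the `split('\n\n')` call in either example.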

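Here's how it works:

  1. You pass in the video ID (VID) as the only argument (sys.argv[1])
  2. It uses that VID to look up the caption IDs via YOUTUBE.captions().list()
  3. Assuming the video has (at least) one caption track, I grab its ID (caption_info[0]['id'])
  4. Then it calls YOUTUBE.captions().download() with that caption ID, requesting the srt track format
  5. All individual captions are separated by double NEWLINEs, so split on them
  6. Loop through each caption; if there are at least 2 NEWLINEs in it, there's data, so only split() on the first pair
  7. Display the caption #, the timeline of when it appears, then the caption itself, changing all remaining NEWLINEs to spaces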
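Steps 5 to 7 can be pulled out into a pure function that is testable without calling the API. This is a sketch that mirrors the loop in `process()`; the name `parse_srt`, the leading `strip()`, and the made-up `SAMPLE_SRT` text (modelled on the output shown below) are assumptions for illustration only.

```python
def parse_srt(srt_text):
    cues = []
    # Step 5: individual captions are separated by a blank line.
    for block in srt_text.strip().split('\n\n'):
        # Step 6: a block with data has at least 2 newlines
        # (index line, timecode line, then one or more text lines).
        if block.count('\n') > 1:
            i, timecode, caption = block.split('\n', 2)
            # Step 7: collapse the remaining newlines/whitespace into spaces.
            cues.append((int(i), timecode, ' '.join(caption.split())))
    return cues


# Made-up sample input; the first caption spans two text lines.
SAMPLE_SRT = (
    "1\n"
    "00:00:06,390 --> 00:00:09,280\n"
    "iterator cool but\n"
    "that's cool\n"
    "\n"
    "2\n"
    "00:00:09,280 --> 00:00:12,280\n"
    "your the moment\n"
)

if __name__ == '__main__':
    for cue in parse_srt(SAMPLE_SRT):
        print('%02d) [%s] %s' % cue)
    # 01) [00:00:06,390 --> 00:00:09,280] iterator cool but that's cool
    # 02) [00:00:09,280 --> 00:00:12,280] your the moment
```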

When I run it, I get the expected results... on a video I own:

$ python captions-download.py MY_VIDEO_ID
01) [00:00:06,390 --> 00:00:09,280] iterator cool but that's cool
02) [00:00:09,280 --> 00:00:12,280] your the moment
03) [00:00:13,380 --> 00:00:16,380] and sellers very thrilled
    :

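A few things...

  1. I think you need to be the owner of the video you're trying to download captions for.
     • I tried my script on your video and got a 403 HTTP Forbidden error
     • Here are other errors you can get from the API
  2. In your case, there seems to be some confusion about the video ID you're passing in.
     • It thinks you're giving it <code></code> (note the hex 0x3c and 0x3e values)... rich text?
     • In any case, that's why I wrote my own shorter version... so I'd have a more controlled environment to experiment in.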
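One way to rule out a mangled video ID is to clean and sanity-check it before calling the API. A plausible contributor (an assumption, not something the answer confirms): Windows `cmd.exe` does not treat single quotes as quoting characters, so `--videoid='Txvud7wPbv4'` reaches Python with the quotes included. Note also that in the question's sample, `--action=download` reads `--captionid` (a caption track ID from `captions.list`), not `--videoid`. The sketch below uses a hypothetical helper `clean_video_id`, and its 11-character `[A-Za-z0-9_-]` pattern reflects the observed format of YouTube video IDs rather than a documented guarantee. It does nothing about the 403 ownership issue.

```python
import re

# Assumption: YouTube video IDs look like 11 chars from [A-Za-z0-9_-].
VIDEO_ID_RE = re.compile(r'^[A-Za-z0-9_-]{11}$')


def clean_video_id(raw):
    # Hypothetical helper: drop surrounding whitespace and stray quote
    # characters (e.g. left over from cmd.exe), then validate the format.
    vid = raw.strip().strip('\'"')
    if not VIDEO_ID_RE.match(vid):
        raise ValueError('not a plausible video ID: %r' % (raw,))
    return vid


if __name__ == '__main__':
    print(clean_video_id("'Txvud7wPbv4'"))  # prints Txvud7wPbv4
```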

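FWIW, since you're new to using Google APIs, I made some intro videos in this playlist to get developers started with Google APIs. The auth code is the hardest part, so focus on videos 3 and 4 in that playlist to help you get acclimated.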

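I don't have any videos covering the YouTube APIs (as I focus more on the G Suite APIs), although I do have a Google Apps Script sample (video 22 in the playlist); if you're new to Apps Script, brush up on your JavaScript first, then watch video 5. Hope this helps!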

Original Stack Overflow question for "python - Unable to download video captions using YouTube API v3 in Python": https://stackoverflow.com/questions/41935427/
