json - How do I extract upload dates, titles, URLs and durations of the YouTube videos in a playlist with youtube-dl?

Tags: json python-3.x windows youtube youtube-dl

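I am trying to extract the Upload Dates, Titles, URLs and Durations of all YouTube videos in a specific playlist with youtube-dl. I don't need the videos themselves, only the data above.
So far I have tested the following two methods suggested by Alen Paul Varghese here: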
Youtube-dl's GitHub Doc Used as reference
The Playlist URL used for testing
Method #1

youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD > example.json

Method #2
youtube-dl --get-upload_date https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD > example.txt
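Method #1 outputs a full JSON dump, roughly 3000 lines per video, which is very inconvenient when processing playlists with many YouTube videos, but it does contain the 4 required fields.
Method #2 returns the following error: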
youtube-dl: error: no such option: --get-upload_date
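I would prefer Method #2, because it limits the output to only the data I need (upload dates, titles, URLs and durations). I followed Alen Paul Varghese's 2nd suggestion after checking that upload_date is a valid youtube-dl option here: Youtube-dl's GitHub Doc Used as reference.
Why isn't the upload_date option accepted?
What alternatives are there for limiting the data?
Any helpful suggestions would be greatly appreciated.
Here is the JSON dump file: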
example.json

EDIT (thanks to @PIERPY for the great guidance; the complete, free process is documented below in case it helps others):

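From an Admin CMD I successfully installed Chocolatey NuGet, then used chocolatey install jq to install jq 1.5, as instructed on Download jq - Windows (the install log below shows that jq 1.6 was actually installed).
My Chocolatey NuGet installation output: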
    Microsoft Windows [Version 10.0.19042.867]
(c) 2020 Microsoft Corporation. All rights reserved.
C:\WINDOWS\system32>@"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))" && SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"                                                         
Forcing web requests to allow TLS v1.2 (Required for requests to Chocolatey.org)                                        
Getting latest version of the Chocolatey package for download.                                                          
Not using proxy.
Getting Chocolatey from https://community.chocolatey.org/api/v2/package/chocolatey/0.10.15.
Downloading https://community.chocolatey.org/api/v2/package/chocolatey/0.10.15 to C:\Users\###\AppData\Local\Temp\chocolatey\chocoInstall\chocolatey.zip
Not using proxy.
Extracting C:\Users\###\AppData\Local\Temp\chocolatey\chocoInstall\chocolatey.zip to C:\Users\###\AppData\Local\Temp\chocolatey\chocoInstall
Installing Chocolatey on the local machine
Creating ChocolateyInstall as an environment variable (targeting 'Machine')
  Setting ChocolateyInstall to 'C:\ProgramData\chocolatey'
WARNING: It's very likely you will need to close and reopen your shell
  before you can use choco.
Restricting write permissions to Administrators
We are setting up the Chocolatey package repository.
The packages themselves go to 'C:\ProgramData\chocolatey\lib'
  (i.e. C:\ProgramData\chocolatey\lib\yourPackageName).
A shim file for the command line goes to 'C:\ProgramData\chocolatey\bin'
  and points to an executable in 'C:\ProgramData\chocolatey\lib\yourPackageName'.

Creating Chocolatey folders if they do not already exist.

WARNING: You can safely ignore errors related to missing log files when
  upgrading from a version of Chocolatey less than 0.9.9.
  'Batch file could not be found' is also safe to ignore.
  'The system cannot find the file specified' - also safe.
chocolatey.nupkg file not installed in lib.
 Attempting to locate it from bootstrapper.
PATH environment variable does not have C:\ProgramData\chocolatey\bin in it. Adding...
WARNING: Not setting tab completion: Profile file does not exist at 'C:\Users\###\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1'.
Chocolatey (choco.exe) is now ready.
You can call choco from anywhere, command line or powershell by typing choco.
Run choco /? for a list of functions.
You may need to shut down and restart powershell and/or consoles
 first prior to using choco.
Ensuring Chocolatey commands are on the path
Ensuring chocolatey.nupkg is in the lib folder

C:\WINDOWS\system32>
Then I ran chocolatey install jq, which installed successfully:
My jq installation output:
    C:\WINDOWS\system32>chocolatey install jq
Chocolatey v0.10.15
Installing the following packages:
jq
By installing you accept licenses for the packages.
Progress: Downloading jq 1.6... 100%

jq v1.6 [Approved]
jq package files install completed. Performing other installation steps.
The package jq wants to run 'chocolateyinstall.ps1'.
Note: If you don't run this script, the installation will fail.
Note: To confirm automatically next time, use '-y' or consider:
choco feature enable -n allowGlobalConfirmation
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): Y

Downloading jq 64 bit
  from 'https://github.com/stedolan/jq/releases/download/jq-1.6/jq-win64.exe'
Progress: 100% - Completed download of C:\ProgramData\chocolatey\lib\jq\tools\jq.exe (3.36 MB).
Download of jq.exe (3.36 MB) completed.
Hashes match.
C:\ProgramData\chocolatey\lib\jq\tools\jq.exe
 ShimGen has successfully created a shim for jq.exe
 The install of jq was successful.
  Software install location not explicitly set, could be in package or
  default install location if installer.

Chocolatey installed 1/1 packages.
 See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).
Then I ran your youtube-dl command, @pierpy:
youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq '{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}'
and got a syntax error with the following output:
    Microsoft Windows [Version 10.0.19042.867]
(c) 2020 Microsoft Corporation. All rights reserved.

C:\Users\###>cd documents

C:\Users\###\Documents>cd youtube-dl

C:\Users\###\Documents\youtube-dl>youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq '{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}'
jq: error: syntax error, unexpected INVALID_CHARACTER, expecting $end (Windows cmd shell quoting issues?) at <top-level>, line 1:
'{date:
jq: 1 compile error
Traceback (most recent call last):
  File "__main__.py", line 19, in <module>
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\__init__.py", line 475, in main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\__init__.py", line 465, in _real_main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 2060, in download
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 799, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 806, in wrapper
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 838, in __extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 924, in process_ie_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1058, in __process_playlist
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 806, in wrapper
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1068, in __process_iterable_entry
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 910, in process_ie_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 872, in process_ie_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1683, in process_video_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1793, in process_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 1765, in __forced_printings
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 520, in to_stdout
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\YoutubeDL.py", line 509, in _write_string
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpwt56m8wg\build\youtube_dl\utils.py", line 3180, in write_string
OSError: [Errno 22] Invalid argument

C:\Users\###\Documents\youtube-dl>
Then I googled the error jq: error: syntax error, unexpected INVALID_CHARACTER, expecting $end (Windows cmd shell quoting issues?) and found the answer in this suggestion:
It's all about the quoting
So I adjusted your youtube-dl command accordingly, @pierpy, replacing the single quotes with double quotes:
youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}"
Now it outputs exactly the data I need: Upload Dates, Titles, URLs and Durations.
Final output:
C:\Users\###\Documents\youtube-dl>youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}"
{
  "date": "20150717",
  "title": "3.1: Flow (setup and draw) - Processing Tutorial",
  "URL": "https://r1---sn-n0ogpnx-b85s.googlevideo.com/videoplayback?expire=1617730292&ei=lEZsYKDoEZmAp-oP3ayk8AI&ip=188.154.162.181&id=o-AHFxnOR5c5xqmgtu1JG4FbL6lJW0gz1pJQN77cr2-27T&itag=22&source=youtube&requiressl=yes&mh=m6&mm=31%2C29&mn=sn-n0ogpnx-b85s%2Csn-1gieen7e&ms=au%2Crdu&mv=m&mvi=1&pl=23&initcwndbps=1578750&vprv=1&mime=video%2Fmp4&ns=r3pR-nwt6FkDQa33iQQu-qgF&ratebypass=yes&dur=944.007&lmt=1607684088067796&mt=1617708538&fvip=5&fexp=24001373%2C24007246&beids=9466585&c=WEB&txp=5432434&n=3P6HQoLfY8ktFLG5&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAMiNOv8QDjfsn7yxicEOtSjcEYjZlX3CfrI8D-HGBd63AiEA4E6rKv_kYti6rAeieJzPAdTYjoh05Az_11Kcxt-0jBg%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRAIgD43F71OxMExfQyN9FeNWfZX_aiGAD3SKlKOLNR14NT8CICEuD_Ry0oymKZmFfHuP4F6v9MKCrmRI0x27sLG8fvyG",
  "duration": 944
}
{
  "date": "20150717",
  "title": "3.2: Built-in Variables (mouseX, mouseY) - Processing Tutorial",
  "URL": "https://r4---sn-n0ogpnx-b85l.googlevideo.com/videoplayback?expire=1617730293&ei=lEZsYMO2OczSWaPiueAC&ip=188.154.162.181&id=o-ANuT73vsKQLvQqynOeh00stVP-zqbq3x-iUrdDiYwg8E&itag=22&source=youtube&requiressl=yes&mh=kE&mm=31%2C29&mn=sn-n0ogpnx-b85l%2Csn-1gieen7e&ms=au%2Crdu&mv=m&mvi=4&pl=23&initcwndbps=1617500&vprv=1&mime=video%2Fmp4&ns=tPtC_l82gq-yi-rk_oQXatAF&cnr=14&ratebypass=yes&dur=814.207&lmt=1551720899437893&mt=1617708538&fvip=5&fexp=24001373%2C24007246&beids=9466585&c=WEB&txp=5432432&n=LhJHXWU8TGNOrD9u&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRAIgSHTlBPN0j49hoB02SYDeF3-9fe1iSz1KRiv9iFy8nj0CIHEafdAOBefsos8kO5FGhDljsKpOV7ZQ9dY1BEzQQ0n0&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRgIhAJkd-9posqapJekca_35YNG0g3nLgxTfW06EqRM-a3wDAiEApSrsS5wPlMPXjlI_bvOh53cjxlrHfNSKD4XbhyDyZ6w%3D",
  "duration": 815
}
{
  "date": "20150717",
  "title": "3.3: Events (mousePressed, keyPressed) - Processing Tutorial",
  "URL": "https://r4---sn-n0ogpnx-b85l.googlevideo.com/videoplayback?expire=1617730293&ei=lUZsYK6WJ4TeWaeflbgF&ip=188.154.162.181&id=o-AD1WgS46WiFogy00v3aHRp6aZXkd_ACN-_m76lPoQvA8&itag=22&source=youtube&requiressl=yes&mh=it&mm=31%2C29&mn=sn-n0ogpnx-b85l%2Csn-1gieen7e&ms=au%2Crdu&mv=m&mvi=4&pl=23&initcwndbps=1617500&vprv=1&mime=video%2Fmp4&ns=AlyS4uv2BH5ENfp_nP53I-sF&cnr=14&ratebypass=yes&dur=441.225&lmt=1472343659978757&mt=1617708538&fvip=4&fexp=24001373%2C24007246&beids=9466585&c=WEB&n=np6rmmeSKhYEvG1K&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAIRmvxmY-VidN3LPhnzCNQ2TLsUB_7i1yU0QOMBVUS6AAiEAm9DE-Kk6cCNb8FC0we4c2O8299n2_2jGnQfzYzz0igo%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIgZzrGEwMcb0Vrj9FleanW2apPMu_55OdH2SRdw66DQ1QCIQCDsAz7X5RxczKtWzokBhyUNcyXLXeZF-ENufpjA0BP2Q%3D%3D",
  "duration": 442
}

C:\Users\###\Documents\youtube-dl>
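Why the quote change helped: on Windows, single quotes are not quoting characters for cmd.exe or for the usual command-line parsing, so the filter was split at its spaces and jq received an argument starting with a literal `'{date:`. In the double-quoted version, Windows argument parsing strips the inner double quotes, so jq in effect receives `{date: .upload_date,title: .title,URL: .url,duration: .duration}`, which jq accepts because object keys may be bare identifiers. Another way to sidestep shell quoting altogether is to start youtube-dl from Python with an argument list, so no shell parses the command line. This is a minimal sketch assuming youtube-dl is on PATH; `build_argv` and `fetch_json_lines` are illustrative helper names, not part of youtube-dl.

```python
import subprocess
from typing import List

# Playlist used for testing in the question.
PLAYLIST_URL = "https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD"


def build_argv(url: str) -> List[str]:
    """Return the youtube-dl arguments as a list.

    Because the list is passed directly to subprocess (no shell=True),
    cmd.exe quoting rules never come into play.
    """
    return ["youtube-dl", "--skip-download", "--print-json", url]


def fetch_json_lines(url: str) -> List[str]:
    """Run youtube-dl (assumed to be on PATH) and return its non-empty output lines.

    --print-json prints one JSON object per video, one per line.
    """
    result = subprocess.run(
        build_argv(url),
        stdout=subprocess.PIPE,
        check=True,          # raise CalledProcessError if youtube-dl fails
        encoding="utf-8",    # decode stdout as text
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Calling `fetch_json_lines(PLAYLIST_URL)` would return the same JSON objects that were piped into jq, ready to be filtered in Python.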

One last issue:

The resulting URLs are not the standard video URLs.
Why not?

Youtube-dl's GitHub Doc Used as reference states:
url (string): Video URL
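How can I retrieve the standard YouTube video URL?
Answer to the previous issue:
I just checked the example.json file I generated yesterday and found that the standard YouTube video URL is stored in webpage_url rather than url.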
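Judging from the output above, `url` holds the direct media URL of the selected format (hence the googlevideo.com links), while `webpage_url` is the page the video was extracted from. A minimal sketch of picking the watch-page URL from one info dict; the fallback that rebuilds the URL from the `id` key is an assumption added for illustration, and `watch_url` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def watch_url(info: Dict[str, Any]) -> Optional[str]:
    """Return the standard watch-page URL for one youtube-dl info dict."""
    # webpage_url is the youtube.com/watch?v=... page, not the media stream
    if info.get("webpage_url"):
        return info["webpage_url"]
    # Assumed fallback: rebuild the watch URL from the video id
    video_id = info.get("id")
    if video_id:
        return "https://www.youtube.com/watch?v=" + video_id
    return None
```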

Final youtube-dl output:
C:\Users\###\Documents\youtube-dl>youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .webpage_url,"duration": .duration}"
{
  "date": "20150717",
  "title": "3.1: Flow (setup and draw) - Processing Tutorial",
  "URL": "https://www.youtube.com/watch?v=o8dffrZ86gs",
  "duration": 944
}
{
  "date": "20150717",
  "title": "3.2: Built-in Variables (mouseX, mouseY) - Processing Tutorial",
  "URL": "https://www.youtube.com/watch?v=ibW4oA7-n8I",
  "duration": 815
}
{
  "date": "20150717",
  "title": "3.3: Events (mousePressed, keyPressed) - Processing Tutorial",
  "URL": "https://www.youtube.com/watch?v=UvSjtiW-RH8",
  "duration": 442
}

C:\Users\###\Documents\youtube-dl>
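In this output, date is a YYYYMMDD string and duration is a whole number of seconds. If a more readable form is wanted, a small post-processing step is enough. This is an optional sketch; `iso_date` and `hms` are illustrative names.

```python
from datetime import datetime, timedelta


def iso_date(upload_date: str) -> str:
    """Convert youtube-dl's YYYYMMDD upload_date string to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(upload_date, "%Y%m%d").date().isoformat()


def hms(duration_seconds: int) -> str:
    """Format a duration in whole seconds as H:MM:SS.

    Durations of a day or more would render as '1 day, H:MM:SS'.
    """
    return str(timedelta(seconds=duration_seconds))
```

For example, `iso_date("20150717")` gives `2015-07-17` and `hms(944)` gives `0:15:44` for the first tutorial.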
To write the final output to a JSON file:
youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq "{"date": .upload_date,"title": .title,"URL": .webpage_url,"duration": .duration}" > example.json
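Note that this command writes one JSON object per video back to back, so example.json is a sequence of JSON values rather than a single JSON document, and a strict parser such as Python's `json.load` rejects it with an "Extra data" error. The objects can be collected into one array with jq's `-s` (`--slurp`) option, or afterwards with a short Python sketch like this one; `parse_concatenated` is an illustrative name.

```python
import json
from typing import Any, List


def parse_concatenated(text: str) -> List[Any]:
    """Parse back-to-back JSON values (jq's default output) into a list."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values
```

Passing the contents of example.json to `parse_concatenated` yields a Python list that `json.dump` can write back out as a single valid JSON array.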

Best Answer

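You need to filter the output with a handy tool such as jq. Paste this command line (in Windows cmd, use double quotes instead of single quotes, as the edit in the question shows):
youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq '{"date": .upload_date,"title": .title,"URL": .url,"duration": .duration}'
You can get jq from https://stedolan.github.io/jq/download/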
Update:
The key "webpage_url" holds the standard YouTube URL, if you need it.
For the full list of available keys, run:
youtube-dl --skip-download --print-json https://www.youtube.com/playlist?list=PLRqwX-V7Uu6by61pbhdvyEpIeymlmnXzD | jq keys
This gives the complete key names from the original JSON.
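Since `jq keys` prints one sorted key array per video, it can also help to merge them to see every field that appears in at least one video. A minimal Python sketch, assuming the --print-json output is available as lines of text; `all_keys` is an illustrative name.

```python
import json
from typing import Iterable, List


def all_keys(json_lines: Iterable[str]) -> List[str]:
    """Sorted union of keys over all youtube-dl --print-json objects."""
    seen = set()
    for line in json_lines:
        if line.strip():  # ignore blank lines
            seen.update(json.loads(line).keys())
    # sorted() on strings orders by code point, like jq's keys
    return sorted(seen)
```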

For the original question "json - How do I extract upload dates, titles, URLs and durations of the YouTube videos in a playlist with youtube-dl?", see Stack Overflow: https://stackoverflow.com/questions/66960551/
