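I'm trying to scrape data from 1001TrackLists, a site that lists the tracks in DJ mixes, using BeautifulSoup.
When a track in a mix hasn't been identified, 1001TrackLists leaves it in the table as "ID - ID". That shows up as an empty entry in the scraped data and breaks my for loop.
How can I get Python to skip the "blank" IDs in the tracklist and carry on scraping the data that comes after them?
My code so far: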
import requests
from bs4 import BeautifulSoup as bs

headers = {'User-Agent': 'Chrome/51.0.2704.103'}
page_link = 'https://www.1001tracklists.com/tracklist/7mzt0y9/boddika-joy-orbison-rinse-fm-hessle-audio-cover-show-2014-01-16.html'
page_response = requests.get(page_link, headers=headers)
soup = bs(page_response.content, "html.parser")
tracknumbers = []
tracknames = []
artistnames = []
mixnames = []
dates = []
tracknames_scrape = soup.find_all("div", class_="tlToogleData", div=True)
artistnames_scrape = soup.find_all("meta", itemprop="byArtist")
for (i, track) in enumerate(tracknames_scrape):
    tracknumbers.append(i+1)
    trackname = track.meta['content']
    tracknames.append(trackname)
    print(str(i+1) + str(". ") + trackname)
At the moment I get every track back until I reach a blank entry, at which point I get the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-de6ecd3caa59> in <module>
      1 for (i, track) in enumerate(tracknames_scrape):
      2     tracknumbers.append(i+1)
----> 3     trackname = track.meta['content']
TypeError: 'NoneType' object is not subscriptable
If I use a URL with no blank track IDs, the script works perfectly.
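For reference, the failure can be reproduced without touching the site. The sketch below uses made-up markup (an assumption, not the real 1001TrackLists HTML) in which the second row stands in for an "ID - ID" entry with no meta child. On such a row `track.meta` evaluates to `None`, and indexing `None` raises the `TypeError` shown above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second row stands in for an "ID - ID" entry
# and deliberately has no <meta> child.
html = """
<div class="tlToogleData"><meta itemprop="name" content="Artist A - Track A" /></div>
<div class="tlToogleData"><span>ID - ID</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("div", class_="tlToogleData")

# Attribute-style access returns the first <meta> descendant, or None.
print(rows[0].meta["content"])
print(rows[1].meta)

try:
    rows[1].meta["content"]  # same as None["content"]
except TypeError as exc:
    print(exc)
```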
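Best answer
Use the following CSS selector to get the track names. It matches the elements with itemprop="name" that are direct children of a div with itemprop="tracks", so the name is read from the matched element's own content attribute and rows without such an element are never returned.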
import requests
from bs4 import BeautifulSoup as bs
headers = {'User-Agent': 'Chrome/51.0.2704.103'}
page_link = 'https://www.1001tracklists.com/tracklist/7mzt0y9/boddika-joy-orbison-rinse-fm-hessle-audio-cover-show-2014-01-16.html'
page_response = requests.get(page_link, headers=headers)
soup = bs(page_response.content, "html.parser")
tracknumbers = []
tracknames = []
artistnames = []
mixnames = []
dates = []
# Select the itemprop="name" elements that are direct children of each track div.
tracknames_scrape = soup.select('div[itemprop="tracks"]>[itemprop="name"]')
#artistnames_scrape = soup.find_all("meta", itemprop="byArtist")
for (i, track) in enumerate(tracknames_scrape):
    tracknumbers.append(i+1)
    # The track name is stored in the element's content attribute.
    trackname = track['content']
    tracknames.append(trackname)
    print(str(i+1) + str(". ") + trackname)
Output:
1. Soft Machine - Snodland
2. Craig Leon - The Customs Of The Age Disturbed
3. Seven Davis Jr. - Thanks
4. Gadi Mizrahi - I'll Set Your House
5. Baby Ford & The iFach Collective - Word For Word
6. Panzer Knacker - Rollin' On The Side Of Psycho
7. 69 - Poi Beats
8. Midi Rain - Shine (DJ Pierre Chicago House Mix)
9. Sunpeople - Check Your Buddha (Sven Väth Remix)
10. Eduardo De La Calle - Madhusudhana
11. Aardvarck - The Antdance
12. Boddika & Joy Orbison - In Here
13. Mike Parker - Lustrations Eight (Contours)
14. Peter Van Hoesen - Axis Mundi
15. Sleeparchive - Bleep 01
16. Conforce - When It Appeared
17. Brommage Dub - Fettwise
18. Matrixxman - Protocol
19. JuJu & Jordash - Powwow
20. Gesloten Cirkel - Yamagic
21. Mike Dehnert - Mischkaa
22. Jerome Sydenham & Joe Claussell - Rhythm
23. Ratchett Traxxx - Nut On U
24. Kenny Dope & Terry Hunter pres. Mass Destruction - No Hook
25. Radio Slave - Don't Stop No Sleep
26. Truncate - Focus
27. Maurizio - Domina (Maurizio Mix Edit)
28. Shed - Atmo - Action
29. AFX - Boxing Day
30. Boddika & Joy Orbison - More Maim
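If you would rather keep the original loop over the row divs, another option is to test for a missing meta element and `continue`. The sketch below is only an illustration: the markup is invented for the example, and `extract_tracks` is a hypothetical helper name, not something from the question or from BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the middle row stands in for an "ID - ID" entry.
html = """
<div class="tlToogleData"><meta itemprop="name" content="Artist A - Track A" /></div>
<div class="tlToogleData"><span>ID - ID</span></div>
<div class="tlToogleData"><meta itemprop="name" content="Artist C - Track C" /></div>
"""


def extract_tracks(markup):
    """Return (position, name) pairs, skipping rows without a usable <meta>."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = soup.find_all("div", class_="tlToogleData")
    tracks = []
    for position, row in enumerate(rows, start=1):
        # find() returns None when the row has no <meta> with a content attribute.
        meta = row.find("meta", attrs={"content": True})
        if meta is None:
            continue  # unidentified track: skip it and move on
        tracks.append((position, meta["content"]))
    return tracks


for position, name in extract_tracks(html):
    print(f"{position}. {name}")
```

One difference in behavior: because this version enumerates every row, a skipped entry leaves a gap in the numbering (1, 3 for the sample markup), whereas enumerating only the selector matches numbers the identified tracks consecutively. Which is preferable depends on whether the numbers should reflect positions in the mix.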
Regarding python - skipping blank rows in BeautifulSoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60065505/