I am experimenting with Python and trying to build a scraper. The code I have so far is printed below.
import requests
from bs4 import BeautifulSoup
import csv

url = "http://www.grammy.com/nominees/search"
r = requests.get(url)
soup = BeautifulSoup(r.content)
g_data = soup.find_all("div", {"class": "view-content"})

f = csv.writer(open("file.csv", "w"))
f.writerow(["Year", "Category", "Title", "Winner"])

for item in g_data:
    for year in item.find_all("td", {"class": "views-field-year"}):
        year = year.contents[0]
    for category in item.find_all("td", {"class": "views-field-category-code"}):
        category = category.contents[0]
    for title in item.find_all("td", {"class": "views-field-field-nominee-work"}):
        title = title.contents[0]
    for winner in item.find_all("td", {"class": "views-field-field-nominee-extended"}):
        winner = winner.contents[0]
f.writerow([year, category, title, winner])
For some reason the CSV file contains only 1 row, and it looks random. How can I access all of these values outside the scope of the for loops?
Best answer
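The problem is not only that your last writerow() call is indented incorrectly (it should be inside the loop body). You also need to iterate over the tr elements (each one is a data row of the desired table) and, inside that loop, get the td elements of each tr.
I would also avoid checking the class attribute of the td elements in the loop and simply take them by position. In other words, find all td elements of each tr and get their text.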
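To illustrate why only one row ended up in the file, here is a minimal sketch that needs no network access. The `rows` list is a hypothetical stand-in for the scraped table (one tuple per tr), and `write_after_loop` and `write_inside_loop` are illustrative names, not part of the original code. The first function imitates the question's structure, where the loop keeps rebinding the same names and writerow() runs once afterwards; the second writes one CSV row per table row.

```python
import csv
import io

# Hypothetical stand-in for the scraped table: one tuple per <tr>.
rows = [
    ("2014", "Record Of The Year", "Stay With Me (Darkchild Version)"),
    ("2014", "Album Of The Year", "Morning Phase"),
]


def write_after_loop(rows):
    """Imitate the question's bug: writerow() runs once, after the loop."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for year, category, title in rows:
        pass  # each pass rebinds the names, so earlier values are lost
    writer.writerow([year, category, title])  # only the last row survives
    return buf.getvalue()


def write_inside_loop(rows):
    """Write one CSV row per table row, inside the loop body."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for year, category, title in rows:
        writer.writerow([year, category, title])
    return buf.getvalue()


print(write_after_loop(rows))   # a single line: the last row only
print(write_inside_loop(rows))  # one line per row
```

The loop variables stay accessible after the loop ends, but they hold only the values from the final iteration, which is why the single row in the file is the last one rather than a random one.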
Fixed and improved version (just 2 lines of code):
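# Skip the header row, then write the text of every td in each tr.
# .encode("utf-8") is a Python 2 idiom; on Python 3, drop it and write str values.
for item in soup.select("div.view-content table tr")[1:]:
    f.writerow([td.get_text(strip=True).encode("utf-8") for td in item.find_all("td")])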
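The `.encode("utf-8")` call above suits Python 2, where the csv module works with byte strings. The following sketch, which assumes Python 3 and uses a hypothetical helper `to_csv` with made-up cell lists, shows what happens there if the cells are encoded first: the writer calls str() on each bytes object, so the `b'...'` wrapper ends up in the output.

```python
import csv
import io


def to_csv(cells):
    """Write a single CSV row to an in-memory buffer and return the text."""
    buf = io.StringIO()
    csv.writer(buf).writerow(cells)
    return buf.getvalue()


# Hypothetical cell values, as td.get_text(strip=True) would return them.
text_cells = ["2014", "Best Rap Song", "I"]
bytes_cells = [cell.encode("utf-8") for cell in text_cells]

print(to_csv(text_cells))   # 2014,Best Rap Song,I
print(to_csv(bytes_cells))  # b'2014',b'Best Rap Song',b'I'
```

On Python 3 it is usually simpler to keep the cells as str and open the output file with `open("file.csv", "w", newline="", encoding="utf-8")`, so the encoding is handled by the file object rather than per cell.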
Contents of file.csv after running the code:
Year,Category,Title,Winner
2014,Record Of The Year,Stay With Me (Darkchild Version),"Sam Smith, artist. Steve Fitzmaurice, Rodney Jerkins & Jimmy Napes, producers. Matthew Champlin, Steve Fitzmaurice, Jimmy Napes & Steve Price, engineers/mixers. Tom Coyne, mastering engineer."
2014,Album Of The Year,Morning Phase,"Beck Hansen, producer; Tom Elmhirst, David Greenbaum, Cole Marsden Greif-Neill, Florian Lagatta, Robbie Nelson, Darrell Thorp, Cassidy Turbin & Joe Visciano, engineers/mixers; Bob Ludwig, mastering engineer."
2014,Song Of The Year,Stay With Me (Darkchild Version),"James Napier, William Phillips &Sam Smith, songwriters."
...
2014,Best Rap Song,I,"K. Duckworth, Ronald Isley & C. Smith, songwriters."
2014,Best Rap Album,The Marshall Mathers LP2,"Eminem, artist. Tony Campana, Joe Strange & Mike Strange, engineers/mixers."
About python - accessing Python for loop values: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29994589/