python - Accessing Python for loop values

Tags: python csv web-scraping beautifulsoup export-to-csv

I am experimenting with Python and trying to build a scraper. The code I have so far is shown below.

import requests
from bs4 import BeautifulSoup
import csv

url = "http://www.grammy.com/nominees/search"
r = requests.get(url)

soup = BeautifulSoup(r.content)

g_data = soup.find_all("div", {"class": "view-content"})

f = csv.writer(open("file.csv", "w"))
f.writerow(["Year", "Category", "Title", "Winner"])

for item in g_data:
  for year in item.find_all("td", {"class": "views-field-year"}):
    year = year.contents[0]

  for category in item.find_all("td", {"class": "views-field-category-code"}):
    category = category.contents[0]

  for title in item.find_all("td", {"class": "views-field-field-nominee-work"}):
    title = title.contents[0]

  for winner in item.find_all("td", {"class": "views-field-field-nominee-extended"}):
    winner = winner.contents[0]

f.writerow([year, category, title, winner])

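For some reason the CSV file ends up with only one row, and it looks random. How can I access all of these values outside the scope of the for loops?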

Best answer

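The problem is not only that your final writerow() call is incorrectly indented (it should be inside the loop body). You also need to iterate over the tr elements (each one represents a data row in the target table) and, inside that loop, get the td elements that belong to each tr.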
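
To see why the original script ends up with a single row, here is a minimal sketch in which plain lists stand in for the scraped cells (the sample values are made up for illustration). Each inner loop rebinds its variable on every pass, so only the last value survives, and the single writerow() call after the loops records just that one combination.

```python
# Hypothetical stand-ins for the cells returned by find_all(); not real scraped data.
years = ["2014", "2014", "2013"]
categories = ["Record Of The Year", "Album Of The Year", "Best Rap Song"]

# Pattern from the question: each loop overwrites its own variable.
for year in years:
    pass  # after the loop, `year` holds only the last element
for category in categories:
    pass  # same for `category`

# A single "writerow" after the loops can therefore record only one row.
rows_broken = [[year, category]]

# Row-wise iteration: build one output row per record inside the loop body.
rows_fixed = []
for year, category in zip(years, categories):
    rows_fixed.append([year, category])

print(rows_broken)  # [['2013', 'Best Rap Song']]
print(len(rows_fixed))  # 3
```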

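I would also avoid checking the class attribute of the td elements inside the loop and simply get them by index. In other words, find all td elements of each tr and get their text.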
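
As a rough illustration of the index-based approach, the sketch below runs against a small inline HTML fragment rather than the live page. The fragment and its structure are assumptions made for the example, and `html.parser` is used only so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment mimicking the table structure the answer assumes;
# the markup of the real page may differ.
html = """
<div class="view-content">
  <table>
    <tr><th>Year</th><th>Category</th><th>Title</th><th>Winner</th></tr>
    <tr><td> 2014 </td><td>Best Rap Song</td><td>I</td><td>K. Duckworth</td></tr>
    <tr><td> 2014 </td><td>Best Rap Album</td><td>The Marshall Mathers LP2</td><td>Eminem</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
# [1:] skips the header row, as in the answer's code.
for tr in soup.select("div.view-content table tr")[1:]:
    cells = tr.find_all("td")
    # Pick cells by position instead of by class name.
    year = cells[0].get_text(strip=True)
    category = cells[1].get_text(strip=True)
    rows.append([year, category])

print(rows)
```

Positional access is shorter, but it relies on the column order staying fixed. If the table layout changes, the indexes have to change with it.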

Fixed and improved version (just 2 lines of code):

for item in soup.select("div.view-content table tr")[1:]:
    f.writerow([td.get_text(strip=True).encode("utf-8") for td in item.find_all("td")])
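
Note that the `.encode("utf-8")` call above suits Python 2. Under Python 3, `csv.writer` expects `str` values, and bytes objects would be written as their `b'...'` representation. A possible Python 3 variant of the writing step is sketched below. The `write_rows` helper and the in-memory buffer are illustrative additions, not part of the original answer, and the single data row is taken from the sample output.

```python
import csv
import io

# One sample row, as it might come out of the scraping loop.
rows = [
    ["2014", "Best Rap Song", "I",
     "K. Duckworth, Ronald Isley & C. Smith, songwriters."],
]


def write_rows(stream, rows):
    """Write the header and the data rows to any text stream (hypothetical helper)."""
    writer = csv.writer(stream)
    writer.writerow(["Year", "Category", "Title", "Winner"])
    writer.writerows(rows)  # plain str values, no .encode() needed


# In-memory demonstration so the example leaves no file behind.
buffer = io.StringIO()
write_rows(buffer, rows)
print(buffer.getvalue())

# Writing to a real file would look like this; newline="" lets the csv
# module control line endings, and the with block closes the file.
# with open("file.csv", "w", newline="", encoding="utf-8") as fh:
#     write_rows(fh, rows)
```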

Contents of file.csv after running the code:

Year,Category,Title,Winner
2014,Record Of The Year,Stay With Me (Darkchild Version),"Sam Smith, artist. Steve Fitzmaurice, Rodney Jerkins & Jimmy Napes, producers. Matthew Champlin, Steve Fitzmaurice, Jimmy Napes & Steve Price, engineers/mixers. Tom Coyne, mastering engineer."
2014,Album Of The Year,Morning Phase,"Beck Hansen, producer; Tom Elmhirst, David Greenbaum, Cole Marsden Greif-Neill, Florian Lagatta, Robbie Nelson, Darrell Thorp, Cassidy Turbin & Joe Visciano, engineers/mixers; Bob Ludwig, mastering engineer."
2014,Song Of The Year,Stay With Me (Darkchild Version),"James Napier, William Phillips &Sam Smith, songwriters."
...
2014,Best Rap Song,I,"K. Duckworth, Ronald Isley & C. Smith, songwriters."
2014,Best Rap Album,The Marshall Mathers LP2,"Eminem, artist. Tony Campana, Joe Strange & Mike Strange, engineers/mixers."

Regarding python - Accessing Python for loop values, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29994589/
