python - Saving a scraped result set to a CSV file

Tags: python csv parsing screen-scraping

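I wrote a small script that takes an eBay result set and stores each field in a separate variable: link, price, and bids.

How can I take those variables and save every auction result to a CSV file, with each row representing a different auction item?

For example: link, price, bids

Here is my code so far: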

import csv
import requests
import bs4

res = requests.get('http://www.ebay.com/sch/i.html?LH_Complete=1&LH_Sold=1&_from=R40&_sacat=0&_nkw=gerald%20ford%20autograph&rt=nc&LH_Auction=1&_trksid=p2045573.m1684')
res.raise_for_status()
# name the parser explicitly so bs4 does not guess one and emit a warning
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# grab the link, selling price, and number of bids from historical auctions
links = soup.find_all(class_="vip")
prices = soup.find_all("span", "bold bidsold")
bids = soup.find_all("li", "lvformat")

Best answer

This should do the job:

import csv
import requests
import bs4

res = requests.get('http://www.ebay.com/sch/i.html?LH_Complete=1&LH_Sold=1&_from=R40&_sacat=0&_nkw=gerald%20ford%20autograph&rt=nc&LH_Auction=1&_trksid=p2045573.m1684')
res.raise_for_status()
# name the parser explicitly so bs4 does not guess one and emit a warning
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# grab all the links and store their href destinations in a list
links = [e['href'] for e in soup.find_all(class_="vip")]

# grab all the bid spans and split their contents to keep only the number
bids = [e.span.contents[0].split(' ')[0] for e in soup.find_all("li", "lvformat")]

# grab all the prices and store them in a list
prices = [e.contents[0] for e in soup.find_all("span", "bold bidsold")]

# zip the three lists so that the entries belonging to the same auction
# end up together in one row tuple
rows = list(zip(links, prices, bids))

# write each row to the csv output file; newline='' lets the csv module
# control line endings (avoids blank lines between rows on Windows)
with open('ebay.csv', 'w', newline='') as csvfile:
    w = csv.writer(csvfile)
    w.writerows(rows)

As a result, you get a CSV file that uses , (comma) as the delimiter.
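The CSV-writing step can also be tried without any network access. The following is a minimal sketch, not part of the original answer: the helper name `write_auction_rows`, the file name `ebay_sample.csv`, the header row, and the sample values are all assumptions made for illustration. It adds a length check, because `zip` silently stops at the shortest list, and reads the file back with `csv.DictReader` to show that a price containing a comma survives the round trip.

```python
import csv


def write_auction_rows(path, links, prices, bids):
    """Write one CSV row per auction and return the number of data rows.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # zip() silently truncates to the shortest input, so a missing price
    # or bid count would drop rows without warning; check lengths first.
    if not (len(links) == len(prices) == len(bids)):
        raise ValueError("links, prices and bids must have the same length")

    rows = list(zip(links, prices, bids))
    # newline='' lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["link", "price", "bids"])  # header row (an addition)
        writer.writerows(rows)
    return len(rows)


# Made-up sample values standing in for the scraped lists
sample_links = ["http://example.com/item/1", "http://example.com/item/2"]
sample_prices = ["$10.50", "$1,200.00"]
sample_bids = ["3", "12"]

count = write_auction_rows("ebay_sample.csv", sample_links, sample_prices, sample_bids)

# Read the file back; the csv module quoted the price that contains a comma
with open("ebay_sample.csv", newline="", encoding="utf-8") as csvfile:
    rows_read = list(csv.DictReader(csvfile))
```

With the scraped data, the same helper could be called as `write_auction_rows('ebay.csv', links, prices, bids)`. If the length check fails, one of the selectors probably matched a different number of elements than the others, and the rows would otherwise be misaligned.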

Source: this question on saving a scraped result set to a CSV file was found on Stack Overflow: https://stackoverflow.com/questions/33964785/
