Python: scrape a Wikipedia table and export it to CSV

Tags: python web-scraping beautifulsoup python-requests export-to-csv

I followed a tutorial to scrape a table and then export the data to a CSV file. When I try to run the file in PyCharm, I get this error:

Traceback (most recent call last):
  File "I:/Scrape/MediumCode.py", line 1, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

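I assume there are other errors in the code and its logic as well, but this is the first problem I hit, and I cannot get any further without understanding why the library is not recognized.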

pip install requests ran successfully.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://en.wikipedia.org/wiki/Public_holidays_in_Switzerland'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

containers = page_soup.findAll("table", {"class":"wikitable"})

filename = "holidays.csv"
f = open(filename, "w")

headers = "holiday, holiday_date"

f.write(headers)

for container in containers:
    holiday = container.table.tbody.tr.td.a["title"]

    name_container = container.findAll("a", {"class":"title"})
    holiday_name = name_container[0].text

    date_container = container.findAll("td")
    date = date_container[0].text.strip()

    print("holiday: " + brand)
    print("holiday_name: " + holiday_name)
    print("date: " + date)

    f.write(holiday + "," + holiday_name.replace(",", "|") + "," + date + "\n")

    f.close()

Best answer

Use pandas:

  • .read_html() - reads HTML tables into a list of DataFrame objects.
  • .to_csv() - writes the object to a comma-separated values (csv) file.
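from io import StringIO

import pandas as pd
import requests

url = 'https://en.wikipedia.org/wiki/Public_holidays_in_Switzerland'
response = requests.get(url)

# read_html returns a list of DataFrames, one per table found in the page.
# Wrapping the text in StringIO avoids the deprecation warning that newer
# pandas versions raise for literal HTML strings.
tables = pd.read_html(StringIO(response.text))

# write the first table to the holiday_data.csv file
tables[0].to_csv("holiday_data.csv")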
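
One detail of the to_csv call above that may matter: by default pandas also writes the DataFrame index as an unnamed first column. The sketch below shows the difference with index=False. The small DataFrame is a hand-made stand-in for tables[0], so its column names and values are assumptions and not the actual contents of the Wikipedia table.

```python
import pandas as pd

# Hypothetical stand-in for tables[0]; the real columns depend on the page.
table = pd.DataFrame(
    {
        "Holiday": ["New Year's Day", "Swiss National Day"],
        "Date": ["1 January", "1 August"],
    }
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = table.to_csv()
without_index = table.to_csv(index=False)

print(with_index)
print(without_index)
```

If the extra index column is not wanted in holiday_data.csv, passing index=False to the original call should drop it.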

Install the pandas library:

pip3 install pandas
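
If a package that pip reports as installed is still not found at import time, one possible cause (an assumption here, since the question does not confirm it) is that pip installed into a different Python interpreter than the one PyCharm uses to run the script. A minimal diagnostic sketch, using only the standard library, prints the running interpreter and where each module would be imported from. The helper name describe_module is made up for this example.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file a top-level module would load from, or None if not importable."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# The interpreter actually running this script; compare it with the one pip used.
print("interpreter:", sys.executable)
for module_name in ("requests", "pandas"):
    print(module_name, "->", describe_module(module_name))
```

If a module prints None, running pip through that same interpreter (for example, python -m pip install requests with the path printed above) installs the package into the environment the script actually uses.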

If the requests library still throws an error on your system, try the following:

from io import BytesIO
from urllib.request import urlopen as uReq

import pandas as pd

url = 'https://en.wikipedia.org/wiki/Public_holidays_in_Switzerland'
response = uReq(url)
# wrap the raw bytes in BytesIO so read_html receives a file-like object
tables = pd.read_html(BytesIO(response.read()))

# select only the Holiday column
select_table_column = ["Holiday"]
# or select multiple columns:
# select_table_column = ["Holiday", "Date"]

# filter the table data by the selected columns
holiday = tables[0][select_table_column]

# write the filtered table to the holiday_data.csv file, including the header row
holiday.to_csv("holiday_data.csv", header=True)
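
For the hand-written CSV step in the question, a sketch using the standard library csv module may be simpler than building each line by string concatenation. The writer quotes fields that contain commas, so replacing "," with "|" is not needed, and the with block closes the file once after all rows are written. The function name write_holidays and the sample rows are hypothetical; the rows stand in for values scraped from the page.

```python
import csv


def write_holidays(path, rows):
    """Write (holiday, date) pairs to `path` as CSV with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["holiday", "holiday_date"])
        writer.writerows(rows)


# Hypothetical sample rows standing in for values scraped from the page.
sample_rows = [
    ("New Year's Day", "1 January"),
    ("Swiss National Day", "1 August"),
    ("Christmas Day", "25 December, fixed date"),
]
write_holidays("holidays.csv", sample_rows)
```

The third sample row contains a comma on purpose: the csv module wraps that field in quotes, so it reads back as a single value.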

Source: a similar question about scraping a Wikipedia table with Python and exporting it to CSV, on Stack Overflow: https://stackoverflow.com/questions/56656093/
