python - Scraped website data not being written to CSV

Tags: python selenium csv beautifulsoup webdriverwait

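I'm trying to scrape a website for information and write it to a CSV file. The data I'm trying to extract shows up in the terminal, but I need it saved in a CSV file.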

I've tried several different approaches but can't find a solution. The CSV file gets created, but it is empty. It's probably something very simple.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import csv
import time
from bs4 import BeautifulSoup

DRIVER_PATH = '/Users/jasonbeedle/Desktop/snaviescraper/chromedriver'

options = Options()
options.page_load_strategy = 'normal'

# Navigate to url
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
driver.get("http://best4sport.tv/2hd/2020-12-10/")
options.add_argument("--window-size=1920x1080")
results = driver.find_element_by_class_name('program1_content_container')
soup = BeautifulSoup(results.text, 'html.parser')

# results = driver.find_element_by_class_name('program1_content_container')
p_data1 = soup.find_all("div", {"class_name": "program1_content_container"})
p_data2 = soup.find_all("div", {"class_name": "program_time"})
p_data3 = soup.find_all("div", {"class_name": "sport"})
p_data4 = soup.find_all("div", {"class": "program_text"})

print("Here is your data, I am off to sleep now see ya ")
print(results.text)
# Create csv
programme_list = []
# Programme List
for item in p_data1:
    try:
        name = item.contents[1].find_all(
            "div", {"class": "program1_content_container"})[0].text
    except:
        name = ''

    p_data1 = [time]
    programme_list.append(p_data1)

# Programme Time
for item in p_data2:
    try:
        time = item.contents[1].find_all(
            "div", {"class": "program_time"})[0].text
    except:
        time = ''

    p_data2 = [time]
    programme_list.append(p_data2)

# Which sport
for item in p_data3:
    try:
        time = item.contents[1].find_all(
            "div", {"class": "sport"})[0].text
    except:
        time = ''

    p_data3 = [time]
    programme_list.append(p_data3)

with open('sport.csv', 'w') as file:
    writer = csv.writer(file)
    for row in programme_list:
        writer.writerow(row)

I just tried adding an object called data_output and then printing it:

data_output = [p_data1, p_data2, p_data3, p_data4]
...
print(data_output)

The output in the terminal is:

Best answer

Load the data into a pandas DataFrame and export it to CSV.
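The key change from the question's code is what gets handed to BeautifulSoup. `results.text` is only the element's rendered text, so there are no `<div>` tags left to find: every `find_all` returns an empty list, the loops never run, `programme_list` stays empty, and `open('sport.csv', 'w')` still creates the (empty) file. On top of that, `{"class_name": ...}` filters on an attribute literally called `class_name`. The sketch below reproduces both effects offline; the HTML is a hypothetical, simplified stand-in for the real page, not its actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the container's outerHTML;
# the real page's markup is an assumption.
OUTER_HTML = """
<div class="program1_content_container">
  <div class="program_details">
    <div class="program_time">10:00</div>
    <div class="sport">Football</div>
  </div>
</div>
"""

# Roughly what results.text gives you: rendered text with no tags at all.
VISIBLE_TEXT = "10:00\nFootball"

from_text = BeautifulSoup(VISIBLE_TEXT, "html.parser")
from_html = BeautifulSoup(OUTER_HTML, "html.parser")

# 1) Parsing plain text: there are no <div> tags, so nothing matches.
text_hits = from_text.find_all("div", {"class": "program_time"})

# 2) {"class_name": ...} looks for an attribute literally named class_name.
wrong_attr_hits = from_html.find_all("div", {"class_name": "program_time"})

# 3) Parsed HTML plus the real "class" attribute finds the data.
times = [d.get_text(strip=True)
         for d in from_html.find_all("div", {"class": "program_time"})]

print(len(text_hits), len(wrong_attr_hits), times)  # 0 0 ['10:00']
```

This is why the answer parses `results.get_attribute("outerHTML")` and searches with `class_=...` rather than `class_name`.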

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
from bs4 import BeautifulSoup

DRIVER_PATH = '/Users/jasonbeedle/Desktop/snaviescraper/chromedriver'
# Selenium 4 takes the chromedriver path through a Service object
# (the old executable_path argument has been removed)
driver = webdriver.Chrome(service=Service(DRIVER_PATH))
driver.get("http://best4sport.tv/2hd/2020-12-10/")
# Wait up to 10 seconds for the programme container to become visible
results = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, ".program1_content_container")))
# Parse the element's HTML (outerHTML), not its visible text (.text)
soup = BeautifulSoup(results.get_attribute("outerHTML"), 'html.parser')
driver.quit()


def next_text(item, cls):
    # Stripped text of the next element with class `cls`, or "Nan" if none
    el = item.find_next(class_=cls)
    return el.text.strip() if el else "Nan"


program_time = []
sport = []
program_text = []
program_info = []
for item in soup.select(".program_details"):
    program_time.append(next_text(item, 'program_time'))
    sport.append(next_text(item, 'sport'))
    program_text.append(next_text(item, 'program_text'))
    program_info.append(next_text(item, 'program_info'))

df = pd.DataFrame({"program_time": program_time, "sport": sport,
                   "program_text": program_text, "program_info": program_info})
print(df)
df.to_csv("sport.csv")
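One caveat on the answer's use of `find_next`: it searches forward through the rest of the document, not only inside `item`. If one `.program_details` block is missing a field, it silently picks up the next block's value instead of `"Nan"`. If each field really sits inside its own `.program_details` block (an assumption about the page that may not hold), the scoped `item.find(...)` avoids this. A small offline sketch with hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first block has no program_info element.
HTML = """
<div class="program_details">
  <div class="program_time">10:00</div>
</div>
<div class="program_details">
  <div class="program_time">12:00</div>
  <div class="program_info">Live</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
blocks = soup.select(".program_details")


def field(el):
    # Stripped text, or the answer's "Nan" placeholder when nothing matched.
    return el.get_text(strip=True) if el else "Nan"


# find_next keeps searching past the end of the current block...
via_find_next = [field(b.find_next(class_="program_info")) for b in blocks]
# ...while find only looks at the block's own descendants.
via_find = [field(b.find(class_="program_info")) for b in blocks]

print(via_find_next)  # ['Live', 'Live']
print(via_find)       # ['Nan', 'Live']
```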

Snapshot of the CSV after it is created:


If you don't have pandas, you'll need to install it:

pip install pandas
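If installing pandas isn't an option, the standard-library `csv` module can write the same columns. This is a minimal sketch assuming the four lists (`program_time`, `sport`, `program_text`, `program_info`) have already been filled as in the answer; the sample values below are placeholders, not data from the site. `zip` turns the parallel column lists into rows, and the file is opened with `newline=""` as the `csv` documentation recommends.

```python
import csv

# Placeholder values; in the answer these lists are filled from the page.
program_time = ["10:00", "12:00"]
sport = ["Football", "Tennis"]
program_text = ["Match A", "Match B"]
program_info = ["Live", "Nan"]

header = ["program_time", "sport", "program_text", "program_info"]

# newline="" lets the csv module control line endings (no blank rows on Windows).
with open("sport.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    # zip() pairs the i-th entry of each column list into one row.
    writer.writerows(zip(program_time, sport, program_text, program_info))

# Read the file back to check what was written.
with open("sport.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

Unlike `df.to_csv("sport.csv")`, this does not write pandas' index column.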

Regarding "python - Scraped website data not being written to CSV", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/65320909/
