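python - Create a DataFrame from a list of records

Tags: python pandas selenium

This code opens a web page containing a multi-page table. The script has to scrape the whole table and finally convert it into a pandas DataFrame.

Everything works until the DataFrame step.

When I print the data before converting it to a DataFrame, each row comes out as a list, like this: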

['Release Date', 'Time', 'Actual', 'Forecast', 'Previous', '']
['Jan 27, 2020', '00:30', ' ', ' ', '47.8%', '']
['Jan 20, 2020', '00:30', '47.8%', ' ', '43.0%', '']
['Jan 13, 2020', '00:30', '43.0%', ' ', '31.5%', '']
['Jan 07, 2020', '00:30', '31.5%', ' ', '29.9%', '']

When I convert it to a DataFrame, I get this instead:

0     1     2     3     4     5     6     7     8     9    10    11
0  A     p     r           0     6     ,           2     0     1     4
1  0     5     :     0     0  None  None  None  None  None  None  None
2  4     0     .     3     %  None  None  None  None  None  None  None
3     None  None  None  None  None  None  None  None  None  None  None
4     None  None  None  None  None  None  None  None  None  None  None

Here is the code:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome(r"D:\Projects\Driver\chromedriver.exe")
driver.get(url)
wait = WebDriverWait(driver, 10)

while True:
    try:
        item = wait.until(ec.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except TimeoutException:
        break
for table in wait.until(
        ec.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
    data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
df = pd.DataFrame.from_records(data)
print(df.head())

driver.quit()

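Best answer

You are not collecting the data from every row. Inside the loop, data is reassigned on each iteration, so after the loop it holds only the last row, which is a flat list of strings. DataFrame.from_records then treats each string as a sequence of characters, which is why the result has one character per cell. Your code needs only a small change, appending each row to a list: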

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 10)

while True:
    try:
        item = wait.until(ec.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except TimeoutException:
        break
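data = []  # one list of cell texts per table row
for table in wait.until(
    ec.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
    # find_elements(By.XPATH, ...) also works on newer Selenium releases,
    # where find_elements_by_xpath has been removed
    line = [item.text for item in table.find_elements(By.XPATH, ".//*[self::td or self::th]")]
    data.append(line)  # append the row instead of overwriting data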
df = pd.DataFrame.from_records(data)
print(df.head())

driver.quit()

Output:

0  Release Date   Time  Actual  Forecast  Previous
1  Jan 27, 2020  00:30                       47.8%
2  Jan 20, 2020  00:30   47.8%               43.0%
3  Jan 13, 2020  00:30   43.0%               31.5%
4  Jan 07, 2020  00:30   31.5%               29.9%
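
The difference between the two versions can be reproduced without a browser. The following is a minimal sketch, assuming a few hard-coded rows copied from the question stand in for the scraped table; the names rows, broken, fixed and table are illustrative and do not come from the original code. It also shows one optional way to promote the first scraped row to column names, which the answer above does not do.

```python
import pandas as pd

# Hard-coded stand-ins for the scraped rows (values copied from the question).
rows = [
    ['Release Date', 'Time', 'Actual', 'Forecast', 'Previous', ''],
    ['Jan 27, 2020', '00:30', ' ', ' ', '47.8%', ''],
    ['Jan 20, 2020', '00:30', '47.8%', ' ', '43.0%', ''],
]

# Question's pattern: reassigning inside the loop keeps only the last row,
# which is a flat list of strings.
for row in rows:
    data = row
broken = pd.DataFrame.from_records(data)
print(broken.shape)  # one DataFrame row per string, one column per character

# Answer's pattern: append every row, producing a list of lists.
data = []
for row in rows:
    data.append(row)
fixed = pd.DataFrame.from_records(data)
print(fixed.shape)  # one DataFrame row per scraped row

# Optional: use the first scraped row as the header instead of as data.
table = pd.DataFrame(data[1:], columns=data[0])
print(table)
```

Whether to promote the header row depends on what you do next; with named columns, a lookup such as table['Previous'] works directly.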

Source: the Stack Overflow question "python - Create a DataFrame from a list of records": https://stackoverflow.com/questions/59903046/
