python - Creating a DataFrame from a list of records

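The goal of this code is to open a web page that contains a table spread over several pages. The script has to scrape the whole table and finally convert it into a pandas DataFrame.

Everything works fine until the DataFrame part.

When I print the data before converting it to a DataFrame, each row comes out as a list, like this: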

['Release Date', 'Time', 'Actual', 'Forecast', 'Previous', '']
['Jan 27, 2020', '00:30', ' ', ' ', '47.8%', '']
['Jan 20, 2020', '00:30', '47.8%', ' ', '43.0%', '']
['Jan 13, 2020', '00:30', '43.0%', ' ', '31.5%', '']
['Jan 07, 2020', '00:30', '31.5%', ' ', '29.9%', '']

When I try to convert it to a DataFrame, I get this:

0     1     2     3     4     5     6     7     8     9    10    11
0  A     p     r           0     6     ,           2     0     1     4
1  0     5     :     0     0  None  None  None  None  None  None  None
2  4     0     .     3     %  None  None  None  None  None  None  None
3     None  None  None  None  None  None  None  None  None  None  None
4     None  None  None  None  None  None  None  None  None  None  None

Here is the code:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome(r"D:\Projects\Driver\chromedriver.exe")
driver.get(url)
wait = WebDriverWait(driver, 10)

while True:
    try:
        item = wait.until(ec.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except TimeoutException:
        break
for table in wait.until(
        ec.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
    data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
df = pd.DataFrame.from_records(data)
print(df.head())

driver.quit()

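Best answer

You are not collecting the data from the rows. The variable data is reassigned on every iteration of the loop, so only the last row is left when the DataFrame is built, and from_records then treats each string in that row as a record of single characters. Your code needs only a small change, which is to append each row to a list: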

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 10)

while True:
    try:
        item = wait.until(ec.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except TimeoutException:
        break
data = []
for table in wait.until(
    ec.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
    # find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which was removed in Selenium 4
    line = [item.text for item in table.find_elements(By.XPATH, ".//*[self::td or self::th]")]
    data.append(line)
df = pd.DataFrame.from_records(data)
print(df.head())

driver.quit()

Output:

0  Release Date   Time  Actual  Forecast  Previous
1  Jan 27, 2020  00:30                       47.8%
2  Jan 20, 2020  00:30   47.8%               43.0%
3  Jan 13, 2020  00:30   43.0%               31.5%
4  Jan 07, 2020  00:30   31.5%               29.9%
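
The difference between the two versions can be reproduced without a browser. The following is a minimal sketch, assuming the rows look like the lists printed in the question. The names rows, broken, fixed and tidy are illustrative and not part of the original code. The last step, promoting the first row to column names, is an optional extra that the answer above does not do.

```python
import pandas as pd

# Sample rows copied from the lists printed in the question.
rows = [
    ['Release Date', 'Time', 'Actual', 'Forecast', 'Previous', ''],
    ['Jan 27, 2020', '00:30', ' ', ' ', '47.8%', ''],
    ['Jan 20, 2020', '00:30', '47.8%', ' ', '43.0%', ''],
]

# Question's version: data is reassigned on every pass,
# so after the loop it holds only the last row (a flat list of strings).
for row in rows:
    data = row
# Each string is treated as one record, so it is split into characters.
broken = pd.DataFrame.from_records(data)

# Answer's version: accumulate every row in a list of lists.
data = []
for row in rows:
    data.append(row)
fixed = pd.DataFrame.from_records(data)

# Optional extra (an assumption, not in the answer):
# use the first scraped row as the column names.
header, *body = data
tidy = pd.DataFrame(body, columns=header)

print(broken.shape)
print(fixed.shape)
print(tidy.head())
```

With the list of lists, every scraped row becomes one DataFrame row. With the flat list, every cell of the last row becomes its own row of characters, which matches the output shown in the question.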

This page, python - Creating a DataFrame from a list of records, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/59903046/
