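The goal of this code is to open a web page that contains a multi-page table. The script has to scrape the whole table and finally convert it to a pandas DataFrame.
Everything works fine until the DataFrame step.
When I print the data before converting it to a DataFrame, each row comes out as a list, like this: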
['Release Date', 'Time', 'Actual', 'Forecast', 'Previous', '']
['Jan 27, 2020', '00:30', ' ', ' ', '47.8%', '']
['Jan 20, 2020', '00:30', '47.8%', ' ', '43.0%', '']
['Jan 13, 2020', '00:30', '43.0%', ' ', '31.5%', '']
['Jan 07, 2020', '00:30', '31.5%', ' ', '29.9%', '']
When I try to convert it to a DataFrame, I get this:
   0     1     2     3     4     5     6     7     8     9    10    11
0  A     p     r           0     6     ,           2     0     1     4
1  0     5     :     0     0  None  None  None  None  None  None  None
2  4     0     .     3     %  None  None  None  None  None  None  None
3     None  None  None  None  None  None  None  None  None  None  None
4     None  None  None  None  None  None  None  None  None  None  None
Here is the code:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'
driver = webdriver.Chrome(r"D:\Projects\Driver\chromedriver.exe")
driver.get(url)
wait = WebDriverWait(driver, 10)
while True:
    try:
        item = wait.until(ec.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except TimeoutException:
        break
for table in wait.until(
        ec.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
    data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
df = pd.DataFrame.from_records(data)
print(df.head())
driver.quit()
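Best answer
You are not collecting the data from every row. The loop overwrites data on each iteration, so from_records only receives the last row, which is a flat list of strings, and it splits each string into characters. Your code needs only a small change: append each row to a list and build the DataFrame from that list.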
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'
driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 10)
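# Keep clicking the "show more" link until it no longer appears within the timeout.
while True:
    try:
        item = wait.until(ec.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except TimeoutException:
        break
# Collect every row; each row becomes one list of cell texts.
data = []
for table in wait.until(
        ec.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
    # find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which newer Selenium 4 releases removed.
    line = [item.text for item in table.find_elements(By.XPATH, ".//*[self::td or self::th]")]
    data.append(line)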
df = pd.DataFrame.from_records(data)
print(df.head())
driver.quit()
Output:
0  Release Date   Time  Actual  Forecast  Previous
1  Jan 27, 2020  00:30                       47.8%
2  Jan 20, 2020  00:30   47.8%               43.0%
3  Jan 13, 2020  00:30   43.0%               31.5%
4  Jan 07, 2020  00:30   31.5%               29.9%
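To see why the original version produced one character per column, here is a minimal sketch that reproduces the behaviour without Selenium. The sample rows are copied from the printed lists in the question; the variable names (header, rows, broken, fixed, labelled) are only illustrative. The last step, promoting the header row to column labels, is an optional extra and not part of the answer above.

```python
import pandas as pd

# Sample rows copied from the lists printed in the question.
header = ['Release Date', 'Time', 'Actual', 'Forecast', 'Previous', '']
rows = [
    ['Jan 27, 2020', '00:30', ' ', ' ', '47.8%', ''],
    ['Jan 20, 2020', '00:30', '47.8%', ' ', '43.0%', ''],
]

# What the question's loop ends up with: only the last row, a flat list of strings.
# from_records treats each string as one record and splits it into characters.
last_row_only = rows[-1]
broken = pd.DataFrame.from_records(last_row_only)
print(broken.shape)  # (6, 12): six strings, padded to the longest one (12 characters)

# What the answer builds: a list of lists, one inner list per table row.
data = [header] + rows
fixed = pd.DataFrame.from_records(data)
print(fixed.shape)  # (3, 6)

# Optional: use the first record as column labels instead of keeping it as row 0.
labelled = pd.DataFrame(data[1:], columns=data[0])
print(labelled)
```

With labelled columns you can select a column by name, for example labelled['Previous'], instead of by position.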
Regarding python - creating a DataFrame from a list of records, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59903046/