pandas - Putting a Selenium HTML table into a pandas DataFrame using BeautifulSoup

Tags: pandas selenium beautifulsoup

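I have successfully used Selenium to scrape an HTML table that requires clicking a button before it can be scraped.

So Selenium works, and it stores the HTML table in the variable "r".

However, I am having trouble parsing it into a pandas DataFrame.

As on the page at the URL, the DataFrame should have 5 columns and about 30 rows.

Can anyone see what is going wrong?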

import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup

browser = webdriver.Firefox(executable_path=r'/Users/computer_name/Documents/python/web_drivers/geckodriver')
browser.get('https://www.investing.com/equities/exxon-mobil-income-statement')
linkElem = browser.find_element_by_link_text('Annual')
linkElem.click()
r = browser.find_element_by_css_selector("#rrtable > table").get_attribute('innerHTML')
browser.quit()

soup = BeautifulSoup(r, 'html.parser')

df = pd.DataFrame(soup)
print(df)

Many thanks.

Best answer

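Once you have the soup, parse it with pd.read_html() rather than passing it to pd.DataFrame(). You also need to take outerHTML instead of innerHTML: innerHTML returns only the table's contents without the enclosing <table> tag, so pd.read_html() has no table to find.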

r = browser.find_element_by_css_selector("#rrtable > table").get_attribute('outerHTML')
browser.quit()
soup = BeautifulSoup(r, 'html.parser')
df = pd.read_html(str(soup))[0]
print(df)

Output:

                                     Period Ending:  ...                                          201631/12
0                                       Total Revenue  ...                                             200628
1   Revenue 255583 279332 237162 200628  Other Rev...  ...  Revenue 255583 279332 237162 200628  Other Rev...
2                                             Revenue  ...                                             200628
3                                Other Revenue, Total  ...                                                  -
4                              Cost of Revenue, Total  ...                                             136098
5                                        Gross Profit  ...                                              64530
6                            Total Operating Expenses  ...                                             199692
7   Selling/General/Admin. Expenses, Total 41923 4...  ...  Selling/General/Admin. Expenses, Total 41923 4...
8              Selling/General/Admin. Expenses, Total  ...                                              39819
9                              Research & Development  ...                                               1467
10                        Depreciation / Amortization  ...                                              22308
11          Interest Expense (Income) - Net Operating  ...                                                  -
12                           Unusual Expense (Income)  ...                                                  -
13                    Other Operating Expenses, Total  ...                                                  -
14                                   Operating Income  ...                                                936
15       Interest Income (Expense), Net Non-Operating  ...                                               4353
16                      Gain (Loss) on Sale of Assets  ...                                                  -
17                                         Other, Net  ...                                               2680
18                            Net Income Before Taxes  ...                                               7969
19                         Provision for Income Taxes  ...                                               -406
20                             Net Income After Taxes  ...                                               8375
21                                  Minority Interest  ...                                               -535
22                               Equity In Affiliates  ...                                                  -
23                                U.S GAAP Adjustment  ...                                                  -
24              Net Income Before Extraordinary Items  ...                                               7840
25                          Total Extraordinary Items  ...                                                  -
26                                         Net Income  ...                                               7840
27                    Total Adjustments to Net Income  ...                                                  -
28  Income Available to Common Excluding Extraordi...  ...                                               7840
29                                Dilution Adjustment  ...                                                  -
30                                 Diluted Net Income  ...                                               7840
31                    Diluted Weighted Average Shares  ...                                               4177
32          Diluted EPS Excluding Extraordinary Items  ...                                               1.88
33                   DPS - Common Stock Primary Issue  ...                                               2.98
34                             Diluted Normalized EPS  ...                                               1.88
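
The innerHTML/outerHTML difference can be reproduced without a browser. The following is a minimal sketch, not the answer's own code: the table markup, the column names, and the variable names outer_html and inner_html are made up for illustration and stand in for what Selenium would return. It also wraps the markup in StringIO, which newer pandas versions prefer over a literal HTML string. It assumes a parser backend for pd.read_html() (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for get_attribute('outerHTML'): includes the <table> tag.
outer_html = """<table>
  <thead><tr><th>Item</th><th>Year A</th><th>Year B</th></tr></thead>
  <tbody>
    <tr><td>Revenue</td><td>10</td><td>20</td></tr>
    <tr><td>Costs</td><td>4</td><td>5</td></tr>
  </tbody>
</table>"""

# Hypothetical stand-in for get_attribute('innerHTML'): same rows, no <table> wrapper.
inner_html = outer_html.replace("<table>", "").replace("</table>", "")

# outerHTML parses directly; read_html returns a list of DataFrames, one per table.
df = pd.read_html(StringIO(outer_html))[0]

# If only innerHTML is available, re-adding the wrapper gives read_html a table to find.
df_wrapped = pd.read_html(StringIO(f"<table>{inner_html}</table>"))[0]

print(df)
```

Here the BeautifulSoup step is skipped, since pd.read_html() parses the markup itself; the soup object is only needed if the HTML has to be modified before parsing.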

Regarding "pandas - Putting a Selenium HTML table into a pandas DataFrame using BeautifulSoup", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61018158/
