python - Missing webpage values when scraping data with BeautifulSoup (Python 3.6)

Tags: python python-3.x selenium web-scraping beautifulsoup

I'm using the script below to scrape the "STOCK QUOTE" data from http://fortune.com/fortune500/xcel-energy/, but it comes back blank.

I've also tried a selenium driver, but ran into the same problem. Please help with this.

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

r = requests.get('http://fortune.com/fortune500/xcel-energy/')
soup = bs(r.content, 'lxml')  # also tried: 'html.parser'

data = pd.DataFrame(columns=['C1','C2','C3','C4'], dtype='object', index=range(0,11))
for table in soup.find_all('div', {'class': 'stock-quote row'}):
    row_marker = 0
    for row in table.find_all('li'):
        column_marker = 0
        columns = row.find_all('span')
        for column in columns:
            data.iat[row_marker, column_marker] = column.get_text()
            column_marker += 1
        row_marker += 1
print(data)

Output I get:

              C1    C2   C3   C4
0       Previous Close:         NaN  NaN
1           Market Cap:   NaNB  NaN    B
2   Next Earnings Date:         NaN  NaN
3                 High:         NaN  NaN
4                  Low:         NaN  NaN
5         52 Week High:         NaN  NaN
6          52 Week Low:         NaN  NaN
7     52 Week Change %:   0.00  NaN  NaN
8            P/E Ratio:    n/a  NaN  NaN
9                  EPS:         NaN  NaN
10      Dividend Yield:    n/a  NaN  NaN

Screen shot of source

Best Answer

The stock quote on that page is filled in by JavaScript after the initial load, so the static HTML that requests downloads doesn't contain the values. The data you are looking for appears to be available at this API endpoint:

import requests

response = requests.get("http://fortune.com/api/v2/company/xel/expand/1")
data = response.json()
print(data['ticker'])
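
The exact shape of that JSON isn't documented, so it's worth inspecting it before hard-coding field names. A minimal sketch (the specific keys below, other than 'ticker', are assumptions, not guaranteed fields):

import requests

response = requests.get("http://fortune.com/api/v2/company/xel/expand/1")
payload = response.json()

# Look at the top-level structure before relying on specific keys.
print(list(payload.keys()))

# 'ticker' held the quote data above; if it's a dict, dump its fields.
ticker = payload.get('ticker')
if isinstance(ticker, dict):
    for field, value in ticker.items():
        print(field, value)
else:
    print(ticker)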

FYI, when opening the page in a selenium-automated browser, you just need to make sure you wait for the desired data to appear before parsing the HTML. Working code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd


url = 'http://fortune.com/fortune500/xcel-energy/'
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)

driver.get(url)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".stock-quote")))

page_source = driver.page_source
driver.close()

# HTML parsing part
soup = BeautifulSoup(page_source, 'lxml')  # also tried: 'html.parser'

data = pd.DataFrame(columns=['C1','C2','C3','C4'], dtype='object', index=range(0,11))
for table in soup.find_all('div', {'class': 'stock-quote'}):
    row_marker = 0
    for row in table.find_all('li'):
        column_marker = 0
        columns = row.find_all('span')
        for column in columns:
            data.iat[row_marker, column_marker] = column.get_text()
            column_marker += 1
        row_marker += 1
print(data)
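
If you'd rather not pre-size the DataFrame and track row/column markers by hand, you can collect the rows first and build the frame in one go. A sketch that reuses the soup object from the snippet above and assumes the same div.stock-quote / li / span structure:

# One list per <li>, one entry per <span> inside it.
rows = [[span.get_text(strip=True) for span in li.find_all('span')]
        for li in soup.select('div.stock-quote li')]

df = pd.DataFrame(rows)
print(df)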

Regarding "python - Missing webpage values when scraping data with BeautifulSoup Python 3.6", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45533571/
