python - Scraping a dynamically loaded table with BeautifulSoup

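My code returns the values of the first two tags in each row, but not the values of the tags after them.
URL: https://q.stock.sohu.com/cn/bk_4401.shtml
My code:
import bs4 as bs
import requests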

resp = requests.get('https://q.stock.sohu.com/cn/bk_4401.shtml')
resp.encoding = 'gb2312'
soup = bs.BeautifulSoup(resp.text, 'lxml')
tab_sgtsc_list = soup.find('table').find('tbody').find_all('tr')

for tab_sgtsc in tab_sgtsc_list:
    print('**************************************')
    print(tab_sgtsc.find_all('td')[0].text)
    print(tab_sgtsc.find_all('td')[1].text)
    print(tab_sgtsc.find_all('td')[2].text)
    print(tab_sgtsc.find_all('td')[3].text)
    print('**************************************')


Best answer

The table is rendered dynamically by JavaScript, so you won't get much from the plain HTML.
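A quick way to see this for yourself is to count how many table cells in the raw response actually contain text. The sketch below is only an illustration: it uses the standard library's html.parser rather than BeautifulSoup so that it runs without extra packages, and the class name, the function name and the sample HTML are all made up for the example (the real page's markup will differ).

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_cell = False   # True while we are between <td> and </td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def empty_cell_counts(html):
    """Return (filled, empty) counts of <td> cells in the given HTML."""
    parser = TableCellCollector()
    parser.feed(html)
    cells = [cell for row in parser.rows for cell in row]
    filled = sum(1 for cell in cells if cell)
    return filled, len(cells) - filled


# Hypothetical static HTML: only the first two cells of each row have text,
# which mirrors the symptom described in the question.
SAMPLE_HTML = """
<table><tbody>
<tr><td>600000</td><td>Bank A</td><td></td><td></td></tr>
<tr><td>600001</td><td>Bank B</td><td></td><td></td></tr>
</tbody></table>
"""

print(empty_cell_counts(SAMPLE_HTML))
```

If you pass resp.text from the question's code to empty_cell_counts and most cells come back empty, the values are most likely being filled in by a script after the page loads, and requests alone will not see them.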
However, selenium and pandas come to the rescue!
Required:

Chrome driver

selenium: pip install selenium

pandas: pip install pandas

Here's how:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

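options = Options()
# Run Chrome without a visible window. The older `options.headless = True`
# attribute is deprecated in recent Selenium releases.
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://q.stock.sohu.com/cn/bk_4401.shtml")

    # Wait up to 10 seconds for the JavaScript-rendered table to become visible.
    wait = WebDriverWait(driver, 10)
    table_element = wait.until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, "table.tableMSB"))
    )
    # Drop the sort-hint label from the header, then split the text into tokens.
    tokens = table_element.text.replace("点击按代码排序查询", "").split()
finally:
    driver.quit()  # always close the browser, even if the wait times out

# The table has 12 columns, so group the flat token list into rows of 12.
table = [tokens[i:i + 12] for i in range(0, len(tokens), 12)]
# The first row is the header; the remaining rows are the data.
pd.DataFrame(table[1:], columns=table[0]).to_csv("your_table_data.csv", index=False)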
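The list comprehension that slices the text into groups of 12 assumes every cell is exactly one whitespace-free token. As a rough sketch of that chunking step on its own, here is a version with a sanity check added. The helper names chunk_rows and rows_to_csv and the three-column sample text are invented for illustration; they are not part of the original answer.

```python
import csv
import io


def chunk_rows(tokens, width):
    """Split a flat token list into rows of `width` items each."""
    if width <= 0:
        raise ValueError("width must be positive")
    if len(tokens) % width != 0:
        # An empty cell or a cell containing a space changes the token count
        # and can shift every later column, so fail loudly instead.
        raise ValueError(
            f"{len(tokens)} tokens do not divide evenly into rows of {width}"
        )
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


def rows_to_csv(rows):
    """Render rows (header first) as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical table text with 3 columns instead of the page's 12.
text = "code name price 600000 BankA 10.5 600001 BankB 7.2"
rows = chunk_rows(text.split(), 3)
print(rows_to_csv(rows))
```

The length check only catches some misalignments. If the page's cells can be empty or contain spaces, reading the individual td elements from the rendered page is likely to be more robust than splitting the table's text.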

Regarding "python - Scraping a dynamically loaded table with BeautifulSoup", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/66902951/
