My code returns the values of the first two `td` tags in each row, but not the values of the later ones.
URL: https://q.stock.sohu.com/cn/bk_4401.shtml
My code:
import bs4 as bs
import requests
resp = requests.get('https://q.stock.sohu.com/cn/bk_4401.shtml')
resp.encoding = 'gb2312'
soup = bs.BeautifulSoup(resp.text, 'lxml')
tab_sgtsc_list = soup.find('table').find('tbody').find_all('tr')
for tab_sgtsc in tab_sgtsc_list:
    print('**************************************')
    print(tab_sgtsc.find_all('td')[0].text)
    print(tab_sgtsc.find_all('td')[1].text)
    print(tab_sgtsc.find_all('td')[2].text)
    print(tab_sgtsc.find_all('td')[3].text)
    print('**************************************')
Best answer
The table is rendered dynamically by JavaScript, so you won't get much from the plain HTML that `requests` downloads.
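One way to check for this kind of problem is to count the `td` cells in each row of the raw HTML before indexing into them. The following is only a minimal sketch using the standard library's `html.parser`; the `RowCellCounter` class, the `count_cells` helper, and the sample markup are made up for illustration and are not taken from the real Sohu page:

```python
from html.parser import HTMLParser


class RowCellCounter(HTMLParser):
    """Count the <td> cells inside each <tr> of an HTML document."""

    def __init__(self):
        super().__init__()
        self.cells_per_row = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.cells_per_row.append(0)   # start counting a new row
        elif tag == "td" and self.cells_per_row:
            self.cells_per_row[-1] += 1    # one more cell in the current row


def count_cells(html):
    """Return a list with the number of <td> cells found in each <tr>."""
    parser = RowCellCounter()
    parser.feed(html)
    return parser.cells_per_row


# Hypothetical static markup: the server sends only partially filled rows,
# and a script would add the remaining cells in the browser.
SAMPLE_HTML = """
<table><tbody>
<tr><td>1</td><td>600000</td></tr>
<tr><td>2</td><td>600004</td></tr>
</tbody></table>
"""

print(count_cells(SAMPLE_HTML))
```

If the counts are lower than the number of columns visible in the browser, the missing cells are most likely filled in by a script after the page loads, and a plain HTTP request will not see them.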
However, `selenium` and `pandas` come to the rescue!
Required (in addition to Chrome and a matching ChromeDriver):
pip install selenium pandas
Here's how:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
options = Options()
options.add_argument("--headless")  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)
try:
    driver.get("https://q.stock.sohu.com/cn/bk_4401.shtml")
    wait = WebDriverWait(driver, 10)
    # Wait until the JavaScript-rendered table is visible, drop the
    # sort-hint text, and split the remaining text into tokens
    element = wait.until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, 'table.tableMSB'))
    ).text.replace("点击按代码排序查询", "").split()
finally:
    driver.quit()  # always close the browser
# The table has 12 columns, so every 12 tokens form one row
table = [element[i:i + 12] for i in range(0, len(element), 12)]
# The first row is the header; the rest is data
pd.DataFrame(table[1:], columns=table[0]).to_csv("your_table_data.csv", index=False)
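The slicing step relies on every row containing exactly 12 whitespace-separated tokens. As a rough illustration of the same chunking idea on made-up data (the `chunk_rows` helper and the three-column sample are hypothetical, not part of the original answer):

```python
def chunk_rows(tokens, width):
    """Split a flat list of tokens into consecutive rows of `width` items."""
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


# Hypothetical table text with 3 columns instead of the page's 12.
text = "code name price 600000 BankA 10.5 600004 AirportB 12.3"
rows = chunk_rows(text.split(), 3)
header, body = rows[0], rows[1:]

# A row shorter than the header means the split went out of step.
assert all(len(row) == len(header) for row in body)

print(header)
print(body)
```

If any cell contains a space, `split()` produces extra tokens and the rows drift out of alignment, so a length check like the one above is a cheap safeguard before writing the CSV.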
Output: the table is saved to your_table_data.csv.
Source: python - Scraping a dynamically loaded table with BeautifulSoup, a similar question on Stack Overflow: https://stackoverflow.com/questions/66902951/