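On intraday.pro there is an online status that is refreshed repeatedly at a fixed interval. The element is generated dynamically by JavaScript, which sets its innerHTML.
I inspected the HTML with the browser's Inspect Element, and this is the markup: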
<div id="is_online">
<font color="green">Online</font>
</div>
I use the code below, but it returns None and cannot find the online status.
from bs4 import BeautifulSoup
import requests
r = requests.get("http://intraday.pro/")
soup = BeautifulSoup(r.text, 'html.parser')
is_online = True
while is_online:
    items = soup.find_all("div", {"id": "is_online"})[0].decode_contents()
    if items:
        print(items)
        is_online = False
I also tried:
items = soup.find_all("font")
for item in items:
    print(item.get_text())
But I still could not find the online status.
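A likely reason both attempts come up empty is that requests only downloads the markup the server sends and never runs the page's script, so the div has no content yet when it is parsed. The offline sketch below illustrates this under stated assumptions: SERVED_HTML is a made-up stand-in for the served page, text_of and DivTextCollector are hypothetical helpers, and the standard-library html.parser is swapped in for BeautifulSoup only to keep the sketch free of dependencies.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for what the server sends (an assumption): the div
# exists but is empty, because the status is written later by the script.
SERVED_HTML = """
<html><body>
<div id="is_online"></div>
<script>/* fills #is_online after an XMLHttpRequest */</script>
</body></html>
"""

# What the browser shows after the script has run (markup from the question).
RENDERED_DIV = '<div id="is_online"><font color="green">Online</font></div>'


class DivTextCollector(HTMLParser):
    """Collect the text found inside the div with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_of(html, element_id):
    """Return the stripped text inside the div with id element_id."""
    collector = DivTextCollector(element_id)
    collector.feed(html)
    return "".join(collector.chunks).strip()


print(repr(text_of(SERVED_HTML, "is_online")))   # empty: no script was run
print(repr(text_of(RENDERED_DIV, "is_online")))  # text only exists after rendering
```

The parser is not at fault in either case; the text simply is not in the downloaded markup until something executes the script.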
This is the JavaScript code that generates the online status:
<script type="text/javascript">
var errtime = 0;
var ftime = 1;
var lastPair = '';
function subscribe(url) {
    var xhr = new XMLHttpRequest();
    if(ftime == 1)
        xhr.open('GET', '/script/table.php?ft=1', true);
    else
        xhr.open('GET', '/script/table.php', true);
    xhr.send();
    xhr.onreadystatechange = function()
    {
        if (xhr.readyState != 4) return;
        var isonline = document.getElementById('is_online');
        if (xhr.status != 200) {
            errtime += 1;
            if(errtime < 3)
            {
                setTimeout( subscribe('/script/table.php') , 30000);
            } else {
                // offline
                isonline.innerHTML = "<font color='red'><b>Offline</b>. Please refresh this page after few minutes</font>";
            }
        } else {
            // online
            isonline.innerHTML = "<font color='green'>online</font>";
            var result = JSON.parse(xhr.responseText);
            var stat24h = document.getElementById('stat24h');
            stat24h.innerHTML = result.stat;
            var table1 = result.table;
            var last1 = result.last;
            var tsumm = 0;
            for(var i=3;i<21;i++)
            {
                for(var j=1;j<14;j++)
                {
                    tsumm = 100*i + j;
                    var test = document.getElementById(i+"_"+j);
                    if(table1[tsumm] != null && test)
                    {
                        test.innerHTML = table1[tsumm];
                    } else {
                        if(test)
                            test.innerHTML = " ";
                    }
                }
            }
            errtime = 0;
            ftime = 2;
            subscribe('/script/table.php');
            if(lastPair != last1 && lastPair != "")
            {
                lastPair = last1;
                soundClick();
            } else {
                lastPair = last1;
            }
        }
    }
}
function soundClick() {
    var audio = new Audio();
    audio.src = '/libs/sounds/sound1.mp3';
    audio.autoplay = true;
}
</script>
Is there any way in BeautifulSoup to get an HTML element that is generated by JavaScript?
Thanks.
Best answer
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.firefox.options import Options
import time
# Run Firefox without opening a visible window.
options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
driver.get('http://intraday.pro/')
# Give the page's script a few seconds to finish its request and fill in the div.
time.sleep(3)
# Take the page as the browser currently holds it, then parse it as usual.
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
status = soup.find('div', {'id': 'is_online'})
print(status.text)
driver.quit()
Output:
online
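If starting a browser is too heavy, the script quoted in the question hints at another route: the page shows "online" whenever its request to /script/table.php comes back with HTTP 200. The sketch below repeats that check with the standard library only. It rests on assumptions: that the endpoint still exists and answers plain requests without cookies or special headers, and that its body is JSON with the stat, table, and last fields the script reads. status_from_http and fetch_status are hypothetical helper names, and the page's retry-before-offline behavior is deliberately left out.

```python
import json
import urllib.error
import urllib.request

# Endpoint copied from the script in the question; whether it still exists
# or accepts requests from outside the page is an assumption.
TABLE_URL = "http://intraday.pro/script/table.php?ft=1"


def status_from_http(status_code):
    """Mirror the page's rule: HTTP 200 means online, anything else does not."""
    return "online" if status_code == 200 else "offline"


def fetch_status(url=TABLE_URL, timeout=10):
    """Return (status, payload); payload is the parsed JSON body or None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            code = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return status_from_http(exc.code), None
    except OSError:
        # No HTTP response at all (DNS failure, refused connection, timeout).
        return "offline", None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None  # body was not JSON; the status is still known
    return status_from_http(code), payload


# Example use (performs a real network request, so it is left commented out):
# status, payload = fetch_status()
# print(status)
```

Compared with the Selenium approach, this avoids rendering entirely, but it depends on an undocumented endpoint and may stop working if the site changes its script.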
Regarding "javascript - How to read a periodically generated innerHTML element with BeautifulSoup?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59338432/