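This problem only occurs when I try to collect ("scrape") the contents of a table twice. Reading the table the first time succeeds, but the second attempt always fails.
It only happens in Chrome (version 74, with the matching chromedriver). I tried the same thing in Firefox and it never happened. I found a workaround of sorts in Chrome, which makes no sense, but it does the job.
When I "go" to a screen other than the one containing the table and then come back, the table scrape succeeds.
Here is the function I use to collect the table: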
    def Get_Faults_List(self, Port_Number=None, PSU=None, Retries=5):
        for attempt in range(Retries):
            try:
                if Port_Number:
                    # Show the Faults view in the context of "Port_Number"
                    Device_Panel_Frame.Click_Port(self, Port_Number)
                elif PSU:
                    if not Device_Panel_Frame.Click_PSU(self, PSU):
                        return None
                Left_Panel_Frame.Click_Fault(self)
                self.driver.switch_to_default_content()
                Main_Body = self.driver.find_element_by_name('main_page')
                self.driver.switch_to.frame(Main_Body)
                alarms_tab = self.driver.find_element_by_id('tab_alarms')
                alarms_tab.click()
                Fault_Screen = self.driver.find_element_by_name('faults')
                self.driver.switch_to.frame(Fault_Screen)
                # The rows collected below are only the relevant fault lines:
                # the XPath omits the two irrelevant rows
                faultTable_rows = WebDriverWait(self.driver, timeout=3, poll_frequency=0.5).until(
                    EC.presence_of_all_elements_located(
                        (By.XPATH, "//table[@id='faultTab']//tr[not(@id or @style)]")))
                current_faults = []
                row_index = 0
                for row in faultTable_rows:  # Go through each of the rows
                    current_faults.append([])
                    # Collect all the column elements of this row into a list
                    faultTable_row_cols = row.find_elements_by_tag_name("td")
                    for col in faultTable_row_cols:
                        # Each row of the Faults table has 5 columns; each column holds a string
                        current_faults[row_index].append(col.text)
                    row_index += 1
                break
            except:
                print(attempt + 1, 'attempt failed', Retries - (attempt + 1), 'to go')
                self.Refresh_Screen()
                sleep(5)
                continue
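If I open a new browser, I can also collect the table contents successfully. By the way, the failure always occurs on the first row of the table (after the header). The failing line is current_faults[row_index].append(col.text), and I don't understand why. The exception doesn't make any sense.
Is there another way to scrape the table contents efficiently?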
Best answer
See this answer, because you are getting a Stale Element Reference Exception.
A Stale Element Reference Exception occurs when an element:
- Has been deleted
- Is no longer attached to the DOM (as in your case)
- Has changed
From the docs:
You should discard the current reference you hold and replace it, possibly by locating the element again once it is attached to the DOM.
In other words: "find" the element again.
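To make "find the element again" concrete, here is a minimal sketch of a retry helper that locates the elements and reads them within the same attempt, so a stale reference is never reused. The helper name `read_with_relocate` and its parameters are hypothetical, not part of Selenium; the exception type is passed in so that the sketch does not depend on a running browser.

```python
import time


def read_with_relocate(locate, read, stale_exc, attempts=3, delay=0.5):
    """Locate elements and read them in one attempt; re-locate on a stale reference.

    locate    -- callable returning a fresh list of elements (hypothetical parameter)
    read      -- callable extracting data from one element (hypothetical parameter)
    stale_exc -- exception class that signals a stale reference
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            elements = locate()  # fresh references on every attempt
            return [read(element) for element in elements]
        except stale_exc as error:
            last_error = error   # the DOM changed under us; try again
            time.sleep(delay)
    raise last_error
```

With Selenium, `locate` could be a lambda that runs the question's XPath query, `read` could return `[td.text for td in row.find_elements(By.TAG_NAME, "td")]`, and `stale_exc` would be `StaleElementReferenceException` from `selenium.common.exceptions`.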
My suggestion is to capture the HTML and loop over it.
You can use driver.page_source and then BeautifulSoup, like this:
    html = driver.page_source
    soup = BeautifulSoup(html, "lxml")
This should be done after switching frames.
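As a rough sketch of how the captured HTML could be looped over, the function below parses a page source string and collects the cell text of each fault row. It mirrors the question's XPath filter by skipping rows that carry an `id` or `style` attribute. The function name `parse_fault_rows` and the sample markup are made up for illustration, and the built-in "html.parser" is used in place of "lxml" only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup


def parse_fault_rows(html):
    """Return the fault table as a list of rows, each a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="faultTab")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Same filter as the XPath tr[not(@id or @style)]
        if tr.has_attr("id") or tr.has_attr("style"):
            continue
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


# Hypothetical markup standing in for the real faults frame
SAMPLE_HTML = """
<table id="faultTab">
  <tr id="header"><td>Severity</td><td>Description</td></tr>
  <tr style="display:none"><td>hidden</td><td>row</td></tr>
  <tr><td>Major</td><td>Link down</td></tr>
  <tr><td>Minor</td><td>Fan speed low</td></tr>
</table>
"""

faults = parse_fault_rows(SAMPLE_HTML)
```

In the question's function this would be called as `parse_fault_rows(self.driver.page_source)` once the driver has switched into the faults frame. Because the parsed strings are plain Python objects, they cannot go stale when the page redraws the table.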
Hope this helps!
Source: python - Why do I get a StaleElementReferenceException when I "read" the text content of a column?, a similar question found on Stack Overflow: https://stackoverflow.com/questions/56206359/