Python Selenium: only the first row is returned when iterating over a table

I am trying to extract the latest headlines from the following news site: http://news.sina.com.cn/hotnews/

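import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains

#save ids of relevant buttons that need to be clicked on the site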
buttons_ids = ['Tab21' , 'Tab22', 'Tab32']

#save ids of relevant subsections
con_ids = ['Con11']

#start webdriver, go to site, hover over buttons
driver = webdriver.Chrome()
driver.get("http://news.sina.com.cn/hotnews/")
time.sleep(3)
for button_id in buttons_ids:
    button = driver.find_element_by_id(button_id)
    ActionChains(driver).move_to_element(button).perform()

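I then loop over each section I am interested in and, within each section, over all the headlines, which are rows of an HTML table. However, every iteration returns the first row.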

for con_id in con_ids:
    for news_id in range(2,10):
        print(news_id)
        headline = driver.find_element_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr["+str(news_id)+"]")
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))

I also tried the following, which collects the table rows into a list and then iterates over them:

for con_id in con_ids:
    table = driver.find_elements_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr")
    for headline in table:
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))

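In the second case I get exactly as many results as there are headlines in the section, so the rows are clearly being collected correctly. However, every iteration still returns the first row. Where am I going wrong? I know a similar question has been asked here, "Selenium Python iterate over a table of rows it is stopping at the first row", but I still cannot work out what my mistake is.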

Best answer

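In XPath, a query that begins with // searches relative to the document root. So even though you call find_element_by_xpath() on the correct container element, the query escapes that scope, runs the same document-wide search every time, and yields the same result.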

To restrict the query to descendants of the current element, start it with .//, for example:

text = headline.find_element_by_xpath(".//td[2]/a")
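
The difference between a document-wide search and a row-scoped search can be reproduced without a browser. The sketch below is only an illustration: it uses Python's standard xml.etree.ElementTree instead of Selenium, and the markup, URLs, and headline texts are made up rather than taken from the real page. ElementTree does not accept a leading // on an element, so the document-wide search is imitated here by always querying from the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the table on the page; the real markup may differ.
MARKUP = """
<div id="Con11">
  <table><tbody>
    <tr><td>1</td><td><a href="http://example.com/a">Headline A</a></td><td><a href="http://example.com/a#c">10</a></td></tr>
    <tr><td>2</td><td><a href="http://example.com/b">Headline B</a></td><td><a href="http://example.com/b#c">20</a></td></tr>
    <tr><td>3</td><td><a href="http://example.com/c">Headline C</a></td><td><a href="http://example.com/c#c">30</a></td></tr>
  </tbody></table>
</div>
"""

root = ET.fromstring(MARKUP)
rows = root.findall("./table/tbody/tr")

# Searching from the document root on every iteration ignores the current
# row, so each iteration finds the same first match.
global_hits = [root.find(".//td[2]/a").text for _ in rows]

# Searching relative to each row yields that row's own link.
scoped_hits = [row.find("./td[2]/a").text for row in rows]

print(global_hits)
print(scoped_hits)
```

In the Selenium code from the question, the same effect should follow from changing "//td[2]/a" and "//td[3]/a" to ".//td[2]/a" and ".//td[3]/a" inside the loop.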

A similar question on this topic, Python Selenium only getting the first row when iterating over a table, can be found on Stack Overflow: https://stackoverflow.com/questions/48812282/
