python - How do I extract the href of Google search-result links with Selenium?

Ultimately, I just want to get the href of the first link in Google's search results.

The information I need is also present in the "a" element, but it is stored in the "data-href" attribute, and I don't know how to extract it (get_attribute('data-href') returns None).
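In Selenium, get_attribute returns None when the element has neither a property nor an attribute with the given name, so a None result usually means the matched element is not the one carrying data-href. A minimal sketch that tries several attribute names in order and returns the first non-empty value; the helper name first_attribute and the fallback order ("data-href", then "href") are assumptions for illustration:

```python
def first_attribute(element, names=("data-href", "href")):
    """Return the first non-empty attribute value of a Selenium WebElement.

    `element` only needs a get_attribute(name) method, so any WebElement
    works. Returns None if none of the names yields a value.
    """
    for name in names:
        value = element.get_attribute(name)
        if value:  # skips both None and empty strings
            return value
    return None
```

With a real element this would be called as first_attribute(link), where link is the a element you located.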

I am using PhantomJS, but I have also tried the Firefox webdriver.

The href is shown in the cite tag in Google search results (you can find it by inspecting the small green link text under each result link).

The cite element is evidently found by Selenium, but the text it returns (element.text, get_attribute('innerHTML'), or text()) is not what is shown in the HTML.

For example, there is a cite tag <cite class="_Rm">www.fcv.org.br/</cite>, but element.text shows "wikimapia.org/.../Fundação-Cristiano-Varella-Hospital..."
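When element.text disagrees with what your browser's inspector shows, it can help to compare the rendered text with the DOM properties of the same element, since the page PhantomJS receives is not necessarily the one you inspected in your own browser. A minimal diagnostic sketch; describe_element is a hypothetical helper, not part of Selenium:

```python
def describe_element(element):
    """Collect several views of a Selenium WebElement's content for comparison.

    .text is the rendered, visible text; textContent and outerHTML are read
    from the DOM that the driver actually loaded.
    """
    return {
        "text": element.text,
        "textContent": element.get_attribute("textContent"),
        "outerHTML": element.get_attribute("outerHTML"),
    }
```

Printing describe_element(cite) for the located cite element shows whether the mismatch comes from rendering or from the driver having loaded different markup.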

I tried retrieving the cite element by css_selector, tag_name, class_name, and xpath, all with the same result.

links = driver.find_elements_by_css_selector('div.g')  # div[class="g"]
link = links[0]  # I am looking for the first link in the main links section
snippet = link.find_element_by_css_selector('div[class="s"]')  # location of the cite tag
cite = snippet.find_element_by_tag_name('cite')

The div that contains the cite tag (it is the only one in that div):

    <div class="s">
         <div>
             <div class="f kv _SWb" style="white-space:nowrap">
                  <cite class="_Rm">www.fcv.org.br/</cite>
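To check what the static markup itself contains, independent of what the browser renders, the snippet can be parsed with Python's standard-library html.parser. A minimal sketch; the class and function names are hypothetical, and it only sees the HTML string you pass in (for example driver.page_source), not a live DOM:

```python
from html.parser import HTMLParser


class CiteTextParser(HTMLParser):
    """Collect the text inside every <cite> element of an HTML string."""

    def __init__(self):
        super().__init__()
        self._in_cite = False  # True while between <cite> and </cite>
        self.cites = []        # one accumulated string per <cite> element

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self._in_cite = True
            self.cites.append("")

    def handle_endtag(self, tag):
        if tag == "cite":
            self._in_cite = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still part of the cite.
        if self._in_cite:
            self.cites[-1] += data


def cite_texts(html):
    """Return the stripped text of each <cite> element found in `html`."""
    parser = CiteTextParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.cites]
```

Fed the snippet above, cite_texts returns ['www.fcv.org.br/'], even though the enclosing divs are left unclosed.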

Best answer

Find the first a element inside a search result and get the value of its href attribute:

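from selenium import webdriver

driver = webdriver.PhantomJS()  # the headless browser used in the question
driver.get("https://www.google.com/search?q=test")

results = driver.find_elements_by_css_selector('div.g')  # one div.g per organic result
link = results[0].find_element_by_tag_name("a")  # first link of the first result
href = link.get_attribute("href")  # a Google redirect URL with the target in its q parameter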
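The answer's code targets the Selenium 3 API. Newer Selenium releases have removed the find_element(s)_by_* helpers and dropped PhantomJS support; the equivalent calls go through find_element/find_elements with a locator strategy. A minimal sketch, assuming Selenium 4 and that Google still wraps each organic result in div.g (its markup changes often, so the selector is an assumption); first_result_href is a hypothetical helper name:

```python
# Locator strategy strings: these are the values of selenium's
# By.CSS_SELECTOR and By.TAG_NAME, written out so the sketch needs no imports.
CSS_SELECTOR = "css selector"
TAG_NAME = "tag name"


def first_result_href(driver, result_selector="div.g"):
    """Return the href of the first <a> in the first search result, or None.

    `driver` is a Selenium 4 WebDriver. Returns None when no result container
    matches; Selenium raises NoSuchElementException if that result has no link.
    """
    results = driver.find_elements(CSS_SELECTOR, result_selector)
    if not results:
        return None
    link = results[0].find_element(TAG_NAME, "a")
    return link.get_attribute("href")
```

With a real browser: driver = webdriver.Firefox(), then driver.get("https://www.google.com/search?q=test"), print(first_result_href(driver)), and finally driver.quit().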

You can then extract the actual URL from the href value with urlparse:

try:
    from urllib.parse import parse_qs, urlparse  # Python 3
except ImportError:
    from urlparse import parse_qs, urlparse  # Python 2

print(parse_qs(urlparse(href).query)["q"])

Prints (Python 2 output, hence the u prefix):

[u'http://www.speedtest.net/']
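The same parse_qs step can be wrapped in a reusable function that also leaves non-redirect links untouched. A minimal sketch; the name google_target_url is hypothetical, and it assumes, as the answer does, that the redirect link has the path /url and carries the destination in its q parameter:

```python
from urllib.parse import parse_qs, urlparse


def google_target_url(href):
    """Return the destination URL behind a Google result link.

    For redirect links of the form .../url?q=<target>&..., the first `q`
    value is returned; any other href is returned unchanged.
    """
    parsed = urlparse(href)
    if parsed.path == "/url":
        targets = parse_qs(parsed.query).get("q")
        if targets:
            return targets[0]
    return href
```

Returning the href unchanged when there is no q parameter means direct links and redirect links can be handled by the same call.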

Regarding "python - How do I extract the href of Google search-result links with Selenium?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35241230/
