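javascript - Ruby Selenium WebDriver - getting Google Knowledge Graph content

Tags: javascript css ruby selenium web-scraping

I am using the Ruby Selenium WebDriver and trying to get the content of the Google Knowledge Graph, which appears at the top right of the first Google search results page, inside <div class="xpdopen">.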

@driver = Selenium::WebDriver.for :phantomjs
@driver.manage.timeouts.implicit_wait = 10
@driver.get "http://google.com"
element = @driver.find_element :name => "q"
element.send_keys "BMW"
element.submit
content = @driver.find_element(:class, 'xpdopen')

But Selenium cannot find the element and raises this error:

#<Selenium::WebDriver::Error::NoSuchElementError: {"errorMessage":"Unable to find element with class name 'xpdopen'"

When I try $('.xpdopen') in the Chrome JS console, it finds the element immediately.

I also tried

@driver.execute_script("return document.getElementsByClassName('xpdopen');")

but that did not find the element either.

I also checked @driver.page_source: <div class="xpdopen"> is not in the page source, but I can see it in the Chrome console. Why?
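Checking page_source by eye is error-prone on a page this large. Below is a minimal sketch of a helper that tests whether any element in an HTML string carries a given class. The name class_in_source? is hypothetical, and the regex is a rough check rather than a real HTML parser:

```ruby
# Hypothetical helper: true if any class attribute in +html+ lists +class_name+.
# This is a rough regex check, not a full HTML parser.
def class_in_source?(html, class_name)
  class_lists = html.scan(/\bclass\s*=\s*["']([^"']*)["']/i).flatten
  # Compare whole class tokens so 'xpdopen' does not match 'xpdopened'.
  class_lists.any? { |list| list.split.include?(class_name) }
end
```

It could be called as class_in_source?(@driver.page_source, 'xpdopen') right after the search is submitted.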

How can I get this element with Selenium?

Here is what I get from pry:

[21] pry(main)> @driver = Selenium::WebDriver.for :phantomjs
=> #<Selenium::WebDriver::Driver:0x..f822d288ec7f0a708 browser=:phantomjs>
[22] pry(main)> @driver.manage.timeouts.implicit_wait = 10    
=> 10
[23] pry(main)> @driver.get "http://google.com"    
=> {}
[24] pry(main)> element = @driver.find_element :name => "q"    
=> #<Selenium::WebDriver::Element:0x..f389f4a8876f601e id=":wdc:1434526425103">
[25] pry(main)> element.send_keys "BMW"    
=> nil
[26] pry(main)> element.submit    
=> {}
[27] pry(main)> sleep 10    
=> 10
[28] pry(main)> content = @driver.find_element(:xpath, '//*[@id="rhs_block"]/ol/li/div[1]/div')    
Selenium::WebDriver::Error::NoSuchElementError: {"errorMessage":"Unable to find element with xpath '//*[@id=\"rhs_block\"]/ol/li/div[1]/div'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"gzip;q=1.0,deflate;q=0.6,identity;q=0.3","Connection":"close","Content-Length":"67","Content-Type":"application/json; charset=utf-8","Host":"127.0.0.1:8929","User-Agent":"Ruby"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"xpath\",\"value\":\"//*[@id=\\\"rhs_block\\\"]/ol/li/div[1]/div\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/2f3cf350-14c3-11e5-9f8e-4173e8049986/element"}} (org.openqa.selenium.NoSuchElementException)

[29] pry(main)> content = @driver.find_element(:css, "#rhs_block > ol > li > div.kp-blk._Jw._Rqb._RJe > .xpdopen")
Selenium::WebDriver::Error::NoSuchElementError: {"errorMessage":"Unable to find element with css selector '#rhs_block > ol > li > div.kp-blk._Jw._Rqb._RJe > .xpdopen'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"gzip;q=1.0,deflate;q=0.6,identity;q=0.3","Connection":"close","Content-Length":"113","Content-Type":"application/json; charset=utf-8","Host":"127.0.0.1:8929","User-Agent":"Ruby"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"css selector\",\"value\":\"#rhs_block \\u003e ol \\u003e li \\u003e div.kp-blk._Jw._Rqb._RJe \\u003e .xpdopen\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/2f3cf350-14c3-11e5-9f8e-4173e8049986/element"}} (org.openqa.selenium.NoSuchElementException)

Just to prove that it can find other elements on the same page without any problem:

[30] pry(main)> results = @driver.find_elements(:xpath, "//p/a") 
=> [#<Selenium::WebDriver::Element:0x6f6a74631e2b7010 id=":wdc:1434527087873">,
 #<Selenium::WebDriver::Element:0x7b6d276448081688 id=":wdc:1434527087874">,
 #<Selenium::WebDriver::Element:0x..f9504a4171b03970a id=":wdc:1434527087875">,
 #<Selenium::WebDriver::Element:0x..fa6e0158aa8d24e2a id=":wdc:1434527087876">,
 #<Selenium::WebDriver::Element:0x327bf842e4399368 id=":wdc:1434527087877">,
 #<Selenium::WebDriver::Element:0x..fae292d7ca211ab32 id=":wdc:1434527087878">,
 #<Selenium::WebDriver::Element:0x129a58eb5ed6ee9c id=":wdc:1434527087879">,
 #<Selenium::WebDriver::Element:0x46ef3b45800e63e0 id=":wdc:1434527087880">,
 #<Selenium::WebDriver::Element:0x26bfb47f8ad498ea id=":wdc:1434527087881">,
 #<Selenium::WebDriver::Element:0x..f03756c2924a2974 id=":wdc:1434527087882">,
 #<Selenium::WebDriver::Element:0xfba93aab4b32af8 id=":wdc:1434527087883">]

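Taking screenshots showed that PhantomJS does not render the knowledge graph at all (there is no content).

Screenshot from PhantomJS: [image: phantomjs page content]

Screenshot from Firefox: [image: Firefox page content]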

Why does PhantomJS not get the knowledge graph content?
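One way to investigate is to compare what each browser actually received. Sites can serve different markup to different user agents; that this is the cause here is an assumption, not something the question confirms. The sketch below gathers the URL, the user agent, the number of matching elements in the live DOM, and whether the class name appears in the page source. The name page_diagnostics is hypothetical, and the driver argument is only assumed to respond to current_url, execute_script and page_source, as Selenium::WebDriver::Driver does:

```ruby
# Hypothetical helper: collect a few facts about the page the browser loaded.
# +driver+ is assumed to behave like Selenium::WebDriver::Driver.
def page_diagnostics(driver, class_name)
  {
    url: driver.current_url,
    user_agent: driver.execute_script('return navigator.userAgent;'),
    # Return the count rather than the collection, so the result is a plain number.
    matches_in_dom: driver.execute_script(
      'return document.getElementsByClassName(arguments[0]).length;', class_name
    ),
    # Plain substring check on the serialized page.
    in_page_source: driver.page_source.include?(class_name)
  }
end
```

Running page_diagnostics(@driver, 'xpdopen') once with the PhantomJS driver and once with a Firefox driver would show whether the two browsers were served the same page.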

Best answer

Apparently CSS by itself does not know where to find the xpdopen class; you have to give the full path to the element:

XPath:

content = @driver.find_element(:xpath, '//*[@id="rhs_block"]/ol/li/div[1]/div')

CSS:

content = @driver.find_element(:css, "#rhs_block > ol > li > div.kp-blk._Jw._Rqb._RJe > .xpdopen")
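Both locators depend on Google's generated markup, and the pry session in the question shows the same two locators failing under PhantomJS, so it may help to try several locators in turn and treat a miss as a normal result. Below is a minimal sketch; find_first and KNOWLEDGE_PANEL_LOCATORS are hypothetical names. It relies on find_elements (plural), which returns an empty array instead of raising NoSuchElementError:

```ruby
# Locators taken from the question and the answer, tried in this order.
KNOWLEDGE_PANEL_LOCATORS = [
  [:class, 'xpdopen'],
  [:xpath, '//*[@id="rhs_block"]/ol/li/div[1]/div'],
  [:css, '#rhs_block > ol > li > div.kp-blk._Jw._Rqb._RJe > .xpdopen']
].freeze

# Hypothetical helper: return the first element matched by any locator,
# or nil when none of them match.
def find_first(driver, locators)
  locators.each do |how, what|
    found = driver.find_elements(how, what)
    return found.first unless found.empty?
  end
  nil
end
```

Usage would be content = find_first(@driver, KNOWLEDGE_PANEL_LOCATORS), followed by a nil check. With an implicit wait of 10 seconds, each locator that misses can take up to 10 seconds before the next one is tried.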

Regarding "javascript - Ruby Selenium WebDriver - getting Google Knowledge Graph content", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/30872109/
