javascript - R - How to run a JavaScript button on a website to show all values to scrape

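I'm trying to scrape some data from a website that only shows products 1-30 unless I press the "List all" button. This button is JavaScript and does not change the URL when I run it. I'm currently using the rvest package in R to do this.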

  library(rvest)
  page <- read_html("https://shop.supervalu.ie/shopping/shopping/shop.aspx?catid=150200350")

I've seen some other posts that mention using the RSelenium package, but I'd prefer another approach.

Edit: thanks to Jack's help I now have the code below, but I'm running into two problems.

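1) Some pages don't show all products even after pressing the "ListAll" button. They show the first 200, and you then have to page through the next 200, and so on. This happens on this page: https://shop.supervalu.ie/shopping/shopping/shop.aspx?catid=150200275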

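2) In my loop, if the code can't find the "ListAll" element (i.e. if there are fewer than 30 products), it throws an error. Does anyone know how to avoid this inside the loop? In pseudocode: if there is no ListAll element, skip ListAll and keep running.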
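For problem 2, one way to express that pseudocode in R is to wrap the lookup in `tryCatch()`, so a missing element yields `NULL` instead of stopping the loop. This is a minimal sketch: the helper names `find_optional()` and `click_list_all()` are hypothetical, and it assumes (as the question describes) that RSelenium's `findElement()` raises an error when nothing matches.

```r
# Try to locate an element; return NULL instead of stopping when it is absent.
# `remDr` is assumed to be an open RSelenium remoteDriver (or any object with
# a compatible $findElement(using, value) method).
find_optional <- function(remDr, using, value) {
  tryCatch(remDr$findElement(using, value), error = function(e) NULL)
}

# Hypothetical helper: click "List all" only if the page has that control
# (pages with fewer than 30 products apparently lack it).
# Returns TRUE if a click happened, FALSE otherwise.
click_list_all <- function(remDr) {
  el <- find_optional(remDr, "class", "listAllText")
  if (is.null(el)) return(FALSE)
  el$clickElement()
  TRUE
}
```

Inside the loop, the two `ListAll` lines would become `click_list_all(mybrowser)`. Another option is RSelenium's `findElements()` (plural), which returns a list that is empty when nothing matches, so its `length()` can be checked instead of catching an error.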

library(RSelenium)
library(XML)

# checkForServer()/startServer() exist only in older RSelenium releases
checkForServer()
startServer()
mybrowser <- remoteDriver()
mybrowser$open()

# Initialise the counter and the results table before the loop
i <- 1
final <- data.frame()

while (i < 67) {

  # Navigate to page
  mybrowser$navigate("https://shop.supervalu.ie/shopping/shopping/shop.aspx?catid=150200275")

  # Show all products
  ListAll <- mybrowser$findElement("class", "listAllText")
  ListAll$clickElement()

  # Navigate to the next page (this only reaches page 2: when run again it goes back
  # to page 1, because that is the first "unselected" element it finds)
  nextPage <- mybrowser$findElement("class", "unselected")
  nextPage$clickElement()

  # Take it slow
  Sys.sleep(7)
  outhtml <- mybrowser$findElement(using = 'xpath', "//*")
  out <- outhtml$getElementAttribute("outerHTML")[[1]]

  # Parse with the XML package
  doc <- htmlParse(out, encoding = "UTF-8")

  # Scrape product info
  productRaw <- getNodeSet(doc, "//*[@class = 'productTitle']")
  products <- sapply(productRaw, xmlValue)

  priceRaw <- getNodeSet(doc, "//*[@class = 'divProductPrice BodyText Style3']")
  price <- sapply(priceRaw, xmlValue)

  pricePerUnitRaw <- getNodeSet(doc, "//*[@class = 'divProductPricePerUnit BodyText Style2']")
  pricePerUnit <- sapply(pricePerUnitRaw, xmlValue)

  # Barcode: the src attribute of each product image
  barcodeRaw <- getNodeSet(doc, "//*[@class = 'productImage']//a[@href]//img[@src]")
  barcode <- sapply(barcodeRaw, function(x) xmlAttrs(x)["src"])

  final <- rbind(final, data.frame(Products = products,
                                   Price = price, UnitPrice = pricePerUnit, Barcode = barcode))
  i <- i + 1
}
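For problem 1, here is a sketch of a loop that keeps clicking the link for the next page number until none exists, instead of re-navigating to the start URL on every iteration (which is what sends the loop back to page 1). It assumes, without having checked the site, that page links carry the class `unselected` (as in the code above) and that their visible text is the page number. `scrape_all_pages()`, the `scrape_page` callback and the XPath are hypothetical and would need adjusting to the real markup.

```r
# Scrape every results page by clicking the link to the next page number
# until no such link exists (or max_pages is reached).
# Assumptions (not verified against the site): page links have the class
# "unselected" and their visible text is the page number.
scrape_all_pages <- function(remDr, scrape_page, max_pages = 50, wait = 2) {
  results <- list()
  page <- 1L
  repeat {
    # scrape_page() is a hypothetical callback that parses the current page
    # (e.g. with the getNodeSet() code above) and returns a data.frame
    results[[page]] <- scrape_page(remDr)
    if (page >= max_pages) break
    next_xpath <- sprintf(
      "//*[@class = 'unselected' and normalize-space(.) = '%d']", page + 1L)
    next_link <- tryCatch(remDr$findElement("xpath", next_xpath),
                          error = function(e) NULL)
    if (is.null(next_link)) break  # no link to a further page: done
    next_link$clickElement()
    Sys.sleep(wait)                # give the page time to reload
    page <- page + 1L
  }
  do.call(rbind, results)
}
```

Collecting each page into a list and binding once at the end also avoids growing `final` with `rbind()` on every iteration. Whether "ListAll" has to be clicked again after each page change is not known from the question.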

Best answer

I know you'd prefer another way, but I wanted to offer an RSelenium solution so you can see it.

library(RSelenium)
library(XML)

# Start Selenium server (checkForServer()/startServer() exist only in older RSelenium releases)
checkForServer()
startServer()

remDr <- remoteDriver()

remDr$open()

# Navigate to page
remDr$navigate("https://shop.supervalu.ie/shopping/shopping/shop.aspx?catid=150200350")

# Click "List all" so every product is loaded
ListAll <- remDr$findElement("class", "listAllText")
ListAll$clickElement()

# Take it slow
Sys.sleep(.50)

outhtml <- remDr$findElement(using = 'xpath', "//*")
out <- outhtml$getElementAttribute("outerHTML")[[1]]

# Parse with the XML package
doc <- htmlParse(out, encoding = "UTF-8")

# Just scrape the product titles, as an example
gg <- getNodeSet(doc, "//*[@class = 'productTitle']")

sapply(gg, xmlValue)
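A fixed `Sys.sleep(.50)` may be too short for the full list to load after the click. A minimal sketch of polling instead, using only base R: `wait_for()` is a hypothetical helper, and the usage in the comments assumes that `findElements()` returns an empty list when nothing matches and that the product count grows once "List all" has finished loading.

```r
# Poll `condition` until it returns TRUE or `timeout` seconds pass.
# Returns TRUE if the condition was met, FALSE on timeout.
wait_for <- function(condition, timeout = 30, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Possible usage with an open remoteDriver `remDr` (not run here):
# before <- length(remDr$findElements("class", "productTitle"))
# ListAll$clickElement()
# wait_for(function() length(remDr$findElements("class", "productTitle")) > before)
```

On a page whose products all fit on the first screen, the count never grows and the call simply returns `FALSE` after `timeout` seconds.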

hrbrmstr may have some AJAX magic you could use. See his answer to another question here

Regarding "javascript - R - How to run a JavaScript button on a website to show all values to scrape", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35557031/
