r - How to retrieve the hyperlinks from a Google search with rvest

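I am using rvest to get the hyperlinks from a Google search. User @AllanCameron helped me put this code together in the past, but now I don't know how to change the XPath, or what else I need to do, to get the links. Here is my code: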

library(rvest)
library(tidyverse)
#Code
#url
url <- 'https://www.google.com/search?q=Mario+Torres+Mexico'
#Get data
first_page <- read_html(url)
links <- html_nodes(first_page, xpath = "//div/div/a/h3") %>% 
  html_attr('href')

It returns nothing but NA.

I want to get the link for each result item, like the ones in my screenshot (sorry about the image quality).

Is it possible to store them in a data frame? Thanks a lot!

Best answer

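The h3 nodes are the result titles; they do not carry an href attribute themselves, which is why html_attr('href') returns NA. Instead, step up to the parent a node of each h3 and read its href attribute. Because every link is taken from a title's own parent, you get the same number of links as main titles, so the two line up easily in a data frame.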

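# The h3 nodes hold the result titles (they have no href of their own)
titles <- html_nodes(first_page, xpath = "//div/div/a/h3")

titles %>%
  # Step up to each title's parent <a>, which carries the link
  html_elements(xpath = "./parent::a") %>%
  html_attr("href") %>%
  # The raw href is assumed to look like "/url?q=<target>&sa=...":
  # keep everything from "https" up to (not including) the first "&"
  str_extract("https.*?(?=&)")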

[1] "https://www.linkedin.com/in/mario-torres-b5796315b"                                                           
[2] "https://mariolopeztorres.com/"                                                                                
[3] "https://www.instagram.com/mario_torres25/%3Fhl%3Den"                                                          
[4] "https://www.1stdibs.com/buy/mario-torres-lopez/"                                                              
[5] "https://m.facebook.com/2064681987175832"                                                                      
[6] "https://www.facebook.com/mariotorresmx"                                                                       
[7] "https://www.transfermarkt.us/mario-torres/profil/spieler/28167"                                               
[8] "https://en.wikipedia.org/wiki/Mario_Garc%25C3%25ADa_Torres"                                                   
[9] "https://circawho.com/press-and-magazines/mario-lopez-torress-legacy-is-still-being-woven-in-michoacan-mexico/"
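To answer the data-frame part of the question, here is a minimal sketch that pairs each title with its link in a tibble. It runs offline against a small hypothetical HTML string that only mimics the assumed shape of Google's results (an h3 inside an a whose href starts with "/url?q="); on the live page you would use first_page <- read_html(url) as in the question. One deliberate swap: it uses html_element() (singular) instead of html_elements(), because the singular form returns exactly one node per title (a missing node, and therefore NA, when there is no parent a), so the title and link columns cannot drift out of step.

```r
library(rvest)
library(stringr)
library(tibble)

# Hypothetical HTML mimicking the assumed structure of Google's results:
# each title is an <h3> inside an <a> whose href is "/url?q=<target>&...".
page_html <- '
<html><body>
  <div><div>
    <a href="/url?q=https://www.linkedin.com/in/mario-torres-b5796315b&amp;sa=U">
      <h3>Mario Torres - LinkedIn</h3>
    </a>
  </div></div>
  <div><div>
    <a href="/url?q=https://mariolopeztorres.com/&amp;sa=U">
      <h3>Mario Lopez Torres</h3>
    </a>
  </div></div>
</body></html>'

# read_html() treats a string containing "<" as literal HTML
first_page <- read_html(page_html)

# One node per result title
titles <- html_elements(first_page, xpath = "//div/div/a/h3")

results <- tibble(
  title = html_text2(titles),
  # html_element() (singular) yields one match per title,
  # keeping every link on the same row as its title
  link = titles %>%
    html_element(xpath = "./parent::a") %>%
    html_attr("href") %>%
    str_extract("https.*?(?=&)")
)

results
```

The extracted links may still contain percent-encoded characters (as in the Instagram and Wikipedia results above); decoding them is a separate step.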

A similar question about retrieving Google search hyperlinks with rvest can be found on Stack Overflow: https://stackoverflow.com/questions/73806861/
