r - How to scrape a table with rvest and xpath?

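Using the following documentation, I have been trying to scrape a series of tables from marketwatch.com.

Here is the one represented by the code below:

The link and xpath are already included in the code: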

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
valuation <- url %>%
  html() %>%
  html_nodes(xpath='//*[@id="maincontent"]/div[2]/div[1]') %>%
  html_table()
valuation <- valuation[[1]]

I get the following error:

Warning message:
'html' is deprecated.
Use 'read_html' instead.
See help("Deprecated")

Thanks in advance.

Best answer

That website doesn't use an HTML table, so html_table() can't find anything. It actually uses div classes column and data lastcolumn.
So you can do something like

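library(rvest)

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
page <- read_html(url)  # download and parse the page once

# nodes whose class attribute is exactly "column"
valuation_col <- page %>%
  html_nodes(xpath='//*[@class="column"]')

# nodes whose class attribute is exactly "data lastcolumn"
valuation_data <- page %>%
  html_nodes(xpath='//*[@class="data lastcolumn"]')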

or even

url %>%
  read_html() %>%
  html_nodes(xpath='//*[@class="section"]')

That should get you most of the way there.
Please also read their terms of use, particularly 3.4.
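
To go from the selected nodes to something table-like, one option is to pull the text out of each node set and pair the results up. The sketch below is only an illustration: the HTML string is made-up markup standing in for a div-based layout (it is not copied from marketwatch.com), and the names page, tables, labels, values and valuation are hypothetical. It also assumes that the two node sets come back in the same order and with the same length, which should be checked against the real page.

```r
library(rvest)

# Hypothetical markup: a made-up stand-in for a div-based layout,
# not the actual marketwatch.com source.
page <- read_html('<div class="section">
    <p class="column">P/E Current</p>
    <p class="data lastcolumn">12.5</p>
  </div>
  <div class="section">
    <p class="column">Price to Book Ratio</p>
    <p class="data lastcolumn">1.8</p>
  </div>')

# There is no <table> element here, so nothing is available for html_table().
tables <- page %>%
  html_nodes("table") %>%
  html_table()

# Extract the text of the label nodes and the value nodes.
labels <- page %>%
  html_nodes(xpath = '//*[@class="column"]') %>%
  html_text(trim = TRUE)

values <- page %>%
  html_nodes(xpath = '//*[@class="data lastcolumn"]') %>%
  html_text(trim = TRUE)

# Assumption: labels and values line up one-to-one, in document order.
stopifnot(length(labels) == length(values))

valuation <- data.frame(label = labels, value = values,
                        stringsAsFactors = FALSE)
```

Note that @class="column" matches only nodes whose class attribute is exactly that string. If the real page adds further classes to the same elements, a contains(@class, "column") test or a CSS selector such as html_nodes(".column") may be a better fit.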

Regarding "r - How to scrape a table with rvest and xpath?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35707534/
