I'm fairly new to scraping with R, and I'm getting an error message I can't make sense of. My code:
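library(rvest)  # provides read_html(), html_nodes(), html_table() and %>%

url <- "https://en.wikipedia.org/wiki/California_State_Legislature,_2017%E2%80%9318_session"
leg <- read_html(url)
testdata <- leg %>%
  html_nodes('table') %>%
  .[6] %>%          # keep only the sixth table on the page
  html_table()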
The response I get is:
Error in out[j + k, ] : subscript out of bounds
When I swap html_table out for html_text, I don't get an error message. Any idea what I'm doing wrong?
Thanks!
Best answer
Hope this helps!
library(htmltab)
library(dplyr)
library(tidyr)

url <- "https://en.wikipedia.org/wiki/California_State_Legislature,_2017%E2%80%9318_session"
url %>%
  htmltab(6, rm_nodata_cols = FALSE) %>%   # parse the sixth table, keeping empty columns
  .[, -1] %>%                              # drop the first column
  replace_na(list(Notes = "", "Term-limited?" = "")) %>%   # show NA as empty strings
  `rownames<-`(seq_len(nrow(.)))           # renumber rows starting from 1
The output is:
  District              Name      Party       Residence Term-limited? Notes
1        1        Ted Gaines Republican El Dorado Hills
2        2      Mike McGuire Democratic      Healdsburg
3        3         Bill Dodd Democratic            Napa
4        4       Jim Nielsen Republican          Gerber
5        5 Cathleen Galgiani Democratic        Stockton
6        6       Richard Pan Democratic      Sacramento
...
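The cleanup steps in the answer's pipeline (dropping the first column, replacing NA with empty strings, renumbering the rows) can be tried offline on a small stand-in data frame. This is only a sketch: the data frame `toy`, its `Extra` column, and all of its values are made up for illustration and are not the real scraped table.

```r
library(tidyr)

# Hypothetical stand-in for the scraped table (values are invented).
toy <- data.frame(
  Extra = c("x", "y", "z"),                 # placeholder for the column that gets dropped
  District = c("1", "2", "3"),
  `Term-limited?` = c(NA, NA, "Yes"),
  Notes = c(NA, "Some note", NA),
  check.names = FALSE,                      # keep "Term-limited?" as a column name
  stringsAsFactors = FALSE
)
rownames(toy) <- c("2", "3", "4")           # pretend the row names start at 2

cleaned <- toy[, -1]                        # drop the first column
cleaned <- replace_na(cleaned, list(Notes = "", "Term-limited?" = ""))
rownames(cleaned) <- seq_len(nrow(cleaned)) # renumber rows from 1

print(cleaned)
```

Non-syntactic column names such as `Term-limited?` have to be quoted in the `replace_na()` list, which is why the answer writes that name as a string.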
This question about the rvest html_table error "Error in out[j + k, ] : subscript out of bounds" corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/47585699/