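I have a loop that reads HTML table data from ~440 web pages. The markup on each page is not exactly the same, so sometimes I need table node 1 and sometimes table node 2. Right now I just set the node numbers manually in a list and feed them into the loop. My problem is that the page nodes have started changing, and updating the node # list is becoming a hassle.
If the loop encounters a wrong node number (i.e. 1 instead of 2, or vice versa), it throws an error and stops. Is there a way to have the loop replace the wrong node number with the correct one when it hits an error, and then keep running as if nothing had happened?
Here is the readHTML part of the code in my loop, with an example url: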
url <- "http://espn.go.com/nba/player/gamelog/_/id/2991280/year/2013/"
html.page <- htmlParse(url)
tableNodes <- getNodeSet(html.page, "//table")
x <- as.numeric(Players$Nodes[s])
tbl = readHTMLTable(tableNodes[[x]], colClasses = c("character"),stringsAsFactors = FALSE)
Here is the error I get when the node number is wrong:
"Error in readHTMLTable(tableNodes[[x]], colClasses = c("character"), stringsAsFactors = FALSE) : error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[x]] : subscript out of bounds"
Sample code:
library(XML)  # provides htmlParse, getNodeSet and readHTMLTable
A <- c("dog", "cat")
Nodes <- as.data.frame(1:1)
# Nodes <- as.data.frame(1:2) <-- This works without errors
colnames(Nodes)[1] <- "Col1"
Nodes2 <- 2
url <- c("http://espn.go.com/nba/player/gamelog/_/id/6639/year/2013/","http://espn.go.com/nba/player/gamelog/_/id/6630/year/2013/")
for (i in 1:length(A))
{
  html.page <- htmlParse(url[i])
  tableNodes <- getNodeSet(html.page, "//table")
  x <- as.numeric(Nodes$Col1[i])
  df = readHTMLTable(tableNodes[[x]], colClasses = c("character"),stringsAsFactors = FALSE)
  #tryCatch(df) here.....no clue
  assign(paste0("", A[i]), df)
}
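Best answer
If you get the subscript out of bounds error message, you should make sure a lower x is used. Here is a general tryCatch demo based on the demo code you posted in the original question (although I have replaced x with 2, since I do not know what Players and s are):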
> msg <- tryCatch(readHTMLTable(tableNodes[[2]], colClasses = c("character"),stringsAsFactors = FALSE), error = function(e)e)
> str(msg)
List of 2
$ message: chr "error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[2]] : subscript"| __truncated__
$ call : language readHTMLTable(tableNodes[[2]], colClasses = c("character"), stringsAsFactors = FALSE)
- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
> msg$message
[1] "error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[2]] : subscript out of bounds\n"
> grepl('subscript out of bounds', msg$message)
[1] TRUE
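One way to apply this inside the loop is to wrap the indexing in tryCatch and, when it fails, retry with the other candidate node numbers. The following is only a minimal sketch under some assumptions, not the answerer's code: read_with_fallback, candidates and reader are hypothetical names introduced for illustration, and a plain list stands in for the result of getNodeSet so that the example runs without the XML package or network access. In the real loop, reader would presumably be a function that calls readHTMLTable(node, colClasses = "character", stringsAsFactors = FALSE).

```r
# Hypothetical helper (not part of the XML package): try the preferred node
# number first, then the other candidates, and return the first one that can
# be read without raising an error.
read_with_fallback <- function(nodes, preferred, reader, candidates = 1:2) {
  # Put the preferred index first, followed by the remaining candidates.
  to_try <- unique(c(preferred, candidates))
  to_try <- to_try[!is.na(to_try)]
  for (idx in to_try) {
    result <- tryCatch(
      reader(nodes[[idx]]),
      error = function(e) e  # return the condition object instead of stopping
    )
    if (!inherits(result, "error")) {
      return(list(index = idx, table = result))
    }
  }
  # No candidate worked: return NULL so the caller can skip this page.
  list(index = NA_integer_, table = NULL)
}

# Stand-in for getNodeSet(html.page, "//table"): a page with a single table.
tableNodes <- list(data.frame(PTS = c("10", "22"), stringsAsFactors = FALSE))

# The stored node number (2) is wrong for this page, so the helper falls back to 1.
res <- read_with_fallback(tableNodes, preferred = 2, reader = identity)
res$index
```

The returned index could be written back into the node list so the next run starts from the corrected value. Note that this only covers the out-of-bounds case: if the wrong number points at a table that exists but is not the one wanted, no error is raised, so a check on the returned columns would still be needed.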
Regarding "r - How to change value if error occurs in for loop?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19755606/