我编写了一个嵌套的 for 循环,并希望将其转换为函数。这是我目前拥有的代码:
## Sleeper function
testit <- function(x)
{
p1 <- proc.time()
Sys.sleep(x)
proc.time() - p1 # The cpu usage should be negligible
}
## Set up null objects to fill in loop
episodes = NULL
l = NULL
## Scrape pages and append together
for (season in c(27:38)){
for (episode in c(1:13)){
tryCatch({
if (season == 34 & episode == 3){
l = 'l'
}
new_episode <- read_html(paste0('http://www.chakoteya.net/DoctorWho/', season, '-', episode, '.htm', l)) %>%
html_nodes("p") %>%
html_text() %>%
tibble(value = .)
episodes <- episodes %>% bind_rows(new_episode)
cat(paste('\rseason = ', season, '; episode = ', episode))
testit(2)
}, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}
}
如您所见,它在第 27 至 38 季以及每季第 1 至 13 集中循环播放。有时一季只有 12 集,因此需要 tryCatch()
。而且,由于我不明白的原因,有时 URL 需要 .html
后缀(如果季节 >= 34 且剧集 >= 3),有时需要 .htm
> 后缀,因此是 if (season == 34 & Episode == 3)
语句。
我想将其转换为一个函数,也许使用 apply()
或 map()
但我的函数技能仍然非常初级,我正在努力。
作为最终输出,最好调用一个名为 doctor_who()
的函数,如下所示:
episodes <- doctor_who(season = c(27:38), episode = c(1:13))
最佳答案
我将其分为两个函数 -
第一个函数从一个文件中读取数据,第二个函数创建每个季节和剧集的所有 url,并使用 map_df
将它们一一传递。
在第一个函数中,我们首先尝试从 .htm
读取数据,如果失败,则从 .html
读取数据。
read_text <- function(url) {
val <- tryCatch({
read_html(url) %>%
html_nodes("p") %>%
html_text() %>%
tibble(value = .)
}, error = function(e) return(NA))
if(NROW(val) == 1) {
url <- paste0(url, 'l')
val <- tryCatch({
read_html(url) %>%
html_nodes("p") %>%
html_text() %>%
tibble(value = .)
}, error = function(e) return(NA))
}
if(NROW(val) > 1) val else NULL
}
doctor_who <- function(season, episode) {
all_urls <- sprintf('http://www.chakoteya.net/DoctorWho/%s.htm',
c(t(outer(season, episode, paste, sep = '-'))))
purrr::map_df(all_urls, read_text, .id = 'id')
}
res <- doctor_who(season = 1:5, episode = 1:5)
我添加了一个 id
列来区分每一集。
# id value
# <chr> <chr>
# 1 1 " \r\nOriginal Airdate: 23 Nov, 1963"
# 2 1 " [Coal Hill School corridor] "
# 3 1 " (The bell is ringing for end of classes.)\r\nGIRL: Night, Miss Wright. \…
# 4 1 " [Laboratory] "
# 5 1 " (A man is tidying up after the class) \r\nIAN: Oh? Not gone yet? \r\nBAR…
# 6 1 " [Classroom] "
# 7 1 " (Susan is listening to guitar rock music on her\r\ntransistor radio. I'm…
# 8 1 " [Totter's Lane] "
# 9 1 " (Ian and Barbara are parked up.) \r\nBARBARA: Over there. \r\nIAN: We're…
#10 1 " [Memory - classroom] "
# … with 4,446 more rows
关于r - 将嵌套 for 循环转换为函数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/68255115/