r - Transpose every 4 rows into 4 separate columns

Tags: r list web-scraping transpose rvest

I am trying to scrape the date, title, and review text from IMDB with the following loop:

   library(rvest)
   library(dplyr)
   library(stringr)
   library(tidyverse)

   ID <- 4633694

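   # Build 20 URLs by appending 1:20 to the query string, then scrape each one
   data <- lapply(paste0('http://www.imdb.com/title/tt', ID, '/reviews?filter=prolific', 1:20),
                  function(url){
                    url %>% read_html() %>%
                      # Select the date, rating, title, and review-text nodes
                      html_nodes(".review-date,.rating-other-user-rating,.title,.show-more__control") %>%
                      html_text() %>%
                      # Strip carriage returns, newlines, and tabs from the text
                      gsub('[\r\n\t]', '', .)
                  })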

This gives 20 pages of review data in the following format, with the same pattern repeating:

   col1
1 10/10
2 If this was..
3 14 December 2018
4 I have to say, and no...
5
6
7 10/10
8 Stan Lee Is Smiling Right Now...
9 17 December 2018
10 A movie worthy of...
11
12
13 10/10
14 the most visually stunning film I've ever seen...
15 20 December 2018
16 There's hardly anything... 
17.
18.
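
For experimenting offline, the column above can be rebuilt from a list shaped like `data`. This is only a minimal sketch: `pages` is a hypothetical stand-in for the scraped list (two pages instead of 20, values copied from the sample), and `x` with its `col1` column is the name the answer further down happens to use.

```r
# Hypothetical stand-in for the scraped `data` list: one character vector per page.
pages <- list(
  c("10/10", "If this was..", "14 December 2018", "I have to say, and no...", "", ""),
  c("10/10", "Stan Lee Is Smiling Right Now...", "17 December 2018", "A movie worthy of...", "", "")
)

# Flatten the list of pages into one character vector and wrap it
# in a one-column data frame, matching the layout of the sample above.
x <- data.frame(col1 = unlist(pages), stringsAsFactors = FALSE)

# Entries without any word character are the blank rows between reviews.
non_empty <- x$col1[grepl("\\w", x$col1)]
length(non_empty)
```

With the real `data` list, `unlist(data)` would take the place of `unlist(pages)`.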

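I would like to know whether there is a way to transpose every 4 rows into separate columns, so that each attribute lines up in the appropriate column, like this: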

   Date              Rating  Title                Review
1. 14 December 2018  10/10   If this was..        I have to...
2. 17 December 2018  10/10   Stan Lee Is...       A movie worthy...
3. 20 December 2018  10/10   the most visually..  There's hardly anything...

Best answer

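# Join the non-empty entries with ":" and start a new line before each rating (e.g. "10/10").
text_data <- gsub('\\b(\\d+/\\d+)\\b', '\n\\1', paste(grep('\\w', x$col1, value = TRUE), collapse = ':'))

# Read the result as ":"-separated records; x is the one-column data frame shown in the question.
read.csv(text = text_data, header = FALSE, sep = ":", strip.white = TRUE, fill = TRUE, stringsAsFactors = FALSE)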
     V1                                                V2               V3                         V4 V5
1 10/10                                     If this was.. 14 December 2018   I have to say, and no... NA
2 10/10                  Stan Lee Is Smiling Right Now... 17 December 2018       A movie worthy of... NA
3 10/10 the most visually stunning film I've ever seen... 20 December 2018 There's hardly anything... NA
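
As an alternative that does not rely on ":" being absent from the review text, the non-empty entries can be poured row by row into a four-column matrix. This is a sketch rather than part of the answer above, and it assumes every review contributes exactly four non-empty entries in the order rating, title, date, review. A review without a rating would shift every following row. The column names are chosen here to match the layout requested in the question.

```r
# Sample values copied from the question; `x` mirrors its one-column data frame.
x <- data.frame(
  col1 = c("10/10", "If this was..", "14 December 2018", "I have to say, and no...", "", "",
           "10/10", "Stan Lee Is Smiling Right Now...", "17 December 2018", "A movie worthy of...", "", "",
           "10/10", "the most visually stunning film I've ever seen...", "20 December 2018",
           "There's hardly anything...", "", ""),
  stringsAsFactors = FALSE
)

# Keep only entries that contain at least one word character.
values <- x$col1[grepl("\\w", x$col1)]

# Guard the assumption that the entries come in complete groups of four.
stopifnot(length(values) %% 4 == 0)

# Fill a 4-column matrix row by row, so each group of four becomes one row.
reviews <- as.data.frame(matrix(values, ncol = 4, byrow = TRUE), stringsAsFactors = FALSE)
names(reviews) <- c("Rating", "Title", "Date", "Review")

# Reorder the columns to the layout asked for in the question.
reviews <- reviews[, c("Date", "Rating", "Title", "Review")]
print(reviews)
```

The `stopifnot()` guard fails loudly when the four-per-review assumption is violated, which is easier to notice than silently misaligned columns.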

For more on "r - Transpose every 4 rows into 4 separate columns", see the similar question on Stack Overflow: https://stackoverflow.com/questions/54730453/
