r - Transpose every 4 rows into 4 separate columns

I tried to scrape dates, titles, and reviews from IMDB using the following loop:

   library(rvest)
   library(dplyr)
   library(stringr)
   library(tidyverse)

   ID <- 4633694

   data <- lapply(paste0('http://www.imdb.com/title/tt', ID, '/reviews?filter=prolific', 1:20),
                  function(url) {
                    url %>% read_html() %>%
                      html_nodes(".review-date,.rating-other-user-rating,.title,.show-more__control") %>%
                      html_text() %>%
                      gsub('[\r\n\t]', '', .)
                  })

This gives 20 pages of review data in the following format, with the same pattern repeating:

   col1
1 10/10
2 If this was..
3 14 December 2018
4 I have to say, and no...
5
6
7 10/10
8 Stan Lee Is Smiling Right Now...
9 17 December 2018
10 A movie worthy of...
11
12
13 10/10
14 the most visually stunning film I've ever seen...
15 20 December 2018
16 There's hardly anything... 
17
18

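I would like to know whether there is a way to transpose every 4 rows into separate columns, so that each attribute lines up in the appropriate column, like this: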

   Date                 Rating   Title                 Review
1. 14 December 2018     10/10    If this was..         I have to...
2. 17 December 2018     10/10    Stan Lee Is...        A movie worthy...
3. 20 December 2018     10/10    the most visually..   There's hardly anything...

Best answer

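   # x is the data frame from the question, with the scraped text in x$col1.
   # Keep the non-empty rows, join them with ':', and start a new line before each rating (e.g. 10/10).
   text_data <- gsub('\\b(\\d+/\\d+)\\b', '\n\\1', paste(grep('\\w', x$col1, value = TRUE), collapse = ':'))

   # Parse the text as ':'-separated fields. V5 is all NA because each line ends with a trailing ':'.
   read.csv(text = text_data, header = FALSE, sep = ":", strip.white = TRUE, fill = TRUE, stringsAsFactors = FALSE)
        V1                                                V2               V3                         V4 V5
   1 10/10                                     If this was.. 14 December 2018   I have to say, and no... NA
   2 10/10                  Stan Lee Is Smiling Right Now... 17 December 2018       A movie worthy of... NA
   3 10/10 the most visually stunning film I've ever seen... 20 December 2018 There's hardly anything... NA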
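
As an alternative to round-tripping through text, the same reshaping can probably be done by filling a matrix row by row. The sketch below is not part of the original answer. It assumes the non-empty entries always come in groups of exactly four, in the order rating, title, date, review. The `col1` vector is a hypothetical stand-in for `x$col1`, and the column names are chosen to match the layout requested in the question.

```r
# Hypothetical sample mimicking the scraped column (stand-in for x$col1)
col1 <- c("10/10", "If this was..", "14 December 2018", "I have to say, and no...", "", "",
          "10/10", "Stan Lee Is Smiling Right Now...", "17 December 2018", "A movie worthy of...", "", "",
          "10/10", "the most visually stunning film I've ever seen...", "20 December 2018",
          "There's hardly anything...", "", "")

# Drop the empty filler rows, keeping only entries that contain text
vals <- col1[grepl("\\w", col1)]

# Assumption: every review contributes exactly 4 fields, in the same order
stopifnot(length(vals) %% 4 == 0)

# Fill a 4-column matrix row by row, then convert it to a data frame
reviews <- as.data.frame(matrix(vals, ncol = 4, byrow = TRUE), stringsAsFactors = FALSE)
names(reviews) <- c("Rating", "Title", "Date", "Review")

# Reorder the columns to match the layout asked for in the question
reviews <- reviews[, c("Date", "Rating", "Title", "Review")]
print(reviews)
```

Because nothing is split on `:` here, a colon inside a title or review does not break a row into extra fields. The trade-off is that a review with a missing field (for example, no rating) would shift everything after it, so the length check is only a coarse safeguard.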

Regarding "r - Transpose every 4 rows into 4 separate columns", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54730453/
