R - Nested list to tibble

I have a nested list like this:

> ex <- list(list(c("This", "is", "an", "example", "."), c("I", "really", "hate", "examples", ".")), list(c("How", "do", "you", "feel", "about", "examples", "?")))
> ex
[[1]]
[[1]][[1]]
[1] "This"    "is"      "an"      "example" "."      

[[1]][[2]]
[1] "I"        "really"   "hate"     "examples" "."       


[[2]]
[[2]][[1]]
[1] "How"      "do"       "you"      "feel"     "about"    "examples" "?"

I want to convert it into a tibble like this:

> tibble(d_id = as.integer(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)),
+        s_id = as.integer(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1)),
+        t_id = as.integer(c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7)),
+        token = c("This", "is", "an", "example", ".", "I", "really",
+                  "hate", "examples", ".", "How", "do", "you", "feel", "about", "examples", "?"))
# A tibble: 17 x 4
    d_id  s_id  t_id token   
   <int> <int> <int> <chr>   
 1     1     1     1 This    
 2     1     1     2 is      
 3     1     1     3 an      
 4     1     1     4 example 
 5     1     1     5 .       
 6     1     2     1 I       
 7     1     2     2 really  
 8     1     2     3 hate    
 9     1     2     4 examples
10     1     2     5 .       
11     2     1     1 How     
12     2     1     2 do      
13     2     1     3 you     
14     2     1     4 feel    
15     2     1     5 about   
16     2     1     6 examples
17     2     1     7 ?

What is the most efficient way to do this? Preferably using tidyverse functions.

Best answer

Time for some sequence work, which should be very efficient:

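# document index, repeated once per sentence in that document
d_id <- rep(seq_along(ex), lengths(ex))
# sentence index, restarting at 1 within each document
s_id <- sequence(lengths(ex))
# number of tokens in each sentence (flatten one level only)
t_id <- lengths(unlist(ex, recursive = FALSE))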

data.frame(
  d_id  = rep(d_id, t_id),
  s_id  = rep(s_id, t_id),
  t_id  = sequence(t_id),
  token = unlist(ex)
)

#   d_id s_id t_id    token
#1     1    1    1     This
#2     1    1    2       is
#3     1    1    3       an
#4     1    1    4  example
#5     1    1    5        .
#6     1    2    1        I
#7     1    2    2   really
#8     1    2    3     hate
#9     1    2    4 examples
#10    1    2    5        .
#11    2    1    1      How
#12    2    1    2       do
#13    2    1    3      you
#14    2    1    4     feel
#15    2    1    5    about
#16    2    1    6 examples
#17    2    1    7        ?
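
To make the sequence work easier to follow, here is a minimal walk-through of the intermediate vectors for the example list. The names `n_sent`, `sentences` and `n_tok` are introduced here for illustration only and are not part of the original answer.

```r
ex <- list(
  list(c("This", "is", "an", "example", "."),
       c("I", "really", "hate", "examples", ".")),
  list(c("How", "do", "you", "feel", "about", "examples", "?"))
)

# Sentences per document
n_sent <- lengths(ex)                        # 2 1

# One entry per sentence: which document it belongs to,
# and its position within that document
d_id <- rep(seq_along(ex), n_sent)           # 1 1 2
s_id <- sequence(n_sent)                     # 1 2 1

# Flatten one level so each element is a single sentence
sentences <- unlist(ex, recursive = FALSE)   # list of 3 character vectors

# Tokens per sentence
n_tok <- lengths(sentences)                  # 5 5 7

# Expanding the per-sentence ids by n_tok gives one entry per token
d_id_tok <- rep(d_id, n_tok)                 # 1 x 10 times, then 2 x 7 times
t_id_tok <- sequence(n_tok)                  # 1:5, 1:5, 1:7
```

Since the question asks for a tibble, the resulting data frame can be passed to `tibble::as_tibble()` if the tibble package is available.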

For a 500K sample of your ex list, this runs in about 2 seconds. I suspect that will be hard to beat in terms of efficiency.
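
A rough way to check the timing on your own machine is sketched below. It assumes that "500K sample" means 500,000 documents drawn with replacement from `ex`, which the answer does not spell out. `flatten_tokens` is a hypothetical helper that wraps the same steps in a function. Elapsed time will vary with hardware and R version.

```r
# Hypothetical helper wrapping the approach above in a function
flatten_tokens <- function(x) {
  n_sent <- lengths(x)                             # sentences per document
  n_tok  <- lengths(unlist(x, recursive = FALSE))  # tokens per sentence
  data.frame(
    d_id  = rep(rep(seq_along(x), n_sent), n_tok),
    s_id  = rep(sequence(n_sent), n_tok),
    t_id  = sequence(n_tok),
    token = unlist(x),
    stringsAsFactors = FALSE
  )
}

ex <- list(
  list(c("This", "is", "an", "example", "."),
       c("I", "really", "hate", "examples", ".")),
  list(c("How", "do", "you", "feel", "about", "examples", "?"))
)

# Assumption: build the large input by sampling documents with replacement
set.seed(1)
big <- ex[sample(length(ex), 5e5, replace = TRUE)]

# Time the conversion; res is assigned in the calling environment
timing <- system.time(res <- flatten_tokens(big))
print(timing[["elapsed"]])
```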

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/49911087/
