r - How to concatenate rows by group as fast as possible

Tags: r performance dplyr data.table concatenation

I have a data frame like this:

ClientVisitGUID LineNum TextCol
1                 1      This was a great
1                 2      report I did
2                 3      was performed today
2                 1      Another great report
2                 2      for this person
3                 2      good stuff
3                 1      I really write very
3                 3      when I put my
3                 4      mind to it

I want to concatenate the rows within each ClientVisitGUID, ordered by LineNum, to get the following output:

ClientVisitGUID TextCol
1               This was a great report I did
2               Another great report for this person was performed today
3               I really write very good stuff when I put my mind to it

I tried dplyr, but it takes a very long time and cannot cope with the thousands of rows I have:

  resultset2<-resultset %>%
    group_by(ClientVisitGUID) %>%
    arrange(LineNum) %>%
    summarize_all(paste, collapse=",")

Is there a faster way? I am not very familiar with data.table, but would it be fast here?

Best answer

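Another data.table option, which also uses stringi for better performance. setkey() sorts the table by ClientVisitGUID and then LineNum, so the lines of each group are pasted together in the right order: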

library(data.table)
library(stringi)
setDT(df)
setkey(df, ClientVisitGUID, LineNum)
df1 <- df[, .(new = stri_c(TextCol, collapse = " ")), by = ClientVisitGUID]

Result

df1
#   ClientVisitGUID                                                      new
#1:               1                            This was a great report I did
#2:               2 Another great report for this person was performed today
#3:               3  I really write very good stuff when I put my mind to it
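
For comparison, here is a minimal dplyr sketch of the same result. It is not part of the original answer and makes no speed claim. It differs from the attempt in the question in two ways: it pastes only TextCol (summarize_all would also paste LineNum), and it collapses with a space instead of a comma. The object name `res` is just an illustrative choice, and the sample data is rebuilt so the block runs on its own.

```r
library(dplyr)

# Sample data from this answer, rebuilt so the block is self-contained
df <- data.frame(
  ClientVisitGUID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  LineNum = c(1L, 2L, 3L, 1L, 2L, 2L, 1L, 3L, 4L),
  TextCol = c("This was a great", "report I did", "was performed today",
              "Another great report", "for this person", "good stuff",
              "I really write very", "when I put my", "mind to it"),
  stringsAsFactors = FALSE
)

res <- df %>%
  arrange(ClientVisitGUID, LineNum) %>%   # put the lines of each visit in order
  group_by(ClientVisitGUID) %>%
  summarise(TextCol = paste(TextCol, collapse = " "))  # one string per visit

res$TextCol[1]
# "This was a great report I did"
```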

Data (thanks to @ThomasIsCoding)

df <- structure(list(ClientVisitGUID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L), LineNum = c(1L, 2L, 3L, 1L, 2L, 2L, 1L, 3L, 4L), TextCol = c("This was a great", 
"report I did", "was performed today", "Another great report", 
"for this person", "good stuff", "I really write very", "when I put my", 
"mind to it")), class = "data.frame", row.names = c(NA, -9L))
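
If neither data.table nor stringi is installed, the expected output can be cross-checked with base R alone. This is only a sketch for verifying results on small data, not a suggestion that it is faster; the names `ord` and `res_base` are illustrative.

```r
# Same sample data as above, repeated so the block runs on its own
df <- structure(list(
  ClientVisitGUID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  LineNum = c(1L, 2L, 3L, 1L, 2L, 2L, 1L, 3L, 4L),
  TextCol = c("This was a great", "report I did", "was performed today",
              "Another great report", "for this person", "good stuff",
              "I really write very", "when I put my", "mind to it")),
  class = "data.frame", row.names = c(NA, -9L))

# Sort by group and line number first, then paste the text within each group
ord <- df[order(df$ClientVisitGUID, df$LineNum), ]
res_base <- aggregate(TextCol ~ ClientVisitGUID, data = ord,
                      FUN = paste, collapse = " ")

res_base$TextCol[3]
# "I really write very good stuff when I put my mind to it"
```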

Regarding "r - How to concatenate rows by group as fast as possible", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61463160/
