I have a data frame like this:
ClientVisitGUID LineNum TextCol
1 1 This was a great
1 2 report I did
2 3 was performed today
2 1 Another great report
2 2 for this person
3 2 good stuff
3 1 I really write very
3 3 when I put my
3 4 mind to it
I want to concatenate the rows within each ClientVisitGUID, ordered by LineNum, so that I get the following output:
ClientVisitGUID TextCol
1 This was a great report I did
2 Another great report for this person was performed today
3 I really write very good stuff when I put my mind to it
I tried dplyr, but it takes a long time and cannot cope with the thousands of rows I have:
resultset2 <- resultset %>%
  group_by(ClientVisitGUID) %>%
  arrange(LineNum) %>%
  summarize_all(paste, collapse = ",")
Is there a faster way? I am not very familiar with data.table, but would it be fast?
Best answer
A second data.table option, which also uses stringi to improve performance:
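library(data.table)
library(stringi)
# Convert df to a data.table by reference (no copy)
setDT(df)
# Sort by visit, then by line number, so each group's text is in order
setkey(df, ClientVisitGUID, LineNum)
# Join each visit's lines into a single space-separated string
df1 <- df[, .(new = stri_c(TextCol, collapse = " ")), by = ClientVisitGUID]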
Result
df1
# ClientVisitGUID new
#1: 1 This was a great report I did
#2: 2 Another great report for this person was performed today
#3: 3 I really write very good stuff when I put my mind to it
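If stringi is not available, the same grouping can be sketched with data.table alone. This is a minimal variant, not the answer's code: it swaps `stri_c` for base `paste`, sorts with `order(LineNum)` in `i` instead of `setkey`, and uses `keyby` so the groups come back sorted. The object names `dt` and `res` are chosen here for illustration, and no claim is made about its speed relative to the stringi version.

```r
library(data.table)

# Sample data mirroring the question (dt is an illustrative name)
dt <- data.table(
  ClientVisitGUID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  LineNum = c(1L, 2L, 3L, 1L, 2L, 2L, 1L, 3L, 4L),
  TextCol = c("This was a great", "report I did", "was performed today",
              "Another great report", "for this person", "good stuff",
              "I really write very", "when I put my", "mind to it")
)

# Order rows by LineNum first, then collapse each visit's text;
# keyby returns the groups sorted by ClientVisitGUID
res <- dt[order(LineNum),
          .(TextCol = paste(TextCol, collapse = " ")),
          keyby = ClientVisitGUID]
print(res)
```

Unlike `setkey`, this form leaves the row order of the input table unchanged.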
Data (thanks to @ThomasIsCoding)
df <- structure(list(ClientVisitGUID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 3L), LineNum = c(1L, 2L, 3L, 1L, 2L, 2L, 1L, 3L, 4L), TextCol = c("This was a great",
"report I did", "was performed today", "Another great report",
"for this person", "good stuff", "I really write very", "when I put my",
"mind to it")), class = "data.frame", row.names = c(NA, -9L))
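For comparison, the dplyr attempt in the question does not produce the requested output as written: `summarize_all` applies `paste` to every non-grouping column, LineNum included, and the separator is "," rather than a space. A minimal corrected sketch, assuming dplyr is installed, collapses only TextCol. It is shown to make the intended result explicit, not as a faster alternative.

```r
library(dplyr)

# Same sample data as above, rebuilt so the block runs on its own
df <- data.frame(
  ClientVisitGUID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  LineNum = c(1L, 2L, 3L, 1L, 2L, 2L, 1L, 3L, 4L),
  TextCol = c("This was a great", "report I did", "was performed today",
              "Another great report", "for this person", "good stuff",
              "I really write very", "when I put my", "mind to it"),
  stringsAsFactors = FALSE
)

res <- df %>%
  arrange(ClientVisitGUID, LineNum) %>%   # put lines in order within each visit
  group_by(ClientVisitGUID) %>%
  summarise(TextCol = paste(TextCol, collapse = " "))  # collapse text only
print(res)
```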
Regarding "r - how to concatenate rows by group as quickly as possible", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61463160/