r - Concatenate values by group in descending order

This question already has answers here:

Collapse / concatenate / aggregate a column to a single comma separated string within each group

(5 answers)

Closed 4 years ago.

I have a data set A that looks like this:

author_id paper_id prob
   731    24943    1
   731    24943    1
   731   688974    1
   731   964345    .8
   731  1201905    .9
   731  1267992    1
   736    249      .2
   736   6889      1
   736   94345    .7
   736  1201905    .9
   736  126992    .8

The output I want is:

author_id    paper_id
  731        24943,24943,688974,1267992,1201905,964345
  736        6889,1201905,126992,94345,249

That is, within each author_id the paper_id values are ordered by decreasing prob.

If I combined SQL and R, I think the solution would be

statement<-"select * from A 
            GROUP BY author_id
            ORDER BY prob"

and then use paste in R once the paper_id order has been set.

But I need a complete solution in R. How can this be done?

Thanks

Best answer

If temp is your data set, you can do

library(data.table)
setDT(temp)[order(-prob), list(paper_id = paste0(paper_id, collapse=", ")), by = author_id]
##    author_id                                       paper_id
## 1:       731 24943, 24943, 688974, 1267992, 1201905, 964345
## 2:       736              6889, 1201905, 126992, 94345, 249

Edit: August 11, 2014

As of data.table v1.9.4, you can use the very efficient setorder instead of order

str(temp)
setorder(setDT(temp), -prob)[, list(paper_id = paste0(paper_id, collapse=", ")), by = author_id]
##    author_id                                       paper_id
## 1:       731 24943, 24943, 688974, 1267992, 1201905, 964345
## 2:       736              6889, 1201905, 126992, 94345, 249
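
One thing to keep in mind is that setorder (like setDT) works by reference, so the call above reorders temp itself. Below is a minimal sketch of how to keep the original row order, assuming temp is a plain data.frame built from the question's table; the names dt and res are only illustrative and are not part of the original answer.

```r
library(data.table)

# Sample data transcribed from the question's table.
temp <- data.frame(
  author_id = c(731L, 731L, 731L, 731L, 731L, 731L, 736L, 736L, 736L, 736L, 736L),
  paper_id  = c(24943L, 24943L, 688974L, 964345L, 1201905L, 1267992L,
                249L, 6889L, 94345L, 1201905L, 126992L),
  prob      = c(1, 1, 1, 0.8, 0.9, 1, 0.2, 1, 0.7, 0.9, 0.8)
)

# as.data.table() makes a copy, so sorting dt leaves temp untouched.
dt <- as.data.table(temp)

# Sort dt in place by decreasing prob.
setorder(dt, -prob)

# Collapse paper_id into one string per author_id.
res <- dt[, list(paper_id = paste0(paper_id, collapse = ", ")), by = author_id]
print(res)
```

If modifying temp in place is acceptable, the one-liner above is shorter and avoids the extra copy.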

By the way, the whole thing can also be done easily in base R (though this is not recommended for large data sets)

aggregate(paper_id ~ author_id, temp[order(-temp$prob), ], paste, collapse = ", ")
#   author_id                                       paper_id
# 1       731 24943, 24943, 688974, 1267992, 1201905, 964345
# 2       736              6889, 1201905, 126992, 94345, 249
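
For anyone who wants to run the base R version end to end, here is a self-contained sketch. It assumes the data is entered exactly as in the question's table; the intermediate names sorted and result are only illustrative.

```r
# Sample data transcribed from the question (named temp, as in the answer).
temp <- read.table(header = TRUE, text = "
author_id paper_id prob
731    24943    1
731    24943    1
731   688974    1
731   964345    .8
731  1201905    .9
731  1267992    1
736    249      .2
736   6889      1
736   94345    .7
736  1201905    .9
736  126992    .8
")

# Sort rows by decreasing prob; order() leaves ties in their original order.
sorted <- temp[order(-temp$prob), ]

# Collapse paper_id into one comma-separated string per author_id.
result <- aggregate(paper_id ~ author_id, data = sorted, FUN = paste, collapse = ", ")
print(result)
```

Because ties are left in their original order, the four papers with prob 1 for author 731 appear in the same order as in the input.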

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/22685896/
