如何在 R 数据表中按组返回第二个最大值,同时返回 id 列?
下面的可重现示例:
library(data.table)
sample_dt <- data.table(
myid=c("A","B","B","A","B","B","C","C","A","A"),
mycol1 =c(101,-108,140,140,-150,-140,-111,101,150,140)
)
#desired output
desired_dt <- data.table(
myid=c("A","B","C"),
mycol1_secondmax=c(140,-108,-111)
)
最佳答案
可能的解决方案:
sample_dt[order(-mycol1),.SD[2],by=myid]
myid mycol1
<char> <num>
1: A 140
2: B -108
3: C -111
编辑:与 frank
和其他人的比较
sample_dt <- data.table(
myid=sample(LETTERS,10^4,replace=T),
mycol1 =sample(-100:100,10^4,replace=T)
)
microbenchmark::microbenchmark(
sample_dt[order(-mycol1),.SD[2],by=myid],
sample_dt[, .SD[frank(-mycol1, ties.method = "first") == 2L,], by = .(myid)],
sample_dt[, .SD[order(-mycol1) == 2], myid]
)
Unit: milliseconds
expr min lq mean median uq max neval
sample_dt[order(-mycol1), .SD[2], by = myid] 1.450001 1.850001 3.139760 3.145701 3.899852 13.6538 100
sample_dt[, .SD[frank(-mycol1, ties.method = "first") == 2L, ], by = .(myid)] 6.816001 7.199701 8.289604 7.638501 8.390902 22.0289 100
sample_dt[, .SD[order(-mycol1) == 2], myid] 4.503801 4.824001 5.606895 5.229701 5.872300 15.9330 100
关于从 R 中的 data.table 按组返回第二个最大值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/74411512/