r - How to aggregate a data.frame when group names appear in separate rows

Tags: r dataframe aggregate

I have a data.frame like this:

df=data.frame(
grp=c("group1","s1","s2","s3","s4","s5","group2","s6","s7","s8","group2","s9","s10","group3","s11","s12","s13","s14"),
gname=c("gene1",0.00,0.05,0.01,0.01,0.01,"gene1",0.063,0.005,0.015,"gene2",0.07,0.00,"gene3",0.046,0.007,0.011,0.012),
score=c(0.989003844,NA,NA,NA,NA,NA,0.988334014,NA,NA,NA,0.983461712,NA,NA,0.982339339,NA,NA,NA,NA)
)

> df
      grp gname      score
1  group1 gene1 0.9890038
2      s1     0        NA
3      s2  0.05        NA
4      s3  0.01        NA
5      s4  0.01        NA
6      s5  0.01        NA
7  group2 gene1 0.9883340
8      s6 0.063        NA
9      s7 0.005        NA
10     s8 0.015        NA
11 group2 gene2 0.9834617
12     s9  0.07        NA
13    s10     0        NA
14 group3 gene3 0.9823393
15    s11 0.046        NA
16    s12 0.007        NA
17    s13 0.011        NA
18    s14 0.012        NA

Based on the group name and gene name, df can be divided into 4 parts. The figure below shows these 4 parts.

[Image: df divided into its 4 parts]

I want to aggregate df to find max(df$score) and length(df$grp) for each part, based on the columns df$grp and df$gname. The following df shows the expected result.
grp     gname   max.score   length
group1  gene1   0.989003844   5
group2  gene1   0.988334014   3
group2  gene2   0.983461712   2
group3  gene3   0.982339339   4

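The figure below shows how the result is obtained.

[Image: how the expected result is derived from each part]

How can I perform aggregate(score~grp+gname,df,max) and aggregate(grp~grp+gname,df,length) for each part, and save the result in a data.frame?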

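Best answer

If you know that each group starts with one non-missing score followed by missing values, then a combination of cumsum/is.na and tapply will do the trick.

First create an aggregation variable f.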
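Before relying on that assumption, it may be worth checking it. The following is a minimal sketch on a hypothetical toy data frame (`toy`, `is_header` and `assumption_holds` are illustrative names, not part of the original answer). It additionally assumes that header rows can be recognised by a `grp` value starting with "group", as in the question's data.

```r
# Hypothetical toy data with the same layout as the question's df
toy <- data.frame(
  grp   = c("group1", "s1", "s2", "group2", "s3"),
  score = c(0.9, NA, NA, 0.8, NA)
)

# Assumption: header rows are exactly the rows whose grp starts with "group"
is_header <- grepl("^group", toy$grp)

# The cumsum approach needs the first row to be a header, and the
# non-missing scores to sit on the header rows and nowhere else
assumption_holds <- is_header[1] && identical(is_header, !is.na(toy$score))
assumption_holds
# [1] TRUE
```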

f <- cumsum(!is.na(df$score))
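To see why this produces one id per part, here is a small sketch on a hypothetical toy vector (`x` and `f_toy` are illustrative names only). Each non-missing value increments the running count, so every header row starts a new id and the missing rows after it inherit that id.

```r
# Toy score column: two headers (0.9 and 0.8), each followed by NA rows
x <- c(0.9, NA, NA, 0.8, NA)

# TRUE counts as 1, so the running sum steps up at each non-missing value
f_toy <- cumsum(!is.na(x))
f_toy
# [1] 1 1 1 2 2

# Rows per id, minus the header row itself
as.vector(tapply(f_toy, f_toy, length)) - 1
# [1] 2 1
```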

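Now look at the resulting lengths. The top row of numbers is the value of the "names" attribute, and the lengths are in the bottom row. These lengths include the "group*" rows, so subtract 1 in the final data frame.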
tapply(f, f, length)
#1 2 3 4 
#6 4 3 5 

Create the result the question asks for.
result <- cbind(df[!is.na(df$score), ], length = tapply(f, f, length) - 1)

result
#      grp gname     score length
#1  group1 gene1 0.9890038      5
#7  group2 gene1 0.9883340      3
#11 group2 gene2 0.9834617      2
#14 group3 gene3 0.9823393      4

If you also want consecutive row names,
row.names(result) <- NULL
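Since the question asks specifically about aggregate, the same idea can also be written with aggregate and merge. This is only a sketch under the same assumption as above (one non-missing score at the start of each part). The names `d`, `section`, `is_sample`, `max_score`, `n_samples`, `headers` and `result2` are hypothetical and do not come from the original answer.

```r
df <- data.frame(
  grp = c("group1","s1","s2","s3","s4","s5","group2","s6","s7","s8",
          "group2","s9","s10","group3","s11","s12","s13","s14"),
  gname = c("gene1",0.00,0.05,0.01,0.01,0.01,"gene1",0.063,0.005,0.015,
            "gene2",0.07,0.00,"gene3",0.046,0.007,0.011,0.012),
  score = c(0.989003844,NA,NA,NA,NA,NA,0.988334014,NA,NA,NA,
            0.983461712,NA,NA,0.982339339,NA,NA,NA,NA)
)

d <- df                                   # work on a copy
d$section   <- cumsum(!is.na(d$score))    # one id per part
d$is_sample <- is.na(d$score)             # TRUE for the s* rows

# Max score per part; rows with NA score are dropped by the default na.action
max_score <- aggregate(score ~ section, data = d, FUN = max)
names(max_score)[2] <- "max.score"

# Number of sample rows per part (summing a logical counts the TRUEs)
n_samples <- aggregate(is_sample ~ section, data = d, FUN = sum)
names(n_samples)[2] <- "length"

# Attach both summaries to the header rows, which carry grp and gname
headers <- d[!d$is_sample, c("section", "grp", "gname")]
result2 <- merge(merge(headers, max_score, by = "section"),
                 n_samples, by = "section")
result2$section <- NULL

result2$length
# [1] 5 3 2 4
```

Because each part holds exactly one non-missing score, the maximum is simply the header row's score. The aggregate call only becomes necessary if a part could contain several scores.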

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/57019798/
