r - In R, how can I select rows based on a statistic of a column attribute?

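My table has several thousand rows (classified into 400 classes) and about a dozen columns.

The ideal result would be a table of 400 rows (one row per class), selected by the maximum value of column "z" within each class, and containing all of the original columns.

Here is a sample of my data. Using R, I need to extract only rows 2, 4, 7, and 8 of this example (marked with *).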

     x           y         z    cluster 
1  712521.75  3637426.49  19.46   12 
2  712520.69  3637426.47  19.66   12  *
3  712518.88  3637426.63  17.37   225
4  712518.4   3637426.48  19.42   225 *
5  712517.11  3637426.51  18.81   225
6  712515.7   3637426.58  17.8    17 
7  712514.68  3637426.55  18.16   17  *
8  712513.58  3637426.55  18.23   50  *
9  712512.1   3637426.62  17.24   50
10 712513.93  3637426.88  18.08   50

I have tried many different combinations, including:

  tapply(data$z, data$cluster, max)       # returns only the max value and cluster columns
  which.max(data$z)         # returns only the index of the max value in the entire table

I have also read through the plyr package, but did not find a solution.

Best answer

A very simple approach is to use aggregate and merge:

> merge(aggregate(z ~ cluster, mydf, max), mydf)
  cluster     z        x       y
1      12 19.66 712520.7 3637426
2      17 18.16 712514.7 3637427
3     225 19.42 712518.4 3637426
4      50 18.23 712513.6 3637427
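
To make the one-liner above easier to follow, here is a minimal sketch that splits it into two steps and spells out the join columns. The data frame name mydf follows the answer; the intermediate names best and top_rows are made up for illustration, and the sample is read in with read.table so that cluster is stored as an integer, which may order the rows differently from the output shown above.

```r
# Sample rows from the question, read into a data frame named mydf as in the answer.
mydf <- read.table(header = TRUE, text = "
x          y           z      cluster
712521.75  3637426.49  19.46  12
712520.69  3637426.47  19.66  12
712518.88  3637426.63  17.37  225
712518.4   3637426.48  19.42  225
712517.11  3637426.51  18.81  225
712515.7   3637426.58  17.8   17
712514.68  3637426.55  18.16  17
712513.58  3637426.55  18.23  50
712512.1   3637426.62  17.24  50
712513.93  3637426.88  18.08  50
")

# Step 1: one row per cluster holding that cluster's maximum z.
best <- aggregate(z ~ cluster, data = mydf, FUN = max)

# Step 2: join back on both shared columns to recover x and y.
# Without `by`, merge() uses all column names the two inputs have in common,
# which here is the same pair: cluster and z.
top_rows <- merge(best, mydf, by = c("cluster", "z"))
print(top_rows)
```

If two rows of the same cluster share the maximum z, this join returns both of them, so the result can have more than one row for that cluster.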

You can even use the output of your tapply code to get what you need. Just put it into a data.frame instead of a named vector.

> merge(mydf, data.frame(z = with(mydf, tapply(z, cluster, max))))
      z        x       y cluster
1 18.16 712514.7 3637427      17
2 18.23 712513.6 3637427      50
3 19.42 712518.4 3637426     225
4 19.66 712520.7 3637426      12
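
Note that the tapply version above joins on z alone, so a row from another cluster that happens to have the same z as some cluster's maximum would also be returned. A possible alternative, not part of the original answer, is base R's ave(), which repeats each cluster's maximum alongside every row so the comparison stays within the cluster. The sketch below assumes the same sample data; the name top_rows is illustrative.

```r
# Sample rows from the question, read into a data frame named mydf as in the answer.
mydf <- read.table(header = TRUE, text = "
x          y           z      cluster
712521.75  3637426.49  19.46  12
712520.69  3637426.47  19.66  12
712518.88  3637426.63  17.37  225
712518.4   3637426.48  19.42  225
712517.11  3637426.51  18.81  225
712515.7   3637426.58  17.8   17
712514.68  3637426.55  18.16  17
712513.58  3637426.55  18.23  50
712512.1   3637426.62  17.24  50
712513.93  3637426.88  18.08  50
")

# ave() returns a vector as long as z, holding the maximum z of each row's cluster.
cluster_max <- ave(mydf$z, mydf$cluster, FUN = max)

# Keep the rows whose z equals their own cluster's maximum.
# Original row order and row names are preserved; ties are all kept.
top_rows <- mydf[mydf$z == cluster_max, ]
print(top_rows)
```

On the sample data this keeps rows 2, 4, 7, and 8, in their original order and with all original columns.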

For more options, see the answers in this question.
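
One further option, offered here as a sketch rather than taken from the linked answers, builds on the which.max attempt in the question: apply which.max within each cluster instead of over the whole table. The names pieces and top_rows are illustrative.

```r
# Sample rows from the question, read into a data frame named mydf as in the answer.
mydf <- read.table(header = TRUE, text = "
x          y           z      cluster
712521.75  3637426.49  19.46  12
712520.69  3637426.47  19.66  12
712518.88  3637426.63  17.37  225
712518.4   3637426.48  19.42  225
712517.11  3637426.51  18.81  225
712515.7   3637426.58  17.8   17
712514.68  3637426.55  18.16  17
712513.58  3637426.55  18.23  50
712512.1   3637426.62  17.24  50
712513.93  3637426.88  18.08  50
")

# Split into one data frame per cluster.
pieces <- split(mydf, mydf$cluster)

# In each piece keep the row with the largest z, then stack the pieces again.
# which.max() returns the first maximum, so exactly one row per cluster survives.
top_rows <- do.call(rbind, lapply(pieces, function(d) d[which.max(d$z), ]))
print(top_rows)
```

Unlike the merge-based approaches, this always yields exactly one row per cluster, because ties are resolved in favor of the first matching row.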

Regarding "r - In R, how can I select rows based on a statistic of a column attribute?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16275640/
