r - How to calculate the median of an unsorted dataset

I have thousands of datasets like this one:

>student1
    quantities score
[1]          4    10         
[2]          1    12         
[3]         78     5         
[4]          6   294

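I want to calculate the median score for this student. Each score comes with a quantity, i.e. the number of times that score occurs. In this example I want it to return 5, because the median is one of the 78 fives.

I have looked at some posts here, such as how to calculate the median on grouped dataset?, but I could not use them because I have thousands of datasets.

I also tried installing the aroma.light and matrixStats packages, but I still can't get the "weighted.median function" to work. It tells me: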

Error: could not find function "weightedMedians"

OK, the above is just an example. My real dataset looks like this:

>test
     [,1]          [,2]
info    3            10
info    2            20
        4      86779637
        1        135777
        7          2342

But when I try to use

>rep(test[, 1], test[, 2])

I get

Error in rep(test[, 1], test[, 2]) : invalid 'times' argument
In addition: Warning message:
NAs introduced by coercion

What can I do now?

Best answer

You can use:

median(rep(student1$score, student1$quantities))

This is relatively fast (only a few seconds for a simulated dataset with 100k rows).
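For the real data in the question, where one count is 86779637, expanding every value with rep() allocates a very long vector. Below is a minimal sketch of an alternative that sorts the values once and walks the cumulative counts instead. The helper name median_from_counts and the test_chr matrix are hypothetical, introduced only for illustration. The sketch also assumes that the "NAs introduced by coercion" warning means the columns are stored as text, which the question does not confirm, so it converts both columns with as.numeric() and drops rows that fail to convert.

```r
# Example data from the question.
student1 <- data.frame(
  quantities = c(4, 1, 78, 6),
  score      = c(10, 12, 5, 294)
)

# Hypothetical helper (not from the original answer): median of a dataset in
# which values[i] occurs counts[i] times, without expanding it via rep().
# Like median(), it averages the two middle values when the total count is even.
median_from_counts <- function(values, counts) {
  values <- as.numeric(values)   # assumption: columns may arrive as text
  counts <- as.numeric(counts)
  keep   <- !is.na(values) & !is.na(counts) & counts > 0
  values <- values[keep]
  counts <- counts[keep]

  ord    <- order(values)        # the data are unsorted, so sort by value
  values <- values[ord]
  cum    <- cumsum(counts[ord])  # running total of observations
  n      <- cum[length(cum)]     # total number of observations

  # Positions of the lower and upper middle observations (equal when n is odd).
  lower <- values[which(cum >= floor((n + 1) / 2))[1]]
  upper <- values[which(cum >= ceiling((n + 1) / 2))[1]]
  (lower + upper) / 2
}

# Same result as the rep()-based one-liner on the small example.
median_from_counts(student1$score, student1$quantities)
median(rep(student1$score, student1$quantities))

# Hypothetical reconstruction of the real data as a character matrix,
# with values in column 1 and counts in column 2 as in the rep() call.
test_chr <- cbind(c("3", "2", "4", "1", "7"),
                  c("10", "20", "86779637", "135777", "2342"))
median_from_counts(test_chr[, 1], test_chr[, 2])
```

The design choice is to trade the memory cost of rep() for a sort over the distinct rows, so the work depends on the number of rows rather than on the sum of the counts. If the real matrix contains genuinely non-numeric entries, as.numeric() will still warn while turning them into NA, and those rows are then skipped.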

This question, r - How to calculate the median of an unsorted dataset, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/24649813/
