r - Subsample a data frame by time, keeping the top-of-second value

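I have a data frame whose timestamps include fractional seconds. There is more than one row per second, and I want to filter it down to one row per second. Specifically, I want to extract the value at or immediately before the top of each second.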

Here is a sample of the data:

 > head(sg1, 13)
                      time  count
1  2013-02-25 15:55:35.941      0
2  2013-02-25 15:55:36.042   8263
3  2013-02-25 15:55:36.144 147536
4  2013-02-25 15:55:36.243 165041
5  2013-02-25 15:55:36.342 126064
6  2013-02-25 15:55:36.441 100275
7  2013-02-25 15:55:36.542 101944
8  2013-02-25 15:55:36.647 108880
9  2013-02-25 15:55:36.742  86690
10 2013-02-25 15:55:36.842  74476
11 2013-02-25 15:55:36.941  76285
12 2013-02-25 15:55:37.042  79145
13 2013-02-25 15:55:37.141  84434

Of these, I want to select rows 1 and 11.

> dput(head(sg1, 13))
structure(list(time = structure(c(1361807735.942, 1361807736.042, 
1361807736.145, 1361807736.244, 1361807736.343, 1361807736.442, 
1361807736.542, 1361807736.647, 1361807736.742, 1361807736.842, 
1361807736.942, 1361807737.042, 1361807737.142), class = c("POSIXct", 
"POSIXt"), tzone = "GMT"), count = c(0L, 8263L, 147536L, 165041L, 
126064L, 100275L, 101944L, 108880L, 86690L, 74476L, 76285L, 79145L, 
84434L)), .Names = c("time", "count"), row.names = c(NA, 13L), class = "data.frame")

Best answer

The hard part is that you want

the values at or immediately before the top of each second.

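So rounding the times and taking the largest in each group does not quite work: a row that falls exactly on the top of a second would be put in the wrong group. Grouping by the ceiling of each timestamp instead handles that case correctly.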

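library("lubridate")
library("plyr")
# Group rows by the second each timestamp rounds up to, then keep the
# row closest to (at or just before) the top of that second.
ddply(sg1, .(ceiling_date(time, unit="second")), function(DF) {
  DF[which.max(DF$time - ceiling_date(DF$time)),]
})[,-1]  # drop the grouping column that ddply prepends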

which gives

                 time count
1 2013-02-25 15:55:35     0
2 2013-02-25 15:55:36 76285
3 2013-02-25 15:55:37 84434
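
Why the group key is a ceiling rather than a floor can be checked with a minimal sketch. It uses base R `floor()` and `ceiling()` on the numeric seconds as stand-ins for lubridate's rounding functions (an assumption made to keep the example free of dependencies), and the three timestamps and the name `stamps` are illustrative only.

```r
# Three timestamps: just before, exactly on, and just after 15:55:37 GMT.
stamps <- as.POSIXct(c(1361807736.942, 1361807737, 1361807737.042),
                     origin = "1970-01-01", tz = "GMT")
secs <- as.numeric(stamps)

# Flooring puts the exact-second row with the rows that FOLLOW it.
floor_key <- floor(secs)
# Taking the ceiling puts it with the rows that PRECEDE it,
# because a whole number is its own ceiling.
ceiling_key <- ceiling(secs)

floor_key - 1361807736    # 0 1 1
ceiling_key - 1361807736  # 1 1 2
```

With the ceiling key, the row at exactly 15:55:37 shares a group with the 15:55:36.942 row and is the latest row in it, which is the "at or immediately before" behaviour asked for.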

To show that this works for a timestamp that falls exactly on a whole second, add one to the data set.

sg2 <- rbind(sg1, 
structure(list(time=structure(1361807737, class=c("POSIXct", "POSIXt"), 
tzone="GMT"), count=c(34567L)), .Names = c("time", "count"), row.names=c(NA,1L),
class="data.frame"))
sg2 <- sg2[order(sg2$time),]

ddply(sg2, .(ceiling_date(time, unit="second")), function(DF) {
  DF[which.max(DF$time - ceiling_date(DF$time)),]
})[,-1]

The new row is now returned as the value for the "previous" second.

                 time count
1 2013-02-25 15:55:35     0
2 2013-02-25 15:55:37 34567
3 2013-02-25 15:55:37 84434
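
As a cross-check, the same selection can be sketched in base R without lubridate or plyr. This is an alternative formulation rather than the answer's method: it assumes the rows are sorted by time, rebuilds the 14-row data set from the `dput()` output so that it runs on its own, and uses the hypothetical names `key` and `result`.

```r
# Rebuild the example data, including the extra row at exactly 15:55:37.
sg2 <- data.frame(
  time = as.POSIXct(
    c(1361807735.942, 1361807736.042, 1361807736.145, 1361807736.244,
      1361807736.343, 1361807736.442, 1361807736.542, 1361807736.647,
      1361807736.742, 1361807736.842, 1361807736.942, 1361807737.042,
      1361807737.142, 1361807737),
    origin = "1970-01-01", tz = "GMT"),
  count = c(0L, 8263L, 147536L, 165041L, 126064L, 100275L, 101944L,
            108880L, 86690L, 74476L, 76285L, 79145L, 84434L, 34567L)
)
sg2 <- sg2[order(sg2$time), ]

# Group key: the whole second each timestamp rounds up to.
key <- ceiling(as.numeric(sg2$time))

# Rows are sorted by time, so the last row of each group is the one
# at or immediately before the top of that second.
result <- sg2[!duplicated(key, fromLast = TRUE), ]
result$count  # 0 34567 84434
```

The counts match the `ddply` output above. Note that the final row (84434) is only the latest value seen so far in a second that has not finished, so it may need to be dropped depending on the use case.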

For more on subsampling a data frame by time while keeping the top-of-second value, see the similar question on Stack Overflow: https://stackoverflow.com/questions/15349056/
