r - Number of observations per day in R

Tags: r

I am working with a data frame that looks like this:

date<-c("2012-02-01", "2012-02-01", "2012-02-03", "2012-02-04", "2012-02-04", "2012-02-05", "2012-02-09", "2012-02-12", "2012-02-12")
var<-c("a","b","c","d","e","f","g","h","i")
df1<-data.frame(date,var)

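I would like to create a second data frame that tabulates the number of observations I have per day. In that data frame, dates that do not appear in the data should get a count of zero, giving a result like this: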
date<-c("2012-02-01","2012-02-02","2012-02-03","2012-02-04","2012-02-05","2012-02-06","2012-02-07","2012-02-08","2012-02-09","2012-02-10","2012-02-11","2012-02-12")
num<-c(2,0,1,2,1,0,0,0,1,0,0,2)
df2<-data.frame(date,num)

I have tried many things with the aggregate function, but I cannot figure out how to include the dates with no observations (the zeros).

Best answer

Here is one approach using data.table:

library(data.table)
DF1 <- as.data.table(df1)
# coerce date to a date object
DF1[, date := as.IDate(as.character(date), format = '%Y-%m-%d')]
# setkey for joining
setkey(DF1, date)

# join DF1 against a data.table containing a sequence
# from the minimum date to the maximum date
# nomatch = NA keeps the dates that have no match in DF1
# by = .EACHI evaluates .N once per row of the joined sequence
# (needed in data.table >= 1.9.4); .N is the number of matching rows,
# which is 0 when there are no matches
DF2 <- DF1[J(DF1[, seq(min(date), max(date), by = 1)]), .N, by = .EACHI, nomatch = NA]
DF2

          date N
 1: 2012-02-01 2
 2: 2012-02-02 0
 3: 2012-02-03 1
 4: 2012-02-04 2
 5: 2012-02-05 1
 6: 2012-02-06 0
 7: 2012-02-07 0
 8: 2012-02-08 0
 9: 2012-02-09 1
10: 2012-02-10 0
11: 2012-02-11 0
12: 2012-02-12 2
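
For comparison, the same join idea can be sketched in base R, since the question mentions aggregate. This is only a sketch under a few assumptions: the dates are stored as Date objects, and the names counts and all_days are illustrative and do not come from the original answer. It counts rows per observed date, left-joins those counts onto the full sequence of days, and replaces the resulting NA values with 0.

```r
# Rebuild the example data, storing the dates as Date objects (assumption)
df1 <- data.frame(
  date = as.Date(c("2012-02-01", "2012-02-01", "2012-02-03", "2012-02-04",
                   "2012-02-04", "2012-02-05", "2012-02-09", "2012-02-12",
                   "2012-02-12")),
  var = c("a", "b", "c", "d", "e", "f", "g", "h", "i")
)

# Count the rows for each date that actually occurs in the data
counts <- aggregate(var ~ date, data = df1, FUN = length)
names(counts)[2] <- "num"

# One row per calendar day between the first and last observation
all_days <- data.frame(date = seq(min(df1$date), max(df1$date), by = "day"))

# Left join: keep every day, then turn the missing counts into zeros
df2 <- merge(all_days, counts, by = "date", all.x = TRUE)
df2$num[is.na(df2$num)] <- 0
df2
```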

An approach using reshape2::dcast

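If you make sure your date column has a level for every day you want to tabulate:
library(reshape2)
df1$date <- with(df1, factor(date, levels = as.character(seq(min(as.Date(as.character(date))), max(as.Date(as.character(date))), by = 1 ))))

# count the rows per level; drop = FALSE keeps the days with no observations
df2 <- dcast(df1, date ~ ., fun.aggregate = length, value.var = "var", drop = FALSE)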
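
The idea of giving the date column a level for every day should also work without reshape2, because base R's table() reports a count for every factor level, including unused ones. The following is a minimal sketch under that assumption; the names dates and all_days are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  date = c("2012-02-01", "2012-02-01", "2012-02-03", "2012-02-04",
           "2012-02-04", "2012-02-05", "2012-02-09", "2012-02-12",
           "2012-02-12"),
  var = c("a", "b", "c", "d", "e", "f", "g", "h", "i")
)

# as.character() makes this work whether date is a character or a factor column
dates <- as.Date(as.character(df1$date))

# Every calendar day between the first and last observation
all_days <- seq(min(dates), max(dates), by = "day")

# A factor with one level per day; table() then counts unused levels as 0
counts <- table(factor(as.character(dates), levels = as.character(all_days)))

df2 <- data.frame(date = all_days, num = as.vector(counts))
df2
```

Here df2 has one row per day, with num holding the number of observations on that day.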

Regarding "r - Number of observations per day in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13131732/
