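Hi everyone, I have this problem: I want to calculate the number of days between dates, under the following conditions:
- Status A is the base date; every calculation is relative to it (grouped by ID)
- For statuses B, C, and D, I have to select the earliest date
- I have to calculate the number of days and show the results in separate columns
For example
R code to generate the table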
ColID = c(1, 1, 1, 1, 1, 2, 2, 2)
ColStatus = c("A", "B", "B", "C", "D", "A", "C", "C")
ColDate = c("01/01/2018","02/03/2018", "05/04/2018", "04/05/2018", "04/05/2018", "02/01/2018", "04/03/2018", "05/04/2018")
data.frame(ColID, ColStatus, ColDate)
How I do the calculation
For ColID = 1
Status A = 01/01/2018
Status B (I have to select the earlier one) = 02/03/2018
Status C = 04/05/2018
Status D = 04/05/2018
ResultColB = 02/03/2018 - 01/01/2018 = 60
ResultColC = 04/05/2018 - 01/01/2018 = 123
ResultColD = 04/05/2018 - 01/01/2018 = 123
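As a quick sanity check of the arithmetic above, here is a small Python sketch that uses only the standard library. The helper name `days_between` is made up for illustration, and it assumes the dates are written day/month/year, as in the sample data.

```python
from datetime import datetime


def days_between(start: str, end: str, fmt: str = "%d/%m/%Y") -> int:
    """Whole days from start to end; both strings are assumed to be day/month/year."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days


print(days_between("01/01/2018", "02/03/2018"))  # 60  (A -> earliest B, ID 1)
print(days_between("01/01/2018", "04/05/2018"))  # 123 (A -> C and A -> D, ID 1)
print(days_between("02/01/2018", "04/03/2018"))  # 61  (A -> earliest C, ID 2)
```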
Result table (in days)
R code to generate the table
ColID = c(1, 2)
ResultColStatusB = c(60, 0)
ResultColStatusC = c(123, 61)
ResultColStatusD = c(123, 0)
data.frame(ColID, ResultColStatusB, ResultColStatusC, ResultColStatusD)
This could be solved in R, Python, or SQL. Any suggestions on how to approach it?
Best answer
Here is a tidyverse solution in R:
library(lubridate)
library(tidyverse)
df %>%
  # earliest date for each ID and status
  group_by(ColID, ColStatus) %>%
  summarise(min_date = min(parse_date_time(ColDate, "%d/%m/%Y"))) %>%
  # days from status A to each of the other statuses
  group_by(ColID) %>%
  summarise(a_b = as.period(interval(min_date[ColStatus=="A"],
                                     min_date[ColStatus=="B"])) %/% days(1) - 1,
            a_c = as.period(interval(min_date[ColStatus=="A"],
                                     min_date[ColStatus=="C"])) %/% days(1) - 1,
            a_d = as.period(interval(min_date[ColStatus=="A"],
                                     min_date[ColStatus=="D"])) %/% days(1) - 1) %>%
  # replace NA with 0
  mutate_all(funs(if_else(is.na(.), 0, .)))
Output:
  ColID   a_b   a_c   a_d
  <dbl> <dbl> <dbl> <dbl>
1    1.   60.  123.  123.
2    2.    0.   61.    0.
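Explanation:
- Convert ColDate to a date type and get the earliest date for each ColStatus
- Calculate the interval (in days) between A and each of the other ColStatus values
- Convert NA values to 0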
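Since the question also allows Python, the three steps above can be sketched with the standard library alone. This is an illustrative sketch, not the answer's method: the function name `days_from_a` and the tuple layout of `rows` are assumptions, and it assumes every ID has a status A row.

```python
from datetime import datetime

# Same sample data as in the question: (ColID, ColStatus, ColDate as day/month/year)
rows = [
    (1, "A", "01/01/2018"), (1, "B", "02/03/2018"), (1, "B", "05/04/2018"),
    (1, "C", "04/05/2018"), (1, "D", "04/05/2018"),
    (2, "A", "02/01/2018"), (2, "C", "04/03/2018"), (2, "C", "05/04/2018"),
]


def days_from_a(rows, statuses=("B", "C", "D"), fmt="%d/%m/%Y"):
    """Days from each ID's status A date to its earliest B/C/D date (0 if absent)."""
    # Step 1: parse the dates and keep the earliest one per (ID, status)
    earliest = {}
    for col_id, status, text in rows:
        date = datetime.strptime(text, fmt).date()
        key = (col_id, status)
        if key not in earliest or date < earliest[key]:
            earliest[key] = date

    # Steps 2 and 3: subtract the A date; a missing status becomes 0
    result = {}
    for col_id in sorted({col_id for col_id, _ in earliest}):
        base = earliest[(col_id, "A")]  # assumption: every ID has an A row
        result[col_id] = {
            s: (earliest[(col_id, s)] - base).days if (col_id, s) in earliest else 0
            for s in statuses
        }
    return result


print(days_from_a(rows))
# {1: {'B': 60, 'C': 123, 'D': 123}, 2: {'B': 0, 'C': 61, 'D': 0}}
```

Using 0 for a missing status mirrors the expected result table, but it cannot be told apart from a status that falls on the same day as A; returning None instead would keep that distinction.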
Another approach in R, using data.table:
library(data.table)
setDT(df)
# convert the date strings to numeric day counts
df[, ColDate := as.numeric(as.Date(ColDate, format="%d/%m/%Y"))]
cols <- c("B","C","D")
# pivot the non-A rows into a wide table, keeping the earliest date per status
dcast(df[ColStatus!="A"], ColID ~ ColStatus, min, fill=NA_integer_, value.var="ColDate")[
  # join with the table containing only A and do the differencing
  df[ColStatus=="A"], on=.(ColID),
  (cols) := lapply(.SD, function(x) x - i.ColDate), .SDcols=cols]
Or using base R:
df$ColDate <- as.integer(as.Date(df$ColDate, format="%d/%m/%Y"))
cols <- c("B","C","D")
by(df, df$ColID, function(x) {
  aDate <- x$ColDate[x$ColStatus=="A"]
  vapply(cols,
         function(id) if(any(x$ColStatus==id)) min(x$ColDate[x$ColStatus==id]) - aDate
                      else NA_integer_,
         integer(1))
})
Data:
ColID = c(1, 1, 1, 1, 1, 2, 2, 2)
ColStatus = c("A", "B", "B", "C", "D", "A", "C", "C")
ColDate = c("01/01/2018","02/03/2018", "05/04/2018", "04/05/2018", "04/05/2018", "02/01/2018", "04/03/2018", "05/04/2018")
df <- data.frame(ColID, ColStatus, ColDate)
Regarding python - calculating results between columns based on different parameters in SQL, R, or Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51534069/