I have a data frame in which I need to create a new column based on the difference between two dates. Example:
Col1 Col2 Col3 Date New_Column_Required
A X A 01/01/2001 Wave1
B Y Q 01/01/2001 Wave1
C Z N 01/01/2001 Wave1
D W M 02/01/2001 Wave2
E Q V 02/01/2001 Wave2
F R O 03/01/2001 Wave3
G S T 03/01/2001 Wave3
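The 2nd date minus the 1st date should be Wave1, the 3rd date minus the 2nd date should be Wave2, and so on. The problem I'm facing is that several rows share the same date, and I can't figure out how to handle that.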
Best answer
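Using dplyr, we can convert Date to the Date class, arrange the rows by Date, and subtract the first value of Date from each date, adding 1 so that the earliest date becomes Wave1. Note that this counts days, so it gives consecutive wave numbers only when the dates are consecutive days; the commented "#OR" alternative numbers each distinct date in order regardless of gaps.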
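library(dplyr)
df %>%
  mutate(Date = lubridate::dmy(Date)) %>%   # parse dd/mm/yyyy strings into Date
  arrange(Date) %>%                          # sort rows chronologically
  mutate(new_col = paste0("Wave", Date - first(Date) + 1))  # days since earliest date, plus 1
#OR, to number each distinct date in order even when there are gaps between dates:
#mutate(new_col = paste0("Wave", as.integer(as.factor(Date))))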
# Col1 Col2 Col3 Date new_col
#1 A X A 2001-01-01 Wave1
#2 B Y Q 2001-01-01 Wave1
#3 C Z N 2001-01-01 Wave1
#4 D W M 2001-01-02 Wave2
#5 E Q V 2001-01-02 Wave2
#6 F R O 2001-01-03 Wave3
#7 G S T 2001-01-03 Wave3
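Because Date - first(Date) + 1 counts days, a group dated 10/01/2001 would be labelled Wave10 rather than Wave3. A minimal sketch of the distinct-date numbering using dplyr's dense_rank() instead, assuming the goal is one wave per distinct date as in the question's New_Column_Required column. The data is a hypothetical cut-down variant of the question's data with a gap added, and as.Date() is used in place of lubridate::dmy() only to avoid the extra dependency:

```r
library(dplyr)

# Hypothetical variant of the question's data (dd/mm/yyyy strings);
# the last date is moved to 10/01/2001 to create a gap between dates.
df <- data.frame(
  Col1 = c("A", "B", "C", "D", "E", "F", "G"),
  Date = c("01/01/2001", "01/01/2001", "01/01/2001",
           "02/01/2001", "02/01/2001", "10/01/2001", "10/01/2001"),
  stringsAsFactors = FALSE
)

out <- df %>%
  mutate(Date = as.Date(Date, format = "%d/%m/%Y")) %>%  # parse dd/mm/yyyy
  arrange(Date) %>%                                       # sort chronologically
  # dense_rank() gives 1 to the earliest distinct date, 2 to the next, and so
  # on, with no gaps, however many days lie between the dates
  mutate(new_col = paste0("Wave", dense_rank(Date)))

print(out)
```

This gives the same labels as the commented as.integer(as.factor(Date)) line, since the factor levels of a Date vector are its sorted distinct values.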
The same logic in base R:
df$Date <- as.Date(df$Date, "%d/%m/%Y")   # parse dd/mm/yyyy strings
df <- df[order(df$Date), ]                # sort rows by date
transform(df, new_col = paste0('Wave', Date - Date[1] + 1))  # days since first date, plus 1
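Since the question frames the waves in terms of differences between dates, here is another base R sketch (an alternative, not the answer's method) that uses diff() on the sorted dates directly: every non-zero difference marks the start of a new wave, and cumsum() turns those starts into wave numbers. Like dense ranking, it does not skip numbers when dates are several days apart. The data is a cut-down copy of the question's data:

```r
# Cut-down copy of the question's data, dates as dd/mm/yyyy strings
df <- data.frame(
  Col1 = c("A", "B", "C", "D", "E", "F", "G"),
  Date = c("01/01/2001", "01/01/2001", "01/01/2001",
           "02/01/2001", "02/01/2001", "03/01/2001", "03/01/2001"),
  stringsAsFactors = FALSE
)

df$Date <- as.Date(df$Date, format = "%d/%m/%Y")  # parse dd/mm/yyyy
df <- df[order(df$Date), ]                        # sort rows by date

# diff() gives the gap (in days) between each row and the previous one;
# a non-zero gap starts a new wave. The first row always starts wave 1,
# hence the leading TRUE.
new_wave <- c(TRUE, diff(as.numeric(df$Date)) != 0)
df$new_col <- paste0("Wave", cumsum(new_wave))

print(df)
```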
Data
df <- structure(list(Col1 = c("A", "B", "C", "D", "E", "F", "G"), Col2 = c("X",
"Y", "Z", "W", "Q", "R", "S"), Col3 = c("A", "Q", "N", "M", "V",
"O", "T"), Date = c("01/01/2001", "01/01/2001", "01/01/2001",
"02/01/2001", "02/01/2001", "03/01/2001", "03/01/2001")), row.names = c(NA,
-7L), class = "data.frame")
For more on r - creating a new column based on date differences, see the similar question on Stack Overflow: https://stackoverflow.com/questions/60091325/