r - Data imputation by linear interpolation based on dates in R

Tags: r date interpolation imputation

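I have a large dataset of mineral nitrogen (Nmin) values from different plots. It contains missing values for some dates on which we could not sample. Soil mineral nitrogen is known to change linearly between samplings.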

To keep things simple, I created a data frame with 10 plots and 4 sampling dates (unevenly spaced), where the data for one of the dates are missing:

df <- data.frame(plot= c(1,2,3,4,5,6,7,8,9,10),
                  date = c("2020-10-01", "2020-10-01","2020-10-01","2020-10-01","2020-10-01","2020-10-01","2020-10-01","2020-10-01","2020-10-01","2020-10-01",
                           "2020-10-08", "2020-10-08","2020-10-08","2020-10-08","2020-10-08","2020-10-08","2020-10-08","2020-10-08","2020-10-08","2020-10-08",
                           "2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29",
                           "2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05"),
                  Nmin = c(100, 120,  50,  60,  70,  80, 100,  70,  30,  50,  90, 130,  60,  60,  60,  90, 105,  60,  25,  40,  NA,  NA,  NA,  NA,  NA,  NA,  
                           NA,  NA,  NA,  NA, 50, 170, 100, 60,  20, 130, 125,  20,   5,   0))
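# The dates are ISO formatted (YYYY-MM-DD), so the format string has to match;
# with "%d.%m.%Y" every date would be parsed as NA
df$date <- as.Date(df$date, format="%Y-%m-%d")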
df$Nmin <- as.numeric(df$Nmin)

Is there a function that can compute the missing Nmin values per plot while taking the time (dates) between samplings into account?

Best answer

Use approx, applied per plot:

df <- transform(df, flag=ifelse(is.na(Nmin), 1, 0))  ## set flag for sake of identification

res <- by(df, df$plot, transform, Nmin=approx(date, Nmin, date)$y) |> unsplit(df$plot)
res
#    plot       date Nmin flag
# 1     1 2020-10-01  100    0
# 2     2 2020-10-01  120    0
# 3     3 2020-10-01   50    0
# 4     4 2020-10-01   60    0
# 5     5 2020-10-01   70    0
# 6     6 2020-10-01   80    0
# 7     7 2020-10-01  100    0
# 8     8 2020-10-01   70    0
# 9     9 2020-10-01   30    0
# 10   10 2020-10-01   50    0
# 11    1 2020-10-08   90    0
# 12    2 2020-10-08  130    0
# 13    3 2020-10-08   60    0
# 14    4 2020-10-08   60    0
# 15    5 2020-10-08   60    0
# 16    6 2020-10-08   90    0
# 17    7 2020-10-08  105    0
# 18    8 2020-10-08   60    0
# 19    9 2020-10-08   25    0
# 20   10 2020-10-08   40    0
# 21    1 2020-10-29   60    1
# 22    2 2020-10-29  160    1
# 23    3 2020-10-29   90    1
# 24    4 2020-10-29   60    1
# 25    5 2020-10-29   30    1
# 26    6 2020-10-29  120    1
# 27    7 2020-10-29  120    1
# 28    8 2020-10-29   30    1
# 29    9 2020-10-29   10    1
# 30   10 2020-10-29   10    1
# 31    1 2020-11-05   50    0
# 32    2 2020-11-05  170    0
# 33    3 2020-11-05  100    0
# 34    4 2020-11-05   60    0
# 35    5 2020-11-05   20    0
# 36    6 2020-11-05  130    0
# 37    7 2020-11-05  125    0
# 38    8 2020-11-05   20    0
# 39    9 2020-11-05    5    0
# 40   10 2020-11-05    0    0
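
To see what `approx` does within a single plot, here is a minimal sketch that reproduces the interpolated value for plot 1 by hand. The names `d`, `n`, `w`, `by_approx` and `by_hand` are illustrative and not part of the original answer.

```r
# Sampling dates and Nmin values for plot 1; the third sampling is missing
d <- as.Date(c("2020-10-01", "2020-10-08", "2020-10-29", "2020-11-05"))
n <- c(100, 90, NA, 50)

# approx() drops the incomplete (date, NA) pair and interpolates linearly
# on the numeric value of the dates (days since 1970-01-01)
by_approx <- approx(d, n, xout = d)$y

# Same value by hand: weight by the days elapsed between the two
# neighbouring samplings (2020-10-08 and 2020-11-05)
w <- as.numeric(d[3] - d[2]) / as.numeric(d[4] - d[2])  # 21 / 28
by_hand <- n[2] + w * (n[4] - n[2])                     # 90 + 0.75 * (50 - 90)

by_approx
by_hand
```

Because the weight comes from the actual number of days, the uneven spacing of the samplings is taken into account: the missing date lies three quarters of the way between its neighbours, so the imputed value is three quarters of the way from 90 to 50.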

Let's take a look at the plot.

clr <- rainbow(10)
with(res, plot(Nmin ~ date, type='n'))
by(res, res$plot, with, points(jitter(Nmin) ~ date, type='b', pch=ifelse(flag == 1, 21, 16), col=clr[plot], bg='white'))
legend('topleft', legend=paste('plot', 1:10), lty=1, col=clr, ncol=4, bty='n', cex=.7)


Note: For non-linear interpolation/extrapolation, see this answer.
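
On the extrapolation side, it may help to know how `approx` itself behaves when the missing sampling is the first or last one for a plot. The sketch below uses a hypothetical variant of plot 1 in which the first sampling is missing (these values are not from the question). With the default `rule = 1`, points outside the observed range stay `NA`; with `rule = 2`, the nearest observed value is carried outward, which is constant filling rather than true linear extrapolation.

```r
# Hypothetical case: the FIRST sampling of a plot is missing
d <- as.Date(c("2020-10-01", "2020-10-08", "2020-10-29", "2020-11-05"))
n <- c(NA, 90, 60, 50)

# rule = 1 (default): dates outside the range of observed samplings give NA
edge_default <- approx(d, n, xout = d)$y

# rule = 2: the value at the closest observed date is used instead
edge_nearest <- approx(d, n, xout = d, rule = 2)$y

edge_default
edge_nearest
```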


Data:

df <- structure(list(plot = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 
3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 
3, 4, 5, 6, 7, 8, 9, 10), date = structure(c(18536, 18536, 18536, 
18536, 18536, 18536, 18536, 18536, 18536, 18536, 18543, 18543, 
18543, 18543, 18543, 18543, 18543, 18543, 18543, 18543, 18564, 
18564, 18564, 18564, 18564, 18564, 18564, 18564, 18564, 18564, 
18571, 18571, 18571, 18571, 18571, 18571, 18571, 18571, 18571, 
18571), class = "Date"), Nmin = c(100, 120, 50, 60, 70, 80, 100, 
70, 30, 50, 90, 130, 60, 60, 60, 90, 105, 60, 25, 40, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, 50, 170, 100, 60, 20, 130, 125, 
20, 5, 0), flag = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA, -40L
))

A similar question on this topic, "r - Data imputation by linear interpolation based on dates in R", can be found on Stack Overflow: https://stackoverflow.com/questions/72964018/
