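r - How to replace NaN values with the previous non-NaN value within a group

Tags: r dplyr

I need to replace NaN values with the previous non-NaN value in the same group.

Here is an example: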

+-------+------------+-------+
| ts_id |    date    | value |
+-------+------------+-------+
|     2 | 01/10/2014 | 18    |
|     2 | 01/11/2014 | 15    |
|     2 | 01/12/2014 | NaN   |
|     2 | 01/01/2015 | NaN   |
|     2 | 01/02/2015 | NaN   |
|     3 | 01/03/2015 | 19    |
|     3 | 01/04/2015 | 20    |
|     3 | 01/10/2015 | 12    |
|     3 | 01/11/2015 | 17    |
|     3 | 01/12/2015 | NaN   |
|     3 | 01/01/2016 | NaN   |
|     3 | 01/08/2016 | 7     |
|     3 | 01/09/2016 | NaN   |
|     3 | 01/10/2016 | NaN   |
|     3 | 01/11/2016 | NaN   |
|     3 | 01/12/2016 | NaN   |
|     3 | 01/01/2017 | NaN   |
+-------+------------+-------+

Data:

data <- structure(
  list(
    ts_id = c(2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
    date = structure(c(16344, 16375, 16405, 16436, 16467, 16495, 16526, 16709, 16740,
                       16770, 16801, 17014, 17045, 17075, 17106, 17136, 17167),
                     class = "Date"),
    value = c(18, 15, NaN, NaN, NaN, 19, 20, 12, 17, NaN, NaN, 7, NaN, NaN, NaN, NaN, NaN)
  ),
  row.names = c(NA, -17L), vars = "ts_id", drop = TRUE,
  indices = list(0:16), group_sizes = 17L, biggest_group_size = 17L,
  labels = structure(list(ts_id = 3L), row.names = c(NA, -1L),
                     class = "data.frame", vars = "ts_id", drop = TRUE),
  class = "data.frame"
)

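Within each group (identified by ts_id), a NaN value can appear on any date. I need to replace each NaN with the most recent preceding non-NaN value from the same group.

The result should look like this: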

+-------+------------+-------+
| ts_id |    date    | value |
+-------+------------+-------+
|     2 | 01/10/2014 |    18 |
|     2 | 01/11/2014 |    15 |
|     2 | 01/12/2014 |    15 |
|     2 | 01/01/2015 |    15 |
|     2 | 01/02/2015 |    15 |
|     3 | 01/03/2015 |    19 |
|     3 | 01/04/2015 |    20 |
|     3 | 01/10/2015 |    12 |
|     3 | 01/11/2015 |    17 |
|     3 | 01/12/2015 |    17 |
|     3 | 01/01/2016 |    17 |
|     3 | 01/08/2016 |     7 |
|     3 | 01/09/2016 |     7 |
|     3 | 01/10/2016 |     7 |
|     3 | 01/11/2016 |     7 |
|     3 | 01/12/2016 |     7 |
|     3 | 01/01/2017 |     7 |
+-------+------------+-------+

Thanks in advance.

Best answer

You can use this:

library(dplyr)
library(zoo) # provides na.locf() ("last observation carried forward")

data %>%
  group_by(ts_id) %>% # fill within each ts_id separately
  # na.rm = FALSE keeps leading missing values, so the length is unchanged
  mutate(value = na.locf(value, na.rm = FALSE))

# First six rows of the result (head()):
# # A tibble: 6 x 3
# # Groups:   ts_id [2]
# ts_id date       value
# <dbl> <date>     <dbl>
# 1     2 2014-10-01    18
# 2     2 2014-11-01    15
# 3     2 2014-12-01    15
# 4     2 2015-01-01    15
# 5     2 2015-02-01    15
# 6     3 2015-03-01    19
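
If you would rather not depend on zoo, the same carry-forward idea can be sketched in base R. This is only a minimal illustration, not the approach from the answer above: fill_forward is a hypothetical helper name, and the small df below is made-up sample data rather than the data from the question. Note that in this sketch a gap at the start of a group comes back as NA, since there is no earlier value to carry forward.

```r
# Hypothetical helper: carry the last non-missing value forward in a vector.
fill_forward <- function(x) {
  ok <- !is.na(x)        # is.na() is TRUE for both NA and NaN
  idx <- cumsum(ok)      # position of the latest non-missing value (0 before the first)
  c(NA, x[ok])[idx + 1]  # idx 0 maps to the padding NA, so leading gaps become NA
}

# Made-up sample data; group 3 starts with a NaN on purpose.
df <- data.frame(
  ts_id = c(2, 2, 2, 3, 3, 3, 3),
  value = c(18, NaN, NaN, NaN, 19, NaN, 7)
)

# ave() applies the helper separately to each ts_id group.
df$filled <- ave(df$value, df$ts_id, FUN = fill_forward)
print(df)
# filled: 18 18 18 NA 19 19 7
```

Because ave() splits by ts_id before calling the helper, a value from group 2 is never carried into group 3, which is the same guarantee group_by() gives in the dplyr version.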

Source: a similar question on Stack Overflow, "r - How to replace NaN values with the previous non-NaN value within a group": https://stackoverflow.com/questions/53210281/
