Suppose I have the following data frame. I want to replace the NA value with the most frequently occurring response, a.
df <- read.table(text = "id result
1 a
2 a
3 a
4 b
5 NA", header = TRUE)
I'm looking for something like this:
library(dplyr)

# Return the most frequent value of x (note that NA is counted as a value here)
calculate_mode <- function(x) {
  uniqx <- unique(x)
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

df <- df %>%
  mutate(result = ifelse(is.na(result), calculate_mode(result), result))
But I'm not sure whether there is a "tidier" way to do this other than defining a custom function.
Best answer
library(dplyr)
library(tidyr)
# Get the most frequent value manually, then fill the NAs with tidyr::replace_na()
most_value <- table(df$result) %>%
  sort(decreasing = TRUE) %>%
  head(1) %>%
  names()

df %>% replace_na(list(result = most_value))
#>   id result
#> 1  1      a
#> 2  2      a
#> 3  3      a
#> 4  4      b
#> 5  5      a
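Because table() leaves NA out of its counts by default, the lookup above never picks NA as the most frequent value. A more compact base R sketch of the same lookup is shown below. It assumes that ties can go to whichever value sorts first, and `mode_value` is a hypothetical helper name, not something from the original answer.

```r
# Hypothetical helper: most frequent non-NA value of a vector, returned as a string.
mode_value <- function(x) {
  counts <- table(x)                 # table() excludes NA by default (useNA = "no")
  if (length(counts) == 0) {
    return(NA_character_)            # x was empty or contained only NA
  }
  # which.max() returns the first maximum, so ties go to the first level
  names(counts)[which.max(counts)]
}

result <- c("a", "a", "a", "b", NA)
filled <- ifelse(is.na(result), mode_value(result), result)
filled
```

The helper always returns a character string, so it fits character columns best; for other column types the result would need converting back.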
Applying it dynamically across multiple columns
# Do it across multiple columns - still kind of using functions
most <- function(x) {
  table(x) %>% sort(decreasing = TRUE) %>% head(1) %>% names()
}

# Join df to itself to get two result columns for the example
multiple_column <- left_join(df, df, by = "id")
multiple_column
#>   id result.x result.y
#> 1  1        a        a
#> 2  2        a        a
#> 3  3        a        a
#> 4  4        b        b
#> 5  5     <NA>     <NA>
multiple_column %>%
  mutate(across(.cols = starts_with("result"), .fns = function(x) {
    if_else(is.na(x), most(x), x)
  }))
#>   id result.x result.y
#> 1  1        a        a
#> 2  2        a        a
#> 3  3        a        a
#> 4  4        b        b
#> 5  5        a        a
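The example data above is read in as character columns. If the columns are factors, as the question title suggests, one option is to assign the modal level directly into the NA positions, which keeps the column a factor. The following is a minimal base R sketch under that assumption; `fill_na_with_mode` is a hypothetical helper and the data frame is rebuilt by hand with factor columns.

```r
# Hypothetical helper: fill NAs in a character vector or factor
# with its most frequent non-NA value.
fill_na_with_mode <- function(x) {
  counts <- table(x)                 # NA is excluded from the counts by default
  if (length(counts) == 0 || max(counts) == 0) {
    return(x)                        # nothing but NA: leave the column unchanged
  }
  # Assigning an existing level into a factor keeps it a factor
  x[is.na(x)] <- names(counts)[which.max(counts)]
  x
}

# Same toy data as above, but with factor columns (an assumption for this sketch)
multiple_column <- data.frame(
  id       = 1:5,
  result.x = factor(c("a", "a", "a", "b", NA)),
  result.y = factor(c("a", "a", "a", "b", NA))
)

# Apply the helper to every column whose name starts with "result"
cols <- grep("^result", names(multiple_column), value = TRUE)
multiple_column[cols] <- lapply(multiple_column[cols], fill_na_with_mode)
str(multiple_column)
```

With dplyr loaded, the same helper should also be usable in place of the anonymous function, as in `mutate(across(starts_with("result"), fill_na_with_mode))`.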
Created on 2021-04-24 by the reprex package (v2.0.0)
For more on replacing NA values with the modal value for a factor variable in dplyr, see the similar question on Stack Overflow: https://stackoverflow.com/questions/67238007/