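I want to count how many times a specific value occurs across multiple columns and put that count in a new column. My dataset has a lot of missing values, but the result should be NA only when the entire row consists of NAs. If possible, I would prefer something that works with dplyr pipes.
Example dataset: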
df <- data.frame(c1 = sample(1:4, 20, replace = TRUE),
c2 = sample(1:4, 20, replace = TRUE),
c3 = sample(1:4, 20, replace = TRUE),
c4 = sample(1:4, 20, replace = TRUE),
c5 = sample(1:4, 20, replace = TRUE))
for (i in 1:5) {
df[sample(1:20, 1), sample(1:5, 1)] <- NA
df[sample(1:20, 1), ] <- NA
}
c1 c2 c3 c4 c5
1 1 2 4 4 1
2 2 2 1 3 4
3 2 4 4 3 3
4 4 2 3 2 1
5 4 2 4 1 3
6 NA 1 2 4 4
7 3 NA 4 NA 4
8 NA NA NA NA NA
9 1 3 3 2 2
10 NA NA NA NA NA
I have tried rowwise() and rowSums. Here are some of my non-working attempts:
# First attempt
df <- df %>%
rowwise() %>%
mutate(count2 = sum(c_across(c1:c5, ~.x %in% 2)))
# Second attempt
df <- df %>%
rowwise() %>%
mutate(count2 = sum(c_across(select(where(c1:c5 %in% 2)))))
# With rowSums
df <- df %>%
rowwise() %>%
mutate(count4 = rowSums(select(c1:c5 %in% 4), na.rm = TRUE))
Best answer
How about this:
library(dplyr)
df <- data.frame(c1 = sample(1:4, 20, replace = TRUE),
c2 = sample(1:4, 20, replace = TRUE),
c3 = sample(1:4, 20, replace = TRUE),
c4 = sample(1:4, 20, replace = TRUE),
c5 = sample(1:4, 20, replace = TRUE))
for (i in 1:5) {
df[sample(1:20, 1), sample(1:5, 1)] <- NA
df[sample(1:20, 1), ] <- NA
}
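df %>%
  rowwise() %>%
  # Count the 2s among the non-missing values of each row
  mutate(count2 = sum(na.omit(c_across(c1:c5)) == 2),
         # Return NA only when every value in the row is missing
         count2 = ifelse(all(is.na(c_across(c1:c5))), NA, count2))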
#> # A tibble: 20 × 6
#> # Rowwise:
#> c1 c2 c3 c4 c5 count2
#> <int> <int> <int> <int> <int> <int>
#> 1 NA NA NA NA NA NA
#> 2 2 2 3 4 2 3
#> 3 1 1 1 4 4 0
#> 4 2 3 3 2 4 2
#> 5 NA NA NA NA NA NA
#> 6 1 1 1 2 1 1
#> 7 3 3 2 3 4 1
#> 8 1 1 4 3 4 0
#> 9 NA NA NA NA NA NA
#> 10 NA NA NA NA NA NA
#> 11 2 3 3 4 1 1
#> 12 2 1 4 2 NA 2
#> 13 4 4 2 NA 2 2
#> 14 4 2 3 3 2 2
#> 15 1 3 4 2 2 2
#> 16 1 1 3 3 2 1
#> 17 1 1 1 4 4 0
#> 18 2 4 4 NA 1 1
#> 19 NA NA NA NA NA NA
#> 20 4 1 1 NA 4 0
Created on 2022-12-08 by the reprex package (v2.0.1)
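For larger data frames, rowwise() can be slow because it evaluates one row at a time. As a possible alternative (not part of the answer above), here is a minimal vectorized sketch in base R built on rowSums(). The helper name count_value and the small example data frame are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: count how often `value` occurs in the given columns
# of each row; a row gets NA only if all of those columns are NA.
count_value <- function(data, cols, value) {
  m <- as.matrix(data[cols])
  # Missing cells are ignored when counting matches
  n <- rowSums(m == value, na.rm = TRUE)
  # Rows with no non-missing cell at all get NA instead of 0
  n[rowSums(!is.na(m)) == 0] <- NA
  n
}

# Small deterministic example (illustrative data, not the random df above)
example_df <- data.frame(c1 = c(2, NA, 1, NA),
                         c2 = c(2, NA, 3, 2),
                         c3 = c(4, NA, 1, NA))

example_df$count2 <- count_value(example_df, c("c1", "c2", "c3"), 2)
print(example_df)
```

With the magrittr pipe, the same helper should also work inside mutate(), for example df %>% mutate(count2 = count_value(., paste0("c", 1:5), 2)), where the dot refers to the data frame being piped in.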
Regarding "r - How to count occurrences of a specific value across multiple columns?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74732636/