I have a dataset that looks like this:
Recipient ID
(chr) (chr)
Smith C
Wells S
Wells S
Jones S
Jones N
Wu C
Wu N
Wu S
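I want to mutate a new variable that is either "Unique" or "Multiple", based on three cases: the recipient appears only once (Unique); the recipient appears more than once but has the same ID every time (Unique); the recipient appears more than once and has more than one distinct ID (Multiple). I tried using: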
df %>%
  group_by(Recipient, ID) %>%
  mutate(Freq = case_when(
    str_count(Recipient) == 1 & str_count(ID) == 1 ~ "Unique",
    str_count(Recipient) > 2 & str_count(ID) == 1 ~ "Unique",
    str_count(Recipient) > 2 & str_count(ID) > 1 ~ "Multiple"))
When I do this, every value comes out as Multiple:
Recipient ID Freq
(chr) (chr) (chr)
Smith C Multiple (should be Unique)
Wells S Multiple (should be Unique)
Wells S Multiple (should be Unique)
Jones S Multiple
Jones N Multiple
Wu C Multiple
Wu N Multiple
Wu S Multiple
I have tried many times and still can't crack it. Can anyone help with this, or recommend a simpler way to code it? Thanks!
Best answer
A possible solution using n_distinct():
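library(dplyr)

df %>%
  # group by recipient only, so each group holds all of that recipient's IDs
  group_by(Recipient) %>%
  # one distinct ID in the group means "unique", otherwise "multiple"
  mutate(Freq = ifelse(n_distinct(ID) == 1, "unique", "multiple")) %>%
  ungroup()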
# A tibble: 8 x 3
Recipient ID Freq
<chr> <chr> <chr>
1 Smith C unique
2 Wells S unique
3 Wells S unique
4 Jones S multiple
5 Jones N multiple
6 Wu C multiple
7 Wu N multiple
8 Wu S multiple
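For context, the original attempt probably went wrong for two reasons: stringr's str_count() counts matches inside each string rather than rows within a group, and grouping by both Recipient and ID leaves every group with exactly one ID. If you would rather keep the three cases from the question visible, a minimal sketch with case_when() could look like the following. It assumes the same column names as above, and the object name `result` is only illustrative.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  Recipient = c("Smith", "Wells", "Wells", "Jones", "Jones", "Wu", "Wu", "Wu"),
  ID        = c("C", "S", "S", "S", "N", "C", "N", "S")
)

result <- df %>%
  group_by(Recipient) %>%
  mutate(Freq = case_when(
    n() == 1            ~ "Unique",   # recipient appears once
    n_distinct(ID) == 1 ~ "Unique",   # appears several times, always the same ID
    TRUE                ~ "Multiple"  # appears several times with differing IDs
  )) %>%
  ungroup()

print(result)
```

The first branch is logically covered by the second (a single row always has one distinct ID), so the ifelse() version above is equivalent. The explicit form just mirrors the wording of the question.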
Data
df <- structure(list(Recipient = c("Smith", "Wells", "Wells", "Jones",
"Jones", "Wu", "Wu", "Wu"), ID = c("C", "S", "S", "S", "N", "C",
"N", "S")), class = "data.frame", row.names = c(NA, -8L))
Regarding "r - How to mutate a new variable in R dplyr based on conditions on 2 columns?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/71949981/