I may be doing something silly here, but I would appreciate some help. I am trying to categorize some data that was filled in incorrectly.
df <- data.frame(ID = c("A", "A", "A","A", "A", "B", "B", "B", "B", "B"),
headache_y_n = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"),
headache_days =c("2", "2", "2", "2", "2", "1", "1", "1", "1", "1"))
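I want to be able to say that if headache_y_n is "Yes" more than 3 times per ID, the row meets the "prolonged" criterion; otherwise it should be "short".
So I would like the following output: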
output <- data.frame(ID = c("A", "A", "A","A", "A", "B", "B", "B", "B", "B"),
headache_y_n = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"),
headache_days =c("2", "2", "2", "2", "2", "1", "1", "1", "1", "1"),
criteria =c("prolonged", "prolonged", "prolonged", "prolonged", "prolonged", "short", "short", "short", "short", "short"))
My code is below:
library(dplyr)
df %>% group_by(ID) %>% mutate(criteria=case_when(
sum(any(headache_y_n=="Yes") >= 3) ~ "prolonged",
TRUE ~ "short"
))
Unfortunately it does not work, and I get the following error:
Error: Problem with `mutate()` input `criteria`.
x LHS of case 1 (`sum(any(headache_y_n == "Yes") >= 3)`) must be a logical vector, not an integer vector.
ℹ Input `criteria` is `case_when(...)`.
ℹ The error occurred in group 1: ID = "A".
I am not smart enough to work out where I am going wrong, so I would be grateful for your help!
Thanks!
Best answer
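any and sum should be swapped. In the original order, any(headache_y_n == "Yes") collapses to a single TRUE or FALSE, comparing that with >= 3 gives another logical, and sum() then turns it into an integer, which is why case_when complains that the left-hand side is not a logical vector. Instead, after grouping by ID, count the "Yes" values by taking the sum of the logical expression (headache_y_n == 'Yes'), then build a second expression by comparing that sum with >= 3, and wrap the result in any. The any is probably not needed here, because sum returns a single value.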
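To see why the order matters, the two expressions can be evaluated step by step on one group. This is only a sketch in base R: x holds the headache_y_n values for ID "A" from the question, and the names inner, cmp, bad, n_yes and good are made up for illustration.

```r
# headache_y_n values for ID "A", copied from the question's data
x <- c("Yes", "Yes", "Yes", "No", "Yes")

# Original order: any() collapses to a single logical before the comparison
inner <- any(x == "Yes")   # a single TRUE
cmp   <- inner >= 3        # TRUE is coerced to 1, so the comparison is FALSE
bad   <- sum(cmp)          # sum() of a logical returns an integer
class(bad)                 # "integer", which case_when rejects as a condition

# Swapped order: count the matches first, then compare the count
n_yes <- sum(x == "Yes")   # number of "Yes" values in the group
good  <- n_yes >= 3        # a single logical
class(good)                # "logical", which case_when accepts
```

Applied to the grouped data frame, the swapped order looks like this: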
library(dplyr)
df %>%
group_by(ID) %>%
mutate(criteria=case_when(
any(sum(headache_y_n=="Yes") >= 3) ~ "prolonged",
TRUE ~ "short"
))
That is, even with any removed, it returns the same result:
df %>%
group_by(ID) %>%
mutate(criteria=case_when(
sum(headache_y_n=="Yes") >= 3 ~ "prolonged",
TRUE ~ "short"
))
# A tibble: 10 x 4
# Groups: ID [2]
# ID headache_y_n headache_days criteria
# <chr> <chr> <chr> <chr>
# 1 A Yes 2 prolonged
# 2 A Yes 2 prolonged
# 3 A Yes 2 prolonged
# 4 A No 2 prolonged
# 5 A Yes 2 prolonged
# 6 B No 1 short
# 7 B No 1 short
# 8 B No 1 short
# 9 B Yes 1 short
#10 B No 1 short
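As a cross-check that does not depend on dplyr, the same per-ID count can be sketched in base R with ave(). This is an alternative offered under assumptions, not part of the original answer: n_yes is a helper name introduced here, and the threshold of 3 is taken from the code above.

```r
# Same data as in the question
df <- data.frame(
  ID = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  headache_y_n = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"),
  headache_days = c("2", "2", "2", "2", "2", "1", "1", "1", "1", "1")
)

# Count the "Yes" values within each ID; ave() repeats the group total on every row
n_yes <- ave(as.integer(df$headache_y_n == "Yes"), df$ID, FUN = sum)

# Label every row of an ID according to that ID's count
df$criteria <- ifelse(n_yes >= 3, "prolonged", "short")
df
```

Because the count is constant within each ID, every row of a group receives the same label, matching the tibble shown above.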
Source: the original question on Stack Overflow, https://stackoverflow.com/questions/66359835/