Suppose this is my dataset:
ID Group Col1 Col2
1 A 1.49 NA
2 A 0.12 NA
2 A NA 0.35
3 B NA 0.87
4 B NA 0.64
4 B NA 0.43
5 C 1.53 NA
5 C 0.38 NA
6 C 71.92 NA
7 D 0.25 0.88
7 D 0.92 0.10
7 D NA 0.53
7 D NA 0.60
How can I create a derived indicator column like this?
ID Group Col1 Col2 Col_I
1 A 1.49 NA Col1, Col2 mutually exclusive
2 A 0.12 NA Col1, Col2 mutually exclusive
2 A NA 0.35 Col1, Col2 mutually exclusive
3 B NA 0.87 Only Col2
4 B NA 0.64 Only Col2
4 B NA 0.43 Only Col2
5 C 1.53 NA Only Col1
5 C 0.38 NA Only Col1
6 C 71.92 NA Only Col1
7 D 0.25 0.88 Col1, Col2 overlap, Col2 majority
7 D 0.92 0.10 Col1, Col2 overlap, Col2 majority
7 D NA 0.53 Col1, Col2 overlap, Col2 majority
7 D NA 0.60 Col1, Col2 overlap, Col2 majority
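The indicator column Col_I is defined at the Group level. Within each group, it shows whether the values in Col1 and Col2 are mutually exclusive (each row has a value in only one of them), overlapping (some rows have both), or present only in Col1 or only in Col2.
I know how to do this, but my approach is clunky and inefficient, so any suggestions would be much appreciated.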
Data:
data <- structure(list(ID = c(1L, 2L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 7L, 7L, 7L, 7L),
Group = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D"),
Col1 = c(1.49, 0.12, NA, NA, NA, NA, 1.53, 0.38, 71.92, 0.25, 0.92, NA, NA),
Col2 = c(NA, NA, 0.35, 0.87, 0.64, 0.43, NA, NA, NA, 0.88, 0.1, 0.53, 0.6)),
class = "data.frame", row.names = c(NA, -13L))
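To see where each Col_I label comes from, it can help to count, per group, how many rows have a value only in Col1, only in Col2, in both, or in neither. The following is a minimal base R sketch; it rebuilds the sample data with data.frame() instead of dput(), and the pattern labels (only_col1, only_col2, both, neither) are illustrative names, not part of the question:

```r
# Sample data from the question, rebuilt with data.frame()
data <- data.frame(
  ID    = c(1L, 2L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 7L, 7L, 7L, 7L),
  Group = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D"),
  Col1  = c(1.49, 0.12, NA, NA, NA, NA, 1.53, 0.38, 71.92, 0.25, 0.92, NA, NA),
  Col2  = c(NA, NA, 0.35, 0.87, 0.64, 0.43, NA, NA, NA, 0.88, 0.1, 0.53, 0.6)
)

# Classify each row by which of Col1/Col2 is non-missing
# (the label names are illustrative only)
has1 <- !is.na(data$Col1)
has2 <- !is.na(data$Col2)
pattern <- ifelse(has1 & has2, "both",
           ifelse(has1, "only_col1",
           ifelse(has2, "only_col2", "neither")))
pattern <- factor(pattern, levels = c("only_col1", "only_col2", "both", "neither"))

# One row per group, one column per row pattern
counts <- table(Group = data$Group, pattern)
print(counts)
```

Read against the desired output: B and C have values in a single column only, A has values in both columns but never in the same row (mutually exclusive), and D has two rows with both values (overlap), with Col2 non-missing in 4 rows versus 2 for Col1 (Col2 majority).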
Best answer
With your example data, you could use:
library(tidyverse)
df <- read.table(text = "ID Group Col1 Col2
1 A 1.49 NA
2 A 0.12 NA
2 A NA 0.35
3 B NA 0.87
4 B NA 0.64
4 B NA 0.43
5 C 1.53 NA
5 C 0.38 NA
6 C 71.92 NA
7 D 0.25 0.88
7 D 0.92 0.10
7 D NA 0.53
7 D NA 0.60 ",
header = TRUE)
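df %>%
  group_by(Group) %>%
  mutate(Col_I = case_when(
    # every row has Col1 and no row has Col2
    all(!is.na(Col1)) & all(is.na(Col2)) ~ "Only Col1",
    # every row has Col2 and no row has Col1
    all(is.na(Col1)) & all(!is.na(Col2)) ~ "Only Col2",
    # every row has at least one value, and the total number of values equals
    # the number of rows, so each row has exactly one of Col1/Col2
    sum(!is.na(coalesce(Col1, Col2))) == length(Col1) &
      sum(!is.na(Col1)) + sum(!is.na(Col2)) == length(Col1) ~ "Col1, Col2 mutually exclusive",
    # otherwise treat the columns as overlapping and compare non-missing counts
    sum(!is.na(Col1)) > sum(!is.na(Col2)) ~ "Col1, Col2 overlap, Col1 majority",
    sum(!is.na(Col1)) < sum(!is.na(Col2)) ~ "Col1, Col2 overlap, Col2 majority",
    TRUE ~ "Col1, Col2 overlap, no majority")) %>%
  ungroup()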
#> # A tibble: 13 × 5
#> ID Group Col1 Col2 Col_I
#> <int> <chr> <dbl> <dbl> <chr>
#> 1 1 A 1.49 NA Col1, Col2 mutually exclusive
#> 2 2 A 0.12 NA Col1, Col2 mutually exclusive
#> 3 2 A NA 0.35 Col1, Col2 mutually exclusive
#> 4 3 B NA 0.87 Only Col2
#> 5 4 B NA 0.64 Only Col2
#> 6 4 B NA 0.43 Only Col2
#> 7 5 C 1.53 NA Only Col1
#> 8 5 C 0.38 NA Only Col1
#> 9 6 C 71.9 NA Only Col1
#> 10 7 D 0.25 0.88 Col1, Col2 overlap, Col2 majority
#> 11 7 D 0.92 0.1 Col1, Col2 overlap, Col2 majority
#> 12 7 D NA 0.53 Col1, Col2 overlap, Col2 majority
#> 13 7 D NA 0.6 Col1, Col2 overlap, Col2 majority
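If you want to avoid the tidyverse dependency, the same group-level rules can be written in base R. This is a minimal sketch, not part of the original answer: classify_group() is a hypothetical helper that applies the case_when() conditions above, in the same order, to one group's Col1/Col2 vectors, and split() plus name-based indexing copies each group's label back to its rows.

```r
# Sample data from the question
data <- data.frame(
  ID    = c(1L, 2L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 7L, 7L, 7L, 7L),
  Group = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D"),
  Col1  = c(1.49, 0.12, NA, NA, NA, NA, 1.53, 0.38, 71.92, 0.25, 0.92, NA, NA),
  Col2  = c(NA, NA, 0.35, 0.87, 0.64, 0.43, NA, NA, NA, 0.88, 0.1, 0.53, 0.6)
)

# Hypothetical helper: the answer's rules, in the same order,
# applied to the Col1/Col2 values of a single group
classify_group <- function(c1, c2) {
  n  <- length(c1)
  n1 <- sum(!is.na(c1))
  n2 <- sum(!is.na(c2))
  if (all(!is.na(c1)) && all(is.na(c2))) return("Only Col1")
  if (all(is.na(c1)) && all(!is.na(c2))) return("Only Col2")
  # every row has at least one value and the total count equals the row count
  if (sum(!is.na(c1) | !is.na(c2)) == n && n1 + n2 == n)
    return("Col1, Col2 mutually exclusive")
  if (n1 > n2) return("Col1, Col2 overlap, Col1 majority")
  if (n1 < n2) return("Col1, Col2 overlap, Col2 majority")
  "Col1, Col2 overlap, no majority"
}

# One label per group, named by group
labels <- vapply(split(data, data$Group),
                 function(g) classify_group(g$Col1, g$Col2),
                 character(1))

# Look up each row's label by its group name
data$Col_I <- unname(labels[data$Group])
print(data)
```

Using vapply() with character(1) makes the call fail loudly if the helper ever returns something other than a single string.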
Using xor(), as MrFlick suggested:
df %>%
  group_by(Group) %>%
  mutate(Col_I = case_when(
    all(!is.na(Col1)) & all(is.na(Col2)) ~ "Only Col1",
    all(is.na(Col1)) & all(!is.na(Col2)) ~ "Only Col2",
    # xor() is TRUE when exactly one of Col1/Col2 is missing in a row
    all(xor(is.na(Col1), is.na(Col2))) ~ "Col1, Col2 mutually exclusive",
    sum(!is.na(Col1)) > sum(!is.na(Col2)) ~ "Col1, Col2 overlap, Col1 majority",
    sum(!is.na(Col1)) < sum(!is.na(Col2)) ~ "Col1, Col2 overlap, Col2 majority",
    TRUE ~ "Col1, Col2 overlap, no majority")) %>%
  ungroup()
#> # A tibble: 13 × 5
#> ID Group Col1 Col2 Col_I
#> <int> <chr> <dbl> <dbl> <chr>
#> 1 1 A 1.49 NA Col1, Col2 mutually exclusive
#> 2 2 A 0.12 NA Col1, Col2 mutually exclusive
#> 3 2 A NA 0.35 Col1, Col2 mutually exclusive
#> 4 3 B NA 0.87 Only Col2
#> 5 4 B NA 0.64 Only Col2
#> 6 4 B NA 0.43 Only Col2
#> 7 5 C 1.53 NA Only Col1
#> 8 5 C 0.38 NA Only Col1
#> 9 6 C 71.9 NA Only Col1
#> 10 7 D 0.25 0.88 Col1, Col2 overlap, Col2 majority
#> 11 7 D 0.92 0.1 Col1, Col2 overlap, Col2 majority
#> 12 7 D NA 0.53 Col1, Col2 overlap, Col2 majority
#> 13 7 D NA 0.6 Col1, Col2 overlap, Col2 majority
Created on 2023-11-28 with reprex v2.0.2
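The xor() condition and the counting condition in the first version encode the same rule: every row in the group has exactly one of Col1/Col2. A quick randomized check in base R, assuming small random vectors with roughly half of the values missing (count_rule and xor_rule are illustrative names), should find no disagreement:

```r
# Count-based test from the first version;
# !is.na(c1) | !is.na(c2) plays the role of !is.na(coalesce(Col1, Col2))
count_rule <- function(c1, c2) {
  n <- length(c1)
  sum(!is.na(c1) | !is.na(c2)) == n && sum(!is.na(c1)) + sum(!is.na(c2)) == n
}

# xor-based test suggested by MrFlick
xor_rule <- function(c1, c2) all(xor(is.na(c1), is.na(c2)))

set.seed(42)
agree <- replicate(1000, {
  n  <- sample(1:6, 1)
  # roughly half the entries in each column are NA
  c1 <- ifelse(runif(n) < 0.5, NA, runif(n))
  c2 <- ifelse(runif(n) < 0.5, NA, runif(n))
  count_rule(c1, c2) == xor_rule(c1, c2)
})
all(agree)
```

The counting form is more indirect: the first sum rules out rows where both values are missing, and, given that, the second sum rules out rows where both are present. The xor() form states the per-row condition directly.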
For a similar question about deriving an indicator variable in R, see Stack Overflow: https://stackoverflow.com/questions/77560254/