r - How to identify and count intersections in R

I have a data frame that shows membership in three color categories. The numbers are unique IDs. An ID can belong to one group or to several.

dat <- data.frame(BLUE = c(1, 2, 3, 4, 6, NA),
                  RED = c(2, 3, 6, 7, 9, 13),
                  GREEN = c(4, 6, 8, 9, 10, 11))

Or, for visual reference:

BLUE  RED  GREEN
1     2    4
2     3    6
3     6    8
4     7    9
6     9    10
NA    13   11

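I need to identify and count individual and cross-group memberships (i.e., how many IDs appear only in red, how many appear in both red and blue, and so on). My desired output is below. Note that the IDs column is for reference only; it would not appear in the expected output.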

COLOR                TOTAL  IDs (reference only, not needed in final output)
RED                  2      (7, 13)
BLUE                 1      (1)
GREEN                3      (8, 10, 11)
RED, BLUE            3      (2, 3, 6)
RED, GREEN           2      (6, 9)
BLUE, GREEN          2      (4, 6)
RED, BLUE, GREEN     1      (6)

Does anyone know an efficient way to do this in R? Thanks!

Best answer

You can use the venn library (especially useful when there are no NAs in your data):

library(venn)
venn_table <- venn(as.list(dat))

               BLUE RED GREEN counts
                  0   0     0      0
GREEN             0   0     1      3
RED               0   1     0      2
RED:GREEN         0   1     1      1
BLUE              1   0     0      2
BLUE:GREEN        1   0     1      1
BLUE:RED          1   1     0      2
BLUE:RED:GREEN    1   1     1      1

And:

attr(venn_table, "intersections")

$GREEN
[1]  8 10 11

$RED
[1]  7 13

$`RED:GREEN`
[1] 9

$BLUE
[1]  1 NA

$`BLUE:GREEN`
[1] 4

$`BLUE:RED`
[1] 2 3

$`BLUE:RED:GREEN`
[1] 6

To include the IDs as well:

data.frame(venn_table[2:nrow(venn_table), ],
           ID = do.call("rbind", lapply(attr(venn_table, "intersections"), paste0, collapse = ",")))

               BLUE RED GREEN counts      ID
GREEN             0   0     1      3 8,10,11
RED               0   1     0      2    7,13
RED:GREEN         0   1     1      1       9
BLUE              1   0     0      2    1,NA
BLUE:GREEN        1   0     1      1       4
BLUE:RED          1   1     0      2     2,3
BLUE:RED:GREEN    1   1     1      1       6
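
If installing a package is not an option, the same exact-region grouping can be approximated in base R. The following is only a sketch, not the answer's method: it drops NAs up front, and the names `sets`, `ids`, `member`, `region`, `ids_by_region`, and `region_counts` are introduced here for illustration.

```r
dat <- data.frame(BLUE = c(1, 2, 3, 4, 6, NA),
                  RED = c(2, 3, 6, 7, 9, 13),
                  GREEN = c(4, 6, 8, 9, 10, 11))

# Drop NAs and duplicates so each column becomes a clean set of IDs
sets <- lapply(as.list(dat), function(x) unique(x[!is.na(x)]))

# Every ID that appears in at least one set
ids <- sort(unique(unlist(sets)))

# Logical membership matrix: one row per ID, one column per set
member <- sapply(sets, function(s) ids %in% s)

# Label each ID with the exact combination of sets it belongs to
region <- apply(member, 1, function(r) paste(colnames(member)[r], collapse = ":"))

# IDs grouped by exact region, and the size of each region
ids_by_region <- split(ids, region)
region_counts <- lengths(ids_by_region)

print(region_counts)
```

Because the NA is removed before grouping, BLUE ends up with a single ID here rather than the two shown in the table above.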

One way to deal with the NA:

venn_table2 <- data.frame(venn_table[2:nrow(venn_table), length(venn_table), drop = FALSE],
                          ID = do.call("rbind", lapply(attr(venn_table, "intersections"), paste0, collapse = ",")))

counts <- venn_table2[1] - with(venn_table2, lengths(regmatches(ID, gregexpr("NA", ID))))

               counts
GREEN               3
RED                 2
RED:GREEN           1
BLUE                1
BLUE:GREEN          1
BLUE:RED            2
BLUE:RED:GREEN      1
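
The correction above works by counting the string "NA" in the pasted ID column. A possibly simpler variant is to count non-missing values directly on the intersections list. In this sketch, `intersections` is a hand-copied stand-in for `attr(venn_table, "intersections")` (taken from the output shown earlier) so that it runs without the package, and `counts_no_na` is a name introduced here.

```r
# Stand-in for attr(venn_table, "intersections"), copied from the output above
intersections <- list(
  GREEN = c(8, 10, 11),
  RED = c(7, 13),
  `RED:GREEN` = 9,
  BLUE = c(1, NA),
  `BLUE:GREEN` = 4,
  `BLUE:RED` = c(2, 3),
  `BLUE:RED:GREEN` = 6
)

# Count only the non-missing IDs in each region
counts_no_na <- vapply(intersections, function(x) sum(!is.na(x)), integer(1))

print(counts_no_na)
```

With a real venn result, the hand-built list would presumably be replaced by the attribute itself.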

A more elegant way to handle the NA might be (based on @M--'s comment):

print(venn(Map(function(x) x[!is.na(x)], as.list(dat))))

               BLUE RED GREEN counts
                  0   0     0      0
GREEN             0   0     1      3
RED               0   1     0      2
RED:GREEN         0   1     1      1
BLUE              1   0     0      1
BLUE:GREEN        1   0     1      1
BLUE:RED          1   1     0      2
BLUE:RED:GREEN    1   1     1      1
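
Note that the venn() tables count each ID only in its exact region (BLUE:RED holds 2 and 3, while 6 is counted under BLUE:RED:GREEN), whereas the desired output in the question lists 6 under every pair it belongs to. A base R sketch that reproduces the question's totals might look like the following; it assumes single-color rows mean "only that color" and multi-color rows mean "in all of these colors, possibly others too". The helper `ids_for` and the names `sets`, `combos`, `shared`, and `result` are hypothetical, introduced for illustration.

```r
dat <- data.frame(BLUE = c(1, 2, 3, 4, 6, NA),
                  RED = c(2, 3, 6, 7, 9, 13),
                  GREEN = c(4, 6, 8, 9, 10, 11))

# Drop NAs and duplicates so each column becomes a clean set of IDs
sets <- lapply(as.list(dat), function(x) unique(x[!is.na(x)]))

# All non-empty combinations of set names: singles, pairs, then the triple
combos <- unlist(
  lapply(seq_along(sets), function(k) combn(names(sets), k, simplify = FALSE)),
  recursive = FALSE
)

# Single color: IDs found in that set and in no other set.
# Several colors: IDs shared by all of the named sets.
ids_for <- function(nm) {
  if (length(nm) == 1) {
    setdiff(sets[[nm]], unlist(sets[setdiff(names(sets), nm)]))
  } else {
    Reduce(intersect, sets[nm])
  }
}

shared <- lapply(combos, ids_for)
names(shared) <- vapply(combos, paste, character(1), collapse = ", ")

result <- data.frame(COLOR = names(shared),
                     TOTAL = unname(lengths(shared)))
print(result)
```

The labels follow the column order of `dat`, so the pair appears as "BLUE, RED" rather than "RED, BLUE".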

For more on "r - How to identify and count intersections in R", see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/58120146/
