R: Frequency of all column combinations

Question

I have a vector of strings that all have the same length, like this:

example.list <- c('BBCD','ABBC','ADDB','ACBB','ABBD')

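I want to find how often particular letters occur at particular positions. First I convert the strings into a 0/1 matrix with one column per letter-position combination (A1 = letter A at position 1, and so on):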

     A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4
[1,]  0  1  0  0  0  1  0  0  0  0  1  0  0  0  0  1
[2,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  1  0
[3,]  1  0  0  0  0  0  0  1  0  0  0  1  0  1  0  0
[4,]  1  0  0  0  0  0  1  0  0  1  0  0  0  1  0  0
[5,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  0  1
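The post does not show how the matrix is built. A minimal base-R sketch is below; it assumes the alphabet is exactly A-D and uses the five strings implied by the matrix rows (the fifth row corresponds to 'ABBD'). The names `alphabet`, `letter` and `pos` are illustrative only, not from the original post.

```r
example.list <- c('BBCD', 'ABBC', 'ADDB', 'ACBB', 'ABBD')
alphabet <- c("A", "B", "C", "D")   # assumption: only these letters occur
n.pos    <- nchar(example.list[1])  # all strings have the same length

# One row per string, one column per character position
chars <- do.call(rbind, strsplit(example.list, ""))

# Column order as in the question: A1 B1 C1 D1 A2 ... D4
letter <- rep(alphabet, times = n.pos)
pos    <- rep(seq_len(n.pos), each = length(alphabet))

# mat[r, k] is 1 when string r has letter[k] at position pos[k]
mat <- sapply(seq_along(letter), function(k) as.integer(chars[, pos[k]] == letter[k]))
colnames(mat) <- paste0(letter, pos)
mat
```

Each row has exactly one 1 per position, so every row sum equals the string length.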

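Now I want the frequency of every combination of two columns, i.e. the number of rows in which both columns are 1. Some examples: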

A1 : B2 = 2
A1 : B3 = 3
B1 : B2 = 1
... and so on
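To make the definition concrete: the count for one pair is just the number of rows in which both indicator columns are 1. The sketch below rebuilds the question's matrix (assuming the same five strings and the A-D alphabet) and checks the three example pairs; `pair.count` is a hypothetical helper, not part of the original post.

```r
# Rebuild the question's 0/1 matrix (assumes the five strings implied by its rows)
strs  <- c('BBCD', 'ABBC', 'ADDB', 'ACBB', 'ABBD')
chars <- do.call(rbind, strsplit(strs, ""))
cols  <- paste0(rep(c("A", "B", "C", "D"), 4), rep(1:4, each = 4))
mat   <- sapply(cols, function(cl)
  as.integer(chars[, as.integer(substring(cl, 2))] == substring(cl, 1, 1)))

# Hypothetical helper: number of rows where columns a and b are both 1
pair.count <- function(m, a, b) sum(m[, a] == 1 & m[, b] == 1)

pair.count(mat, "A1", "B2")  # 2
pair.count(mat, "A1", "B3")  # 3
pair.count(mat, "B1", "B2")  # 1
```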

Best answer

Assuming your matrix is named mat:

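# for each row, get the names of the columns that equal 1
present <- lapply(seq(nrow(mat)), function(i) names(which(mat[i,] == 1)))
# get all unordered pairs of column names (requires the gtools package)
all.pairs <- gtools::combinations(n = ncol(mat), r = 2, colnames(mat))
# for each pair, count the rows in which both columns are present
count <- apply(all.pairs, 1, function(x){
  # one logical vector per column name: is it present in row 1, 2, ...?
  there <- lapply(x, function(y) sapply(present, `%in%`, x = y))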
  sum(Reduce(`&`, there))
})

cbind(all.pairs, count)[count > 0,]

#                 count
#  [1,] "A1" "B2" "2"  
#  [2,] "A1" "B3" "3"  
#  [3,] "A1" "B4" "2"  
#  [4,] "A1" "C2" "1"  
#  [5,] "A1" "C4" "1"  
#  [6,] "A1" "D2" "1"  
#  [7,] "A1" "D3" "1"  
#  [8,] "A1" "D4" "1"  
#  [9,] "B1" "B2" "1"  
# [10,] "B1" "C3" "1"  
# [11,] "B1" "D4" "1"  
# [12,] "B2" "B3" "2"  
# [13,] "B2" "C3" "1"  
# [14,] "B2" "C4" "1"  
# [15,] "B2" "D4" "2"  
# [16,] "B3" "B4" "1"  
# [17,] "B3" "C2" "1"  
# [18,] "B3" "C4" "1"  
# [19,] "B3" "D4" "1"  
# [20,] "B4" "C2" "1"  
# [21,] "B4" "D2" "1"  
# [22,] "B4" "D3" "1"  
# [23,] "C3" "D4" "1"  
# [24,] "D2" "D3" "1"
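An alternative that the answer does not use: because mat contains only 0s and 1s, crossprod(mat) (that is, t(mat) %*% mat) counts co-occurrences for every pair of columns in a single matrix product, with the column totals on the diagonal. The sketch below assumes the same five strings as the question's matrix and turns the upper triangle into a table like the output above; sorting the columns alphabetically first makes each pair come out in the same orientation (e.g. "B3" "C2" rather than "C2" "B3").

```r
# Rebuild the question's 0/1 matrix (assumes the five strings implied by its rows)
strs  <- c('BBCD', 'ABBC', 'ADDB', 'ACBB', 'ABBD')
chars <- do.call(rbind, strsplit(strs, ""))
cols  <- paste0(rep(c("A", "B", "C", "D"), 4), rep(1:4, each = 4))
mat   <- sapply(cols, function(cl)
  as.integer(chars[, as.integer(substring(cl, 2))] == substring(cl, 1, 1)))

# Sort columns so each pair is reported in alphabetical orientation
mat <- mat[, sort(colnames(mat))]

# For a 0/1 matrix, entry [i, j] of crossprod(mat) is the number of rows
# where columns i and j are both 1 (the diagonal holds column totals)
co <- crossprod(mat)

# Keep each unordered pair once: upper triangle only, non-zero counts
idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)
res <- data.frame(V1    = rownames(co)[idx[, "row"]],
                  V2    = colnames(co)[idx[, "col"]],
                  count = co[idx])
res <- res[order(res$V1, res$V2), ]
rownames(res) <- NULL
res
```

This avoids the gtools dependency and the per-pair loop, which should scale better as the number of columns grows.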

Edit: to count reverse pairs as well (e.g. both A1:B2 and B2:A1), define all.pairs as follows:

all.pairs <- expand.grid(colnames(mat), colnames(mat))
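Note that expand.grid also pairs each column with itself (A1:A1, B1:B1, ...), and for those pairs the count is simply the column total. A minimal sketch that keeps only distinct ordered pairs is below; it rebuilds mat from the five strings implied by the question's matrix and, as a simplification of the answer's `present` list, counts each pair directly from the matrix columns.

```r
# Rebuild the question's 0/1 matrix (assumes the five strings implied by its rows)
strs  <- c('BBCD', 'ABBC', 'ADDB', 'ACBB', 'ABBD')
chars <- do.call(rbind, strsplit(strs, ""))
cols  <- paste0(rep(c("A", "B", "C", "D"), 4), rep(1:4, each = 4))
mat   <- sapply(cols, function(cl)
  as.integer(chars[, as.integer(substring(cl, 2))] == substring(cl, 1, 1)))

# All ordered pairs of column names, as plain character columns
all.pairs <- expand.grid(colnames(mat), colnames(mat), stringsAsFactors = FALSE)
# Drop self-pairs such as A1:A1
all.pairs <- all.pairs[all.pairs$Var1 != all.pairs$Var2, ]

# Count the rows in which both columns of the pair are 1
count <- apply(all.pairs, 1, function(x) sum(mat[, x[1]] == 1 & mat[, x[2]] == 1))

res <- cbind(all.pairs, count)[count > 0, ]
res
```

Every unordered pair from the answer's output now appears twice, once in each orientation.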

A similar question about R: frequency of all column combinations can be found on Stack Overflow: https://stackoverflow.com/questions/52370283/