Problem description
I have a list of strings, all of the same length, as follows:
example.list <- c('BBCD','ABBC','ADDB','ACBB')
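I then want to get the frequency with which a specific letter appears at a specific position. First, I converted the list to a matrix: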
     A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4
[1,]  0  1  0  0  0  1  0  0  0  0  1  0  0  0  0  1
[2,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  1  0
[3,]  1  0  0  0  0  0  0  1  0  0  0  1  0  1  0  0
[4,]  1  0  0  0  0  0  1  0  0  1  0  0  0  1  0  0
[5,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  0  1
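The question does not show how this matrix was built. Below is a minimal sketch of one way to build it in base R. The names `strings`, `letters.used`, `chars` and `col.names` are illustrative, and the fifth string 'ABBD' is an assumption inferred from row 5 of the matrix, since the list above shows only four strings.

```r
# Assumption: the fifth string is inferred from row 5 of the matrix above
strings <- c('BBCD', 'ABBC', 'ADDB', 'ACBB', 'ABBD')
letters.used <- c('A', 'B', 'C', 'D')
n.let <- length(letters.used)
n.pos <- nchar(strings[1])  # all strings have the same length

# One row per string, one column per character position
chars <- do.call(rbind, strsplit(strings, ''))

# Column names in the order A1 B1 C1 D1 A2 B2 ...
col.names <- paste0(rep(letters.used, times = n.pos),
                    rep(seq_len(n.pos), each = n.let))

# Column k tests one letter at one position across all strings
mat <- sapply(seq_along(col.names), function(k) {
  pos <- (k - 1) %/% n.let + 1
  let <- letters.used[(k - 1) %% n.let + 1]
  as.integer(chars[, pos] == let)
})
colnames(mat) <- col.names
mat
```

Each row of `mat` then contains exactly one 1 per position, so every row sums to the string length.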
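Now I want to get the frequency of every combination of two columns, i.e. the number of rows in which both columns are 1. Some examples: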
A1 : B2 = 2
A1 : B3 = 3
B1 : B2 = 1
... etc.
Best answer
Assuming your matrix is named mat:
# get all vars present in each row
present <- lapply(seq(nrow(mat)), function(i) names(which(mat[i,] == 1)))
# get all pairs
all.pairs <- gtools::combinations(n = ncol(mat), r = 2, colnames(mat))
# count times pairs appear
count <- apply(all.pairs, 1, function(x){
  there <- lapply(x, function(y) sapply(present, `%in%`, x = y))
  sum(Reduce(`&`, there))
})
cbind(all.pairs, count)[count > 0,]
#                 count
#  [1,] "A1" "B2" "2"
#  [2,] "A1" "B3" "3"
#  [3,] "A1" "B4" "2"
#  [4,] "A1" "C2" "1"
#  [5,] "A1" "C4" "1"
#  [6,] "A1" "D2" "1"
#  [7,] "A1" "D3" "1"
#  [8,] "A1" "D4" "1"
#  [9,] "B1" "B2" "1"
# [10,] "B1" "C3" "1"
# [11,] "B1" "D4" "1"
# [12,] "B2" "B3" "2"
# [13,] "B2" "C3" "1"
# [14,] "B2" "C4" "1"
# [15,] "B2" "D4" "2"
# [16,] "B3" "B4" "1"
# [17,] "B3" "C2" "1"
# [18,] "B3" "C4" "1"
# [19,] "B3" "D4" "1"
# [20,] "B4" "C2" "1"
# [21,] "B4" "D2" "1"
# [22,] "B4" "D3" "1"
# [23,] "C3" "D4" "1"
# [24,] "D2" "D3" "1"
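As an alternative to the answer's approach (not what the answer itself uses), the same pair counts can probably be read off the cross-product of the matrix, since entry [i, j] of t(mat) %*% mat is the number of rows in which columns i and j are both 1. This is a sketch in base R. The matrix is retyped here from the question so that the block runs on its own, and the names `co`, `idx` and `res` are illustrative.

```r
# The 0/1 matrix from the question, retyped so this block is self-contained
mat <- matrix(c(0,1,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1,
                1,0,0,0, 0,1,0,0, 0,1,0,0, 0,0,1,0,
                1,0,0,0, 0,0,0,1, 0,0,0,1, 0,1,0,0,
                1,0,0,0, 0,0,1,0, 0,1,0,0, 0,1,0,0,
                1,0,0,0, 0,1,0,0, 0,1,0,0, 0,0,0,1),
              nrow = 5, byrow = TRUE)
colnames(mat) <- paste0(rep(c('A', 'B', 'C', 'D'), times = 4),
                        rep(1:4, each = 4))

# co[i, j] = number of rows where columns i and j are both 1
co <- crossprod(mat)

# Keep each unordered pair once (upper triangle) and drop zero counts
idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)
res <- data.frame(first  = rownames(co)[idx[, "row"]],
                  second = colnames(co)[idx[, "col"]],
                  count  = co[idx])
res
```

The rows of `res` follow the column order of `mat` rather than alphabetical order, so a pair may appear as C2:B3 instead of B3:C2. The diagonal of `co` holds the plain per-column counts.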
Edit: to include reversed pairs, e.g. both A1:B2 and B2:A1, define all.pairs as follows:
all.pairs <- expand.grid(colnames(mat), colnames(mat))
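Note that expand.grid also produces self-pairs such as A1:A1, whose count would simply be the column total. If those are unwanted, a sketch like the following drops them. The vector `cols` is a hypothetical stand-in for colnames(mat), and the column names `first` and `second` are illustrative.

```r
# Hypothetical stand-in for colnames(mat)
cols <- c("A1", "B1", "C1")

# All ordered pairs, kept as character strings rather than factors
all.pairs <- expand.grid(first = cols, second = cols,
                         stringsAsFactors = FALSE)

# Drop pairs of a column with itself
all.pairs <- all.pairs[all.pairs$first != all.pairs$second, ]
all.pairs
```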
On the topic of R: frequency of all column combinations, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52370283/