R: Frequency of all column combinations

Tags: r frequency find-occurrences frequency-analysis

Question

I have a list of strings of equal length, like this:

example.list <- c('BBCD','ABBC','ADDB','ACBB','ABBD')

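I then want the frequency with which particular letters occur at particular positions. First I convert the list into a matrix, where column A1 means "A at position 1", B1 means "B at position 1", and so on: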

     A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4
[1,]  0  1  0  0  0  1  0  0  0  0  1  0  0  0  0  1
[2,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  1  0
[3,]  1  0  0  0  0  0  0  1  0  0  0  1  0  1  0  0
[4,]  1  0  0  0  0  0  1  0  0  1  0  0  0  1  0  0
[5,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  0  1
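The question does not show how the strings become this 0/1 matrix. A minimal base-R sketch of one way to do it, assuming every string has length 4 and uses only the letters A to D; the helper vector `cols` is a hypothetical name, and the fifth string 'ABBD' is inferred from row 5 of the matrix:

```r
# Equal-length strings; the fifth one is inferred from row 5 of the matrix above
example.list <- c('BBCD', 'ABBC', 'ADDB', 'ACBB', 'ABBD')

# Column names in the question's order: A1 B1 C1 D1 A2 B2 ... D4
# (assumes length-4 strings over the letters A-D)
cols <- paste0(c("A", "B", "C", "D"), rep(1:4, each = 4))

# Label each character with its position ("B1", "B2", ...) and set the
# matching columns to 1; sapply gives one column per string, so transpose
mat <- t(sapply(strsplit(example.list, ""),
                function(ch) as.integer(cols %in% paste0(ch, seq_along(ch)))))
colnames(mat) <- cols

print(mat)
```

Printing `mat` should reproduce the 5 x 16 matrix shown above.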

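Now I want the frequency of every combination of columns, i.e. the number of rows in which both columns are 1. Some examples: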

A1 : B2 = 2
A1 : B3 = 3
B1 : B2 = 1
.. etc
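Each of these counts is simply the number of rows in which both columns are 1. A small check, where `pair_count` is a hypothetical helper and the matrix is rebuilt from the five strings implied by its rows:

```r
# Rebuild the 5 x 16 matrix (fifth string inferred from the matrix's row 5)
example.list <- c('BBCD', 'ABBC', 'ADDB', 'ACBB', 'ABBD')
cols <- paste0(c("A", "B", "C", "D"), rep(1:4, each = 4))
mat <- t(sapply(strsplit(example.list, ""),
                function(ch) as.integer(cols %in% paste0(ch, seq_along(ch)))))
colnames(mat) <- cols

# Hypothetical helper: count rows where columns a and b are both 1
pair_count <- function(mat, a, b) sum(mat[, a] == 1 & mat[, b] == 1)

pair_count(mat, "A1", "B2")  # 2
pair_count(mat, "A1", "B3")  # 3
pair_count(mat, "B1", "B2")  # 1
```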

Best answer

Assuming your matrix is named mat:

# get all vars present in each row
present <- lapply(seq(nrow(mat)), function(i) names(which(mat[i,] == 1)))
# get all pairs (requires the gtools package to be installed)
all.pairs <- gtools::combinations(n = ncol(mat), r = 2, colnames(mat))
# count times pairs appear
count <- apply(all.pairs, 1, function(x){
  there <- lapply(x, function(y) sapply(present, `%in%`, x = y))
  sum(Reduce(`&`, there))
})

cbind(all.pairs, count)[count > 0,]

#                 count
#  [1,] "A1" "B2" "2"  
#  [2,] "A1" "B3" "3"  
#  [3,] "A1" "B4" "2"  
#  [4,] "A1" "C2" "1"  
#  [5,] "A1" "C4" "1"  
#  [6,] "A1" "D2" "1"  
#  [7,] "A1" "D3" "1"  
#  [8,] "A1" "D4" "1"  
#  [9,] "B1" "B2" "1"  
# [10,] "B1" "C3" "1"  
# [11,] "B1" "D4" "1"  
# [12,] "B2" "B3" "2"  
# [13,] "B2" "C3" "1"  
# [14,] "B2" "C4" "1"  
# [15,] "B2" "D4" "2"  
# [16,] "B3" "B4" "1"  
# [17,] "B3" "C2" "1"  
# [18,] "B3" "C4" "1"  
# [19,] "B3" "D4" "1"  
# [20,] "B4" "C2" "1"  
# [21,] "B4" "D2" "1"  
# [22,] "B4" "D3" "1"  
# [23,] "C3" "D4" "1"  
# [24,] "D2" "D3" "1" 
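As an alternative to looping over pairs (a different technique from the answer's, not part of it), the matrix cross product `crossprod(mat)`, i.e. `t(mat) %*% mat`, counts co-occurrences for every pair of columns in one step. A sketch that reshapes its upper triangle into a pair/count listing; the names `co`, `idx` and `pairs` are illustrative:

```r
# Rebuild the 5 x 16 matrix (fifth string inferred from the matrix's row 5)
example.list <- c('BBCD', 'ABBC', 'ADDB', 'ACBB', 'ABBD')
cols <- paste0(c("A", "B", "C", "D"), rep(1:4, each = 4))
mat <- t(sapply(strsplit(example.list, ""),
                function(ch) as.integer(cols %in% paste0(ch, seq_along(ch)))))
colnames(mat) <- cols

# co[i, j] = number of rows where columns i and j are both 1
co <- crossprod(mat)

# each unordered pair once (upper triangle, diagonal excluded), non-zero only
idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)
a <- rownames(co)[idx[, "row"]]
b <- colnames(co)[idx[, "col"]]

# put each pair in alphabetical order, then sort the rows
pairs <- data.frame(var1 = pmin(a, b), var2 = pmax(a, b), count = co[idx])
pairs <- pairs[order(pairs$var1, pairs$var2), ]
rownames(pairs) <- NULL
print(pairs)
```

With the five-row matrix this should give the same 24 pairs and counts as the answer. The diagonal of `co` holds single-column totals, which is why it is excluded.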

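Edit: to also include reversed pairs, so that both A1:B2 and B2:A1 appear, define all.pairs as follows (this also yields self-pairs such as A1:A1, whose count is just the number of rows containing that column):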

all.pairs <- expand.grid(colnames(mat), colnames(mat))
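For this ordered variant, the same cross-product idea can be combined with `expand.grid`: look up every ordered pair of column names in `crossprod(mat)`. A sketch assuming `stringsAsFactors = FALSE` so the names can index the matrix directly; `co` and `ordered.pairs` are illustrative names, and self-pairs and zero counts are dropped:

```r
# Rebuild the 5 x 16 matrix (fifth string inferred from the matrix's row 5)
example.list <- c('BBCD', 'ABBC', 'ADDB', 'ACBB', 'ABBD')
cols <- paste0(c("A", "B", "C", "D"), rep(1:4, each = 4))
mat <- t(sapply(strsplit(example.list, ""),
                function(ch) as.integer(cols %in% paste0(ch, seq_along(ch)))))
colnames(mat) <- cols

co <- crossprod(mat)  # symmetric, so co["A1", "B2"] equals co["B2", "A1"]

# every ordered pair of column names, kept as character
all.pairs <- expand.grid(var1 = colnames(mat), var2 = colnames(mat),
                         stringsAsFactors = FALSE)
# index co by name using a two-column character matrix
all.pairs$count <- co[cbind(all.pairs$var1, all.pairs$var2)]

# drop self-pairs (A1:A1) and pairs that never co-occur
ordered.pairs <- all.pairs[all.pairs$var1 != all.pairs$var2 & all.pairs$count > 0, ]
head(ordered.pairs)
```

Because `co` is symmetric, every pair appears in both directions with the same count.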

The original question, "R: Frequency of all column combinations", is on Stack Overflow: https://stackoverflow.com/questions/52370283/
