r - Counting the number of ordered pairs in a matrix in R

Tags: r matrix permutation

Given a matrix m as follows (each row is a permutation of 1-5):

#       [,1] [,2] [,3] [,4] [,5]
#  [1,]    1    5    2    4    3
#  [2,]    2    1    4    3    5
#  [3,]    3    4    1    2    5
#  [4,]    4    1    3    2    5
#  [5,]    4    3    1    2    5
#  [6,]    1    4    2    3    5
#  [7,]    4    3    2    5    1
#  [8,]    4    1    3    5    2
#  [9,]    1    2    3    4    5
# [10,]    4    3    2    1    5

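I want to know how many times each element 1-5 comes before each other element within a row, counted over all rows (i.e. considering all possible pairs).

For example, for the pair (1, 5), 1 comes before 5 in 9 of the rows. As another example, for the pair (3, 1), 3 comes before 1 in 4 of the rows. I would like the same count for every possible pair across all rows, that is: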
# (1, 2), (1, 3), (1, 4), (1, 5)
# (2, 1), (2, 3), (2, 4), (2, 5)
# (3, 1), (3, 2), (3, 4), (3, 5)
# (4, 1), (4, 2), (4, 3), (4, 5)
# (5, 1), (5, 2), (5, 3), (5, 4)

m <- structure(c(1L, 2L, 3L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 5L, 1L, 4L, 
1L, 3L, 4L, 3L, 1L, 2L, 3L, 2L, 4L, 1L, 3L, 1L, 2L, 2L, 3L, 3L, 
2L, 4L, 3L, 2L, 2L, 2L, 3L, 5L, 5L, 4L, 1L, 3L, 5L, 5L, 5L, 5L, 
5L, 1L, 2L, 5L, 5L), .Dim = c(10L, 5L))
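
As a sanity check on the two counts quoted above, here is a minimal brute-force sketch. The helper name `count_before` is hypothetical (it does not appear in the original post), and the sketch assumes each value occurs at most once per row. It is only meant to pin down what is being counted, not to be efficient.

```r
# First matrix from the question (each row is a permutation of 1-5)
m <- structure(c(1L, 2L, 3L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 5L, 1L, 4L,
1L, 3L, 4L, 3L, 1L, 2L, 3L, 2L, 4L, 1L, 3L, 1L, 2L, 2L, 3L, 3L,
2L, 4L, 3L, 2L, 2L, 2L, 3L, 5L, 5L, 4L, 1L, 3L, 5L, 5L, 5L, 5L,
5L, 1L, 2L, 5L, 5L), .Dim = c(10L, 5L))

# count_before() is a hypothetical helper, not from the original post.
# It counts the rows in which 'a' sits in an earlier column than 'b'.
count_before <- function(x, a, b) {
    pos_a <- apply(x, 1, function(r) match(a, r))  # column of 'a' per row (NA if absent)
    pos_b <- apply(x, 1, function(r) match(b, r))  # column of 'b' per row (NA if absent)
    sum(pos_a < pos_b, na.rm = TRUE)
}

count_before(m, 1, 5)  # 9
count_before(m, 3, 1)  # 4
```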

How can I do this efficiently in R?

Edit

How would you do the same thing for this matrix?
#       [,1] [,2] [,3] [,4] [,5]
#  [1,]    3    4    1    5    0
#  [2,]    1    2    5    3    0
#  [3,]    3    5    0    0    0
#  [4,]    4    5    0    0    0
#  [5,]    3    4    1    5    2
#  [6,]    3    1    2    0    0
#  [7,]    4    1    5    2    0
#  [8,]    4    3    5    2    0
#  [9,]    5    2    0    0    0
# [10,]    5    4    2    0    0

m <- structure(c(3, 1, 3, 4, 3, 3, 4, 4, 5, 5, 4, 2, 5, 5, 4, 1, 1, 
3, 2, 4, 1, 5, 0, 0, 1, 2, 5, 5, 0, 2, 5, 3, 0, 0, 5, 0, 2, 2, 
0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0), .Dim = c(10L, 5L))

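Best answer

Knowing that (1) there are no duplicates in any row, (2) the 0s in each row are all grouped at the end, and (3) nrow(m) is 2-3 orders of magnitude larger than ncol(m), we can loop over the columns, searching for occurrences of the two specific numbers and skipping unnecessary computation once the 0s are reached: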

ff = function(x, a, b)
{
    ia = rep_len(NA_integer_, nrow(x)) # positions of 'a' in each row
    ib = rep_len(NA_integer_, nrow(x)) # positions of 'b' in each row
    notfound0 = seq_len(nrow(x))  # rows in which no 0 has been reached yet
    for(j in seq_len(ncol(x))) {
        xj = x[notfound0, j]
        if(!length(xj)) break  # every row has reached its 0s; nothing left to scan

        ia[notfound0[xj == a]] = j
        ib[notfound0[xj == b]] = j

        notfound0 = notfound0[xj != 0L]  # drop rows whose 0s start at column j
    }

    i = ia < ib ## is 'a' before 'b'?

    ## return both a - b and b - a; no need to repeat computations
    data.frame(a = c(a, b), 
               b = c(b, a), 
               n = c(sum(i, na.rm = TRUE), sum(!i, na.rm = TRUE)))
}

On the edited m:
ff(m, 3, 2)
#  a b n
#1 3 2 3
#2 2 3 1
ff(m, 5, 1)
#  a b n
#1 5 1 0
#2 1 5 4

For all pairs:
xtabs(n ~ a + b, 
      do.call(rbind, 
              combn(5, 2, function(x) ff(m, x[1], x[2]), 
                    simplify = FALSE)))
#   b
#a   1 2 3 4 5
#  1 0 4 1 0 4
#  2 0 0 1 0 1
#  3 3 3 0 2 4
#  4 3 4 1 0 5
#  5 0 5 1 1 0
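
For a matrix this small, the same table can be cross-checked without calling `ff` once per pair. The following is a minimal sketch, assuming each value occurs at most once per row; the names `vals`, `pos`, and `counts` are illustrative and not part of the original answer.

```r
# Edited matrix from the question (0s pad the end of each row)
m <- structure(c(3, 1, 3, 4, 3, 3, 4, 4, 5, 5, 4, 2, 5, 5, 4, 1, 1,
3, 2, 4, 1, 5, 0, 0, 1, 2, 5, 5, 0, 2, 5, 3, 0, 0, 5, 0, 2, 2,
0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0), .Dim = c(10L, 5L))

vals <- 1:5
k <- seq_along(vals)

# pos[r, i]: column at which vals[i] occurs in row r, NA if it is absent
pos <- t(apply(m, 1, function(row) match(vals, row)))

# counts[i, j]: number of rows in which vals[i] comes before vals[j];
# rows missing either value compare to NA and are dropped by na.rm
counts <- sapply(k, function(j)
    sapply(k, function(i) sum(pos[, i] < pos[, j], na.rm = TRUE)))
dimnames(counts) <- list(a = vals, b = vals)
counts
#    b
# a   1 2 3 4 5
#   1 0 4 1 0 4
#   2 0 0 1 0 1
#   3 3 3 0 2 4
#   4 3 4 1 0 5
#   5 0 5 1 1 0
```

This builds an nrow(m) x 5 position matrix up front, so it trades the early exit on 0s for simplicity.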

And it seems to be tolerable at a larger scale:
set.seed(007)
MAT = do.call(rbind, combinat::permn(8))[sample(1e4), ]
MAT[sample(length(MAT), length(MAT)*0.4)] = 0L #40% 0s
MAT = t(apply(MAT, 1, function(x) c(x[x != 0L], rep_len(0L, sum(x == 0L)))))
dim(MAT)
#[1] 10000     8

## including colonel's answer for a quick comparison
colonel = function(x, a, b)
{
    i = (which(!t(x - b)) - which(!t(x - a))) > 0L
    data.frame(a = c(a, b), b = c(b, a), n = c(sum(i), sum(!i)))
} 

microbenchmark::microbenchmark(ff(MAT, 7, 2), colonel(MAT, 7, 2))
#Unit: milliseconds
#               expr      min       lq     mean   median       uq       max neval cld
#      ff(MAT, 7, 2) 3.795003 3.908802 4.500453 3.972138 4.096377 45.926679   100   b
# colonel(MAT, 7, 2) 2.156941 2.231587 2.423053 2.295794 2.404894  3.775516   100  a 
#There were 50 or more warnings (use warnings() to see the first 50)

So, simply translating the approach into a loop proves to be efficient enough. More 0s should also further reduce the computation time.
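
The warnings reported at the end of the benchmark output are likely recycling warnings from `colonel()`: once some rows lack `a` or `b`, its two `which()` calls return vectors of different lengths, so its counts on `MAT` may not be reliable. Below is a minimal sketch of a zero-tolerant variant, under the assumption that only rows containing both values should be compared. The name `colonel_safe` is hypothetical and not from either answer.

```r
# Edited matrix from the question (0s pad the end of each row)
m <- structure(c(3, 1, 3, 4, 3, 3, 4, 4, 5, 5, 4, 2, 5, 5, 4, 1, 1,
3, 2, 4, 1, 5, 0, 0, 1, 2, 5, 5, 0, 2, 5, 3, 0, 0, 5, 0, 2, 2,
0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0), .Dim = c(10L, 5L))

# colonel_safe() is a hypothetical variant: it keeps only the rows that
# contain both 'a' and 'b', so the two which() results line up row by row.
colonel_safe <- function(x, a, b) {
    keep <- rowSums(x == a) > 0 & rowSums(x == b) > 0
    xs <- x[keep, , drop = FALSE]
    # In t(xs), linear indices grow along each original row, so the
    # difference is (position of b) - (position of a) within that row.
    i <- (which(!t(xs - b)) - which(!t(xs - a))) > 0L
    data.frame(a = c(a, b), b = c(b, a), n = c(sum(i), sum(!i)))
}

colonel_safe(m, 3, 2)$n  # 3 1
colonel_safe(m, 5, 1)$n  # 0 4
```

The extra `rowSums()` passes add work, so the timings above would not carry over to this variant.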

Regarding r - counting the number of ordered pairs in a matrix in R, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/38241773/
