r - Count all row and column transitions between categories/values in a matrix, ignoring order

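I have a matrix or data frame and want to count the total number of transitions between values (ignoring the direction of the transition), both along rows and along columns. Ideally the result would also include possible transitions that never actually occur. A small example: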
mat <- matrix(c(2, 1, 2, 1, 3, 1, 2, 1, 2), nrow = 3)

     [,1] [,2] [,3]
[1,]    2    1    2
[2,]    1    3    1
[3,]    2    1    2

The desired result looks something like this:

cat1 cat2 n
   1    1 0
   1    2 8
   1    3 4
   2    2 0
   2    3 0
   3    3 0

For example, the total of four "1 - 3" transitions comes from the 1-3-1 in the second column plus the 1-3-1 in the second row.

Any help is much appreciated!

Best answer

Here is one approach:

library(dplyr)


left_to_right_transitions <- function(m)
{
    # Assemble a two column matrix that contains every left-to-right transition.
    nc <- ncol(m)
    matrix(
        c(m[, 1:(nc -1)], m[, 2:nc]),
        ncol = 2,
        dimnames = list(NULL, c('cat1', 'cat2'))
    )
}


count_transitions <- function(m)
{
    nr <- nrow(m)
    nc <- ncol(m)
    num.categories <- length(unique(as.vector(m)))

    # Create the transpose of the original matrix and column-reversed copies of both.
    mt <- t(m)
    m.right.to.left <- m[, nc:1]
    mt.right.to.left <- mt[, nr:1]

    # Assemble a two column matrix that contains every transition that occurs.
    transitions <- rbind(
        left_to_right_transitions(m),
        left_to_right_transitions(m.right.to.left),
        left_to_right_transitions(mt),
        left_to_right_transitions(mt.right.to.left)
    )

    # Count the total number of transitions for each kind that occurs.
    count <-
        transitions %>%
        as.data.frame %>%
        filter(cat1 <= cat2) %>%
        group_by(cat1, cat2) %>%
        count

    # Join `count` to a table of all possible transitions to get the full count table.
    # Note that this assumes the categories are labeled 1:num.categories.
    combn(num.categories + 1, 2) %>%
        t %>%
        as.data.frame %>%
        rename(cat1 = V1, cat2 = V2) %>%
        mutate(cat2 = cat2 - 1) %>%
        left_join(count, by = c('cat1', 'cat2')) %>%
        mutate(
            n = ifelse(is.na(n), 0, n),
            # Remove double counting of transitions with no-state change:
            n = ifelse(cat1 == cat2, n/2, n)
        )
}

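The idea is to write a function that builds a two-column matrix containing every left-to-right transition in the input matrix m. That function can then be applied to mirror reflections of m to obtain the right-to-left, top-to-bottom, and bottom-to-top transitions. We then row-bind the four transition matrices and use a few dplyr functions to remove the double counting of transitions and to count how many transitions of each kind occur. Finally, we join the table of transition counts to a complete table of all possible transitions.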
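The least obvious step is probably the table of all possible transitions, so here is a minimal sketch of that step in isolation, using base R only. It enumerates the pairs with cat1 <= cat2 through the same combn() trick, and it also shows one possible way to recode arbitrary labels to 1:k so that the labeling assumption noted in the code comment holds. The names k, m_raw, labels and m_recoded are illustrative assumptions and do not appear in the answer's code.

```r
# Enumerate every unordered pair (cat1 <= cat2) of k categories labeled 1:k.
k <- 3
all_pairs <- t(combn(k + 1, 2))        # all pairs i < j drawn from 1:(k + 1)
all_pairs[, 2] <- all_pairs[, 2] - 1   # shift j down by one, so i <= j within 1:k
colnames(all_pairs) <- c("cat1", "cat2")
all_pairs
#      cat1 cat2
# [1,]    1    1
# [2,]    1    2
# [3,]    1    3
# [4,]    2    2
# [5,]    2    3
# [6,]    3    3

# Assumption: if the categories are not already labeled 1:k, recode them first.
# m_raw is a made-up matrix with labels 10, 20 and 50.
m_raw <- matrix(c(20, 10, 20, 10, 50, 10, 20, 10, 20), nrow = 3)
labels <- sort(unique(as.vector(m_raw)))
m_recoded <- matrix(match(m_raw, labels), nrow = nrow(m_raw))
```

After recoding, labels[i] maps a recoded category i back to its original value, so the cat1 and cat2 columns of a count table could be translated back if needed.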

Now let's apply count_transitions to a few examples:

set.seed(1)
m1 <- matrix(c(2, 1, 2, 1, 3, 1, 2, 1, 2), nrow = 3)
m2 <- matrix(sample(1:4, size = 16, replace = TRUE), nrow = 4)
m3 <- matrix(sample(1:9, size = 1e6, replace = TRUE), nrow = 1e3)

m1
#      [,1] [,2] [,3]
# [1,]    2    1    2
# [2,]    1    3    1
# [3,]    2    1    2
count_transitions(m1)
#   cat1 cat2 n
# 1    1    1 0
# 2    1    2 8
# 3    1    3 4
# 4    2    2 0
# 5    2    3 0
# 6    3    3 0

m2
#      [,1] [,2] [,3] [,4]
# [1,]    2    1    3    3
# [2,]    2    4    1    2
# [3,]    3    4    1    4
# [4,]    4    3    1    2
count_transitions(m2)
#    cat1 cat2 n
# 1     1    1 2
# 2     1    2 3
# 3     1    3 3
# 4     1    4 4
# 5     2    2 1
# 6     2    3 2
# 7     2    4 3
# 8     3    3 1
# 9     3    4 4
# 10    4    4 1
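
The tables above can be cross-checked with an independent sketch that uses base R only. It is not the answer's method: instead of reflecting the matrix, it pairs each cell with its right-hand and lower neighbor, orders each pair with pmin()/pmax() so direction is ignored, and lets table() on factors supply the zero counts. The function name count_transitions_base is hypothetical, and the sketch assumes numeric category values.

```r
# Hypothetical helper, not part of the answer above; assumes numeric categories.
count_transitions_base <- function(m) {
    nr <- nrow(m)
    nc <- ncol(m)
    # Each cell paired with its right-hand neighbor, then with the cell below it.
    from <- c(m[, -nc], m[-nr, ])
    to   <- c(m[, -1], m[-1, ])
    # Put the smaller value first so that 1-3 and 3-1 fall into the same cell.
    lev  <- sort(unique(as.vector(m)))
    cat1 <- factor(pmin(from, to), levels = lev)
    cat2 <- factor(pmax(from, to), levels = lev)
    # table() on factors keeps combinations that never occur, with count 0.
    res <- as.data.frame(table(cat1 = cat1, cat2 = cat2),
                         responseName = "n", stringsAsFactors = FALSE)
    res$cat1 <- as.numeric(as.character(res$cat1))
    res$cat2 <- as.numeric(as.character(res$cat2))
    # Keep only the pairs with cat1 <= cat2 and sort them like the tables above.
    res <- res[res$cat1 <= res$cat2, ]
    res <- res[order(res$cat1, res$cat2), ]
    rownames(res) <- NULL
    res
}

m1 <- matrix(c(2, 1, 2, 1, 3, 1, 2, 1, 2), nrow = 3)
count_transitions_base(m1)
```

Because every adjacent pair is visited only once here, transitions without a state change are not double counted and no halving step is needed.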

The count_transitions function also seems to be reasonably fast:

library(microbenchmark)
microbenchmark(count_transitions(m3), times = 10)
# Unit: milliseconds
#                   expr      min       lq     mean  median       uq      max neval
#  count_transitions(m3) 333.3395 334.3705 338.0282 335.945 337.0059 359.5586    10

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/49807661/
