r - Lift value calculation

Tags: r matrix data-mining

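I have a (symmetric) adjacency matrix that was created from the co-occurrence of names (e.g. Greg, Mary, Sam, Tom) in newspaper articles (e.g. a, b, c, d). See below.

How can I calculate the lift value for the non-zero matrix elements (http://en.wikipedia.org/wiki/Lift_(data_mining))?

I would be interested in an efficient implementation that could also be used for very large matrices (e.g. a million non-zero elements).

I appreciate any help.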

# Load package
library(Matrix)

# Data
A <- new("dgTMatrix"
    , i = c(2L, 2L, 2L, 0L, 3L, 3L, 3L, 1L, 1L)
    , j = c(0L, 1L, 2L, 0L, 1L, 2L, 3L, 1L, 3L)
    , Dim = c(4L, 4L)
    , Dimnames = list(c("Greg", "Mary", "Sam", "Tom"), c("a", "b", "c", "d"))
    , x = c(1, 1, 1, 1, 1, 1, 1, 1, 1)
    , factors = list()
)

# > A
# 4 x 4 sparse Matrix of class "dgTMatrix"
#      a b c d
# Greg 1 . . .
# Mary . 1 . 1
# Sam  1 1 1 .
# Tom  . 1 1 1

# One mode projection of the data 
# (i.e. final adjacency matrix, which is the basis for the lift value calculation)
A.final <- tcrossprod(A)

# > A.final
# 4 x 4 sparse Matrix of class "dsCMatrix"
#      Greg Mary Sam Tom
# Greg    1    .   1   .
# Mary    .    2   1   2
# Sam     1    1   3   2
# Tom     .    2   2   3

Best answer

The following may help you, but it is certainly not the most efficient implementation.

ComputeLift <- function(data, projection){
    # Initialize a matrix to store the results.
    lift <- matrix(NA, nrow=nrow(projection), ncol=ncol(projection))
    # Select all pairs in the projection matrix
    for(i in seq_len(nrow(projection))){
        for(j in seq_len(ncol(projection))){
            # The probability to observe both names in the same article is the
            # number of articles where the names appear together divided by the
            # total number of articles
            pAB <- projection[i,j]/ncol(data)
            # The probability for a name to appear in an article is the number of
            # articles where the name appears divided by the total number of articles.
            # Use the `data` argument here rather than the global `A`.
            pA <- sum(data[i,])/ncol(data)
            pB <- sum(data[j,])/ncol(data)
            # The lift is computed as the probability to observe both names in an
            # article divided by the product of the probabilities to observe each name.
            lift[i,j] <- pAB/(pA*pB)
        }
    }
    lift
}

ComputeLift(data=A, projection=A.final)
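Since the question asks about very large matrices, here is a minimal sketch of a vectorized variant that avoids the double loop and keeps the result sparse. It relies on the identity lift[i,j] = n * projection[i,j] / (count[i] * count[j]), which is the loop formula above with the common factors of n (the number of articles) cancelled, so the whole matrix can be obtained by scaling the rows and columns of the projection with a diagonal matrix. The name ComputeLiftSparse is hypothetical and not part of the original answer, and the sketch assumes a 0/1 incidence matrix with names in rows and articles in columns. The example data is rebuilt with sparseMatrix() so that the block runs on its own.

```r
library(Matrix)

# Same example data as in the question, rebuilt with 1-based indices.
A <- sparseMatrix(
    i = c(3, 3, 3, 1, 4, 4, 4, 2, 2),
    j = c(1, 2, 3, 1, 2, 3, 4, 2, 4),
    x = 1,
    dims = c(4, 4),
    dimnames = list(c("Greg", "Mary", "Sam", "Tom"), c("a", "b", "c", "d"))
)
A.final <- tcrossprod(A)

# Hypothetical helper (an assumption, not from the original answer).
ComputeLiftSparse <- function(data, projection){
    n <- ncol(data)              # total number of articles
    counts <- rowSums(data)      # number of articles in which each name appears
    inv <- 1 / counts
    inv[!is.finite(inv)] <- 0    # names that never appear keep an all-zero row
    D <- Diagonal(x = inv)
    # Scaling rows and columns by 1/count only touches stored entries,
    # so the zero pattern of the projection is preserved.
    lift <- n * (D %*% projection %*% D)
    dimnames(lift) <- dimnames(projection)
    lift
}

lift <- ComputeLiftSparse(data = A, projection = A.final)
lift
```

For the example data the non-zero entries should agree with the loop version, for instance 4/3 for the pair Greg and Sam. I have not benchmarked this on a matrix with a million non-zero elements, so treat the efficiency claim as an expectation rather than a measurement.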

Regarding r - lift value calculation, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26130475/
