I have looked at the Matrix package and its slides. I am trying to understand the intuition and meaning behind the slots of the dgCMatrix class. I understand that

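@i gives the zero-based row indices of the non-zero entries in the matrix.
@j gives the zero-based column indices of the non-zero entries in the matrix.
@x gives the non-zero elements at the (i,j) positions.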

However, I do not understand the meaning of the pointer @p. The documentation says

numeric (integer-valued) vector of pointers, one for each column (or row), to the initial (zero-based) index of elements in the column (or row).

This is not very informative. In the "Details" section of the same page, they explain a bit more

If i or j is missing then p must be a non-decreasing integer vector whose first element is zero. It provides the compressed, or “pointer” representation of the row or column indices, whichever is missing. The expanded form of p, rep(seq_along(dp),dp) where dp <- diff(p), is used as the (1-based) row or column indices.

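This is definitely not intuitive to me. Could someone provide a simple explanation of what p represents? I have created a minimal working example, but feel free to create a new one.

Minimal working example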

library(Matrix)

# Define non-zero values and their (1-based) row/col indices
i_indeces <- c(1, 3, 4, 6, 8, 9)
j_indeces <- c(2, 9, 6, 3, 9, 10)
values <- c(60, 20, 10, 40, 30, 50)
# Create the sparse matrix
A <- sparseMatrix(
    i=i_indeces,
    j=j_indeces,
    x=values,
    dims=c(10, 20)
)

where

> str(A)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:6] 0 5 3 2 7 8
  ..@ p       : int [1:21] 0 0 1 2 2 2 3 3 3 5 ...
  ..@ Dim     : int [1:2] 10 20
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x       : num [1:6] 60 40 10 20 30 50
  ..@ factors : list()

and

> A
10 x 20 sparse Matrix of class "dgCMatrix"

 [1,] . 60  . . .  . . .  .  . . . . . . . . . . .
 [2,] .  .  . . .  . . .  .  . . . . . . . . . . .
 [3,] .  .  . . .  . . . 20  . . . . . . . . . . .
 [4,] .  .  . . . 10 . .  .  . . . . . . . . . . .
 [5,] .  .  . . .  . . .  .  . . . . . . . . . . .
 [6,] .  . 40 . .  . . .  .  . . . . . . . . . . .
 [7,] .  .  . . .  . . .  .  . . . . . . . . . . .
 [8,] .  .  . . .  . . . 30  . . . . . . . . . . .
 [9,] .  .  . . .  . . .  . 50 . . . . . . . . . .
[10,] .  .  . . .  . . .  .  . . . . . . . . . . .

Note

As far as I understand, rep(seq_along(diff(A@p)), diff(A@p)) is a rearranged (sorted) form of j_indeces, but I still do not understand what it means.
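
The observation in this note can be checked directly. The following is a minimal sketch that rebuilds the matrix from the example above and expands p the way the documentation describes; the variable names dp and expanded are only illustrative.

```r
library(Matrix)

# Same data as in the minimal working example
i_indeces <- c(1, 3, 4, 6, 8, 9)
j_indeces <- c(2, 9, 6, 3, 9, 10)
values    <- c(60, 20, 10, 40, 30, 50)
A <- sparseMatrix(i = i_indeces, j = j_indeces, x = values, dims = c(10, 20))

# diff(p) is the number of stored entries in each column
dp <- diff(A@p)

# Repeat each column number once per stored entry in that column
expanded <- rep(seq_along(dp), dp)

# Compare with the column indices that were supplied, sorted ascending
print(expanded)
print(all(expanded == sort(j_indeces)))
```

The expanded vector lists one (1-based) column index per stored value, in the same order as the entries of @i and @x.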

Best answer

I finally figured it out! I am posting the answer for future reference. Look at the matrix A

 [1,] . 60  . . .  . . .  .  . . . . . . . . . . .
 [2,] .  .  . . .  . . .  .  . . . . . . . . . . .
 [3,] .  .  . . .  . . . 20  . . . . . . . . . . .
 [4,] .  .  . . . 10 . .  .  . . . . . . . . . . .
 [5,] .  .  . . .  . . .  .  . . . . . . . . . . .
 [6,] .  . 40 . .  . . .  .  . . . . . . . . . . .
 [7,] .  .  . . .  . . .  .  . . . . . . . . . . .
 [8,] .  .  . . .  . . . 30  . . . . . . . . . . .
 [9,] .  .  . . .  . . .  . 50 . . . . . . . . . .
[10,] .  .  . . .  . . .  .  . . . . . . . . . . .

The attribute p

> A@p
 [1] 0 0 1 2 2 2 3 3 3 5 6 6 6 6 6 6 6 6 6 6 6

is essentially a cumulative count of the non-zero elements, taken column by column. It is constructed like this

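By convention, the first element is always 0 (it is the zero-based offset in @i and @x at which the entries of the first column begin), so p = [0]
Next, starting at the top-left corner of the matrix (i.e. [1, 1]), we go through the columns from leftmost to rightmost. For each column we add its number of non-zero elements to a "counter" (initially 0) and append the updated counter to p.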
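- Column 1 has no non-zero elements, so we add 0 to the counter: p = [0, 0]
- Column 2 has one non-zero element (60), so we add 1 to the counter: p = [0, 0, 0+1] = [0, 0, 1]
- Column 3 has one non-zero element (40), so p = [0, 0, 1, 1+1] = [0, 0, 1, 2]
- Column 4 has no non-zero elements, so p = [0, 0, 1, 2, 2+0] = [0, 0, 1, 2, 2]
- Column 5 has no non-zero elements, so p = [0, 0, 1, 2, 2, 2]
- Column 6 has one non-zero element (10), so p = [0, 0, 1, 2, 2, 2, 3]
- Column 7 has no non-zero elements, so p = [0, 0, 1, 2, 2, 2, 3, 3]
- Column 8 has no non-zero elements, so p = [0, 0, 1, 2, 2, 2, 3, 3, 3]
- Column 9 has two non-zero elements (20 and 30), so p = [0, 0, 1, 2, 2, 2, 3, 3, 3, 5]
- Column 10 has one non-zero element (50), so p = [0, 0, 1, 2, 2, 2, 3, 3, 3, 5, 6]
- Columns 11 to 20 contain only zeros, so we append [6, 6, 6, 6, 6, 6, 6, 6, 6, 6]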
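
The steps above can be sketched in a few lines of R. This is an illustration, not how the Matrix package builds the slot internally. It assumes the supplied (i, j) pairs contain no duplicates and no explicit zeros, so that counting the supplied column indices is the same as counting non-zero elements. The names counts_per_column and p_manual are made up for this example.

```r
library(Matrix)

# Same data as in the question
i_indeces <- c(1, 3, 4, 6, 8, 9)
j_indeces <- c(2, 9, 6, 3, 9, 10)
values    <- c(60, 20, 10, 40, 30, 50)
A <- sparseMatrix(i = i_indeces, j = j_indeces, x = values, dims = c(10, 20))

# Number of supplied entries in each of the 20 columns
counts_per_column <- tabulate(j_indeces, nbins = ncol(A))

# Leading 0 followed by the running total of those counts
p_manual <- c(0L, cumsum(counts_per_column))

# Compare the hand-built vector with the slot stored in the matrix
print(p_manual)
print(all(p_manual == A@p))
```

Because p is a running total, it always has one more element than the matrix has columns, and its last element equals the number of stored values, length(A@x).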

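This gives us the p we were looking for. The intuition behind it is that p is a counter, going from left to right, of how many non-zero elements the columns contain so far.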
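
The counter is also what makes p useful as a "pointer": consecutive elements of p delimit the slice of @i and @x that belongs to one column. The sketch below illustrates this with a hypothetical helper, column_entries, which is not part of the Matrix package and assumes its argument is a dgCMatrix.

```r
library(Matrix)

# Same matrix as in the question
A <- sparseMatrix(
    i = c(1, 3, 4, 6, 8, 9),
    j = c(2, 9, 6, 3, 9, 10),
    x = c(60, 20, 10, 40, 30, 50),
    dims = c(10, 20)
)

# Hypothetical helper: rows and values stored for column k (1-based)
column_entries <- function(M, k) {
    start <- M@p[k]      # zero-based offset of the first entry of column k
    end   <- M@p[k + 1]  # zero-based offset one past its last entry
    idx   <- seq_len(end - start) + start  # 1-based positions in @i and @x
    data.frame(row = M@i[idx] + 1L, value = M@x[idx])
}

print(column_entries(A, 9))  # the two entries of column 9
print(column_entries(A, 1))  # an empty column yields zero rows
```

This also shows why the leading 0 is needed: the slice for column 1 starts at offset p[1] = 0, so without it the first column would have no starting position.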

This question and answer, "R Matrix package: Meaning of the attributes in the dgCMatrix class for sparse matrices", come from Stack Overflow: https://stackoverflow.com/questions/59545714/
