python - Efficiently slicing a triangular sparse matrix

I have a sparse triangular matrix (e.g. a distance matrix). In practice, this will be a distance matrix larger than 1M x 1M with high sparsity.

import numpy as np
import scipy.sparse
from scipy.sparse import csr_matrix
X = csr_matrix([
      [1, 2, 3, 3, 1],
      [0, 1, 3, 3, 2],
      [0, 0, 1, 1, 3],
      [0, 0, 0, 1, 3],
      [0, 0, 0, 0, 1],
])

I want to subset this matrix into another triangular distance matrix. The indices may be in a different order and/or repeated.

idx = np.matrix([1,2,4,2])
X2 = X[idx.T, idx]

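This can produce a matrix that is not triangular: some values are missing from the upper triangle, and some values are duplicated in the lower triangle.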

>>> X2.toarray()
array([[1, 3, 2, 3],
       [0, 1, 3, 1],
       [0, 0, 1, 0],
       [0, 1, 3, 1]])
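
A quick way to confirm that the subset has lost its triangular shape is to count the entries strictly below the diagonal. The following is a minimal sketch on the example above; it uses a plain `np.array` index with broadcasting in place of `np.matrix`, which is an assumption made here for illustration and selects the same rows and columns.

```python
import numpy as np
import scipy.sparse as sp

X = sp.csr_matrix([
    [1, 2, 3, 3, 1],
    [0, 1, 3, 3, 2],
    [0, 0, 1, 1, 3],
    [0, 0, 0, 1, 3],
    [0, 0, 0, 0, 1],
])
idx = np.array([1, 2, 4, 2])

# Outer (row x column) selection: a column vector of row indices
# broadcast against a 1-D array of column indices.
X2 = X[idx[:, None], idx]

# Count nonzero entries strictly below the main diagonal (k=-1).
n_lower = sp.tril(X2, k=-1, format="csr").count_nonzero()

# The subset is upper triangular only if nothing sits below the diagonal.
is_upper = n_lower == 0
print(n_lower, is_upper)
```

For this example two entries land below the diagonal, so the check reports that the result is not upper triangular.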

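How can I get the correct upper triangular matrix as efficiently as possible? Currently, I mirror the matrix before subsetting and then reduce the subset to its upper triangle, but this does not feel particularly efficient because it requires, at the very least, duplicating all entries.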

# use transpose method, see https://stackoverflow.com/a/58806735/2340703
X = X + X.T - scipy.sparse.diags(X.diagonal())
X2 = X[idx.T, idx]
X2 = scipy.sparse.triu(X2, k=0, format="csr")

>>> X2.toarray()
array([[1., 3., 2., 3.],
       [0., 1., 3., 1.],
       [0., 0., 1., 3.],
       [0., 0., 0., 1.]])
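
To make the copying cost of the mirroring approach concrete, the sketch below compares the number of nonzero entries before and after symmetrizing the example matrix. It only illustrates the overhead and is not a benchmark; the variable names are chosen here for illustration.

```python
import numpy as np
import scipy.sparse as sp

X = sp.csr_matrix([
    [1, 2, 3, 3, 1],
    [0, 1, 3, 3, 2],
    [0, 0, 1, 1, 3],
    [0, 0, 0, 1, 3],
    [0, 0, 0, 0, 1],
])

# Mirror the upper triangle into the lower one; the diagonal is added
# twice by X + X.T, so subtract it once.
X_sym = X + X.T - sp.diags(X.diagonal())

nnz_before = X.count_nonzero()
nnz_after = X_sym.count_nonzero()
n_diag = np.count_nonzero(X.diagonal())

# Every off-diagonal entry is stored twice after mirroring.
print(nnz_before, nnz_after, 2 * nnz_before - n_diag)
```

Every off-diagonal entry ends up stored twice, so for a very large matrix the mirrored copy needs close to double the memory before any slicing happens.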

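Best answer

Here is an approach that does not involve mirroring the data and instead manipulates the sparse indices to arrive at the desired result: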

import numpy as np
import scipy.sparse as sp

X2 = X[idx.T, idx]

# Extract indices and data (this is essentially COO format)
i, j, data = sp.find(X2)

# Generate indices with elements moved to upper triangle
ij = np.vstack([
  np.where(i > j, j, i),
  np.where(i > j, i, j)
])

# Remove duplicate elements
ij, ind = np.unique(ij, axis=1, return_index=True)

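# Re-build the matrix; pass the shape explicitly so that empty trailing
# rows/columns are not dropped when the shape is inferred from the indices
X2 = sp.coo_matrix((data[ind], ij), shape=X2.shape).tocsr()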
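
The steps above can be wrapped in a reusable function and checked against the mirroring approach from the question. This is a sketch under a few assumptions: the name `slice_upper_triangular` is hypothetical, the input is upper triangular, and duplicates are removed with a 1-D linear key (`row * n + col`) rather than `np.unique(..., axis=1)`, a swapped-in variant that may be faster on large inputs but has not been measured here.

```python
import numpy as np
import scipy.sparse as sp


def slice_upper_triangular(X, idx):
    """Hypothetical helper: subset an upper-triangular sparse matrix by idx
    (any order, repeats allowed) and return an upper-triangular CSR matrix."""
    idx = np.asarray(idx).ravel()
    X2 = X[idx[:, None], idx]          # outer (row x column) selection

    i, j, data = sp.find(X2)           # COO-style triplets of nonzero entries

    # Move every entry into the upper triangle: row = min, col = max.
    rows = np.minimum(i, j)
    cols = np.maximum(i, j)

    # Encode each (row, col) pair as a single integer and keep the first
    # occurrence of every pair.
    keys = rows.astype(np.int64) * X2.shape[1] + cols
    _, ind = np.unique(keys, return_index=True)

    return sp.coo_matrix(
        (data[ind], (rows[ind], cols[ind])), shape=X2.shape
    ).tocsr()


X = sp.csr_matrix([
    [1, 2, 3, 3, 1],
    [0, 1, 3, 3, 2],
    [0, 0, 1, 1, 3],
    [0, 0, 0, 1, 3],
    [0, 0, 0, 0, 1],
])
idx = np.array([1, 2, 4, 2])

result = slice_upper_triangular(X, idx)

# Reference result from the mirroring approach in the question.
X_sym = X + X.T - sp.diags(X.diagonal())
expected = sp.triu(X_sym[idx[:, None], idx], k=0, format="csr")

same = np.array_equal(result.toarray(), expected.toarray())
print(same)
```

When a pair appears both above and below the diagonal, only one of the two values is kept. That is harmless for a symmetric distance matrix, where both values are equal.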

Regarding python - Efficiently slicing a triangular sparse matrix, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65563193/
