python - How to diagonalize a sparse csr 1D matrix (vector) in Python?

[Short version]

Is there an equivalent of numpy.diagflat() in scipy.sparse? Or is there a way to "flatten" a sparse matrix once it has been made dense?

[Long version]

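I have a sparse matrix (mathematically a vector), x_f, that I need to diagonalize, i.e. create a square matrix with the values of the x_f vector on its diagonal.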

x_f
Out[59]: 
<35021x1 sparse matrix of type '<class 'numpy.float64'>'
    with 47 stored elements in Compressed Sparse Row format>

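I have tried 'diags' from the scipy.sparse module. (I have also tried 'spdiags', but it is just a fancier version of 'diags', which I don't need.) I have tried every combination of [csr or csc format], [original or transposed vector] and [.todense() or .toarray()], but I keep getting the error: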

ValueError: Different number of diagonals and offsets.

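For sparse.diags, the default offset is 0, and all I want is to put the numbers on the main diagonal (which is the default), so getting this error means it is not working the way I intended.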

Here are examples of the original and the transposed vector, using .todense() and .toarray() respectively:

x_f_original.todense()
Out[72]: 
matrix([[  0.00000000e+00],
        [  0.00000000e+00],
        [  0.00000000e+00],
        ..., 
        [  0.00000000e+00],
        [  1.03332178e-17],
        [  0.00000000e+00]])

x_f_transposed.toarray()
Out[83]: 
array([[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
          0.00000000e+00,   1.03332178e-17,   0.00000000e+00]])

The following code works, but takes about 15 seconds to run:

x_f_diag = sparse.csc_matrix(np.diagflat(x_f.todense()))

Does anyone have ideas on how to make this more efficient, or a better way to do it?

[Disclaimer]

This is my first question here. I hope I did it right, and I apologize for anything that is unclear.

Best answer

In [106]: x_f = sparse.random(1000,1, .1, 'csr')
In [107]: x_f
Out[107]: 
<1000x1 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>

If I turn it into a 1-D dense array, I can use it in sparse.diags.

In [108]: M1=sparse.diags(x_f.A.ravel()).tocsr()
In [109]: M1
Out[109]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>

Or I can make it a (1,1000) array and pass a list as the offsets:

In [110]: M2=sparse.diags(x_f.T.A,[0]).tocsr()
In [111]: M2
Out[111]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>

diags takes a dense diagonal, not a sparse one. The diagonal is stored as is, zeros included, so I added a further .tocsr() to remove the zeros.

In [113]: sparse.diags(x_f.T.A,[0])
Out[113]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 1000 stored elements (1 diagonals) in DIAgonal format>
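The same effect can be seen in a small sketch. The array `d` and its contents are made up for illustration; the `format='csr'` argument asks `sparse.diags` for CSR output directly, and the explicit `eliminate_zeros()` call is only a precaution in case the conversion keeps stored zeros.

```python
import numpy as np
from scipy import sparse

# Hypothetical dense diagonal: 1000 entries, only 3 of them nonzero.
d = np.zeros(1000)
d[[2, 15, 26]] = [1.5, -2.0, 3.25]

D = sparse.diags(d)                 # DIA format, stores the whole diagonal
C = sparse.diags(d, format='csr')   # same matrix, converted to CSR
C.eliminate_zeros()                 # drop any zeros that were stored explicitly

print(D.format, D.data.shape)       # DIA keeps a (1, 1000) data block
print(C.format, C.nnz)              # CSR keeps only the nonzero entries
```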

So either way, I match the shape of the diagonals to the offsets (a scalar, or a list of length 1).
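To make the shape matching concrete, here is a minimal sketch with a made-up 4-element vector `x` standing in for x_f. As far as I can tell, a 2-D array is read as a sequence of diagonals, one per row, so a (4, 1) array with a scalar offset is rejected, while a flat array with a scalar offset or a (1, 4) array with a one-element offset list is accepted.

```python
import numpy as np
from scipy import sparse

# Hypothetical small column vector, standing in for x_f.
x = sparse.csr_matrix(np.array([[0.0], [2.0], [0.0], [5.0]]))

col = x.toarray()        # shape (4, 1): read as 4 diagonals of length 1
row = x.T.toarray()      # shape (1, 4): read as 1 diagonal of length 4
flat = col.ravel()       # shape (4,):   a single 1-D diagonal

try:
    sparse.diags(col)    # several diagonals but only one scalar offset
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

M_flat = sparse.diags(flat).tocsr()      # 1-D diagonal, scalar offset
M_row = sparse.diags(row, [0]).tocsr()   # one diagonal, one-element offset list
```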

Mapping directly to csr (or csc) is probably faster.

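With this column shape, the indices attribute doesn't tell us anything: in CSR format it holds column indices, and every stored element is in column 0.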

In [125]: x_f.indices
Out[125]: 
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...0, 0, 0], dtype=int32)

But converting it to csc (which maps the indptr onto indices) yields the row indices of the stored elements:

In [126]: x_f.tocsc().indices
Out[126]: 
array([  2,  15,  26,  32,  47,  56,  75,  82,  96,  99, 126, 133, 136,
       141, 145, 149, ... 960, 976], dtype=int32)
In [127]: idx=x_f.tocsc().indices

In [128]: M3 = sparse.csr_matrix((x_f.data, (idx, idx)),(1000,1000))
In [129]: M3
Out[129]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>
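The direct mapping can be wrapped in a small helper. This is only a sketch: `sparse_diagflat` is a hypothetical name, not a SciPy function, and it goes through COO format instead of `tocsc().indices` so that the data and index arrays are aligned by construction. The vector is never made dense.

```python
import numpy as np
from scipy import sparse

def sparse_diagflat(v):
    """Put the entries of an (n, 1) or (1, n) sparse vector on an n x n diagonal.

    Hypothetical helper for illustration; not part of scipy.sparse.
    """
    coo = sparse.coo_matrix(v)
    if min(coo.shape) != 1:
        raise ValueError("expected a sparse row or column vector")
    # For a column vector the position is the row index, for a row vector
    # the column index; the other index is always 0, so the sum is safe.
    idx = coo.row + coo.col
    n = max(coo.shape)
    return sparse.csr_matrix((coo.data, (idx, idx)), shape=(n, n))

x_f = sparse.random(1000, 1, density=0.1, format='csr')
M = sparse_diagflat(x_f)       # column vector input
M_t = sparse_diagflat(x_f.T)   # row vector input gives the same matrix
```

For small inputs the result can be compared with `np.diagflat(x_f.toarray())` to check that it matches.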

This thread is based on the Stack Overflow question "How to diagonalize a sparse csr 1D matrix (vector) in Python?": https://stackoverflow.com/questions/43837454/
