python - Removing columns from a CSR-format matrix in Python

Tags: python numpy matrix scipy sparse-matrix

I have a sparse matrix in CSR format (22000x97482), and I want to delete some of its columns (the column indices are stored in a list).

Best answer

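If you have a very large number of columns, generating the full set of column indices can become quite expensive. A somewhat faster alternative is to convert temporarily to COO format: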

import numpy as np
from scipy import sparse

def dropcols_fancy(M, idx_to_drop):
    idx_to_drop = np.unique(idx_to_drop)
    # Boolean mask over every column (np.isin replaces the deprecated np.in1d)
    keep = ~np.isin(np.arange(M.shape[1]), idx_to_drop, assume_unique=True)
    return M[:, np.where(keep)[0]]

def dropcols_coo(M, idx_to_drop):
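    idx_to_drop = np.unique(idx_to_drop)    # sorted, duplicates removed
    C = M.tocoo()
    keep = ~np.isin(C.col, idx_to_drop)     # mask over stored entries only
    data, row, col = C.data[keep], C.row[keep], C.col[keep]
    col = col - idx_to_drop.searchsorted(col)    # decrement column indices
    shape = (C.shape[0], C.shape[1] - len(idx_to_drop))
    # Rebuild the matrix instead of assigning to the private C._shape attribute
    return sparse.coo_matrix((data, (row, col)), shape=shape).tocsr()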
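
The `searchsorted` step in `dropcols_coo` is the least obvious part: each surviving column index has to shift left by the number of dropped columns that sit before it. Here is a minimal sketch of that step on its own, using made-up toy arrays (not taken from the question's data):

```python
import numpy as np

# Hypothetical toy data: dropped columns (sorted, unique) and the column
# indices of the stored entries that survived the mask.
idx_to_drop = np.array([1, 4])
col = np.array([0, 2, 3, 5, 6])    # none of these appears in idx_to_drop

# For each surviving column, count the dropped columns to its left.
shift = idx_to_drop.searchsorted(col)
new_col = col - shift

print(shift)      # [0 1 1 2 2]
print(new_col)    # [0 1 2 3 4]
```

This only works because `idx_to_drop` is sorted, which `np.unique` guarantees.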

Checking equivalence:

m, n, d = 1000, 2000, 20

M = sparse.rand(m, n, format='csr')
idx_to_drop = np.random.randint(0, n, d)

M_drop1 = dropcols_fancy(M, idx_to_drop)
M_drop2 = dropcols_coo(M, idx_to_drop)

print(np.all(M_drop1.toarray() == M_drop2.toarray()))
# True
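
The dense comparison above is fine at this size, but it would need a lot of memory for a matrix as wide as the one in the benchmark. One possible alternative is to compare the two results while they are still sparse. This is a sketch only: `sparse_equal` is a hypothetical helper name, and the small test matrix is invented for illustration.

```python
import numpy as np
from scipy import sparse

def sparse_equal(A, B):
    # Hypothetical helper: True if A and B have the same shape and values.
    # A != B stays sparse; it stores an entry only where the two differ.
    return A.shape == B.shape and (A != B).nnz == 0

# Small illustrative matrix (not the data from the question).
A = sparse.random(50, 200, density=0.05, format='csr', random_state=0)
cols = np.setdiff1d(np.arange(200), [3, 17, 42])    # columns to keep

B1 = A[:, cols]                     # column selection directly on CSR
B2 = A.tocsc()[:, cols].tocsr()     # same selection via CSC

same = sparse_equal(B1, B2)
print(same)
```
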

Benchmark:

In [1]: m, n = 1000, 1000000

In [2]: %%timeit M = sparse.rand(m, n, format='csr')
   ...: dropcols_fancy(M, idx_to_drop)
   ...: 
1 loops, best of 3: 1.11 s per loop

In [3]: %%timeit M = sparse.rand(m, n, format='csr')
   ...: dropcols_coo(M, idx_to_drop)
   ...: 
1 loops, best of 3: 365 ms per loop

This answer on removing columns from a CSR-format matrix in Python comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/23966923/
