python - Creating an ordered matrix through indexing

Tags: python performance numpy indexing vectorization

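I have a problem involving a detector that records the number of photons entering each channel and the time at which each one arrives. For simplicity, assume the channels are 0 to 6. Array A holds the channels, so it is basically a list of indices. I can count the photons, but I can't store the times in a sensible container without looping (the data files are large). So think of array A as the index list and B as the times.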

import numpy as np

A=np.array([3,0,4,2,4,1,6])
#so this just says channel 3 got one photon, channel 0 got one, 
#channel 4 got two, 2 got one, 1 got one, channel 5 never got any so 
#it doesn't show up, and 6 got one.
B=np.array([1.2,1.6,3.,.7,.1,.05,9.])
#so here B are the times and they say (by referencing A) that channel 
#1 got a photon at .05s, channel 0 got its photon at 1.6s, channel 4 
#got a photon at 3s and another at .1s etc.
#I would like to somehow store these times in a coo sparse array or
# perhaps just a regular array that would look like:
C=np.array([[1.6,0],[.05,0],[.7,0],[1.2,0],[.1,3.0],[0,0],[9.,0]])
#the zeros could be nans of course. It would be helpful if each row 
# was ordered from earliest times to latest. This final array is
#of course ordered properly from 0 to 6 in terms of channels down
#the first axis (not in the random order that the index list was)

This isn't a hard problem if you don't care about speed, but unfortunately everything I do lately needs to be fast. Thanks, everyone.

Best answer

Here's a vectorized approach -

import numpy as np
from scipy.sparse import coo_matrix

# Get sorting indices for A
n = len(A)
sidx = A.argsort()

# Use those indices to get sorted A
sA = A[sidx]

# Get shifts going from one group of identical sorted A values to another
shift_mask = np.concatenate(( [True], sA[1:] != sA[:-1] ))

# Get row indices for output array assigning
row_ids = np.zeros(n,dtype=int)
row_ids[shift_mask] = sA[shift_mask]
np.maximum.accumulate(row_ids, out=row_ids)

# Get col indices for output array assigning by using shifting mask
col_ids = intervaled_cumsum(shift_mask,trigger_val=1,start_val=0)

# Setup output sparse matrix and assign values from sorted array B
out = coo_matrix((B[sidx], (row_ids, col_ids)))

The function intervaled_cumsum is taken from here .
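Since the linked definition of intervaled_cumsum is not reproduced in this post, here is a minimal sketch of a stand-in that produces the same kind of column indices for this use case. The name within_group_counter is hypothetical, and the sketch assumes the first element of the mask is True, as it always is for shift_mask.

```python
import numpy as np

def within_group_counter(shift_mask):
    """Hypothetical stand-in for intervaled_cumsum: counts 0, 1, 2, ...
    and restarts at 0 wherever shift_mask is True.
    Assumes shift_mask[0] is True."""
    idx = np.arange(len(shift_mask))
    # For each element, the position of the most recent group start
    last_start = np.maximum.accumulate(np.where(shift_mask, idx, 0))
    return idx - last_start

sA = np.array([0, 1, 2, 2, 3, 4, 4, 4, 6, 6])
shift_mask = np.concatenate(([True], sA[1:] != sA[:-1]))
col_ids = within_group_counter(shift_mask)
print(col_ids)  # [0 0 0 1 0 0 1 2 0 1]
```

Each element's column index is its distance from the start of its own run of equal channel values, which is what the sparse matrix assignment needs.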

Sample run (on a more generic case) -

In [173]: A
Out[173]: array([3, 0, 4, 2, 4, 1, 6, 4, 2, 6])

In [174]: B
Out[174]: array([ 1.2 , 1.6 , 3.  , 0.7 , 0.1 , 0.05, 9.  , 1.5 , 2.9 , 3.1 ])

In [175]: out.toarray()
Out[175]: 
array([[ 1.6 ,  0.  ,  0.  ],
       [ 0.05,  0.  ,  0.  ],
       [ 0.7 ,  2.9 ,  0.  ],
       [ 1.2 ,  0.  ,  0.  ],
       [ 3.  ,  0.1 ,  1.5 ],
       [ 0.  ,  0.  ,  0.  ],
       [ 9.  ,  3.1 ,  0.  ]])
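Note that row 4 of the sample output reads 3., 0.1, 1.5, so the times within a channel are not ordered from earliest to latest as the question requested. One possible variation, offered as a sketch rather than as part of the original answer, is to sort with np.lexsort on (channel, time) instead of A.argsort(). The NaN-filled dense array at the end is also an assumption about what the asker wants in place of zeros.

```python
import numpy as np
from scipy.sparse import coo_matrix

A = np.array([3, 0, 4, 2, 4, 1, 6, 4, 2, 6])
B = np.array([1.2, 1.6, 3.0, 0.7, 0.1, 0.05, 9.0, 1.5, 2.9, 3.1])

# lexsort treats the LAST key as primary: sort by channel, then by time
sidx = np.lexsort((B, A))
sA, sB = A[sidx], B[sidx]

# Column index = position within each run of equal channel values
shift_mask = np.concatenate(([True], sA[1:] != sA[:-1]))
idx = np.arange(len(sA))
col_ids = idx - np.maximum.accumulate(np.where(shift_mask, idx, 0))

# Sparse result; empty slots read back as 0 when densified
dense = coo_matrix((sB, (sA, col_ids))).toarray()

# Dense alternative with NaN marking empty slots
dense_nan = np.full((sA.max() + 1, col_ids.max() + 1), np.nan)
dense_nan[sA, col_ids] = sB

print(dense[4])      # times for channel 4, earliest first
print(dense_nan[5])  # channel 5 received no photons
```

The NaN version avoids confusing an empty slot with a photon that arrived at time 0, at the cost of giving up the sparse representation.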

To explain the part that computes those shifts for the sorted A: we compare one-shifted slices of the sorted A to get a mask that marks the transitions -

In [223]: sA # sorted A
Out[223]: array([0, 1, 2, 2, 3, 4, 4, 4, 6, 6])

In [224]: sA[1:] != sA[:-1]
Out[224]: array([ True,  True, False,  True,  True, False, False,  True, False], dtype=bool)

In [225]: np.concatenate(( [True], sA[1:] != sA[:-1] ))
Out[225]: array([ True,  True,  True, False,  True,  True, False, False,  True, False], dtype=bool)

So, lining this output mask up against the sorted A, it is True everywhere except where an index repeats.
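As a small illustration of what the mask encodes, the sketch below recovers where each run of equal channels starts, how long it is, and which channel it belongs to. The variable names starts, sizes and channels are chosen here for illustration and do not appear in the original answer.

```python
import numpy as np

sA = np.array([0, 1, 2, 2, 3, 4, 4, 4, 6, 6])
shift_mask = np.concatenate(([True], sA[1:] != sA[:-1]))

starts = np.flatnonzero(shift_mask)          # first position of each run
sizes = np.diff(np.append(starts, len(sA)))  # number of photons in each run
channels = sA[starts]                        # channel value of each run

print(starts)    # [0 1 2 4 5 8]
print(sizes)     # [1 1 2 1 3 2]
print(channels)  # [0 1 2 3 4 6]
```

The largest entry of sizes (3 here) is the number of columns the output matrix needs, and channel 5 is absent from channels because it never received a photon.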

A similar question about python - creating an ordered matrix through indexing can be found on Stack Overflow: https://stackoverflow.com/questions/44666539/
