I have numpy arrays of the following form:
rand_pos = [[1,2,2],[2,3,4],[1,2,5],[3,2,1]...] #here, total subarrays = 900
gal_pos = [[2,3,4],[56,6,64],[34,45,65]...] #here, total subarrays ~ 10^6
Right now my program picks one sublist from rand_pos and does the following with it:
pos2=np.array(rand_pos[0])
dist_xyz = np.subtract(pos2,gal_pos)
dist_square_xyz = np.square(dist_xyz)
axis = 1
dist_square_sum = dist_square_xyz.sum(axis)
dist_sqrt = np.sqrt(dist_square_sum)
list_gal_dist_in_sphere = dist_sqrt[abs(dist_sqrt) <=radius]
gal_number = len(list_gal_dist_in_sphere)
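How can I pass in all the sublists of rand_pos and perform this operation for each of them? I know I could loop over rand_pos and send one sublist at a time, but is there another way?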
Best answer
Your best option is probably to use scipy.spatial.cKDTree. To check that it works, let's first rewrite your approach as a function with an explicit for loop:
def count_neighbours(arr1, arr2, rad):
    # Compare squared distances so no square root is needed
    rad2 = rad * rad
    ret = np.empty((len(arr1),), dtype=np.intp)
    for j, point in enumerate(arr1):
        delta = point - arr2   # broadcasts point against every row of arr2
        delta *= delta         # square in place
        dist2 = np.sum(delta, axis=1)
        ret[j] = np.count_nonzero(dist2 <= rad2)
    return ret
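As an aside on the "no loop" part of the question: the loop over arr1 can be replaced by broadcasting, but a full 900 x 10^6 x 3 array of differences would need roughly 20 GB as float64, so some loop usually remains. The sketch below is one possible compromise, not part of the original answer: it broadcasts all of arr1 against one block of arr2 at a time. The name count_neighbours_chunked and the chunk parameter are made up for this illustration.

```python
import numpy as np

def count_neighbours_chunked(arr1, arr2, rad, chunk=1000):
    """Count the points of arr2 within rad of each point of arr1.

    Hypothetical helper: broadcasts every row of arr1 against a block of
    arr2, so the only Python loop left runs over blocks of arr2.
    """
    arr1 = np.asarray(arr1, dtype=float)
    arr2 = np.asarray(arr2, dtype=float)
    rad2 = rad * rad
    ret = np.zeros(len(arr1), dtype=np.intp)
    for start in range(0, len(arr2), chunk):
        block = arr2[start:start + chunk]
        # delta has shape (len(arr1), len(block), 3)
        delta = arr1[:, None, :] - block[None, :, :]
        # squared distance for every (arr1 point, block point) pair
        dist2 = np.einsum('ijk,ijk->ij', delta, delta)
        ret += np.count_nonzero(dist2 <= rad2, axis=1)
    return ret

# Tiny hand-checkable example; chunk=3 forces two blocks
demo_a = np.array([[0, 0, 0], [10, 10, 10]])
demo_b = np.array([[0, 0, 1], [0, 3, 4], [10, 10, 10], [9, 10, 10]])
demo_counts = count_neighbours_chunked(demo_a, demo_b, 2.0, chunk=3)
```

The chunk size trades memory for fewer Python-level iterations. This is still a brute-force method that computes every pairwise distance, so it does not change the overall cost the way a tree does.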
If we now make up some test data:
rand_pos = np.random.rand(900, 3)
gal_pos = np.random.rand(100000, 3) # 10x smaller than OP's data set; the size must be an int, not the float 1e5
We can check that both methods give the same result:
>>> from scipy.spatial import cKDTree
>>> gal_tree = cKDTree(gal_pos)
>>> np.all(np.equal(count_neighbours(rand_pos, gal_pos, 0.1),
... [len(x) for x in gal_tree.query_ball_point(rand_pos, 0.1)]))
True
We can also time both approaches, as well as building the tree:
In [13]: %timeit count_neighbours(rand_pos, gal_pos, 0.1)
1 loops, best of 3: 3.59 s per loop
In [14]: %timeit [len(x) for x in gal_tree.query_ball_point(rand_pos, 0.1)]
1 loops, best of 3: 194 ms per loop
In [15]: %timeit cKDTree(gal_pos)
100 loops, best of 3: 18.7 ms per loop
Even with the real size of your gal_pos, it finishes relatively quickly:
In [16]: gal_pos = np.random.rand(1000000, 3)
In [17]: gal_tree = cKDTree(gal_pos)
In [18]: %timeit cKDTree(gal_pos)
1 loops, best of 3: 274 ms per loop
In [19]: %timeit [len(x) for x in gal_tree.query_ball_point(rand_pos, 0.1)]
1 loops, best of 3: 1.22 s per loop
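Since only the counts are needed here, building a list of indices for every query point is more work than necessary. To the best of my knowledge, SciPy 1.3.0 and later accept a return_length=True argument to query_ball_point that returns the counts directly; treat that version number as an assumption and check your installed SciPy. The following sketch uses small made-up data and a fixed seed purely for illustration, and compares the result against a brute-force count:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)       # fixed seed, illustrative data only
rand_pos = rng.random((50, 3))
gal_pos = rng.random((10_000, 3))
radius = 0.1

gal_tree = cKDTree(gal_pos)

# return_length=True gives the number of neighbours per query point
# instead of a list of their indices (assumes a recent SciPy).
counts = gal_tree.query_ball_point(rand_pos, radius, return_length=True)

# Brute-force reference on the same small data set
d2 = ((rand_pos[:, None, :] - gal_pos[None, :, :]) ** 2).sum(axis=-1)
brute = np.count_nonzero(d2 <= radius * radius, axis=1)

print(np.array_equal(counts, brute))
```

With return_length=True, counts is an integer array with one entry per row of rand_pos, so it can replace the [len(x) for x in ...] list comprehension used above.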
Source: the Stack Overflow question on performing an operation on all subarrays of a numpy array without a for (or similar) loop: https://stackoverflow.com/questions/20927745/