python - Trying to perform an operation on all subarrays inside a numpy array without using a for (or similar) loop

I have numpy arrays of the following form:

rand_pos = [[1,2,2],[2,3,4],[1,2,5],[3,2,1]...]  #here, total subarrays = 900    
gal_pos = [[2,3,4],[56,6,64],[34,45,65]...]      #here, total subarrays ~ 10^6

Now my program picks one sublist out of rand_pos and does the following:

pos2=np.array(rand_pos[0])
dist_xyz = np.subtract(pos2,gal_pos)            
dist_square_xyz = np.square(dist_xyz)
axis = 1
dist_square_sum = dist_square_xyz.sum(axis)
dist_sqrt = np.sqrt(dist_square_sum)
list_gal_dist_in_sphere = dist_sqrt[abs(dist_sqrt) <=radius]
gal_number = len(list_gal_dist_in_sphere)

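How can I pass in all the sublists of rand_pos and perform this operation on every one of them? I know I could loop over rand_pos and send one sublist at a time, but is there another way?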

Best answer

Your best option is probably to use scipy.spatial.cKDTree. To check that it works, let's rewrite your approach as a function with an explicit for loop:

def count_neighbours(arr1, arr2, rad):
    rad2 = rad * rad
    ret = np.empty((len(arr1),), dtype=np.intp)
    for j, point in enumerate(arr1):
        delta = point - arr2
        delta *= delta
        dist2 = np.sum(delta, axis=1)
        ret[j] = np.count_nonzero(dist2 <= rad2)
    return ret

If we now make up some test data:

rand_pos = np.random.rand(900, 3)
gal_pos = np.random.rand(10**5, 3) # 10x smaller than OP's data set; sizes must be ints, not floats like 1e5

We can test both approaches:

>>> from scipy.spatial import cKDTree
>>> gal_tree = cKDTree(gal_pos)
>>> np.all(np.equal(count_neighbours(rand_pos, gal_pos, 0.1),
...                 [len(x) for x in gal_tree.query_ball_point(rand_pos, 0.1)]))
True

And time them:

In [13]: %timeit count_neighbours(rand_pos, gal_pos, 0.1)
1 loops, best of 3: 3.59 s per loop

In [14]: %timeit [len(x) for x in gal_tree.query_ball_point(rand_pos, 0.1)]
1 loops, best of 3: 194 ms per loop

In [15]: %timeit cKDTree(gal_pos)
100 loops, best of 3: 18.7 ms per loop
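
The list comprehension above builds a full list of neighbour indices for every query point only to take its length. As far as I am aware, SciPy 1.3 and later accept a return_length=True argument to query_ball_point that returns the counts directly. A minimal sketch of that variant, using small made-up data and a seeded generator (the sizes and seed are assumptions for illustration, not the timings setup above):

```python
import numpy as np
from scipy.spatial import cKDTree

# Small illustrative data set; sizes and seed are arbitrary assumptions.
rng = np.random.default_rng(0)
rand_pos = rng.random((50, 3))
gal_pos = rng.random((2000, 3))
radius = 0.1

gal_tree = cKDTree(gal_pos)

# With return_length=True the tree returns, for each query point,
# the number of gal_pos points within `radius` instead of index lists.
counts = gal_tree.query_ball_point(rand_pos, radius, return_length=True)

print(counts.shape)  # one count per row of rand_pos
```

Whether this is faster than taking len() of each index list depends on the SciPy version and the data, so it is worth timing on your own arrays.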

Even for the real size of your gal_pos, it finishes relatively quickly:

In [16]: gal_pos = np.random.rand(10**6, 3)

In [17]: gal_tree = cKDTree(gal_pos)

In [18]: %timeit cKDTree(gal_pos)
1 loops, best of 3: 274 ms per loop

In [19]: %timeit [len(x) for x in gal_tree.query_ball_point(rand_pos, 0.1)]
1 loops, best of 3: 1.22 s per loop
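
As an aside, the explicit loop in count_neighbours can also be traded for broadcasting, but a fully broadcast 900 x 10^6 x 3 array of float64 differences would need roughly 21.6 GB, so any such version has to work in blocks. A minimal sketch, assuming a hypothetical helper name count_neighbours_chunked and an arbitrary block size of 64 query points (neither comes from the answer above):

```python
import numpy as np

def count_neighbours_chunked(arr1, arr2, rad, chunk=64):
    """Count, for each row of arr1, the rows of arr2 within distance rad.

    Hypothetical helper: broadcasts `chunk` query points at a time so the
    temporary (chunk, len(arr2), 3) array stays at a manageable size.
    """
    arr1 = np.asarray(arr1, dtype=float)
    arr2 = np.asarray(arr2, dtype=float)
    rad2 = rad * rad  # compare squared distances, no sqrt needed
    ret = np.empty(len(arr1), dtype=np.intp)
    for start in range(0, len(arr1), chunk):
        block = arr1[start:start + chunk]              # shape (c, 3)
        delta = block[:, None, :] - arr2[None, :, :]   # shape (c, n, 3)
        dist2 = (delta ** 2).sum(axis=-1)              # shape (c, n)
        ret[start:start + chunk] = (dist2 <= rad2).sum(axis=1)
    return ret
```

This still loops over blocks and does the same brute-force work as count_neighbours, so I would not expect it to approach the cKDTree timings; it only reduces the number of Python-level iterations.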

This question and answer, python - Trying to perform an operation on all subarrays inside a numpy array without using a for (or similar) loop, come from Stack Overflow: https://stackoverflow.com/questions/20927745/
