python - Fastest way to count identical sub-arrays in an nd array?

Consider the following 2D array A:

2   3   5   7
2   3   5   7
1   7   1   4
5   8   6   0
2   3   5   7

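The first, second, and last rows are identical. I am looking for an algorithm that returns, for each distinct row, the number of rows identical to it (i.e. the number of occurrences of each element). It would be nice if the script could easily be modified to count identical columns as well.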

I currently do this with an inefficient naive algorithm:

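import numpy

A = numpy.array([[2, 3, 5, 7], [2, 3, 5, 7], [1, 7, 1, 4], [5, 8, 6, 0], [2, 3, 5, 7]])
end = len(A)
counted = [False] * end   # rows already counted as duplicates of an earlier row
counts = []               # one count per distinct row, in order of first appearance
for i in range(end):
    if counted[i]:
        continue
    numberID = 1
    for j in range(i + 1, end):
        # O(n^2) pairwise row comparison in pure Python: this is the slow part
        if numpy.array_equal(A[i, :], A[j, :]):
            numberID += 1
            counted[j] = True
    counts.append(numberID)
print(numpy.array(counts))   # [3 1 1]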
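A pure-Python way to avoid the quadratic pairwise comparison is to hash each row. The sketch below is not from the thread; it assumes Python 3.7+, where collections.Counter (a dict subclass) remembers insertion order, so the counts follow the order in which each distinct row first appears.

```python
from collections import Counter

import numpy as np

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# Convert each row to a hashable tuple and count occurrences in one pass.
# Counter keeps first-appearance order (Python 3.7+ dict semantics).
row_counts = Counter(tuple(row) for row in A.tolist())
counts = np.array(list(row_counts.values()))
print(counts)  # [3 1 1]
```

This is linear in the number of rows but still iterates in Python, so the NumPy-based approaches in the answer are usually faster for large arrays. Passing A.T instead of A counts identical columns.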

Expected result:

array([3, 1, 1])  # number of identical rows for each distinct row

My algorithm loops over a NumPy array in plain Python, which makes it inefficient. Thanks for your help.

Best answer

In numpy >= 1.9.0, np.unique has a return_counts keyword argument, which you can combine with the solution here to get the counts:

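import numpy as np
# View each row as one np.void item of (itemsize * ncols) bytes so whole rows compare at once
b = np.ascontiguousarray(A).view(np.dtype((np.void, A.dtype.itemsize * A.shape[1])))
unq_a, unq_cnt = np.unique(b, return_counts=True)
# Convert the unique void items back into rows of A's dtype
unq_a = unq_a.view(A.dtype).reshape(-1, A.shape[1])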

>>> unq_a
array([[1, 7, 1, 4],
       [2, 3, 5, 7],
       [5, 8, 6, 0]])

>>> unq_cnt
array([1, 3, 1])
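In NumPy >= 1.13, np.unique also accepts an axis argument, which makes the np.void view unnecessary and covers the identical-columns case from the question. The sketch below assumes such a NumPy version; the distinct rows or columns come back in sorted order.

```python
import numpy as np

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# Distinct rows and how often each one occurs (axis requires NumPy >= 1.13).
unq_rows, row_cnt = np.unique(A, axis=0, return_counts=True)
# The same call with axis=1 counts identical columns instead.
unq_cols, col_cnt = np.unique(A, axis=1, return_counts=True)

print(unq_rows)
print(row_cnt)  # [1 3 1]
print(col_cnt)  # [1 1 1 1]: every column of this A is distinct
```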

In older numpy versions, you can replicate what np.unique does internally, which looks something like this:

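import numpy as np
# Work on a C-contiguous copy so A itself is not reordered
a_view = np.array(A, order='C', copy=True)
# One np.void item per row, then sort the rows (bytewise)
a_view = a_view.view(np.dtype((np.void,
                               a_view.dtype.itemsize*a_view.shape[1]))).ravel()
a_view.sort()
# True where a sorted row differs from the previous one, i.e. at the start of each run
a_flag = np.concatenate(([True], a_view[1:] != a_view[:-1]))
# Index the SORTED view (not A) and convert back to rows of A's dtype
a_unq = a_view[a_flag].view(A.dtype).reshape(-1, A.shape[1])
# Run start indices plus the total length; their differences are the run lengths
a_idx = np.concatenate(np.nonzero(a_flag) + ([a_view.size],))
a_cnt = np.diff(a_idx)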

>>> a_unq
array([[1, 7, 1, 4],
       [2, 3, 5, 7],
       [5, 8, 6, 0]])

>>> a_cnt
array([1, 3, 1])
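The same sort, flag, and diff idea can be written without the np.void reinterpretation by sorting rows lexicographically with np.lexsort. This is a swapped-in variant rather than the answer's code, and count_rows_lexsort is a hypothetical helper name; it only relies on functions that predate NumPy 1.9.

```python
import numpy as np

def count_rows_lexsort(A):
    """Return (distinct rows, counts) using np.lexsort instead of a void view.

    Hypothetical helper: sort rows lexicographically, flag the first row of
    each run of identical rows, and take differences of the run start indices.
    """
    A = np.asarray(A)
    # np.lexsort treats the LAST key as the primary one, so reverse the
    # columns to make column 0 the primary sort key.
    order = np.lexsort(A.T[::-1])
    A_sorted = A[order]
    # True at the first row of each run of identical rows.
    flag = np.ones(len(A_sorted), dtype=bool)
    flag[1:] = np.any(A_sorted[1:] != A_sorted[:-1], axis=1)
    starts = np.flatnonzero(flag)
    counts = np.diff(np.append(starts, len(A_sorted)))
    return A_sorted[flag], counts

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])
uniq, counts = count_rows_lexsort(A)
print(uniq)
print(counts)  # [1 3 1]
```

Calling count_rows_lexsort(A.T) counts identical columns. Unlike the bytewise void sort, this orders rows by numeric value, so negative numbers and floats sort as expected.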

Regarding "python - Fastest way to count identical sub-arrays in an nd array?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26384719/
