python - Identify duplicate rows in an array and sum the corresponding values in another array

Tags: python arrays numpy duplicates unique

Suppose I have an array of outcomes and an array of probabilities. Some outcomes may be listed more than once. For example:

import numpy as np
x = np.array(([0,0],[1,1],[2,1],[1,1],[2,2]),dtype=int)
p = np.array([0.1,0.2,0.3,0.1,0.2],dtype=float)

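Now I want to list the unique outcomes in x and add up the corresponding probabilities in p for the duplicated outcomes. The result should be the arrays xnew and pnew, defined as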

xnew = np.array(([0,0],[1,1],[2,1],[2,2]),dtype=int)
pnew = np.array([0.1,0.3,0.3,0.2],dtype=float)

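There are some examples of how to get the unique rows (see, e.g., Removing duplicate columns and rows from a NumPy 2D array), but it is not clear to me how to use them to add up the values in another array.

Does anyone have a suggestion? Solutions using numpy are preferred.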

Best answer

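bincount can sum the p array for you; you just need to create a unique id number for each unique row in a (the array of outcomes, x in the question). If you use the sorting approach to identify unique rows, creating the unique ids is very easy. Once you have sorted the rows and generated a diff array, which marks each row that differs from the row before it, you can take the cumulative sum (cumsum) of the diff array to get the ids. For example: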

  x    diff cumsum
[0, 0]  1    1
[0, 0]  0    1
[0, 1]  1    2
[0, 2]  1    3
[1, 0]  1    4
[1, 0]  0    4
[1, 0]  0    4
[1, 0]  0    4
[1, 0]  0    4
[1, 1]  1    5

In code, it looks like this:

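import numpy as np

def unique_rows(a, p):
    # Sort the rows so that identical rows end up next to each other.
    order = np.lexsort(a.T)
    a = a[order]
    # diff[i] is True where row i differs from the previous row (a new group starts).
    diff = np.ones(len(a), dtype=bool)
    diff[1:] = (a[1:] != a[:-1]).any(-1)
    # The cumulative sum gives each group an id; bincount sums p per id.
    sums = np.bincount(diff.cumsum() - 1, p[order])
    return a[diff], sums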
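As a rough check, the function can be run on the data from the question and compared with a shorter alternative. The alternative is not part of the original answer: it assumes a NumPy version in which `np.unique` accepts the `axis` argument (1.13 or later) and uses the inverse indices as group ids. The function is repeated here only so that the block runs on its own.

```python
import numpy as np

def unique_rows(a, p):
    # Same function as in the answer, repeated so this block is self-contained.
    order = np.lexsort(a.T)
    a = a[order]
    diff = np.ones(len(a), dtype=bool)
    diff[1:] = (a[1:] != a[:-1]).any(-1)
    sums = np.bincount(diff.cumsum() - 1, p[order])
    return a[diff], sums

# Data from the question.
x = np.array([[0, 0], [1, 1], [2, 1], [1, 1], [2, 2]], dtype=int)
p = np.array([0.1, 0.2, 0.3, 0.1, 0.2], dtype=float)

rows, sums = unique_rows(x, p)
print(rows)
print(sums)

# Alternative sketch (an assumption, not the answer's method):
# np.unique with axis=0 returns the unique rows, and return_inverse gives,
# for every original row, the index of its unique row.
xnew, inverse = np.unique(x, axis=0, return_inverse=True)
# ravel() keeps this working if a NumPy release returns a 2-D inverse.
pnew = np.bincount(inverse.ravel(), weights=p)
print(xnew)
print(pnew)
```

For this input both approaches return the rows [0,0], [1,1], [2,1], [2,2] with sums close to 0.1, 0.3, 0.3, 0.2. In general the row order can differ: np.lexsort treats the last key as the primary one, so unique_rows orders rows by the last column first, while np.unique(..., axis=0) orders them by the first column first.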

Regarding python - Identify duplicate rows in an array and sum the corresponding values in another array, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29213575/
