python - Replace group values with the group aggregate in numpy/pandas

I have an image in a numpy array X:

array([[ 0.01176471,  0.49019608,  0.01568627],
       [ 0.01176471,  0.49019608,  0.01568627],
       [ 0.00784314,  0.49411765,  0.00784314],
       ..., 
       [ 0.03921569,  0.08235294,  0.10588235],
       [ 0.09411765,  0.14901961,  0.18431373],
       [ 0.10196078,  0.15294118,  0.21568627]])

I ran a clustering algorithm on this array to find similar colors, and now have a second array Y holding the class of each pixel:

array([19, 19, 19, ..., 37, 20, 20], dtype=int32)

What is the fastest, cleanest, most Pythonic way to replace the color of every pixel in a cluster with that cluster's mean color?

I came up with the following code:

import pandas as pd
import numpy as np
<...>
df = pd.DataFrame.from_records(X, columns=list('rgb'))
df['cls'] = Y
mean_colors = df.groupby('cls').mean().values
# as suggested in comments below
# for cls in range(len(mean_colors)):
#    X[Y==cls] = mean_colors[cls]
X = mean_colors[Y]

Is there a way to do this using only pandas or only numpy?

Best answer

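Assuming the labels in Y are the integers 0 to K-1 and every one of them occurs (so that row i of mean_colors is the mean of cluster i), you can index mean_colors with the integer array Y (NumPy's integer-array indexing):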

mean_colors[Y]
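
If some label values never occur in Y, the rows of the groupby result no longer line up with the label values, and mean_colors[Y] would pick the wrong rows or raise an IndexError. A minimal NumPy-only sketch that sidesteps this by first remapping the labels to a dense 0..K-1 range is shown below. The function name replace_with_cluster_means and the small sample arrays are made up for illustration and are not from the original post.

```python
import numpy as np

def replace_with_cluster_means(X, Y):
    """Return an array shaped like X with each row replaced by its cluster mean."""
    # inverse[i] is the position of Y[i] among the sorted unique labels, so it is
    # a dense 0..K-1 index even when the raw label values have gaps.
    labels, inverse = np.unique(Y, return_inverse=True)
    inverse = inverse.ravel()
    # Number of pixels in each cluster.
    counts = np.bincount(inverse, minlength=labels.size)
    # Unbuffered accumulation of the per-cluster color sums.
    sums = np.zeros((labels.size, X.shape[1]), dtype=float)
    np.add.at(sums, inverse, X)
    means = sums / counts[:, None]
    # Broadcast each cluster mean back to the pixels of that cluster.
    return means[inverse]

# Hypothetical sample data: four pixels, two clusters labelled 5 and 9.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.2, 0.4, 0.6],
              [0.4, 0.6, 0.8]])
Y = np.array([5, 5, 9, 9])
result = replace_with_cluster_means(X, Y)
```

Here the first two rows of result become the mean of cluster 5 and the last two the mean of cluster 9, without pandas and without requiring contiguous labels.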

When the same rows are looked up many times, as they are here, np.take can be faster than plain indexing:

np.take(mean_colors,Y,axis=0)

Runtime test:

In [107]: X = np.random.rand(10000,3)

In [108]: Y = np.random.randint(0,100,(10000))

In [109]: np.allclose(np.take(mean_colors,Y,axis=0),mean_colors[Y])
Out[109]: True           # Verify approaches

In [110]: %timeit mean_colors[Y]
1000 loops, best of 3: 280 µs per loop

In [111]: %timeit np.take(mean_colors,Y,axis=0)
10000 loops, best of 3: 63.7 µs per loop
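
The question also asked for a pandas-only route. One possible sketch, not part of the original answer, uses groupby(...).transform("mean"), which returns one row per input row and so skips the separate lookup step. The sample arrays are invented for illustration, and this variant has not been timed against the approaches above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: four pixels, two clusters labelled 5 and 9.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.2, 0.4, 0.6],
              [0.4, 0.6, 0.8]])
Y = np.array([5, 5, 9, 9])

df = pd.DataFrame(X, columns=list("rgb"))
# Group rows by their label in Y; transform("mean") keeps the original row
# order and fills each row with the mean color of its group.
X_mean = df.groupby(Y).transform("mean").to_numpy()
```

Because the grouping key is used directly, this does not depend on the labels being contiguous or starting at zero.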

Source: Stack Overflow, https://stackoverflow.com/questions/35722815/
