python - Update a DataFrame column based on grouping and a condition

Tags: python python-3.x pandas dataframe max

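My data has three columns. I am trying to update the last column, "Val", based on the values in the first two columns.

For each "Cat" group, I want to find the row with the maximum "Range". I then want to set "Val" on that row to the minimum "Val" of the same group.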

Input
    Cat Range Val
0    1    0   1.0
2    1    2   1.5
3    1    3   2.0
5    1    5   9.0
6    2    0   1.5
7    2    5   2.0
8    2   10   0.5
9    2   15   2.8
10   2   20   9.0 

Desired Output (only the rows with index 5 and 10 change):
    Cat Range Val
0    1    0   1.0
2    1    2   1.5
3    1    3   2.0
5    1    5   1.0
6    2    0   1.5
7    2    5   2.0
8    2   10   0.5
9    2   15   2.8
10   2   20   0.5

My basic understanding of pandas suggested the approach below, but it does not work and I can't figure out how to fix it.

df.loc[df.groupby(['Cat'])['Range'].max(), 'Val'] = df.groupby('Cat')['Val'].min()

Best answer

If the comparison should be made on the Val column itself, you can use a lambda function with numpy.where:

import numpy as np
# Within each group, replace the largest Val with the group's smallest Val
f = lambda x: np.where(x == x.max(), x.min(), x)
df['Val'] = df.groupby(['Cat'])['Val'].transform(f)
print(df)
    Cat  Range  Val
0     1      0  1.0
2     1      2  1.5
3     1      3  2.0
5     1      5  1.0
6     2      0  1.5
7     2      5  2.0
8     2     10  0.5
9     2     15  2.8
10    2     20  0.5
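
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same np.where approach. The DataFrame is rebuilt by hand from the input in the question, and the helper name replace_max_with_min is only illustrative, not something from the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's input, keeping its non-contiguous index labels.
df = pd.DataFrame(
    {
        "Cat": [1, 1, 1, 1, 2, 2, 2, 2, 2],
        "Range": [0, 2, 3, 5, 0, 5, 10, 15, 20],
        "Val": [1.0, 1.5, 2.0, 9.0, 1.5, 2.0, 0.5, 2.8, 9.0],
    },
    index=[0, 2, 3, 5, 6, 7, 8, 9, 10],
)


def replace_max_with_min(s):
    # s is the Val column of a single Cat group; every entry equal to the
    # group's maximum is swapped for the group's minimum.
    return np.where(s == s.max(), s.min(), s)


df["Val"] = df.groupby("Cat")["Val"].transform(replace_max_with_min)
print(df)
```

Note that this version never looks at Range. It matches the desired output only because, in this data, the largest Val in each group happens to sit on the row with the largest Range.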

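If the comparison should be made against the maximum of the Range column, use GroupBy.transform. This replaces Val on every row that attains the group's maximum Range: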

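# Boolean mask: True where Range equals the maximum Range of its Cat group
m = df['Range'].eq(df.groupby(['Cat'])['Range'].transform('max'))
# On those rows, set Val to the group's minimum Val (aligned on the index)
df.loc[m, 'Val'] = df.groupby('Cat')['Val'].transform('min')

print(df)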
    Cat  Range  Val
0     1      0  1.0
2     1      2  1.5
3     1      3  2.0
5     1      5  1.0
6     2      0  1.5
7     2      5  2.0
8     2     10  0.5
9     2     15  2.8
10    2     20  0.5
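
Both approaches give the same result on the question's data, so the difference between them is easy to miss. The sketch below uses a small made-up frame (hypothetical data, not from the question) in which the largest Val is not on the row with the largest Range. It uses Series.mask instead of an in-place .loc assignment so that both results can be kept side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the largest Val (9.0) is NOT on the max-Range row.
df = pd.DataFrame({"Cat": [1, 1, 1], "Range": [0, 2, 5], "Val": [9.0, 1.5, 2.0]})

# Approach 1: replace the group's largest Val with the group's smallest Val.
by_val = df.groupby("Cat")["Val"].transform(
    lambda s: np.where(s == s.max(), s.min(), s)
)

# Approach 2: replace Val only where Range equals the group's maximum Range.
at_max_range = df["Range"].eq(df.groupby("Cat")["Range"].transform("max"))
group_min_val = df.groupby("Cat")["Val"].transform("min")
by_range = df["Val"].mask(at_max_range, group_min_val)

print(by_val.tolist())    # [1.5, 1.5, 2.0]
print(by_range.tolist())  # [9.0, 1.5, 1.5]
```

Going by the question's wording (update the row with the maximum Range), the second approach seems to be the one that matches the stated requirement in general.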

This Q&A on updating a DataFrame column based on grouping and a condition comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/65000090/
