I have a data frame that looks like this:
import pandas as pd

df = pd.DataFrame({'a': ['cust1', 'cust1', 'cust1', 'cust1', 'cust2', 'cust2', 'cust2', 'cust2', 'cust3', 'cust3', 'cust3', 'cust3'],
                   'year': [2017, 2018, 2019, 2020, 2017, 2018, 2019, 2020, 2017, 2018, 2019, 2020],
                   'amt': [2, 0, 4, 'NaN', 2, 2, 3, 3, 3, 2, 'NaN', 5]})
        a  year  amt
0   cust1  2017    2
1   cust1  2018    0
2   cust1  2019    4
3   cust1  2020  NaN
4   cust2  2017    2
5   cust2  2018    2
6   cust2  2019    3
7   cust2  2020    3
8   cust3  2017    3
9   cust3  2018    2
10  cust3  2019  NaN
11  cust3  2020    5
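For each group in column "a", I need to check whether the "amt" column contains at least 3 positive values. The resulting data frame should look like this: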
        a  year  amt   cond
0   cust1  2017    2  False
1   cust1  2018    0  False
2   cust1  2019    4  False
3   cust1  2020  NaN  False
4   cust2  2017    2   True
5   cust2  2018    2   True
6   cust2  2019    3   True
7   cust2  2020    3   True
8   cust3  2017    3   True
9   cust3  2018    2   True
10  cust3  2019  NaN   True
11  cust3  2020    5   True
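The following logic applies:
cust1 = False, because it has only 2 positive values (2017, 2019)
cust2 = True, because it has 4 positive values
cust3 = True, because it has 3 positive values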
Best answer
Let's try transform with sum:
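import numpy as np
df = df.replace('NaN', np.nan)  # turn the 'NaN' strings into real missing values
# count the positive values per group in 'a'; True where a group has at least 3
df['cond'] = df.amt.gt(0).groupby(df['a']).transform('sum') > 2
df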
Out[62]:
        a  year  amt   cond
0   cust1  2017  2.0  False
1   cust1  2018  0.0  False
2   cust1  2019  4.0  False
3   cust1  2020  NaN  False
4   cust2  2017  2.0   True
5   cust2  2018  2.0   True
6   cust2  2019  3.0   True
7   cust2  2020  3.0   True
8   cust3  2017  3.0   True
9   cust3  2018  2.0   True
10  cust3  2019  NaN   True
11  cust3  2020  5.0   True
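The same idea can be wrapped in a small helper so that the threshold is a parameter. This is only a sketch: the function name flag_min_positives and its parameters are made up for illustration and are not part of pandas, and it assumes the missing values arrive as the string 'NaN', as in the question. It uses pd.to_numeric with errors='coerce' in place of replace, so the column is numeric before the comparison.

```python
import pandas as pd

# Rebuild the example frame from the question; 'NaN' is a string here.
df = pd.DataFrame({
    'a': ['cust1'] * 4 + ['cust2'] * 4 + ['cust3'] * 4,
    'year': [2017, 2018, 2019, 2020] * 3,
    'amt': [2, 0, 4, 'NaN', 2, 2, 3, 3, 3, 2, 'NaN', 5],
})


def flag_min_positives(frame, group_col='a', value_col='amt', n=3):
    """Hypothetical helper: True on every row whose group has >= n positive values."""
    # Coerce to numbers; anything unparseable becomes a real NaN.
    values = pd.to_numeric(frame[value_col], errors='coerce')
    # NaN > 0 evaluates to False, so missing values are never counted.
    positive_counts = values.gt(0).groupby(frame[group_col]).transform('sum')
    return positive_counts >= n


df['cond'] = flag_min_positives(df)
print(df)
```

Because transform returns a result aligned with the original index, the per-group count is broadcast back to every row, which is why the flag can be assigned directly as a new column.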
Regarding python - how to check for n positive values within a pandas group, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63344382/