I have a df grouped by 'Key'. I want to flag any row in a group whose admit date matches the admit date of another row in the same group.
import pandas as pd

df = pd.DataFrame({'Key': ['10003', '10003', '10003', '10003', '10003', '10003', '10034', '10034'],
                   'Num1': [12, 13, 13, 13, 13, 13, 16, 13],
                   'Num2': [121, 122, 122, 124, 125, 126, 127, 128],
                   'admit': [20120506, 20120508, 20121010, 20121010, 20121010, 20121110, 20120516, 20120520],
                   'discharge': [20120508, 20120508, 20121012, 20121016, 20121023, 20121111, 20120518, 20120522]})
# Parse the integer YYYYMMDD values into datetimes
df['admit'] = pd.to_datetime(df['admit'], format='%Y%m%d')
df['discharge'] = pd.to_datetime(df['discharge'], format='%Y%m%d')
Initial df
Key Num1 Num2 admit discharge
0 10003 12 121 2012-05-06 2012-05-08
1 10003 13 122 2012-05-08 2012-05-08
2 10003 13 122 2012-10-10 2012-10-12
3 10003 13 124 2012-10-10 2012-10-16
4 10003 13 125 2012-10-10 2012-10-23
5 10003 13 126 2012-11-10 2012-11-11
6 10034 16 127 2012-05-16 2012-05-18
7 10034 13 128 2012-05-20 2012-05-22
Desired final df
Key Num1 Num2 admit discharge flag
0 10003 12 121 2012-05-06 2012-05-08 0
1 10003 13 122 2012-05-08 2012-05-08 0
2 10003 13 122 2012-10-10 2012-10-12 1
3 10003 13 124 2012-10-10 2012-10-16 1
4 10003 13 125 2012-10-10 2012-10-23 1
5 10003 13 126 2012-11-10 2012-11-11 0
6 10034 16 127 2012-05-16 2012-05-18 0
7 10034 13 128 2012-05-20 2012-05-22 0
The df is 150 million rows by 400 columns, so I thought .loc might be better suited to this than .apply. But I am open to suggestions.
My code:
df.loc[df.groupby('Key').duplicated(subset='admit'),'flag'] = 1
However, this raises an error telling me to use apply.
AttributeError: Cannot access callable attribute 'duplicated' of 'DataFrameGroupBy' objects, try using the 'apply' method
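Best answer
You need to use apply, with keep=False so that every row sharing an admit date within a group is marked, not only the repeats: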
df.loc[df.groupby('Key').apply(lambda x : x.duplicated(subset='admit',keep=False)).values,'flag']=1
df
Out[300]:
Key Num1 Num2 admit discharge flag
0 10003 12 121 2012-05-06 2012-05-08 NaN
1 10003 13 122 2012-05-08 2012-05-08 NaN
2 10003 13 122 2012-10-10 2012-10-12 1.0
3 10003 13 124 2012-10-10 2012-10-16 1.0
4 10003 13 125 2012-10-10 2012-10-23 1.0
5 10003 13 126 2012-11-10 2012-11-11 NaN
6 10034 16 127 2012-05-16 2012-05-18 NaN
7 10034 13 128 2012-05-20 2012-05-22 NaN
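Since the goal is only to flag repeated (Key, admit) pairs, one possible alternative is to skip the groupby entirely and call DataFrame.duplicated on both columns. This is a sketch of my own and not part of the answer above; it assumes that "same admit date within the same Key" is the whole matching rule, and it uses a trimmed copy of the sample data with only the two columns that matter.

```python
import pandas as pd

# Trimmed copy of the sample data: only the columns used for matching
df = pd.DataFrame({
    'Key': ['10003', '10003', '10003', '10003', '10003', '10003', '10034', '10034'],
    'admit': [20120506, 20120508, 20121010, 20121010, 20121010, 20121110, 20120516, 20120520],
})
df['admit'] = pd.to_datetime(df['admit'], format='%Y%m%d')

# keep=False marks every row of a repeated (Key, admit) pair, not just the later ones.
# astype(int) turns the boolean mask into 0/1 rather than leaving NaN for unflagged rows.
df['flag'] = df.duplicated(subset=['Key', 'admit'], keep=False).astype(int)
print(df)
```

Two things differ from the apply version. The mask comes back on the original index, so it does not depend on the rows already being ordered by Key, whereas taking .values from the groupby result aligns by position and appears to rely on that ordering. It also writes 0 for unflagged rows, which matches the desired output in the question. On a frame with 150 million rows, avoiding a Python-level lambda per group should help, though I have not benchmarked it.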
Regarding python - how to flag a new column using .loc with groupby in pandas, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49143786/