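I want to group values by a specific column (id) and replace all values with the maximum datetime associated with the given ID.
Here is the code I wrote (it does not work):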
file.groupby('data__id')['data__answered_at'].apply(lambda x: x['data__answered_at'] == x['data__answered_at'].max())
Here is a sample of my dataframe:
data__id data__answered_at
1 2019-01-10
1 Na
2 2019-01-12
2 Na
3 Na
4 Na
4 Na
5 Na
5 2019-01-15
Best answer
Use to_datetime with errors='coerce' to convert the non-datetime values to NaT, then use GroupBy.transform to get the maximum per group, and finally replace the missing values with Series.fillna:
df['data__answered_at'] = pd.to_datetime(df['data__answered_at'], errors='coerce')
s = df.groupby('data__id')['data__answered_at'].transform('max')
df['data__answered_at'] = df['data__answered_at'].fillna(s)
print (df)
data__id data__answered_at
0 1 2019-01-10
1 1 2019-01-10
2 2 2019-01-12
3 2 2019-01-12
4 3 NaT
5 4 NaT
6 4 NaT
7 5 2019-01-15
8 5 2019-01-15
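The steps above can be run end to end with the following minimal sketch. It assumes the sample data from the question is built by hand as a DataFrame, with the string 'Na' standing in for the missing dates; the names group_max and filled are only illustrative.

```python
import pandas as pd

# Sample data from the question; 'Na' marks a missing date (assumed to be a plain string)
df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 3, 4, 4, 5, 5],
    "data__answered_at": [
        "2019-01-10", "Na", "2019-01-12", "Na", "Na",
        "Na", "Na", "Na", "2019-01-15",
    ],
})

# Anything that cannot be parsed as a date becomes NaT
df["data__answered_at"] = pd.to_datetime(df["data__answered_at"], errors="coerce")

# Per-group maximum, broadcast back to one value per original row
group_max = df.groupby("data__id")["data__answered_at"].transform("max")

# Fill only the missing rows; groups with no date at all stay NaT
filled = df["data__answered_at"].fillna(group_max)
print(filled)
```

Groups 3 and 4 have no valid date, so their rows remain NaT, as in the output above.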
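Your solution should be rewritten with a lambda function and fillna (inside the groupby, x is already the selected column as a Series, so indexing it again with x['data__answered_at'] fails):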
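f = lambda x: x.fillna(x.max())
# transform returns a result aligned with df's original index, so it can be assigned back directly
df['data__answered_at'] = df.groupby('data__id')['data__answered_at'].transform(f)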
Regarding python - Groupby and conditional replacement, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57093747/