python - Groupby and conditional replacement

I want to group values by a specific column (id) and replace all values with the maximum datetime associated with the given ID.

This is the code I wrote (it does not work):

file.groupby('data__id')['data__answered_at'].apply(lambda x: x['data__answered_at'] == x['data__answered_at'].max())

Here is a sample of my DataFrame:

data__id     data__answered_at
1              2019-01-10
1                  Na 
2              2019-01-12
2                  Na
3                  Na
4                  Na
4                  Na
5                  Na
5              2019-01-15

Best answer

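Use to_datetime with errors='coerce' to convert non-datetime values to NaT, then use GroupBy.transform to get the maximum of each group, and finally use Series.fillna to replace the missing values with that maximum: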

df['data__answered_at'] = pd.to_datetime(df['data__answered_at'], errors='coerce')

s = df.groupby('data__id')['data__answered_at'].transform('max')
df['data__answered_at'] = df['data__answered_at'].fillna(s)
print (df)
   data__id data__answered_at
0         1        2019-01-10
1         1        2019-01-10
2         2        2019-01-12
3         2        2019-01-12
4         3               NaT
5         4               NaT
6         4               NaT
7         5        2019-01-15
8         5        2019-01-15
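
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It assumes the sample data from the question, with the literal string "Na" standing in for the missing entries; the variable names group_max and filled are only illustrative.

```python
import pandas as pd

# Sample data from the question; "Na" is assumed to be a plain string placeholder.
df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 3, 4, 4, 5, 5],
    "data__answered_at": [
        "2019-01-10", "Na", "2019-01-12", "Na", "Na",
        "Na", "Na", "Na", "2019-01-15",
    ],
})

# Unparseable strings such as "Na" become NaT.
df["data__answered_at"] = pd.to_datetime(df["data__answered_at"], errors="coerce")

# Per-group maximum, broadcast back to the shape of the original column.
group_max = df.groupby("data__id")["data__answered_at"].transform("max")

# Fill only the missing rows; groups with no valid date stay NaT.
filled = df["data__answered_at"].fillna(group_max)
print(filled)
```

Groups 3 and 4 contain no valid date at all, so there is nothing to fill them with and they remain NaT.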

Your solution should be rewritten with a lambda function and fillna:

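f = lambda x: x.fillna(x.max())
# group_keys=False keeps the original row index, so the result aligns with df on assignment
df['data__answered_at'] = df.groupby('data__id', group_keys=False)['data__answered_at'].apply(f)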
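
The question's wording ("replace all values") could also be read as overwriting every row with the group's maximum, not only the missing ones. Under that reading, the transformed Series can be assigned directly, without fillna. The sketch below uses a small made-up frame and a hypothetical column name, latest_answered_at, purely for illustration.

```python
import pandas as pd

# Made-up data: group 1 has two different dates and one missing value,
# group 2 has no valid date at all.
df = pd.DataFrame({
    "data__id": [1, 1, 1, 2],
    "data__answered_at": pd.to_datetime(["2019-01-10", "2019-01-08", None, None]),
})

# Every row receives the maximum datetime of its group (hypothetical column name).
df["latest_answered_at"] = (
    df.groupby("data__id")["data__answered_at"].transform("max")
)
print(df)
```

Here the 2019-01-08 row is overwritten with 2019-01-10 as well, which is the difference from the fillna approach.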

Regarding python - Groupby and conditional replacement, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57093747/
