python - Get distinct values by time in pandas using Python

I have a df with the following values:

a   b      time
1   test   2020-01-06 16:49:36.742
2   test   2019-01-07 16:49:36.742
3   test   2015-01-07 16:49:36.742
4   car    2016-01-07 16:49:36.742
5   train  2017-01-07 16:49:36.742
6   train  2012-01-07 16:49:36.742
7   bat    2011-01-07 16:49:36.742
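To make the question reproducible, the table above could be built roughly as follows. This is only a sketch: the question does not show how df was created, so the constructor call and the use of pd.to_datetime are assumptions.

```python
import pandas as pd

# Sample frame from the question; parsing the strings with to_datetime
# gives the time column a datetime dtype (an assumption about the setup).
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7],
    "b": ["test", "test", "test", "car", "train", "train", "bat"],
    "time": pd.to_datetime([
        "2020-01-06 16:49:36.742",
        "2019-01-07 16:49:36.742",
        "2015-01-07 16:49:36.742",
        "2016-01-07 16:49:36.742",
        "2017-01-07 16:49:36.742",
        "2012-01-07 16:49:36.742",
        "2011-01-07 16:49:36.742",
    ]),
})

print(df)
print(df.dtypes)
```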

I want to get the distinct values of b, each with its earliest date [date format: datetime64[ns]].

Like this:

a   b      time
1   test   2015-01-07 16:49:36.742
2   car    2016-01-07 16:49:36.742
3   train  2012-01-07 16:49:36.742
4   bat    2011-01-07 16:49:36.742

Best answer

Use DataFrame.sort_values together with DataFrame.drop_duplicates:

(df.sort_values('time')
   .drop_duplicates('b', keep='first')
   .reset_index(drop=True)
   .assign(a=lambda x: x.index + 1))
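As a runnable sketch of this approach on the sample data (the frame construction and the variable names earliest, by_time and by_position are assumptions for illustration): after sort_values the surviving rows come out in time order, so the optional sort_index step is one way to get back the row order shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7],
    "b": ["test", "test", "test", "car", "train", "train", "bat"],
    "time": pd.to_datetime([
        "2020-01-06 16:49:36.742", "2019-01-07 16:49:36.742",
        "2015-01-07 16:49:36.742", "2016-01-07 16:49:36.742",
        "2017-01-07 16:49:36.742", "2012-01-07 16:49:36.742",
        "2011-01-07 16:49:36.742",
    ]),
})

# Sort ascending by time, then keep the first (earliest) row per value of b.
earliest = df.sort_values("time").drop_duplicates("b", keep="first")

# Rows ordered by time: bat, train, test, car.
by_time = earliest.reset_index(drop=True).assign(a=lambda x: x.index + 1)

# Restoring the original row order first gives: test, car, train, bat.
by_position = (
    earliest.sort_index()
            .reset_index(drop=True)
            .assign(a=lambda x: x.index + 1)
)

print(by_time)
print(by_position)
```

Both results contain the same b/time pairs; only the row order, and therefore the renumbered a column, differs.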

Or use GroupBy.first:

(df.sort_values('time')
   .groupby('b', as_index=False).first()
   .reset_index(drop=True)
   .assign(a=lambda x: x.index + 1))
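One detail worth checking with the groupby variant is row order: groupby sorts the group keys by default, so the result is ordered alphabetically by b rather than by time. The sketch below, with an assumed frame construction and illustrative variable names, compares the default with sort=False.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7],
    "b": ["test", "test", "test", "car", "train", "train", "bat"],
    "time": pd.to_datetime([
        "2020-01-06 16:49:36.742", "2019-01-07 16:49:36.742",
        "2015-01-07 16:49:36.742", "2016-01-07 16:49:36.742",
        "2017-01-07 16:49:36.742", "2012-01-07 16:49:36.742",
        "2011-01-07 16:49:36.742",
    ]),
})

ordered = df.sort_values("time")

# Default sort=True: groups come back in key order (bat, car, test, train).
key_order = ordered.groupby("b", as_index=False).first()

# sort=False: groups keep their order of first appearance in the
# time-sorted frame (bat, train, test, car).
time_order = ordered.groupby("b", as_index=False, sort=False).first()

print(key_order)
print(time_order)
```

Note that first() returns the first non-null value per column within each group, which coincides with the first row here because the sample has no missing values.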


If your data is already sorted by the time column in descending order, you can use:

(df.drop_duplicates('b', keep='last')
   .reset_index(drop=True)
   .assign(a=lambda x: x.index + 1))

or

(df.groupby('b', as_index=False).last()
   .reset_index(drop=True)
   .assign(a=lambda x: x.index + 1))
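The shortcut without sorting only gives the earliest row if, within each value of b, the times really are in descending order. A minimal sketch of checking that precondition before relying on keep='last' (the frame construction and the names is_desc and result are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7],
    "b": ["test", "test", "test", "car", "train", "train", "bat"],
    "time": pd.to_datetime([
        "2020-01-06 16:49:36.742", "2019-01-07 16:49:36.742",
        "2015-01-07 16:49:36.742", "2016-01-07 16:49:36.742",
        "2017-01-07 16:49:36.742", "2012-01-07 16:49:36.742",
        "2011-01-07 16:49:36.742",
    ]),
})

# True only if every group's times are non-increasing from top to bottom.
is_desc = (
    df.groupby("b")["time"]
      .apply(lambda s: s.is_monotonic_decreasing)
      .all()
)

if is_desc:
    # The last row of each group is then its earliest one.
    result = (
        df.drop_duplicates("b", keep="last")
          .reset_index(drop=True)
          .assign(a=lambda x: x.index + 1)
    )
else:
    # Fall back to an explicit sort when the precondition does not hold.
    result = (
        df.sort_values("time")
          .drop_duplicates("b", keep="first")
          .reset_index(drop=True)
          .assign(a=lambda x: x.index + 1)
    )

print(result)
```

For the sample data the check passes, and drop_duplicates keeps the rows in their original order: test, car, train, bat.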


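Output (row order as produced by drop_duplicates with keep='last'; the other variants return the same b/time pairs in a different row order):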

   a      b                     time
0  1   test  2015-01-07 16:49:36.742
1  2    car  2016-01-07 16:49:36.742
2  3  train  2012-01-07 16:49:36.742
3  4    bat  2011-01-07 16:49:36.742

Regarding "python - Get distinct values by time in pandas using Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59633937/
