python-3.x - Remove duplicates based on a condition

Tags: python-3.x pandas

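I have a dataframe with date information, shown below. I want to remove duplicates on code + currentdate under these conditions: 1) if a ['code','currentdate'] pair is duplicated, keep the row with the latest startdate that is less than or equal to currentdate; 2) if a ['code','currentdate'] pair is not duplicated, keep the original row. Thanks!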

>  code        currentdate       startdate            category 
>    a           2018-04-01      2015-04-28         category_z       
>    a           2018-04-01      2015-08-28         category_x     
>    a           2018-04-01      2018-04-17         category_y  
>    a           2018-05-01      2015-04-28         category_z   
>    a           2018-05-01      2015-08-28         category_x   
>    a           2018-05-01      2018-04-17         category_y      
>    b           2018-04-01      2018-08-28         category_x   
>    b           2018-05-01      2018-08-28         category_x  
>    c           2018-04-01      2018-03-17         category_x     
>    c           2018-04-01      2018-04-28         category_y        
>    c           2018-05-01      2018-03-17         category_x     
>    c           2018-05-01      2018-04-28         category_y
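To reproduce the question, the table above can be loaded as below. Reading it with `read_csv(sep=r"\s+")` and parsing both date columns as datetimes are assumptions, since the question does not show how the frame was built or what its dtypes are; with real datetimes, comparisons between `startdate` and `currentdate` are chronological. Counting rows per `(code, currentdate)` pair shows which rule applies to each pair.

```python
import io

import pandas as pd

# Sample data copied from the question's table (loading method is an assumption)
DATA = """\
code currentdate startdate category
a 2018-04-01 2015-04-28 category_z
a 2018-04-01 2015-08-28 category_x
a 2018-04-01 2018-04-17 category_y
a 2018-05-01 2015-04-28 category_z
a 2018-05-01 2015-08-28 category_x
a 2018-05-01 2018-04-17 category_y
b 2018-04-01 2018-08-28 category_x
b 2018-05-01 2018-08-28 category_x
c 2018-04-01 2018-03-17 category_x
c 2018-04-01 2018-04-28 category_y
c 2018-05-01 2018-03-17 category_x
c 2018-05-01 2018-04-28 category_y
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+",
                 parse_dates=["currentdate", "startdate"])

# Rows per (code, currentdate) pair: pairs with more than one row fall under
# rule 1 (pick a startdate), single-row pairs fall under rule 2 (keep as-is).
sizes = df.groupby(["code", "currentdate"]).size()
print(sizes)
```

Here the `a` and `c` pairs are duplicated while both `b` pairs are single rows, which is why `b` keeps its 2018-08-28 startdate even though it lies after currentdate.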

Expected output:

>  code        currentdate       startdate            category      
>    a           2018-04-01      2015-08-28         category_x   
>    a           2018-05-01      2018-04-17         category_y      
>    b           2018-04-01      2018-08-28         category_x   
>    b           2018-05-01      2018-08-28         category_x  
>    c           2018-04-01      2018-03-17         category_x     
>    c           2018-05-01      2018-04-28         category_y
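The two rules can also be written as a single row filter. This is an alternative sketch, not the answer's method: rows in unique pairs pass unconditionally, rows in duplicated pairs pass only if startdate <= currentdate, and sorting by startdate then keeping the last row of each pair selects the latest eligible date. Loading the sample with parsed dates is assumed, as the question does not show dtypes.

```python
import io

import pandas as pd

# Sample data copied from the question's table (loading method is an assumption)
DATA = """\
code currentdate startdate category
a 2018-04-01 2015-04-28 category_z
a 2018-04-01 2015-08-28 category_x
a 2018-04-01 2018-04-17 category_y
a 2018-05-01 2015-04-28 category_z
a 2018-05-01 2015-08-28 category_x
a 2018-05-01 2018-04-17 category_y
b 2018-04-01 2018-08-28 category_x
b 2018-05-01 2018-08-28 category_x
c 2018-04-01 2018-03-17 category_x
c 2018-04-01 2018-04-28 category_y
c 2018-05-01 2018-03-17 category_x
c 2018-05-01 2018-04-28 category_y
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+",
                 parse_dates=["currentdate", "startdate"])

keys = ["code", "currentdate"]

# Rule 2: a row whose (code, currentdate) pair is unique is always kept.
unique_pair = ~df.duplicated(keys, keep=False)
# Rule 1: inside duplicated pairs, only startdate <= currentdate is eligible.
eligible = unique_pair | (df["startdate"] <= df["currentdate"])

result = (
    df[eligible]
    .sort_values("startdate", kind="mergesort")  # stable sort, oldest first
    .drop_duplicates(keys, keep="last")          # latest eligible row per pair
    .sort_index()                                # restore original row order
)
print(result)
```

Unique pairs contain a single row, so `drop_duplicates` never removes them; only the duplicated pairs are reduced.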

Best answer

Use:

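import pandas as pd

m = df.duplicated(['code', 'currentdate'], keep=False)  # rows in duplicated pairs
n = (df[m].sort_values(['code', 'startdate'], ascending=[True, False])
          .query("startdate <= currentdate")  # startdate on or before currentdate
          .drop_duplicates(['code', 'currentdate']))
pd.concat([df[~m], n]).sort_index()  # non-duplicated rows are kept as-is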

   code currentdate  startdate    category
1     a  2018-04-01 2015-08-28  category_x
5     a  2018-05-01 2018-04-17  category_y
6     b  2018-04-01 2018-08-28  category_x
7     b  2018-05-01 2018-08-28  category_x
8     c  2018-04-01 2018-03-17  category_x
11    c  2018-05-01 2018-04-28  category_y
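A self-contained version of the answer's approach follows, assuming the sample is loaded with parsed dates (the question does not show dtypes). It uses `<=` so that a startdate equal to currentdate also qualifies, matching the question's "less than or equal to" condition.

```python
import io

import pandas as pd

# Sample data copied from the question's table (loading method is an assumption)
DATA = """\
code currentdate startdate category
a 2018-04-01 2015-04-28 category_z
a 2018-04-01 2015-08-28 category_x
a 2018-04-01 2018-04-17 category_y
a 2018-05-01 2015-04-28 category_z
a 2018-05-01 2015-08-28 category_x
a 2018-05-01 2018-04-17 category_y
b 2018-04-01 2018-08-28 category_x
b 2018-05-01 2018-08-28 category_x
c 2018-04-01 2018-03-17 category_x
c 2018-04-01 2018-04-28 category_y
c 2018-05-01 2018-03-17 category_x
c 2018-05-01 2018-04-28 category_y
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+",
                 parse_dates=["currentdate", "startdate"])

keys = ["code", "currentdate"]

# True for every row whose (code, currentdate) pair appears more than once
m = df.duplicated(keys, keep=False)

# Duplicated pairs: newest startdate first, drop startdates after currentdate,
# then keep the first (latest eligible) row of each pair.
n = (
    df[m]
    .sort_values(["code", "startdate"], ascending=[True, False])
    .query("startdate <= currentdate")
    .drop_duplicates(keys)
)

# Non-duplicated rows are kept unconditionally (rule 2).
result = pd.concat([df[~m], n]).sort_index()
print(result)
```

One behavior to be aware of: if every startdate in a duplicated pair falls after currentdate, the `query` step removes all of that pair's rows, so the pair disappears from the result entirely. The question does not say what should happen in that case.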

For more on python-3.x - Remove duplicates based on a condition, see the similar question on Stack Overflow: https://stackoverflow.com/questions/57234480/
