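I want to keep only those rows of my DataFrame that belong to a run of consecutive days for each person.
Suppose my DataFrame is defined as: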
import pandas as pd

dic = {'name': ['John', 'John', 'John', 'Susan', 'Susan', 'Susan', 'Susan', 'Mike',
                'Mike', 'Mike'],
       'worked': ['2020-03-12', '2020-03-13', '2020-03-15', '2020-03-16',
                  '2020-03-18', '2020-03-19', '2020-03-20', '2020-03-31',
                  '2020-03-29', '2020-04-01'],
       'paid': [100, 200, 300, 400, 500, 100, 200, 300, 400, 500]}
df = pd.DataFrame(dic)
# Parse the date strings so that date arithmetic works
df['worked'] = pd.to_datetime(df['worked'])
print(df)
which prints:
    name     worked  paid
0   John 2020-03-12   100
1   John 2020-03-13   200
2   John 2020-03-15   300
3  Susan 2020-03-16   400
4  Susan 2020-03-18   500
5  Susan 2020-03-19   100
6  Susan 2020-03-20   200
7   Mike 2020-03-31   300
8   Mike 2020-03-29   400
9   Mike 2020-04-01   500
The output I want looks like this:
    name     worked  paid
0   John 2020-03-12   100
1   John 2020-03-13   200
2  Susan 2020-03-18   500
3  Susan 2020-03-19   100
4  Susan 2020-03-20   200
5   Mike 2020-03-31   300
6   Mike 2020-04-01   500
Best answer
My two cents, using diff:
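# Sort so that each person's dates are in chronological order
df = df.sort_values(['name','worked'])
# True where a row falls exactly one day after the same person's previous row
c = df.groupby("name")['worked'].diff().dt.days.eq(1)
# Keep those rows plus the row just before each of them, then restore the original order
df[c | c.shift(-1, fill_value=False)].sort_index()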
    name     worked  paid
0   John 2020-03-12   100
1   John 2020-03-13   200
4  Susan 2020-03-18   500
5  Susan 2020-03-19   100
6  Susan 2020-03-20   200
7   Mike 2020-03-31   300
9   Mike 2020-04-01   500
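To make the logic above easier to follow, here is a minimal, self-contained sketch of the same idea wrapped in a function. The helper name keep_consecutive_days and its parameters are made up for illustration and are not part of pandas. It is a small variation on the answer rather than the answer itself: instead of shifting the mask, it takes a second per-person diff in the opposite direction, so neither check can look across the boundary between two people. It assumes at most one row per person per day and dates without a time-of-day component.

```python
import pandas as pd


def keep_consecutive_days(df, group_col="name", date_col="worked"):
    """Hypothetical helper: keep rows that have a same-person neighbour
    exactly one day before or one day after."""
    ordered = df.sort_values([group_col, date_col])
    by_person = ordered.groupby(group_col)[date_col]
    # Gap to the previous row of the same person is exactly +1 day
    follows_prev = by_person.diff().dt.days.eq(1)
    # Gap to the next row of the same person is exactly -1 day
    precedes_next = by_person.diff(-1).dt.days.eq(-1)
    # A row survives if it is adjacent to a neighbour on either side
    return ordered[follows_prev | precedes_next].sort_index()


df = pd.DataFrame({
    "name": ["John", "John", "John", "Susan", "Susan", "Susan", "Susan",
             "Mike", "Mike", "Mike"],
    "worked": pd.to_datetime([
        "2020-03-12", "2020-03-13", "2020-03-15", "2020-03-16",
        "2020-03-18", "2020-03-19", "2020-03-20", "2020-03-31",
        "2020-03-29", "2020-04-01"]),
    "paid": [100, 200, 300, 400, 500, 100, 200, 300, 400, 500],
})

result = keep_consecutive_days(df)
print(result)  # rows 0, 1, 4, 5, 6, 7 and 9 remain
```

The result keeps the original index labels, as in the answer. If a fresh 0..n index like the one in the question's desired output is needed, calling reset_index(drop=True) on the result would produce it.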
Regarding python - keeping consecutive days in a DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62559795/