My dataframe has three columns: name, content, and day.
df
content day name
0 first_day 01-01-2017 marcus
1 present 10-01-2017 marcus
2 first_day 01-02-2017 marcus
3 first_day 01-03-2017 marcus
4 absent 05-03-2017 marcus
5 present 20-03-2017 marcus
6 first_day 01-04-2017 bruno
7 present 11-04-2017 bruno
8 first_day 01-05-2017 bruno
9 absent 02-05-2017 bruno
10 first_day 01-06-2017 bruno
11 absent 02-06-2017 bruno
12 payment 09-06-2017 bruno
I am trying to find, month-wise, the users whose rows contain first_day, absent, and present consecutively.
Example output:
content day name absent_after_present
0 first_day 01-01-2017 marcus False
1 first_day 01-02-2017 marcus False
2 first_day 01-03-2017 marcus True
3 first_day 01-04-2017 bruno False
4 first_day 01-05-2017 bruno False
5 first_day 01-06-2017 bruno True
For example, marcus has first_day, absent, and present consecutively on 01-03-2017, 05-03-2017, and 20-03-2017, all within the same month. So the status for marcus in that month should be True.
Best answer
One option is to extract the month from each date and then group by name and month, as shown below.
import pandas as pd
data = pd.DataFrame({'content' : ['first_day','present', 'first_day', 'first_day', 'absent',
'present', 'first_day', 'present', 'first_day', 'absent', 'first_day', 'absent', 'present'],
'day' : ['2017-01-01', '2017-01-10', '2017-02-01', '2017-03-01', '2017-03-05', '2017-03-20',
'2017-04-01', '2017-04-11', '2017-05-01', '2017-05-02', '2017-06-01', '2017-06-02', '2017-06-09'],
'name' : ['marcus', 'marcus', 'marcus', 'marcus', 'marcus', 'marcus', 'bruno', 'bruno', 'bruno',
'bruno', 'bruno', 'bruno', 'bruno']})
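# Parse the dates and extract the month number
data['day'] = pd.to_datetime(data['day'])
data['month'] = data.day.dt.month
# Collect the unique content values and the unique days for each (name, month) group
data_new = pd.DataFrame(data.groupby(['name', 'month'])['content'].unique()).join(pd.DataFrame(data.groupby(['name', 'month'])['day'].unique()), on=['name', 'month'])
# Flag a month as True when it holds exactly three distinct content values
data_new['absent_after_present'] = data_new['content'].apply(lambda x : True if len(x) == 3 and len(set(x)) == 3 else False)
# Keep only the first day and the first content value of each month
data_new['day'] = data_new['day'].apply(lambda x : x[0])
data_new['content'] = data_new['content'].apply(lambda x : x[0])
# Drop the month level from the index, leaving only name
data_new = data_new.droplevel(1)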
data_new
name content day absent_after_present
bruno first_day 2017-04-01 False
bruno first_day 2017-05-01 False
bruno first_day 2017-06-01 True
marcus first_day 2017-01-01 False
marcus first_day 2017-02-01 False
marcus first_day 2017-03-01 True
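Note that the check above only counts distinct values, so any three different content values in a month would be flagged True, regardless of which values they are or the order they appear in. If the exact order first_day, absent, present matters, a stricter variant might look like the sketch below. It is an illustration, not part of the original answer: the names EXPECTED, has_sequence, and result are made up for this example. It keeps the day-first date strings from the question and parses them with an explicit format, and it assumes the last row is present (as in the answer's data) rather than payment (as in the question's table).

```python
import pandas as pd

# Day-first dates as written in the question. The last content value follows
# the answer's data ('present'); the question's table shows 'payment' there.
df = pd.DataFrame({
    "content": ["first_day", "present", "first_day", "first_day", "absent",
                "present", "first_day", "present", "first_day", "absent",
                "first_day", "absent", "present"],
    "day": ["01-01-2017", "10-01-2017", "01-02-2017", "01-03-2017",
            "05-03-2017", "20-03-2017", "01-04-2017", "11-04-2017",
            "01-05-2017", "02-05-2017", "01-06-2017", "02-06-2017",
            "09-06-2017"],
    "name": ["marcus"] * 6 + ["bruno"] * 7,
})

# An explicit format avoids day/month ambiguity for DD-MM-YYYY strings.
df["day"] = pd.to_datetime(df["day"], format="%d-%m-%Y")
df = df.sort_values("day")

# A year-month period keeps the same month of different years apart.
df["month"] = df["day"].dt.to_period("M")

# Hypothetical constant: the ordered run of values we are looking for.
EXPECTED = ["first_day", "absent", "present"]


def has_sequence(contents: pd.Series) -> bool:
    """Return True if EXPECTED occurs as a consecutive run in the group."""
    values = contents.tolist()
    n = len(EXPECTED)
    return any(values[i:i + n] == EXPECTED
               for i in range(len(values) - n + 1))


# One row per (name, month): first content, first day, and the flag.
result = (
    df.groupby(["name", "month"], sort=False)
      .agg(content=("content", "first"),
           day=("day", "first"),
           absent_after_present=("content", has_sequence))
      .reset_index()
)
result = result[["content", "day", "name", "absent_after_present"]]
print(result)
```

With the question's original payment row in place of present, this stricter check would give False for bruno in June, so which data and which rule are intended is worth confirming before relying on either version.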
Regarding "Python dataframe : Seperate rows based on custom condition?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67018896/