I have a DataFrame containing the times when each court is reserved (not available):
import pandas as pd

df = pd.DataFrame(
    [
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T08:00:00', 'reserved_to': '2021-11-15T12:00:00'},
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T15:00:00', 'reserved_to': '2021-11-15T16:00:00'},
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T16:00:00', 'reserved_to': '2021-11-15T21:00:00'},
        {'court_name': 'Court 2', 'reserved_fr': '2021-11-15T20:00:00', 'reserved_to': '2021-11-15T21:00:00'}
    ]
)
| | court_name | reserved_fr | reserved_to |
|---:|:-------------|:--------------------|:--------------------|
| 0 | Court 1 | 2021-11-15T08:00:00 | 2021-11-15T12:00:00 |
| 1 | Court 1 | 2021-11-15T15:00:00 | 2021-11-15T16:00:00 |
| 2 | Court 1 | 2021-11-15T16:00:00 | 2021-11-15T21:00:00 |
| 3 | Court 2 | 2021-11-15T20:00:00 | 2021-11-15T21:00:00 |
Given that every court is open from 7 AM to 11 PM, I want to find out when each court is free.
For example, the courts are free during:
Court 1 2021-11-15 07:00:00 2021-11-15 08:00:00
Court 1 2021-11-15 12:00:00 2021-11-15 15:00:00
Court 1 2021-11-15 21:00:00 2021-11-15 23:00:00
Court 2 2021-11-15 07:00:00 2021-11-15 20:00:00
Court 2 2021-11-15 21:00:00 2021-11-15 23:00:00
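How can I transform the DataFrame into another DataFrame in the format shown above?
Best answer
A solution in which no exact dates are hard-coded for the 7:00 to 23:00 window (the day is taken from the data) is: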
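# expand each reservation into hourly timestamps, stored in one 'date' column
# (lowercase 'h' is used because the uppercase 'H' alias is deprecated since pandas 2.2)
L = [pd.date_range(s, e, freq='h')
     for s, e in df[['reserved_fr', 'reserved_to']].to_numpy()]
df['date'] = L
# one row per reserved hour; drop hours shared by back-to-back reservations
df1 = df.explode('date').drop_duplicates(['court_name', 'date'])
print(df1)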
court_name reserved_fr reserved_to date
0 Court 1 2021-11-15T08:00:00 2021-11-15T12:00:00 2021-11-15 08:00:00
0 Court 1 2021-11-15T08:00:00 2021-11-15T12:00:00 2021-11-15 09:00:00
0 Court 1 2021-11-15T08:00:00 2021-11-15T12:00:00 2021-11-15 10:00:00
0 Court 1 2021-11-15T08:00:00 2021-11-15T12:00:00 2021-11-15 11:00:00
0 Court 1 2021-11-15T08:00:00 2021-11-15T12:00:00 2021-11-15 12:00:00
1 Court 1 2021-11-15T15:00:00 2021-11-15T16:00:00 2021-11-15 15:00:00
1 Court 1 2021-11-15T15:00:00 2021-11-15T16:00:00 2021-11-15 16:00:00
2 Court 1 2021-11-15T16:00:00 2021-11-15T21:00:00 2021-11-15 17:00:00
2 Court 1 2021-11-15T16:00:00 2021-11-15T21:00:00 2021-11-15 18:00:00
2 Court 1 2021-11-15T16:00:00 2021-11-15T21:00:00 2021-11-15 19:00:00
2 Court 1 2021-11-15T16:00:00 2021-11-15T21:00:00 2021-11-15 20:00:00
2 Court 1 2021-11-15T16:00:00 2021-11-15T21:00:00 2021-11-15 21:00:00
3 Court 2 2021-11-15T20:00:00 2021-11-15T21:00:00 2021-11-15 20:00:00
3 Court 2 2021-11-15T20:00:00 2021-11-15T21:00:00 2021-11-15 21:00:00
# add the missing hours between 7:00 and 23:00 as NaN rows
def f(x):
    r = pd.date_range(x.index.min().normalize() + pd.Timedelta(hours=7),
                      x.index.max().normalize() + pd.Timedelta(hours=23), freq='h')
    return x.reindex(r)

s = df1.set_index('date').groupby('court_name')['court_name'].apply(f)
# group consecutive missing hours and take the first and last hour of each group
mask = s.notna()
df = (mask.cumsum()[~mask].reset_index(name='new')
          .groupby(['court_name', 'new'])['level_1']
          .agg(['min', 'max'])
          .reset_index(level=1, drop=True))
# widen each group by 1 hour on both sides, except at the 7:00 and 23:00 boundaries
df['min'] = df['min'].where(df['min'].dt.hour.eq(7), df['min'] - pd.Timedelta(hours=1))
df['max'] = df['max'].where(df['max'].dt.hour.eq(23), df['max'] + pd.Timedelta(hours=1))
print(df)
min max
court_name
Court 1 2021-11-15 07:00:00 2021-11-15 08:00:00
Court 1 2021-11-15 12:00:00 2021-11-15 15:00:00
Court 1 2021-11-15 21:00:00 2021-11-15 23:00:00
Court 2 2021-11-15 07:00:00 2021-11-15 20:00:00
Court 2 2021-11-15 21:00:00 2021-11-15 23:00:00
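The answer above works on an hourly grid, so it relies on every reservation starting and ending on a full hour. As a rough alternative sketch (not part of the original answer), the free slots can also be computed directly from the interval boundaries by walking through each court's reservations in chronological order. The helper name `free_slots`, its parameters `open_hour` and `close_hour`, and the output column names `free_fr` and `free_to` are assumptions made for illustration. The sketch also assumes that reservations lie within the working hours and do not span midnight.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T08:00:00', 'reserved_to': '2021-11-15T12:00:00'},
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T15:00:00', 'reserved_to': '2021-11-15T16:00:00'},
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T16:00:00', 'reserved_to': '2021-11-15T21:00:00'},
        {'court_name': 'Court 2', 'reserved_fr': '2021-11-15T20:00:00', 'reserved_to': '2021-11-15T21:00:00'},
    ]
)


def free_slots(bookings, open_hour=7, close_hour=23):
    """Hypothetical helper: free intervals per court and day.

    Assumes reservations fall inside the working hours and do not span midnight.
    """
    b = bookings.copy()
    b['reserved_fr'] = pd.to_datetime(b['reserved_fr'])
    b['reserved_to'] = pd.to_datetime(b['reserved_to'])
    b['day'] = b['reserved_fr'].dt.normalize()  # midnight of the reservation day

    rows = []
    for (court, day), g in b.groupby(['court_name', 'day']):
        cursor = day + pd.Timedelta(hours=open_hour)    # end of the last covered time
        closing = day + pd.Timedelta(hours=close_hour)
        ordered = g.sort_values('reserved_fr')
        for start, end in zip(ordered['reserved_fr'], ordered['reserved_to']):
            if start > cursor:
                # gap between the previous reservation (or opening) and this one
                rows.append((court, cursor, start))
            cursor = max(cursor, end)  # also handles overlapping reservations
        if cursor < closing:
            # gap between the last reservation and closing time
            rows.append((court, cursor, closing))
    return pd.DataFrame(rows, columns=['court_name', 'free_fr', 'free_to'])


result = free_slots(df)
print(result)
```

For the sample data this yields the same five free slots as listed in the question. Note that a court with no reservations at all on a given day does not appear in the result, because the day is derived from the reservations themselves.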
Regarding "python - Pandas, how to find complementary time ranges?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69930081/