python - Pandas: how to find complementary time ranges?

Tags: python pandas time-series transform

I have a DataFrame containing the times when each court is reserved:

import pandas as pd

df = pd.DataFrame(
    [
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T08:00:00', 'reserved_to': '2021-11-15T12:00:00'},
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T15:00:00', 'reserved_to': '2021-11-15T16:00:00'},
        {'court_name': 'Court 1', 'reserved_fr': '2021-11-15T16:00:00', 'reserved_to': '2021-11-15T21:00:00'},
        {'court_name': 'Court 2', 'reserved_fr': '2021-11-15T20:00:00', 'reserved_to': '2021-11-15T21:00:00'}
    ]
)


|    | court_name   | reserved_fr         | reserved_to         |
|---:|:-------------|:--------------------|:--------------------|
|  0 | Court 1      | 2021-11-15T08:00:00 | 2021-11-15T12:00:00 |
|  1 | Court 1      | 2021-11-15T15:00:00 | 2021-11-15T16:00:00 |
|  2 | Court 1      | 2021-11-15T16:00:00 | 2021-11-15T21:00:00 |
|  3 | Court 2      | 2021-11-15T20:00:00 | 2021-11-15T21:00:00 |

Each court is open from 7 AM to 11 PM, and I want to know when each court is free.

For example, the courts are free during:

Court 1     2021-11-15 07:00:00   2021-11-15 08:00:00
Court 1     2021-11-15 12:00:00   2021-11-15 15:00:00
Court 1     2021-11-15 21:00:00   2021-11-15 23:00:00
Court 2     2021-11-15 07:00:00   2021-11-15 20:00:00
Court 2     2021-11-15 21:00:00   2021-11-15 23:00:00

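How can I transform the DataFrame into another DataFrame in the format above?

Best answer

A solution that fills the 7:00-23:00 window without hard-coding the exact date: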

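# expand each reservation into hourly timestamps, collected in one 'date' column
L = [pd.date_range(s, e, freq='h')
     for s, e in df[['reserved_fr','reserved_to']].to_numpy()]
df['date'] = L

# one row per reserved hour; drop hours shared by back-to-back reservations
df1 = df.explode('date').drop_duplicates(['court_name','date'])
print(df1)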
  court_name          reserved_fr          reserved_to                date
0    Court 1  2021-11-15T08:00:00  2021-11-15T12:00:00 2021-11-15 08:00:00
0    Court 1  2021-11-15T08:00:00  2021-11-15T12:00:00 2021-11-15 09:00:00
0    Court 1  2021-11-15T08:00:00  2021-11-15T12:00:00 2021-11-15 10:00:00
0    Court 1  2021-11-15T08:00:00  2021-11-15T12:00:00 2021-11-15 11:00:00
0    Court 1  2021-11-15T08:00:00  2021-11-15T12:00:00 2021-11-15 12:00:00
1    Court 1  2021-11-15T15:00:00  2021-11-15T16:00:00 2021-11-15 15:00:00
1    Court 1  2021-11-15T15:00:00  2021-11-15T16:00:00 2021-11-15 16:00:00
2    Court 1  2021-11-15T16:00:00  2021-11-15T21:00:00 2021-11-15 17:00:00
2    Court 1  2021-11-15T16:00:00  2021-11-15T21:00:00 2021-11-15 18:00:00
2    Court 1  2021-11-15T16:00:00  2021-11-15T21:00:00 2021-11-15 19:00:00
2    Court 1  2021-11-15T16:00:00  2021-11-15T21:00:00 2021-11-15 20:00:00
2    Court 1  2021-11-15T16:00:00  2021-11-15T21:00:00 2021-11-15 21:00:00
3    Court 2  2021-11-15T20:00:00  2021-11-15T21:00:00 2021-11-15 20:00:00
3    Court 2  2021-11-15T20:00:00  2021-11-15T21:00:00 2021-11-15 21:00:00
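The `drop_duplicates` call matters because `pd.date_range` includes both endpoints by default, so two back-to-back reservations both produce the hour at which they meet (16:00 for Court 1 above). A minimal sketch with toy timestamps, not taken from the answer itself, shows the overlap:

```python
import pandas as pd

# Two back-to-back bookings (toy data): 15:00-16:00 and 16:00-18:00.
a = pd.date_range("2021-11-15 15:00", "2021-11-15 16:00", freq="h")
b = pd.date_range("2021-11-15 16:00", "2021-11-15 18:00", freq="h")

# Both ranges contain 16:00 because start and end are both included.
shared = a.intersection(b)
print(list(shared.strftime("%H:%M")))
```

Without deduplication, the later `reindex` step would fail on the duplicated index label.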

# add the missing hours between 7:00 and 23:00 for each court
def f(x):
    r = pd.date_range(x.index.min().normalize() + pd.Timedelta('7h'),
                      x.index.max().normalize() + pd.Timedelta('23h'), freq='h')
    return x.reindex(r)


s = df1.set_index('date').groupby('court_name')['court_name'].apply(f)

# label each run of missing (free) hours and take its first and last timestamp
mask = s.notna()
df = (mask.cumsum()[~mask].reset_index(name='new')
          .groupby(['court_name','new'])['level_1']
          .agg(['min','max'])
          .reset_index(level=1, drop=True))
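The grouping step relies on a common idiom: the cumulative sum of a boolean mask only increases on `True` rows, so every consecutive run of `False` rows ends up with the same label. The sketch below applies that idiom to a small made-up series (the values and the name `runs` are illustrative assumptions, not part of the answer):

```python
import pandas as pd

# Toy occupancy series: a value means "booked", None means "free".
s = pd.Series(["x", None, None, "x", "x", None],
              index=pd.date_range("2021-11-15 07:00", periods=6, freq="h"))

mask = s.notna()
# cumsum() only increases on booked rows, so each run of free rows
# between two bookings shares one constant label.
labels = mask.cumsum()[~mask]

# First and last free timestamp of every run.
hours = labels.index.to_series()
runs = hours.groupby(labels.to_numpy()).agg(["min", "max"])
print(runs)
```

Here the free hours 08:00 and 09:00 fall into one run and 12:00 forms a run of its own.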

# the run boundaries are free hours; widen them to the adjacent reservation
# edges, except at the opening (7:00) and closing (23:00) times
df['min'] = df['min'].where(df['min'].dt.hour.eq(7), df['min'] - pd.Timedelta('1h'))
df['max'] = df['max'].where(df['max'].dt.hour.eq(23), df['max'] + pd.Timedelta('1h'))

print(df)
                           min                 max
court_name                                        
Court 1    2021-11-15 07:00:00 2021-11-15 08:00:00
Court 1    2021-11-15 12:00:00 2021-11-15 15:00:00
Court 1    2021-11-15 21:00:00 2021-11-15 23:00:00
Court 2    2021-11-15 07:00:00 2021-11-15 20:00:00
Court 2    2021-11-15 21:00:00 2021-11-15 23:00:00
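The hourly grid above assumes reservations start and end on whole hours. As an alternative sketch (not the answer's method), the free slots can be computed directly from the interval endpoints by walking each court's reservations in order. The helper name `free_slots`, its parameters, and the output column names are hypothetical, and the code assumes every reservation lies within opening hours on a single day:

```python
import pandas as pd

df = pd.DataFrame({
    "court_name": ["Court 1", "Court 1", "Court 1", "Court 2"],
    "reserved_fr": ["2021-11-15T08:00:00", "2021-11-15T15:00:00",
                    "2021-11-15T16:00:00", "2021-11-15T20:00:00"],
    "reserved_to": ["2021-11-15T12:00:00", "2021-11-15T16:00:00",
                    "2021-11-15T21:00:00", "2021-11-15T21:00:00"],
})

def free_slots(df, open_hour=7, close_hour=23):
    """Hypothetical helper: free intervals per court and day."""
    d = df.copy()
    d["reserved_fr"] = pd.to_datetime(d["reserved_fr"])
    d["reserved_to"] = pd.to_datetime(d["reserved_to"])
    d["day"] = d["reserved_fr"].dt.normalize()
    rows = []
    for (court, day), g in d.groupby(["court_name", "day"]):
        cursor = day + pd.Timedelta(hours=open_hour)    # end of last booking seen
        closing = day + pd.Timedelta(hours=close_hour)
        g = g.sort_values("reserved_fr")
        for fr, to in zip(g["reserved_fr"], g["reserved_to"]):
            if fr > cursor:                 # gap before this booking
                rows.append((court, cursor, fr))
            cursor = max(cursor, to)        # also covers overlapping bookings
        if cursor < closing:                # gap after the last booking
            rows.append((court, cursor, closing))
    return pd.DataFrame(rows, columns=["court_name", "free_fr", "free_to"])

free = free_slots(df)
print(free)
```

Like the answer above, this only reports courts and days that have at least one reservation; a court with no bookings at all would need to be added separately.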

For more on "python - Pandas: how to find complementary time ranges?", see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/69930081/
