python - Row-wise comparison in Pandas of a date column in one dataframe against two date columns in another dataframe

I have a dataframe like this, with two date columns and a quantity column:

     start_date       end_date          qty
1    2018-01-01      2018-01-08         23
2    2018-01-08      2018-01-15         21           
3    2018-01-15      2018-01-22         5
4    2018-01-22      2018-01-29         12

I have a second dataframe that contains only a single column of holidays spanning several years, like this:

         holiday
1       2018-01-01 
2       2018-01-27
3       2018-12-25
4       2018-12-26

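I want to go through the first dataframe row by row and, if any date in the second dataframe falls between that row's start_date and end_date, assign a bool value to a new column holidays. The result should look like this: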

     start_date       end_date          qty      holidays
1    2018-01-01      2018-01-08         23       True
2    2018-01-08      2018-01-15         21       False
3    2018-01-15      2018-01-22         5        False
4    2018-01-22      2018-01-29         12       True

When I try to do this with a for loop, I get the following error:

ValueError: Can only compare identically-labeled Series objects

Any answer would be greatly appreciated.

Best answer

If you want a fully vectorized solution, consider working on the underlying numpy arrays:

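import numpy as np


def holiday_arr(start, end, holidays):
    # Column vectors of shape (n, 1): one row per date range.
    start = start.reshape((-1, 1))
    end = end.reshape((-1, 1))
    # Row vector of shape (1, m): one column per holiday.
    holidays = holidays.reshape((1, -1))
    # Broadcasting yields an (n, m) boolean matrix; a range contains a
    # holiday if any entry in its row is True (both bounds inclusive).
    result = np.any(
        (start <= holidays) & (holidays <= end),
        axis=1
    )
    return result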
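
To see why the reshapes matter, here is a minimal sketch of the same broadcasting pattern on toy data. The integers are hypothetical stand-ins for dates and are not taken from the question; the variable names mask and contains are likewise only illustrative.

```python
import numpy as np

# Hypothetical toy data: integers stand in for dates.
start = np.array([1, 8, 15]).reshape((-1, 1))    # shape (3, 1)
end = np.array([7, 14, 21]).reshape((-1, 1))     # shape (3, 1)
holidays = np.array([1, 20]).reshape((1, -1))    # shape (1, 2)

# Comparing (3, 1) against (1, 2) broadcasts to (3, 2):
# one row per interval, one column per holiday.
mask = (start <= holidays) & (holidays <= end)

# Collapse the holiday axis: True if the interval contains any holiday.
contains = mask.any(axis=1)

print(mask.shape)  # (3, 2)
print(contains)    # [ True False  True]
```

Note that the intermediate matrix has one entry per (range, holiday) pair, so memory use grows with the product of the two lengths.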

If you have the dataframes described above (call them df1 and df2), you can get the desired result by running:

df1["contains_holiday"] = holiday_arr(
    df1["start_date"].to_numpy(),
    df1["end_date"].to_numpy(),
    df2["holiday"].to_numpy()
)

df1 then looks like:

  start_date   end_date  qty  contains_holiday
1 2018-01-01 2018-01-08   23              True
2 2018-01-08 2018-01-15   21             False
3 2018-01-15 2018-01-22    5             False
4 2018-01-22 2018-01-29   12              True
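
For anyone who wants to reproduce this end to end, the following is a sketch that rebuilds the sample data from the question. It assumes the date columns are real datetime64 values (created here with pd.to_datetime) rather than strings; the question does not say how the columns were stored, so that part is an assumption.

```python
import numpy as np
import pandas as pd


def holiday_arr(start, end, holidays):
    # Same approach as in the answer: (n, 1) ranges against (1, m) holidays.
    start = start.reshape((-1, 1))
    end = end.reshape((-1, 1))
    holidays = holidays.reshape((1, -1))
    return np.any((start <= holidays) & (holidays <= end), axis=1)


# Sample data copied from the question; converting to datetime is an assumption.
df1 = pd.DataFrame(
    {
        "start_date": pd.to_datetime(
            ["2018-01-01", "2018-01-08", "2018-01-15", "2018-01-22"]
        ),
        "end_date": pd.to_datetime(
            ["2018-01-08", "2018-01-15", "2018-01-22", "2018-01-29"]
        ),
        "qty": [23, 21, 5, 12],
    },
    index=[1, 2, 3, 4],
)
df2 = pd.DataFrame(
    {
        "holiday": pd.to_datetime(
            ["2018-01-01", "2018-01-27", "2018-12-25", "2018-12-26"]
        )
    },
    index=[1, 2, 3, 4],
)

df1["contains_holiday"] = holiday_arr(
    df1["start_date"].to_numpy(),
    df1["end_date"].to_numpy(),
    df2["holiday"].to_numpy(),
)
print(df1)
```

Because the function receives plain numpy arrays, the index labels of df1 and df2 never take part in the comparison.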

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/59655017/
