I have the following Python pandas DataFrame:
| num_ID | start_date | end_date | time |
| ------ | ----------- | ---------- | ----------------- |
| 1 | 2022-02-10 | 2022-02-11 | 0 days 09:23:00 |
| 1 | 2022-02-12 | 2022-02-15 | 2 days 12:23:00 |
| 2 | 2022-02-12 | 2022-02-15 | 2 days 10:23:00 |
| 2 | 2022-02-05 | 2022-02-27 | 22 days 02:35:00 |
| 3 | 2022-02-04 | 2022-02-06 | 1 days 19:55:00 |
| 3 | 2022-02-12 | 2022-02-15 | 2 days 05:21:00 |
| 3 | 2022-02-12 | 2022-02-15 | 2 days 05:15:00 |
The following DataFrame contains consecutive dates and, in the is_holiday column, whether each date is a holiday.
| date | is_holiday | name | other |
| ---------- | ---------- | ---- | ----- |
| 2022-01-01 | True | ABC | red |
| 2022-01-02 | False | CNA | blue |
...
# we assume in this case that the omitted rows have the value False in the is_holiday column
| 2022-02-15 | True | OOO | red |
| 2022-02-16 | True | POO | red |
| 2022-02-17 | False | KTY | blue |
...
| 2023-12-30 | False | TTE | white |
| 2023-12-31 | True | VVV | red |
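I want to add a new column total_days to the initial DataFrame. For each row, it should give the total number of holidays in the second DataFrame that fall between the two dates (start_date and end_date).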
Example of the expected output:
| num_ID | start_date | end_date | time | total_days |
| ------ | ----------- | ---------- | ----------------- | -------------- |
| 1 | 2022-02-10 | 2022-02-11 | 0 days 09:23:00 | 0 |
| 1 | 2022-02-12 | 2022-02-15 | 2 days 12:23:00 | 1 |
| 2 | 2022-02-12 | 2022-02-15 | 2 days 10:23:00 | 1 |
| 2 | 2022-02-05 | 2022-02-27 | 22 days 02:35:00 | 2 |
| 3 | 2022-02-04 | 2022-02-06 | 1 days 19:55:00 | 0 |
| 3 | 2022-02-12 | 2022-02-15 | 2 days 05:21:00 | 1 |
| 3 | 2022-02-12 | 2022-02-15 | 2 days 05:15:00 | 1 |
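For reproducibility, the sample data can be rebuilt roughly as follows. This is only a sketch: it assumes, as stated above, that every omitted calendar row is False, it leaves out the time, name and other columns because they do not affect the count, and the helper `count_holidays` is a hypothetical name used to spell out the expected result with a plain inclusive range check.

```python
import pandas as pd

# First DataFrame: the intervals from the question (time column omitted).
df = pd.DataFrame({
    "num_ID": [1, 1, 2, 2, 3, 3, 3],
    "start_date": pd.to_datetime([
        "2022-02-10", "2022-02-12", "2022-02-12", "2022-02-05",
        "2022-02-04", "2022-02-12", "2022-02-12",
    ]),
    "end_date": pd.to_datetime([
        "2022-02-11", "2022-02-15", "2022-02-15", "2022-02-27",
        "2022-02-06", "2022-02-15", "2022-02-15",
    ]),
})

# Second DataFrame: one row per calendar day.
# Assumption: only the rows shown as True in the question are holidays.
df1 = pd.DataFrame({"date": pd.date_range("2022-01-01", "2023-12-31")})
shown_holidays = pd.to_datetime(
    ["2022-01-01", "2022-02-15", "2022-02-16", "2023-12-31"]
)
df1["is_holiday"] = df1["date"].isin(shown_holidays)


def count_holidays(start, end):
    """Count holiday rows of df1 whose date lies in [start, end], both ends included."""
    in_range = df1["date"].between(start, end)
    return int((in_range & df1["is_holiday"]).sum())


expected = [count_holidays(s, e) for s, e in zip(df["start_date"], df["end_date"])]
print(expected)
```

Each row is counted on its own, so the two identical intervals for num_ID 3 each get their own count.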
Edit: The solution provided by @jezrael adds extra days by grouping with the previous intervals. Wrong result:
| num_ID | start_date | end_date | time | total_days |
| ------ | ----------- | ---------- | ----------------- | -------------- |
| 1 | 2022-02-10 | 2022-02-11 | 0 days 09:23:00 | 0 |
| 1 | 2022-02-12 | 2022-02-15 | 2 days 12:23:00 | 3 |
| 2 | 2022-02-12 | 2022-02-15 | 2 days 10:23:00 | 3 |
| 2 | 2022-02-05 | 2022-02-27 | 22 days 02:35:00 | 2 |
| 3 | 2022-02-04 | 2022-02-06 | 1 days 19:55:00 | 0 |
| 3 | 2022-02-12 | 2022-02-15 | 2 days 05:21:00 | 3 |
New edit: The new solution provided by @jezrael gives a different wrong result:
| num_ID | start_date | end_date | time | total_days |
| ------ | ----------- | ---------- | ----------------- | -------------- |
| 1 | 2022-02-10 | 2022-02-11 | 0 days 09:23:00 | 0 |
| 1 | 2022-02-12 | 2022-02-15 | 2 days 12:23:00 | 1 |
| 2 | 2022-02-12 | 2022-02-15 | 2 days 10:23:00 | 1 |
| 2 | 2022-02-05 | 2022-02-27 | 22 days 02:35:00 | 2 |
| 3 | 2022-02-04 | 2022-02-06 | 1 days 19:55:00 | 0 |
| 3 | 2022-02-12 | 2022-02-15 | 2 days 05:21:00 | 2 |
| 3 | 2022-02-12 | 2022-02-15 | 2 days 05:15:00 | 2 |
Best answer
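EDIT: Because the matching dates need to be counted separately for each row, create a date_range per row, test membership with Index.isin, and count the matches with sum: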
    # Dates flagged as holidays in the second DataFrame (df1)
    L = df1.loc[df1['is_holiday'], 'date'].tolist()
    # For each row, build the daily range and count how many of its days are holidays
    df['total_holidays'] = [pd.date_range(s, e).isin(L).sum()
                            for s, e in zip(df['start_date'], df['end_date'])]
    print(df)

       num_ID  start_date    end_date             time  total_holidays
    0       1  2022-02-10  2022-02-11  0 days 09:23:00               0
    1       1  2022-02-12  2022-02-15  2 days 12:23:00               1
    2       2  2022-02-12  2022-02-15  2 days 10:23:00               1
    3       2  2022-02-05  2022-02-27  2 days 02:35:00               2
    4       3  2022-02-04  2022-02-06  1 days 19:55:00               0
    5       3  2022-02-12  2022-02-15  2 days 05:21:00               1
    6       3  2022-02-12  2022-02-15  2 days 05:21:00               1
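To see what a single iteration of that list comprehension computes, here is a minimal sketch for one interval. The two holiday dates are the February holidays from the sample calendar, under the question's assumption that the omitted rows are False; the variable names are illustrative only.

```python
import pandas as pd

# Assumed holiday dates, taken from the rows shown in the question.
holidays = [pd.Timestamp("2022-02-15"), pd.Timestamp("2022-02-16")]

# One interval from the first DataFrame, expanded to daily timestamps.
# pd.date_range includes both the start and the end date.
rng = pd.date_range("2022-02-12", "2022-02-15")

# Boolean mask: True where a day of the interval is a holiday.
mask = rng.isin(holidays)

# Summing the mask counts the True values.
count = int(mask.sum())
print(list(rng.strftime("%Y-%m-%d")), mask.tolist(), count)
```

Only 2022-02-15 falls inside this interval, so the mask has a single True and the count is 1, matching the expected output for that row.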
Another idea is to take the length of the index after Index.intersection:
    L = df1.loc[df1['is_holiday'], 'date'].tolist()
    df['total_holidays'] = [len(pd.date_range(s, e).intersection(L))
                            for s, e in zip(df['start_date'], df['end_date'])]
    print(df)

       num_ID  start_date    end_date             time  total_holidays
    0       1  2022-02-10  2022-02-11  0 days 09:23:00               0
    1       1  2022-02-12  2022-02-15  2 days 12:23:00               1
    2       2  2022-02-12  2022-02-15  2 days 10:23:00               1
    3       2  2022-02-05  2022-02-27  2 days 02:35:00               2
    4       3  2022-02-04  2022-02-06  1 days 19:55:00               0
    5       3  2022-02-12  2022-02-15  2 days 05:21:00               1
    6       3  2022-02-12  2022-02-15  2 days 05:21:00               1
Or use the intersection of sets:
    sets = set(df1.loc[df1['is_holiday'], 'date'])
    df['total_holidays'] = [len(set(pd.date_range(s, e)) & sets)
                            for s, e in zip(df['start_date'], df['end_date'])]
    print(df)

       num_ID  start_date    end_date             time  total_holidays
    0       1  2022-02-10  2022-02-11  0 days 09:23:00               0
    1       1  2022-02-12  2022-02-15  2 days 12:23:00               1
    2       2  2022-02-12  2022-02-15  2 days 10:23:00               1
    3       2  2022-02-05  2022-02-27  2 days 02:35:00               2
    4       3  2022-02-04  2022-02-06  1 days 19:55:00               0
    5       3  2022-02-12  2022-02-15  2 days 05:21:00               1
    6       3  2022-02-12  2022-02-15  2 days 05:21:00               1
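All three variants above build a date range per row. If that becomes slow on many long intervals, one possible alternative, which is not part of the original answer, is a cumulative sum over the calendar: the number of holidays in [start, end] is the running total at end minus the running total on the day before start. The sketch below assumes the calendar has exactly one row per day with no gaps and that every end_date lies inside it; `cal`, `cum`, `starts` and `ends` are illustrative names.

```python
import pandas as pd

# Calendar with one row per day. Assumption: only these dates are holidays.
cal = pd.DataFrame({"date": pd.date_range("2022-01-01", "2023-12-31")})
cal["is_holiday"] = cal["date"].isin(
    pd.to_datetime(["2022-01-01", "2022-02-15", "2022-02-16", "2023-12-31"])
)

# Running number of holidays up to and including each date.
cum = cal.set_index("date")["is_holiday"].astype(int).cumsum()

starts = pd.to_datetime([
    "2022-02-10", "2022-02-12", "2022-02-12", "2022-02-05",
    "2022-02-04", "2022-02-12", "2022-02-12",
])
ends = pd.to_datetime([
    "2022-02-11", "2022-02-15", "2022-02-15", "2022-02-27",
    "2022-02-06", "2022-02-15", "2022-02-15",
])

# Running total on the day before each start. A start on the first calendar
# day has no previous row, so fill_value=0 is used for it.
before_start = cum.reindex(starts - pd.Timedelta(days=1), fill_value=0).to_numpy()

# Running total at each end date (ends are assumed to be inside the calendar).
upto_end = cum.reindex(ends).to_numpy()

total_holidays = (upto_end - before_start).tolist()
print(total_holidays)
```

The trade-off is that this relies on the calendar being complete and sorted by date, whereas the per-row date_range versions only need the list of holiday dates.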
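Regarding "python - Count of the total number of rows that meet a certain condition between two different dates of another DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/73985190/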