I have 2 dataframes. I want to assign the Cycle value from df_A to df_B if df_A["Date"]
is between df_B["From_Date"]
and df_B["To_Date"].
df_A:
Date        Cycle
07.02.2021  C01
08.02.2021  C01
14.02.2021  C02
15.06.2021  C02
28.02.2021  C03

df_B:
From_Date   To_Date
07.02.2021  13.02.2021
14.02.2021  27.02.2021
28.02.2021  03.03.2021
Desired Output:
df_B:
From_Date To_Date Cycle
07.02.2021 13.02.2021 C01
14.02.2021 27.02.2021 C02
28.02.2021 03.03.2021 C03
So far I have tried using np.dot, but it raises a shape-related ValueError. I found this code online:
s1=Promo_Data["Date From"].values
s2=Promo_Data["Date to"].values
s=Cycle_Mapping["Date"].values[:,None]
Promo_Data["Cyc"]=np.dot((s>=s1)&(s<=s2),Cycle_Mapping["Cycle"])
Best answer
df1:
Date Cycle
0 2021-02-07 C01
1 2021-02-08 C01
2 2021-02-14 C02
3 2021-06-15 C02
4 2021-02-28 C03
df2:
From_Date To_Date
0 2021-02-07 2021-02-13
1 2021-02-14 2021-02-27
2 2021-02-28 2021-03-03
First, let's make sure that the dates are of datetime type:
df1['Date'] = pd.to_datetime(df1['Date'], format='%d.%m.%Y')
df2['From_Date'] = pd.to_datetime(df2['From_Date'], format='%d.%m.%Y')
df2['To_Date'] = pd.to_datetime(df2['To_Date'], format='%d.%m.%Y')
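As a small illustration of why the explicit format matters here: the dates are written day first, so "07.02.2021" means 7 February. The sketch below uses a made-up two-element series (`raw`) and a hypothetical malformed value to show `errors="coerce"`; neither is part of the original data.

```python
import pandas as pd

# Day-first strings, as in the question's data.
raw = pd.Series(["07.02.2021", "13.02.2021"])

# With an explicit format there is no ambiguity between day and month.
parsed = pd.to_datetime(raw, format="%d.%m.%Y")
print(parsed.dt.month.tolist())  # [2, 2]

# Hypothetical malformed input: errors="coerce" yields NaT instead of raising.
bad = pd.to_datetime(
    pd.Series(["07.02.2021", "not a date"]),
    format="%d.%m.%Y",
    errors="coerce",
)
print(bad.isna().tolist())  # [False, True]
```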
Construct an IntervalIndex for df2:
>>> df2.index = pd.IntervalIndex.from_arrays(df2['From_Date'], df2['To_Date'], closed='both')
>>> df2
From_Date To_Date
[2021-02-07, 2021-02-13] 2021-02-07 2021-02-13
[2021-02-14, 2021-02-27] 2021-02-14 2021-02-27
[2021-02-28, 2021-03-03] 2021-02-28 2021-03-03
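The `closed='both'` argument is what makes a date equal to To_Date count as inside the range. A minimal sketch of the difference, using only the first two ranges and comparing against `closed='left'` (the variable names are illustrative):

```python
import pandas as pd

start = pd.to_datetime(["07.02.2021", "14.02.2021"], format="%d.%m.%Y")
end = pd.to_datetime(["13.02.2021", "27.02.2021"], format="%d.%m.%Y")

both = pd.IntervalIndex.from_arrays(start, end, closed="both")
left = pd.IntervalIndex.from_arrays(start, end, closed="left")

# A date that falls exactly on the first range's To_Date.
boundary = pd.Timestamp("2021-02-13")

in_both = both.contains(boundary)  # right endpoint included
in_left = left.contains(boundary)  # right endpoint excluded
print(in_both.tolist())  # [True, False]
print(in_left.tolist())  # [False, False]
```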
Define a function that maps a date from df1 to the matching date range in df2, and compute a new column in df1 to store this range:
def get_date(d):
    try:
        return df2.loc[d].name
    except KeyError:
        pass
df1['index'] = df1['Date'].apply(get_date)
Output:
Date Cycle index
0 2021-02-07 C01 [2021-02-07, 2021-02-13]
1 2021-02-08 C01 [2021-02-07, 2021-02-13]
2 2021-02-14 C02 [2021-02-14, 2021-02-27]
3 2021-06-15 C02 NaN
4 2021-02-28 C03 [2021-02-28, 2021-03-03]
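The row-by-row lookup above could probably also be done in one vectorized call. This is a sketch rather than part of the original answer: it assumes the ranges do not overlap, which `IntervalIndex.get_indexer` requires, and the names `intervals`, `pos` and `first_cycle` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": pd.to_datetime(
        ["07.02.2021", "08.02.2021", "14.02.2021", "15.06.2021", "28.02.2021"],
        format="%d.%m.%Y",
    ),
    "Cycle": ["C01", "C01", "C02", "C02", "C03"],
})
df2 = pd.DataFrame({
    "From_Date": pd.to_datetime(["07.02.2021", "14.02.2021", "28.02.2021"], format="%d.%m.%Y"),
    "To_Date": pd.to_datetime(["13.02.2021", "27.02.2021", "03.03.2021"], format="%d.%m.%Y"),
})

intervals = pd.IntervalIndex.from_arrays(df2["From_Date"], df2["To_Date"], closed="both")

# Position of the containing range for each date, or -1 if no range contains it.
pos = intervals.get_indexer(df1["Date"])
print(pos.tolist())  # [0, 0, 1, -1, 2]

# First Cycle seen for each matched range position, mapped back onto df2's rows.
first_cycle = df1.loc[pos >= 0, "Cycle"].groupby(pos[pos >= 0]).first()
df2["Cycle"] = df2.index.map(first_cycle)
print(df2["Cycle"].tolist())  # ['C01', 'C02', 'C03']
```

Here df2 keeps its default integer index, so the positions returned by `get_indexer` line up with its row labels.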
Merge the two dataframes on 'index' and keep only the needed columns:
df2.reset_index().merge(df1, on='index')[['From_Date', 'To_Date', 'Cycle']]
From_Date To_Date Cycle
0 2021-02-07 2021-02-13 C01
1 2021-02-07 2021-02-13 C01
2 2021-02-14 2021-02-27 C02
3 2021-02-28 2021-03-03 C03
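If you really want to keep only the first df1 value for each range, you can group by the range and keep the first value. Assuming the merged result is now df3: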
df3.groupby(['From_Date', 'To_Date'], as_index=False).first()
Output:
From_Date To_Date Cycle
0 2021-02-07 2021-02-13 C01
1 2021-02-14 2021-02-27 C02
2 2021-02-28 2021-03-03 C03
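For this data, `drop_duplicates` should give the same result as the groupby. The sketch below rebuilds the merged frame by hand so it runs on its own; `deduped` is an illustrative name.

```python
import pandas as pd

# The merged frame (df3), rebuilt by hand from the output shown earlier.
df3 = pd.DataFrame({
    "From_Date": pd.to_datetime(["2021-02-07", "2021-02-07", "2021-02-14", "2021-02-28"]),
    "To_Date": pd.to_datetime(["2021-02-13", "2021-02-13", "2021-02-27", "2021-03-03"]),
    "Cycle": ["C01", "C01", "C02", "C03"],
})

# Keep the first row for each (From_Date, To_Date) pair.
deduped = (
    df3.drop_duplicates(subset=["From_Date", "To_Date"], keep="first")
       .reset_index(drop=True)
)
print(deduped["Cycle"].tolist())  # ['C01', 'C02', 'C03']
```

One difference to keep in mind: `groupby(...).first()` returns the first non-null value per column, whereas `drop_duplicates` keeps the first row as it is, including any missing values.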
Full code:
import pandas as pd

df1 = pd.DataFrame({'Date': ['07.02.2021', '08.02.2021', '14.02.2021', '15.06.2021', '28.02.2021'],
                    'Cycle': ['C01', 'C01', 'C02', 'C02', 'C03']})
df2 = pd.DataFrame({'From_Date': ['07.02.2021', '14.02.2021', '28.02.2021'],
                    'To_Date': ['13.02.2021', '27.02.2021', '03.03.2021']})

# Parse the day-first date strings
df1['Date'] = pd.to_datetime(df1['Date'], format='%d.%m.%Y')
df2['From_Date'] = pd.to_datetime(df2['From_Date'], format='%d.%m.%Y')
df2['To_Date'] = pd.to_datetime(df2['To_Date'], format='%d.%m.%Y')

# Index df2 by its date ranges, both endpoints included
df2.index = pd.IntervalIndex.from_arrays(df2['From_Date'], df2['To_Date'], closed='both')

def get_date(d):
    # Return the interval containing d, or None if no range contains it
    try:
        return df2.loc[d].name
    except KeyError:
        return None

df1['index'] = df1['Date'].apply(get_date)
df3 = df2.reset_index().merge(df1, on='index')[['From_Date', 'To_Date', 'Cycle']]
df3.groupby(['From_Date', 'To_Date'], as_index=False).first()
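As for the np.dot attempt in the question, a likely cause of the shape error is that the comparison mask has one row per date and one column per range, while the Cycle vector has one entry per date, so the two only line up in a dot product when the counts happen to match. A sketch that keeps the broadcasting idea but picks the first matching date with `argmax` instead of `np.dot` (the names `df_a`, `df_b`, `mask` and so on are illustrative):

```python
import pandas as pd

df_a = pd.DataFrame({
    "Date": pd.to_datetime(
        ["07.02.2021", "08.02.2021", "14.02.2021", "15.06.2021", "28.02.2021"],
        format="%d.%m.%Y",
    ),
    "Cycle": ["C01", "C01", "C02", "C02", "C03"],
})
df_b = pd.DataFrame({
    "From_Date": pd.to_datetime(["07.02.2021", "14.02.2021", "28.02.2021"], format="%d.%m.%Y"),
    "To_Date": pd.to_datetime(["13.02.2021", "27.02.2021", "03.03.2021"], format="%d.%m.%Y"),
})

dates = df_a["Date"].to_numpy()[:, None]   # shape (5, 1)
starts = df_b["From_Date"].to_numpy()      # shape (3,)
ends = df_b["To_Date"].to_numpy()          # shape (3,)

# mask[i, j] is True when date i falls inside range j (endpoints included).
mask = (dates >= starts) & (dates <= ends)
print(mask.shape)  # (5, 3)

has_match = mask.any(axis=0)     # does range j contain any date?
first_row = mask.argmax(axis=0)  # row of the first matching date per range
cycles = df_a["Cycle"].to_numpy()[first_row]

# Ranges without any matching date get NaN instead of a wrong Cycle.
df_b["Cycle"] = pd.Series(cycles, index=df_b.index).where(has_match)
print(df_b["Cycle"].tolist())  # ['C01', 'C02', 'C03']
```

The mask holds len(df_a) × len(df_b) booleans, so this is only practical for moderately sized inputs.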
Regarding python - assigning a value if a date in one dataframe falls within a range in another dataframe, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68309651/