I have a promotion description dataset containing information about the various promotions being run, along with their start and end dates:
promo item start_date end_date
Buy1-get 1 A 2015-01-08 2015-01-12
Buy1-get 1 A 2015-02-16 2015-02-20
Buy1-40% off B 2016-05-08 2016-05-09
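For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table above as a DataFrame. The variable name df and the conversion of both date columns to datetime are assumptions; the original post does not show how the data was loaded.

```python
import pandas as pd

# Sample rows copied from the table above; "df" is an assumed variable name.
df = pd.DataFrame({
    "promo": ["Buy1-get 1", "Buy1-get 1", "Buy1-40% off"],
    "item": ["A", "A", "B"],
    # Assumption: the dates are parsed to datetime64 rather than kept as strings.
    "start_date": pd.to_datetime(["2015-01-08", "2015-02-16", "2016-05-08"]),
    "end_date": pd.to_datetime(["2015-01-12", "2015-02-20", "2016-05-09"]),
})
print(df.dtypes)
```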
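Now I want to reorganize the data for further analysis so that there is a single date variable, with one row per day, alongside the promotion information: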
date item Promo
2015-01-08 A Buy1-get 1
2015-01-09 A Buy1-get 1
2015-01-10 A ......
2015-01-11 ....
2015-01-12
2015-02-16 A Buy1-get 1
2015-02-17 A Buy1-get 1
2015-02-18 .... .......
2015-02-19 .....
..........
2016-05-09 B Buy1-40% off
Any help is greatly appreciated.
Best answer
You can use concat on all the Series created by date_range while iterating with itertuples, then join the promo and item columns:
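import pandas as pd

# Build one Series per row: the index holds the promotion's dates, the value is the row label
df1 = pd.concat([pd.Series(r.Index, pd.date_range(r.start_date, r.end_date))
                 for r in df.itertuples()]).reset_index()
df1.columns = ['date', 'idx']
df1 = df1.set_index('idx')
# Join item and promo back in on the original row label
df1 = df1.join(df[['item', 'promo']]).reset_index(drop=True)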
print (df1)
date item promo
0 2015-01-08 A Buy1-get 1
1 2015-01-09 A Buy1-get 1
2 2015-01-10 A Buy1-get 1
3 2015-01-11 A Buy1-get 1
4 2015-01-12 A Buy1-get 1
5 2015-02-16 A Buy1-get 1
6 2015-02-17 A Buy1-get 1
7 2015-02-18 A Buy1-get 1
8 2015-02-19 A Buy1-get 1
9 2015-02-20 A Buy1-get 1
10 2016-05-08 B Buy1-40% off
11 2016-05-09 B Buy1-40% off
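The part of this solution that may not be obvious is pd.Series(r.Index, pd.date_range(...)): a scalar value is broadcast over the whole date index, so every date carries the label of the row it came from. Here is a small sketch of that step for a single promotion; the names row_label, dates, s and out are only for illustration.

```python
import pandas as pd

# One promotion row with an illustrative row label of 0.
row_label = 0
dates = pd.date_range("2015-01-08", "2015-01-12")  # both endpoints are included

# The scalar is repeated for every entry of the index.
s = pd.Series(row_label, index=dates)
print(s)

# reset_index moves the dates into a column next to the row label.
out = s.reset_index()
out.columns = ["date", "idx"]
print(out)
```

Because idx still matches the index of the original frame, the later join can pull item and promo back in for every date.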
Another solution uses melt and groupby with resample:
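# Assumes start_date and end_date are already datetime64 columns
df1 = df.reset_index().rename(columns={'index': 'idx'})
df1 = (pd.melt(df1, id_vars='idx', value_vars=['start_date', 'end_date'], value_name='date')
         .set_index('date'))
# Resample each row's start/end pair to daily frequency to fill in the days between
df1 = (df1.groupby('idx')
          .resample('D')
          .ffill()
          .reset_index(level=1)
          # errors='ignore' because some pandas versions already leave out the grouping column
          .drop(columns=['idx', 'variable'], errors='ignore'))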
df1 = df1.join(df[['item','promo']]).reset_index(drop=True)
print (df1)
date item promo
0 2015-01-08 A Buy1-get 1
1 2015-01-09 A Buy1-get 1
2 2015-01-10 A Buy1-get 1
3 2015-01-11 A Buy1-get 1
4 2015-01-12 A Buy1-get 1
5 2015-02-16 A Buy1-get 1
6 2015-02-17 A Buy1-get 1
7 2015-02-18 A Buy1-get 1
8 2015-02-19 A Buy1-get 1
9 2015-02-20 A Buy1-get 1
10 2016-05-08 B Buy1-40% off
11 2016-05-09 B Buy1-40% off
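A third option, which is not part of the original answer, is DataFrame.explode. This is only a sketch and assumes pandas 0.25 or later, where explode is available; the sample frame is rebuilt here so the snippet runs on its own, and the name expanded is just for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "promo": ["Buy1-get 1", "Buy1-get 1", "Buy1-40% off"],
    "item": ["A", "A", "B"],
    "start_date": pd.to_datetime(["2015-01-08", "2015-02-16", "2016-05-08"]),
    "end_date": pd.to_datetime(["2015-01-12", "2015-02-20", "2016-05-09"]),
})

# Store the list of dates for each promotion in one cell, then give each date its own row.
expanded = (
    df.assign(date=[list(pd.date_range(s, e))
                    for s, e in zip(df["start_date"], df["end_date"])])
      .explode("date")
      .reset_index(drop=True)
)
# explode leaves an object column, so convert it back to datetime64.
expanded["date"] = pd.to_datetime(expanded["date"])
expanded = expanded[["date", "item", "promo"]]
print(expanded)
```

This avoids the intermediate idx column and the join, at the cost of holding a list of dates per row before exploding.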
Regarding "python - data manipulation start date end date python pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41542769/