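I have a pandas DataFrame with datetimes (not in the index, and I would prefer to keep it that way). I want to upsample (resample) it to a specified time scale, e.g. '10S', while keeping the string data (i.e. the Activity/Action/EPIC columns, etc.) in the DataFrame.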
Ind TIME_STAMP Activity Action Quantity EPIC Price Sub-activity Venue Position
0 2018-08-22 08:01:36 Allocation SELL 100.0 BB. 1.142200 CPTY 300AD -427.0
1 2018-08-22 08:02:17 Allocation BUY 15.0 BB. 1.152300 CPTY ZDDD02 -388.0
2 2018-08-22 08:24:51 Allocation SELL 60.0 BB. 1.165900 CPTY 666 -515.0
3 2018-08-22 09:07:59 NaN NaN NaN NaN 1.167921 NaN -515.0
4 2018-08-22 09:11:00 NaN NaN NaN NaN 1.174500 NaN
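I tried several different approaches, namely dataFrame.asfreq(freq='10S') and dataFrame.resample('10S', on='TIME_STAMP').
What I really want to do is 1) upsample the data into 10-second blocks while retaining the original data, 2) use the column 'TIME_STAMP', and 3) afterwards be able to fill the numeric data with some fill method, e.g. .fillna(method='pad').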
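One likely cause of a 'Duplicate Index' style failure is a TIME_STAMP column that contains repeated values: asfreq has to reindex, and reindexing needs unique labels. A minimal sketch, assuming a made-up three-row frame (not the data above) and the lowercase '10s' alias that newer pandas versions prefer over '10S':

```python
import pandas as pd

# Hypothetical data: the first two rows share one timestamp.
df = pd.DataFrame({
    "TIME_STAMP": pd.to_datetime([
        "2018-08-22 08:01:36",
        "2018-08-22 08:01:36",
        "2018-08-22 08:01:51",
    ]),
    "Action": ["SELL", "BUY", "SELL"],
    "Price": [1.1422, 1.1523, 1.1659],
})

error = None
try:
    # asfreq reindexes onto a regular 10-second grid, which requires
    # the existing index labels to be unique.
    df.set_index("TIME_STAMP").asfreq("10s")
except ValueError as exc:
    error = str(exc)

print(error)
```

The exact wording of the message differs between pandas versions, so it is printed rather than quoted here.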
Best answer
The idea is to create a helper column with GroupBy.cumcount, build a unique DatetimeIndex with unstack, and finally reshape back with stack:
print (df)
TIME_STAMP Activity Action Quantity EPIC Price \
Ind
0 2018-08-22 08:01:36 Allocation SELL 100.0 BB. 1.142200
1 2018-08-22 08:01:36 Allocation BUY 15.0 BB. 1.152300
2 2018-08-22 08:01:51 Allocation SELL 60.0 BB. 1.165900
3 2018-08-22 08:02:59 NaN NaN NaN NaN 1.167921
4 2018-08-22 08:02:59 NaN NaN NaN NaN 1.174500
Sub-activity Venue Position
Ind
0 CPTY 300AD -427.0
1 CPTY ZDDD02 -388.0
2 CPTY 666 -515.0
3 NaN -515.0 NaN
4 NaN NaN NaN
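df = (df.set_index(['TIME_STAMP', df.groupby('TIME_STAMP').cumcount()])  # counter makes duplicate timestamps unique
        .unstack()                        # counter moves to the columns, leaving a unique DatetimeIndex
        .asfreq('10S', method='pad')      # 10-second grid, forward-filled ('10s' in newer pandas)
        .stack()                          # counter moves back into the index
        .reset_index(level=1, drop=True)  # drop the counter level
        .sort_index())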
print (df)
Activity Action Quantity EPIC Price Sub-activity \
TIME_STAMP
2018-08-22 08:01:36 Allocation SELL 100.0 BB. 1.1422 CPTY
2018-08-22 08:01:36 Allocation BUY 15.0 BB. 1.1523 CPTY
2018-08-22 08:01:46 Allocation SELL 100.0 BB. 1.1422 CPTY
2018-08-22 08:01:46 Allocation BUY 15.0 BB. 1.1523 CPTY
2018-08-22 08:01:56 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:06 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:16 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:26 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:36 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:46 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:56 Allocation SELL 60.0 BB. 1.1659 CPTY
Venue Position
TIME_STAMP
2018-08-22 08:01:36 300AD -427.0
2018-08-22 08:01:36 ZDDD02 -388.0
2018-08-22 08:01:46 300AD -427.0
2018-08-22 08:01:46 ZDDD02 -388.0
2018-08-22 08:01:56 666 -515.0
2018-08-22 08:02:06 666 -515.0
2018-08-22 08:02:16 666 -515.0
2018-08-22 08:02:26 666 -515.0
2018-08-22 08:02:36 666 -515.0
2018-08-22 08:02:46 666 -515.0
2018-08-22 08:02:56 666 -515.0
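The same chain can be tried end to end on a small frame. This is only a sketch under stated assumptions: the four-row frame is invented, '10s' and 'ffill' are the spellings newer pandas versions prefer, and the extra dropna(how='all') and the sort before dropping the counter level are additions here, intended to give the same result regardless of which stack implementation the installed pandas uses.

```python
import pandas as pd

# Hypothetical data with one duplicated timestamp and one missing string.
df = pd.DataFrame({
    "TIME_STAMP": pd.to_datetime([
        "2018-08-22 08:01:36",
        "2018-08-22 08:01:36",
        "2018-08-22 08:01:51",
        "2018-08-22 08:02:59",
    ]),
    "Action": ["SELL", "BUY", "SELL", None],
    "Price": [1.1422, 1.1523, 1.1659, 1.167921],
})

# 0, 1, ... within each group of identical timestamps.
dup_rank = df.groupby("TIME_STAMP").cumcount()

# Wide layout: one row per unique timestamp, one column per (field, rank).
wide = df.set_index(["TIME_STAMP", dup_rank]).unstack()

# Regular 10-second grid; each grid point takes the latest earlier row.
wide10 = wide.asfreq("10s", method="ffill")

out = (wide10.stack()                       # rank moves back into the index
             .dropna(how="all")             # discard empty duplicate slots
             .sort_index()                  # order by (timestamp, rank)
             .reset_index(level=1, drop=True))

print(out)
```

Rows that fall between grid points, such as 08:01:51, are not kept under their own timestamp; their values appear at the next grid point (08:01:56), and a row after the last grid point (08:02:59 here) does not appear at all. Any NaN left in the numeric columns can then be forward-filled with out.ffill().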
Regarding python-3.x - resample or asfreq on a pandas time series DataFrame in Python fails with 'Duplicate Index', we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52001044/