I am computing the cummax() values of the following DataFrame:
exit_price trend netgain high low MFE_pr
exit_time
2000-02-01 01:00:00 1400.25 -1 1.00 1401.50 1400.25 1400.25
2000-02-01 01:30:00 1400.75 -1 0.50 1401.00 1399.50 1399.50
2000-02-01 02:00:00 1400.00 -1 1.25 1401.00 1399.75 1399.50
2000-02-01 02:30:00 1399.25 -1 2.00 1399.75 1399.25 1399.25
2000-02-01 03:00:00 1399.50 -1 1.75 1400.00 1399.50 1399.25
2000-02-01 03:30:00 1398.25 -1 3.00 1399.25 1398.25 1398.25
2000-02-01 04:00:00 1398.75 -1 2.50 1399.00 1398.25 1398.25
2000-02-01 04:30:00 1400.00 -1 1.25 1400.25 1399.00 1398.25
2000-02-01 05:00:00 1400.25 -1 1.00 1400.50 1399.25 1398.25
2000-02-01 05:30:00 1400.50 -1 0.75 1400.75 1399.50 1398.25
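using the following code:
import numpy as np
trade['MFE_pr'] = np.nan
# where(cond, other) keeps the current value where cond is True; rows with trend >= 0 get the running max of high
trade['MFE_pr'] = trade['MFE_pr'].where(trade['trend'] < 0, trade['high'].cummax())
# rows with trend <= 0 get the running min of low
trade['MFE_pr'] = trade['MFE_pr'].where(trade['trend'] > 0, trade['low'].cummin())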
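To make the example reproducible, the sketch below rebuilds the sample frame from the printout above and runs the same two where() steps. The column values are copied from the question; the index (30-minute steps named exit_time) is an assumption read off the printed timestamps.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's printout (all rows are short trades, trend = -1)
idx = pd.date_range("2000-02-01 01:00", periods=10, freq="30min", name="exit_time")
trade = pd.DataFrame(
    {
        "exit_price": [1400.25, 1400.75, 1400.00, 1399.25, 1399.50,
                       1398.25, 1398.75, 1400.00, 1400.25, 1400.50],
        "trend": [-1] * 10,
        "netgain": [1.00, 0.50, 1.25, 2.00, 1.75, 3.00, 2.50, 1.25, 1.00, 0.75],
        "high": [1401.50, 1401.00, 1401.00, 1399.75, 1400.00,
                 1399.25, 1399.00, 1400.25, 1400.50, 1400.75],
        "low": [1400.25, 1399.50, 1399.75, 1399.25, 1399.50,
                1398.25, 1398.25, 1399.00, 1399.25, 1399.50],
    },
    index=idx,
)

# Same two-step where() logic as in the question
trade["MFE_pr"] = np.nan
trade["MFE_pr"] = trade["MFE_pr"].where(trade["trend"] < 0, trade["high"].cummax())
trade["MFE_pr"] = trade["MFE_pr"].where(trade["trend"] > 0, trade["low"].cummin())

print(trade["MFE_pr"].tolist())
```

Because every row here has trend = -1, the first where() leaves NaN everywhere and the second fills in the running minimum of low, which reproduces the MFE_pr column shown above.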
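Now, for each row, I want to retrieve the timestamp of the row at which that cumulative extreme (the MFE_pr value) was reached.
I have been trying the following: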
trade['timestamp'] = trade.index
trade['MFE_ts'] = trade.groupby('MFE_pr')['timestamp'].first()
But the result I get is:
exit_price trend netgain high low MFE_pr \
exit_time
2000-02-01 01:00:00 1400.25 -1 1.00 1401.50 1400.25 1400.25
2000-02-01 01:30:00 1400.75 -1 0.50 1401.00 1399.50 1399.50
2000-02-01 02:00:00 1400.00 -1 1.25 1401.00 1399.75 1399.50
2000-02-01 02:30:00 1399.25 -1 2.00 1399.75 1399.25 1399.25
2000-02-01 03:00:00 1399.50 -1 1.75 1400.00 1399.50 1399.25
2000-02-01 03:30:00 1398.25 -1 3.00 1399.25 1398.25 1398.25
2000-02-01 04:00:00 1398.75 -1 2.50 1399.00 1398.25 1398.25
2000-02-01 04:30:00 1400.00 -1 1.25 1400.25 1399.00 1398.25
2000-02-01 05:00:00 1400.25 -1 1.00 1400.50 1399.25 1398.25
2000-02-01 05:30:00 1400.50 -1 0.75 1400.75 1399.50 1398.25
timestamp MFE_ts
exit_time
2000-02-01 01:00:00 2000-02-01 01:00:00 NaT
2000-02-01 01:30:00 2000-02-01 01:30:00 NaT
2000-02-01 02:00:00 2000-02-01 02:00:00 NaT
2000-02-01 02:30:00 2000-02-01 02:30:00 NaT
2000-02-01 03:00:00 2000-02-01 03:00:00 NaT
2000-02-01 03:30:00 2000-02-01 03:30:00 NaT
2000-02-01 04:00:00 2000-02-01 04:00:00 NaT
2000-02-01 04:30:00 2000-02-01 04:30:00 NaT
2000-02-01 05:00:00 2000-02-01 05:00:00 NaT
2000-02-01 05:30:00 2000-02-01 05:30:00 NaT
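What am I doing wrong?
Best answer
Right now, it computes and returns the first value of each group, and that result is indexed by the group keys (the MFE_pr values), not by exit_time: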
trade.groupby('MFE_pr')['timestamp'].first()
MFE_pr
1398.25 2000-02-01 03:30:00
1399.25 2000-02-01 02:30:00
1399.50 2000-02-01 01:30:00
1400.25 2000-02-01 01:00:00
Name: timestamp, dtype: datetime64[ns]
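So when you try to align it back to the original DF (by assigning it to a new column), every row becomes NaT, because the grouped result (indexed by MFE_pr prices) and trade (indexed by exit_time) share no index labels to align on: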
trade.groupby('MFE_pr')['timestamp'].first().reindex(trade.index)
exit_time
2000-02-01 01:00:00 NaT
2000-02-01 01:30:00 NaT
2000-02-01 02:00:00 NaT
2000-02-01 02:30:00 NaT
2000-02-01 03:00:00 NaT
2000-02-01 03:30:00 NaT
2000-02-01 04:00:00 NaT
2000-02-01 04:30:00 NaT
2000-02-01 05:00:00 NaT
2000-02-01 05:30:00 NaT
Name: timestamp, dtype: datetime64[ns]
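A minimal sketch of the same alignment effect on a smaller, hypothetical frame (four rows with values taken from the question's MFE_pr column): column assignment reindexes the right-hand Series to the frame's index, and since no price label matches any timestamp, nothing lines up.

```python
import pandas as pd

# A small frame indexed by timestamps, like `trade` in the question
idx = pd.date_range("2000-02-01 01:00", periods=4, freq="30min", name="exit_time")
df = pd.DataFrame({"MFE_pr": [1400.25, 1399.50, 1399.50, 1399.25]}, index=idx)
df["timestamp"] = df.index

# groupby(...).first() returns a Series indexed by the group keys (prices), not by time
firsts = df.groupby("MFE_pr")["timestamp"].first()
print(firsts.index.tolist())  # [1399.25, 1399.5, 1400.25]

# Column assignment aligns on the index; no price label matches a timestamp, so every row is NaT
df["MFE_ts"] = firsts
print(df["MFE_ts"].isna().all())  # True
```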
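You need transform instead: it broadcasts each group's computed value back to every row belonging to that group, so the result keeps the original DF's shape and index: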
trade['MFE_ts'] = trade.groupby('MFE_pr')['timestamp'].transform('first')
trade
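Below is a self-contained sketch of the transform fix, using the MFE_pr values from the question's output. It also shows an equivalent alternative (an addition, not part of the original answer): compute the per-group first timestamp once and look each row's MFE_pr up in it with Series.map.

```python
import pandas as pd

idx = pd.date_range("2000-02-01 01:00", periods=10, freq="30min", name="exit_time")
# Running-minimum MFE_pr values copied from the question's output
mfe = [1400.25, 1399.50, 1399.50, 1399.25, 1399.25,
       1398.25, 1398.25, 1398.25, 1398.25, 1398.25]
trade = pd.DataFrame({"MFE_pr": mfe}, index=idx)
trade["timestamp"] = trade.index

# transform('first') returns one value per row, sharing trade's index,
# so the assignment lines up row by row
trade["MFE_ts"] = trade.groupby("MFE_pr")["timestamp"].transform("first")

# Alternative: map each row's MFE_pr onto the per-group first timestamp
first_ts = trade.groupby("MFE_pr")["timestamp"].first()
trade["MFE_ts_map"] = trade["MFE_pr"].map(first_ts)

print(trade[["MFE_pr", "MFE_ts"]])
```

Grouping by the price value works here because a cummin (or cummax) series is monotonic, so equal values form one contiguous run and the first row of each run is where that extreme was set. If long and short trades are mixed in one frame, a running high and a running low could coincide in value and end up in the same group, so the grouping key may need to include the trade direction as well.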
Regarding "python pandas - groupby.first() returns NaT values", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40820552/