This is my first time asking a question here, so I hope I'm doing it right!
I have a pandas DataFrame with a data column:
df2.data
Out[66]:
date
2016-01-02 0.0
2016-01-03 1.0
2016-01-04 1.0
2016-01-05 1.0
2016-01-06 0.0
2016-01-07 0.0
2016-01-08 1.0
2016-01-09 2.0
2016-01-10 1.0
2016-01-11 0.0
Name: data, dtype: float64
I would like the following result:
data trend trend_type
date
2016-01-02 0.0 0 0
2016-01-03 1.0 0 0
2016-01-04 1.0 1 1
2016-01-05 1.0 2 1
2016-01-06 0.0 0 0
2016-01-07 0.0 1 0
2016-01-08 1.0 0 0
2016-01-09 2.0 0 0
2016-01-10 1.0 0 0
2016-01-11 0.0 0 0
My question is somewhat related to How to use pandas to find consecutive same data in time series.
So far I have managed to compute the trend, but it is not efficient enough (about 8 seconds for a 750-row DataFrame):
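import time

# 1 where the value equals the previous row, 0 otherwise
df['grp'] = (df.close.diff(1) == 0).astype('int')
df['trend'] = 0
start_time = time.time()
trend_col = df.columns.get_loc('trend')
# start at 1: row 0 has no previous row to compare with
for i in range(1, len(df)):
    if df.grp.iloc[i] == 1:
        # index the frame directly to avoid chained assignment
        df.iloc[i, trend_col] = df.iloc[i - 1, trend_col] + 1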
Best answer
Step 1
To get trend, use groupby + cumcount:
df['trend'] = df.data.groupby(df.data.ne(df.data.shift()).cumsum()).cumcount()
df
data trend
2016-01-02 0.0 0
2016-01-03 1.0 0
2016-01-04 1.0 1
2016-01-05 1.0 2
2016-01-06 0.0 0
2016-01-07 0.0 1
2016-01-08 1.0 0
2016-01-09 2.0 0
2016-01-10 1.0 0
2016-01-11 0.0 0
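To see why the one-liner above works, it can help to split it into its intermediate pieces. The following is a minimal sketch that rebuilds the sample data from the question; the names idx, changed and run_id are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question (index name and values as shown there).
idx = pd.date_range("2016-01-02", periods=10, freq="D", name="date")
df = pd.DataFrame(
    {"data": [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]}, index=idx
)

# True wherever the value differs from the previous row. The first row is
# always True because shift() puts NaN in front of it.
changed = df["data"].ne(df["data"].shift())

# The running sum of the change flags gives every run of equal values its own id.
run_id = changed.cumsum()

# cumcount() numbers the rows inside each run, starting at 0.
df["trend"] = df["data"].groupby(run_id).cumcount()

# Show the intermediate columns next to the result.
print(df.assign(changed=changed, run_id=run_id))
```

Each time the value changes, run_id increases by one, so consecutive equal values share an id and cumcount restarts at 0 for every new run.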
Step 2
(IIUC) To get trend_type, compare each row with the previous one and assign the value where they match.
df['trend_type'] = 0
m = df.data.eq(df.data.shift())
df.loc[m, 'trend_type'] = df.loc[m, 'data']
df
data trend trend_type
2016-01-02 0.0 0 0.0
2016-01-03 1.0 0 0.0
2016-01-04 1.0 1 1.0
2016-01-05 1.0 2 1.0
2016-01-06 0.0 0 0.0
2016-01-07 0.0 1 0.0
2016-01-08 1.0 0 0.0
2016-01-09 2.0 0 0.0
2016-01-10 1.0 0 0.0
2016-01-11 0.0 0 0.0
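The output above holds trend_type as floats, while the desired result in the question shows integers. One possible variation, sketched here under the assumption that the data column contains only whole numbers, uses Series.where followed by a cast to int. The name same_as_prev is illustrative and does not come from the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question.
idx = pd.date_range("2016-01-02", periods=10, freq="D", name="date")
df = pd.DataFrame(
    {"data": [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]}, index=idx
)

# True where a row repeats the value of the row before it.
same_as_prev = df["data"].eq(df["data"].shift())

# Keep the value where the row continues a run, use 0 elsewhere, then cast
# to int (assumes the kept values are whole numbers).
df["trend_type"] = df["data"].where(same_as_prev, 0).astype(int)

print(df)
```

This builds the column in a single expression instead of first filling it with 0 and then overwriting part of it through df.loc.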
This question, python - counting consecutive values in a time series, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/47794195/