python - Sorting a pandas Series by month index in Python (time series with different intervals)

I have a Series object that contains:

df = 
    index              value
2014-05-23 07:00:00     0.67
2014-05-23 07:30:00     0.47
2014-05-23 08:00:00     0.42
2014-05-23 08:30:00     0.80
....

2017-07-10 22:00:00     0.42
2017-07-10 22:30:00     0.79
2017-07-10 23:00:00     0.84
2017-07-10 23:30:00     NaN

I want to average the values across the years and then group them by month, so that the data frame looks like this:

df_new = 
  index                    value
   Jan      {0.11, 0.5, 0.3, 0.99, ... ,0.13} <-  time step of each value is 
   Feb      {...............................}     still 30 min, and each 
   Mar      {...............................}     value is average of same 
   Apr      {...............................}     time in the other year.  
   ....
   Dec      {...............................}

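I have several data frames like this, but with different time intervals (15 min, 60 min, ...). Is there a better way to compute this automatically? For example, something like a function that works out the time step from the index on its own. Thanks in advance!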

Best answer

I think you first need to upsample or downsample with resample, so that every series has the same time step:

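# upsample to 15-minute steps, forward-filling the new slots
s = s.resample('15min').ffill()
# downsample to 60-minute steps, averaging the values in each hour
#s = s.resample('60min').mean()
# if the values are already 30-minute values, no resample is necessary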
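
The question also asks for a way to read the time step from the index automatically. A minimal sketch of one option follows, assuming the step can be taken as the most common gap between consecutive timestamps. The helper name `infer_step`, the sample data, and the 15-minute target are hypothetical and not part of the original answer. For a strictly regular index, `pd.infer_freq` is another possibility.

```python
import pandas as pd


def infer_step(index: pd.DatetimeIndex) -> pd.Timedelta:
    """Return the most common gap between consecutive timestamps (hypothetical helper)."""
    diffs = index.to_series().diff().dropna()
    return diffs.mode().iloc[0]


# Illustrative 30-minute series, not the data from the question.
idx = pd.date_range("2014-05-23 07:00", periods=8, freq="30min")
s = pd.Series(range(8), index=idx, dtype="float64")

step = infer_step(s.index)
target = pd.Timedelta(minutes=15)  # assumed common step for all series

if step > target:
    # The series is coarser than the target: upsample and forward-fill.
    s15 = s.resample(target).ffill()
elif step < target:
    # The series is finer than the target: downsample by averaging.
    s15 = s.resample(target).mean()
else:
    # Already at the target step, nothing to do.
    s15 = s
```

With this, each series can be brought to one step before grouping, whatever its original interval was.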

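Then group by two keys: the month names, produced with DatetimeIndex.strftime and converted to an ordered Categorical so that the months sort in calendar order rather than alphabetically, and the time of day from DatetimeIndex.time. Aggregate with mean and finally reshape with unstack: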

import pandas as pd

cats = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
months = pd.Categorical(s.index.strftime('%b'), categories=cats, ordered=True)
# rows: months in calendar order, columns: times of day
df = s.groupby([months, s.index.time]).mean().unstack()
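
To see the shape of the result, here is a small self-contained sketch of the same steps on synthetic data. The two-year 30-minute index and the values (each value is simply its month number) are made up for illustration. It also assumes an English locale, so that `strftime('%b')` yields 'Jan', 'Feb', and so on. `observed=False` is passed explicitly as an assumption about the desired behavior: it keeps all twelve months as rows even when a month has no data.

```python
import pandas as pd

# Synthetic two-year, 30-minute series (illustrative, not the question's data).
idx = pd.date_range("2015-01-01", "2016-12-31 23:30", freq="30min")
# Each value is its own month number, so every monthly mean is easy to check.
s = pd.Series(idx.month.to_numpy(dtype="float64"), index=idx)

cats = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Ordered categorical keeps the months in calendar order in the result.
months = pd.Categorical(s.index.strftime("%b"), categories=cats, ordered=True)

# Group by (month, time of day), average across years, then pivot the
# time of day into columns: one row per month, one column per time slot.
result = s.groupby([months, s.index.time], observed=False).mean().unstack()

print(result.shape)
print(result.iloc[:3, :4])
```

For 30-minute data the table has 12 rows and 48 columns. A 15-minute series would give 96 columns, and a 60-minute series 24.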

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/49601166/
