python-3.x - groupby after concat: column means missing from the groups

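I concatenate two DataFrames, then group by 'type' and compute the mean. The columns from the second df (d1 through d10) show up in the concatenated DataFrame but not in the grouped means. I am probably missing some point; please point it out. Here is the code.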

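import pandas as pd

# Join the two frames side by side (column-wise)
results = pd.concat([stockpicks, stock_analysis], axis=1)
print(stockpicks.head(5))
print(stock_analysis.head(5))
print(results.head(5))

# Group by 'type' and take the mean of each column
results_typed = results.groupby('type')
mean_overall = results_typed.mean()

print(mean_overall)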

The output is below.

         date  type stocknum  price      pe
0  2014-02-17  cao3  0326.HK   0.20   20.00
1  2014-02-17  cao3  0536.HK   2.56   25.60
2  2014-02-17  cao3  0595.HK   0.97   48.50
3  2014-02-17  cao3  0698.HK   0.95   15.83
4  2014-02-17  cao3  0759.HK   3.25  108.33

[5 rows x 5 columns]
         d1        d2        d5       d10
0        95        95        95        90
1  99.21875       100  97.65625   89.0625
2       100  107.2165  104.1237  93.81443
3  102.1053  97.89474  97.89474  105.2632
4  95.38462  94.15385        92  90.15385

[5 rows x 4 columns]
         date  type stocknum  price      pe        d1        d2        d5  \
0  2014-02-17  cao3  0326.HK   0.20   20.00        95        95        95   
1  2014-02-17  cao3  0536.HK   2.56   25.60  99.21875       100  97.65625   
2  2014-02-17  cao3  0595.HK   0.97   48.50       100  107.2165  104.1237   
3  2014-02-17  cao3  0698.HK   0.95   15.83  102.1053  97.89474  97.89474   
4  2014-02-17  cao3  0759.HK   3.25  108.33  95.38462  94.15385        92   

        d10  
0        90  
1   89.0625  
2  93.81443  
3  105.2632  
4  90.15385  

[5 rows x 9 columns]
          price         pe
type                      
bbom   2.050526   8.135789
bbos   3.136842  10.116316
cao3   1.717368  36.494211
maos   6.661935  20.565161
rscp  48.983333   6.280000

[5 rows x 2 columns]

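I have actually been rewriting this code. Before the rewrite, I expanded the first df by reindexing it, assigned the values into the expanded df, then grouped by 'type' and computed the group means, and that worked without any problem...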

I am using Python 3.3 and pandas 0.13.1 on Ubuntu.

The statements that compute the d values:

days=[1,2,5,10]
p0=stockprice[p0_date]
stock_pct_change={('d'+str(d)):stockprice[p0_date+d]/p0*100.0 if (p0_date+d)< len(trading_days) else np.nan for d in days }

Best answer

The columns are missing because they are strings/dates/objects, and computing a mean over such columns makes no sense.

It looks to me like your d1, d2, d5 and d10 columns are strings, because the output displays them as a mix of integers and floats.
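One way to check this guess is to look at the column dtypes directly. The sketch below uses a small made-up frame (the values and the name `results` only mimic the question's data, and the object dtype on `d1` is forced here as an assumption about how the real column is stored):

```python
import pandas as pd

# Hypothetical stand-in for the concatenated frame from the question.
# d1 is deliberately stored as generic Python objects (dtype=object).
results = pd.DataFrame({
    "type": ["cao3", "cao3", "bbom"],
    "price": [0.20, 2.56, 2.05],
    "d1": pd.Series([95, 99.21875, 100], dtype=object),
})

# One dtype per column; a numeric column shows up as float64 or int64.
print(results.dtypes)

# Names of the columns that are held as object rather than as a numeric dtype.
object_cols = [col for col in results.columns if results[col].dtype == object]
print(object_cols)
```

If d1, d2, d5 and d10 are listed as `object` in your own frame, that would explain why they are left out of the grouped mean.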

If you want the means of those columns, change their dtype like this:

df.d1 = df.d1.astype(np.float64) 
# do the same for d2..etc..
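To convert all of the d columns in one go, a loop over the column names is enough. The following is a minimal sketch on made-up data, not the asker's real frame: the frame contents and the list `d_cols` are assumptions for illustration, and `pd.to_numeric` is swapped in for `astype` so that unparseable entries become NaN instead of raising. `numeric_only=True` is passed explicitly because, as far as I know, recent pandas versions raise an error on non-numeric columns rather than silently dropping them as older versions did.

```python
import pandas as pd

# Hypothetical stand-in for the concatenated frame; d1 and d2 are object dtype.
results = pd.DataFrame({
    "type": ["cao3", "cao3", "bbom"],
    "price": [0.20, 2.56, 2.05],
    "d1": pd.Series([95, 99.21875, 100], dtype=object),
    "d2": pd.Series([95, 100, 107.2165], dtype=object),
})

# Before the conversion, only the truly numeric column is averaged.
before = results.groupby("type").mean(numeric_only=True)
print(before)

# Convert each d column to a numeric dtype; bad values turn into NaN.
d_cols = ["d1", "d2"]
for col in d_cols:
    results[col] = pd.to_numeric(results[col], errors="coerce")

# After the conversion, the d columns take part in the grouped mean.
after = results.groupby("type").mean(numeric_only=True)
print(after)
```

With the real data the list would be `["d1", "d2", "d5", "d10"]`.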

The original question, "python-3.x - groupby after concat: column means missing from the groups", can be found on Stack Overflow: https://stackoverflow.com/questions/22526237/
