python - Pandas: how to groupby/pivot while keeping NaNs? Converting float to str and then back to float works but seems convoluted

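I am tracking the "month" in which a certain event happened. If it has not happened, the "Month" field is NaN. The starting table looks like this: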

+-------+----------+---------+
| Month | Category | Balance |
+-------+----------+---------+
| 1     | a        |     100 |
| nan   | a        |     300 |
| 2     | a        |     200 |
+-------+----------+---------+

I am trying to build a crosstab like this:

+-------+----------------------------------+
| Month | Category a - cumulative % amount |
+-------+----------------------------------+
|     1 |                             0.17 |
|     2 |                             0.50 |
+-------+----------------------------------+

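In month 1, 100/600 ≈ 16.7% of the amount has occurred. By month 2, (100 + 200)/600 = 50% has occurred cumulatively: 100 in month 1 and 200 in month 2. The 300 with a NaN month never occurs, but it still counts toward the 600 denominator.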
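A minimal sketch of that arithmetic, assuming the starting table is loaded as a DataFrame with the Month, Category and Balance columns shown above. It suggests the NaN rows only matter for the denominator: the per-month sums can drop them, as long as the total is taken over all rows.

```python
import pandas as pd

# The starting table from the question; NaN in Month means "event never happened"
df = pd.DataFrame({
    "Month": [1, float("nan"), 2],
    "Category": ["a", "a", "a"],
    "Balance": [100, 300, 200],
})

# Denominator: total balance per category, INCLUDING the NaN-month row (600 for 'a')
totals = df.groupby("Category")["Balance"].sum()

# Numerator: balance per (Month, Category); groupby drops the NaN month here,
# which is fine because it should not get a row of its own
by_month = df.groupby(["Month", "Category"])["Balance"].sum().unstack("Category")

# Running total over months, divided column-wise by each category's grand total
cumulative = by_month.cumsum().div(totals, axis="columns")
print(cumulative)
```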

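My problem is the NaNs. Pandas automatically drops NaNs from any groupby/pivot/crosstab. I can convert the month field to a string so that grouping on it does not drop the NaNs, but then pandas sorts by month as if it were a string, i.e. it sorts 10, 48, 5, 6.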
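If you are on pandas 1.1 or later, groupby accepts dropna=False, which keeps NaN as a group key while the column stays float, so the string-sort problem should not arise. A sketch on hypothetical toy data (the Month/Balance values below are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: unsorted months, a repeated month, one NaN ("never happened") row
df = pd.DataFrame({
    "Month": [10, float("nan"), 5, 48, 6, 5],
    "Balance": [100, 300, 200, 50, 25, 75],
})

# Requires pandas >= 1.1: dropna=False keeps NaN as its own group key.
# Month stays float, so keys sort numerically (5, 6, 10, 48), not as strings.
sums = df.groupby("Month", dropna=False)["Balance"].sum()

# The grand total includes the NaN group; the displayed rows exclude it
total = sums.sum()
cumulative = sums[sums.index.notna()].cumsum() / total
print(cumulative)
```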

Any suggestions?

The following works but seems very convoluted:

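1. Convert "month" to a string
2. Do a crosstab
3. Convert "month" back to float (can I do this without moving the index to a column and then the column back to the index?)
4. Re-sort
5. Take the cumsum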

Code:

import numpy as np
import pandas as pd

# Synthetic data: 10,000 rows, category 'a' for ix <= 4000, 'b' otherwise
df = pd.DataFrame()
mylen = int(10e3)
df['ix'] = np.arange(0, mylen)
df['amount'] = np.random.uniform(10e3, 20e3, mylen)
df['category'] = np.where(df['ix'] <= 4000, 'a', 'b')
df['month'] = np.random.uniform(3, 48, mylen)
# The event has not happened (month is NaN) for ix <= 1000
df['month'] = np.where(df['ix'] <= 1000, np.nan, df['month'])
df['month rounded'] = np.ceil(df['month'])

# Cast the key to str so that crosstab keeps the 'nan' group
ct = pd.crosstab(df['month rounded'].astype(str), df['category'],
                 values=df['amount'], aggfunc='sum', margins=True,
                 normalize='columns', dropna=False)

# the index is 'month rounded'; convert it back to float and sort numerically
ct = ct.reset_index()
ct['month rounded'] = ct['month rounded'].astype('float32')
ct = ct.sort_values('month rounded')
ct = ct.set_index('month rounded')
ct2 = ct.cumsum(axis=0)
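On step 3: the string index can probably be converted in place by assigning to ct.index, which avoids the reset_index/set_index round trip. A minimal sketch on a small hypothetical frame that reuses the column names from the code above; sort_index places the NaN month last by default:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with the same column names as the question's code
df = pd.DataFrame({
    "month rounded": [1.0, np.nan, 2.0, 10.0],
    "category": ["a", "a", "a", "b"],
    "amount": [100.0, 300.0, 200.0, 400.0],
})

# Same trick as in the question: string keys keep the 'nan' row in the crosstab
ct = pd.crosstab(df["month rounded"].astype(str), df["category"],
                 values=df["amount"], aggfunc="sum", normalize="columns")

# Convert the string index ('1.0', '10.0', '2.0', 'nan') back to float directly;
# sort_index then orders months numerically and puts NaN last
ct.index = ct.index.astype(float)
ct = ct.sort_index()
ct2 = ct.cumsum(axis=0)
print(ct2)
```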

Best answer

Use:

new_df = df.assign(cumulative=df['Balance'].mask(df['Month'].isna())
                                           .groupby(df['Category'])
                                           .cumsum()
                                           .div(df.groupby('Category')['Balance']
                                                  .transform('sum'))).dropna()
print(new_df)
   Month Category  Balance  cumulative
0    1.0        a      100    0.166667
2    2.0        a      200    0.500000
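One caveat, as far as I can tell: groupby(...).cumsum() accumulates in row order, so this approach assumes the rows are already sorted by Month and that each month appears at most once per category. In the question's generated data, months are random and repeated. A sketch under those assumptions sorts first and then keeps the last running total per month (the toy data below is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted data with a repeated month and a NaN ("never happened") row
df = pd.DataFrame({
    "Month": [2, np.nan, 1, 2],
    "Category": ["a", "a", "a", "a"],
    "Balance": [150, 300, 100, 50],
})

# Sort by Month first (NaN rows go last) so the running total follows month order
s = df.sort_values("Month")
new_df = s.assign(cumulative=s["Balance"].mask(s["Month"].isna())
                                          .groupby(s["Category"])
                                          .cumsum()
                                          .div(s.groupby("Category")["Balance"]
                                                .transform("sum"))).dropna()

# Keep only the last running total per (Category, Month), i.e. one row per month
new_df = new_df.drop_duplicates(["Category", "Month"], keep="last")
print(new_df)
```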

If you want a separate DataFrame for each category, you can build a dictionary:

df_category = {i:group for i,group in new_df.groupby('Category')}
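If you would rather have the single crosstab-shaped table from the question, pivot should work on the same frame, assuming each (Month, Category) pair appears once (otherwise pivot raises a ValueError). A sketch that rebuilds new_df from the output printed above:

```python
import pandas as pd

# new_df rebuilt from the printed output in the answer: one row per (Month, Category)
new_df = pd.DataFrame({
    "Month": [1.0, 2.0],
    "Category": ["a", "a"],
    "Balance": [100, 200],
    "cumulative": [100 / 600, 300 / 600],
}, index=[0, 2])

# Per-category frames, as in the answer
df_category = {i: group for i, group in new_df.groupby("Category")}

# Or one wide table (the layout from the question): months as rows,
# one cumulative column per category
wide = new_df.pivot(index="Month", columns="Category", values="cumulative")
print(wide)
```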

A similar question on Stack Overflow: https://stackoverflow.com/questions/60159973/
