python - Aggregated weighted sum with a step function

Tags: python pandas

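I have a dataframe that looks like this:

I appended an aggregated row at the end that computes the mean of each column while ignoring null values. Here is my code: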

repayments_amt_pivot.loc['Aggregated'] = repayments_amt_pivot.iloc[:, 3:].mean(skipna=True)

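However, instead of a simple mean, I actually need the aggregated row to be a weighted sum of the percentages, where each percentage is multiplied by its share of principal_due_per_month (the '$ Amount Due' column in the sample data below). Only rows that have a value for the month in question count toward the denominator.

In this case, for Month 4, row 0 would be multiplied by 27,845 / (27,845 + 310,506 + 659,705 + 1,433,121).

For Month 3, row 4 would be multiplied by 1,941,036 / (27,845 + 310,506 + 659,705 + 1,433,121 + 1,941,036).

And so on.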
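
To make the weighting concrete, here is a minimal sketch of the Month 4 arithmetic in plain Python. It assumes the rounded amounts quoted above and copies the percentages from the Month 4 column of the sample data in this question; the variable names are only illustrative.

```python
# Rows 0-3 are the only rows with a Month 4 value, so only they get a weight.
# Amounts are the rounded figures quoted in the question (an assumption for brevity).
amounts = [27_845, 310_506, 659_705, 1_433_121]
month_4 = [100.00000000000003, 92.54673537743292, 97.85768670717538, 94.91544740629973]

total = sum(amounts)                      # denominator shared by every row in this month
weights = [a / total for a in amounts]    # each row's share of the total amount due
aggregated = sum(w * p for w, p in zip(weights, month_4))

print(round(weights[0], 4))   # 0.0115, the factor applied to row 0
print(round(aggregated, 2))   # 95.47
```

For Month 3 the same steps apply with row 4 included, which is why the denominator grows from month to month.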

Any help would be much appreciated, as I can't figure this one out!

See the screenshot below for how the calculation is done in Excel.

Data:

import numpy as np
import pandas as pd

df = pd.DataFrame([{'$ Amount Due': 27845.312793586978,
  'Month 0': 56.479872661140476,
  'Month 1': 92.94027983726657,
  'Month 2': 100.00000000000003,
  'Month 3': 100.00000000000003,
  'Month 4': 100.00000000000003},
 {'$ Amount Due': 310505.5597382864,
  'Month 0': 78.34839385064039,
  'Month 1': 79.58303224427453,
  'Month 2': 79.58303224427453,
  'Month 3': 81.43498983472573,
  'Month 4': 92.54673537743292},
 {'$ Amount Due': 659705.2173778547,
  'Month 0': 90.79718901057414,
  'Month 1': 97.8418387417451,
  'Month 2': 97.85768670717538,
  'Month 3': 97.85768670717538,
  'Month 4': 97.85768670717538},
 {'$ Amount Due': 1433121.318250646,
  'Month 0': 91.7207168764003,
  'Month 1': 94.34283888419282,
  'Month 2': 94.51326381568556,
  'Month 3': 94.8581612152927,
  'Month 4': 94.91544740629973},
 {'$ Amount Due': 1941036.1276433321,
  'Month 0': 79.75029644420579,
  'Month 1': 85.62252846197367,
  'Month 2': 86.59251760542142,
  'Month 3': 86.70920561577343,
  'Month 4': np.nan},
 {'$ Amount Due': 3448302.2801859295,
  'Month 0': 75.83697471065258,
  'Month 1': 83.6700011095642,
  'Month 2': 86.16217213969533,
  'Month 3': np.nan,
  'Month 4': np.nan},
 {'$ Amount Due': 3190042.0279137697,
  'Month 0': 76.69574360823212,
  'Month 1': 85.4625418697537,
  'Month 2': np.nan,
  'Month 3': np.nan,
  'Month 4': np.nan},
 {'$ Amount Due': 2614440.2956102462,
  'Month 0': 74.87175589142862,
  'Month 1': np.nan,
  'Month 2': np.nan,
  'Month 3': np.nan,
  'Month 4': np.nan}])

Best answer

My approach would be:

months = df.iloc[:, 1:]                    # dataframe of months only
due_row = months.where(months.isna(), df['$ Amount Due'], axis=0)    # single due values
due_sum = due_row.sum()                    # summed due values

(months*due_row/due_sum).sum()             # sum of product and quotient like requested

#Month 0    78.823057
#Month 1    86.680023
#Month 2    88.573969
#Month 3    90.772494
#Month 4    95.469538
#dtype: float64
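
As a cross-check, the same idea can be sketched as a small reusable function. This is not the answer's code but an equivalent formulation under stated assumptions: the helper name weighted_nan_mean and the toy frame are hypothetical, chosen so the result can be verified by hand.

```python
import numpy as np
import pandas as pd

def weighted_nan_mean(frame, weight_col):
    """Weighted mean per column, using only the rows where that column is not NaN."""
    values = frame.drop(columns=weight_col)
    # Weight where a value is present, 0.0 where it is missing.
    weights = values.notna().astype(float).mul(frame[weight_col], axis=0)
    # NaN * 0.0 stays NaN and is skipped by sum(), so missing cells drop out.
    return (values * weights).sum() / weights.sum()

# Hypothetical toy data, small enough to verify by hand.
toy = pd.DataFrame({
    "amount": [100.0, 300.0, 600.0],
    "m0": [50.0, 80.0, 90.0],
    "m1": [100.0, 60.0, np.nan],
})
result = weighted_nan_mean(toy, "amount")
print(result)
# m0: (100*50 + 300*80 + 600*90) / 1000 = 83.0
# m1: (100*100 + 300*60) / 400 = 70.0
```

The denominator for m1 is 400 rather than 1000 because the third row has no m1 value, which mirrors the step behaviour described in the question.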

If it should be appended to the dataframe as the last row:

df.loc['Aggregated', df.columns[1:]] = (months*due_row/due_sum).sum().values

#            $ Amount Due    Month 0     ...         Month 3     Month 4
#0           2.784531e+04  56.479873     ...      100.000000  100.000000
#1           3.105056e+05  78.348394     ...       81.434990   92.546735
#2           6.597052e+05  90.797189     ...       97.857687   97.857687
#3           1.433121e+06  91.720717     ...       94.858161   94.915447
#4           1.941036e+06  79.750296     ...       86.709206         NaN
#5           3.448302e+06  75.836975     ...             NaN         NaN
#6           3.190042e+06  76.695744     ...             NaN         NaN
#7           2.614440e+06  74.871756     ...             NaN         NaN
#Aggregated           NaN  78.823057     ...       90.772494   95.469538
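
If mutating the original frame is undesirable, one possible alternative is to build the summary as a one-row frame and concatenate it. The sketch below assumes a hypothetical toy frame and column names; only pd.concat and to_frame are standard pandas calls.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real frame.
toy = pd.DataFrame({"amount": [100.0, 300.0], "m0": [50.0, 80.0], "m1": [100.0, np.nan]})

months = toy[["m0", "m1"]]
weights = months.notna().astype(float).mul(toy["amount"], axis=0)
aggregated = (months * weights).sum() / weights.sum()

# Turn the result Series into a single labelled row and concatenate it.
summary = aggregated.to_frame().T
summary.index = ["Aggregated"]
with_summary = pd.concat([toy, summary])   # `toy` itself is left unchanged

print(with_summary)
```

As in the output above, the amount column of the new row is NaN because no aggregate is defined for it.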

Additional edit:

This code is (a bit) shorter, IMO clearer, almost self-explanatory, and also faster:

timings:
AnanayMital :   0.0209
SpghttCd :      0.00538
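
The setup behind these timings is not given. One way such a measurement could be taken is with the standard timeit module, sketched below on randomly generated stand-in data (an assumption, not the frame from the question). The function name aggregate is hypothetical; its body follows the approach shown in this answer.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in data with the same column layout as the question.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.uniform(50, 100, size=(8, 5)),
                   columns=[f"Month {i}" for i in range(5)])
toy.insert(0, "$ Amount Due", rng.uniform(1e4, 1e6, size=8))

def aggregate(frame):
    months = frame.iloc[:, 1:]
    # Keep NaN where a month is missing, otherwise use that row's amount due.
    due_row = months.where(months.isna(), frame["$ Amount Due"], axis=0)
    return (months * due_row / due_row.sum()).sum()

runs = 100
seconds_per_call = timeit.timeit(lambda: aggregate(toy), number=runs) / runs
print(f"{seconds_per_call:.5f} s per call")
```

Absolute numbers depend on the machine and the pandas version, so only the ratio between two approaches measured the same way is meaningful.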

Regarding python - Aggregated weighted sum with a step function, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54143631/
