python - In Pandas: Interpolate between two rows such that the sum of interpolated values + second row = old value of the second row

I find it hard to phrase this question clearly, but it is similar to this one.

For example, suppose I have a table like this:

Label  Value
A      1
B      2
C      NaN
D      NaN
E      NaN
F      12

It needs to look like this:

Label  Value
A      1
B      2
C      3
D      3
E      3
F      3

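Take the next available value (12), divide it by the number of NaN values + 1 (12 / 4 = 3), and replace both the NaN values and the original value used for the interpolation (12) with 3. This is similar to the previous question, except that the original value used for the interpolation is modified as well.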
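To make the rule concrete, here is a minimal plain-Python sketch of it, without pandas. The function name `spread_backwards` is hypothetical and used only for illustration; it assumes missing values are given as `None` or NaN and leaves any trailing missing values untouched.

```python
import math

def spread_backwards(values):
    """Hypothetical helper: share each known value equally with the run of
    missing values directly before it."""
    out = list(values)
    run_start = 0  # index where the current run of missing values begins
    for i, v in enumerate(values):
        if v is None or math.isnan(v):
            continue  # still inside a run of missing values
        share = v / (i - run_start + 1)  # divide by (number of NaNs + 1)
        for j in range(run_start, i + 1):
            out[j] = share  # overwrite the NaNs and the known value itself
        run_start = i + 1
    return out

result = spread_backwards([1, 2, None, None, None, 12])
print(result)  # [1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
```

The four 3s sum back to the original 12, which is the property the title asks for.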

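import numpy as np
import pandas as pd

test = pd.DataFrame({'Label': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
                     'Value': [1, 2, None, None, None, 12, None, None, 4]})

# 1 where a value is present, 0 where it is NaN
test['break'] = np.where(test['Value'].notnull(), 1, 0)
# a new group starts on the row after each present value
test['group'] = test['break'].shift().fillna(0).cumsum()
# transform (rather than apply) keeps the original index on recent pandas versions
test['Value2'] = test.groupby('group')['Value'].transform(lambda x: x.fillna(x.max() / len(x)))

# copy the filled value onto the row that supplied it; stop one row early so row + 1 stays in range
for row in range(test.shape[0] - 1):
    if test['break'].iloc[row] == 0 and test['break'].iloc[row + 1] == 1:
        test.at[row + 1, 'Value2'] = test['Value2'].iloc[row]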

df.interpolate() can't do this, and this is what I have so far. It gets the job done, but it isn't very elegant.

Best answer

Maybe something like this?

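import pandas as pd

test = pd.DataFrame({'Label': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
                     'Value': [1, 2, None, None, None, 12, None, None, 4]})

# label groups from the bottom up: each group ends at a non-NaN value
tr = test.assign(
    g=(~test['Value'].isna())[::-1].cumsum()
).groupby('g')['Value'].transform

# last value of each group divided by the group's size
df = test.assign(Value=tr('last') / tr('size'))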

>>> df
  Label     Value
0     A  1.000000
1     B  2.000000
2     C  3.000000
3     D  3.000000
4     E  3.000000
5     F  3.000000
6     G  1.333333
7     H  1.333333
8     I  1.333333

Explanation

With .assign(g=...), we create groups of values that end with a non-NaN value and are preceded by zero or more NaNs:

>>> test.assign(
...     g=(~test['Value'].isna())[::-1].cumsum()
... )

  Label  Value  g
0     A    1.0  4
1     B    2.0  3
2     C    NaN  2
3     D    NaN  2
4     E    NaN  2
5     F   12.0  2
6     G    NaN  1
7     H    NaN  1
8     I    4.0  1
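The group labels can be reproduced step by step, which may make the reversed cumulative sum easier to follow. This is only an illustrative sketch; the variable names `known`, `reversed_count` and `g` are mine, not part of the answer.

```python
import pandas as pd

values = pd.Series([1, 2, None, None, None, 12, None, None, 4])

# True where a value is present, False where it is NaN
known = values.notna()

# Walk the rows from bottom to top and count the present values seen so far.
# Every present value bumps the counter, so it opens a new group that also
# absorbs the NaN rows sitting directly above it.
reversed_count = known[::-1].cumsum()

# Put the labels back into the original row order
g = reversed_count.sort_index()
print(g.tolist())  # [4, 3, 2, 2, 2, 2, 1, 1, 1]
```

In the answer the explicit reordering is not needed, because .assign aligns the labels with the frame by index.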

Then we use .groupby('g')['Value'].transform twice: once to get the last() value of each group and once to get the group's size(), and we divide the former by the latter.
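If this is needed in more than one place, the two steps could be wrapped in a small function. The sketch below is one possible packaging, not part of the original answer; the name `spread_over_preceding_nans` is hypothetical. It also checks the property from the title: within each group, the new values add up to the original non-NaN value.

```python
import pandas as pd

def spread_over_preceding_nans(s):
    """Hypothetical helper: divide each non-NaN value equally among itself
    and the NaNs directly before it."""
    # Group labels counted from the bottom, restored to the original order
    g = s.notna()[::-1].cumsum()[::-1]
    grouped = s.groupby(g)
    # 'last' skips NaN, so it picks the value that closes each group
    return grouped.transform('last') / grouped.transform('size')

s = pd.Series([1, 2, None, None, None, 12, None, None, 4])
result = spread_over_preceding_nans(s)

# Sum of the new values per group vs. the original closing value per group
g = s.notna()[::-1].cumsum()[::-1]
group_sums = result.groupby(g).sum()
originals = s.groupby(g).last()
print((group_sums - originals).abs().max() < 1e-9)  # True
```

Rows after the final non-NaN value, if any, stay NaN because their group has no closing value to divide.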

Regarding "python - In Pandas: Interpolate between two rows such that the sum of interpolated values + second row = old value of the second row", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74734536/
