python - Pandas: How to fill n/a with the average of the previous and next non-null values

My DataFrame contains some N/A values:

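import numpy as np
import pandas as pd

# Columns C and D contain the missing values to be filled
df = pd.DataFrame({'A': [1, 1, 1, 3],
                   'B': [1, 1, 1, 3],
                   'C': [1, np.nan, 3, 5],
                   'D': [2, np.nan, np.nan, 6]})
print(df)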

    A   B   C   D
0   1   1   1.0 2.0
1   1   1   NaN NaN
2   1   1   3.0 NaN
3   3   3   5.0 6.0

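How can I fill each n/a value with the average of the previous and next non-null values in its column? For example, the second value in column C should be filled with (1+3)/2 = 2.

Expected output: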

    A   B   C   D
0   1   1   1.0 2.0
1   1   1   2.0 4.0
2   1   1   3.0 4.0
3   3   3   5.0 6.0

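Thanks!

Best answer

Use ffill and bfill to replace the NaNs by forward and backward filling, then concat the two results and groupby the index to aggregate with mean: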

df1 = pd.concat([df.ffill(), df.bfill()]).groupby(level=0).mean()
print(df1)
   A  B    C    D
0  1  1  1.0  2.0
1  1  1  2.0  4.0
2  1  1  3.0  4.0
3  3  3  5.0  6.0
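
The one-liner above can be wrapped in a small reusable function. This is only a sketch: the name fill_with_neighbor_mean is hypothetical, and it assumes the index labels are unique, because groupby(level=0) pairs the forward-filled and backward-filled rows by label. The reindex call restores the original row order, since groupby sorts the labels by default.

```python
import numpy as np
import pandas as pd


def fill_with_neighbor_mean(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill each NaN with the mean of the nearest non-null values above and below.

    Hypothetical helper, not part of pandas. Assumes unique index labels.
    """
    if not frame.index.is_unique:
        raise ValueError("index labels must be unique for groupby(level=0)")
    # Stack the forward-filled and backward-filled copies on top of each other,
    # so every index label appears exactly twice.
    stacked = pd.concat([frame.ffill(), frame.bfill()])
    # Average the two rows that share a label, then restore the original order.
    return stacked.groupby(level=0).mean().reindex(frame.index)


df = pd.DataFrame({'A': [1, 1, 1, 3],
                   'B': [1, 1, 1, 3],
                   'C': [1, np.nan, 3, 5],
                   'D': [2, np.nan, np.nan, 6]})
result = fill_with_neighbor_mean(df)
print(result)
```

Note that both NaNs in column D become 4.0, the mean of 2 and 6. That matches the expected output in the question, but it differs from a linear interpolation, which would give evenly spaced values between 2 and 6.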

Details:

print(df.ffill())
   A  B    C    D
0  1  1  1.0  2.0
1  1  1  1.0  2.0
2  1  1  3.0  2.0
3  3  3  5.0  6.0

print(df.bfill())
   A  B    C    D
0  1  1  1.0  2.0
1  1  1  3.0  6.0
2  1  1  3.0  6.0
3  3  3  5.0  6.0
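
One case the sample data does not cover is a NaN at the very start or end of a column, where only one neighbor exists. The sketch below uses a made-up Series s (not from the question) to compare the concat/groupby approach with a plain element-wise average of the two filled copies. As far as I can tell, the two agree for interior gaps but differ at the edges, because mean skips NaN while addition propagates it.

```python
import numpy as np
import pandas as pd

# Made-up example with a leading NaN, so index 0 has no previous value
s = pd.Series([np.nan, 1.0, np.nan, 3.0])

forward = s.ffill()    # the leading NaN stays NaN: nothing to carry forward
backward = s.bfill()   # the leading NaN becomes 1.0

# Element-wise average: NaN + 1.0 is NaN, so the edge stays unfilled
elementwise = (forward + backward) / 2

# concat + groupby mean: mean skips NaN, so the edge takes its only neighbor
grouped = pd.concat([forward, backward]).groupby(level=0).mean()

print(elementwise.tolist())  # [nan, 1.0, 2.0, 3.0]
print(grouped.tolist())      # [1.0, 1.0, 2.0, 3.0]
```

Which behavior is preferable depends on whether an edge value with a single neighbor should be filled or left missing.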

For more on how to fill n/a with the average of the previous and next non-null values in Pandas, see the similar question on Stack Overflow: https://stackoverflow.com/questions/46628892/
