Suppose I have the following DataFrame column:
df.Consumption
0 16.208
1 11.193
2 9.845
3 9.348
4 9.091
...
19611 0.000
19612 0.000
19613 0.000
19614 0.000
19615 0.000
Name: Consumption, Length: 19616, dtype: float64
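I want to replace each 0 value with the mean of the previous 10 and next 10 values that are not 0.00.
Is there a good way to do this? I was thinking of using the replace and interpolate methods, but I don't know how to write it efficiently.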
Best answer
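You can use Series.rolling() with center=True together with Rolling.mean() to get the mean of the previous and next values. Setting center=True centers the rolling window on each row, so it covers both earlier and later entries.
To exclude 0 from the mean, first replace 0 with NaN, which Rolling.mean() skips.
Finally, use .loc to set the entries whose value is 0 to that mean, as follows: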
import numpy as np

n = 10  # check the previous and next 10 entries
# rolling window size is (2n + 1)
Consumption_mean = (df['Consumption'].replace(0, np.nan)
                    .rolling(n * 2 + 1, min_periods=1, center=True)
                    .mean())
# overwrite only the zero entries; the assignment aligns on the index
df.loc[df['Consumption'] == 0, 'Consumption'] = Consumption_mean
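One case worth checking is a run of zeros longer than the window, such as the trailing zeros in the question's data might form. The following is a minimal sketch, not part of the original answer: the helper name `fill_zeros_with_window_mean` and the sample series are made up for illustration. It suggests that when every value in a window is 0, the rolling mean has nothing to average and the entry becomes NaN rather than staying 0.

```python
import numpy as np
import pandas as pd


def fill_zeros_with_window_mean(s: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: replace zeros with the mean of the non-zero
    values among the n previous and n next entries."""
    window_mean = (
        s.replace(0, np.nan)
        .rolling(2 * n + 1, min_periods=1, center=True)
        .mean()
    )
    # Keep non-zero entries; take the window mean where the entry is zero.
    return s.where(s != 0, window_mean)


# Made-up data: isolated zeros followed by a long run of zeros.
s = pd.Series([4.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
filled = fill_zeros_with_window_mean(s, n=1)
print(filled)
```

With `n=1`, the zero at position 1 is filled from its neighbours 4.0 and 2.0, while positions 4 onward have only zeros in their windows and end up as NaN. How to treat those leftovers (a wider window, forward fill, or leaving them missing) depends on the data and is not covered by the original answer.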
Demo
Using a smaller window size, n = 3, for demonstration:
df
Consumption
0 16.208
1 11.193
2 9.845
3 9.348
4 9.091
5 8.010
6 0.000 <==== target entry
7 7.100
8 0.000 <==== target entry
9 6.800
10 6.500
11 6.300
12 5.900
13 5.800
14 5.600
#n = 10  # check previous and next 10 entries
n = 3    # smaller window size for demo
# rolling window size is (2n + 1)
Consumption_mean = (df['Consumption'].replace(0, np.nan)
                    .rolling(n * 2 + 1, min_periods=1, center=True)
                    .mean())
# Update into a new column `Consumption_New` for demo purpose
df['Consumption_New'] = df['Consumption']
df.loc[df['Consumption'] == 0, 'Consumption_New'] = Consumption_mean
Demo result:
print(df)
Consumption Consumption_New
0 16.208 16.2080
1 11.193 11.1930
2 9.845 9.8450
3 9.348 9.3480
4 9.091 9.0910
5 8.010 8.0100
6 0.000 8.0698 # 8.0698 = (9.348 + 9.091 + 8.01 + 7.1 + 6.8) / 5, window is rows 3-9 with the zeros at rows 6 and 8 skipped
7 7.100 7.1000
8 0.000 6.9420 # 6.942 = (8.01 + 7.1 + 6.8 + 6.5 + 6.3) / 5, window is rows 5-11 with the zeros at rows 6 and 8 skipped
9 6.800 6.8000
10 6.500 6.5000
11 6.300 6.3000
12 5.900 5.9000
13 5.800 5.8000
14 5.600 5.6000
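To double-check the two filled values, the demo can be rebuilt and compared against a mean taken directly over the corresponding slice. This is a small verification sketch using the demo data above; the names `values`, `rolled`, `manual_6`, and `manual_8` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Demo data copied from the table above.
df = pd.DataFrame({"Consumption": [
    16.208, 11.193, 9.845, 9.348, 9.091, 8.010, 0.000, 7.100,
    0.000, 6.800, 6.500, 6.300, 5.900, 5.800, 5.600,
]})

n = 3
values = df["Consumption"].replace(0, np.nan)
rolled = values.rolling(2 * n + 1, min_periods=1, center=True).mean()

# Mean over rows (i - n) .. (i + n); Series.mean() skips NaN by default.
manual_6 = values.iloc[6 - n : 6 + n + 1].mean()
manual_8 = values.iloc[8 - n : 8 + n + 1].mean()

print(round(rolled[6], 4), round(manual_6, 4))
print(round(rolled[8], 4), round(manual_8, 4))
```

Both pairs should agree with the 8.0698 and 6.942 shown in the demo result, since the centered window for row i spans rows i - n through i + n.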
Source: the Stack Overflow question on replacing specific values in a pandas DataFrame with the mean of the 10 previous and next values: https://stackoverflow.com/questions/69700382/