python - Smoothing a discrete data set

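I am trying to smooth this data set and produce a single representative curve with error bars. The method used to acquire the data points discretizes them in fairly coarse steps. I do not have much programming experience, but I am trying to learn. I read that a Gaussian filter might be a good option. Any help would be appreciated.

Figure: Optical dilatometer data for shrinkage of a ceramic pellet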

Here is a sample data set:

Time (min)  Non-Normalized Shrinkage    Normalized Shrinkage
200 93  1.021978022
202 92  1.010989011
204 92  1.010989011
206 92  1.010989011
208 92  1.010989011
210 92  1.010989011
212 91  1
214 90  0.989010989
216 90  0.989010989
218 90  0.989010989
220 88  0.967032967
222 88  0.967032967
224 87  0.956043956
226 86  0.945054945
228 86  0.945054945
230 86  0.945054945
232 86  0.945054945
234 86  0.945054945
236 85  0.934065934
238 84  0.923076923
240 83  0.912087912
242 83  0.912087912
244 83  0.912087912
246 82  0.901098901
248 83  0.912087912
250 82  0.901098901
252 81  0.89010989
254 81  0.89010989
256 82  0.901098901
258 82  0.901098901
260 79  0.868131868
262 80  0.879120879
264 80  0.879120879

I found this code snippet online, but I do not know how to implement it, or whether it is even what I am looking for.

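import numpy

def smoothListGaussian(data, degree=5):
    # Window of 2*degree - 1 samples, centered on the current point.
    window = degree * 2 - 1
    # Gaussian-shaped weights: largest at the center, falling off to the edges.
    weightGauss = []
    for i in range(window):
        i = i - degree + 1
        frac = i / float(window)
        gauss = 1 / (numpy.exp((4 * frac) ** 2))
        weightGauss.append(gauss)
    weight = numpy.array(weightGauss)
    # The output is shorter than the input by the window length.
    smoothed = [0.0] * (len(data) - window)
    for i in range(len(smoothed)):
        # Weighted average of the samples inside the window.
        smoothed[i] = sum(numpy.array(data[i:i + window]) * weight) / sum(weight)
    return smoothed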

Best answer

In general, you would use a library for this rather than implementing it yourself.

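I am going to use scipy.ndimage instead of scipy.signal here. If you have a signal processing background, you may find the scipy.signal approach more intuitive, but if not, it is likely to seem confusing. scipy.ndimage provides a straightforward, single-function-call gaussian_filter that does not require knowing additional signal processing conventions.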

Here is a quick example using the data you posted in your question. It assumes your data is regularly sampled (i.e. every 2 time units).

import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage

time, _, shrinkage = np.loadtxt('discrete_data.txt', skiprows=1).T

fig, ax = plt.subplots()
ax.plot(time, shrinkage, 'ro')
ax.plot(time, scipy.ndimage.gaussian_filter(shrinkage, 3))
plt.show()


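Most of this is fairly straightforward, but you may notice the "magic" value of 3 that I specified in scipy.ndimage.gaussian_filter(shrinkage, 3). That is the sigma parameter of the Gaussian function, expressed in samples. Because your data is sampled every 2 time units, it corresponds to a sigma of 6 time units.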
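To make the relation between the two units explicit, here is a minimal sketch that converts a sigma given in minutes into the sample-based sigma that gaussian_filter expects. The names sigma_minutes and sigma_samples are illustrative, and the arrays are rebuilt from the sample table in the question rather than read from a file.

```python
import numpy as np
import scipy.ndimage

# Time (min) and non-normalized shrinkage, copied from the sample data in the question.
time = np.arange(200, 266, 2, dtype=float)
raw = np.array([93, 92, 92, 92, 92, 92, 91, 90, 90, 90, 88,
                88, 87, 86, 86, 86, 86, 86, 85, 84, 83, 83,
                83, 82, 83, 82, 81, 81, 82, 82, 79, 80, 80], dtype=float)
shrinkage = raw / 91.0  # normalized column: the value at 212 min maps to 1

# The conversion only makes sense for regularly sampled data.
steps = np.diff(time)
is_regular = np.allclose(steps, steps[0])
if not is_regular:
    raise ValueError("data is not regularly sampled")
dt = steps[0]  # sampling interval, 2 minutes here

sigma_minutes = 6.0                 # desired width in time units (illustrative)
sigma_samples = sigma_minutes / dt  # the same width expressed in samples

smoothed = scipy.ndimage.gaussian_filter(shrinkage, sigma_samples)
print(sigma_samples)
```

If the sampling interval ever changes, only dt changes and the width in minutes stays the same, which is usually the more meaningful quantity to report.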

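The sigma parameter is exactly analogous to the standard deviation of a "bell curve" normal distribution. The larger it is, the wider the Gaussian and the smoother the resulting curve. By trial and error, a value of 3 seemed to work well for this particular data set, but you should experiment and see what looks best to you.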
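One rough way to experiment is to try a few sigma values and compare how much the curve still wiggles. The sketch below is only an illustration: the roughness helper (sum of absolute point-to-point changes) is a measure chosen here for convenience, not something from scipy, and the sigma values are arbitrary.

```python
import numpy as np
import scipy.ndimage

# Non-normalized shrinkage, copied from the sample data in the question.
raw = np.array([93, 92, 92, 92, 92, 92, 91, 90, 90, 90, 88,
                88, 87, 86, 86, 86, 86, 86, 85, 84, 83, 83,
                83, 82, 83, 82, 81, 81, 82, 82, 79, 80, 80], dtype=float)
shrinkage = raw / 91.0  # normalized column


def roughness(values):
    """Sum of absolute point-to-point changes; smaller means smoother."""
    return np.abs(np.diff(values)).sum()


# Smooth with several widths (in samples) and record how rough each result is.
results = {}
for sigma in (1, 3, 6):
    smoothed = scipy.ndimage.gaussian_filter(shrinkage, sigma)
    results[sigma] = roughness(smoothed)
    print(f"sigma={sigma}: roughness={results[sigma]:.4f}")

print(f"unsmoothed: roughness={roughness(shrinkage):.4f}")
```

A number like this does not replace looking at the plot, since a very large sigma also flattens real features of the shrinkage curve.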

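One last note: there are many different ways to approach this problem. A Gaussian filter is a reasonable solution, but there are many, many others. If the exact result matters a great deal, you should probably compare several methods and see which works best for your particular data set.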
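As a hedged illustration of such a comparison, the sketch below runs the Gaussian filter next to two other common smoothers: a moving average (scipy.ndimage.uniform_filter1d) and a Savitzky-Golay filter (scipy.signal.savgol_filter). These two alternatives and their window sizes are choices made for this example, not recommendations from the answer.

```python
import numpy as np
import scipy.ndimage
import scipy.signal

# Non-normalized shrinkage, copied from the sample data in the question.
raw = np.array([93, 92, 92, 92, 92, 92, 91, 90, 90, 90, 88,
                88, 87, 86, 86, 86, 86, 86, 85, 84, 83, 83,
                83, 82, 83, 82, 81, 81, 82, 82, 79, 80, 80], dtype=float)
shrinkage = raw / 91.0  # normalized column

# Gaussian filter with the sigma used in the answer.
gaussian = scipy.ndimage.gaussian_filter(shrinkage, 3)
# Moving average over 7 samples (window size is an illustrative choice).
moving_avg = scipy.ndimage.uniform_filter1d(shrinkage, size=7)
# Savitzky-Golay: fits a quadratic inside a sliding 11-sample window.
savgol = scipy.signal.savgol_filter(shrinkage, window_length=11, polyorder=2)

# Report how far each alternative is from the Gaussian result (RMS difference).
for name, result in [("moving average", moving_avg), ("Savitzky-Golay", savgol)]:
    rms = np.sqrt(np.mean((result - gaussian) ** 2))
    print(f"{name}: RMS difference from Gaussian = {rms:.5f}")
```

Plotting all three curves over the raw points with matplotlib is usually the quickest way to judge which one represents the data best.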

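In your comments, you asked about saving the smoothed data to a file instead of plotting it. Here is a quick example of one way you might do that: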

import numpy as np
import scipy.ndimage

time, _, shrinkage = np.loadtxt('discrete_data.txt', skiprows=1).T
smoothed = scipy.ndimage.gaussian_filter(shrinkage, 3)

np.savetxt('smoothed_data.txt', np.c_[time, smoothed])
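If you want the output file to be self-describing, np.savetxt can also write a header line and use explicit number formats. The following is a small sketch under assumptions: the column names in the header and the temporary output directory are made up for this example, and the data is rebuilt from the table in the question.

```python
import os
import tempfile

import numpy as np
import scipy.ndimage

# Time (min) and non-normalized shrinkage, copied from the sample data in the question.
time = np.arange(200, 266, 2, dtype=float)
raw = np.array([93, 92, 92, 92, 92, 92, 91, 90, 90, 90, 88,
                88, 87, 86, 86, 86, 86, 86, 85, 84, 83, 83,
                83, 82, 83, 82, 81, 81, 82, 82, 79, 80, 80], dtype=float)
shrinkage = raw / 91.0  # normalized column
smoothed = scipy.ndimage.gaussian_filter(shrinkage, 3)

# Write to a temporary directory so the example does not touch existing files.
out_path = os.path.join(tempfile.mkdtemp(), "smoothed_data.txt")
np.savetxt(out_path, np.column_stack([time, smoothed]),
           fmt=["%g", "%.6f"],
           header="Time (min)  Smoothed Normalized Shrinkage")

# The header is written as a comment line, which np.loadtxt skips by default.
loaded_time, loaded_smoothed = np.loadtxt(out_path).T
print(loaded_time[:3], loaded_smoothed[:3])
```

Because the header starts with the default comment character, the file can be read back without skiprows.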

Regarding "python - Smoothing a discrete data set", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27734719/
