Python 拟合任意高斯函数 - 拟合不佳?

标签 python curve-fitting scipy-optimize

我试图概括一些代码,以便能够在单个数据集中拟合多个(n 从 1 到 >10)高斯曲线/峰值。

使用 Scipy Optimize Curve_fit 当我对 1-3 个高斯函数进行硬编码时,我可以获得非常好的拟合结果,并且我已经成功地生成了对于泛化的、任意数量的高斯函数运行没有错误的函数。然而,输出拟合度非常差。尽管给出的输入参数与用于生成“原始”数据的参数相同 - 即最好的情况。

此外,特定函数在某个时刻可能需要从简单的高斯函数进行修改的可能性不为零,但目前应该没问题。

下面是我的代码示例,输出图如下所示。

import numpy as np
import pandas as pd
import scipy 
import scipy.optimize
import matplotlib.pyplot as plt
from matplotlib import gridspec

amp1 = 1
cen1 = 1
sigma1 = 0.05

df=pd.DataFrame(index=np.linspace(0,10,num=1000),columns=['int'])

def _ngaussian(x, amps,cens,sigmas):
    fn = 0
    if len(amps)== len(cens)== len(sigmas):
        for i in range(len(amps)):
            fn = fn+amps[i]*(1/(sigmas[i]*(np.sqrt(2*np.pi))))*\
            (np.exp((-1.0/2.0)*(((x-cens[i])/sigmas[i])**2)))
    else:
        print('Your inputs have unequal lengths')
    return fn



amps = [1,1.1,0.9]
cens = [1,2,1.7]
sigmas=[0.05]*3

popt_peaks = [amps,cens,sigmas]
df['peaks'] = _ngaussian(df.index, *popt_peaks)

# Optionally adding noise to the raw data
#noise = np.random.normal(0,0.1,len(df['peaks'])) 
#df['peaks'] = df['peaks']+noise

def wrapper_fit_func(x, *args):
    N = len(args)
    a, b, c = list(args[0][:N]),list(args[0][N:N*2]),list(args[0][2*N:3*N])
    return _ngaussian(x, a, b, c)

def unwrapper_fit_func(x, *args):
    N = int(len(args)/3)
    a, b, c = list(args[:N]),list(args[N:N*2]),list(args[2*N:3*N])
    return _ngaussian(x, a, b, c)

popt_fitpeaks, pcov_fitpeaks = scipy.optimize.curve_fit(lambda x, *popt_peaks: wrapper_fit_func(x, popt_peaks), 
                       df.index, df['peaks'], p0=popt_peaks,
                       method='lm')


df['peaks_fit'] = unwrapper_fit_func(df.index, *popt_fitpeaks)


fig = plt.figure(figsize=(8,8))
gs = gridspec.GridSpec(1,1)
ax1 = fig.add_subplot(gs[0])
ax1.set_xlim(0,3)
ax1.plot(df.index, df['peaks'], "b",label='ideal data')
ax1.plot(df.index, df['peaks_fit'], "g",label='fit data')
ax1.legend(loc='upper right')

Bad Curve Fitting of Three Gaussians

如果您感兴趣,可以了解分析化学、核磁共振 (NMR) 和傅里叶变换 ionic 回旋共振质谱 (FTICR MS) 信号处理。

最佳答案

您可能会发现 lmfit ( https://lmfit.github.io/lmfit-py/ ,披露:我是主要作者)对此很有用。它提供了一个易于使用的 Model 类来对数据进行建模,包括高斯、Voigt 和类似线形的内置模型,可以轻松比较模型函数。

可以添加(或相乘)Lmfit 模型来制作复合模型,从而可以轻松支持 1、2、3 等高斯模型,并包含不同的基线函数。上面的链接中有文档和几个示例。对示例进行小幅重写(包括添加一些噪音)可能如下所示:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from lmfit.models import GaussianModel

amp1 = 1
cen1 = 1
sigma1 = 0.05

df=pd.DataFrame(index=np.linspace(0,10,num=1000),columns=['int'])

def _ngaussian(x, amps,cens,sigmas):
    fn = 0
    if len(amps)== len(cens)== len(sigmas):
        for i in range(len(amps)):
            fn = fn+amps[i]*(1/(sigmas[i]*(np.sqrt(2*np.pi))))*\
            (np.exp((-1.0/2.0)*(((x-cens[i])/sigmas[i])**2)))
            fn = fn+np.random.normal(size=len(x), scale=0.05)
    else:
        print('Your inputs have unequal lengths')
    return fn

amps = [1.30, 0.92, 2.11]
cens = [1.10, 1.73, 2.06]
sigmas=[0.05, 0.09, 0.07]

popt_peaks = [amps,cens,sigmas]
df['peaks'] = _ngaussian(df.index, *popt_peaks)

# create a model with 3 Gaussians: pretty easy to generalize
# to a loop to make N peaks
model = (GaussianModel(prefix='p1_') +
         GaussianModel(prefix='p2_') +
         GaussianModel(prefix='p3_') )

# create Parameters (named from function arguments). For
# Gaussian, Lorentzian, Voigt, etc these are "center", "amplitude", "sigma"
params = model.make_params(p1_center=1.0, p1_amplitude=2, p1_sigma=0.1,
                           p2_center=1.5, p2_amplitude=2, p2_sigma=0.1,
                           p3_center=2.0, p3_amplitude=2, p3_sigma=0.1)

# Parameters can have min/max bounds, be fixed (`.vary = False`)
# or constrained to a mathematical expression of other Parameter values
params['p1_center'].min = 0.8
params['p1_center'].max = 1.5

params['p2_center'].min = 1.1
params['p2_center'].max = 1.9

params['p3_center'].min = 1.88
params['p3_center'].max = 3.00

# run the fit
result = model.fit(df['peaks'], params, x=df.index)

# print out the fit results
print(result.fit_report())

# plot results
plt.plot(df.index, df['peaks'],     'o', label='data')
plt.plot(df.index, result.best_fit, '-', label='fit')
plt.legend()
plt.gca().set_xlim(0, 3)
plt.show()

这将产生如下所示的拟合图: enter image description here

并打印出报告

[[Model]]
    ((Model(gaussian, prefix='p1_') + Model(gaussian, prefix='p2_')) + Model(gaussian, prefix='p3_'))
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 102
    # data points      = 1000
    # variables        = 9
    chi-square         = 6.88439024
    reduced chi-square = 0.00694691
    Akaike info crit   = -4960.49871
    Bayesian info crit = -4916.32892
[[Variables]]
    p1_amplitude:  1.29432022 +/- 0.00428720 (0.33%) (init = 2)
    p1_center:     1.09993745 +/- 1.9012e-04 (0.02%) (init = 1)
    p1_sigma:      0.04970776 +/- 1.9012e-04 (0.38%) (init = 0.1)
    p2_amplitude:  0.91875183 +/- 0.00604913 (0.66%) (init = 2)
    p2_center:     1.73039597 +/- 6.7594e-04 (0.04%) (init = 1.5)
    p2_sigma:      0.09054027 +/- 7.0994e-04 (0.78%) (init = 0.1)
    p3_amplitude:  2.10077395 +/- 0.00533617 (0.25%) (init = 2)
    p3_center:     2.06019332 +/- 2.0105e-04 (0.01%) (init = 2)
    p3_sigma:      0.06970239 +/- 2.0752e-04 (0.30%) (init = 0.1)
    p1_fwhm:       0.11705282 +/- 4.4770e-04 (0.38%) == '2.3548200*p1_sigma'
    p1_height:     10.3878975 +/- 0.03440799 (0.33%) == '0.3989423*p1_amplitude/max(2.220446049250313e-16, p1_sigma)'
    p2_fwhm:       0.21320604 +/- 0.00167179 (0.78%) == '2.3548200*p2_sigma'
    p2_height:     4.04824243 +/- 0.02582408 (0.64%) == '0.3989423*p2_amplitude/max(2.220446049250313e-16, p2_sigma)'
    p3_fwhm:       0.16413657 +/- 4.8866e-04 (0.30%) == '2.3548200*p3_sigma'
    p3_height:     12.0238006 +/- 0.02922330 (0.24%) == '0.3989423*p3_amplitude/max(2.220446049250313e-16, p3_sigma)'
[[Correlations]] (unreported correlations are < 0.100)
    C(p3_amplitude, p3_sigma)     =  0.622
    C(p2_amplitude, p2_sigma)     =  0.621
    C(p1_amplitude, p1_sigma)     =  0.577
    C(p2_sigma, p3_sigma)         = -0.299
    C(p2_sigma, p3_amplitude)     = -0.271
    C(p2_amplitude, p3_sigma)     = -0.239
    C(p2_sigma, p3_center)        =  0.226
    C(p2_amplitude, p3_amplitude) = -0.210
    C(p2_center, p3_sigma)        = -0.192
    C(p2_amplitude, p3_center)    =  0.171
    C(p2_center, p3_amplitude)    = -0.160
    C(p2_center, p3_center)       =  0.126

关于Python 拟合任意高斯函数 - 拟合不佳?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58874920/

相关文章:

Python 导入没有 "from"

optimization - SLSQP 优化算法是如何工作的?

python - 如何优化参数取决于自变量的函数的拟合

gnuplot - 在 gnuplot 中尝试曲线拟合时,错误消息 "w = 0 in Givens();"是什么意思?

python - 我可以使用 PyOpenCL 与 Scipy 集成来与 GPU 并行执行差分进化吗?

python - 正在寻找可能更好的方法来使用 Glom 获取嵌套数据?

Python 分析方法

python - Pandas :如何计算考虑到以前记录的变化数量

python - Scipy 优化 Curve_fit 边界错误

python - 曲线拟合为指数