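python - Curve fitting with cubic spline

I am trying to interpolate a cumulative distribution of, e.g., (i) the number of people against (ii) the number of cars owned, showing that, e.g., the top 20% of people own far more than 20% of all cars. Of course, 100% of the people own 100% of the cars. I also know that there are, e.g., 100 million people and 200 million cars.

Now to my code: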

#import libraries (duplicates and unused imports removed)
import pandas as pd
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt
# Jupyter/IPython magic; drop this line when running as a plain script
%matplotlib inline

curve = pd.read_excel('inputs.xlsx', sheet_name='inputdata')

Input data (figure "Curveplot"): cumulated people (x) on the left, cumulated cars (y) on the right

#Input data in list form (I am not sure how to interpolate from a list for the moment)
cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars = [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]

# `points` was never defined in the snippet as posted; interp1d accepts plain lists, so they are used directly
x, y = cumulatedpeople, cumulatedcars
interpolation = interp1d(x, y, kind='cubic')

number_of_people_mn= 100000000

oneperson = 1 / number_of_people_mn
dataset = pd.DataFrame(range(number_of_people_mn + 1))
dataset.columns = ["nr_of_one_person"]
dataset.drop(dataset.index[:1], inplace=True)

#calculating the position of every single person on the cumulated x-axis (between 0 and 1)
dataset["cumulatedpeople"] = dataset["nr_of_one_person"] / number_of_people_mn

#finding the "cumulatedcars" to the "cumulatedpeople" via interpolation (between 0 and 1)
dataset["cumulatedcars"] = interpolation(dataset["cumulatedpeople"])

plt.plot(dataset["cumulatedpeople"], dataset["cumulatedcars"])
plt.legend(['Cubic interpolation'], loc = 'best')
plt.xlabel('Cumulated people')
plt.ylabel('Cumulated cars')
plt.title("People-to-car cumulated curve")
plt.show()

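However, when I look at the actual plot, I get the following wrong result (figure: Cubic interpolation).

In fact, the curve should look almost like the linear interpolation of exactly the same input data, but linear interpolation is not accurate enough for my purposes (figure: Linear interpolation).

Am I missing a relevant step, or what is the best way to get an accurate interpolation from input that looks almost like a linear interpolation?

Best answer

Short answer: your code is doing the right thing, but the data is not suitable for cubic interpolation.

Let me explain. For clarity, here is my simplified version of your code: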

import numpy as np
from scipy.interpolate import interp1d
from matplotlib import pyplot as plt

cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars = [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]
interpolation = interp1d(cumulatedpeople, cumulatedcars, kind='cubic')

# 100 sample points are enough for plotting (the question used 100000000)
number_of_people_mn = 100
cumppl = np.arange(number_of_people_mn + 1) / number_of_people_mn
cumcars = interpolation(cumppl)
plt.plot(cumppl, cumcars)                      # interpolated curve
plt.plot(cumulatedpeople, cumulatedcars, 'o')  # original data points
plt.show()

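Note the last few lines: I plot both the interpolated result and the input data on the same graph. Here is the result:

The orange dots are the original data and the blue line is the cubic interpolation. The interpolator passes through every point, so technically it is doing the right thing.

Clearly, though, it is not doing what you want.

The main reason for this strange behaviour is at the right end, where several x points lie very close together: the interpolator produces huge wiggles while trying to fit the closely spaced points.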
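To put numbers on those wiggles, here is a minimal sketch (not part of the original answer) that evaluates the cubic interpolant on a dense grid and compares it with plain linear interpolation of the same points. The grid resolution and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import interp1d

cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars = [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]

cubic = interp1d(cumulatedpeople, cumulatedcars, kind="cubic")
linear = interp1d(cumulatedpeople, cumulatedcars, kind="linear")

# Dense evaluation grid on [0, 1]; 2001 points is an arbitrary choice for this sketch
grid = np.linspace(0.0, 1.0, 2001)
cubic_values = cubic(grid)
linear_values = linear(grid)

# A cumulative curve should never decrease, so count the grid steps where the cubic one does
decreasing_steps = int(np.sum(np.diff(cubic_values) < 0))
# Largest vertical distance between the cubic and the linear interpolant on the grid
max_gap = float(np.max(np.abs(cubic_values - linear_values)))

print("range of the cubic interpolant:", cubic_values.min(), cubic_values.max())
print("grid steps where it decreases:", decreasing_steps)
print("largest gap to linear interpolation:", max_gap)
```

A cumulative share has to stay between 0 and 1 and must not decrease, so a non-zero count of decreasing steps, or a range that leaves [0, 1], is a quick sign that the interpolant is unusable for this purpose even though it passes through every input point.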

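If I remove the two right-most points from the interpolator:

interpolation = interp1d(cumulatedpeople[:-2], cumulatedcars[:-2], kind = 'cubic')

it looks a bit more reasonable:

But one could still argue that linear interpolation would be better. The wiggle now appears at the left end, because the gaps between the first x points are too large.

The moral here is that, as a rule of thumb, cubic interpolation should only be used when the gaps between the x points are roughly the same.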
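If the x points cannot be made evenly spaced, one alternative worth trying is a shape-preserving interpolant. The answer above does not mention it, so treat the following purely as a sketch: it swaps in scipy.interpolate.PchipInterpolator, a piecewise cubic Hermite interpolant that is designed not to overshoot monotone data. The grid resolution is again an arbitrary choice.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars = [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]

# PCHIP still passes through every data point, but picks the slopes at the
# points so that monotone input data yields a monotone curve
monotone = PchipInterpolator(cumulatedpeople, cumulatedcars)

grid = np.linspace(0.0, 1.0, 2001)  # illustrative resolution
monotone_values = monotone(grid)

# Smallest increment between neighbouring grid points; it should not be negative
smallest_step = float(np.min(np.diff(monotone_values)))

print("range of the PCHIP interpolant:", monotone_values.min(), monotone_values.max())
print("smallest step between grid points:", smallest_step)
```

The trade-off is smoothness: PCHIP only guarantees a continuous first derivative, whereas a cubic spline also has a continuous second derivative. For a cumulative curve, staying monotone and inside [0, 1] is usually the more important property.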

I think your best bet is to use something like curve_fit.
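As a rough illustration of that suggestion, the sketch below fits a parametric curve with scipy.optimize.curve_fit. The model function lorenz_model, its two-parameter form, the starting values and the bounds are all assumptions made for this example and are not taken from the answer. The form was chosen only because it equals 0 at x = 0 and 1 at x = 1; it may well fit this particular data poorly, so the residuals should be inspected before relying on it.

```python
import numpy as np
from scipy.optimize import curve_fit

people = np.array([0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1])
cars = np.array([0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1])


def lorenz_model(x, a, b):
    """Hypothetical two-parameter curve with f(0) = 0 and f(1) = 1."""
    return x**a * (1.0 - (1.0 - x)**b)


# Starting values and bounds are illustrative guesses (a >= 0, 0 < b <= 1)
params, covariance = curve_fit(
    lorenz_model,
    people,
    cars,
    p0=[1.0, 0.5],
    bounds=([0.0, 1e-6], [np.inf, 1.0]),
    max_nfev=2000,
)

fitted = lorenz_model(people, *params)
residuals = cars - fitted

print("fitted parameters (a, b):", params)
print("largest absolute residual:", np.max(np.abs(residuals)))
```

Unlike an interpolant, the fitted curve is not forced through every data point. That is what removes the wiggles, but it also means that the quality of the result depends entirely on how well the chosen model matches the data.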
