python - How to plot a normal distribution curve for central limit theorem data

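I am trying to draw a normal distribution curve over the distribution of my central limit theorem data.

Below is the implementation I have tried.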

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import math

# 10000 simulations of a die roll experiment
n = 10000

avg = []
for i in range(1,n):  # repeat the experiment (note: range(1, n) runs n - 1 times)
    a = np.random.randint(1,7,10)  # roll a die 10 times; values are 1 to 6
    avg.append(np.average(a))  # store the average of those 10 rolls

plt.hist(avg[0:])

zscore = stats.zscore(avg[0:])

mu, sigma = np.mean(avg), np.std(avg)
s = np.random.normal(mu, sigma, 10000)

# Create the bins and histogram
count, bins, ignored = plt.hist(s, 20, normed=True)

# Plot the distribution curve
plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *np.exp( - (bins - mu)**2 / (2 * sigma**2)))

I get the plot below:

You can see the red normal curve at the bottom.

Can anyone tell me why the curve doesn't fit?

Best answer

You almost had it! First, notice that you are plotting two histograms on the same axes:

plt.hist(avg[0:])

and

plt.hist(s, 20, normed=True)

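So that the normal density can be drawn over a histogram, you correctly normalised the second plot with the normed=True argument. However, you forgot to normalise the first histogram as well: plt.hist(avg[0:], normed=True). Its bars are therefore raw counts in the hundreds or thousands, while the density curve stays below 1, which is why the curve appears flattened along the bottom of the plot. (Recent Matplotlib releases have removed normed; the equivalent argument is density=True.)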
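To make the effect of normalisation concrete, here is a minimal sketch using np.histogram, which plt.hist calls internally. The seeded generator and the variable names are assumptions made for this illustration and are not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
# Averages of 10 die rolls, repeated 10000 times
avg = rng.integers(1, 7, size=(10000, 10)).mean(axis=1)

# Raw histogram: bar heights are counts and sum to the sample size
counts, edges = np.histogram(avg, bins=20)
# Density histogram: bar heights are rescaled so the total bar area is 1
density, _ = np.histogram(avg, bins=20, density=True)

widths = np.diff(edges)
total_count = counts.sum()
total_area = (density * widths).sum()
print(total_count, total_area)
```

Only the density version lives on the same vertical scale as a probability density function, so only that version can share axes with the normal curve.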

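I would also suggest that, since you have already imported scipy.stats, you use the normal distribution from that module rather than writing the pdf yourself.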
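As a quick check that the two approaches agree, the sketch below compares the hand-written formula from the question with stats.norm.pdf. The values of mu and sigma are illustrative placeholders, not results from a real run.

```python
import numpy as np
import scipy.stats as stats

mu, sigma = 3.5, 0.54  # illustrative values only
x = np.linspace(1.5, 5.5, num=100)

# Hand-written normal pdf, as in the question
manual = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
# Library version: loc is the mean, scale is the standard deviation
library = stats.norm.pdf(x, loc=mu, scale=sigma)

same = np.allclose(manual, library)
print(same)
```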

Putting this together, we have:

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# 10000 simulations, each the average of 10 die rolls
n = 10000

avg = []
for i in range(n):
    a = np.random.randint(1, 7, 10)  # 10 rolls; the upper bound 7 is exclusive
    avg.append(np.average(a))

# CHANGED: normalise this histogram too
# (density=True replaces normed=True, which recent Matplotlib releases removed)
plt.hist(avg, 20, density=True)

mu, sigma = np.mean(avg), np.std(avg)
s = np.random.normal(mu, sigma, 10000)

# Create the bins and histogram
count, bins, ignored = plt.hist(s, 20, density=True)

# Use scipy.stats implementation of the normal pdf
# Plot the distribution curve
x = np.linspace(1.5, 5.5, num=100)
plt.plot(x, stats.norm.pdf(x, mu, sigma))

This gives me the following plot:
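As a sanity check on the fitted mu and sigma, the central limit theorem predicts that the average of 10 fair die rolls is approximately normal with mean 3.5 and standard deviation sqrt((35/12)/10), about 0.54. The following sketch compares those values with a simulation; the seeded generator and the larger sample size are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the run is repeatable
rolls_per_avg = 10
avg = rng.integers(1, 7, size=(100_000, rolls_per_avg)).mean(axis=1)

# A fair six-sided die has mean 3.5 and variance 35/12;
# averaging k independent rolls divides the variance by k
theory_mu = 3.5
theory_sigma = np.sqrt((35 / 12) / rolls_per_avg)

print(f"sample: mu={avg.mean():.3f}, sigma={avg.std():.3f}")
print(f"theory: mu={theory_mu:.3f}, sigma={theory_sigma:.3f}")
```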

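Edit

In the comments you asked:

1. How did I choose 1.5 and 5.5 in np.linspace?
2. Is it possible to plot the normal kernel over a non-normalised histogram?

Addressing question 1: I simply chose 1.5 and 5.5 by eye. After plotting the histogram, I saw that its bins appeared to lie between 1.5 and 5.5, so that is the range over which we want to plot the normal distribution.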

A more programmatic way to choose this range is:

x = np.linspace(bins.min(), bins.max(), num=100)
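The same range can also be computed without drawing anything. This is a small sketch, assuming np.histogram_bin_edges and a seeded generator, neither of which appears in the original answer; for an integer bin count the edges should match those returned by plt.hist for the same data.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
avg = rng.integers(1, 7, size=(10000, 10)).mean(axis=1)

# 20 equal-width bins need 21 edges, spanning the data's minimum to maximum
bins = np.histogram_bin_edges(avg, bins=20)
x = np.linspace(bins.min(), bins.max(), num=100)

print(len(bins), x[0], x[-1])
```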

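As for question 2: yes, we can achieve what you want. However, you should be aware that we are then no longer plotting a probability density function at all.

After removing the normed=True argument when plotting the histogram: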

x = np.linspace(bins.min(), bins.max(), num=100)

# Find pdf of normal kernel at mu
max_density = stats.norm.pdf(mu, mu, sigma)
# Calculate how to scale pdf
scale = count.max() / max_density

plt.plot(x, scale * stats.norm.pdf(x, mu, sigma))

This gives me the following plot:
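The snippet above matches the peak of the curve to the tallest bar. A different scaling, shown here as a sketch and not as the method of the original answer, multiplies the pdf by the sample size times the bin width, since that product is the approximate count a density predicts for each bin. The seeded generator and the variable names are assumptions made for this illustration.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # seeded only for reproducibility
avg = rng.integers(1, 7, size=(10000, 10)).mean(axis=1)
mu, sigma = avg.mean(), avg.std()

# Un-normalised histogram: bar heights are raw counts
count, bins = np.histogram(avg, bins=20)
bin_width = bins[1] - bins[0]

# Expected count in a bin is roughly N * bin_width * pdf(bin centre)
scale = len(avg) * bin_width
x = np.linspace(bins.min(), bins.max(), num=100)
expected_counts = scale * stats.norm.pdf(x, mu, sigma)
```

Plotting expected_counts against x over a histogram drawn without normalisation places the curve on the count scale. Because the averages of 10 die rolls only take values in steps of 0.1, individual bars may still sit noticeably above or below the curve.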

For "python - How to plot a normal distribution curve for central limit theorem data", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55096595/
