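python - I created a class that returns a confidence interval after bootstrapping, but my confidence interval looks very narrow. What am I doing wrong?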

Tags: python statistics confidence-interval resampling

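My intention is for the code to bootstrap a given list: draw a resample the same size as the list, repeat that 10,000 times, and then compute a 95% confidence interval.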

import numpy
from random import choice

class bootstrapping(object):

    def __init__(self,bslist=[],iteration=10000):
        self.bslist = bslist
        self.iteration = iteration

    def CI(self):
        listofmeans = []

        for numbers in range(0,self.iteration):
            bootstraplist = [choice(self.bslist) for _ in range(len(self.bslist))]
            listofmeans.append(sum(bootstraplist) / len(bootstraplist))

        s = numpy.std(listofmeans)
        z = 1.96
        n = self.iteration**0.5

        lower_confidence = (sum(listofmeans) / len(listofmeans)) - (z*s/n)
        upper_confidence = (sum(listofmeans) / len(listofmeans)) + (z*s/n)

        return lower_confidence,upper_confidence

test = bootstrapping([60,33,102,53,63,33,42,19,31,86,15,50,
                      45,47,26,23,30,20,18,48,22,20,17,29,43,52,29],10000)
test.CI()

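The confidence interval I get, (37.897427638499948, 38.102572361500052), is strangely narrow. When I enter the same list of numbers into Minitab, the 95% confidence interval I get is (30.74, 47.48). Am I doing something wrong?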

Best answer

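To find the 95% confidence interval, let z = 1.96 (approximately) and take the interval mean plus or minus z*std, where std is the standard deviation of the bootstrap means. That standard deviation already estimates the standard error of the mean, so it should not be divided again by the square root of the number of iterations. In other words, use z*std, not z*std/n: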

import numpy as np
import random
random.seed(2017)

class Bootstrapping(object):

    def __init__(self,bslist=[],iteration=10000):
        self.bslist = bslist
        self.iteration = iteration

    def CI(self):
        listofmeans = []

        # Draw `iteration` resamples with replacement, each as long as the data
        for _ in range(self.iteration):
            bootstraplist = [random.choice(self.bslist) for _ in range(len(self.bslist))]
            mean = sum(bootstraplist) / len(bootstraplist)
            listofmeans.append(mean)

        # The std of the bootstrap means already estimates the standard error
        mean = np.mean(listofmeans, axis=0)
        std = np.std(listofmeans, axis=0)
        z = 1.96
        err = z*std  # no further division by sqrt(iteration)
        lower_confidence = mean - err
        upper_confidence = mean + err

        return lower_confidence, upper_confidence

test = Bootstrapping([60,33,102,53,63,33,42,19,31,86,15,50,
                      45,47,26,23,30,20,18,48,22,20,17,29,43,52,29],10000)
print(test.CI())

which yields

(31.309540089458281, 46.876348799430602)
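To make the difference between z*std and z*std/n concrete, here is a minimal sketch using only the standard library. The helper name bootstrap_means and the variable names are hypothetical, and the t critical value 2.0555 (26 degrees of freedom) is taken from a standard t table. It compares the spread of the bootstrap means with the analytic standard error s/sqrt(len(data)) of the original sample, and also computes a classical t interval, which is presumably what Minitab reports (that part is an assumption).

```python
import random
import statistics

data = [60, 33, 102, 53, 63, 33, 42, 19, 31, 86, 15, 50,
        45, 47, 26, 23, 30, 20, 18, 48, 22, 20, 17, 29, 43, 52, 29]


def bootstrap_means(sample, iterations=10000, seed=2017):
    """Return the means of `iterations` resamples drawn with replacement."""
    rng = random.Random(seed)  # local generator, global random state untouched
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(iterations)]


means = bootstrap_means(data)

# Spread of the bootstrap means: this is already a standard-error estimate.
boot_se = statistics.pstdev(means)

# Analytic standard error of the mean from the original sample, for comparison.
analytic_se = statistics.stdev(data) / len(data) ** 0.5

# Dividing by sqrt(iterations) again shrinks the half-width by a factor of 100.
wrong_half_width = 1.96 * boot_se / len(means) ** 0.5
right_half_width = 1.96 * boot_se

# Classical t interval around the sample mean (t critical value from a table).
t_crit = 2.0555
sample_mean = statistics.fmean(data)
t_interval = (sample_mean - t_crit * analytic_se,
              sample_mean + t_crit * analytic_se)

print("bootstrap SE:", boot_se, "analytic SE:", analytic_se)
print("half-width with /sqrt(iterations):", wrong_half_width)
print("half-width without:", right_half_width)
print("t interval:", t_interval)
```

The two standard-error figures should come out close to each other, which is why the extra division produces an interval roughly 100 times too narrow when iteration is 10,000.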

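Alternatively, you can compute the confidence interval without resorting to the mean +/- 1.96*std formula. You can get an empirical estimate by sorting listofmeans and reading off the values at the chosen percentiles. The code below takes the 5th and 95th percentiles, which bracket the central 90% of the bootstrap means; for a 95% interval, use the 2.5th and 97.5th percentiles (0.025 and 0.975) instead: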

import random
random.seed(2017)

class Bootstrapping(object):

    def __init__(self,bslist=[],iteration=10000):
        self.bslist = bslist
        self.iteration = iteration

    def CI(self):
        listofmeans = []

        # Draw `iteration` resamples with replacement, each as long as the data
        for _ in range(self.iteration):
            bootstraplist = [random.choice(self.bslist) for _ in range(len(self.bslist))]
            mean = sum(bootstraplist) / len(bootstraplist)
            listofmeans.append(mean)

        # Sort the means and read off the values at the 5% and 95% positions
        listofmeans = sorted(listofmeans)
        a, b = round(self.iteration*0.05), round(self.iteration*0.95)
        lower_confidence = listofmeans[a]
        upper_confidence = listofmeans[b]

        return lower_confidence, upper_confidence

test = Bootstrapping([60,33,102,53,63,33,42,19,31,86,15,50,
                      45,47,26,23,30,20,18,48,22,20,17,29,43,52,29],10000)
print(test.CI())

which yields

(32.888888888888886, 45.888888888888886)
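As a sketch of the percentile approach with the confidence level made explicit, the function below cuts an equal share from each tail, so level=0.95 reads the 2.5th and 97.5th percentiles and level=0.90 reads the 5th and 95th. The name percentile_ci and its parameters are hypothetical, and it uses random.choices from the standard library rather than the class above, so its numbers will not match the output shown earlier.

```python
import random

data = [60, 33, 102, 53, 63, 33, 42, 19, 31, 86, 15, 50,
        45, 47, 26, 23, 30, 20, 18, 48, 22, 20, 17, 29, 43, 52, 29]


def percentile_ci(sample, iterations=10000, level=0.95, seed=2017):
    """Percentile bootstrap interval for the mean at the given level."""
    rng = random.Random(seed)  # local generator, global random state untouched
    n = len(sample)
    # Means of `iterations` resamples drawn with replacement, sorted ascending
    means = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(iterations))
    tail = (1.0 - level) / 2.0  # share of bootstrap means cut from each end
    lo_idx = round(tail * iterations)
    hi_idx = min(round((1.0 - tail) * iterations), iterations - 1)  # stay in range
    return means[lo_idx], means[hi_idx]


ci_95 = percentile_ci(data, level=0.95)
ci_90 = percentile_ci(data, level=0.90)
print("95% percentile interval:", ci_95)
print("90% percentile interval:", ci_90)
```

Because both calls use the same seed, they sort the same bootstrap means, so the 90% interval always lies inside the 95% one.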

This Q&A is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/41989866/
