python - Different results from two implementations of the t-test in scipy.stats

Here are two ways to run an independent t-test (Welch's version) in scipy. They give different results for both the p-value and the t statistic itself. Why is that?

import numpy as np
import scipy
print(scipy.__version__)
# 1.4.1
from scipy.stats import ttest_ind, ttest_ind_from_stats
x1_a = [19.0924, 19.1055, 19.1192, 19.1431, 19.0970]
x1_b = [20.3323, 20.3472, 20.3417, 20.3408, 20.2849]
x1_c = [19.0448, 18.9576, 19.0171, 19.0184, 18.9534]
ttest_ind(x1_a, x1_c, equal_var=False)
# Ttest_indResult(statistic=5.568858312509857, pvalue=0.0014998806395224108)
ttest_ind_from_stats(np.mean(x1_a), np.std(x1_a), 5, np.mean(x1_c), np.std(x1_c), 5, equal_var=False)
# Ttest_indResult(statistic=6.226172871918404, pvalue=0.000844418100098984)
ttest_ind(x1_a, x1_b, equal_var=False)
# Ttest_indResult(statistic=-83.49461195258749, pvalue=1.3516515130741807e-12)
ttest_ind_from_stats(np.mean(x1_a), np.std(x1_a), 5, np.mean(x1_b), np.std(x1_b), 5, equal_var=False)
# Ttest_indResult(statistic=-93.34981404047603, pvalue=5.764760941006529e-13)

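I tried to rule out possible causes: passing np.sqrt(np.var(x)) instead of np.std(x) to check for rounding issues, writing a custom test function based on the Wikipedia description (it gives results similar to ttest_ind_from_stats), trying several sets of values, computing the standard deviations by hand to avoid the n-1 vs. n issue, and reading the source code and documentation. But ttest_ind appears to use _ttest_ind_from_stats internally, which is what confuses me. Here is my custom function: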

from scipy.stats import t as tdist
def welch_ttest(m1, m2, s1, s2, n1, n2):
    numerator = m1 - m2
    denominator = np.sqrt((s1 ** 2) / n1 + (s2 ** 2) / n2)
    t = numerator / denominator
    dof_numerator = ((s1 ** 2) / n1 + (s2 ** 2) / n2) ** 2
    dof_denominator = ((s1 ** 4) / (n1 ** 2) / (n1 - 1) + (s2 ** 4) / (n2 ** 2) / (n2 - 1))
    dof = dof_numerator / dof_denominator
    p_half = tdist.cdf(t, dof)
    if p_half > 0.5:
        p_final = 2 * (1 - p_half)
    else:
        p_final = 2 * p_half
    return t, p_final  # returning t to check the validity of the function

Best answer

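np.std does not apply Bessel's correction: by default (ddof=0) it divides by n rather than n - 1. If you replace it with the pandas version of std, which uses ddof=1 by default, the results match: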

import pandas as pd

ttest_ind(x1_a, x1_c, equal_var=False)
# Ttest_indResult(statistic=5.568858312509857, pvalue=0.0014998806395224108)

ttest_ind_from_stats(np.mean(x1_a), pd.Series(x1_a).std(), 5, np.mean(x1_c), pd.Series(x1_c).std(), 5, equal_var=False)
# Ttest_indResult(statistic=5.568858312509857, pvalue=0.0014998806395224108)

Alternatively, if you don't want the extra import, pass ddof=1 to np.std, or simply multiply the std by sqrt(n / (n - 1)).
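As a rough check of the NumPy-only fixes, the sketch below reuses the x1_a and x1_c samples from the question and compares the raw-sample result against ttest_ind_from_stats fed with corrected standard deviations. The variable names (raw, from_ddof, from_scaled, uncorrected) are made up for illustration. Because both groups have the same size n here, the uncorrected call should inflate the t statistic by exactly sqrt(n / (n - 1)), which is about 1.118 for n = 5.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats

x1_a = [19.0924, 19.1055, 19.1192, 19.1431, 19.0970]
x1_c = [19.0448, 18.9576, 19.0171, 19.0184, 18.9534]
n = len(x1_a)  # both samples have the same size

# Reference result computed directly from the raw samples.
raw = ttest_ind(x1_a, x1_c, equal_var=False)

# Option 1: ask NumPy for the sample standard deviation (ddof=1).
from_ddof = ttest_ind_from_stats(
    np.mean(x1_a), np.std(x1_a, ddof=1), n,
    np.mean(x1_c), np.std(x1_c, ddof=1), n,
    equal_var=False,
)

# Option 2: rescale the default (ddof=0) standard deviation by sqrt(n / (n - 1)).
scale = np.sqrt(n / (n - 1))
from_scaled = ttest_ind_from_stats(
    np.mean(x1_a), np.std(x1_a) * scale, n,
    np.mean(x1_c), np.std(x1_c) * scale, n,
    equal_var=False,
)

# The call from the question, with uncorrected standard deviations.
uncorrected = ttest_ind_from_stats(
    np.mean(x1_a), np.std(x1_a), n,
    np.mean(x1_c), np.std(x1_c), n,
    equal_var=False,
)

print(raw)
print(from_ddof)
print(from_scaled)
print(uncorrected.statistic / raw.statistic, scale)
```

The ratio printed on the last line is only this simple when the two sample sizes are equal. With unequal sizes each variance term is rescaled by a different factor, so it is safer to pass ddof=1 than to patch the t statistic afterwards.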

This question and answer originally appeared on Stack Overflow: https://stackoverflow.com/questions/64199528/
