python-3.x - python3 - pandas: determining whether an event's occurrence is statistically significant

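I have a large dataset that looks like the one below. I want to know whether there is a statistically significant difference between the times when the event occurs and the times when it does not. The assumption here is that a higher percent change is more meaningful/better.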

In another dataset, the "event occurs" column takes the values "True, False, Neutral". (Please ignore the index, as it is the default pandas index.)

   index    event occurs            percent change
    148       False                  11.27
    149        True                  14.56
    150       False                  10.35
    151       False                   6.07
    152       False                  21.14
    153       False                   7.26
    154       False                   7.07
    155       False                   5.37
    156        True                   2.75
    157       False                   7.12
    158       False                   7.24

What is the best way to determine significance when the column is "True/False" or "True/False/Neutral"?

Best answer

Load Packages, Set Globals, Make Data.

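import scipy.stats as stats
import numpy as np
import pandas as pd

n = 60                  # number of synthetic rows
stat_sig_thresh = 0.05  # significance level used for every test below

# Synthetic stand-in for the real data: random True/False flags and
# random percent changes between 0.1 and 99.9
event_perc = pd.DataFrame({"event occurs": np.random.choice([True,False],n),
                          "percent change": [i*.1 for i in np.random.randint(1,1000,n)]})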

Determine if Distribution is Normal

# Run the D'Agostino-Pearson normality test on "percent change" within each group
groups = event_perc.groupby("event occurs")["percent change"]
stat_sig = pd.DataFrame([(key, *stats.normaltest(grp)) for key, grp in groups],
                        columns=["event occurs", "statistic", "pvalue"])
# A p-value at or below the threshold rejects normality for that group
stat_sig["Normal"] = stat_sig["pvalue"] > stat_sig_thresh

>>>stat_sig

    event occurs  statistic  pvalue    Normal
0   False         2.917192   0.232563  True
1   True          2.938333   0.230117  True
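
To make the normality step easier to read, here is a minimal sketch of how the p-value from stats.normaltest behaves on two synthetic samples. The sample names, sizes, and distribution parameters are assumptions chosen for illustration and are not part of the original data. The bell-shaped sample will usually (but not always) pass the test, while the strongly skewed one is rejected.

```python
import numpy as np
import scipy.stats as stats

# Synthetic samples (assumed for illustration only)
rng = np.random.default_rng(0)
bell_shaped = rng.normal(loc=10, scale=2, size=500)   # drawn from a normal distribution
skewed = rng.exponential(scale=10, size=500)          # strongly right-skewed

for name, sample in [("bell_shaped", bell_shaped), ("skewed", skewed)]:
    # Null hypothesis of normaltest: the sample comes from a normal distribution
    statistic, pvalue = stats.normaltest(sample)
    verdict = "no evidence against normality" if pvalue > 0.05 else "normality rejected"
    print(f"{name}: statistic={statistic:.2f}, pvalue={pvalue:.3g} -> {verdict}")
```

A large p-value only means the test found no evidence against normality; it does not prove the data are normal.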

Determine Statistical Significance

normal = [bool(i) for i in stat_sig.Normal.unique().tolist()]

# Split "percent change" into the two groups being compared
rvs1 = event_perc["percent change"][event_perc["event occurs"] == True]
rvs2 = event_perc["percent change"][event_perc["event occurs"] == False]

if len(normal) == 1 and normal[0]:
    print("the distributions are normal")
    # Independent two-sample t-test; null hypothesis: the group means are equal
    if stats.ttest_ind(rvs1,rvs2).pvalue >= stat_sig_thresh:
        # we cannot reject the null hypothesis of equal means
        print("we can't say whether there is statistically significant difference")
    else:
        # we reject the null hypothesis of equal means
        print("there is a statistically significant difference")

elif len(normal) == 1 and not normal[0]:
    print("the distributions are not normal")
    # Mann-Whitney U test for independent samples; stats.wilcoxon is a
    # paired test and requires both samples to have the same length
    if stats.mannwhitneyu(rvs1,rvs2,alternative="two-sided").pvalue >= stat_sig_thresh:
        # we cannot reject the null hypothesis of identical distributions
        print("we can't say whether there is statistically significant difference")
    else:
        # we reject the null hypothesis of identical distributions
        print("there is a statistically significant difference")
else:
    # one group passed the normality test and the other did not
    print("samples are drawn from different distributions")

the distributions are normal
we can't say whether there is statistically significant difference
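
The code above only handles the True/False case. For the "True/False/Neutral" dataset mentioned in the question, one possible approach is an omnibus test across all three groups, followed by a one-sided comparison that reflects the stated assumption that a higher percent change is better. The sketch below uses the Kruskal-Wallis test (stats.kruskal) and a one-sided Mann-Whitney U test; the helper name compare_groups and the synthetic data are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

def compare_groups(df, group_col="event occurs", value_col="percent change", alpha=0.05):
    """Hypothetical helper: Kruskal-Wallis test across every group in group_col."""
    samples = [grp[value_col].to_numpy() for _, grp in df.groupby(group_col)]
    result = stats.kruskal(*samples)
    return result.pvalue, result.pvalue < alpha

# Synthetic three-group data (an assumption, not the asker's dataset):
# the "True" group is given a higher mean than the other two
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "event occurs": np.repeat(["True", "False", "Neutral"], 40),
    "percent change": np.concatenate([
        rng.normal(15, 3, 40),
        rng.normal(8, 3, 40),
        rng.normal(8, 3, 40),
    ]),
})

pvalue, significant = compare_groups(demo)
print(f"Kruskal-Wallis p-value: {pvalue:.3g}, significant: {significant}")

# One-sided follow-up: do "True" rows tend to have larger values than "False" rows?
true_vals = demo.loc[demo["event occurs"] == "True", "percent change"]
false_vals = demo.loc[demo["event occurs"] == "False", "percent change"]
follow_up = stats.mannwhitneyu(true_vals, false_vals, alternative="greater")
print(f"Mann-Whitney U (True > False) p-value: {follow_up.pvalue:.3g}")
```

If every group passes the normality check, stats.f_oneway (one-way ANOVA) could take the place of stats.kruskal. When running several pairwise follow-up tests, a multiple-comparison correction is worth considering.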

Regarding "python-3.x - python3 - pandas: determining whether an event's occurrence is statistically significant", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58770711/
