python-3.x - python3 - pandas: determining whether an event's occurrence is statistically significant

Tags: python-3.x pandas

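I have a large dataset like the one shown below. I want to know whether the percent change differs significantly between rows where the event occurs and rows where it does not. The assumption here is that a higher percent change is more meaningful/better.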

In another dataset, the "event occurs" column takes the values True, False, or Neutral. (Please ignore the index; it is just the default pandas index.)

   index    event occurs            percent change
    148       False                  11.27
    149        True                  14.56
    150       False                  10.35
    151       False                   6.07
    152       False                  21.14
    153       False                   7.26
    154       False                   7.07
    155       False                   5.37
    156        True                   2.75
    157       False                   7.12
    158       False                   7.24

What is the best way to determine significance when the column is True/False, or True/False/Neutral?

Best answer

Load Packages, Set Globals, Make Data.

import scipy.stats as stats
import numpy as np
import pandas as pd  # needed for pd.DataFrame below

n = 60
stat_sig_thresh = 0.05

event_perc = pd.DataFrame({"event occurs": np.random.choice([True,False],n),
                          "percent change": [i*.1 for i in np.random.randint(1,1000,n)]})

Determine if Distribution is Normal

# D'Agostino-Pearson normality test on "percent change" within each group
rows = []
for key, grp in event_perc.groupby("event occurs")["percent change"]:
    res = stats.normaltest(grp)
    rows.append({"event occurs": key, "statistic": res.statistic, "pvalue": res.pvalue})
stat_sig = pd.DataFrame(rows)
# p <= threshold rejects normality for that group
stat_sig["Normal"] = stat_sig["pvalue"] > stat_sig_thresh

>>>stat_sig

    event occurs  statistic           pvalue                Normal
0   False         2.9171920993203915  0.23256255191146755   True
1   True          2.938332679486047   0.23011724484588764   True
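
The normality check above uses `stats.normaltest`, which is built from sample skewness and kurtosis and is meant for moderately large groups. When the event is rare (in the question's sample only 2 of 11 rows are True), a group may be too small for it. A minimal sketch, assuming the Shapiro-Wilk test (`stats.shapiro`) as an alternative that is not part of the original answer; the `demo` frame, the seed, and the 0.05 threshold are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Illustrative data only (assumed): same layout as the answer's frame, fixed seed
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "event occurs": rng.choice([True, False], 60),
    "percent change": rng.integers(1, 1000, 60) * 0.1,
})

shapiro_rows = []
for key, grp in demo.groupby("event occurs")["percent change"]:
    # shapiro returns (W statistic, p-value); it needs at least 3 observations
    w, p = stats.shapiro(grp)
    shapiro_rows.append({"event occurs": key, "W": w, "pvalue": p})

shapiro_res = pd.DataFrame(shapiro_rows)
# p <= 0.05 rejects normality for that group
shapiro_res["Normal"] = shapiro_res["pvalue"] > 0.05
print(shapiro_res)
```

The resulting `Normal` column can be read the same way as the one in `stat_sig`.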

Determine Statistical Significance

normal = [bool(i) for i in stat_sig["Normal"].unique()]

rvs1 = event_perc["percent change"][event_perc["event occurs"] == True]
rvs2 = event_perc["percent change"][event_perc["event occurs"] == False]

if len(normal) == 1 and normal[0]:
    print("the distributions are normal")
    # independent two-sample t-test; null hypothesis: equal means
    if stats.ttest_ind(rvs1, rvs2).pvalue >= stat_sig_thresh:
        # we cannot reject the null hypothesis of equal means
        print("we can't say whether there is statistically significant difference")
    else:
        # we reject the null hypothesis of equal means
        print("there is a statistically significant difference")

elif len(normal) == 1 and not normal[0]:
    print("the distributions are not normal")
    # Mann-Whitney U replaces the original stats.wilcoxon call: Wilcoxon's
    # signed-rank test needs paired samples of equal length, which two
    # independent groups are not
    if stats.mannwhitneyu(rvs1, rvs2, alternative="two-sided").pvalue >= stat_sig_thresh:
        # we cannot reject the null hypothesis of identical distributions
        print("we can't say whether there is statistically significant difference")
    else:
        # we reject the null hypothesis of identical distributions
        print("there is a statistically significant difference")
else:
    # one group passed the normality test and the other did not
    print("normality results differ between the groups")

the distributions are normal
we can't say whether there is statistically significant difference
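
The code above compares two groups, while the question also asks about a True/False/Neutral column. A minimal sketch of one way to extend the idea to three groups, assuming one-way ANOVA (`stats.f_oneway`) when the groups look normal and the Kruskal-Wallis test (`stats.kruskal`) otherwise; neither test appears in the original answer, and the `three` frame with its string labels, the seed, and the 0.05 threshold are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Illustrative data only (assumed): three categories stored as strings
rng = np.random.default_rng(0)
three = pd.DataFrame({
    "event occurs": rng.choice(["True", "False", "Neutral"], 90),
    "percent change": rng.integers(1, 1000, 90) * 0.1,
})

# One array of "percent change" values per category
samples = [grp.to_numpy() for _, grp in three.groupby("event occurs")["percent change"]]

anova = stats.f_oneway(*samples)    # assumes normal groups with similar variances
kruskal = stats.kruskal(*samples)   # rank-based, no normality assumption

for name, res in [("ANOVA", anova), ("Kruskal-Wallis", kruskal)]:
    verdict = "significant" if res.pvalue < 0.05 else "not significant"
    print(f"{name}: p = {res.pvalue:.3f} ({verdict} at 0.05)")
```

Both tests only indicate that at least one group differs from the others; identifying which pair differs would need follow-up pairwise comparisons.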

Source: a similar question on Stack Overflow about determining whether an event's occurrence is statistically significant with pandas: https://stackoverflow.com/questions/58770711/
