python - Calculating a 95% confidence interval for single proportions in a Pandas time series with statsmodels

I have a time-series DataFrame:

import pandas as pd

df = pd.DataFrame({'year': ['2010', '2011', '2012', '2013', '2014',
                            '2015', '2016', '2017', '2018', '2019'],
                   'total_count': [545, 779, 706, 547, 626, 530, 766, 1235, 1260, 947],
                   'rand_count': [96, 184, 148, 154, 160, 149, 124, 274, 322, 301],
                   'rand_perc': [17.61, 23.62, 20.96, 28.15, 25.56, 28.11, 16.19, 22.19, 25.56, 31.78]
                   })

where:

df['rand_perc'] = (df['rand_count']/df['total_count'])*100
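For reference, a note on what the interval is: proportion_confint uses method='normal' by default, which is the normal-approximation (Wald) interval. Writing x_i for rand_count and n_i for total_count in row i, and assuming that default method with alpha = 0.05:

$$
\hat p_i = \frac{x_i}{n_i}, \qquad
\mathrm{CI}_i = \hat p_i \pm z_{1-\alpha/2}\sqrt{\frac{\hat p_i\,(1-\hat p_i)}{n_i}}, \qquad
z_{0.975} \approx 1.96
$$

For 2010, $\hat p = 96/545 \approx 0.1761$ and the half-width is $1.96\sqrt{0.1761 \cdot 0.8239 / 545} \approx 0.0320$, giving roughly $(0.1442, 0.2081)$. Since rand_perc is $100\,\hat p_i$, the bounds have to be multiplied by 100 before they can serve as error bars on the percentage scale.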

Problem:

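For each row of df, I want to compute the confidence interval of the single proportion df['rand_count'] / df['total_count'], and then plot df['year'] against df['rand_perc'] with the CI as error bars. I tried to compute the CI for each row with statsmodels using the following code: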

import statsmodels.api as sm

df['CI'] = df[['total_count', 'rand_count']].apply(
    lambda row: sm.stats.proportion_confint(count=df['rand_count'],
                                            nobs=df['total_count'],
                                            alpha=0.05),
    axis=1)

But the resulting df['CI'] is a mess: every row holds the same tuple, which contains the CIs for all rows:

0    ([0.14416430990026746, 0.2063732756491498, 0.1...
1    ([0.14416430990026746, 0.2063732756491498, 0.1...
2    ([0.14416430990026746, 0.2063732756491498, 0.1...
3    ([0.14416430990026746, 0.2063732756491498, 0.1...
4    ([0.14416430990026746, 0.2063732756491498, 0.1...
5    ([0.14416430990026746, 0.2063732756491498, 0.1...
6    ([0.14416430990026746, 0.2063732756491498, 0.1...
7    ([0.14416430990026746, 0.2063732756491498, 0.1...
8    ([0.14416430990026746, 0.2063732756491498, 0.1...
9    ([0.14416430990026746, 0.2063732756491498, 0.1...
Name: CI, dtype: object
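The repetition comes from the lambda: it never reads its row argument and instead passes the whole df['rand_count'] and df['total_count'] columns, so every call computes the intervals for all ten years and returns the same pair of arrays. A minimal row-wise sketch, assuming the DataFrame from the question (rand_perc is left out because it is not needed here), reads the counts from row so that each call returns a single (lower, upper) tuple:

```python
import pandas as pd
from statsmodels.stats.proportion import proportion_confint

df = pd.DataFrame({
    'year': ['2010', '2011', '2012', '2013', '2014',
             '2015', '2016', '2017', '2018', '2019'],
    'total_count': [545, 779, 706, 547, 626, 530, 766, 1235, 1260, 947],
    'rand_count': [96, 184, 148, 154, 160, 149, 124, 274, 322, 301],
})

# Read the counts from `row`, not from `df`: each call then sees only that
# row's scalars and returns one (lower, upper) pair instead of full arrays.
df['CI'] = df.apply(
    lambda row: proportion_confint(count=row['rand_count'],
                                   nobs=row['total_count'],
                                   alpha=0.05),
    axis=1,
)

print(df['CI'].head(3))
```

This calls proportion_confint once per row; the vectorized call in the answer avoids the per-row overhead.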

Desired result

Each row of df['CI'] should hold its own two-element (lower, upper) tuple, for example:

(0.144164, 0.208129)
(0.206373, 0.266027)
(0.179606, 0.239657)
...................

I would also like two separate columns, df['upper'] and df['lower'], holding the upper and lower bounds of df['CI'] respectively.
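Both parts of the desired result can be built from one vectorized call. A sketch, assuming the DataFrame from the question; zipping the two bound columns back into tuples is just one way to build df['CI'], not something statsmodels does for you:

```python
import pandas as pd
from statsmodels.stats.proportion import proportion_confint

df = pd.DataFrame({
    'year': ['2010', '2011', '2012', '2013', '2014',
             '2015', '2016', '2017', '2018', '2019'],
    'total_count': [545, 779, 706, 547, 626, 530, 766, 1235, 1260, 947],
    'rand_count': [96, 184, 148, 154, 160, 149, 124, 274, 322, 301],
})

# With Series inputs, proportion_confint returns a (lower, upper) pair
# covering every row at once.
lower, upper = proportion_confint(count=df['rand_count'],
                                  nobs=df['total_count'],
                                  alpha=0.05)

df['lower'] = lower
df['upper'] = upper
# One (lower, upper) tuple per row, matching the desired df['CI'] layout.
df['CI'] = list(zip(df['lower'], df['upper']))

print(df[['year', 'lower', 'upper', 'CI']].head(3))
```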

Many thanks for your help!

Best answer

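Consider assigning the result to multiple columns at once. proportion_confint is vectorized, and its lower and upper bounds line up with your rows by index because, per the docs: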

When a pandas object is returned, then the index is taken from the count.

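import statsmodels.api as sm

# proportion_confint returns a (lower, upper) pair indexed like `count`,
# so the two bounds unpack straight into two new columns
df['lower_CI'], df['upper_CI'] = sm.stats.proportion_confint(
    count=df['rand_count'],
    nobs=df['total_count'],
    alpha=0.05
)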

A similar question about calculating 95% confidence intervals for single proportions in a Pandas time series with statsmodels was found on Stack Overflow: https://stackoverflow.com/questions/67727853/
