I have a time series DataFrame:
import pandas as pd
df = pd.DataFrame({'year':['2010','2011','2012','2013','2014','2015','2016','2017','2018','2019'],
'total_count': [545,779,706,547,626,530,766,1235,1260,947],
'rand_count':[96,184,148,154,160,149,124,274,322,301],
'rand_perc':[17.61,23.62,20.96,28.15,25.56,28.11,16.19,22.19,25.56,31.78]
})
where:
df['rand_perc'] = (df['rand_count']/df['total_count'])*100
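Question:
For each row of df, I want to compute the confidence interval of a single proportion, df['rand_count'] out of df['total_count'], and then plot df['year'] against df['rand_perc'] with the CI as error bars. I tried to compute the CI for each row with statsmodels, using the following code: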
import statsmodels.api as sm
df['CI'] = df[['total_count', 'rand_count']].apply(lambda row: sm.stats.proportion_confint(count =
df['rand_count'], nobs = df['total_count'], alpha = 0.05), axis = 1)
But the resulting df['CI'] is not what I want: every row holds a tuple containing the CIs of all rows:
0 ([0.14416430990026746, 0.2063732756491498, 0.1...
1 ([0.14416430990026746, 0.2063732756491498, 0.1...
2 ([0.14416430990026746, 0.2063732756491498, 0.1...
3 ([0.14416430990026746, 0.2063732756491498, 0.1...
4 ([0.14416430990026746, 0.2063732756491498, 0.1...
5 ([0.14416430990026746, 0.2063732756491498, 0.1...
6 ([0.14416430990026746, 0.2063732756491498, 0.1...
7 ([0.14416430990026746, 0.2063732756491498, 0.1...
8 ([0.14416430990026746, 0.2063732756491498, 0.1...
9 ([0.14416430990026746, 0.2063732756491498, 0.1...
Name: CI, dtype: object
Expected result
A separate two-element tuple in each row of df['CI'], for example:
(0.144164, 0.206373)
(0.179606, 0.243846)
(0.221421, 0.242859)
...................
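I would also like two separate columns, df[upper] and df[lower], holding the upper and lower bounds of df['CI'] respectively.
Any help would be greatly appreciated. Thank you very much!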
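Best answer
Consider assigning both columns in a single vectorized call. The results align with the DataFrame by index because, according to the docs: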
When a pandas object is returned, then the index is taken from the count.
df['lower_CI'], df['upper_CI'] = sm.stats.proportion_confint(
count = df['rand_count'],
nobs = df['total_count'],
alpha = 0.05
)
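The attempt in the question returns every interval in every row because the lambda passes the whole columns df['rand_count'] and df['total_count'] instead of the values of the current row. The following is a minimal sketch, not a definitive implementation, of how the vectorized result might be turned into the requested columns and error bars. The column names lower, upper and CI follow the question; the helper plot_with_ci and the rounding to six decimals are assumptions made here for illustration, and the default 'normal' method of proportion_confint is assumed.

```python
import pandas as pd
from statsmodels.stats.proportion import proportion_confint

df = pd.DataFrame({
    'year': ['2010', '2011', '2012', '2013', '2014',
             '2015', '2016', '2017', '2018', '2019'],
    'total_count': [545, 779, 706, 547, 626, 530, 766, 1235, 1260, 947],
    'rand_count': [96, 184, 148, 154, 160, 149, 124, 274, 322, 301],
})
df['rand_perc'] = df['rand_count'] / df['total_count'] * 100

# One vectorized call: returns the lower and upper bounds for all rows,
# expressed as proportions in [0, 1] (not percentages).
lower, upper = proportion_confint(
    count=df['rand_count'],
    nobs=df['total_count'],
    alpha=0.05,
)
df['lower'] = lower
df['upper'] = upper

# One (lower, upper) tuple per row; rounding is only for readability.
df['CI'] = list(zip(df['lower'].round(6), df['upper'].round(6)))


def plot_with_ci(frame):
    """Hypothetical helper: plot rand_perc per year with the CI as error bars."""
    # Imported here so the CI computation above works without matplotlib.
    import matplotlib.pyplot as plt

    # errorbar expects distances from the point, not absolute bounds,
    # so convert the bounds to percent and subtract.
    yerr = [
        (frame['rand_perc'] - frame['lower'] * 100).to_numpy(),
        (frame['upper'] * 100 - frame['rand_perc']).to_numpy(),
    ]
    fig, ax = plt.subplots()
    ax.errorbar(frame['year'], frame['rand_perc'], yerr=yerr, fmt='o-', capsize=3)
    ax.set_xlabel('year')
    ax.set_ylabel('rand_perc (%)')
    return fig, ax
```

Calling plot_with_ci(df) followed by matplotlib's plt.show() should display the chart. If a per-row apply is preferred, the lambda would need to use row['rand_count'] and row['total_count'] rather than the full columns.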
Regarding "python - computing the 95% confidence interval of a single proportion in a Pandas time series with statsmodels", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67727853/