I have a DataFrame that looks like this:
trainee | course | completed | days overdue
Ava     | ABC    | Yes       | 0
Bob     | ABC    | Yes       | 1
Charlie | DEF    | No        | 10
David   | DEF    | Yes       | 0
Emily   | DEF    | Yes       | 0
Finn    | GHI    | Yes       | 0
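I need to create a DataFrame that tells me, per course, how many classes were taken, how many were completed in time (i.e. 0 days overdue), and what fraction were completed in time.
The result should look like this: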
course | count | in time | % completed in time
ABC    | 2     | 1       | 0.5
DEF    | 3     | 2       | 0.66
GHI    | 1     | 1       | 1
How can I do this with Pandas?
Thanks!
P.S. Here is the code that generates the input DataFrame:
import pandas as pd
df = pd.DataFrame({'trainee': ['Ava', 'Bob', 'Charlie', 'David', 'Emily', 'Finn'], 'course': ['ABC', 'ABC', 'DEF', 'DEF', 'DEF', 'GHI'], 'completed': ['Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes'], 'days overdue': [0, 1, 10, 0, 0, 0]})
Best answer
Use agg with size for the total and a count of zeros per group, then divide the columns with div:
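# (new column name, aggregation) pairs: group size, and number of rows with 0 days overdue
tups = [('count', 'size'), ('in time', lambda x: (x == 0).sum())]
df = df.groupby('course')['days overdue'].agg(tups).reset_index()
# fraction of each course's rows that were finished in time
df['% completed in time'] = df['in time'].div(df['count'])
print(df)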
  course  count  in time  % completed in time
0    ABC      2        1             0.500000
1    DEF      3        2             0.666667
2    GHI      1        1             1.000000
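The same result can probably also be reached with named aggregation (available in pandas 0.25 and later) instead of a list of tuples. The following is only a sketch under a few assumptions: the column names are lowercase as in the table in the question, and `on_time` and `summary` are hypothetical names introduced here for illustration. Because the mean of a boolean column is the fraction of True values, the ratio falls out of the aggregation directly and no separate division step is needed.

```python
import pandas as pd

# Input data from the question (lowercase column names assumed).
df = pd.DataFrame({
    'trainee': ['Ava', 'Bob', 'Charlie', 'David', 'Emily', 'Finn'],
    'course': ['ABC', 'ABC', 'DEF', 'DEF', 'DEF', 'GHI'],
    'completed': ['Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes'],
    'days overdue': [0, 1, 10, 0, 0, 0],
})

summary = (
    # Hypothetical helper column: True where the row has 0 days overdue.
    df.assign(on_time=df['days overdue'].eq(0))
      .groupby('course')
      # Named aggregation: output column name -> (input column, function).
      # The dict is unpacked because the output names contain spaces.
      .agg(**{
          'count': ('on_time', 'size'),               # rows per course
          'in time': ('on_time', 'sum'),              # number of True values
          '% completed in time': ('on_time', 'mean'), # fraction of True values
      })
      .reset_index()
)
print(summary)
```

Like the answer above, this counts every row with 0 days overdue and does not look at the `completed` column; if a row could be both not completed and 0 days overdue, the helper column would need an extra condition.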
For more on "python - Count and count in a Pandas DataFrame", see the similar question on Stack Overflow: https://stackoverflow.com/questions/50507750/