This is my DataFrame:
import pandas as pd

data = {
    'transit_time': [1, 1, 2, 2, 3, 3],
    'orig_state': ['UT', 'UT', 'UT', 'UT', 'UT', 'UT'],
    'dest_state': ['CA', 'CA', 'AZ', 'AZ', 'NY', 'NY'],
    'GEOID': ['01', '01', '02', '02', '03', '03'],
    'dest_state_fn': ['California', 'California', 'Arizona', 'Arizona', 'New York', 'New York'],
    'dest_county_name': ['county1', 'county1', 'county2', 'county2', 'county3', 'county3'],
}
# Passing columns= fixes the column order explicitly.
df = pd.DataFrame(data, columns=['transit_time', 'orig_state', 'dest_state',
                                 'GEOID', 'dest_state_fn', 'dest_county_name'])
print(df)
   transit_time orig_state dest_state GEOID dest_state_fn dest_county_name
0             1         UT         CA    01    California          county1
1             1         UT         CA    01    California          county1
2             2         UT         AZ    02       Arizona          county2
3             2         UT         AZ    02       Arizona          county2
4             3         UT         NY    03      New York          county3
5             3         UT         NY    03      New York          county3
I want to get a new DataFrame grouped by GEOID and dest_county_name, with AVG(transit_time) and COUNT(*) computed for each group.

Best answer
Take a look at groupby + agg:
newdf = (
    df.groupby(['GEOID', 'dest_county_name'])
      .agg(ave_transit_time=('transit_time', 'mean'),
           Count=('GEOID', 'count'))
      .reset_index()
)
  GEOID dest_county_name  ave_transit_time  Count
0    01          county1               1.0      2
1    02          county2               2.0      2
2    03          county3               3.0      2
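
One detail worth knowing when mapping SQL's COUNT(*) onto pandas: 'count' tallies the non-null values of the named column, while 'size' counts every row in the group. On the data above the two agree, but they can differ once the counted column has missing values. The sketch below is only an illustration: it reuses the question's columns, replaces the first transit_time with NaN (a hypothetical missing value that is not in the original data), and uses the made-up output names non_null and n_rows.

```python
import pandas as pd

# Same shape as the question's data, trimmed to the columns used here.
# The first transit_time is NaN: a hypothetical missing value for illustration.
df_missing = pd.DataFrame({
    'transit_time': [float('nan'), 1, 2, 2, 3, 3],
    'GEOID': ['01', '01', '02', '02', '03', '03'],
    'dest_county_name': ['county1', 'county1', 'county2',
                         'county2', 'county3', 'county3'],
})

# Named aggregation: each keyword becomes an output column,
# each value is an (input column, aggregation function) pair.
summary = (
    df_missing.groupby(['GEOID', 'dest_county_name'])
    .agg(
        ave_transit_time=('transit_time', 'mean'),  # mean skips NaN
        non_null=('transit_time', 'count'),         # non-null values only
        n_rows=('transit_time', 'size'),            # every row in the group
    )
    .reset_index()
)
print(summary)
```

For county1, non_null comes out as 1 and n_rows as 2, so 'size' is the closer match to COUNT(*) when the column being counted may contain nulls.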
Regarding python - Pandas - trying to make a new DataFrame with counts and averages, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62987281/