I have a DataFrame as follows:
import pandas as pd

df = pd.DataFrame({
    'family': ["A", "A", "B", "B"],
    'V1': [5, 5, 40, 10],
    'V2': [50, 10, 180, 20],
    'gr_0': ["all", "all", "all", "all"],
    'gr_1': ["m1", "m1", "m2", "m3"],
    'gr_2': ["m12", "m12", "m12", "m9"],
    'gr_3': ["NO", "m14", "m15", "NO"]
})
I would like to transform it into the following:
df_new = pd.DataFrame({
    'family': ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
    'gr': ["all", "m1", "m12", "m14", "all", "m2", "m3", "m12", "m9", "m15"],
    'calc(sumV2/sumV1)': [6, 6, 6, 2, 4, 4.5, 2, 4.5, 2, 4.5]
})
family gr calc(sumV2/sumV1)
0 A all 6.0
1 A m1 6.0
2 A m12 6.0
3 A m14 2.0
4 B all 4.0
5 B m2 4.5
6 B m3 2.0
7 B m12 4.5
8 B m9 2.0
9 B m15 4.5
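To arrive at df_new:
- I want one row for each unique combination of 'family' and a value found in any of the 'gr_' columns.
- For each such row, compute sum(V2)/sum(V1) over the matching original rows, as shown in df_new.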
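As a sanity check on the expected numbers, two of the target values can be reproduced by hand. This is only an illustrative sketch; the names `subset_m2`, `subset_all`, `ratio_m2` and `ratio_all` are made up for the example and are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    'family': ["A", "A", "B", "B"],
    'V1': [5, 5, 40, 10],
    'V2': [50, 10, 180, 20],
    'gr_0': ["all", "all", "all", "all"],
    'gr_1': ["m1", "m1", "m2", "m3"],
    'gr_2': ["m12", "m12", "m12", "m9"],
    'gr_3': ["NO", "m14", "m15", "NO"]
})

# Rows of family B labelled "m2" in gr_1: only the row with V1=40, V2=180
subset_m2 = df[(df['family'] == "B") & (df['gr_1'] == "m2")]
ratio_m2 = subset_m2['V2'].sum() / subset_m2['V1'].sum()     # 180 / 40

# Rows of family B labelled "all" in gr_0: both B rows
subset_all = df[(df['family'] == "B") & (df['gr_0'] == "all")]
ratio_all = subset_all['V2'].sum() / subset_all['V1'].sum()  # (180 + 20) / (40 + 10)

print(ratio_m2, ratio_all)
```

These correspond to the (B, m2) and (B, all) rows of df_new, 4.5 and 4.0 respectively.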
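I am quite new to Python, and doing this without hard-coding the column names seems rather complicated to me. Preferably, the "NO" records would not be listed in df_new, but it is also fine if they stay in the output.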
Best answer
You can do it like this:
# Reshape to long format: one row per original row and 'gr_' column
df_new = (
    df.melt(id_vars=['family', 'V1', 'V2'])
      .groupby(['family', 'value'])
      .apply(lambda x: x.V2.sum() / x.V1.sum())
      .reset_index(name='calc(sumV2/sumV1)')
)
# Drop the 'NO' placeholder rows
df_new = df_new[df_new.value != 'NO'].reset_index(drop=True)
print(df_new)
family value calc(sumV2/sumV1)
0 A all 6.0
1 A m1 6.0
2 A m12 6.0
3 A m14 2.0
4 B all 4.0
5 B m12 4.5
6 B m15 4.5
7 B m2 4.5
8 B m3 2.0
9 B m9 2.0
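Note that groupby sorts the group labels, so the rows above come out alphabetically rather than in the order shown in the question. If the original order matters, one possible variant (a sketch, not part of the original answer) sums V1 and V2 per group without apply, keeps first-appearance order with sort=False, and then stable-sorts by family. The names `long`, `sums` and `result`, and the column names `level` and `gr`, are choices made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'family': ["A", "A", "B", "B"],
    'V1': [5, 5, 40, 10],
    'V2': [50, 10, 180, 20],
    'gr_0': ["all", "all", "all", "all"],
    'gr_1': ["m1", "m1", "m2", "m3"],
    'gr_2': ["m12", "m12", "m12", "m9"],
    'gr_3': ["NO", "m14", "m15", "NO"]
})

# Long format: every remaining column (gr_0 .. gr_3) is stacked into 'gr'
long = df.melt(id_vars=['family', 'V1', 'V2'], var_name='level', value_name='gr')
long = long[long['gr'] != 'NO']  # drop the placeholder label up front

# Sum V1 and V2 per (family, gr); sort=False keeps first-appearance order
sums = long.groupby(['family', 'gr'], sort=False)[['V1', 'V2']].sum()

result = (
    (sums['V2'] / sums['V1'])
    .rename('calc(sumV2/sumV1)')
    .reset_index()
    .sort_values('family', kind='stable')  # group families, keep label order within each
    .reset_index(drop=True)
)
print(result)
```

With the sample data this yields the same ten rows as above, but ordered all, m1, m12, m14 for family A and all, m2, m3, m12, m9, m15 for family B, matching the df_new in the question.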
This Q&A, "python - reshape a DataFrame and apply a calculation to each row", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/53323755/