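I want to summarize power plant capacity by technology using Python + pandas (previous question).
For this task the data must be grouped/pivoted, and the entries of the "Technology" column should become the column labels.
This is my input: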
Plant Name,Nameplate Capacity,Technology,...
Barry,153.1,Natural Gas Steam Turbine,..
Barry,153.1,Natural Gas Steam Turbine,..
Barry,403.7,Conventional Steam Coal,..
Barry,788.8,Conventional Steam Coal,..
Barry,195.2,Natural Gas Fired Combined Cycle,..
Barry,195.2,Natural Gas Fired Combined Cycle,..
And the desired output:
Plant Name,Natural Gas Steam Turbine,Conventional Steam Coal,Natural Gas Fired Combined Cycle,..
Barry,306.2,1192.5,390.4,..
I tried a few commands, without success:
df.groupby(['Plant Name', 'Technology']).sum().pivot('Plant Name', 'Technology').fillna(0)
or
#with numpy as np
res = df.pivot_table(index=["Plant Name"], columns=["Plant Name"], values=["Technology"], aggfunc=np.sum)
A bonus question:
How can I find the largest entry in each row (e.g. "Conventional Steam Coal" in my example) and add it as a new column?
Best answer
I think you need to change the column names passed to pivot_table and add the fill_value parameter:
import numpy as np

# Technology values become the columns; capacities are summed per plant
res = df.pivot_table(index="Plant Name",
                     columns="Technology",
                     values="Nameplate Capacity",
                     aggfunc=np.sum,
                     fill_value=0).reset_index()
print(res)
Technology Plant Name Conventional Steam Coal \
0 Barry 1192.5
Technology Natural Gas Fired Combined Cycle Natural Gas Steam Turbine
0 390.4 306.2
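To try this on the sample rows from the question, here is a minimal self-contained sketch. The DataFrame is built by hand from the six rows shown (the elided columns are left out), and `aggfunc="sum"` is used in place of `np.sum` as an assumption on my part, since recent pandas versions may warn about passing the NumPy function. Clearing `res.columns.name` is optional and only removes the "Technology" label from the header.

```python
import pandas as pd

# Sample rows copied from the question; the elided columns are omitted.
df = pd.DataFrame({
    "Plant Name": ["Barry"] * 6,
    "Nameplate Capacity": [153.1, 153.1, 403.7, 788.8, 195.2, 195.2],
    "Technology": [
        "Natural Gas Steam Turbine",
        "Natural Gas Steam Turbine",
        "Conventional Steam Coal",
        "Conventional Steam Coal",
        "Natural Gas Fired Combined Cycle",
        "Natural Gas Fired Combined Cycle",
    ],
})

# One row per plant, one column per technology, capacities summed.
res = df.pivot_table(index="Plant Name",
                     columns="Technology",
                     values="Nameplate Capacity",
                     aggfunc="sum",
                     fill_value=0).reset_index()

# Optional: drop the "Technology" label that pivot_table leaves on the columns.
res.columns.name = None
print(res)
```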
The first solution should be changed to aggregate the specified column with sum and reshape with unstack:
res = (df.groupby(['Plant Name', 'Technology'])['Nameplate Capacity']
         .sum()
         .unstack(fill_value=0)
         .reset_index())
print(res)
Technology Plant Name Conventional Steam Coal \
0 Barry 1192.5
Technology Natural Gas Fired Combined Cycle Natural Gas Steam Turbine
0 390.4 306.2
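The bonus question (the largest entry per row as a new column) is not addressed above. One possible approach, sketched here under assumptions: call `DataFrame.idxmax(axis=1)` on the technology columns before resetting the index, so that only numeric columns are compared. The column name `Max Technology` is hypothetical, and the sample data is again rebuilt by hand from the question.

```python
import pandas as pd

# Sample rows copied from the question; the elided columns are omitted.
df = pd.DataFrame({
    "Plant Name": ["Barry"] * 6,
    "Nameplate Capacity": [153.1, 153.1, 403.7, 788.8, 195.2, 195.2],
    "Technology": [
        "Natural Gas Steam Turbine",
        "Natural Gas Steam Turbine",
        "Conventional Steam Coal",
        "Conventional Steam Coal",
        "Natural Gas Fired Combined Cycle",
        "Natural Gas Fired Combined Cycle",
    ],
})

# Wide table with "Plant Name" still in the index, so every column is numeric.
wide = (df.groupby(["Plant Name", "Technology"])["Nameplate Capacity"]
          .sum()
          .unstack(fill_value=0))

# idxmax(axis=1) returns, for each row, the label of the column with the largest value.
# "Max Technology" is a hypothetical column name chosen for this example.
wide["Max Technology"] = wide.idxmax(axis=1)

res = wide.reset_index()
print(res)
```

If the capacity value itself is wanted rather than the technology name, `max(axis=1)` on the same numeric columns would be the analogous call.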
Source: "python - Pandas group/pivot data so that the entries of one column become new labels", based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/51634368/