What is the best way to sum the columns of df2 below that share a label, so that the result lines up with the columns of df3?
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(25).reshape((5,5)),index = ['A','B','C','D','E'])
df1 = pd.DataFrame(np.random.rand(15).reshape((5,3)),index = ['A','B','C','D','E'])
df2 = pd.concat([df,df1],axis=1)
df3 = pd.DataFrame(np.random.rand(25).reshape((5,5)),columns = np.arange(5),index = ['A','B','C','D','E'])
The result should have the same shape as df3.
Edit for clarity:
df = pd.DataFrame(np.ones(25).reshape((5,5)),index = ['A','B','C','D','E'])
df1 = pd.DataFrame(np.ones(15).reshape((5,3))*2,index = ['A','B','C','D','E'],columns = [1,3,4])
df2 = pd.concat([df,df1],axis=1)
df3 = pd.DataFrame(np.empty((5,5)),columns = np.arange(5),index = ['A','B','C','D','E'])
print(df2)
     0    1    2    3    4    1    3    4
A  1.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0
B  1.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0
C  1.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0
D  1.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0
E  1.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0
The desired result is:
     0    1    2    3    4
A  1.0  3.0  1.0  3.0  3.0
B  1.0  3.0  1.0  3.0  3.0
C  1.0  3.0  1.0  3.0  3.0
D  1.0  3.0  1.0  3.0  3.0
E  1.0  3.0  1.0  3.0  3.0
Best answer
You can group the DataFrame by its column labels:
In [57]: df2.groupby(axis=1, by=df2.columns).sum()
Out[57]:
     0    1    2    3    4
A  1.0  3.0  1.0  3.0  3.0
B  1.0  3.0  1.0  3.0  3.0
C  1.0  3.0  1.0  3.0  3.0
D  1.0  3.0  1.0  3.0  3.0
E  1.0  3.0  1.0  3.0  3.0
You can also specify the axis by name:
In [58]: df2.groupby(axis='columns', by=df2.columns).sum()
Out[58]:
     0    1    2    3    4
A  1.0  3.0  1.0  3.0  3.0
B  1.0  3.0  1.0  3.0  3.0
C  1.0  3.0  1.0  3.0  3.0
D  1.0  3.0  1.0  3.0  3.0
E  1.0  3.0  1.0  3.0  3.0
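Note that pandas 2.1 deprecated the axis=1 argument of DataFrame.groupby (it emits a FutureWarning), and the pandas docs suggest transposing instead. A minimal sketch of that form, built on the "edit for clarity" data from the question: transpose so the duplicated column labels become row labels, sum the rows that share a label, then transpose back.

```python
import numpy as np
import pandas as pd

idx = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(np.ones((5, 5)), index=idx)
df1 = pd.DataFrame(np.ones((5, 3)) * 2, index=idx, columns=[1, 3, 4])
df2 = pd.concat([df, df1], axis=1)  # column labels: 0 1 2 3 4 1 3 4

# Transpose so the duplicated column labels become the row index,
# group rows by that index (level=0), sum each group, and transpose back.
result = df2.T.groupby(level=0).sum().T
print(result)
```

This should print the same table as the desired result above, without relying on the deprecated axis argument.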
Or a shorter version from @piRSquared:
df2.groupby(df2.columns, 1).sum()
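The question also asks for a result with the shape of df3. Grouping only produces labels that actually occur in df2, so if df3 might contain a label that df2 lacks, one hedged option is to reindex the grouped result to df3's index and columns. In this sketch df3 has a hypothetical extra column label 5 (not in the original question) to show the effect, and filling missing labels with 0 is an assumption; leaving them as NaN may be more appropriate. The grouping step uses the transpose form, df2.T.groupby(level=0).sum().T, so it does not depend on groupby(axis=1).

```python
import numpy as np
import pandas as pd

idx = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(np.ones((5, 5)), index=idx)
df1 = pd.DataFrame(np.ones((5, 3)) * 2, index=idx, columns=[1, 3, 4])
df2 = pd.concat([df, df1], axis=1)

# Hypothetical df3: only its labels matter; label 5 does not occur in df2.
df3 = pd.DataFrame(np.empty((5, 6)), columns=np.arange(6), index=idx)

# Sum columns that share a label (transpose, group rows, transpose back).
summed = df2.T.groupby(level=0).sum().T

# Align to df3's labels; any label missing from df2 is filled with 0.
result = summed.reindex(index=df3.index, columns=df3.columns, fill_value=0)
print(result.shape)
```

When df3's labels are exactly the unique labels of df2, as in the question, the reindex step only fixes the row and column order.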
Regarding "python - Pandas sumif with duplicate column names", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43530290/