我有两个 data_frame,如下所示:
df_name:
Student_ID Name DOB
0 1 Raju 1993-02-02
1 2 Indu 1987-01-04
2 3 Laya 2000-06-24
df_marks:
Student_ID Subject Int1/40 Int2/40
0 1 Eng 10 35
1 1 Tam 30 38
2 1 Mat 20 30
3 1 Sci 15 20
4 2 Eng 35 25
5 2 Tam 25 15
6 2 Mat 22 30
7 2 Sci 29 23
8 3 Eng 18 17
9 3 Tam 19 16
10 3 Mat 27 26
任务是创建一个data_frame(下面一个),我需要在其中添加df_marks['Int1/40']
和df_marks['Int2/40']
, if df_name['Student_ID'] == df_marks['Student_ID']
Student_id Name DOB Tam/50
0 1 Raju 1993-02-02 NaN
1 2 Indu 1987-01-04 NaN
2 3 Laya 2000-06-24 NaN
我试过了
df_out['Tam/50'] = df_marks[['Int1/40','Int2/40']].sum(axis=1).where(df_marks['Subject']==df_out['Student_id'])
但它给出的错误是,
ValueError: Can only compare identically-labeled Series objects
我们有什么简单的方法可以做到这一点吗?
问候, 迪帕克·达什
最佳答案
使用DataFrame.join
带有 df_name
中新列的聚合 sum
:
df_marks['Tam/50'] = df_marks[['Int1/40','Int2/40']].sum(axis=1)
df_name = df_name.join(df_marks.groupby('Student_ID')['Tam/50'].sum(), on='Student_ID')
print (df_name)
Student_ID Name DOB Tam/50
0 1 Raju 1993-02-02 198
1 2 Indu 1987-01-04 204
2 3 Laya 2000-06-24 123
或者没有辅助列的解决方案:
s = (df_marks[['Int1/40','Int2/40']].sum(axis=1)
.groupby(df_marks['Student_ID'])
.sum()
.rename('Tam/50'))
df_name = df_name.join(s, on='Student_ID')
print (df_name)
Student_ID Name DOB Tam/50
0 1 Raju 1993-02-02 198
1 2 Indu 1987-01-04 204
2 3 Laya 2000-06-24 123
关于python - 数据帧错误 - ValueError : Can only compare identically-labeled Series objects,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/64042634/