I have 2 dataframes. One contains student batch details and the other contains points. I want to join the 2 dataframes.
Dataframe1 contains
+-------+-------+-------+
| s1    | s2    | s3    |
+-------+-------+-------+
| Stud1 | Stud2 | Stud3 |
| Stud2 | Stud4 | Stud1 |
| Stud1 | Stud3 | Stud4 |
+-------+-------+-------+
Dataframe2 contains
+-------+-------+----------+
| Name  | Point | Category |
+-------+-------+----------+
| Stud1 | 90    | Good     |
| Stud2 | 80    | Average  |
| Stud3 | 95    | Good     |
| Stud4 | 55    | Poor     |
+-------+-------+----------+
I am trying to map the points for each student into the same dataset, like this:
+-------+-------+-------+----+----+----+
| Stud1 | Stud2 | Stud3 | 90 | 80 | 95 |
| Stud2 | Stud4 | Stud1 | 80 | 55 | 90 |
| Stud1 | Stud3 | Stud4 | 90 | 95 | 55 |
+-------+-------+-------+----+----+----+
I tried the code below, but it only replaces the values one column at a time.
s = df3['p1'].map(dfnamepoints.set_index('name')['points'])
df4 = df3.drop('p1', axis=1).assign(points = s)
Best answer
If all values in df3 exist in the Name column, the following solutions all give the same result:
s = dfnamepoints.set_index('Name')['Point']
df = df3.join(df3.replace(s).add_prefix('new_'))
Or:
df = df3.join(df3.apply(lambda x: x.map(s)).add_prefix('new_'))
Or:
df = df3.join(df3.applymap(s.get).add_prefix('new_'))
print (df)
s1 s2 s3 new_s1 new_s2 new_s3
0 Stud1 Stud2 Stud3 90 80 95
1 Stud2 Stud4 Stud1 80 55 90
2 Stud1 Stud3 Stud4 90 95 55
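For anyone who wants to reproduce this, here is a minimal self-contained sketch of the map-based variant. The frames are rebuilt from the tables in the question; the intermediate name `mapped` is only illustrative.

```python
import pandas as pd

# Rebuild the two frames from the question.
df3 = pd.DataFrame({
    "s1": ["Stud1", "Stud2", "Stud1"],
    "s2": ["Stud2", "Stud4", "Stud3"],
    "s3": ["Stud3", "Stud1", "Stud4"],
})
dfnamepoints = pd.DataFrame({
    "Name": ["Stud1", "Stud2", "Stud3", "Stud4"],
    "Point": [90, 80, 95, 55],
    "Category": ["Good", "Average", "Good", "Poor"],
})

# Lookup Series: the index is the student name, the values are the points.
s = dfnamepoints.set_index("Name")["Point"]

# Map every column of df3 through the lookup, prefix the resulting
# columns with "new_", and attach them to the original frame.
mapped = df3.apply(lambda col: col.map(s)).add_prefix("new_")
df = df3.join(mapped)
print(df)
```

The sketch uses apply with Series.map rather than applymap because, as far as I know, DataFrame.applymap has been deprecated since pandas 2.1 in favor of DataFrame.map, so the apply form should work across more versions.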
If some values are missing from the Name column, the output differs: values with no match (Stud1 here) become NaN:
print (dfnamepoints)
Name Point Category
0 Stud2 80 Average
1 Stud3 95 Good
2 Stud4 55 Poor
df = df3.join(df3.apply(lambda x: x.map(s)).add_prefix('new_'))
#or
df = df3.join(df3.applymap(s.get).add_prefix('new_'))
print (df)
s1 s2 s3 new_s1 new_s2 new_s3
0 Stud1 Stud2 Stud3 NaN 80 95.0
1 Stud2 Stud4 Stud1 80.0 55 NaN
2 Stud1 Stud3 Stud4 NaN 95 55.0
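The missing-value case can be checked with a small sketch like the one below. It assumes the same df3 as in the question and a lookup table with Stud1 removed; `mapped` and `unmatched` are illustrative names, not part of the original answer.

```python
import pandas as pd

df3 = pd.DataFrame({
    "s1": ["Stud1", "Stud2", "Stud1"],
    "s2": ["Stud2", "Stud4", "Stud3"],
    "s3": ["Stud3", "Stud1", "Stud4"],
})
# Stud1 is deliberately absent from the lookup table.
dfnamepoints = pd.DataFrame({
    "Name": ["Stud2", "Stud3", "Stud4"],
    "Point": [80, 95, 55],
    "Category": ["Average", "Good", "Poor"],
})

s = dfnamepoints.set_index("Name")["Point"]

# Series.map yields NaN for any name that is not in the lookup index.
mapped = df3.apply(lambda col: col.map(s)).add_prefix("new_")
df = df3.join(mapped)
print(df)

# Count how many cells had no matching name in the lookup.
unmatched = mapped.isna().sum().sum()
print("unmatched cells:", unmatched)
```

Columns that contain a NaN are converted to float, which is why 80 shows up as 80.0 in those columns while a fully matched column stays integer.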
With replace, unmatched values keep their original value:
df = df3.join(df3.replace(s).add_prefix('new_'))
print (df)
s1 s2 s3 new_s1 new_s2 new_s3
0 Stud1 Stud2 Stud3 Stud1 80 95
1 Stud2 Stud4 Stud1 80 55 Stud1
2 Stud1 Stud3 Stud4 Stud1 95 55
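A self-contained sketch of the replace behavior, under the same assumption that Stud1 is missing from the lookup. The explicit dtype=object is my own addition so that a column can hold a mix of names and numbers regardless of the pandas version; `replaced` is an illustrative name.

```python
import pandas as pd

# dtype=object (an assumption of this sketch) lets a column hold both
# strings and integers after a partial replacement.
df3 = pd.DataFrame({
    "s1": ["Stud1", "Stud2", "Stud1"],
    "s2": ["Stud2", "Stud4", "Stud3"],
    "s3": ["Stud3", "Stud1", "Stud4"],
}, dtype=object)
# Stud1 is deliberately absent from the lookup table.
dfnamepoints = pd.DataFrame({
    "Name": ["Stud2", "Stud3", "Stud4"],
    "Point": [80, 95, 55],
})

s = dfnamepoints.set_index("Name")["Point"]

# replace only touches values found in the lookup index;
# anything else (Stud1 here) is left as it was.
replaced = df3.replace(s).add_prefix("new_")
df = df3.join(replaced)
print(df)
```

Which variant to pick depends on what should happen to unknown names: map makes them easy to detect with isna(), while replace leaves them in place at the cost of mixed-type columns.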
About python - How to create a join in a dataframe based on another dataframe?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54607098/