I have the following DataFrame, df1:
X Y Order_ NEW_ID
0 484970.4517 408844.0920 95083 1320437
1 478512.3233 415791.5395 96478 1320727
2 504516.3032 452923.4420 105246 1321260
3 485147.0529 428172.1055 99633 1320979
And a second one, df2:
Order_ Loc
0 83158 239,211
1 83159 239,212
2 83160 239,213
3 83161 239,214
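I want to merge it with the first one so that the correct Loc values are added as a column of df1. To do the merge I use map as a left merge, first converting the Loc values to strings: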
df2['Loc'] = df2['Loc'].astype(str)
df1['Loc']=df1.Order_.map(df2.Loc)
The result is strange: the Loc values that appear in df1 are all NaN:
X Y Order_ NEW_ID Loc
0 484970.4517 408844.0920 95083 1320437 NaN
1 478512.3233 415791.5395 96478 1320727 NaN
2 504516.3032 452923.4420 105246 1321260 NaN
3 485147.0529 428172.1055 99633 1320979 NaN
whereas I expected them to be strings such as 239,211 (a string containing a comma). When I inspect the dtypes of df2, I get:
Order_ int64
Loc object
dtype: object
My question: how do I change the type from object to string so that I can read the Loc values correctly and keep them from turning into NaN?
Best answer
I think that if you need the same dtypes, you need to convert Order_ to int:
df1['Order_'] = df1['Order_'].astype(int)
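To illustrate why the dtypes matter, here is a minimal sketch. The names lookup, as_text and as_int are made up for the illustration, and it assumes Order_ was read in as text in one of the frames, which the question does not confirm:

```python
import pandas as pd

# Hypothetical lookup keyed by integer order numbers.
lookup = {95083: "239,211"}

# Assumption: the same order numbers stored as strings.
as_text = pd.Series(["95083", "96478"])

# The string "95083" is not equal to the integer 95083, so nothing matches.
text_result = as_text.map(lookup)

# After converting to int, the first key is found.
as_int = as_text.astype(int)
int_result = as_int.map(lookup)

print(text_result.isna().all())
print(int_result.tolist())
```

If the keys already have the same dtype on both sides, as in the sample further down, this conversion changes nothing.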
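But maybe the problem is that you need to map by a Series or dict keyed on Order_. map looks each value up in the index of the Series it is given, and df2.Loc is indexed 0 to 3, so Order_ must be set as the index first: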
d = df2.set_index('Order_')['Loc'].to_dict()
df1['Loc']= df1.Order_.map(d)
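A small sketch of the difference, using cut-down versions of the two frames (the variable names wrong and right are only for illustration). Passing df2['Loc'] directly makes map search for Order_ values among the row labels 0 and 1, while indexing by Order_ turns the order numbers into the lookup keys:

```python
import pandas as pd

df1 = pd.DataFrame({"Order_": [95083, 96478]})
df2 = pd.DataFrame({"Order_": [95083, 83159],
                    "Loc": ["239,211", "239,212"]})

# df2["Loc"] has the default index 0, 1, so no Order_ value is found.
wrong = df1["Order_"].map(df2["Loc"])

# With Order_ as the index, 95083 is found and 96478 is not.
right = df1["Order_"].map(df2.set_index("Order_")["Loc"])

print(wrong.tolist())
print(right.tolist())
```

Mapping by the indexed Series and mapping by the dict built with to_dict() should give the same result here; the dict is just an explicit form of the same lookup.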
Sample:
print (df1)
X Y Order_ NEW_ID
0 484970.4517 408844.0920 95083 1320437
1 478512.3233 415791.5395 96478 1320727
2 504516.3032 452923.4420 105246 1321260
3 485147.0529 428172.1055 99633 1320979
print (df2)
Order_ Loc
0 95083 239,211 <- first value changed so that it matches df1
1 83159 239,212
2 83160 239,213
3 83161 239,214
# check that both Order_ columns have the same dtype
print (df1['Order_'].dtypes)
int64
print (df2['Order_'].dtypes)
int64
d = df2.set_index('Order_')['Loc'].to_dict()
print (d)
{83160: '239,213', 83161: '239,214', 95083: '239,211', 83159: '239,212'}
df1['Loc']= df1.Order_.map(d)
print (df1)
X Y Order_ NEW_ID Loc
0 484970.4517 408844.0920 95083 1320437 239,211
1 478512.3233 415791.5395 96478 1320727 NaN
2 504516.3032 452923.4420 105246 1321260 NaN
3 485147.0529 428172.1055 99633 1320979 NaN
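Since the goal is described as a left merge, an alternative to map is DataFrame.merge with how='left'. This is not what the answer above uses; it is a sketch of a different route on cut-down sample data, with merged as an illustrative name:

```python
import pandas as pd

df1 = pd.DataFrame({"X": [484970.4517, 478512.3233],
                    "Order_": [95083, 96478]})
df2 = pd.DataFrame({"Order_": [95083, 83159],
                    "Loc": ["239,211", "239,212"]})

# Keep every row of df1; Loc stays NaN where Order_ has no match in df2.
merged = df1.merge(df2, on="Order_", how="left")

print(merged)
```

Unlike map, merge can add rows when Order_ is duplicated in df2, so it is worth checking that the key is unique there first.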
Regarding python - Pandas: left merge using `map` returns NaN, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/42856034/