python-3.x - Update multiple columns from another DataFrame based on a common column in Pandas

Given the following two DataFrames:

df1:

   id city district  year  price
0   1  bjs      cyq  2018     12
1   2  bjs      cyq  2019      6
2   3   sh       hp  2018      4
3   4  shs      hpq  2019      3

df2:

   id city district  year
0   1   bj       cy  2018
1   2   bj       cy  2019
2   4   sh       hp  2019

Suppose some values in the city and district columns of df1 are wrong, so I need to overwrite city and district in df1 with the values from df2, matched on id. My expected result is:

   id city district  year  price
0   1   bj       cy  2018     12
1   2   bj       cy  2019      6
2   3   sh       hp  2018      4
3   4   sh       hp  2019      3

How can I do this in Pandas? Thanks.

Update:

Solution 1:

cities = df2.set_index('id')['city']
district = df2.set_index('id')['district']

df1['city'] = df1['id'].map(cities)
df1['district'] = df1['id'].map(district)

Solution 2:

df1[["city","district"]] = pd.merge(df1,df2,on=["id"],how="left")[["city_y","district_y"]]

print(df1)

Output:

   id city district  year  price
0   1   bj       cy  2018     12
1   2   bj       cy  2019      6
2   3  NaN      NaN  2018      4
3   4   sh       hp  2019      3

Note that city and district for id 3 are NaN, because id 3 does not exist in df2, but I want to keep the values from df1 for that row.
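The NaN values appear because map() has nothing to return for an id that is missing from df2. A minimal sketch of one possible adjustment to Solution 1, assuming it is acceptable to fall back to df1's own value whenever the id has no match (the `lookup` name is only illustrative, and the frames are rebuilt here so the snippet runs on its own):

```python
import pandas as pd

# Sample data reconstructed from the tables in the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["bjs", "bjs", "sh", "shs"],
    "district": ["cyq", "cyq", "hp", "hpq"],
    "year": [2018, 2019, 2018, 2019],
    "price": [12, 6, 4, 3],
})
df2 = pd.DataFrame({
    "id": [1, 2, 4],
    "city": ["bj", "bj", "sh"],
    "district": ["cy", "cy", "hp"],
    "year": [2018, 2019, 2019],
})

# Index df2 by id so each column can serve as an id -> value lookup
lookup = df2.set_index("id")

for col in ["city", "district"]:
    # map() yields NaN where the id is absent from df2;
    # fillna() then restores the original df1 value for those rows
    df1[col] = df1["id"].map(lookup[col]).fillna(df1[col])

print(df1)
```

Only city and district are touched, so the other columns of df1 are left as they were.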

Best answer

Try combine_first, which takes values from df2 where they exist and falls back to df1 otherwise:

df2.set_index('id').combine_first(df1.set_index('id')).reset_index()

Output:

   id city district  price    year
0   1   bj       cy   12.0  2018.0
1   2   bj       cy    6.0  2019.0
2   3   sh       hp    4.0  2018.0
3   4   sh       hp    3.0  2019.0
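Note that in the output above the columns come back in alphabetical order and year and price are shown as floats. If the original column order and integer values matter, DataFrame.update might be worth trying instead. The following is only a sketch and not part of the original answer; it assumes id is unique in both frames, and the `result` name is illustrative:

```python
import pandas as pd

# Sample data reconstructed from the tables in the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["bjs", "bjs", "sh", "shs"],
    "district": ["cyq", "cyq", "hp", "hpq"],
    "year": [2018, 2019, 2018, 2019],
    "price": [12, 6, 4, 3],
})
df2 = pd.DataFrame({
    "id": [1, 2, 4],
    "city": ["bj", "bj", "sh"],
    "district": ["cy", "cy", "hp"],
    "year": [2018, 2019, 2019],
})

# Align both frames on id
result = df1.set_index("id")

# update() modifies result in place, overwriting only where df2 has
# non-NA values; restricting df2 to city/district leaves year and price alone
result.update(df2.set_index("id")[["city", "district"]])

result = result.reset_index()
print(result)
```

Rows whose id is absent from df2 (id 3 here) are not modified, so no NaN is introduced for them.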

This question, "python-3.x - Update multiple columns from another DataFrame based on a common column in Pandas", corresponds to a similar question found on Stack Overflow: https://stackoverflow.com/questions/61435505/
