I've been struggling with this problem all week. I have two DataFrames as follows:
df1:
Account| ID | Name
--------------------------------------
B36363 | 2019001 | John
G47281 | 2019002;2018101 | Alice;Emma
H46291 | 2019001 | John
df2:
Account | Col_B | Col_C
-----------------------------
B36363-0 | text_b1 | text_c1
01_G47281 | text_b2 | text_c2
X_H46291 | text_b3 | text_c3
II_G47281 | text_b4 | text_C4
I want to merge these DataFrames on Account whenever df2.Account contains df1.Account (not an exact match, as with a normal merge/join!).
Expected output:
df3:
Account | Col_B | Col_C | ID | Name
--------------------------------------------------------------
B36363-0 | text_b1 | text_c1 | 2019001 | John
01_G47281 | text_b2 | text_c2 | 2019002;2018101 | Alice;Emma
X_H46291 | text_b3 | text_c3 | 2019001 | John
II_G47281 | text_b4 | text_C4 | 2019002;2018101 | Alice;Emma
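I have no sample code because I don't know how to approach this. A normal merge/join works fine, but not when I want to match with contains. Thank you very much in advance.
Best answer
You can try str.extract with join():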
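import pandas as pd

# Map each df1 account to its [ID, Name] values
d = df1.set_index('Account').agg(list, axis=1).to_dict()
# Build an alternation pattern with a single capture group
p = '({})'.format('|'.join(df1.Account))
# '(B36363|G47281|H46291)'
# Extract the df1 account found inside each df2.Account, then look up its values
m = pd.DataFrame(df2.Account.str.extract(p, expand=False).map(d).fillna('').tolist(),
                 columns=['ID', 'Name'], index=df2.index)
df2.join(m)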
Account Col_B Col_C ID Name
1 B36363-0 text_b1 text_c1 2019001 John
2 01_G47281 text_b2 text_c2 2019002;2018101 Alice;Emma
3 X_H46291 text_b3 text_c3 2019001 John
4 II_G47281 text_b4 text_C4 2019002;2018101 Alice;Emma
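For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea: extract the df1 account from df2.Account, then do an ordinary left merge on it. It is a variation, not the answer's exact code. The helper column name `match_key` is made up for illustration, and `re.escape` is added on the assumption that account values might contain regex metacharacters. Rows in df2 with no matching account should end up with NaN in ID and Name.

```python
import re
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Account": ["B36363", "G47281", "H46291"],
    "ID": ["2019001", "2019002;2018101", "2019001"],
    "Name": ["John", "Alice;Emma", "John"],
})
df2 = pd.DataFrame({
    "Account": ["B36363-0", "01_G47281", "X_H46291", "II_G47281"],
    "Col_B": ["text_b1", "text_b2", "text_b3", "text_b4"],
    "Col_C": ["text_c1", "text_c2", "text_c3", "text_C4"],
})

# Escape each account so it is matched literally, then join into one capture group.
pattern = "(" + "|".join(re.escape(a) for a in df1["Account"]) + ")"

# "match_key" is a hypothetical helper column: the df1 account found in df2.Account.
df3 = (
    df2.assign(match_key=df2["Account"].str.extract(pattern, expand=False))
       .merge(df1.rename(columns={"Account": "match_key"}), on="match_key", how="left")
       .drop(columns="match_key")
)
print(df3)
```

One thing to watch: if one account is a prefix of another, the regex alternation takes the first alternative that matches, so you may want to sort the accounts by length (longest first) before building the pattern.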
About python - merging DataFrames using `contains` (not an exact match!): a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57852601/