I have a df that looks like this:
first_name  last_name
John        Doe
Kelly       Stevens
Dorey       Chang
and another one that looks like this:
name             email
John Doe         jdoe23@gmail.com
Kelly M Stevens  kelly.stevens@hotmail.com
D Chang          chang79@yahoo.com
I want to merge these two tables so that the end result is:
first_name  last_name  email
John        Doe        jdoe23@gmail.com
Kelly       Stevens    kelly.stevens@hotmail.com
Dorey       Chang      chang79@yahoo.com
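I can't merge on the names directly, but every email contains the person's last name, even though the overall format differs. Is there a way to merge these using only a partial string match?
I tried something like this, without success: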
df1['email']= df2[df2['email'].str.contains(df['last_name'])==True]
Best answer
IIUC, you can extract the last name from df2['name'] and merge on it:
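# Take the last word of `name` as the join key, then left-merge onto df1.
# The raw string keeps \w from being read as a string escape;
# expand=False makes str.extract return a Series.
df1.merge(df2.assign(last_name=df2['name'].str.extract(r' (\w+)$', expand=False))
             .drop('name', axis=1),
          on='last_name',
          how='left')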
Output:
  first_name last_name                      email
0       John       Doe           jdoe23@gmail.com
1      Kelly   Stevens  kelly.stevens@hotmail.com
2      Dorey     Chang          chang79@yahoo.com
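For anyone who wants to run this end to end, here is a minimal self-contained sketch. The two frames are rebuilt by hand from the sample rows in the question. The first part follows the answer (join key taken from the last word of `name`). The second part is only an illustrative variant that looks for the last name inside the email, which is closer to what the question literally asks. The helper `find_email` and the variable names `keyed`, `by_name` and `by_email` are made up for this example and are not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df1 = pd.DataFrame({
    "first_name": ["John", "Kelly", "Dorey"],
    "last_name": ["Doe", "Stevens", "Chang"],
})
df2 = pd.DataFrame({
    "name": ["John Doe", "Kelly M Stevens", "D Chang"],
    "email": [
        "jdoe23@gmail.com",
        "kelly.stevens@hotmail.com",
        "chang79@yahoo.com",
    ],
})

# Approach from the answer: use the last word of `name` as the join key.
keyed = (
    df2.assign(last_name=df2["name"].str.extract(r" (\w+)$", expand=False))
       .drop(columns="name")
)
by_name = df1.merge(keyed, on="last_name", how="left")


# Illustrative variant: search for each last name inside the emails.
def find_email(last_name):
    """Return the single email containing last_name, else None."""
    mask = df2["email"].str.contains(last_name, case=False, regex=False)
    hits = df2.loc[mask, "email"]
    # Only accept an unambiguous match; zero or several hits give None.
    return hits.iloc[0] if len(hits) == 1 else None


by_email = df1.assign(email=df1["last_name"].map(find_email))

print(by_name)
print(by_email)
```

The email-based variant scans df2 once per row of df1, so it is only reasonable for small frames, and it deliberately gives up when a last name matches more than one email. The name-based merge from the answer avoids both issues as long as the last word of `name` really is the last name.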
About python - merging based on a partial string match in pandas dfs: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59534955/