python - Merge based on partial string match in pandas dfs

Tags: python python-3.x pandas

I have a df that looks like this:

first_name last_name
John       Doe
Kelly      Stevens
Dorey      Chang

And another one that looks like this:

name             email
John Doe         jdoe23@gmail.com
Kelly M Stevens  kelly.stevens@hotmail.com
D Chang          chang79@yahoo.com

I want to merge these two tables so that the end result is:

first_name last_name email
    John   Doe       jdoe23@gmail.com
    Kelly  Stevens   kelly.stevens@hotmail.com
    Dorey  Chang     chang79@yahoo.com

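I can't merge on the name, but every email contains the person's last name, even though the overall format differs. Is there a way to merge these using only a partial string match?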

I tried something like this, without success:

df1['email']= df2[df2['email'].str.contains(df['last_name'])==True]

Best answer

IIUC, you can extract the last name from df2['name'] and merge on it:

# Use the last word of each name as the merge key, then left-join on it
df1.merge(df2.assign(last_name=df2['name'].str.extract(r' (\w+)$', expand=False))
             .drop('name', axis=1),
          on='last_name',
          how='left')

Output:

  first_name last_name                      email
0       John       Doe           jdoe23@gmail.com
1      Kelly   Stevens  kelly.stevens@hotmail.com
2      Dorey     Chang          chang79@yahoo.com
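
Note that the merge above matches on the last word of the name column, not on the email itself. If the match really has to come from the email, one possible approach is to cross-join the two frames and keep the pairs whose email contains the last name. The following is only a sketch under a few assumptions: the frames are rebuilt by hand from the question's tables (the email values are assumed to match the question's data), `how="cross"` needs pandas 1.2 or later, and `pairs`, `hit`, `by_name` and `by_email` are illustrative names that do not appear in the original post.

```python
import pandas as pd

# Sample frames rebuilt by hand from the question (values are assumptions).
df1 = pd.DataFrame({
    "first_name": ["John", "Kelly", "Dorey"],
    "last_name": ["Doe", "Stevens", "Chang"],
})
df2 = pd.DataFrame({
    "name": ["John Doe", "Kelly M Stevens", "D Chang"],
    "email": ["jdoe23@gmail.com", "kelly.stevens@hotmail.com", "chang79@yahoo.com"],
})

# Approach from the answer: the last word of `name` becomes the join key.
by_name = df1.merge(
    df2.assign(last_name=df2["name"].str.extract(r" (\w+)$", expand=False))
       .drop(columns="name"),
    on="last_name",
    how="left",
)

# Alternative sketch: pair every person with every email (cross join),
# then keep the pairs whose email contains the last name, ignoring case.
pairs = df1.merge(df2[["email"]], how="cross")
hit = pd.Series(
    [last.lower() in email.lower()
     for last, email in zip(pairs["last_name"], pairs["email"])],
    index=pairs.index,
)
by_email = df1.merge(
    pairs.loc[hit, ["last_name", "email"]],
    on="last_name",
    how="left",
)

print(by_email)
```

The cross join produces len(df1) * len(df2) rows, so it is probably only practical for small frames. A short last name that happens to appear inside an unrelated address would also produce a false match, so the result is worth checking for duplicated rows.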

Regarding "python - Merge based on partial string match in pandas dfs", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59534955/
