python - Find and replace email addresses using regex

Tags: python

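I'm new to Python and want to use it with regular expressions to process a list of 5k+ email addresses. I need to wrap each address in quotes. I am using \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b to identify each email address. How would I replace the current entry [email protected] with "[email protected]", adding quotes around each of the 5k email addresses?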

Best answer

You can use the re.sub function with a backreference, like this:

>>> import re
>>> a = "this is email: someone@mail.com and this one is another email foo@bar.com"
>>> re.sub(r'([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', a)
'this is email: "someone@mail.com" and this one is another email "foo@bar.com"'
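The pattern in the question lists only uppercase letters, so on its own it would miss lowercase addresses. A minimal sketch of one way to keep that pattern as written is to compile it with re.IGNORECASE and refer to the whole match with \g<0> instead of a capturing group. The names EMAIL_RE and quote_emails are hypothetical, chosen only for this illustration.

```python
import re

# The pattern from the question, unchanged. It lists only A-Z, so
# re.IGNORECASE is needed for it to match lowercase addresses too.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)


def quote_emails(text):
    """Wrap every address matched by EMAIL_RE in double quotes."""
    # \g<0> stands for the whole match, so no capturing group is needed.
    return EMAIL_RE.sub(r'"\g<0>"', text)


print(quote_emails("this is email: someone@mail.com and this one is another email foo@bar.com"))
```

Compiling the pattern once is a small convenience when the same substitution is applied to thousands of strings. It also keeps the flag next to the pattern it belongs to.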

Update: if you have a file and want to replace the email addresses on each of its lines, you can use readlines(), like this:

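import re

# Raw string so the backslash in \. reaches the regex engine unchanged.
EMAIL_PATTERN = r'([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})'

# Read every line of the input file into memory.
with open("email.txt", "r") as file:
    lines = file.readlines()

# Wrap each matched address in double quotes; \1 refers to the captured address.
new_lines = []
for line in lines:
    new_lines.append(re.sub(EMAIL_PATTERN, r'"\1"', line))

# Write the modified lines to a new file.
with open("email-new.txt", "w") as file:
    file.writelines(new_lines)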

email.txt:

this is test@something.com and another email here foo@bar.com
another email abc@bcd.com
still remaining someone@something.com

email-new.txt (after running the code):

this is "test@something.com" and another email here "foo@bar.com"
another email "abc@bcd.com"
still remaining "someone@something.com"
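For larger inputs, the same substitution can be applied while streaming the file instead of reading all lines first. The following is only a sketch under that assumption: the function name quote_emails_in_stream is hypothetical, and io.StringIO stands in for real files so the example is self-contained. It uses re's subn, which also reports how many replacements were made.

```python
import io
import re

EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def quote_emails_in_stream(src, dst):
    """Copy src to dst line by line, quoting each address; return the count."""
    total = 0
    for line in src:
        # subn returns the new string and the number of substitutions made.
        new_line, count = EMAIL_RE.subn(r'"\1"', line)
        dst.write(new_line)
        total += count
    return total


# In-memory demo; with real files, pass the objects returned by open().
src = io.StringIO("another email abc@bcd.com\nno address on this line\n")
dst = io.StringIO()
replaced = quote_emails_in_stream(src, dst)
print(replaced)
print(dst.getvalue(), end="")
```

With files on disk, the call would sit inside `with open("email.txt") as src, open("email-new.txt", "w") as dst:`. The returned count gives a quick sanity check against the expected number of addresses.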

For python - Find and replace email addresses using regex, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55365443/
