python - Find and replace email addresses using regular expressions

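I'm new to Python and want to use it with regular expressions to process a list of 5k+ email addresses. I need to wrap each address in quotes. I'm using \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b to identify each email address. How would I replace the current entry [email protected] with "[email protected]", so that quotes are added around each of the 5k email addresses?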

Best answer

You can use the re.sub function with a backreference, like this:

>>> import re
>>> a = "this is email: someone@mail.com and this one is another email foo@bar.com"
>>> re.sub(r'([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', a)
'this is email: "someone@mail.com" and this one is another email "foo@bar.com"'
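One detail worth noting: the pattern in the question lists only A-Z in its character classes, so on its own it would miss lowercase addresses. The following is a minimal sketch, not part of the original answer, that keeps the question's pattern and adds re.IGNORECASE. The helper name quote_emails is hypothetical. It also uses \g<0>, which refers to the whole match, so no capture group is required.

```python
import re

# The pattern from the question, unchanged. Its classes list only A-Z,
# so re.IGNORECASE is what lets it match lowercase addresses too.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)


def quote_emails(text):
    """Return text with every matched address wrapped in double quotes.

    quote_emails is a hypothetical helper name used for illustration.
    """
    # \g<0> stands for the entire match, so the pattern needs no group.
    return EMAIL_RE.sub(r'"\g<0>"', text)


print(quote_emails("contact foo@bar.com or ABC@BCD.COM"))
```

Compiling the pattern once is a small convenience when the same substitution is applied to thousands of entries.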

Update: if you have a file and want to replace the email addresses on each of its lines, you can use readlines(), as follows:

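import re

# Raw string, so the backslash in \. reaches the regex engine untouched.
EMAIL_RE = re.compile(r'([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})')

# Read every line of the input file into memory.
with open("email.txt", "r") as file:
    lines = file.readlines()

# Wrap each matched address (group 1) in double quotes.
new_lines = [EMAIL_RE.sub(r'"\1"', line) for line in lines]

# Write the modified lines to a new file.
with open("email-new.txt", "w") as file:
    file.writelines(new_lines)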

email.txt:

this is test@something.com and another email here foo@bar.com
another email abc@bcd.com
still remaining someone@something.com

email-new.txt (after running the code):

this is "test@something.com" and another email here "foo@bar.com"
another email "abc@bcd.com"
still remaining "someone@something.com"
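For larger inputs, the same substitution can be applied while streaming the file, so the lines never have to be held in a list. This is a rough sketch under stated assumptions rather than the answerer's method: the function name quote_emails_in_file and the UTF-8 encoding are assumptions, and re.subn is used only to report how many addresses were quoted.

```python
import re

# Same pattern as in the answer, written as a raw string and compiled once.
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def quote_emails_in_file(src_path, dst_path):
    """Copy src_path to dst_path with each address quoted.

    Returns the total number of replacements made. The function name and
    the UTF-8 encoding are assumptions made for this illustration.
    """
    total = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:  # iterate lazily instead of calling readlines()
            # subn returns the new string and the number of substitutions.
            new_line, count = EMAIL_RE.subn(r'"\1"', line)
            dst.write(new_line)
            total += count
    return total
```

Calling quote_emails_in_file("email.txt", "email-new.txt") on the three-line sample above should report 4 replacements, which gives a quick sanity check that no line was skipped.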

A similar question about finding and replacing email addresses with regular expressions in Python can be found on Stack Overflow: https://stackoverflow.com/questions/55365443/
