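I'm new to Python and want to use it with regular expressions to process a list of 5k+ email addresses. I need to wrap each address in quotation marks. I am using \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b
to identify each email address. How would I replace each current entry, [email protected], with "[email protected]", so that every one of the 5k email addresses ends up surrounded by quotes?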
Best answer
You can use the re.sub function with a backreference, like this:
>>> import re
>>> a = "this is email: someone@mail.com and this one is another email foo@bar.com"
>>> re.sub(r'([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', a)
'this is email: "someone@mail.com" and this one is another email "foo@bar.com"'
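The pattern in the question uses upper-case-only character classes, so on its own it would miss lower-case addresses. As a minimal sketch (the names EMAIL_RE and quote_emails are made up for illustration), it can be kept as written by compiling it with re.IGNORECASE, and \g<0> in the replacement refers to the whole match, so no capture group is needed:

```python
import re

# The pattern from the question. Its character classes are upper-case only,
# so re.IGNORECASE is passed to match lower-case addresses as well.
# EMAIL_RE and quote_emails are illustrative names, not part of the original answer.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)


def quote_emails(text):
    """Return text with every matched address wrapped in double quotes."""
    # \g<0> stands for the entire match, so the pattern needs no capture group.
    return EMAIL_RE.sub(r'"\g<0>"', text)


print(quote_emails("contact foo@bar.com or someone@mail.com"))
```

Compiling the pattern once also avoids repeating the long expression when the same substitution is applied to many strings.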
Update: if you have a file and want to replace the email addresses on each of its lines, you can use readlines(), as follows:
import re

with open("email.txt", "r") as file:
    lines = file.readlines()

new_lines = []
for line in lines:
    # Wrap each matched address in double quotes; \1 is the captured address.
    new_lines.append(re.sub(r'([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', line))

with open("email-new.txt", "w") as file:
    file.writelines(new_lines)
email.txt:
this is test@something.com and another email here foo@bar.com
another email abc@bcd.com
still remaining someone@something.com
email-new.txt (after running the code):
this is "test@something.com" and another email here "foo@bar.com"
another email "abc@bcd.com"
still remaining "someone@something.com"
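For larger files, a streaming variant may be preferable to readlines(), since it never holds the whole file in memory. The following is only a sketch under assumptions: the function name quote_emails_in_file and the UTF-8 encoding are choices made for illustration, not part of the original answer.

```python
import re

# Same pattern as in the answer, compiled once and reused for every line.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def quote_emails_in_file(src_path, dst_path):
    """Copy src_path to dst_path, wrapping each matched address in double quotes.

    Hypothetical helper: the name and the UTF-8 encoding are assumptions.
    """
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        # Iterating over the file object yields one line at a time.
        for line in src:
            # \g<0> is the whole match, so no capture group is needed.
            dst.write(EMAIL_RE.sub(r'"\g<0>"', line))
```

Calling quote_emails_in_file("email.txt", "email-new.txt") should produce the same output file as the readlines() version. Note that running it again on its own output would add a second pair of quotes around each address.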
For more on Python and using regular expressions to find and replace email addresses, see the similar question on Stack Overflow: https://stackoverflow.com/questions/55365443/