python - 如何在没有正则表达式的情况下标准化电子邮件地址

问题是，无论我做什么，如果没有正则表达式的帮助，我似乎都无法创建一个标准化(而不是验证)的函数。例如，我希望代码不打印 invalid 或 valid 电子邮件地址，而是:

过滤掉@gmail.com部分之前的+符号、.符号等;
确保程序不区分大写字母和非大写字母(返回相同的内容)。

以下是我当前尝试过滤的电子邮件地址:

<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="157f7a7d7b66787c617d3e65747b7067747767707471557278747c793b767a78" rel="noreferrer noopener nofollow">[email protected]</a>
# should return <a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="e18b8e898f928c889589a1868c80888dcf828e8c" rel="noreferrer noopener nofollow">[email protected]</a>
<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="1f75507751316c52764b775f78727e7673317c7072" rel="noreferrer noopener nofollow">[email protected]</a>
# should return <a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="1f757077716c72766b775f78727e7673317c7072" rel="noreferrer noopener nofollow">[email protected]</a>

这是我到目前为止所做的工作:

def normalizeEmail(emailIn):
    if emailIn != regex:
        ch_1 = '+'
        ch_2 = '.'
        new_emailIn = (emailIn.lower().split(ch_1, 1)[0]).replace() + '@gmail.com'
        if emailIn.endswith('@gmail.com'):
            return new_emailIn 
        if new_emailIn != new_emailIn.endswith('gmailcom'):
            return (new_emailIn.lower().split(ch_2, 1)[0]) 

if __name__ == "__main__":
    print(normalizeEmail('<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="3f555057514c52564b57144f5e515a4d5e5d4d5a5e5b7f58525e5653115c5052" rel="noreferrer noopener nofollow">[email protected]</a>'))
    print(normalizeEmail('<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="62082d0a2c4c112f0b162a22050f030b0e4c010d0f" rel="noreferrer noopener nofollow">[email protected]</a>'))

需要替换 .，并删除 + 之后的任何内容。

之前，我尝试使用正则表达式，但它似乎从未正确规范化电子邮件，并且会返回我自定义的异常错误:

Invalid Email Address

使用正则表达式，我尝试过:

import re
    def normalizeEmail(emailIn)
    regex  = '/^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/'
    if emailIn == regex: 
      return 'emailIn'
    if emailIn != regex
      return 'Invalid Email Address'

输出:

Invalid Email Address

我已经破解这个程序有一段时间了，但我无法破解它，这真的让我很沮丧。

请注意，该函数不能太具体，因为它需要标准化包含 35 个其他电子邮件地址的字典。然后，我会将列表汇总为标准化电子邮件并创建一个列表。

请帮忙。这是我编程的前 3 个月，我已经快要死了。

我尝试了正则表达式，并期望返回预期的电子邮件格式，但事实并非如此，因此我正在寻求不使用正则表达式的解决方案。

最佳答案

对于 str 类方法来说，这似乎是一个微不足道的任务:

addresses = [
    '<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="3c565354524f51554854174c5d52594e5d5e4e595d587c5b515d5550125f5351" rel="noreferrer noopener nofollow">[email protected]</a>',
    '<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="e288ad8aaccc91af8bb68aa2858f838b8ecc818d8f" rel="noreferrer noopener nofollow">[email protected]</a>'
]

for address in addresses:
    name, domain = map(str.lower, address.split('@'))
    if domain == 'gmail.com':
        name = name.replace('.', '')
        if '+' in name:
            name, tag = name.split('+', 1)
    print(f'{name}@{domain}')

上面的代码将产生以下输出:

<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="97fdf8fff9e4fafee3ffd7f0faf6fefbb9f4f8fa" rel="noreferrer noopener nofollow">[email protected]</a>
<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="cca6a3a4a2bfa1a5b8a48caba1ada5a0e2afa3a1" rel="noreferrer noopener nofollow">[email protected]</a>

或者，如果您希望将其作为函数:

def normalize_address(address):
    name, domain = map(str.lower, address.split('@'))
    if domain == 'gmail.com':
        name = name.replace('.', '')
        if '+' in name:
            name, tag = name.split('+', 1)
    return f'{name}@{domain}'


addresses = [
    '<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="8ae0e5e2e4f9e7e3fee2a1faebe4eff8ebe8f8efebeecaede7ebe3e6a4e9e5e7" rel="noreferrer noopener nofollow">[email protected]</a>',
    '<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="c7ad88af89e9b48aae93af87a0aaa6aeabe9a4a8aa" rel="noreferrer noopener nofollow">[email protected]</a>'
]

for address in addresses:
    print(normalize_address(address))

关于python - 如何在没有正则表达式的情况下标准化电子邮件地址，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/74622111/

python - 如何在没有正则表达式的情况下标准化电子邮件地址

上一篇：php - 判断一个点是否位于圆内 PHP

下一篇：python - 使用正则表达式根据特定条件排除数字