PostgreSQL: Matching email addresses with or without subdomains

Tags: postgresql pattern-matching email-address

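Scenario

For most of its history, my company used subdomains in its email addresses. Most were by state, but others were by division. Some of the subdomains we had: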

mo.widgits.com
sd.widgits.com
va.widgits.com
nhq.widgits.com
gis.widgits.com
tech.widgits.com

...and so on.

New Paradigm

A few years ago, upper management decided they wanted us to be one big happy family. As part of that culture shift, they changed everyone's email address to a single domain, in the format firstname.lastname@widgits.com.

Current Challenge

In many of our corporate databases, we find a mix of records in the old and new formats. For example, the same person might have porky.pig@widgits.com in the employee system and porky.pig@in.widgits.com in the training system. I need to match individuals across systems, regardless of which email format a given system uses.

Desired Matches

porky.pig@in.widgits.com = porky.pig@widgits.com -> true
mary.poppins@widgits.com = mary.poppins@nhq.widgits.com -> true
bob.baker@widgits.com = bob.barker@gis.widgits.com -> false

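How can I do this?

Is there a regex pattern I can use to match email addresses regardless of which format they are in? Or do I need to extract the subdomain manually before trying to match them?

Best Answer

It seems to me that you can strip the subdomain from every email address before comparing (that is, compare only the email name and the domain). Something like this: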

SELECT *
FROM emails
WHERE REGEXP_REPLACE(email1, '^(.*@).*?([^.]+\.[^.]+)$', '\1\2') =
      REGEXP_REPLACE(email2, '^(.*@).*?([^.]+\.[^.]+)$', '\1\2');
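
Since the question is about matching people across separate systems, the same comparison can be written as a join. The following is only a minimal sketch: the function name normalize_email and the tables employees and training (with their columns) are hypothetical stand-ins, not names from the original post, and it assumes the default standard_conforming_strings = on so that the backslashes in the pattern are taken literally.

```sql
-- Hypothetical helper: strips any subdomain, so that
-- 'porky.pig@in.widgits.com' and 'porky.pig@widgits.com' compare equal.
CREATE FUNCTION normalize_email(addr text) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
    SELECT REGEXP_REPLACE(addr, '^(.*@).*?([^.]+\.[^.]+)$', '\1\2');
$$;

-- Hypothetical tables standing in for two of the company systems.
CREATE TABLE employees (employee_id int PRIMARY KEY, email text NOT NULL);
CREATE TABLE training  (training_id int PRIMARY KEY, email text NOT NULL);

INSERT INTO employees VALUES
    (1, 'porky.pig@widgits.com'),
    (2, 'bob.baker@widgits.com');
INSERT INTO training VALUES
    (10, 'porky.pig@in.widgits.com'),
    (11, 'bob.barker@gis.widgits.com');

-- Join on the normalized address instead of the raw one.
-- With the sample rows above, only the porky.pig pair should match.
SELECT e.employee_id, t.training_id,
       e.email AS employee_email,
       t.email AS training_email
FROM employees e
JOIN training t
  ON normalize_email(e.email) = normalize_email(t.email);
```

Wrapping the expression in a function keeps the pattern in one place. Because the function is declared IMMUTABLE, it could also back an expression index on each table if the join turns out to be slow.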

Data (with the query from above appended so the statement runs on its own):

WITH emails AS (
    SELECT 'porky.pig@in.widgits.com' AS email1, 'porky.pig@widgits.com' AS email2 UNION ALL
    SELECT 'mary.poppins@widgits.com', 'mary.poppins@nhq.widgits.com' UNION ALL
    SELECT 'bob.baker@widgits.com', 'bob.barker@gis.widgits.com'
)
-- returns the first two rows; bob.baker vs. bob.barker does not match
SELECT *
FROM emails
WHERE REGEXP_REPLACE(email1, '^(.*@).*?([^.]+\.[^.]+)$', '\1\2') =
      REGEXP_REPLACE(email2, '^(.*@).*?([^.]+\.[^.]+)$', '\1\2');

Here is an explanation of the regex pattern used:

^                   start of the email
    (.*@)           match email name including @ in \1
    .*?             consume content up to, but not including,
    ([^.]+\.[^.]+)  final domain only (e.g. google.com)
$                   end of the email

We then replace with \1\2, which effectively removes any subdomain component.
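
To see what the replacement does to individual addresses, the pattern can be applied to a few literal values. This is a small sketch using the sample addresses from the question; the column alias normalized and the derived-table name t are illustrative, and it again assumes the default standard_conforming_strings = on.

```sql
-- Show each address next to its subdomain-free form.
SELECT email,
       REGEXP_REPLACE(email, '^(.*@).*?([^.]+\.[^.]+)$', '\1\2') AS normalized
FROM (VALUES
    ('porky.pig@in.widgits.com'),      -- normalized: porky.pig@widgits.com
    ('mary.poppins@nhq.widgits.com'),  -- normalized: mary.poppins@widgits.com
    ('bob.barker@gis.widgits.com'),    -- normalized: bob.barker@widgits.com
    ('bob.baker@widgits.com')          -- already bare, returned unchanged
) AS t(email);
```

Note that the pattern keeps only the last two dot-separated labels after the @. That fits a domain like widgits.com, but an address under a multi-part suffix such as example.co.uk would be reduced to co.uk, so the pattern would need adjusting for such domains.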

This question and answer, "PostgreSQL: Matching email addresses with or without subdomains", come from Stack Overflow: https://stackoverflow.com/questions/66979182/
