sql - Oracle regular expression to extract the string inside a tag

Tags: sql regex oracle

This is an exercise I am working on. I have text like this:

 "lovely heart"<abc.def@hotmail.com>,
 "<<*>>Freeeky<<*>> Jack" <aabbcc@gmail.com>,
 "heavens's kingk*ng '-'asdf" <bbb@yahoo.co.in>
 "sample[^-^]"<sample@ss.com>

I only need to extract:

abc.def@hotmail.com
aabbcc@gmail.com
bbb@yahoo.co.in
sample@ss.com

Here is my attempt, but it only gets half of the job done, or less.

WITH t AS
     (SELECT '"lovely heart"<abc.def@hotmail.com>,
"<<*>>Freeeky<<*>> Jack" <aabbcc@gmail.com>, 
"heavens''s kingk*ng ''-''asdf" <bbb@yahoo.com>' word
     FROM dual
     )
SELECT regexp_substr(word, '<(.*@.*)>',1,LEVEL, NULL,1)
FROM t
     CONNECT BY level <= regexp_count(word, '<(.*@.*)>');

Some of the results look like this:

<*>>Freeeky<<*>> Jack" <aabbcc@gmail.com

Is there a good way to solve this?

Thanks.

Best answer

The problem with your regular expression is that the first `.*` after `<` is greedy: it matches everything before the `@`, because the dot (`.`) in a regular expression matches any character except a newline. So it will even match `<` and `>`. Here is how it matches your string:

"<  <*>>Freeeky<<*>> Jack" <aabbcc@gmail.com  >
 ^  ^                                      ^  ^
 |  +--------------------------------------+  |
 |                     |                      |
first `<`           (.*@.*)               last `>`

So the captured group is:

<*>>Freeeky<<*>> Jack" <aabbcc@gmail.com

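And that is exactly what you got. You can change `.*` to `[^<>]*`, which matches any character except `<` and `>`.

Use the following regular expression: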

'<([^<>]*@[^<>]*)>'
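
As a minimal sketch, here is the query from the question with only the pattern swapped for the one above. The sample addresses and the `email` column alias are illustrative. The sixth argument of `regexp_substr` (the subexpression index) and `regexp_count` assume Oracle 11g or later.

```sql
-- Sketch only: the question's query with the pattern replaced.
-- The sample addresses and the "email" alias are illustrative assumptions.
WITH t AS
     (SELECT '"lovely heart"<abc.def@hotmail.com>,
"<<*>>Freeeky<<*>> Jack" <aabbcc@gmail.com>,
"heavens''s kingk*ng ''-''asdf" <bbb@yahoo.com>' word
     FROM dual
     )
-- [^<>]* cannot cross an angle bracket, so each match stays inside one <...> pair.
-- The final argument (1) returns the first capture group, i.e. the text between < and >.
SELECT regexp_substr(word, '<([^<>]*@[^<>]*)>', 1, LEVEL, NULL, 1) AS email
FROM t
     CONNECT BY LEVEL <= regexp_count(word, '<([^<>]*@[^<>]*)>');
```

Because `[^<>]*` cannot cross a `<` or `>`, fragments such as `<<*>>` no longer take part in a match, so this should return one row per address: abc.def@hotmail.com, aabbcc@gmail.com and bbb@yahoo.com.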

Regarding "sql - Oracle regular expression to extract the string inside a tag", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18014379/
