ruby - I need a regex to find a URL that is not inside any HTML tag or inside any HTML tag's attribute value

I have HTML content in the text below.

    "This is my text to be parsed which contains url 
    http://someurl.com?param1=foo&params2=bar 
 <a href="http://thisshouldnotbetampered.com">
    some text and a url http://someotherurl.com test 1q2w
 </a> <img src="http://someasseturl.com/abc.jpeg"/>
    <span>i have a link too http://someurlinsidespan.com?xyz=abc </span> 
    "

I need a regex that converts plain URLs into hyperlinks (without tampering with existing hyperlinks).

Expected result:

    "This is my text to be parsed which contains url 
    <a href="http://someurl.com?param1=foo&params2=bar">
http://someurl.com?param1=foo&params2=bar</a> 
 <a href="http://thisshouldnotbetampered.com">
    some text and a url http://someotherurl.com test 
1q2w </a> <img src="http://someasseturl.com/abc.jpeg"/>
    <span>i have a link too <a href="http://someurlinsidespan.com?xyz=abc">http://someurlinsidespan.com?xyz=abc</a> </span> "

Best answer

Disclaimer: You shouldn't use regex for this task; use an HTML parser. This is a proof of concept to demonstrate that it's possible if you expect well-formed HTML (which you won't have anyway).

This is what I came up with:

    (https?:\/\/(?:w{1,3}.)?[^\s]*?(?:\.[a-z]+)+)(?![^<]*?(?:<\/\w+>|\/?>))

What does it mean?

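( : start of the first group
https? : match http or https
\/\/ : match //
(?:w{1,3}.)? : optionally match w., ww. or www. (the dot is unescaped, so strictly it matches any single character after the w's)
[^\s]*? : match anything except whitespace, zero or more times, ungreedy
(?:\.[a-z]+)+ : match a dot followed by one or more [a-z] characters, repeated one or more times
) : end of the first group
(?! : negative lookahead
- [^<]*? : match anything except <, zero or more times, ungreedy
- (?:<\/\w+>|\/?>) : match a closing tag, /> or >
- ) : end of the lookahead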
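
To make the breakdown concrete, here is a minimal Ruby sketch that applies the pattern with `String#gsub`. The constant `URL_OUTSIDE_TAGS`, the helper `linkify` and the shortened sample text are names and data made up for this illustration; only the pattern itself comes from the answer.

```ruby
# The pattern from the answer, unchanged. Group 1 captures the URL; the
# negative lookahead rejects a candidate when the text after it reaches a
# closing tag, "/>" or ">" before the next "<".
URL_OUTSIDE_TAGS = /(https?:\/\/(?:w{1,3}.)?[^\s]*?(?:\.[a-z]+)+)(?![^<]*?(?:<\/\w+>|\/?>))/

# Hypothetical helper: wrap every match in an anchor element.
def linkify(html)
  html.gsub(URL_OUTSIDE_TAGS) { %(<a href="#{$1}">#{$1}</a>) }
end

# Shortened sample (an assumption, modelled on the text in the question).
sample = <<~HTML
  plain http://someurl.com here
  <a href="http://thisshouldnotbetampered.com">a url http://someotherurl.com test</a>
  <img src="http://someasseturl.com/abc.jpeg"/>
HTML

puts linkify(sample)
```

With this input only the first URL is wrapped; the `href`, the URL inside the existing link and the `src` are left alone. Tracing the pattern by hand also suggests two limits worth checking against your own data: the captured URL ends at the last `.letters` part, so a query string such as `?param1=foo` stays outside the generated link, and a plain URL that is followed by any closing tag (for example inside a `<span>`) is rejected by the lookahead and stays unlinked.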

Online demos: regex101, Rubular.
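
Coming back to the disclaimer: a real HTML parser (in Ruby, a library such as Nokogiri) is the more robust route. As a rough, dependency-free stand-in for that idea, the sketch below splits the input into tags and text runs and only linkifies text that is not inside an existing `<a>` element. It is not part of the original answer; `TAG_SPLIT`, `PLAIN_URL` and `linkify_text_nodes` are hypothetical names, and the simplified URL pattern is an assumption.

```ruby
# Naive tag matcher: assumes no ">" appears inside attribute values.
TAG_SPLIT = /(<[^>]+>)/
# Assumption: a URL runs until whitespace, "<", ">" or a double quote.
PLAIN_URL = /https?:\/\/[^\s<>"]+/

def linkify_text_nodes(html)
  anchor_depth = 0
  # The capture group makes split keep the tags in the resulting array.
  html.split(TAG_SPLIT).map do |piece|
    if piece.start_with?("<")
      anchor_depth += 1 if piece.match?(/\A<a[\s>]/i)
      anchor_depth -= 1 if piece.match?(/\A<\/a\s*>/i) && anchor_depth > 0
      piece # tags, including their attribute values, pass through untouched
    elsif anchor_depth.zero?
      piece.gsub(PLAIN_URL) { |url| %(<a href="#{url}">#{url}</a>) }
    else
      piece # text inside an existing link is left alone
    end
  end.join
end

sample = <<~HTML
  text http://someurl.com?param1=foo&params2=bar
  <a href="http://thisshouldnotbetampered.com">a url http://someotherurl.com test</a>
  <img src="http://someasseturl.com/abc.jpeg"/>
  <span>link http://someurlinsidespan.com?xyz=abc </span>
HTML

puts linkify_text_nodes(sample)
```

Because tags are never passed to the URL substitution, attribute values cannot be rewritten, and tracking `<a>` depth keeps existing links intact while still reaching text inside other elements such as `<span>`. It remains a sketch: comments, scripts and malformed markup are not handled.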

Source: this question and answer come from Stack Overflow: https://stackoverflow.com/questions/17038220/
