sql-server - 如何从文本 (NVARCHAR(MAX)) 列中提取一个或多个 URL

标签 sql-server regex tsql substring

我的表中有一个数据列,在该列中,每行上的其他文本可以有零个、一个或多个 URL。我想将这些 URL 提取到仅包含这些 URL 的新数据集中。

为什么?因为我想将其中一些 URL 添加到数据库中的阻止列表中以防止垃圾邮件。

例如,我在数据列中有以下文本:

hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV.

然后我想要文本中的所有网址:

httx://portugal-forex.com/
httx://phen375treatment.com/
httx://priligy2000.org/
And so on.

我真的不知道从哪里开始在 SQL 中执行此操作。

最佳答案

这是一个例子。我从“httx://”搜索字符串到第一个“/”:

无论如何,您都需要一行一行地进行操作。

将代码放入函数

CREATE FUNCTION Temporary.getLinksFromText (@Tekstas NVARCHAR(MAX))
RETURNS @Data TABLE(TheLink NVARCHAR(500))
AS
BEGIN

    DECLARE @FirstIndexOfChar INT,
            @LastIndexOfChar INT,
            @LengthOfStringBetweenChars INT ,
            @String VARCHAR(MAX)

   SET @FirstIndexOfChar    = CHARINDEX('httx://',@Tekstas,0) 

    WHILE @FirstIndexOfChar > 0
    BEGIN

        SET @String = ''
        SET @LastIndexOfChar    = CHARINDEX('/',@Tekstas,@FirstIndexOfChar+7)
        SET @LengthOfStringBetweenChars = @LastIndexOfChar - @FirstIndexOfChar + 1

        SET @String = SUBSTRING(@Tekstas,@FirstIndexOfChar,@LengthOfStringBetweenChars)
        INSERT INTO @Data (TheLink) VALUES (@String);

        SET @Tekstas = SUBSTRING(@Tekstas, @LastIndexOfChar, LEN(@Tekstas))
        SET @FirstIndexOfChar = CHARINDEX('httx://',@Tekstas, 0) 

    END 

    RETURN
END

创建一些测试数据:

CREATE TABLE  #Data(weLink NVARCHAR(MAX));
INSERT INTO #Data VALUES 
('hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV.'),
('hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV.')

你可以像这样执行它(没有光标)

SELECT allLinks.*
FROM #Data AS D
OUTER APPLY Temporary.getLinksFromText (D.weLink) AS allLinks

关于sql-server - 如何从文本 (NVARCHAR(MAX)) 列中提取一个或多个 URL,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22656154/

相关文章:

sql - 在SSMS中,如何找到包含星号,一些文本和另一个星号的字符串?

sql - 从具有自动增量列的表中复制数据

php - 如何使用 php preg_split 从字符串中获取除括号之外的所有内容?

javascript - 如何否定捕获组?

c# - 如何使用 linq 对结果集进行分组?

mysql - SELECT 、 FROM 或 WHERE 中的嵌套 SELECT 子句及其区别

SQL - 日期查询问题 - varchar 到日期时间的转换导致值超出范围

sql - 如何使用 SQL Server 在变量中创建 Insert with Select 查询

javascript - 可以制作确定函数返回类型的正则表达式吗?

mysql - Statement.RETURN_GENERATED_KEYS 是否会生成任何额外的往返行程来获取新创建的标识符?