sql - How to compare two strings by match percentage in SQL

Tags: sql sql-server

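I want to post a solution to an interesting problem I ran into in T-SQL.

Problem: compare two string fields by match percentage. In addition, the words in the two strings may be transposed.

For example: "Joni Bravo" and "Bravo Joni". These two strings should return a 100% match, which means word position is irrelevant. A few more things are worth noting. This code is meant to compare strings that use spaces as delimiters. If the first string contains no space, the match is set to 100% without any actual check. That case was not developed further because the strings this function compares always contain two or more words. Also, in case it matters, it was written on MS SQL Server 2017.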

Best answer

So here is the solution; I hope it helps someone :) GL

    /****** Object:  UserDefinedFunction [dbo].[STRCOMP]    Script Date: 29/03/2018 15:31:45 ******/
    SET ANSI_NULLS ON
    GO
    
    SET QUOTED_IDENTIFIER ON
    GO
    
    CREATE FUNCTION [dbo].[STRCOMP] (
        -- The two strings to compare; words are expected to be separated by single spaces
        @name_1 varchar(255),@name_2 varchar(255)
    )
    RETURNS float
    AS
    BEGIN

        -- Declare the working variables
        declare @p int = 0;               -- start position of the current word
        declare @c int = 0;               -- length passed to SUBSTRING for the current word
        declare @br int = 0;              -- passes with no space left; a split loop ends at 2
        declare @p_temp int = 0;          -- previous value of @p
        declare @emergency_stop int = 0;  -- iteration counter that caps each split loop
        declare @fixer int = 0;           -- 0 on the first pass, 1 afterwards (offset correction)
        declare @table1_temp table (
            row_id int identity(1,1),
            str1 varchar (255));
        declare @table2_temp table (
            row_id int identity(1,1),
            str2 varchar (255));
        declare @n int = 1;
        declare @count int = 1;
        declare @result int = 0;
        declare @total_result float = 0;
        declare @result_temp int = 0;
        declare @variable float = 0.0;

        -- Strip unwanted symbols and digits from both strings, then collapse
        -- a double space that the removal may have left behind
        set @name_1 = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@name_1,'!',''),'1',''),'2',''),'3',''),'4',''),'5',''),'0',''),'6',''),'7',''),'8',''),'9',''),'  ',' ');
        set @name_2 = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@name_2,'!',''),'1',''),'2',''),'3',''),'4',''),'5',''),'0',''),'6',''),'7',''),'8',''),'9',''),'  ',' ');

        -- Check whether the first string contains more than one word.
        -- If it does not, return 100% without comparing anything
        set @c = charindex(' ',substring(@name_1,@p,len(@name_1)));

        IF(@c = 0)
        BEGIN
            RETURN 100.00
        END;

        -- Main logic: split each string into whole words, then compare the words
        -- by their SOUNDEX codes (via DIFFERENCE) to produce a raw match score
        -- between @name_1 and @name_2.

        -- Split @name_1 into words; the emergency stop ends the loop after 20 passes
        WHILE (@br != 2 and @emergency_stop < 20)
        BEGIN
            insert into @table1_temp(str1)
            select substring (@name_1,@p,@c);
            set @p = len(substring (@name_1,@p,@c))+2;
            set @p = @p + @p_temp - @fixer;
            set @p_temp = @p;
            set @c = CASE WHEN charindex(' ',substring(@name_1,@p,len(@name_1))) = 0 THEN len(@name_1) ELSE charindex(' ',substring(@name_1,@p,len(@name_1))) END;
            set @fixer = 1;
            set @br = CASE WHEN charindex(' ',substring(@name_1,@p,len(@name_1))) = 0 THEN @br + 1 ELSE 0 END;
            set @emergency_stop = @emergency_stop +1;
        END;

        -- Reset the counters and split @name_2 the same way
        set @p = 0;
        set @br = 0;
        set @emergency_stop = 0;
        set @fixer = 0;
        set @p_temp = 0;
        set @c = charindex(' ',substring(@name_2,@p,len(@name_2)));

        WHILE (@br != 2 and @emergency_stop < 20)
        BEGIN
            insert into @table2_temp(str2)
            select substring (@name_2,@p,@c);
            set @p = len(substring (@name_2,@p,@c))+2;
            set @p = @p + @p_temp - @fixer;
            set @p_temp = @p;
            set @c = CASE WHEN charindex(' ',substring(@name_2,@p,len(@name_2))) = 0 THEN len(@name_2) ELSE charindex(' ',substring(@name_2,@p,len(@name_2))) END;
            set @fixer = 1;
            set @br = CASE WHEN charindex(' ',substring(@name_2,@p,len(@name_2))) = 0 THEN @br + 1 ELSE 0 END;
            set @emergency_stop = @emergency_stop +1;
        END;

        -- For every word of @name_1, keep its best DIFFERENCE score (0 to 4)
        -- against any word of @name_2 and add it to the running total
        WHILE((select str1 from @table1_temp where row_id = @n) is not null)
        BEGIN
            set @count = 1;
            set @result = 0;
            WHILE((select str2 from @table2_temp where row_id = @count) is not null)
            BEGIN
                set @result_temp = DIFFERENCE((select str1 from @table1_temp where row_id = @n),(select str2 from @table2_temp where row_id = @count));
                IF(@result_temp > @result)
                BEGIN
                    set @result = @result_temp;
                END;

                set @count = @count + 1;
            END;

            set @total_result = @total_result + @result;
            set @n = @n + 1;
        END;

        -- Turn the total into a percentage: average the scores over the larger
        -- word count, then scale the 0 to 4 range to 0 to 100
        set @variable = (select @total_result / (select max(row_count) from (
            select max(row_id) as row_count from @table1_temp
            union
            select max(row_id) as row_count from @table2_temp) a));
        RETURN @variable/4 * 100;

    END
GO
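
As a quick check, the calls below exercise the function on the example from the question. This is only a sketch: it assumes the function above was created in the `dbo` schema of the current database, and the 'Joni Smith' pair is a made-up illustration. `SOUNDEX` and `DIFFERENCE` are the built-ins the function relies on; `DIFFERENCE` compares the `SOUNDEX` codes of two values and returns an integer from 0 to 4, with 4 being the strongest match.

```sql
-- Built-ins used by the function: identical words get the top score of 4.
SELECT SOUNDEX('Joni') AS code, DIFFERENCE('Joni', 'Joni') AS score;

-- Example from the question: same words, swapped order.
SELECT dbo.STRCOMP('Joni Bravo', 'Bravo Joni') AS match_percent;

-- Hypothetical pair that shares only one word.
SELECT dbo.STRCOMP('Joni Bravo', 'Joni Smith') AS match_percent;

-- The first string has no space, so the function returns 100 without comparing.
SELECT dbo.STRCOMP('Joni', 'Bravo') AS match_percent;
```

For the swapped-order pair, each word finds an identical partner that scores 4, so the total is 8. Dividing by the word count (2) and by the top score (4), then multiplying by 100, gives the 100% the question asks for.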

PS: I decided to write it as a user-defined function just to meet the needs of my project.
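
If a function is not required, the same scoring idea can probably be sketched as a set-based batch. The code below is not the author's method: it swaps the hand-written split loops for `STRING_SPLIT`, which assumes SQL Server 2016 or later with compatibility level 130 or higher. The table variables `@words_1` and `@words_2` and the other variable names are hypothetical, and the sketch skips the symbol and digit cleanup as well as the shortcut for a first string without spaces.

```sql
-- Sketch only: a set-based take on the same scoring idea, not the original function.
-- Assumes STRING_SPLIT is available (SQL Server 2016+, compatibility level 130+).
DECLARE @name_1 varchar(255) = 'Joni Bravo';
DECLARE @name_2 varchar(255) = 'Bravo Joni';

-- Hypothetical work tables; the identity column keeps repeated words apart.
DECLARE @words_1 TABLE (id int IDENTITY(1,1), word varchar(255));
DECLARE @words_2 TABLE (id int IDENTITY(1,1), word varchar(255));

INSERT INTO @words_1 (word)
SELECT value FROM STRING_SPLIT(@name_1, ' ') WHERE value <> '';

INSERT INTO @words_2 (word)
SELECT value FROM STRING_SPLIT(@name_2, ' ') WHERE value <> '';

-- Divide by the larger of the two word counts, as the function does.
DECLARE @n1 int = (SELECT COUNT(*) FROM @words_1);
DECLARE @n2 int = (SELECT COUNT(*) FROM @words_2);
DECLARE @max_words int = CASE WHEN @n1 > @n2 THEN @n1 ELSE @n2 END;

-- Best DIFFERENCE score (0 to 4) per word of @name_1, summed and scaled to 0-100.
SELECT SUM(best.score) * 100.0 / (4 * NULLIF(@max_words, 0)) AS match_percent
FROM (
    SELECT a.id, MAX(DIFFERENCE(a.word, b.word)) AS score
    FROM @words_1 AS a
    CROSS JOIN @words_2 AS b
    GROUP BY a.id
) AS best;
```

Grouping by the identity column rather than by the word itself means a name such as 'John John' still counts as two words, which matches how the function's row-by-row loop treats it.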

Regarding sql - How to compare two strings by match percentage in SQL, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49556442/
