mysql - vb.net + mysql - Search for the top 5 rows in a table most similar to an input value

Tags: mysql vb.net match

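I have a database table with many columns, one of which contains names.

My vb.net software acts as a Telegram server and waits for users to send their full name.

Names in the database can be spelled in different ways. For example, "Marco Dell'Orso" may appear as "Marco Dellorso", "Marco Dell Orso", "Dell Orso Marco", or any other variant. A user may also misspell the name and transpose two letters, for example "MaCRo Dell'Orso".

I need a way to return the 5 rows most similar to the words used in the query. What is the best way to do this? I was thinking of splitting the name on whitespace characters and then using LIKE on each word in the query, but that does not work for mistyped words.

Edit:

My current plan: unless the database contains exactly one row with the exact name, split the input into single words and return all rows that contain any of the input words. This should cut the number of rows to analyze from 42,000 to a few hundred. Once I have those few hundred rows, I can run a Levenshtein function on them and return the 5 best matches.

Is this a good idea?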
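
The prefilter step of the plan above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table name `people`, the columns `id` and `full_name`, and the helper `BuildPrefilterQuery` are hypothetical and do not come from the question. The function only builds a parameterized SQL string plus its parameter values, so it does not depend on any particular MySQL connector.

```vbnet
Imports System
Imports System.Collections.Generic

Module PrefilterSketch

    ' Builds a parameterized query that keeps rows containing any input word.
    ' "people", "id" and "full_name" are hypothetical names used for illustration.
    ' Returns an empty string when the input contains no words.
    Function BuildPrefilterQuery(input As String,
                                 ByRef parameters As Dictionary(Of String, String)) As String
        Dim separators As Char() = {" "c, "'"c, "`"c, "´"c}
        Dim words As String() = input.Split(separators, StringSplitOptions.RemoveEmptyEntries)

        parameters = New Dictionary(Of String, String)()
        Dim conditions As New List(Of String)()

        For i As Integer = 0 To words.Length - 1
            Dim paramName As String = "@w" & i.ToString()
            conditions.Add("full_name LIKE " & paramName)
            ' Wrap the word in wildcards so it matches anywhere in the name.
            ' Note: % and _ inside the word are not escaped in this sketch.
            parameters(paramName) = "%" & words(i) & "%"
        Next

        If conditions.Count = 0 Then
            Return String.Empty
        End If

        Return "SELECT id, full_name FROM people WHERE " & String.Join(" OR ", conditions)
    End Function

    Sub Main()
        Dim parameters As Dictionary(Of String, String) = Nothing
        Dim sql As String = BuildPrefilterQuery("Marco Dell'Orso", parameters)
        Console.WriteLine(sql)
        For Each p As KeyValuePair(Of String, String) In parameters
            Console.WriteLine(p.Key & " = " & p.Value)
        Next
    End Sub

End Module
```

A row survives this filter only if at least one input word is spelled as it appears in the stored name, so an input in which every word is mistyped would return nothing and would need a fallback, such as scoring all rows.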

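Best answer

I solved this by combining a custom function of mine with the ready-made Levenshtein function from this link: How to calculate distance similarity measure of given 2 strings? I assign one point for each word that also appears in the other name. Then I add a score based on the edit distance between each word and every word of the other name. It works well: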

Public Class Form1

Private Sub TextBox1_KeyUp(sender As Object, e As KeyEventArgs) Handles TextBox1.KeyUp
    calc()
End Sub

Private Sub TextBox2_KeyUp(sender As Object, e As KeyEventArgs) Handles TextBox2.KeyUp
    calc()
End Sub


Sub calc()
    Label1.Text = compare(TextBox1.Text, TextBox2.Text).ToString()
End Sub

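' Scores how similar two names are; a higher score means a closer match.
' Returns Double so the fractional per-word contributions are not rounded away.
Public Function compare(source As String, target As String) As Double
    Dim score As Double = 0

    ' Split on spaces and apostrophe-like characters, dropping empty tokens
    ' (an empty token would make every Contains check succeed).
    Dim separators As Char() = {" "c, "'"c, "`"c, "´"c}
    Dim sourcewords As String() = source.Split(separators, StringSplitOptions.RemoveEmptyEntries)
    Dim targetwords As String() = target.Split(separators, StringSplitOptions.RemoveEmptyEntries)

    For Each s In sourcewords
        ' One point when the word occurs anywhere in the other name
        If target.Contains(s) Then score += 1
        For Each t In targetwords
            ' Closer words add more: 1 for identical, 1/2 for distance 1, and so on.
            ' CDbl avoids an integer overflow when the distance is Integer.MaxValue.
            score += 1 / (CDbl(DamerauLevenshteinDistance(s, t, 100)) + 1)
        Next
    Next

    ' Repeat in the other direction so the score is symmetric
    For Each s In targetwords
        If source.Contains(s) Then score += 1
        For Each t In sourcewords
            score += 1 / (CDbl(DamerauLevenshteinDistance(s, t, 100)) + 1)
        Next
    Next

    Return score
End Function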

''' <summary>
''' Computes the Damerau-Levenshtein distance between two strings.
''' Includes a threshold which indicates the maximum allowable distance.
''' </summary>
''' <param name="source">The first string</param>
''' <param name="target">The second string</param>
''' <param name="threshold">Maximum allowable distance</param>
''' <returns>Integer.MaxValue (or Integer.MaxValue - 1 on early exit) if the threshold is exceeded; otherwise the Damerau-Levenshtein distance between the strings</returns>
Public Shared Function DamerauLevenshteinDistance(source As String, target As String, threshold As Integer) As Integer

    Dim length1 As Integer = source.Length
    Dim length2 As Integer = target.Length

    ' Return trivial case - difference in string lengths exceeds threshold
    If Math.Abs(length1 - length2) > threshold Then
        Return Integer.MaxValue
    End If

    ' Ensure arrays [i] / length1 use shorter length 
    If length1 > length2 Then
        Swap(target, source)
        Swap(length1, length2)
    End If

    Dim maxi As Integer = length1
    Dim maxj As Integer = length2

    Dim dCurrent As Integer() = New Integer(maxi) {}
    Dim dMinus1 As Integer() = New Integer(maxi) {}
    Dim dMinus2 As Integer() = New Integer(maxi) {}
    Dim dSwap As Integer()

    For i As Integer = 0 To maxi
        dCurrent(i) = i
    Next

    Dim jm1 As Integer = 0, im1 As Integer = 0, im2 As Integer = -1

    For j As Integer = 1 To maxj

        ' Rotate
        dSwap = dMinus2
        dMinus2 = dMinus1
        dMinus1 = dCurrent
        dCurrent = dSwap

        ' Initialize
        Dim minDistance As Integer = Integer.MaxValue
        dCurrent(0) = j
        im1 = 0
        im2 = -1

        For i As Integer = 1 To maxi

            Dim cost As Integer = If(source(im1) = target(jm1), 0, 1)

            Dim del As Integer = dCurrent(im1) + 1
            Dim ins As Integer = dMinus1(i) + 1
            Dim [sub] As Integer = dMinus1(im1) + cost

            'Fastest execution for min value of 3 integers
            Dim min As Integer = If((del > ins), (If(ins > [sub], [sub], ins)), (If(del > [sub], [sub], del)))

            If i > 1 AndAlso j > 1 AndAlso source(im2) = target(jm1) AndAlso source(im1) = target(j - 2) Then
                min = Math.Min(min, dMinus2(im2) + cost)
            End If

            dCurrent(i) = min
            If min < minDistance Then
                minDistance = min
            End If
            im1 += 1
            im2 += 1
        Next
        jm1 += 1
        If minDistance > threshold Then
            Return Integer.MaxValue - 1
        End If
    Next

    Dim result As Integer = dCurrent(maxi)
    Return If((result > threshold), Integer.MaxValue, result)
End Function

Private Shared Sub Swap(Of T)(ByRef arg1 As T, ByRef arg2 As T)
    Dim temp As T = arg1
    arg1 = arg2
    arg2 = temp
End Sub

End Class
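
To get from a similarity score to the 5 rows asked for in the question, the candidate names can be sorted by score and truncated. The following is a minimal sketch, not part of the original answer: `TopMatches` and `SharedWordCount` are hypothetical names, and `SharedWordCount` is only a simple stand-in scorer so that the example runs on its own. In practice the `scorer` argument would be a function such as `compare` above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module TopMatchesSketch

    ' Returns up to "count" candidates, highest score first.
    ' "scorer" is any function where a higher value means a closer match.
    Function TopMatches(input As String,
                        candidates As IEnumerable(Of String),
                        scorer As Func(Of String, String, Double),
                        count As Integer) As List(Of String)
        Dim ranked As IEnumerable(Of String) =
            candidates.OrderByDescending(Function(c) scorer(input, c))
        Return ranked.Take(count).ToList()
    End Function

    ' Hypothetical stand-in scorer: counts the words of a that also occur in b,
    ' ignoring case. It only exists to keep this sketch self-contained.
    Function SharedWordCount(a As String, b As String) As Double
        Dim wordsB As String() = b.ToLowerInvariant().Split(" "c)
        Dim total As Double = 0
        For Each w As String In a.ToLowerInvariant().Split(" "c)
            If w.Length > 0 AndAlso wordsB.Contains(w) Then total += 1
        Next
        Return total
    End Function

    Sub Main()
        Dim rows As String() = {"Marco Dell Orso", "Maria Rossi", "Dell Orso Marco", "Luca Bianchi"}
        For Each name As String In TopMatches("marco orso", rows, AddressOf SharedWordCount, 5)
            Console.WriteLine(name)
        Next
    End Sub

End Module
```

Sorting a few hundred prefiltered rows this way is cheap. Scoring all 42,000 rows on every request would mean 42,000 calls to the scorer, which is the cost the prefilter is meant to avoid.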

Regarding mysql - vb.net + mysql - Search for the top 5 rows in a table most similar to an input value, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43673316/
