我在为 .net 寻找最接近匹配字符串的实现时遇到问题
我想匹配一个字符串列表,例如:
输入字符串:“Publiczna Szkoła Podstawowa im.Bolesława Chrobrego wąsoszu”
字符串列表:
Publiczna Szkoła Podstawowa im。 B. Chrobrego wąsoszu
Szkoła Podstawowa Specjalna
Szkoła Podstawowa im.Henryka Sienkiewicza 和 Wąsoszu
Szkoła Podstawowa im。 Romualda Traugutta wąsoszu Górnym
这显然需要与“Publiczna Szkoła Podstawowa im. B. Chrobrego wąsoszu”相匹配。
.net 有哪些可用的算法?
最佳答案
Edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.
Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.
Fast, memory efficient Levenshtein algorithm
using System;
/// <summary>
/// Contains approximate string matching
/// </summary>
static class LevenshteinDistance
{
/// <summary>
/// Compute the distance between two strings.
/// </summary>
public static int Compute(string s, string t)
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
// Step 1
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
// Step 2
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
// Step 3
for (int i = 1; i <= n; i++)
{
//Step 4
for (int j = 1; j <= m; j++)
{
// Step 5
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
// Step 6
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
}
class Program
{
static void Main()
{
Console.WriteLine(LevenshteinDistance.Compute("aunt", "ant"));
Console.WriteLine(LevenshteinDistance.Compute("Sam", "Samantha"));
Console.WriteLine(LevenshteinDistance.Compute("flomax", "volmax"));
}
}
关于c# - 在字符串列表中查找与输入字符串最接近的匹配项,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/13793560/