c# - 如何最有效(快速)匹配2个列表？

我有 2 lists<string>项目、来源和目标。源列表中的项目将在目标列表中有 0 到 n 个匹配，但不会有重复匹配。

考虑到两个列表都已排序，您将如何在性能方面最有效地进行匹配。

例子:

source = {"1", "2", "A", "B", ...}
target = {"1 - new music", "1 / classic", "1 | pop", "2 edit", "2 no edit", "A - sing", "B (listen)", ...}

基本上匹配是简单的前缀匹配，但是假设您有一个名为 MatchName 的方法.如果您要进行更优化的搜索，您可以使用新功能。 NameMatch只需比较 2 个字符串并返回一个 bool 值。

最后 source[0] 将有 source[0]。在这种情况下匹配包含 target[0, 1 和 2]。

最佳答案

我不确定这是否值得尝试优化很远。你可以用它来实现某种二进制搜索，但它的有效性会相当有限。我们在谈论多少元素？

目标中没有不匹配的元素

假设列表已排序，并且 target 中不存在无法与 source 匹配的元素:

static List<string>[] FindMatches(string[] source, string[] target)
{
    // Initialize array to hold results
    List<string>[] matches = new List<string>[source.Length];
    for (int i = 0; i < matches.Length; i++)
        matches[i] = new List<string>();

    int s = 0;
    for (int t = 0; t < target.Length; t++)
    {
        while (!MatchName(source[s], target[t]))
        {
            s++;
            if (s >= source.Length)
                return matches;
        }

        matches[s].Add(target[t]);
    }

    return matches;
}

具有不匹配的元素

如果 target 中存在的元素可能在 source 中没有匹配项，则上面的代码将中断(如果元素不在 target 的末尾) .要解决这个问题，最好使用不同的实现来进行比较。我们需要它返回“小于”、“等于”或“大于”，而不是 bool 值，就像在排序中使用的比较器一样:

static List<string>[] FindMatches(string[] source, string[] target)
{
    // Initialize array to hold results
    List<string>[] matches = new List<string>[source.Length];
    for (int i = 0; i < matches.Length; i++)
        matches[i] = new List<string>();

    int s = 0;
    for (int t = 0; t < target.Length; t++)
    {
        int m = CompareName(source[s], target[t]);
        if (m == 0)
        {
            matches[s].Add(target[t]);
        }
        else if (m > 0)
        {
            s++;
            if (s >= source.Length)
                return matches;
            t--;
        }
    }

    return matches;
}

static int CompareName(string source, string target)
{
    // Whatever comparison you need here, this one is really basic :)
    return target[0] - source[0];
}

两者在其他方面基本相同。如您所见，您循环遍历目标元素一次，当您不再找到匹配项时，将索引推进到源数组。

如果源元素的数量有限，可能值得进行更智能的搜索。如果源元素的数量也很大，那么预期的好处就会减少。

然后，在 Debug模式下，第一个算法在我的机器上用了 0.18 秒 100 万个目标元素。第二个甚至更快(0.03 秒)，但这是因为正在进行的比较更简单。可能您必须比较第一个空白字符之前的所有内容，这使得速度明显变慢。

关于c# - 如何最有效(快速)匹配2个列表？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/1250998/

c# - 如何最有效(快速)匹配2个列表？

目标中没有不匹配的元素

具有不匹配的元素

上一篇：c# - 你调用的对象是空的。 - 默认值

下一篇：c# - 在 C# 中接收(拖放)删除 Outlook 联系人？