algorithm - 查找包含所有不同输入字符串的连续子系列的最小长度

标签 algorithm hash

我有以下问题,我不确定如何处理。我想要一些帮助/提示来设计满足以下要求的高效算法

输入 输入的第一行包含一个整数 N,它是序列的长度。 接下来是 N 行,每行包含一个仅包含小写字符的字符串。 1<=N<=100000。 每个字符串的长度在 1 到 10 之间(含)。

输出 输出包含所有不同字符串的连续子系列的最小长度。

示例输入

6 随它去 米洪 米洪 近身 近身 顺其自然

示例输出

18

解释:最后4个连续的字符串包含所有具有最小长度(最少字符数)的唯一字符串

最佳答案

如果我没理解错的话,你想要的子系列是:

  1. 包含至少 1 个“letitbe”、“mihon”和“omi”实例
  2. 具有尽可能小的字符串长度总和

以下是如何有效地执行此操作,使用 C# 编写代码,在注释中解释算法:

    static void Main(string[] args)
    {
        // Input
        var elements = new List<string> { "letitbe", "mihon", "mihon", "omi", "omi", "letitbe" };

        // Find distinct elements
        var distinctElements = elements.Distinct().ToList();

        // Create a dictionary that tells us how many copies of each element we have in the current subseries, initialize all values to 0
        var copiesOfElementInCurrentSubseries = distinctElements.ToDictionary(key => key, value => 0);

        // The sum of lengths of strings in the current subseries
        // Our goal is to minimize this
        var lengthOfCurrentSubseries = 0;

        // How many distinct elements are covered by the current subseries
        // The condition under which we minimize lengthOfCurrentSubseries is that numberOfElementsCoveredByCurrentSubseries equals distinctElements
        var numberOfElementsCoveredByCurrentSubseries = 0;

        // We remember the solution in these
        var bestStartIndex = 0;
        var bestLength = elements.Sum(e => e.Length);
        var bestNum = elements.Count;

        // Start with startIndex and endIndex at 0, increase endIndex until we cover all distinct elements
        // The subseries from startIndex to endIndex (inclusive) is our current subseries
        for (int startIndex = 0, endIndex = 0; endIndex < elements.Count; endIndex++)
        {
            // We add the element at endIndex to our current subseries:

            // If we found an element that previously wasn't covered, increase the count of covered elements
            // Note that we never decrease this, because once we find a solution that covers all elements, we never make a change which "loses" some element
            if (copiesOfElementInCurrentSubseries[elements[endIndex]] == 0)
            {
                numberOfElementsCoveredByCurrentSubseries++;
            }
            // Increase the number of copies of the element we added
            copiesOfElementInCurrentSubseries[elements[endIndex]]++;
            // Increase the total length of subseries by this element's length
            lengthOfCurrentSubseries += elements[endIndex].Length;

            // Initially, we will just loop increasing endIndex until all elements are covered
            // Once we are covering all elements, try to improve the solution
            if (numberOfElementsCoveredByCurrentSubseries == distinctElements.Count)
            {
                // Move startIndex to the right as far as possible while still covering all elements
                while (copiesOfElementInCurrentSubseries[elements[startIndex]] > 1)
                {
                    lengthOfCurrentSubseries -= elements[startIndex].Length;
                    copiesOfElementInCurrentSubseries[elements[startIndex]]--;
                    startIndex++;
                }

                // If the new solution is better, remember it
                if (lengthOfCurrentSubseries < bestLength)
                {
                    bestLength = lengthOfCurrentSubseries;
                    bestStartIndex = startIndex;
                    bestNum = endIndex - startIndex + 1;
                }
            }

            // Now we add another element by moving endIndex one place to the right, then try improving the solution by moving startIndex to the right, and we repeat this process...
        }

        Console.WriteLine(string.Join(" ", elements.Skip(bestStartIndex).Take(bestNum)));
    }

请注意,即使这有嵌套循环,内部 while 循环在内部循环的所有传递组合中最多可以有 length of input 总步数,因为 startIndex 保持其值并始终向右移动。

如果您不熟悉 C# - Dictionary 基本上是一个哈希表 - 它可以根据键有效地查找值(只要键具有良好的哈希函数,字符串就可以)。

关于algorithm - 查找包含所有不同输入字符串的连续子系列的最小长度,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/33822447/

相关文章:

algorithm - 如何在二值图像中找到连通分量?

hash - 使用哈希跟踪文件的唯一版本

perl - 仅用一个键在哈希中查找键名?

algorithm - BFS 和 DFS 算法有什么区别?

c++ - boost::ifind_first 与 std::string 对象

javascript - jQuery .trigger() 或 $(window) 在谷歌浏览器中不起作用

c - 如何在 C 中编写哈希函数?

algorithm - 哈希片段是否具有抗冲突性?

algorithm - 将堆栈实现为数组的成本分析?

algorithm - 处理评级系统的最佳方式