algorithm - 查找包含所有不同输入字符串的连续子系列的最小长度

我有以下问题，我不确定如何处理。我想要一些帮助/提示来设计满足以下要求的高效算法

输入输入的第一行包含一个整数 N，它是序列的长度。接下来是 N 行，每行包含一个仅包含小写字符的字符串。 1<=N<=100000。每个字符串的长度在 1 到 10 之间(含)。

输出输出包含所有不同字符串的连续子系列的最小长度。

示例输入

6 随它去米洪米洪近身近身顺其自然

示例输出

解释:最后4个连续的字符串包含所有具有最小长度(最少字符数)的唯一字符串

最佳答案

如果我没理解错的话，你想要的子系列是:

包含至少 1 个“letitbe”、“mihon”和“omi”实例
具有尽可能小的字符串长度总和

以下是如何有效地执行此操作，使用 C# 编写代码，在注释中解释算法:

    static void Main(string[] args)
    {
        // Input
        var elements = new List<string> { "letitbe", "mihon", "mihon", "omi", "omi", "letitbe" };

        // Find distinct elements
        var distinctElements = elements.Distinct().ToList();

        // Create a dictionary that tells us how many copies of each element we have in the current subseries, initialize all values to 0
        var copiesOfElementInCurrentSubseries = distinctElements.ToDictionary(key => key, value => 0);

        // The sum of lengths of strings in the current subseries
        // Our goal is to minimize this
        var lengthOfCurrentSubseries = 0;

        // How many distinct elements are covered by the current subseries
        // The condition under which we minimize lengthOfCurrentSubseries is that numberOfElementsCoveredByCurrentSubseries equals distinctElements
        var numberOfElementsCoveredByCurrentSubseries = 0;

        // We remember the solution in these
        var bestStartIndex = 0;
        var bestLength = elements.Sum(e => e.Length);
        var bestNum = elements.Count;

        // Start with startIndex and endIndex at 0, increase endIndex until we cover all distinct elements
        // The subseries from startIndex to endIndex (inclusive) is our current subseries
        for (int startIndex = 0, endIndex = 0; endIndex < elements.Count; endIndex++)
        {
            // We add the element at endIndex to our current subseries:

            // If we found an element that previously wasn't covered, increase the count of covered elements
            // Note that we never decrease this, because once we find a solution that covers all elements, we never make a change which "loses" some element
            if (copiesOfElementInCurrentSubseries[elements[endIndex]] == 0)
            {
                numberOfElementsCoveredByCurrentSubseries++;
            }
            // Increase the number of copies of the element we added
            copiesOfElementInCurrentSubseries[elements[endIndex]]++;
            // Increase the total length of subseries by this element's length
            lengthOfCurrentSubseries += elements[endIndex].Length;

            // Initially, we will just loop increasing endIndex until all elements are covered
            // Once we are covering all elements, try to improve the solution
            if (numberOfElementsCoveredByCurrentSubseries == distinctElements.Count)
            {
                // Move startIndex to the right as far as possible while still covering all elements
                while (copiesOfElementInCurrentSubseries[elements[startIndex]] > 1)
                {
                    lengthOfCurrentSubseries -= elements[startIndex].Length;
                    copiesOfElementInCurrentSubseries[elements[startIndex]]--;
                    startIndex++;
                }

                // If the new solution is better, remember it
                if (lengthOfCurrentSubseries < bestLength)
                {
                    bestLength = lengthOfCurrentSubseries;
                    bestStartIndex = startIndex;
                    bestNum = endIndex - startIndex + 1;
                }
            }

            // Now we add another element by moving endIndex one place to the right, then try improving the solution by moving startIndex to the right, and we repeat this process...
        }

        Console.WriteLine(string.Join(" ", elements.Skip(bestStartIndex).Take(bestNum)));
    }

请注意，即使这有嵌套循环，内部 while 循环在内部循环的所有传递组合中最多可以有 length of input 总步数，因为 startIndex 保持其值并始终向右移动。

如果您不熟悉 C# - Dictionary 基本上是一个哈希表 - 它可以根据键有效地查找值(只要键具有良好的哈希函数，字符串就可以)。

关于algorithm - 查找包含所有不同输入字符串的连续子系列的最小长度，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/33822447/

algorithm - 查找包含所有不同输入字符串的连续子系列的最小长度

上一篇：java - 努力理解这段代码如何输出欧几里得算法

下一篇：mysql - 比较多个字段的算法