java - 导致给定后缀数组的不同字符的最小数量

标签 java algorithm dynamic-programming suffix-array

给定一个 suffix array ,来自 SRM 630 的 TopCoder 任务要求找到字符串中不同字符的最小数量,这些字符可以形成具有给定后缀数组的字符串。 full problem statement can be found on the TopCoder website .

我找到的最佳解决方案就在这里:https://github.com/ftiasch/acm-icpc/blob/6db1ed02a727611830b974a1d4de38bab8f390f9/topcoder/single-round-match/single-round-match-630/SuffixArrayDiv1.java

这是ftiasch写的算法:

public int minimalCharacters(int[] array) {
    int n = array.length;
    int[] position = new int[n + 1];
    for (int i = 0; i < n; ++i) {
        position[array[i]] = i;
    }
    position[n] = -1;
    int[] minimum = new int[n + 1];
    for (int i = n - 1; i >= 0; --i) {
        minimum[i] = Integer.MAX_VALUE;
        for (int j = i + 1; j <= n; ++j) {
            boolean valid = true;
            for (int x = i; x < j; ++x) {
                for (int y = x + 1; y < j; ++y) {
                    valid &= position[array[x] + 1] < position[array[y] + 1];
                }
            }
            if (valid && minimum[j] < Integer.MAX_VALUE) {
                minimum[i] = Math.min(minimum[i], minimum[j] + 1);
            }
        }
    }
    return minimum[0];
}

我知道这是一个动态规划算法,但它是如何工作的?我真的需要帮助来理解它。

编辑

这是 ftiasch 给我的回信:

hi Ariel,

First of all, thanks to your compliment. Frankly speaking, my solution is not the best solution to the problem. The optimal one runs in O(n) time but mine takes O(n^4). I just picked this idea during the contest because n is relatively small.

Keep in mind that same characters become continuous in the SA. Since the problem asked for the least number of characters, so I decided to use dynamic programming to partition the SA into consecutive segments so that each segments start with the same character.

Which condition is necessary for S[SA[i]] == S[SA[j]] assumed that i < j? The lexicographic comparison suggests that suffix(SA[i] + 1) should be smaller than suffix(SA[j] + 1). We can easily find that the condition is also sufficient.

Write to me if you have any other question. :)

编辑1

感谢 David,我们终于成功了。这是来自 David 的 Python 版本的 java 中的线性时间算法:

public int minimalCharacters(int[] array) {
    int n = array.length, i;
    if (n == 0)
        return 0;
    int[] array1 = new int[n + 1];
    for (i = 0; i < n; i++)
        array1[1 + i] = array[i];
    int[] position = new int[n + 1];
    for (i = 0; i < n + 1; i++)
        position[array1[i]] = i;
    int k = 1;
    for (i = n; i > 1; i--) {
        if (position[array1[i] + 1] <= position[array1[i - 1] + 1])
            k++;
    }
    return k;
}

最佳答案

这是一个二次时间算法。后缀数组指定每个 一对后缀 他们如何按字典顺序比较(和空的 后缀总是小于所有的)。让s是未知字符串 并假设我们正在比较后缀 s[i...]带后缀 s[j...] . 如果s[i] != s[j] , 然后比较 s[i]s[j]解决它。 否则,结果与我们比较 s[i+1...] 相同。和 s[j+1...] .

假设我们希望确保s[i...] < s[j...] .显然我们需要 s[i] <= s[j] .事实上,除非 s[i+1...] < s[j+1...] ,我们需要 严格不等式s[i] < s[j] ,否则决胜局将进入 错误的方法。否则,s[i] == s[j]不管其余的就足够了 的字符串。将所有不等式收集为有向弧 顶点对应于 s 中的位置的图.这个图是 根据后缀的总顺序必然是非循环的。使每个弧长 如果不等式是严格的则为 1,否则长度为 0。输出长度 最长路径的加一(如果图形为空,则为零)。

相应的至少需要这么多不同的字母 不平等链。可能不太清楚的是,这么多 不同的字母就足够了,但是如果我们确定每个字母的标签 s 中的顶点/位置从那里开始的最长路径的长度, 然后适本地标记每个弧的头部和尾部。

为了得到线性时间,我们可以利用 图形。这很简单(虽然不是微不足道的;图表指标 毕竟)表明访问图中所有顶点的路径是 最长的,所以我们只需要计算它的长度。

以下是示例代码 ( minChars1 ) 的音译版本, 直接从上面的描述( minChars2 ,现在 剥离所有理解用法),一个蛮力解决方案 ( minChars3 ),以及线性时间解 ( minChars4 )。 导入 itertools

def minChars1(array):
    n = len(array)
    position = [-1] * (n + 1)
    for i in range(n):
        position[array[i]] = i
    infinity = n + 1
    minimum = [infinity] * (n + 1)
    minimum[n] = 0
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            valid = True
            for x in range(i, j):
                for y in range(x + 1, j):
                    valid = valid and position[array[x] + 1] < position[array[y] + 1]
            if valid and minimum[j] < infinity:
                minimum[i] = min(minimum[i], minimum[j] + 1)
    return minimum[0]


def lengthOfLongestPath(graph, memo, i):
    if i not in memo:
        result = 0
        for w, j in graph[i]:
            result = max(result, w + lengthOfLongestPath(graph, memo, j))
        memo[i] = result
    return memo[i]


def minChars2(array):
    n = len(array)
    position = [-1] * (n + 1)
    for i in range(n):
        position[array[i]] = i
    graph = {}
    for i in range(n):
        graph[i] = []
        for j in range(n):
            if position[i] > position[j]:
                w = 0 if position[i + 1] > position[j + 1] else 1
                graph[i].append((w, j))
    memo = {None: -1}
    for i in range(n):
        lengthOfLongestPath(graph, memo, i)
    return max(memo.values()) + 1


def minChars3(array):
    n = len(array)
    position = [None] * n
    for i in range(n):
        position[array[i]] = i
    for k in range(n):
        for s in itertools.product(range(k), repeat=n):
            valid = True
            for i in range(n):
                for j in range(n):
                    valid = valid and (s[i:] < s[j:]) == (position[i] < position[j])
            if valid:
                return k
    return n


def minChars4(array):
    n = len(array)
    if n == 0:
        return 0
    array1 = [n] * (n + 1)
    for i in range(n):
        array1[1 + i] = array[i]
    position = [None] * (n + 1)
    for i in range(n + 1):
        position[array1[i]] = i
    k = 1
    for i in range(n, 1, -1):
        if position[array1[i] + 1] <= position[array1[i - 1] + 1]:
            k += 1
    return k


def test():
    for n in range(7):
        for array in itertools.permutations(range(n)):
            assert minChars1(array) == minChars2(array) == minChars3(array) == minChars4(array)


test()

关于java - 导致给定后缀数组的不同字符的最小数量,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/25676463/

相关文章:

java - 表达式树类中的递归evaluate()

algorithm - 图表中的移动可达性

php - 不仅返回小写字母和数字,还返回符号和大写字母的哈希函数

python - 地下城游戏解决方案说明

arrays - 从只有 3 个有效移动的数组制作最大长度升序子数组

java - 限制在 Spring (Boot) 中调用方法的线程数

java - 如何在JAVA中检测文件的粘贴

algorithm - 最大产品前缀字符串

java - 如何从重定向网址获取授权码? OAuth2

algorithm - 如何停止进程并开始其他进程而不返回到 Arduino 中的初始进程