string - 查找给定字符串的显性循环子串

我正在寻找一种算法来查找给定字符串的主要循环子字符串。

循环子串:

相邻重复两次或多次的子字符串。

显性循环子串:

相邻重复次数最多的子串占主导地位
(相邻重复发生的次数相同)
- 最长的子串占主导地位
(关于长度和相邻重复的关系)
- 首先出现的子字符串占主导地位

示例 1:

prefixgarbagecyclecyclecyclecyclesufixgarbage
返回cycle :=> cycle是重复次数最多的相邻子串

示例 2:

prefixgarbagecyclepadinggarbagecyclesufixgarbage
返回g :=> cycle 的出现相邻不重复，g相邻重复两次

示例 3:

prefixgarbagecyclecyclepadinggarbageprefixgarbage
返回cycle :=> cycle & g相邻重复两次但 cycle比 g 更长

示例 4:

prefixgarbagecyclecyclecycleroundroundroundprefixgarbage
返回cycle :=> cycle & round相邻重复三次且长度相同但 cycle首先出现

示例 5:

abcdefghijklmnopqrstuvqxyz
返回<empty string>因为没有重复的相邻子串

实现该算法的最佳方法是什么？

最佳答案

找不到比这个二次时间算法更好的算法(用 Python 实现):

IREP, IPER, IPOS = 0, 1, 2

def find_dominant(src):
  best = (0, 0, 0) # repetitions-1, period, position

  period = 0
  while period < len(src) // max(2, 1 + best[IREP]):
    period += 1
    length = 0

    for pos in range(len(src) - 1 - period, -1, -1):
      if src[pos] == src[pos + period]:
        length += 1
        repetitions = length // period
        if repetitions >= best[IREP]:
          best = (repetitions, period, pos)
      else:
        length = 0

  return best


s = "prefixgarbagecyclecyclecyclecyclesufixgarbage"
res = find_dominant(s)
if res[0] == 0:
  print("nothing found")
else:
  print(res[IREP] + 1, '*', s[res[IPOS]: res[IPOS] + res[IPER]])

对于每个可能的周期，扫描字符串并记住最长的周期子序列。向后扫描以检查更少的条件。当没有发现进一步的改善时，停止增加周期。

时间复杂度为 O(N²/R)，其中 R 是主导子串的重复次数。空间复杂度为O(1)。

关于string - 查找给定字符串的显性循环子串，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/18792877/

string - 查找给定字符串的显性循环子串

上一篇：performance - 我如何比较 (3/2)^n 和 (log n)^(log n) 的增长率？(以 2 为底)。我尝试了极限测试，L'Hospital 的规则，但无济于事

下一篇：algorithm - Scala 代码 - 不可变集问题