python - 查找列表中每个字符串之间的最大差异

标签 python string algorithm performance

这是给我的面试问题:

Given a list of n strings with length l which at any given index can contain either the character 'a' or 'b', how would I go about figuring out the maximum difference (meaning the amount of characters that differ at each index) for each n string in the lowest possible time complexity.

An example of this would be ["abaab", "abbbb","babba"]

The output would be [5, 3, 5] as the difference between the first and third strings is 5 (none of the characters are the same), so the maximum difference between string 1 and any other string is 5. Similarly, the difference between string 2 and string 3 is 3, which is greater than the difference between string 2 and 1. And the difference between string 3 and string 1 is 5.

The limit for l is 20

我尝试了以下代码(其中 l 和 n 相应命名,strings 是字符串列表)。此代码中的变量是示例案例中的变量,但我想要一个通用的解决方案。

n = 3
l = 5
abmap = {}
strings = ["abaab", "abbbb","babba"]

for i in range(l):
    abmap[i] = {"a": [], "b": []}

for i in range(n):
    for j in range(l):
        if strings[i][j] == "a":
            abmap[j]["a"].append(i)

        else:
            abmap[j]["b"].append(i)

for string in strings:
    differences = n * [0]
    for i in range(l):
        if string[i] == "a":
            for index in abmap[i]["b"]:
                differences[index] += 1

        else:
            for index in abmap[i]["a"]:
                differences[index] += 1

    print(max(differences))

但是,该解的复杂度为 O(n2·l)。面试官让我进一步优化(比如优化到O(l·n·log(n))。我该如何实现呢?

此时间限制为 15 秒,且 n 小于 100000。

最佳答案

正如我之前的回答(由于限制现在不再可行),我将每个字符串转换为 L 位数字。它们是 2L ≤ 220 可能的 L 位数字的子集。将其视为 2L 个节点的图,如果两个节点相差一位,则它们之间存在一条边,并且输入数字是这些节点的子集。

现在...对于任何输入数字,最的输入数字是多少?我们可以通过查看反转数字(所有L位翻转,即距离L)并询问最接近输入数字来解决这个问题。因此,我们根据输入数字运行并行 BFS(广度优先搜索)。我们将它们标记为距离为 0。然后我们将所有数字标记为距离 2 一位变化。然后将所有数字标记为距离 2 一位变化。依此类推。最后,对于每个输入数字,我们查看倒数的距离,并从 L 中减去该距离。

基准结果,最坏情况需要大约 3 秒,远低于 15 秒的限制:

n=1000 L=20:
 0.14 s  solution1
 2.63 s  solution2

n=10000 L=20:
15.01 s  solution1
 3.01 s  solution2

n=100000 L=20:
 3.47 s  solution2

完整代码(解决方案1是我的旧解决方案,解决方案2是我上面提供的):

def solution1(strings):
    table = str.maketrans('ab', '01')
    numbers = [
        int(s.translate(table), 2)
        for s in strings
    ]
    return [
        max((x ^ y).bit_count() for y in numbers)
        for x in numbers
    ]

def solution2(strings):
    table = str.maketrans('ab', '01')
    numbers = [
        int(s.translate(table), 2)
        for s in strings
    ]
    L = len(strings[0])
    bits = [2**i for i in range(L)]
    dist = [None] * 2**L
    for x in numbers:
        dist[x] = 0
    horizon = numbers
    d = 1
    while horizon:
        horizon = [
            y
            for x in horizon
            for bit in bits
            for y in [x ^ bit]
            if dist[y] is None
            for dist[y] in [d]
        ]
        d += 1
    return [L - dist[~x] for x in numbers]

funcs = solution1, solution2

import random
from time import time

# Generate random input
def gen(n, L):
    return [
        ''.join(random.choices('ab', k=L))
        for _ in range(n)
    ]

# Correctness
for _ in range(100):
    strings = gen(100, 10)
    expect = funcs[0](strings)
    for f in funcs:
        result = f(strings)
        assert result == expect

# Speed
def test(n, L, funcs):
    print(f'{n=} {L=}:')

    for _ in range(1):
        strings = gen(n, L)
        expect = None
        for f in funcs:
            t = time()
            result = f(strings)
            print(f'{time()-t:5.2f} s ', f.__name__)
            if expect is None:
                result = expect
            else:
                assert result == expect
            del result
    print()

test(1000, 20, [solution1, solution2])
test(10000, 20, [solution1, solution2])
test(100000, 20, [solution2])

Attempt This Online!

关于python - 查找列表中每个字符串之间的最大差异,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/75850697/

相关文章:

c# - 找到包含查询数组所有元素的输入数组的最小窗口

python - 寻找数字编码分类变量之间的相关性?

python - 未绑定(bind)本地错误 : local variable 'y' referenced before assignment

python - 从列表中创建所有可能的组合

c++ - 有趣的问题(货币套利)

Java 等价于 Python 的 "Got value: %s"% 变量?

python - 从 Python 中的 ISO 周数获取日期

python - 在 Pandas 数据框中分隔 'networks'

java - 检测全角和半角的空白 : regex VS Character. isWhitespace()

c++ - Strtol 第二个参数