这是给我的面试问题:
Given a list of n strings with length l which at any given index can contain either the character 'a' or 'b', how would I go about figuring out the maximum difference (meaning the amount of characters that differ at each index) for each n string in the lowest possible time complexity.
An example of this would be ["abaab", "abbbb","babba"]
The output would be [5, 3, 5] as the difference between the first and third strings is 5 (none of the characters are the same), so the maximum difference between string 1 and any other string is 5. Similarly, the difference between string 2 and string 3 is 3, which is greater than the difference between string 2 and 1. And the difference between string 3 and string 1 is 5.
The limit for l is 20
我尝试了以下代码(其中 l 和 n 相应命名,strings 是字符串列表)。此代码中的变量是示例案例中的变量,但我想要一个通用的解决方案。
n = 3
l = 5
abmap = {}
strings = ["abaab", "abbbb","babba"]
for i in range(l):
abmap[i] = {"a": [], "b": []}
for i in range(n):
for j in range(l):
if strings[i][j] == "a":
abmap[j]["a"].append(i)
else:
abmap[j]["b"].append(i)
for string in strings:
differences = n * [0]
for i in range(l):
if string[i] == "a":
for index in abmap[i]["b"]:
differences[index] += 1
else:
for index in abmap[i]["a"]:
differences[index] += 1
print(max(differences))
但是,该解的复杂度为 O(n2·l)。面试官让我进一步优化(比如优化到O(l·n·log(n))。我该如何实现呢?
此时间限制为 15 秒,且 n 小于 100000。
最佳答案
正如我之前的回答(由于限制现在不再可行),我将每个字符串转换为 L 位数字。它们是 2L ≤ 220 可能的 L 位数字的子集。将其视为 2L 个节点的图,如果两个节点相差一位,则它们之间存在一条边,并且输入数字是这些节点的子集。
现在...对于任何输入数字,最远的输入数字是多少?我们可以通过查看反转数字(所有L位翻转,即距离L)并询问最接近输入数字来解决这个问题。因此,我们根据输入数字运行并行 BFS(广度优先搜索)。我们将它们标记为距离为 0。然后我们将所有数字标记为距离 2 一位变化。然后将所有数字标记为距离 2 一位变化。依此类推。最后,对于每个输入数字,我们查看倒数的距离,并从 L 中减去该距离。
基准结果,最坏情况需要大约 3 秒,远低于 15 秒的限制:
n=1000 L=20:
0.14 s solution1
2.63 s solution2
n=10000 L=20:
15.01 s solution1
3.01 s solution2
n=100000 L=20:
3.47 s solution2
完整代码(解决方案1是我的旧解决方案,解决方案2是我上面提供的):
def solution1(strings):
table = str.maketrans('ab', '01')
numbers = [
int(s.translate(table), 2)
for s in strings
]
return [
max((x ^ y).bit_count() for y in numbers)
for x in numbers
]
def solution2(strings):
table = str.maketrans('ab', '01')
numbers = [
int(s.translate(table), 2)
for s in strings
]
L = len(strings[0])
bits = [2**i for i in range(L)]
dist = [None] * 2**L
for x in numbers:
dist[x] = 0
horizon = numbers
d = 1
while horizon:
horizon = [
y
for x in horizon
for bit in bits
for y in [x ^ bit]
if dist[y] is None
for dist[y] in [d]
]
d += 1
return [L - dist[~x] for x in numbers]
funcs = solution1, solution2
import random
from time import time
# Generate random input
def gen(n, L):
return [
''.join(random.choices('ab', k=L))
for _ in range(n)
]
# Correctness
for _ in range(100):
strings = gen(100, 10)
expect = funcs[0](strings)
for f in funcs:
result = f(strings)
assert result == expect
# Speed
def test(n, L, funcs):
print(f'{n=} {L=}:')
for _ in range(1):
strings = gen(n, L)
expect = None
for f in funcs:
t = time()
result = f(strings)
print(f'{time()-t:5.2f} s ', f.__name__)
if expect is None:
result = expect
else:
assert result == expect
del result
print()
test(1000, 20, [solution1, solution2])
test(10000, 20, [solution1, solution2])
test(100000, 20, [solution2])
关于python - 查找列表中每个字符串之间的最大差异,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/75850697/