python - 如何更正用于比较包含按键的两个字符串的算法？

这是如果两个字符串相等则返回 true 的算法。该字符串可能包含退格键等按键。该代码使用光标和指针遍历字符串中的每个字母，如果找到按键(即\b)则跳过 2 个位置

#!/usr/bin/env python
import argparse
import __builtin__

# Given two different strings, one with backspaces (keypresses), find if they are equivalent or not

def main():
    parser = argparse.ArgumentParser(description="Enter two strings without or without backspaces")
    parser.add_argument("s1", type=str, help="The first string.")
    parser.add_argument("s2", type=str, help="The second string.")
    args = parser.parse_args()
    print(compare(args.s1, args.s2))

def compare(s1, s2):
    BACKSPACE = '\b'
    cursor = 0;
    pointer1 = 0; pointer2 = 0; # current position in backspaced string. 

    canon_len1 = len(s1); canon_len2 = len(s2); # length of the canonical string

    num_diff = 0
    while True:
        if s1[pointer1] == BACKSPACE or s2[pointer2] == BACKSPACE:
            # decrement the cursor and undo the previous compare
            cursor -= 1; 
            if s1[cursor] != s2[cursor]:
                num_diff -= 1
            # decrement the canonical lengths appropriately
            canon_len1 -= 2 if s1[pointer1] == BACKSPACE else 0
            canon_len2 -= 2 if s2[pointer2] == BACKSPACE else 0
        else:

            if s1[pointer1] != s2[pointer2]:
                num_diff += 1
            cursor += 1

        # increment the pointers, making sure we don't run off then end 
        pointer1 += 1; pointer2 += 1;
        if pointer1 == len(s1) and pointer2 == len(s2):
            break
        if pointer1 == len(s1): pointer1 -= 1
        if pointer2 == len(s2): pointer2 -= 1

    return num_diff == 0 and canon_len1 == canon_len2

if __name__ == "__main__":
    main()

#!/usr/bin/env python

import compare_strings
import unittest

class compare_strings_test(unittest.TestCase):

    def test_01(self):
        raised = False
        try:
            compare_strings.compare('Toronto', 'Cleveland')
        except:
            raised = True
        self.assertFalse(raised, 'Exception raised')

    def test_02(self):
        equivalent = compare_strings.compare('Toronto', 'Cleveland')
        self.assertEquals(equivalent, False)

    def test_03(self):
        equivalent = compare_strings.compare('Toronto', 'Toroo\b\bnto')
        self.assertEquals(equivalent, False)

    def test_04(self):
        equivalent = compare_strings.compare('Toronto', 'Torooo\b\bntt\bo')
        self.assertEquals(equivalent, True)

if __name__ == "__main__":
    unittest.main()

...F
======================================================================
FAIL: test_04 (__main__.compare_strings_test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "compare_strings_test.py", line 26, in test_04
    self.assertEquals(equivalent, True)
AssertionError: False != True

----------------------------------------------------------------------
Ran 4 tests in 0.001s

测试 4 失败，但 'Toronto' 和 'Torooo\b\bntt\bo' 应该是等价的减去退格键

最佳答案

我认为您当前代码中的问题源于这样一个事实，即您可以连续使用多个退格键，但您只能回头看“一个”字符。 (我在这一点上可能是错的，我没有使用 pdb 单步执行代码。)

正如评论中所建议的那样，解决此问题的一种不错的方法是将其分为以下两部分。

规范化/规范化两个输入字符串。这意味着一次处理一个，从每个字符串中去除退格键和相关的前一个字符。
比较两个规范化的字符串。

第 2 步很简单，只需使用内置的字符串比较方法(python 中的==)即可。

第 1 步有点困难，因为您可能在输入字符串的一行中有多个退格键。处理这个问题的一种方法是每次一个字符构建一个新字符串，并在每个退格键上删除最后添加的字符。这是一些示例代码。

def canonicalize(s):
    normalized_s = ""
    for i, c in enumerate(s):
        # Check for a backspace, taking care not to run off the end of the string.
        if c == BACKSPACE:
            normalized_s = normalized_s[:-1]
        else:
            normalized_s += c

    return normalized_s

这种方法的一个很好的副作用是前导退格不会导致任何错误，它们会被忽略。稍后我将尝试在其他实现中保留此属性。这段代码使用像 c++ 这样可以修改字符串的语言，可以相当容易地提高效率，因为它类似于将指针和条目更改为 char 数组。

在 python 中，每次编辑都会创建一个新字符串(或者至少，不能保证它不会分配一个新字符串)。我认为注意你自己的堆栈(也就是一个由字符组成的数组，指针指向末尾)可以产生更好的代码。有几种方法可以在 python 中管理堆栈，最熟悉的是列表，另一个不错的选择是 collections.deque。除非剖析器另有说明，否则我会使用更熟悉的列表。

def canonicalize(s):
    normalized_s = list()
    for c in s:
        # Check for a backspace, taking care not to run off the end of the string.
        if c == BACKSPACE:
            if normalized_s:
                normalized_s.pop()
        else:
            normalized_s.append(c)

    return "".join(normalized_s)

最终的比较方法可能类似于

def compare(s1, s2):
    return canonicalize(s1) == canonlicalize(s2)

上面的代码有两个问题。首先是几乎可以保证创建两个新字符串。第二个是它总共需要遍历四次字符串，每个输入字符串一次，每个清理后的字符串一次。

这可以通过向后而不是向前进行改进。通过向后迭代，您可以看到退格键，并提前知道哪些字符将被删除(忽略或跳过读取)。我们继续前进，直到出现不匹配，或者至少有一个字符串用完了字符。这种方法需要更多的簿记，但不需要额外的空间。它仅使用两个指针来跟踪每个字符串的当前进度，并使用一个计数器来跟踪要忽略的字符数。下面显示的代码不是特别 pythonic，它可以做得更好。如果您要使用(两个)生成器和一个 izip_longest，则可以去除所有样板。

def compare(s1, s2):
    i, j = len(s1) - 1, len(s2) - 1

    while i >= 0 or j >= 0:
        ignore = 0
        while i >= 0:
            if s1[i] == BACKSPACE:
                ignore += 1
            elif ignore > 0:
                ignore -= 1
            else:
                break
            i -= 1

        ignore = 0
        while j >= 0:
            if s2[j] == BACKSPACE:
                ignore += 1
            elif ignore > 0:
                ignore -= 1
            else:
                break
            j -= 1

        if i < 0 and j < 0:
            # No more characters to try and match
            return True

        if (i < 0 and j >= 0) or (i >= 0 and j < 0):
            # One string exhausted before the other
            return False

        if s1[i] != s2[j]:
            return False

        i -= 1
        j -= 1

    return True

编辑

这是我为上次执行比较而尝试的一些测试用例。

true_testcases = (
    ("abc", "abc"),
    ("abc", "abcde\b\b"),
    ("abcdef", "\b\babcdef\bf"),
    ("", "\b\b\b"),
    ("Toronto", "Torooo\b\bntt\bo"))

false_testcases = (
    ("a", "a\b"),
    ("a", "a\b\b"),
    ("abc", "abc\bd\be"),
)

print([eq(s1, s2) for s1, s2 in true_testcases])
print([eq(s1, s2) for s1, s2 in false_testcases])

关于python - 如何更正用于比较包含按键的两个字符串的算法？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/54659687/

python - 如何更正用于比较包含按键的两个字符串的算法？

上一篇：python - 任何 python 包来加速循环计算？

下一篇：algorithm - 分而治之关系的时间复杂度