我对 SequenceMatcher
返回的两个不同答案感到有点困惑取决于参数的顺序。为什么会这样?
例子
SequenceMatcher 不可交换:
>>> from difflib import SequenceMatcher
>>> SequenceMatcher(None, "Ebojfm Mzpm", "Ebfo ef Mfpo").ratio()
0.6086956521739131
>>> SequenceMatcher(None, "Ebfo ef Mfpo", "Ebojfm Mzpm").ratio()
0.5217391304347826
最佳答案
SequenceMatcher.ratio
内部使用 SequenceMatcher.get_matching_blocks
来计算比率,我将引导您完成这些步骤以了解这是如何发生的:
SequenceMatcher.get_matching_blocks
Return list of triples describing matching subsequences. Each triple is of the form
(i, j, n)
, and means thata[i:i+n] == b[j:j+n]
. The triples are monotonically increasing ini
andj
.The last triple is a dummy, and has the value
(len(a), len(b), 0)
. It is the only triple withn == 0
.If (i, j, n)
and(i', j', n')
are adjacent triples in the list, and the second is not the last triple in the list, theni+n != i'
orj+n != j'
; in other words, adjacent triples always describe non-adjacent equal blocks.
ratio
在内部使用 SequenceMatcher.get_matching_blocks
的结果,并对 SequenceMatcher.get_matching_blocks
返回的所有匹配序列的大小求和。这是来自 difflib.py
的确切源代码:
matches = sum(triple[-1] for triple in self.get_matching_blocks())
上面一行很关键,因为上面表达式的结果是用来计算比率的。我们很快就会看到这一点,以及它如何影响比率的计算。
>>> m1 = SequenceMatcher(None, "Ebojfm Mzpm", "Ebfo ef Mfpo")
>>> m2 = SequenceMatcher(None, "Ebfo ef Mfpo", "Ebojfm Mzpm")
>>> matches1 = sum(triple[-1] for triple in m1.get_matching_blocks())
>>> matches1
7
>>> matches2 = sum(triple[-1] for triple in m2.get_matching_blocks())
>>> matches2
6
如您所见,我们有 7 和 6。这些只是 get_matching_blocks
返回的匹配子序列的总和。为什么这很重要?这就是为什么,比率是按以下方式计算的,(这来自 difflib
源代码):
def _calculate_ratio(matches, length):
if length:
return 2.0 * matches / length
return 1.0
length
是 len(a) + len(b)
其中 a
是第一个序列,b
作为第二个序列。
好了,废话少说,我们需要行动:
>>> length = len("Ebojfm Mzpm") + len("Ebfo ef Mfpo")
>>> m1.ratio()
0.6086956521739131
>>> (2.0 * matches1 / length) == m1.ratio()
True
m2
类似:
>>> 2.0 * matches2 / length
0.5217391304347826
>>> (2.0 * matches2 / length) == m2.ratio()
True
注意:并非所有 SequenceMatcher(None a,b).ratio() == SequenceMatcher(None b,a).ratio()
都是 False
,有时它们可以是 True
:
>>> s1 = SequenceMatcher(None, "abcd", "bcde").ratio()
>>> s2 = SequenceMatcher(None, "bcde", "abcd").ratio()
>>> s1 == s2
True
如果你想知道为什么,这是因为
sum(triple[-1] for triple in self.get_matching_blocks())
对于 SequenceMatcher(None, "abcd", "bcde")
和 SequenceMatcher(None, "bcde", "abcd")
是相同的 < strong>3.
关于python - Python 的 SequenceMatcher 是如何工作的?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/35517353/