string - 归一化编辑距离公式解释

标签 string algorithm levenshtein-distance edit-distance

基于本文: 关于 PAITERN 分析的 IEEE 交易:Computation of Normalized Edit Distance and Applications In this paper Normalized Edit Distance如下:

Given two strings X and Y over a finite alphabet, the normalized edit distance between X and Y, d( X , Y ) is defined as the minimum of W( P ) / L ( P )w, here P is an editing path between X and Y , W ( P ) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (length of P).

enter image description here

我可以安全地将上面解释的归一化编辑距离算法翻译成这样吗:

normalized edit distance = 
levenshtein(query 1, query 2)/max(length(query 1), length(query 2))

最佳答案

您可能误解了指标。有两个问题:

归一化步骤是将编辑过程的权重W(P)除以L(P)，这是编辑过程的长度编辑过程，不要像你那样超过字符串的最大长度；
此外，该论文表明(示例 3.1)归一化编辑距离不能简单地用编辑距离计算。您可能需要实现他们的算法。

例3.1(c)的解释:

从aaab到abbb，论文使用了以下转换:

将a与a匹配；
跳过第一个字符串中的a；
跳过第一个字符串中的a；
跳过第二个字符串中的b；
跳过第二个字符串中的b；
匹配最后的 b。

这是 6 个操作，这就是为什么 L(P) 是 6；从 (a) 中的矩阵来看，匹配的成本为 0，跳过的成本为 2，因此我们的总成本为 0 + 2 + 2 + 2 + 2 + 0 = 8，这正是 W(P)，W(P)/L(P) = 1.33。对于 (b) 可以获得类似的结果，我将把它留给你作为练习:-)

关于string - 归一化编辑距离公式解释，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/30787098/

上一篇：string - 百分比 diff b/t 两个不同长度的字符串

下一篇：java - 具有高效查询算法的层次数据结构

algorithm - Pollard Rho 分解方法

java - 在两个字符串上查找 Levenshtein 距离

asp.net - 在 ASP .NET 上的 DataList 项内的 LinkButton 中添加 Javascript 字符串值

linux - 如何让 grep 将任何字符视为常规字符

c# - 正则表达式或其他方法可从特定位置在C#中使用字符串

c - C 中的字符串输入/输出

c++ - 有没有更好更快的方法来只填充圆弧(以像素为单位)？

string - 什么是最适合用于比较电视节目标题的字符串距离算法？

swift - Swift3 中的 Levenshtein 距离