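c++ - Edit distance where I only care about words

Tags: c++ levenshtein-distance edit-distance

I want to measure the insert/delete/substitute distance between two strings in terms of words. This is the same idea as edit distance, but I only care about whole words, not characters. For example:

"The cat sat on the mat" & "Dog sat carefully on the mat"

The word distance would be 3.

I am using the Rosetta Code C++ implementation of Levenshtein distance, but I can't see how to adapt it to words.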

#include <string>
#include <iostream>
using namespace std;

// Compute Levenshtein Distance
// Martin Ettl, 2012-10-05

size_t uiLevenshteinDistance(const std::string &s1, const std::string &s2)
{
  const size_t m(s1.size());
  const size_t n(s2.size());

  if( m==0 ) return n;
  if( n==0 ) return m;

  size_t *costs = new size_t[n + 1];

  for( size_t k=0; k<=n; k++ ) costs[k] = k;

  size_t i = 0;
  for ( std::string::const_iterator it1 = s1.begin(); it1 != s1.end(); ++it1, ++i )
  {
    costs[0] = i+1;
    size_t corner = i;

    size_t j = 0;
    for ( std::string::const_iterator it2 = s2.begin(); it2 != s2.end(); ++it2, ++j )
    {
      size_t upper = costs[j+1];
      if( *it1 == *it2 )
      {
          costs[j+1] = corner;
      }
      else
      {
        size_t t(upper<corner?upper:corner);
        costs[j+1] = (costs[j]<t?costs[j]:t)+1;
      }

      corner = upper;
    }
  }

  size_t result = costs[n];
  delete [] costs;

  return result;
}

int main()
{
    string s0 = "rosettacode";
    string s1 = "raisethysword";
    cout << "distance between " << s0 << " and " << s1 << " : "
         << uiLevenshteinDistance(s0,s1) << std::endl;

    return 0;
}

Best answer

Well, since it's the weekend, this one's on the house :)

#include <algorithm>  // std::min
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> Sentence;

Sentence &split(const std::string &s, char delim, Sentence &elems) {
  std::stringstream ss(s);
  std::string item;
  while (std::getline(ss, item, delim)) {
    elems.push_back(item);
  }
  return elems;
}

Sentence split(const std::string &s, char delim) {
  Sentence elems;
  split(s, delim, elems);
  return elems;
}

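// Levenshtein distance computed over words instead of characters.
unsigned int edit_distance(const Sentence& s1, const Sentence& s2)
{
  const std::size_t len1 = s1.size(), len2 = s2.size();
  // d[i][j] = distance between the first i words of s1 and the first j words of s2.
  std::vector<std::vector<unsigned int>> d(len1 + 1, std::vector<unsigned int>(len2 + 1));

  d[0][0] = 0;
  for(unsigned int i = 1; i <= len1; ++i) d[i][0] = i;  // delete i words
  for(unsigned int i = 1; i <= len2; ++i) d[0][i] = i;  // insert i words

  for(unsigned int i = 1; i <= len1; ++i)
    for(unsigned int j = 1; j <= len2; ++j)
    {
      // Cheapest of deletion and insertion...
      d[i][j] = std::min(d[i - 1][j] + 1, d[i][j - 1] + 1);
      // ...versus substitution, which is free when the two words are equal.
      d[i][j] = std::min(d[i][j], d[i - 1][j - 1] + (s1[i - 1] == s2[j - 1] ? 0 : 1));
    }
  return d[len1][len2];
}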

int main()
{
  Sentence s1 = split("The cat sat on the mat", ' ');
  Sentence s2 = split("Dog sat carefully on the mat", ' ');

  std::cout << "Distance between sentences: " << edit_distance(s1, s2) << std::endl;

  return 0;
}

This prints "3", as it should.
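
The algorithm only ever compares elements with `==`, so it does not care whether those elements are characters or words. As a rough sketch of that idea (not part of the original answer), the variant below is templated over the sequence type and keeps just two rows of the table, much like the Rosetta Code version in the question. The names `split_words` and `sequence_distance` are made up for illustration. `split_words` tokenizes with `operator>>`, on the assumption that runs of spaces or tabs should not count as empty words, which is what `split(s, ' ')` would produce for doubled spaces.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on any run of whitespace, so repeated
// spaces or tabs never yield empty "words".
std::vector<std::string> split_words(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> words;
  std::string word;
  while (in >> word) words.push_back(word);
  return words;
}

// Hypothetical generic variant: Levenshtein distance over any indexable
// sequence whose elements support operator== (std::string,
// std::vector<std::string>, ...). Only two rows of the table are kept.
template <typename Seq>
std::size_t sequence_distance(const Seq& a, const Seq& b) {
  std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;  // insert j elements

  for (std::size_t i = 1; i <= a.size(); ++i) {
    curr[0] = i;  // delete i elements
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t substitution =
          prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitution});
    }
    prev.swap(curr);  // the row just filled becomes the previous row
  }
  return prev[b.size()];
}
```

Called as `sequence_distance(split_words("The cat sat on the mat"), split_words("Dog sat carefully on the mat"))`, this should give 3 as well, and passing two `std::string` objects gives the ordinary character-level distance. Words are still compared case-sensitively, so "The" and "the" count as different; normalizing case first would be a separate design decision.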

Regarding "c++ - Edit distance where I only care about words", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31625387/
