python - Getting close string matches that take deletions into account

Tags: python string pattern-matching string-matching difflib

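Is there a way to make difflib take deletions into account when matching strings?

I tried difflib.get_close_matches(), but its close-match output leaves out strings that are shorter than the search word. For example: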

from difflib import get_close_matches as gcm

x = """Erfreulich
Erfreuliche
Erfreulicher
Erfreulicherem
Erfreulicheres
Erfreulicherweis
Erfreulicherweise
Erfreuliches
Erfreulichste"""

x = x.split("\n")

# Print each word followed by its close matches (default n=3).
for i in x:
    print(i, gcm(i, x))

Output:

Erfreulich ['Erfreulich', 'Erfreuliche', 'Erfreuliches']
Erfreuliche ['Erfreuliche', 'Erfreuliches', 'Erfreulicher']
Erfreulicher ['Erfreulicher', 'Erfreuliche', 'Erfreulicheres']
Erfreulicherem ['Erfreulicherem', 'Erfreulicheres', 'Erfreulicher']
Erfreulicheres ['Erfreulicheres', 'Erfreulicherweis', 'Erfreulicherem']
Erfreulicherweis ['Erfreulicherweis', 'Erfreulicherweise', 'Erfreulicheres']
Erfreulicherweise ['Erfreulicherweise', 'Erfreulicherweis', 'Erfreulicheres']
Erfreuliches ['Erfreuliches', 'Erfreuliche', 'Erfreulicheres']
Erfreulichste ['Erfreulichste', 'Erfreuliche', 'Erfreuliches']

Note that for the string Erfreulicher, Erfreulich is not listed as a close match, even though it differs only by the deletion of the trailing "er".

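Best answer

According to the documentation, you can increase the n parameter to get more matches. Some of the returned words are shorter than the search word, so difflib does take deletions into account; with the default n of 3 they simply do not make the cut.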

difflib.get_close_matches(word, possibilities[, n][, cutoff])
Return a list of the best “good enough” matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).

Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.

Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don’t score at least that similar to word are ignored.

The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.
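
As a rough illustration of the two optional arguments described above, the sketch below reuses the word list from the question. The variable names (words, all_matches, strict_matches) and the cutoff value 0.95 are chosen here for illustration only and are not part of the original answer.

```python
from difflib import get_close_matches

words = ["Erfreulich", "Erfreuliche", "Erfreulicher", "Erfreulicherem",
         "Erfreulicheres", "Erfreulicherweis", "Erfreulicherweise",
         "Erfreuliches", "Erfreulichste"]

# n caps how many matches are returned; using len(words) means
# only the cutoff can exclude a candidate.
all_matches = get_close_matches("Erfreulicher", words, n=len(words))

# cutoff drops every candidate whose similarity score is below the threshold.
strict_matches = get_close_matches("Erfreulicher", words, n=len(words), cutoff=0.95)

print(all_matches)
print(strict_matches)
```

With the default cutoff of 0.6 every word in this list qualifies, including the shorter Erfreulich, while the stricter cutoff keeps only the nearest candidates.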

Here is the same word with gcm(i, x, 6):

Erfreulicher ['Erfreulicher', 'Erfreuliche', 'Erfreulicheres', 'Erfreulicherem',
              'Erfreuliches', 'Erfreulich']
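
To see why Erfreulich only shows up once n is raised, it can help to look at the similarity scores directly. The sketch below assumes the ranking is driven by SequenceMatcher.ratio(), which the difflib documentation describes as 2.0*M / T (M matched characters, T total length of both strings); the names target and scores are illustrative.

```python
from difflib import SequenceMatcher

words = ["Erfreulich", "Erfreuliche", "Erfreulicher", "Erfreulicherem",
         "Erfreulicheres", "Erfreulicherweis", "Erfreulicherweise",
         "Erfreuliches", "Erfreulichste"]
target = "Erfreulicher"

# Score every candidate against the target and sort best-first.
# Ties are broken by comparing the strings themselves.
scores = sorted(
    ((SequenceMatcher(None, w, target).ratio(), w) for w in words),
    reverse=True,
)

for score, w in scores:
    print(f"{score:.3f}  {w}")
```

Erfreulich shares all 10 of its characters with the target, so its ratio is 20/22, which places it sixth in this list: close, but behind five longer candidates that share even more characters.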

Regarding python - getting close string matches that take deletions into account, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19637646/
