python - Getting close string matches that take deletions into account - python

Is there a way to make difflib take deletions into account when matching strings?

I tried difflib.get_close_matches(), but it does not seem to include shorter strings in its close-match output. For example:

from difflib import get_close_matches as gcm

x = """Erfreulich
Erfreuliche
Erfreulicher
Erfreulicherem
Erfreulicheres
Erfreulicherweis
Erfreulicherweise
Erfreuliches
Erfreulichste"""

x = x.split("\n")  # one candidate word per line

for i in x:
    print(i, gcm(i, x))

Output:

Erfreulich ['Erfreulich', 'Erfreuliche', 'Erfreuliches']
Erfreuliche ['Erfreuliche', 'Erfreuliches', 'Erfreulicher']
Erfreulicher ['Erfreulicher', 'Erfreuliche', 'Erfreulicheres']
Erfreulicherem ['Erfreulicherem', 'Erfreulicheres', 'Erfreulicher']
Erfreulicheres ['Erfreulicheres', 'Erfreulicherweis', 'Erfreulicherem']
Erfreulicherweis ['Erfreulicherweis', 'Erfreulicherweise', 'Erfreulicheres']
Erfreulicherweise ['Erfreulicherweise', 'Erfreulicherweis', 'Erfreulicheres']
Erfreuliches ['Erfreuliches', 'Erfreuliche', 'Erfreulicheres']
Erfreulichste ['Erfreulichste', 'Erfreuliche', 'Erfreuliches']

Note that for the string Erfreulicher, Erfreulich is not returned as a close match, even though it differs only by a deletion (of the trailing "er").
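The question speaks of a distance, but difflib does not compute an edit distance; it ranks candidates by a similarity ratio. If ranking by edit distance is what is actually wanted, a plain Levenshtein distance (a different technique from what difflib uses) is one option. A minimal sketch; `levenshtein` and `nearest_by_edits` are hypothetical helper names, and the threshold `max_edits=2` is an arbitrary assumption:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute, or free match
        prev = cur
    return prev[-1]


def nearest_by_edits(word, candidates, max_edits=2):
    """Hypothetical helper: candidates within max_edits edits, closest first."""
    scored = sorted((levenshtein(word, c), c) for c in candidates)
    return [c for d, c in scored if d <= max_edits]


words = """Erfreulich
Erfreuliche
Erfreulicher
Erfreulicherem
Erfreulicheres
Erfreulicherweis
Erfreulicherweise
Erfreuliches
Erfreulichste""".split()

print(levenshtein("Erfreulicher", "Erfreulich"))  # 2: delete "e" and "r"
print(nearest_by_edits("Erfreulicher", words))
```

With this measure, deleting "er" costs exactly two edits, so Erfreulich is kept whenever `max_edits` is at least 2.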

Best answer

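According to the documentation, get_close_matches returns at most n matches (3 by default), so you can increase the n parameter to get more of them. difflib does take deletions into account: shorter words such as Erfreuliche already appear in the results for Erfreulicher. Erfreulich is left out only because it does not rank among the top three.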
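To see that shorter strings are scored rather than ignored, one can print the similarity score get_close_matches ranks by, `difflib.SequenceMatcher(...).ratio()`, for every candidate, with the candidate as the first sequence and the query word as the second. A minimal sketch reusing the word list from the question; the helper name `scores` is hypothetical:

```python
from difflib import SequenceMatcher

words = """Erfreulich
Erfreuliche
Erfreulicher
Erfreulicherem
Erfreulicheres
Erfreulicherweis
Erfreulicherweise
Erfreuliches
Erfreulichste""".split()


def scores(word, candidates):
    """Hypothetical helper: (ratio, candidate) pairs, most similar first."""
    pairs = [(SequenceMatcher(None, c, word).ratio(), c) for c in candidates]
    return sorted(pairs, reverse=True)


for ratio, cand in scores("Erfreulicher", words):
    print(f"{ratio:.3f}  {cand}")
```

Erfreulich scores about 0.909, far above the default cutoff of 0.6. It is simply the sixth-best candidate, so the default n=3 drops it.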

difflib.get_close_matches(word, possibilities[, n][, cutoff])
Return a list of the best “good enough” matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).

Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.

Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don’t score at least that similar to word are ignored.

The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.
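The "similarity score" here is the one documented for SequenceMatcher.ratio(): if M is the number of matched characters and T the total length of both strings, the score is 2M/T. Worked out for the query Erfreulicher against a few candidates:

```latex
\mathrm{ratio} = \frac{2M}{T}, \qquad T = |a| + |b|

\begin{aligned}
\text{Erfreulicher vs. Erfreuliche:}    &\quad M = 11,\; T = 23,\; \tfrac{22}{23} \approx 0.957\\
\text{Erfreulicher vs. Erfreulicheres:} &\quad M = 12,\; T = 26,\; \tfrac{24}{26} \approx 0.923\\
\text{Erfreulicher vs. Erfreuliches:}   &\quad M = 11,\; T = 24,\; \tfrac{22}{24} \approx 0.917\\
\text{Erfreulicher vs. Erfreulich:}     &\quad M = 10,\; T = 22,\; \tfrac{20}{22} \approx 0.909
\end{aligned}
```

A deletion lowers M by one per removed character but also shortens T, so a short prefix like Erfreulich still scores high; it just ties closely with several other near matches.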

Here is the result for the same word with gcm(i, x, 6):

Erfreulicher ['Erfreulicher', 'Erfreuliche', 'Erfreulicheres', 'Erfreulicherem',
              'Erfreuliches', 'Erfreulich']
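Rather than guessing a value for n, one option is to pass `n=len(candidates)`, which removes the top-n cap so that only `cutoff` decides which candidates are returned. A minimal sketch; the helper name `close_matches_above` and the cutoff of 0.9 are illustrative assumptions, not values from the documentation:

```python
from difflib import get_close_matches

words = """Erfreulich
Erfreuliche
Erfreulicher
Erfreulicherem
Erfreulicheres
Erfreulicherweis
Erfreulicherweise
Erfreuliches
Erfreulichste""".split()


def close_matches_above(word, candidates, cutoff=0.9):
    """Hypothetical helper: every candidate scoring >= cutoff, best first.
    n=len(candidates) means the result is never truncated by n."""
    return get_close_matches(word, candidates, n=len(candidates), cutoff=cutoff)


print(close_matches_above("Erfreulicher", words))
```

For Erfreulicher this returns the same six words as gcm(i, x, 6), but the cutoff adapts to each word: the number of results varies instead of being fixed.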

A similar question about python - Getting close string matches that take deletions into account was found on Stack Overflow: https://stackoverflow.com/questions/19637646/
