有没有办法让difflib在字符串匹配时考虑删除?
我试过 difflib.get_close_matches()
但它不考虑接近匹配输出中长度较短的字符串。例如。
from difflib import get_close_matches as gcm
x = """Erfreulich
Erfreuliche
Erfreulicher
Erfreulicherem
Erfreulicheres
Erfreulicherweis
Erfreulicherweise
Erfreuliches
Erfreulichste"""
x = [i for i in x.split("\n")]
for i in x:
print i, gcm(i,x)
输出:
Erfreulich ['Erfreulich', 'Erfreuliche', 'Erfreuliches']
Erfreuliche ['Erfreuliche', 'Erfreuliches', 'Erfreulicher']
Erfreulicher ['Erfreulicher', 'Erfreuliche', 'Erfreulicheres']
Erfreulicherem ['Erfreulicherem', 'Erfreulicheres', 'Erfreulicher']
Erfreulicheres ['Erfreulicheres', 'Erfreulicherweis', 'Erfreulicherem']
Erfreulicherweis ['Erfreulicherweis', 'Erfreulicherweise', 'Erfreulicheres']
Erfreulicherweise ['Erfreulicherweise', 'Erfreulicherweis', 'Erfreulicheres']
Erfreuliches ['Erfreuliches', 'Erfreuliche', 'Erfreulicheres']
Erfreulichste ['Erfreulichste', 'Erfreuliche', 'Erfreuliches']
请注意,对于字符串 Erfreulicher
,Erfreulich
不被视为接近匹配,尽管距离仅为 -1。
最佳答案
来自documentation ,可以增加 n
参数以获得更多匹配。有些单词较短,因此 difflib
确实考虑了删除。
difflib.get_close_matches(word, possibilities[, n][, cutoff])
Return a list of the best “good enough” matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.
Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don’t score at least that similar to word are ignored.
The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.
这里是 gcm(i,x,6)
的同一个词:
Erfreulicher ['Erfreulicher', 'Erfreuliche', 'Erfreulicheres', 'Erfreulicherem',
'Erfreuliches', 'Erfreulich']
关于python - 考虑删除获取接近的字符串匹配 - python,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/19637646/