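python - Removing diacritics from a string for search functionality

I am building a simple web page with Django and need to implement search. At the moment I am using something like this: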

search_box = request.GET['search_box']
X = Foo.objects.filter(Q(title__contains=search_box) | Q(info__contains=search_box)).values()

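This checks whether the specified columns in my database contain the search string. But what if I search for "kočík" while my database contains "kocik"? How can I remove diacritics from a string in Python 3, or what is the best way to handle this? Thanks.

Accepted answer

You can use the unicodedata module from the standard library for this.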

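import unicodedata


def shave_marks(txt):
    """Remove all diacritic marks from the given string."""
    # Decompose each character into a base character plus combining marks.
    norm_txt = unicodedata.normalize('NFD', txt)
    # Drop the combining marks (their combining class is non-zero).
    shaved = ''.join(c for c in norm_txt if not unicodedata.combining(c))
    # Recompose whatever is left into canonical composed form.
    return unicodedata.normalize('NFC', shaved)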

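Some details about this algorithm:

The main problem with diacritics is that in Unicode some of them are separate combining characters that modify the preceding character, while others are built into a single precomposed character. For example, 'café' and 'cafe\u0301' look the same.

From https://docs.python.org/2/library/unicodedata.html :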

Even if two unicode strings are normalized and look the same to a human reader, if one has combining characters and the other doesn’t, they may not compare equal.
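
The difference between the two forms can be checked directly. The following is a minimal sketch; the variable names composed and decomposed are only illustrative and do not come from the original answer.

```python
import unicodedata

# 'é' stored as one precomposed code point (U+00E9)
composed = 'caf\u00e9'
# 'e' followed by COMBINING ACUTE ACCENT (U+0301)
decomposed = 'cafe\u0301'

# Both render as "café", but they are different code point sequences.
print(composed == decomposed)            # False
print(len(composed), len(decomposed))    # 4 5

# Normalizing both sides to the same form makes them compare equal.
print(unicodedata.normalize('NFC', decomposed) == composed)    # True
print(unicodedata.normalize('NFD', composed) == decomposed)    # True
```

This is why shave_marks normalizes its input first: without that step, a precomposed 'é' would not expose any combining character to filter out.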

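The algorithm first decomposes the string (using the 'NFD' form), so that every diacritic becomes a separate combining character, then filters out all combining characters, and finally recomposes the string (using the 'NFC' form).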
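
The three steps can be traced on the search term from the question. This is only a sketch: the titles list is a hypothetical in-memory stand-in for database rows, and the case-insensitive comparison is an assumption added for illustration, not something the original answer prescribes.

```python
import unicodedata


def shave_marks(txt):
    """Remove all diacritic marks from the given string."""
    norm_txt = unicodedata.normalize('NFD', txt)
    shaved = ''.join(c for c in norm_txt if not unicodedata.combining(c))
    return unicodedata.normalize('NFC', shaved)


# "kočík" written with escapes so the precomposed code points are explicit
query = 'ko\u010d\u00edk'

# Step 1: decomposition splits each accented letter into base + combining mark.
decomposed = unicodedata.normalize('NFD', query)
print([unicodedata.name(c) for c in decomposed])

# Steps 2 and 3: drop the combining marks and recompose.
plain_query = shave_marks(query)
print(plain_query)

# Hypothetical stand-in for rows fetched from the database.
titles = ['kocik', 'Ko\u010d\u00edk deluxe', 'stoli\u010dka']

# Strip marks on both sides so the match works in either direction.
hits = [t for t in titles if plain_query.lower() in shave_marks(t).lower()]
print(hits)
```

In the Django view from the question, passing shave_marks(search_box) to the filter would cover the case where the database holds unaccented text. If the stored values can contain diacritics too, one possible approach is to keep a mark-stripped copy of each searchable column and filter on that instead.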

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/34753821/
