I am using django_elasticsearch_dsl.
My document:
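import django_elasticsearch_dsl
from django_elasticsearch_dsl import fields
from elasticsearch_dsl import analyzer

# Analyzer that strips HTML, lowercases, removes stop words and stems.
html_strip = analyzer(
    'html_strip',
    tokenizer='standard',
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)

class Document(django_elasticsearch_dsl.Document):
    name = fields.TextField(
        analyzer=html_strip,
        fields={
            'raw': fields.KeywordField(),
            # Sub-field queried by the completion suggester below.
            'suggest': fields.CompletionField(),
        }
    )
    ...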
My request:
_search = (
    Document.search()
    .suggest("suggestions", text=query, completion={'field': 'name.suggest'})
    .execute()
)
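I indexed the following document names:
"This is a test"
"this is my test"
"this test"
"Test this"
Now, if I search for This is my test, I receive only "this is my test".
However, if I search for test, all I get is "Test this", even though I want every document that has test in its name. What am I missing?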
Best answer
Based on the comment from the user, I am adding another answer that uses n-grams.
This answer adds a working example with the index mapping, index data, search query, and search results.
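Some context on the behaviour described in the question, as far as I understand it: the completion suggester matches from the start of the indexed input, so test can only complete a name that begins with test. The sketch below is a plain-Python model of that prefix behaviour, not Elasticsearch code. The function name `complete` is made up for illustration, and the case-insensitive comparison is an assumption about how the suggest sub-field is analyzed.

```python
# Rough model of prefix-based completion matching (not Elasticsearch code).
# Assumption: the suggest sub-field lowercases its input, so matching is
# case-insensitive, and a suggestion only matches from the start of the name.

def complete(prefix, names):
    """Return the names that start with the given prefix, ignoring case."""
    needle = prefix.lower()
    return [name for name in names if name.lower().startswith(needle)]


NAMES = ["This is a test", "this is my test", "this test", "Test this"]

if __name__ == "__main__":
    # Only names that begin with the query text are suggested.
    print(complete("test", NAMES))
    print(complete("This is my test", NAMES))
```

This is why a full-text style query, rather than a suggester, is used in the rest of this answer.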
Index mapping:
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 4,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    },
    "max_ngram_diff": 50
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
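A note on max_ngram_diff: the index setting limits how far max_gram may exceed min_gram, and its default is 1, so a filter with min_gram 4 and max_gram 20 needs it raised. The following is a minimal sketch that checks this on a settings dictionary shaped like the one above. The helper name `ngram_filter_spread` is hypothetical, and the code only inspects a Python dict; it does not talk to a cluster.

```python
# Sanity check for the n-gram settings above (plain Python, no cluster needed).
# Assumption: the default for index.max_ngram_diff is 1 when it is not set.

SETTINGS = {
    "analysis": {
        "filter": {
            "ngram_filter": {"type": "ngram", "min_gram": 4, "max_gram": 20}
        },
        "analyzer": {
            "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "ngram_filter"],
            }
        },
    },
    "max_ngram_diff": 50,
}


def ngram_filter_spread(settings):
    """Largest (max_gram - min_gram) over all n-gram filters in the settings."""
    filters = settings.get("analysis", {}).get("filter", {})
    spreads = [
        f["max_gram"] - f["min_gram"]
        for f in filters.values()
        if f.get("type") == "ngram"
    ]
    return max(spreads, default=0)


spread = ngram_filter_spread(SETTINGS)
allowed = SETTINGS.get("max_ngram_diff", 1)

if __name__ == "__main__":
    print(f"spread={spread}, allowed={allowed}, ok={spread <= allowed}")
```

With these values the spread is 16, so any max_ngram_diff of 16 or more would do; 50 simply leaves headroom.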
Index data:
{
  "name": [
    "Test this"
  ]
}
{
  "name": [
    "This is a test"
  ]
}
{
  "name": [
    "this is my test"
  ]
}
{
  "name": [
    "this test"
  ]
}
Analyze API (run against the index, because ngram_analyzer is defined in its settings):
POST /stof_64281341/_analyze
{
  "analyzer": "ngram_analyzer",
  "text": "this is my test"
}
The following tokens are generated:
{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "test",
      "start_offset": 11,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
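Only this and test come out because is and my are shorter than min_gram (4), so the n-gram filter emits nothing for them, while their positions (1 and 2) are still counted. Here is a rough plain-Python imitation of the analyzer to make that visible. It is a sketch, not the real implementation: the regex is a crude stand-in for the standard tokenizer, and the function names are made up for illustration.

```python
import re

# Plain-Python imitation of ngram_analyzer (standard tokenizer + lowercase +
# n-gram filter). Assumption: splitting on \w+ is close enough to the
# standard tokenizer for simple ASCII text like the examples here.


def standard_tokens(text):
    """Yield (word, start_offset, end_offset) for each run of word characters."""
    return [(m.group(0), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def ngram_analyze(text, min_gram=4, max_gram=20):
    """Return n-gram tokens with the offsets and position of their source word."""
    tokens = []
    for position, (word, start, end) in enumerate(standard_tokens(text)):
        word = word.lower()
        # Words shorter than min_gram produce an empty range, hence no tokens.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            for i in range(len(word) - size + 1):
                tokens.append(
                    {
                        "token": word[i:i + size],
                        "start_offset": start,
                        "end_offset": end,
                        "position": position,
                    }
                )
    return tokens


if __name__ == "__main__":
    for tok in ngram_analyze("this is my test"):
        print(tok)
```

A longer word such as testing would additionally yield grams like test, esti and ting, which is what lets a search for test match inside longer words.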
Search query:
{
  "query": {
    "match": {
      "name": "test"
    }
  }
}
Search result:
"hits": [
  {
    "_index": "stof_64281341",
    "_type": "_doc",
    "_id": "4",
    "_score": 0.2876821,
    "_source": {
      "name": [
        "Test this"
      ]
    }
  },
  {
    "_index": "stof_64281341",
    "_type": "_doc",
    "_id": "3",
    "_score": 0.2876821,
    "_source": {
      "name": [
        "this is my test"
      ]
    }
  },
  {
    "_index": "stof_64281341",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.2876821,
    "_source": {
      "name": [
        "This is a test"
      ]
    }
  },
  {
    "_index": "stof_64281341",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.2876821,
    "_source": {
      "name": [
        "this test"
      ]
    }
  }
]
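All four documents match because each name contains the word test, which is indexed as the 4-character gram test, while the query side uses the standard analyzer and therefore looks up the whole word. The sketch below models that in plain Python under simplifying assumptions: whitespace splitting instead of the real tokenizer, no scoring, and a match counted as soon as any query term is found (the default OR behaviour of a match query). The function names are hypothetical.

```python
# Toy model of index-time n-grams plus a search-time standard analyzer.
# Assumptions: whitespace tokenization, no relevance scoring, and the ids
# below mirror the _id values shown in the search result.

DOCS = {
    "1": "this test",
    "2": "This is a test",
    "3": "this is my test",
    "4": "Test this",
}


def index_terms(name, min_gram=4, max_gram=20):
    """Index side: lowercase each word and expand it into n-grams."""
    terms = set()
    for word in name.lower().split():
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            for i in range(len(word) - size + 1):
                terms.add(word[i:i + size])
    return terms


def search_terms(query):
    """Search side: lowercase whole words, no n-grams."""
    return set(query.lower().split())


def match(query, docs):
    """Return ids of documents sharing at least one term with the query."""
    wanted = search_terms(query)
    return [doc_id for doc_id, name in docs.items() if wanted & index_terms(name)]


if __name__ == "__main__":
    print(match("test", DOCS))
    print(match("tes", DOCS))
```

One consequence of min_gram being 4 in this setup: a query shorter than four characters, such as tes, finds nothing, because no gram that short was indexed.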
For fuzzy search, you can use the following search query (note that tst is used in place of test):
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "tst"
      }
    }
  }
}
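The fuzzy query matches indexed terms within a small edit distance of the value; with the default AUTO fuzziness, a term of 3 to 5 characters is allowed one edit, and tst is one insertion away from test. A minimal sketch of that check follows. It uses plain Levenshtein distance as a simplification (Elasticsearch by default also counts a transposition of adjacent characters as a single edit), and the function names are made up for illustration.

```python
# Edit-distance check behind the fuzzy query, in plain Python.
# Assumption: plain Levenshtein distance is used here as a simplification.


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,               # delete ca
                    cur[j - 1] + 1,            # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, free if equal
                )
            )
        prev = cur
    return prev[-1]


def auto_fuzziness(term):
    """Edits allowed by fuzziness AUTO: 0 for 0-2 chars, 1 for 3-5, else 2."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_matches(value, indexed_term):
    """True if indexed_term is within the AUTO edit budget of value."""
    return edit_distance(value, indexed_term) <= auto_fuzziness(value)


if __name__ == "__main__":
    print(edit_distance("tst", "test"), fuzzy_matches("tst", "test"))
```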
Source: the Stack Overflow question "python - ElasticSearch suggester full text search": https://stackoverflow.com/questions/64281341/