Elasticsearch 1.6
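I want to index text that contains hyphens, for example U-12, U-17, WU-12, t-shirt... and be able to search for it with a "Simple Query String" query.
Data sample (simplified):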
{
  "title": "U-12 Soccer",
  "comment": "the t-shirts are dirty"
}
Since there are already many questions about hyphens, I have already tried the following solution:
Use a char filter: ElasticSearch - Searching with hyphens in name.
So I went with this mapping:
{
  "settings": {
    "analysis": {
      "char_filter": {
        "myHyphenRemoval": {
          "type": "mapping",
          "mappings": [
            "-=>"
          ]
        }
      },
      "analyzer": {
        "default": {
          "type": "custom",
          "char_filter": [ "myHyphenRemoval" ],
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "title": {
          "type": "string"
        },
        "comment": {
          "type": "string"
        }
      }
    }
  }
}
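To make it easier to see which tokens this mapping puts into the index, here is a minimal Python sketch of the analysis chain. It is only a toy model and not Elasticsearch code: the function name `analyze` is hypothetical, and the standard tokenizer is approximated by splitting on anything that is not an ASCII letter or digit, which is an assumption that happens to be good enough for the sample data.

```python
import re

def analyze(text):
    """Toy model of the custom "default" analyzer from the mapping above."""
    # char filter "myHyphenRemoval": the mapping "-=>" deletes every hyphen
    filtered = text.replace("-", "")
    # rough stand-in for the standard tokenizer (assumption):
    # split on anything that is not an ASCII letter or digit
    tokens = re.findall(r"[A-Za-z0-9]+", filtered)
    # "lowercase" token filter
    return [token.lower() for token in tokens]

print(analyze("U-12 Soccer"))             # ['u12', 'soccer']
print(analyze("the t-shirts are dirty"))  # ['the', 'tshirts', 'are', 'dirty']
```

Under this model the index never contains a hyphen, so only hyphen-free terms such as `u12` and `tshirts` are available for matching.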
The search is done with the following query:
{
  "_source": true,
  "query": {
    "simple_query_string": {
      "query": "<Text>",
      "default_operator": "AND"
    }
  }
}
What works:
"U-12", "U*", "t*", "ts*"
What doesn't work:
"U-*", "u-1*", "t-*", "t-sh*", ...
So it seems the char filter is not applied to the search string? What can I do to make this work?
Best answer
The answer is simple:
Quoting Igor Motov, from "Configuring the standard tokenizer":
By default the simple_query_string query doesn't analyze the words with wildcards. As a result it searches for all tokens that start with i-ma. The word i-mac doesn't match this request because during analysis it's split into two tokens i and mac and neither of these tokens starts with i-ma. In order to make this query find i-mac you need to make it analyze wildcards:
{
  "_source": true,
  "query": {
    "simple_query_string": {
      "query": "u-1*",
      "analyze_wildcard": true,
      "default_operator": "AND"
    }
  }
}
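The effect of `analyze_wildcard` can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the way Elasticsearch is implemented: `analyze` and `prefix_search` are hypothetical helpers, the standard tokenizer is approximated by a regex, only a single trailing `*` is handled, and the unanalyzed case is modelled as lowercasing the prefix and nothing else (which is consistent with "U*" matching in the question).

```python
import re

def analyze(text):
    """Toy analyzer: drop hyphens, split on non-alphanumerics, lowercase."""
    filtered = text.replace("-", "")                 # char filter "-=>"
    tokens = re.findall(r"[A-Za-z0-9]+", filtered)   # tokenizer stand-in (assumption)
    return [token.lower() for token in tokens]

def prefix_search(query, doc_text, analyze_wildcard):
    """Return True if a query ending in * matches an indexed token of doc_text."""
    indexed_tokens = analyze(doc_text)
    prefix = query.rstrip("*")
    if analyze_wildcard:
        # the prefix goes through the analyzer, so the hyphen is removed;
        # joining the tokens is a simplification for this sketch
        prefix = "".join(analyze(prefix))
    else:
        # the prefix is used almost as typed: the hyphen stays in place
        prefix = prefix.lower()
    return any(token.startswith(prefix) for token in indexed_tokens)

print(prefix_search("u-1*", "U-12 Soccer", analyze_wildcard=False))  # False
print(prefix_search("u-1*", "U-12 Soccer", analyze_wildcard=True))   # True
```

Without analysis the prefix `u-1` is compared against the token `u12` and fails; with analysis the prefix becomes `u1`, which is a prefix of `u12`.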
Source: the Stack Overflow question "ElasticSearch - Searching with hyphens": https://stackoverflow.com/questions/30917043/