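I have been trying to build a search module for an application using ElasticSearch. Below is the index structure I put together from sample code I read in other StackOverflow posts.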
{
"megacorp4":{
"settings":{
"analysis":{
"analyzer":{
"my_analyzer":{
"type":"custom",
"tokenizer":"my_ngram_tokenizer",
"filter":[
"my_ngram_filter"
]
}
},
"filter":{
"my_ngram_filter":{
"type":"edgeNGram",
"min_gram":3,
"max_gram":15
}
},
"tokenizer":{
"my_ngram_tokenizer":{
"type":"edgeNGram",
"min_gram":3,
"max_gram":15
}
}
},
"mappings":{
"employee":{
"properties":{
"about":{
"type":"string",
"analyzer":"my_analyzer"
},
"age":{
"type":"long"
},
"first_name":{
"type":"string"
},
"interests":{
"type":"string",
"analyzer":"my_analyzer"
},
"last_name":{
"type":"string"
}
}
}
}
}
}
}
Below are the records I inserted to test the search functionality:
[
{
"first_name":"John",
"last_name":"Smith",
"age":25,
"about":"I love to go rock climbing",
"interests":[
"sports",
"music"
]
},
{
"first_name":"Douglas",
"last_name":"Fir",
"age":35,
"about":"I like to build album climb cabinets",
"interests":[
"forestry",
"music"
]
},
{
"first_name":"Jane",
"last_name":"Smith",
"age":32,
"about":"I like to collect rock albums",
"interests":[
"music"
]
}
]
I searched the 'about' field using both the API (via POSTMAN) and the Python client, as follows:
API query:
localhost:9200/megacorp4/_search?q=climb
Python query:
from elasticsearch import Elasticsearch
from pprint import pprint
es = Elasticsearch()
res = es.search(index="megacorp4", body={"query": {"match": {'about':"climb"}}})
pprint(res)
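I only get exact matches, and 'climbing' does not appear in the output. However, when I replace 'climb' with 'climb*' in the query, I get 2 records, one for 'climb' and one for 'climbing'. I do not want to use the '*' wildcard approach.
I have also tried the built-in 'english', 'standard', and 'ngram' analyzers, but nothing seems to work.
I need help searching for a keyword as a partial word within full text.
Thanks in advance.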
Best answer
Use this mapping instead:
DELETE /test
PUT /test
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"my_ngram_filter"
]
}
},
"filter": {
"my_ngram_filter": {
"type": "edgeNGram",
"min_gram": 3,
"max_gram": 15
}
}
}
},
"mappings": {
"employee": {
"properties": {
"about": {
"type": "string",
"analyzer": "my_analyzer"
},
"age": {
"type": "long"
},
"first_name": {
"type": "string"
},
"interests": {
"type": "string",
"analyzer": "my_analyzer"
},
"last_name": {
"type": "string"
}
}
}
}
}
POST /test/employee/_bulk
{"index":{}}
{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
{"index":{}}
{"first_name":"Douglas","last_name":"Fir","age":35,"about":"I like to build album climb cabinets","interests":["forestry","music"]}
{"index":{}}
{"first_name":"Jane","last_name":"Smith","age":32,"about":"I like to collect rock albums","interests":["music"]}
GET /test/_search?q=about:climb
GET /test/_search
{
"query": {
"query_string": {
"query": "about:climb"
}
}
}
GET /test/_search
{
"query": {
"match": {
"about": "climb"
}
}
}
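The index body in the question places mappings inside settings, while the body above keeps them as siblings. As a minimal sketch, a structural check like the following could flag that kind of nesting before an index is created. The helper name misplaced_top_level_keys and the two sample dictionaries are hypothetical and only mirror the shape of the two bodies; they are not part of ElasticSearch or its Python client.

```python
def misplaced_top_level_keys(index_body):
    """Return top-level index keys that were nested inside `settings` by mistake."""
    settings = index_body.get("settings", {})
    return [key for key in ("mappings", "aliases") if key in settings]


# Shape of the question's body: `mappings` sits inside `settings`.
nested = {"settings": {"analysis": {}, "mappings": {"employee": {}}}}

# Shape of the answer's body: `settings` and `mappings` are siblings.
siblings = {"settings": {"analysis": {}}, "mappings": {"employee": {}}}

print(misplaced_top_level_keys(nested))
print(misplaced_top_level_keys(siblings))
```

An empty list means nothing was found in the wrong place.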
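Two changes: the settings section needed another closing brace, so that mappings is a sibling of settings rather than nested inside it, and the tokenizer is now standard instead of the custom edgeNGram tokenizer.
For the ?q=climb part: by default this searches the _all field, which is analyzed with the standard analyzer rather than your custom analyzer. So the correct query is localhost:9200/megacorp4/_search?q=about:climb.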
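To illustrate why the tokenizer matters, here is a rough pure-Python approximation of the two analysis chains. It is a sketch, not ElasticSearch's implementation: the function names are made up for illustration, the regex split only stands in for the standard tokenizer, and the whole-string variant assumes an edge n-gram tokenizer that does not split on whitespace (which is how I understand the default configuration to behave).

```python
import re


def edge_ngrams(token, min_gram=3, max_gram=15):
    """Return the prefixes of `token` with length between min_gram and max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def simple_word_tokenize(text):
    """Rough stand-in for the standard tokenizer: split on non-word characters."""
    return re.findall(r"\w+", text)


def analyze_per_word(text, min_gram=3, max_gram=15):
    """Approximates a standard tokenizer followed by an edge n-gram filter."""
    terms = []
    for word in simple_word_tokenize(text):
        # Words shorter than min_gram produce no terms in this sketch.
        terms.extend(edge_ngrams(word, min_gram, max_gram))
    return terms


def analyze_whole_string(text, min_gram=3, max_gram=15):
    """Approximates an edge n-gram tokenizer applied to the unsplit input."""
    return edge_ngrams(text, min_gram, max_gram)


about = "I love to go rock climbing"
per_word = analyze_per_word(about)
whole = analyze_whole_string(about)

print(per_word)
print(whole)
print("climb" in per_word, "climb" in whole)
```

Under these assumptions, the per-word chain indexes 'climb' as a prefix of 'climbing', while the whole-string chain only produces prefixes of the entire sentence, so a search for 'climb' has nothing to match.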
Regarding "elasticsearch - partial word search - ElasticSearch 1.7.2", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33037451/