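I want to apply both a synonym filter and a stop word filter in a query. I created two analyzers for this, and each one works fine on its own. How can I use both at the same time?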
GET my_index/_search/
{
  "query": {
    "match": {
      "_all": {
        "query": "Good and Bad",
        "analyzer": [
          "stop_analyzer",
          "synonym"
        ]
      }
    }
  }
}
The query above throws an error:
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[match] unknown token [START_ARRAY] after [analyzer]",
        "line": 6,
        "col": 26
      }
    ],
    "type": "parsing_exception",
    "reason": "[match] unknown token [START_ARRAY] after [analyzer]",
    "line": 6,
    "col": 26
  },
  "status": 400
}
I assume I can't use an array or an object there, because a single analyzer such as
"analyzer": "stop_analyzer"
or "analyzer": "synonym"
works fine. So my question is: how do I use both at once?
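Best answer
You can define a custom analyzer that combines the two simple analyzers into a single, more complex one.
Defining a custom analyzer
Suppose you created the index as follows: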
PUT my_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "stopwordsSynonym": {
            "filter": [
              "lowercase",
              "my_synonym",
              "english_stop"
            ],
            "tokenizer": "standard"
          }
        },
        "filter": {
          "english_stop": {
            "type": "stop",
            "stopwords": "_english_"
          },
          "my_synonym": {
            "type": "synonym",
            "synonyms": [
              "nice => good",
              "poor => bad"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "my_text": {
          "type": "text",
          "analyzer": "stopwordsSynonym"
        }
      }
    }
  }
}
And added a document:
POST my_index/my_type
{
  "my_text": "People aren’t born good or bad. Maybe they’re born with tendencies either way, but it’s the way you live your life that matters."
}
By default, searches on my_text now use the stopwordsSynonym analyzer. The following query matches the document, because nice is a synonym of good:
GET my_index/_search
{
  "query": {
    "match": {
      "my_text": "nice and ugly"
    }
  }
}
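To make the filter chain easier to follow, here is a rough pure-Python sketch of what the stopwordsSynonym analyzer does to a string: tokenize, lowercase, map synonyms, then drop stop words. It is not Elasticsearch code. The regular expression only approximates the standard tokenizer, STOPWORDS is a small illustrative subset of the _english_ list, and the names analyze, SYNONYMS and STOPWORDS are made up for this example.

```python
import re

# Hypothetical stand-ins for the index settings above (not Elasticsearch APIs).
SYNONYMS = {"nice": "good", "poor": "bad"}  # mirrors "nice => good", "poor => bad"
# Assumption: a small illustrative subset of the _english_ stop word list.
STOPWORDS = {"a", "an", "and", "are", "but", "it", "or", "that", "the", "with"}


def analyze(text):
    """Approximate the chain: standard tokenizer -> lowercase -> my_synonym -> english_stop."""
    # Rough approximation of the standard tokenizer: runs of word characters,
    # keeping apostrophes inside a word (e.g. "aren’t" stays one token).
    tokens = re.findall(r"\w+(?:['’]\w+)*", text)
    tokens = [t.lower() for t in tokens]            # "lowercase" filter
    tokens = [SYNONYMS.get(t, t) for t in tokens]   # "my_synonym" filter
    return [t for t in tokens if t not in STOPWORDS]  # "english_stop" filter


doc_text = (
    "People aren’t born good or bad. Maybe they’re born with tendencies "
    "either way, but it’s the way you live your life that matters."
)

query_terms = analyze("nice and ugly")
doc_terms = set(analyze(doc_text))

# A match query uses OR between terms by default, so one shared term is enough.
matches = any(term in doc_terms for term in query_terms)
print(query_terms, matches)
```

The same analyzer runs on the document at index time and on the query at search time, which is why the query term nice ends up being compared as good.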
Testing the custom analyzer
You can also test the analyzer like this:
GET my_index/_analyze
{
  "analyzer": "stopwordsSynonym",
  "text": "nice or ugly"
}

{
  "tokens": [
    {
      "token": "good",
      "start_offset": 0,
      "end_offset": 4,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "ugly",
      "start_offset": 8,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
Compare this with the output of the standard analyzer:
GET my_index/_analyze
{
  "analyzer": "standard",
  "text": "nice or ugly"
}

{
  "tokens": [
    {
      "token": "nice",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "or",
      "start_offset": 5,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "ugly",
      "start_offset": 8,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
In effect, stopwordsSynonym replaced the nice token with good (its type is SYNONYM) and removed or from the token list, because it is a common English stop word.
Specifying the analyzer in the query
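To use a different analyzer for a particular query, you can use a query_string query. The parentheses scope every term to my_text; without them only the first term is tied to that field and the remaining terms are searched in the default field.
GET /_search
{
  "query": {
    "query_string": {
      "query": "my_text:(nice and poor)",
      "analyzer": "stopwordsSynonym"
    }
  }
}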
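Or a match_phrase query (my_standard_text is not part of the mapping shown above; it presumably stands for a field indexed with a different analyzer, such as standard):
GET my_index/_search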
{
  "query": {
    "match_phrase": {
      "my_standard_text": {
        "query": "nice and poor",
        "analyzer": "stopwordsSynonym"
      }
    }
  }
}
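In either case, the analyzer must be added to the index settings when the index is created (see the beginning of this answer). Also take a look at the search analyzer (the search_analyzer mapping parameter), which lets you use a different analyzer at search time than at index time.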