elasticsearch - How to handle multi-word synonyms

Tags: elasticsearch nest

I am trying to understand the results I get from Elasticsearch in two cases. I have defined the following synonym list:

"product insert, product inserts, qc package, qc package inserts, qc package insert, package insert => package inserts"

I want every term on the left of the arrow to be treated as the term on the right. Here are my index settings:
PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_syn_filt": {
            "tokenizer": "keyword",
            "type": "synonym",
            "synonyms": [
              "product insert, product inserts, package inserts, qc package, qc packages, qc insert, qc inserts, package insert, qc package insert, qc package inserts => package inserts"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_syn_filt"
            ],
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}

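My problem is that when I search for some terms ("product insert"), I do not get the expected results, but "product inserts" works fine. Is something wrong with my configuration? Did I miss a step?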

Best answer

I have tested your settings, and my guess is that you have not assigned the my_synonyms analyzer to your field.
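One way to check that guess is to look at the mapping of the index from the question. This is only a sketch: test_index is the index name used in the question, and the field you search on is not shown there, so you will need to find it in the response yourself.

```
GET test_index/_mapping
```

If the text field you query has no "analyzer" entry in the returned mapping, it is analyzed with the default standard analyzer, and the synonym filter is never applied to it.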

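Since I do not know how your mapping is defined, here is a working example.

Suppose your mapping and settings look like this (assigning my_synonyms to the data field is my guess at what is missing):

PUT /my_index
{
  "mappings": {
    "properties": {
      "data": {
        "type": "text",
        "analyzer": "my_synonyms",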
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_syn_filt": {
            "tokenizer": "keyword",
            "type": "synonym",
            "synonyms": [
              "product insert, product inserts, package inserts, qc package, qc packages, qc insert, qc inserts, package insert, qc package insert, qc package inserts => package inserts"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_syn_filt"
            ],
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}

Index some data:
POST my_index/_doc/1
{
  "data":"package inserts"
}

Query using one of the synonyms:
GET my_index/_search
{
  "query": {
      "match": {
        "data": "product insert"
      }
  }
}

Result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "data" : "package inserts"
        }
      }
    ]
  }
}

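If you do not assign the analyzer to your field, you will only get results when the search query contains one of the two words "package" or "inserts". In fact, without the analyzer you are running a simple match query that uses Elasticsearch's default standard analyzer.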
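To see the difference for yourself, you can run the same text through both analyzers with the _analyze API. This is a small sketch that assumes the my_index example above has been created as shown:

```
GET my_index/_analyze
{
  "analyzer": "my_synonyms",
  "text": "product insert"
}

GET my_index/_analyze
{
  "analyzer": "standard",
  "text": "product insert"
}
```

With the settings above, the first request should return a single token that the synonym filter has rewritten to "package inserts", because the keyword tokenizer keeps the whole input as one token. The second request returns two separate tokens, "product" and "insert", and neither of them matches a document containing only "package inserts".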

Hope this helps.

Regarding elasticsearch - How to handle multi-word synonyms, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58977982/
