java - Elasticsearch: exclude documents containing a specific term

Tags: java python elasticsearch lucene

I have indexed documents like the following in Elasticsearch.

{    
    "category": "clothing (f)",
    "description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
    "name": "Women's Unstoppable Graphic T-Shirt",
    "price": "$34.99"
}

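There are categories such as clothing (m), clothing (f), and so on. I am trying to exclude the clothing (m) category when the search is for women's items. The query I am trying is: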

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "description": "women's black shirt"
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "category": "clothing (m)"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 50
}

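But this does not work as expected. A few clothing (m) documents always show up among the other results. How can I exclude documents with a specific category?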

Best answer

To exclude a specific term (exact match), you have to use the keyword datatype.

Keyword datatypes are typically used for filtering (Find me all blog posts where status is published), for sorting, and for aggregations. Keyword fields are only searchable by their exact value.

Keyword Datatype

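Your current query still returns clothing (m) documents because, when you indexed your documents, the category values were analyzed with Elasticsearch's standard analyzer, which splits clothing (m) into the tokens clothing and m. A term query is not analyzed, so it looks for the exact term clothing (m), which does not exist in the index for that field; the must_not clause therefore excludes nothing.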

In your query, you searched category as a text datatype.

Text datatype fields are analyzed, that is they are passed through an analyzer to convert the string into a list of individual terms before being indexed.

Run this command:

POST my_index/_analyze
{
  "text": ["clothing (m)"]
}

Result:

{
  "tokens" : [
    {
      "token" : "clothing",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "m",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
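The tokenization above can be approximated in a few lines of Python. This is only a rough sketch: the function `approx_standard_analyze` is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, not the real standard analyzer (which follows Unicode text segmentation rules and, for example, keeps the apostrophe in "women's"). For the input clothing (m) it happens to produce the same tokens as the `_analyze` output.

```python
import re
from typing import List


def approx_standard_analyze(text: str) -> List[str]:
    """Rough approximation of the standard analyzer (assumption, not the real thing):
    lowercase the text and keep runs of ASCII letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Terms that end up in the index for the analyzed text field.
indexed_terms = approx_standard_analyze("clothing (m)")
print(indexed_terms)                    # ['clothing', 'm']

# A term query compares its value to the indexed terms without analyzing it,
# so the full string is never found among them.
print("clothing (m)" in indexed_terms)  # False
```

Because the whole string clothing (m) is never one of the indexed terms, a term query on the text field has nothing to match.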

A working example:

Assume your mapping looks like this:

{
 "my_index" : {
    "mappings" : {
      "properties" : {
        "category" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "description" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "price" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
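Given a mapping like the one above, the field path to use for exact matching can be read off the `properties` section. The helper below is a hypothetical illustration (its name and the trimmed-down `properties` dict are assumptions made for this sketch); it returns the field itself when it is mapped as keyword, or the first keyword-typed sub-field otherwise.

```python
from typing import Optional


def exact_match_field(properties: dict, field: str) -> Optional[str]:
    """Return a field path suitable for a term query, or None if the
    mapping defines no keyword-typed path for this field."""
    spec = properties.get(field)
    if spec is None:
        return None
    if spec.get("type") == "keyword":
        return field
    # Look for a multi-field of type keyword, e.g. category.keyword.
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"
    return None


# Trimmed-down illustration; "notes" is a made-up field without a keyword sub-field.
properties = {
    "category": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "notes": {"type": "text"},
}

print(exact_match_field(properties, "category"))  # category.keyword
print(exact_match_field(properties, "notes"))     # None
```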

Let's post some documents:

POST my_index/_doc/1
{
    "category": "clothing (f)",
    "description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
    "name": "Women's Unstoppable Graphic T-Shirt",
    "price": "$34.99"
}


POST my_index/_doc/2
{
    "category": "clothing (m)",
    "description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
    "name": "Women's Unstoppable Graphic T-Shirt",
    "price": "$34.99"
}

Now our query should look like this:

GET my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "description": "women's black shirt"
        }
      },
      "filter": {
        "bool": {
          "must_not": {
            "term": {
              "category.keyword": "clothing (m)"
            }
          }
        }
      }
    }
  },
  "from": 0,
  "size": 50
}
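If the request body is assembled in application code, a small builder keeps the keyword sub-field in one place. The sketch below is an assumption-laden illustration: `build_exclusion_query` and its parameters are hypothetical names, and it only produces the JSON body without sending it anywhere. It places `must_not` directly in the top-level `bool`, as in the question's original query, instead of nesting it under `filter`; since `must_not` clauses run in filter context and do not contribute to scoring, both forms should exclude the same documents.

```python
import json


def build_exclusion_query(description_text, excluded_category,
                          category_field="category.keyword", size=50):
    """Build a search body that matches on description and excludes one
    exact category value via a term query on a keyword field."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"description": description_text}}],
                "must_not": [{"term": {category_field: excluded_category}}],
            }
        },
        "from": 0,
        "size": size,
    }


body = build_exclusion_query("women's black shirt", "clothing (m)")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `GET my_index/_search` in the console or passed as the request body by whichever client you use.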

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.43301374,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43301374,
        "_source" : {
          "category" : "clothing (f)",
          "description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
          "name" : "Women's Unstoppable Graphic T-Shirt",
          "price" : "$34.99"
        }
      }
    ]
  }
}

Result without the keyword field (a term query on category instead of category.keyword):

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.43301374,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43301374,
        "_source" : {
          "category" : "clothing (f)",
          "description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
          "name" : "Women's Unstoppable Graphic T-Shirt",
          "price" : "$34.99"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.43301374,
        "_source" : {
          "category" : "clothing (m)",
          "description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
          "name" : "Women's Unstoppable Graphic T-Shirt",
          "price" : "$34.99"
        }
      }
    ]
  }
}

As you can see from the last result, we also got clothing (m). By the way, don't use term queries on text fields; use match instead.
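The difference between the three lookups can be modelled in memory. This is a toy sketch, not Elasticsearch: `approx_analyze` is a hypothetical approximation of the standard analyzer, and the three functions only mimic how a term query on a text field, a term query on a keyword field, and a match query (default OR operator) decide whether a category value matches.

```python
import re
from typing import List


def approx_analyze(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


categories = ["clothing (f)", "clothing (m)"]


def term_on_text(value: str) -> List[str]:
    # term query on a text field: the raw value must equal one indexed token.
    return [c for c in categories if value in approx_analyze(c)]


def term_on_keyword(value: str) -> List[str]:
    # term query on a keyword field: the raw value must equal the whole stored value.
    return [c for c in categories if value == c]


def match_on_text(value: str) -> List[str]:
    # match query on a text field: the query text is analyzed too,
    # and any shared token is enough (default OR operator).
    query_tokens = set(approx_analyze(value))
    return [c for c in categories if query_tokens & set(approx_analyze(c))]


print(term_on_text("clothing (m)"))     # []
print(term_on_keyword("clothing (m)"))  # ['clothing (m)']
print(match_on_text("clothing (m)"))    # ['clothing (f)', 'clothing (m)']
```

In this model the term query on the text field matches nothing, which is why the original must_not had no effect. The match query matches both categories through the shared token clothing, so it suits full-text search but not exact exclusion; the keyword field with a term query is the one that singles out clothing (m).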

Hope this helps.

For more on this topic, see the similar question on Stack Overflow: https://stackoverflow.com/questions/59015686/
