elasticsearch - Distinct records in an Elasticsearch match query

Tags: elasticsearch

I have indexed the following records in Elasticsearch:

POST /books/book/1
{
  "title" : "JavaScript: The Good Parts",
  "author" : "Douglas Crockford",
  "language" : "JavaScript",
  "publishYear" : 2009,
  "soldCopy" : "50"
}

POST /books/book/2
{
  "title" : "JavaScript: The Good Parts",
  "author" : "Douglas Crockford",
  "language" : "JavaScript",
  "publishYear" : 2009,
  "soldCopy" : "110"
}

POST /books/book/3
{
  "title" : "JavaScript: The Good Parts",
  "author" : "Douglas Crockford1",
  "language" : "JavaScript",
  "publishYear" : 2011,
  "soldCopy" : "2"
}

POST /books/book/4
{
  "title" : "JavaScript: The Good Parts",
  "author" : "Douglas Crockford2",
  "language" : "JavaScript",
  "publishYear" : 2012,
  "soldCopy" : "5"
}

I am using the Elasticsearch query below to get the distinct title and author for a given publishYear (2009). The output I expect from the query is:
JavaScript: The Good Parts Douglas Crockford

But the response contains 2 records with identical output:
JavaScript: The Good Parts      Douglas Crockford
JavaScript: The Good Parts      Douglas Crockford

The Elasticsearch query I used is:
{
  "query": {
    "match": {
      "publishYear": "2009"
    }
  }
}

In database terms, the equivalent SELECT query I am trying to reproduce is:
select distinct title,author from book where publishYear = '2009'
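To make the mismatch concrete, here is a small Python sketch using plain dicts in place of the indexed documents (fields trimmed to the ones used). It contrasts what the match query returns (one hit per matching document) with what SELECT DISTINCT returns (each title/author combination once). `select_distinct` is a hypothetical helper written for illustration, not an Elasticsearch API.

```python
# The four documents indexed in the question, trimmed to the fields used here.
books = [
    {"title": "JavaScript: The Good Parts", "author": "Douglas Crockford", "publishYear": 2009},
    {"title": "JavaScript: The Good Parts", "author": "Douglas Crockford", "publishYear": 2009},
    {"title": "JavaScript: The Good Parts", "author": "Douglas Crockford1", "publishYear": 2011},
    {"title": "JavaScript: The Good Parts", "author": "Douglas Crockford2", "publishYear": 2012},
]


def select_distinct(docs, fields, where):
    """Hypothetical helper mimicking SELECT DISTINCT <fields> ... WHERE <where>.

    Keeps the first occurrence of each tuple of field values, in input order.
    """
    seen = set()
    rows = []
    for doc in docs:
        if not where(doc):
            continue
        row = tuple(doc[f] for f in fields)
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows


def published_2009(doc):
    return doc["publishYear"] == 2009


# What the plain match query gives back: one hit per matching document.
hits = [(d["title"], d["author"]) for d in books if published_2009(d)]

# What the SQL query gives back: each (title, author) combination once.
distinct_rows = select_distinct(books, ("title", "author"), published_2009)

print(len(hits))      # 2
print(distinct_rows)  # [('JavaScript: The Good Parts', 'Douglas Crockford')]
```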

How can I get the same output from Elasticsearch as from the SQL query?
Thanks

Best answer

SQL's DISTINCT corresponds to a terms aggregation in Elasticsearch:

{
  "query": {
    "match": {
      "publishYear": "2009"
    }
  },
  "aggs": {
    "unique_author": {
      "terms": {
        "field": "author",
        "size": 10
      }
    },
    "unique_book": {
      "terms": {
        "field": "title",
        "size": 10
      }
    }
  },
  "size": 0
}
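As a rough model of what this request computes, the Python sketch below treats each field as a single un-analyzed value: a terms aggregation yields one bucket per distinct value with a doc_count, ordered by count descending (ties by key), and `"size": 0` means no individual hits are returned. `terms_agg` is a hypothetical simplification for illustration, not how Elasticsearch implements the aggregation.

```python
from collections import Counter

# The two documents matched by publishYear 2009 (only the aggregated fields kept).
matched = [
    {"title": "JavaScript: The Good Parts", "author": "Douglas Crockford"},
    {"title": "JavaScript: The Good Parts", "author": "Douglas Crockford"},
]


def terms_agg(docs, field, size=10):
    """Hypothetical, simplified model of a terms aggregation on an un-analyzed field.

    One bucket per distinct value, ordered by doc_count descending
    (ties broken by key ascending), keeping at most `size` buckets.
    """
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {"buckets": [{"key": k, "doc_count": c} for k, c in ordered[:size]]}


aggregations = {
    "unique_author": terms_agg(matched, "author"),
    "unique_book": terms_agg(matched, "title"),
}
print(aggregations["unique_author"]["buckets"])
# [{'key': 'Douglas Crockford', 'doc_count': 2}]
```

Note that the two aggregations in the request are siblings, so each lists the distinct values of its own field independently; they do not pair titles with authors the way `select distinct title, author` does.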

For this to work, you must set the title and author fields to not_analyzed, or alternatively use the keyword tokenizer together with the lowercase token filter. A better option is to map them as multi-fields.
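The reason the field setup matters: an analyzed field is split into tokens at index time, so a terms aggregation on it buckets individual words rather than whole values. The Python sketch below imitates the three options. `standard_like` is a crude stand-in for the standard analyzer (an assumption, not its exact tokenization rules), and all function names are hypothetical.

```python
import re
from collections import Counter


def standard_like(text):
    """Crude stand-in for an analyzed field: split on non-alphanumeric runs, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def not_analyzed(text):
    """not_analyzed: the whole value is indexed as one term."""
    return [text]


def keyword_lowercase(text):
    """keyword tokenizer + lowercase token filter: one term, lowercased."""
    return [text.lower()]


def term_buckets(values, analyzer):
    """Count how many documents contain each indexed term."""
    counts = Counter()
    for value in values:
        counts.update(set(analyzer(value)))  # a document counts once per term
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


authors = ["Douglas Crockford", "Douglas Crockford"]  # the two 2009 hits
print(term_buckets(authors, standard_like))      # [('crockford', 2), ('douglas', 2)]
print(term_buckets(authors, not_analyzed))       # [('Douglas Crockford', 2)]
print(term_buckets(authors, keyword_lowercase))  # [('douglas crockford', 2)]
```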

You can create the index like this:
PUT books
{
  "mappings": {
    "book":{
      "properties": {
        "title":{
          "type": "string",
          "fields": {
            "raw":{
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "author":{
          "type": "string",
          "fields": {
            "raw":{
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "language":{
          "type": "string"
        },
        "publishYear":{
          "type": "integer"
        },
        "soldCopy":{
          "type": "string"
        }
      }
    }
  }
}

Then use the .raw sub-fields (title.raw and author.raw) in the aggregations.
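If you need the (title, author) pairs that `select distinct title, author` returns, rather than two independent lists, one option is to nest the author aggregation inside the title aggregation. The Python sketch below builds such a request body and flattens the nested buckets of a response into tuples. This nesting is a variation on the answer's sibling aggregations, not part of it. The aggregation names are hypothetical, the `.raw` field names assume the multi-field mapping above (written in pre-5.x `string`/`not_analyzed` syntax; on 5.x and later the equivalent is a `text` field with a `keyword` sub-field), and `sample_response` is hand-written to match the usual response shape, not captured output.

```python
import json


def distinct_pairs_request(year, size=10):
    """Search body approximating
    SELECT DISTINCT title, author FROM book WHERE publishYear = <year>.

    Assumes title.raw / author.raw are not_analyzed sub-fields.
    """
    return {
        "size": 0,  # only aggregations are needed, no hits
        "query": {"match": {"publishYear": year}},
        "aggs": {
            "unique_title": {
                "terms": {"field": "title.raw", "size": size},
                "aggs": {
                    "unique_author": {
                        "terms": {"field": "author.raw", "size": size}
                    }
                },
            }
        },
    }


def pairs_from_response(response):
    """Flatten nested title -> author buckets into (title, author) tuples."""
    pairs = []
    for title_bucket in response["aggregations"]["unique_title"]["buckets"]:
        for author_bucket in title_bucket["unique_author"]["buckets"]:
            pairs.append((title_bucket["key"], author_bucket["key"]))
    return pairs


# Hand-written example shaped like a search response (illustrative only).
sample_response = {
    "aggregations": {
        "unique_title": {
            "buckets": [
                {
                    "key": "JavaScript: The Good Parts",
                    "doc_count": 2,
                    "unique_author": {
                        "buckets": [{"key": "Douglas Crockford", "doc_count": 2}]
                    },
                }
            ]
        }
    }
}

print(json.dumps(distinct_pairs_request(2009), indent=2))
print(pairs_from_response(sample_response))
# [('JavaScript: The Good Parts', 'Douglas Crockford')]
```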

Source: a similar question on Stack Overflow, "elasticsearch - Distinct records in an Elasticsearch match query": https://stackoverflow.com/questions/40371341/
