json - Elasticsearch query with nested sets

Tags: json elasticsearch nested set elasticsearch-query

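I'm new to Elasticsearch, so please bear with me, and let me know if I need to provide more information. I have inherited a project and need to implement new search functionality. The document/mapping structure is already in place, but it can be changed if the search cannot be implemented conveniently with it. I am using Elasticsearch version 5.6.16.

A company can offer a number of services. Each service offering is grouped into a set. Each set is composed of 3 categories:

  • Products (ID 1)
  • Processes (ID 3)
  • Materials (ID 4)

The document structure is as follows: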
    [{
      "id": 4485,
      "name": "Company A",
      // ...
      "services": {
        "595": {
          "1": [
            95, 97, 91
          ],
          "3": [
            475, 476, 471
          ],
          "4": [
            644, 645, 683
          ]
        },
        "596": {
          "1": [
            91, 89, 76
          ],
          "3": [
            476, 476, 301
          ],
          "4": [
            644, 647, 555
          ]
        },
        "597": {
          "1": [
            92, 93, 89
          ],
          "3": [
            473, 472, 576
          ],
          "4": [
            641, 645, 454
          ]
        }
      }
    }]
    

    In the example above, 595, 596 and 597 are the IDs of the sets. 1, 3 and 4 are the category IDs (as listed above).

    The mapping looks like this:
    [{
      "id": {
        "type": "long"
      },
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "services": {
        "properties": {
          // ...
          "595": {
            "properties": {
              "1": {"type": "long"},
              "3": {"type": "long"},
              "4": {"type": "long"}
            }
          },
          "596": {
            "properties": {
              "1": {"type": "long"},
              "3": {"type": "long"},
              "4": {"type": "long"}
            }
          },
          // ...
        }
      }
    }]
    

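    When searching for companies that offer products (ID 1), a search for 91 and 95 returns Company A, because both IDs are in the same set. A search for 95 and 76, however, does not return Company A: the company does offer both products, but they are not in the same set. The same rules apply when searching processes and materials, or any combination of the three.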
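
The matching rule described here can be written down in plain Python, independent of Elasticsearch. This is only a sketch of the intended semantics; the function name matching_sets and the shape of the wanted argument are made up for illustration, and the data is Company A's services field from the example.

```python
from typing import Dict, Iterable, List

# Shape of the "services" field from the question:
# set ID -> category ID -> list of offered IDs.
Services = Dict[str, Dict[str, List[int]]]


def matching_sets(services: Services, wanted: Dict[str, Iterable[int]]) -> List[str]:
    """Return the set IDs that contain every wanted ID in every wanted category."""
    matches = []
    for set_id, categories in services.items():
        if all(set(ids) <= set(categories.get(cat, [])) for cat, ids in wanted.items()):
            matches.append(set_id)
    return matches


company_a = {
    "595": {"1": [95, 97, 91], "3": [475, 476, 471], "4": [644, 645, 683]},
    "596": {"1": [91, 89, 76], "3": [476, 476, 301], "4": [644, 647, 555]},
    "597": {"1": [92, 93, 89], "3": [473, 472, 576], "4": [641, 645, 454]},
}

print(matching_sets(company_a, {"1": [91, 95]}))  # ['595']
print(matching_sets(company_a, {"1": [95, 76]}))  # []
```

A company is a hit when the returned list is non-empty, so 91 and 95 match Company A through set 595, while 95 and 76 do not match at all.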

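    I would like to confirm whether the current document/mapping structure supports this kind of search.
  • If so, given 3 arrays of IDs (products, processes and materials), what JSON finds all companies that offer those services within the same set?
  • If not, how should the document/mapping be changed to allow this search?

    Thanks for your help.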

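    Best answer

    Having the IDs themselves appear as field names is a bad idea, because it can lead to a very large number of inverted indexes being created (remember that in Elasticsearch an inverted index is created for every field), and I don't think it is reasonable to have something like that.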
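
To make the field-count concern concrete, here is a rough Python sketch that lists the field paths a single document adds to the mapping when set IDs and category IDs are used as field names. The helper name mapped_field_paths is invented for this illustration. Elasticsearch also caps the number of fields per index (the index.mapping.total_fields.limit setting, 1000 by default), which this layout would approach as new set IDs appear.

```python
from typing import Dict, List


def mapped_field_paths(services: Dict[str, Dict[str, List[int]]]) -> List[str]:
    """List the field paths one document contributes when IDs are used as field names."""
    return sorted(
        f"services.{set_id}.{category_id}"
        for set_id, categories in services.items()
        for category_id in categories
    )


company_a = {
    "595": {"1": [95, 97, 91], "3": [475, 476, 471], "4": [644, 645, 683]},
    "596": {"1": [91, 89, 76], "3": [476, 476, 301], "4": [644, 647, 555]},
    "597": {"1": [92, 93, 89], "3": [473, 472, 576], "4": [641, 645, 454]},
}

paths = mapped_field_paths(company_a)
print(len(paths))  # 9
print(paths[0])    # services.595.1
```

Every new set ID adds three more fields, so the mapping grows with the data rather than staying fixed.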

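    Instead, change your data model to something like the below. I have also provided a sample document, a possible query you can apply, and how the response would appear.

    Note that, for the sake of simplicity, I'm focusing only on the services field from your mapping.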

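    Mapping (note the nested type on services; the fields are keyword here, but feel free to use text with a keyword sub-field if you plan to implement partial as well as exact search). This is the typeless mapping syntax of Elasticsearch 7.x; on 5.6.16, the version in the question, the properties block has to sit under a mapping type name.

    PUT my_services_index
    {
      "mappings": {
        "properties": {
          "services":{
            "type": "nested",
            "properties": {
              "service_key":{
                "type": "keyword"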
              },
              "product_key": {
                "type": "keyword"
              },
              "product_values": {
                "type": "keyword"
              },
              "process_key":{
                "type": "keyword"
              },
              "process_values":{
                "type": "keyword"
              },
              "material_key":{
                "type": "keyword"
              },
              "material_values":{
                "type": "keyword"
              }
            }
          }
        }
      }
    }
    

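    Note that I have used the nested datatype. I suggest reading the Elasticsearch documentation on it to understand why it is needed here instead of the plain object type.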
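
The reason nested matters can be imitated in a few lines of Python. With a plain object field, Elasticsearch flattens an array of objects, so the values of each sub-field are pooled across all entries and the link between values of one entry is lost. The sketch below only mimics that behaviour; flatten_object_array is a made-up helper, not an Elasticsearch API.

```python
from typing import Dict, List, Set


def flatten_object_array(entries: List[Dict[str, List[str]]]) -> Dict[str, Set[str]]:
    """Pool the values of each sub-field across all array entries, as a plain object field does."""
    pooled: Dict[str, Set[str]] = {}
    for entry in entries:
        for field, values in entry.items():
            pooled.setdefault(field, set()).update(values)
    return pooled


services = [
    {"service_key": ["595"], "product_values": ["95", "97", "91"]},
    {"service_key": ["596"], "product_values": ["91", "89", "76"]},
]

pooled = flatten_object_array(services)

# Pooled values: 95 and 76 both "match", although they come from different sets.
object_style_match = {"95", "76"} <= pooled["product_values"]

# Per-entry check, which is what nested gives you: no single set holds both.
nested_style_match = any({"95", "76"} <= set(e["product_values"]) for e in services)

print(object_style_match, nested_style_match)  # True False
```

The false positive in the first check is exactly the 95 and 76 case the question wants to exclude.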

    Sample document:
    POST my_services_index/_doc/1
    {
      "services":[
      {
        "service_key": "595",
        "process_key": "1",
        "process_values": ["95", "97", "91"],
        "product_key": "3",
        "product_values": ["475", "476", "471"],
        "material_key": "4",
        "material_values": ["644", "645", "643"]
      },
      {
        "service_key": "596",
        "process_key": "1",
        "process_values": ["91", "89", "75"],
        "product_key": "3",
        "product_values": ["476", "476", "301"],
        "material_key": "4",
        "material_values": ["644", "647", "555"]
      }
        ]
    }
    

    Note how the data can now be managed even if it ends up with multiple combinations of product_key, process_key and material_key.

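    The way to interpret the above document is that there are two nested documents inside one document of my_services_index. Note that the sample labels category 1 as process and category 3 as product, the reverse of the question's list (1 = product, 3 = process); the sample query uses the same labels as the sample document.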
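
If the existing documents need to be reindexed into this shape, the conversion could look roughly like the following Python sketch. It assumes the question's category IDs (1 = product, 3 = process, 4 = material) and the field names from the mapping above; to_nested_services and CATEGORY_PREFIX are hypothetical names.

```python
from typing import Dict, List

# Category IDs as listed in the question; the prefixes follow the answer's field names.
CATEGORY_PREFIX = {"1": "product", "3": "process", "4": "material"}


def to_nested_services(services: Dict[str, Dict[str, List[int]]]) -> List[dict]:
    """Turn {set_id: {category_id: [ids]}} into a list of nested service entries."""
    entries = []
    for set_id, categories in services.items():
        entry = {"service_key": set_id}
        for category_id, ids in categories.items():
            prefix = CATEGORY_PREFIX[category_id]
            entry[f"{prefix}_key"] = category_id
            # The mapping declares keyword fields, so the IDs are stored as strings.
            entry[f"{prefix}_values"] = [str(i) for i in ids]
        entries.append(entry)
    return entries


old_services = {"595": {"1": [95, 97, 91], "3": [475, 476, 471], "4": [644, 645, 683]}}
nested_services = to_nested_services(old_services)
print(nested_services[0]["service_key"])     # 595
print(nested_services[0]["product_values"])  # ['95', '97', '91']
```

The resulting list is what would go under the services key of the new document.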

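    Sample query (the nested clause restricts matching to a single services entry, and inner_hits reports which entry matched):
    POST my_services_index/_search
    {
      "_source": "services.service_key",
      "query": {
        "bool": {
          "must": [
            {
              "nested": {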
                "path": "services",
                "query": {
                  "bool": {
                    "must": [
                      {
                        "term": {
                          "services.service_key": "595"
                        }
                      },
                      {
                        "term": {
                          "services.process_key": "1"
                        }
                      },
                      {
                        "term": {
                          "services.process_values": "95"
                        }
                      }
                    ]
                  }
                },
                "inner_hits": {}
              }
            }
          ]
        }
      }
    }
    

    Note that I have used a Nested Query.
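
For the case asked about in the question (three arrays of IDs that must all be offered within the same set), the request body could be assembled along these lines. This is a sketch under the assumption that the mapping above is in use; same_set_query is a made-up helper. It emits one term clause per ID inside a single nested query, because a single terms clause would match if any one of the IDs is present rather than all of them.

```python
import json
from typing import Dict, Iterable


def same_set_query(
    products: Iterable = (), processes: Iterable = (), materials: Iterable = ()
) -> Dict:
    """Build a search body that requires all given IDs inside one nested services entry."""
    clauses = []
    for field, ids in (
        ("product_values", products),
        ("process_values", processes),
        ("material_values", materials),
    ):
        # One term clause per ID: every clause must hold for the same nested entry.
        clauses += [{"term": {f"services.{field}": str(i)}} for i in ids]
    return {
        "query": {
            "nested": {
                "path": "services",
                "query": {"bool": {"filter": clauses}},
                "inner_hits": {},
            }
        }
    }


body = same_set_query(products=["91", "95"], materials=[644])
print(json.dumps(body, indent=2))
```

The printed JSON is meant to be sent as the body of POST my_services_index/_search. With no IDs at all the filter list is empty and every document matches, so a caller would probably want to guard against that case.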

    Response (the top-level hits array returns the whole original document, while inner_hits shows which nested document matched):
    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 1.828546,
        "hits" : [
          {
            "_index" : "my_services_index",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.828546,
            "_source" : {
              "services" : [
                {
                  "service_key" : "595",
                  "process_key" : "1",
                  "process_values" : [
                    "95",
                    "97",
                    "91"
                  ],
                  "product_key" : "3",
                  "product_values" : [
                    "475",
                    "476",
                    "471"
                  ],
                  "material_key" : "4",
                  "material_values" : [
                    "644",
                    "645",
                    "643"
                  ]
                },
                {
                  "service_key" : "596",
                  "process_key" : "1",
                  "process_values" : [
                    "91",
                    "89",
                    "75"
                  ],
                  "product_key" : "3",
                  "product_values" : [
                    "476",
                    "476",
                    "301"
                  ],
                  "material_key" : "4",
                  "material_values" : [
                    "644",
                    "647",
                    "555"
                  ]
                }
              ]
            },
            "inner_hits" : {
              "services" : {
                "hits" : {
                  "total" : {
                    "value" : 1,
                    "relation" : "eq"
                  },
                  "max_score" : 1.828546,
                  "hits" : [
                    {
                      "_index" : "my_services_index",
                      "_type" : "_doc",
                      "_id" : "1",
                      "_nested" : {
                        "field" : "services",
                        "offset" : 0
                      },
                      "_score" : 1.828546,
                      "_source" : {
                        "service_key" : "595",
                        "process_key" : "1",
                        "process_values" : [
                          "95",
                          "97",
                          "91"
                        ],
                        "product_key" : "3",
                        "product_values" : [
                          "475",
                          "476",
                          "471"
                        ],
                        "material_key" : "4",
                        "material_values" : [
                          "644",
                          "645",
                          "643"
                        ]
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
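
To find out which set satisfied the search, the inner_hits section can be read out of the parsed response. A minimal Python sketch, assuming the response has already been decoded into a dict with the layout shown above; matched_service_keys is a hypothetical helper and the sample dict is a trimmed copy of that response.

```python
from typing import Dict, List


def matched_service_keys(response: Dict) -> Dict[str, List[str]]:
    """Map each hit's _id to the service_key values of its matching nested entries."""
    result = {}
    for hit in response["hits"]["hits"]:
        inner = (
            hit.get("inner_hits", {})
            .get("services", {})
            .get("hits", {})
            .get("hits", [])
        )
        result[hit["_id"]] = [h["_source"]["service_key"] for h in inner]
    return result


sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "services": {
                        "hits": {
                            "hits": [
                                {
                                    "_nested": {"field": "services", "offset": 0},
                                    "_source": {"service_key": "595"},
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(matched_service_keys(sample_response))  # {'1': ['595']}
```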
    

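    Note that I have used the keyword datatype. Feel free to pick whatever datatype suits the business requirements for each field.

    The idea I've provided is meant to help you understand the document model.

    Hope this helps!

    Regarding json - Elasticsearch query with nested sets, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60913318/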
