lucene - Querying for missing fields in nested documents

I have a user document that contains many tags.
Here is the mapping:

{
  "user" : {
    "properties" : {
      "tags" : {
        "type" : "nested",
        "properties" : {
          "id" : {
            "type" : "string",
            "index" : "not_analyzed",
            "store" : "yes"
          },
          "current" : {
            "type" : "boolean"
          },
          "type" : {
            "type" : "string"
          },
          "value" : {
            "type" : "multi_field",
            "fields" : {
              "value" : {
                "type" : "string",
                "analyzer" : "name_analyzer"
              },
              "value_untouched" : {
                "type" : "string",
                "index" : "not_analyzed",
                "include_in_all" : false
              }
            }
          }
        }
      }
    }
  }
}

Here are some sample user documents:
User 1:

{
  "created_at": 1317484762000,
  "updated_at": 1367040856000,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
    },
    {
      "type": "company",
      "value": "alma connect",
      "id": "58ad4afcc8415216ea451339aaecf311ed40e132"
    },
    {
      "type": "company",
      "value": "Google",
      "id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5",
      "current": true
    },
    {
      "type": "discipline",
      "value": "B.Tech.",
      "id": "a7706af7f1477cbb1ac0ceb0e8531de8da4ef1eb",
      "institute_id": "4fb424a5addf32296f00013a"
    }
  ]
}

User 2:

{
  "created_at": 1318513355000,
  "updated_at": 1364888695000,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
    },
    {
      "type": "college",
      "value": "Bharatiya Vidya Bhavan's Public School, Jubilee hills, Hyderabad",
      "id": "d20730345465a974dc61f2132eb72b04e2f5330c"
    },
    {
      "type": "company",
      "value": "Alma Connect",
      "id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5"
    },
    {
      "type": "sector",
      "value": "Website and Software Development",
      "id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a4"
    }    
  ]
}

User 3:

{
  "created_at": 1318513355001,
  "updated_at": 1364888695010,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361821"
    },
    {
      "type": "sector",
      "value": "Website and Software Development",
      "id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a1"
    }    
  ]
}

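Searching over the ES documents above, I want to build a query that returns users who have a given company tag in their nested tags documents, or users who have no company tag at all. What would my search query be?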

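For example, in the case above, if I search for the google tag, the returned documents should be "user 1" and "user 3" (user 1 has the company tag google, and user 3 has no company tag). User 2 is not returned because it has a company tag, but not a Google one.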
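To pin down the required semantics before writing any Elasticsearch DSL, here is a minimal plain-Python sketch (not Elasticsearch code) that applies the rule to the three sample users. The helper name matches and the trimmed document dicts are illustrative assumptions: only each tag's type and value are kept, and the value comparison is case-insensitive to loosely mimic an analyzed field.

```python
# Trimmed copies of the sample users: only each tag's "type" and "value" are kept.
USERS = {
    "user1": [
        {"type": "college", "value": "Dhirubhai Ambani Institute of Information and Communication Technology"},
        {"type": "company", "value": "alma connect"},
        {"type": "company", "value": "Google"},
        {"type": "discipline", "value": "B.Tech."},
    ],
    "user2": [
        {"type": "college", "value": "Dhirubhai Ambani Institute of Information and Communication Technology"},
        {"type": "college", "value": "Bharatiya Vidya Bhavan's Public School, Jubilee hills, Hyderabad"},
        {"type": "company", "value": "Alma Connect"},
        {"type": "sector", "value": "Website and Software Development"},
    ],
    "user3": [
        {"type": "college", "value": "Dhirubhai Ambani Institute of Information and Communication Technology"},
        {"type": "sector", "value": "Website and Software Development"},
    ],
}


def matches(tags, company):
    """True if the user has a company tag equal to `company`, or no company tag at all."""
    companies = [t["value"].lower() for t in tags if t["type"] == "company"]
    return company.lower() in companies or not companies


result = sorted(name for name, tags in USERS.items() if matches(tags, "google"))
print(result)  # ['user1', 'user3']
```

User 1 has other company tags besides Google and still matches; what excludes user 2 is having company tags without any Google one.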

Best Answer

Not trivial at all, mainly because of the "no type:company tag" clause. Here's what I came up with:

{
  "or" : {
    "filters" : [ {
      "nested" : {
        "filter" : {
          "and" : {
            "filters" : [ {
              "term" : {
                "tags.value" : "google"
              }
            }, {
              "term" : {
                "tags.type" : "company"
              }
            } ]
          }
        },
        "path" : "tags"
      }
    }, {
      "not" : {
        "filter" : {
          "nested" : {
            "filter" : {
              "term" : {
                "tags.type" : "company"
              }
            },
            "path" : "tags"
          }
        }
      }
    } ]
  }
}

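It is an or filter with two clauses: the first, a nested filter, finds documents that have a tag with both tags.type:company and tags.value:google; the second, a not filter wrapping a nested filter, finds all documents that have no tag with tags.type:company.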
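To check that this or/nested/not structure really selects users 1 and 3, here is a small sketch of an interpreter for just the filter types used above, run against trimmed copies of the sample documents. It is not how Elasticsearch evaluates filters internally; the function names are hypothetical, and term matching on analyzed fields is approximated by lowercasing and splitting on whitespace (an assumption about name_analyzer and the default analyzer).

```python
# The answer's filter, written as a Python dict.
QUERY = {
    "or": {
        "filters": [
            {"nested": {
                "path": "tags",
                "filter": {"and": {"filters": [
                    {"term": {"tags.value": "google"}},
                    {"term": {"tags.type": "company"}},
                ]}},
            }},
            {"not": {"filter": {"nested": {
                "path": "tags",
                "filter": {"term": {"tags.type": "company"}},
            }}}},
        ]
    }
}

# Trimmed sample users (only "type" and "value" per tag).
USERS = {
    "user1": {"tags": [
        {"type": "college", "value": "Dhirubhai Ambani Institute of Information and Communication Technology"},
        {"type": "company", "value": "alma connect"},
        {"type": "company", "value": "Google"},
        {"type": "discipline", "value": "B.Tech."},
    ]},
    "user2": {"tags": [
        {"type": "college", "value": "Dhirubhai Ambani Institute of Information and Communication Technology"},
        {"type": "company", "value": "Alma Connect"},
        {"type": "sector", "value": "Website and Software Development"},
    ]},
    "user3": {"tags": [
        {"type": "college", "value": "Dhirubhai Ambani Institute of Information and Communication Technology"},
        {"type": "sector", "value": "Website and Software Development"},
    ]},
}


def term_matches(field_value, term):
    # Rough stand-in for an analyzed string field: lowercase, split, match one token.
    return term.lower() in str(field_value).lower().split()


def evaluate(f, obj, path=""):
    """Evaluate filter `f` against `obj` (a user doc, or one nested tag when `path` is set)."""
    (kind, body), = f.items()
    if kind == "or":
        return any(evaluate(sub, obj, path) for sub in body["filters"])
    if kind == "and":
        return all(evaluate(sub, obj, path) for sub in body["filters"])
    if kind == "not":
        return not evaluate(body["filter"], obj, path)
    if kind == "nested":
        # Matches the parent if at least one nested object matches the inner filter.
        children = obj.get(body["path"], [])
        return any(evaluate(body["filter"], child, body["path"]) for child in children)
    if kind == "term":
        (field, value), = body.items()
        key = field[len(path) + 1:] if path else field  # "tags.type" -> "type"
        return key in obj and term_matches(obj[key], value)
    raise ValueError(f"unsupported filter: {kind}")


results = sorted(name for name, doc in USERS.items() if evaluate(QUERY, doc))
print(results)  # ['user1', 'user3']
```

The key point is that the not filter sits outside the nested filter. Putting it inside (a nested filter on "some tag is not a company tag") would match almost every user, because nearly everyone has at least one non-company tag.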

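This still needs optimizing, though: and/or/not filters don't take advantage of the filter cache the way bitset-based filters such as the term filter do. It would be worth spending some time finding a way to get the same result with a bool filter. Have a look at this article for more.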
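Following that suggestion, here is one possible bool-filter formulation, built as a Python dict and printed as JSON. Treat it as a sketch under assumptions: the helper name company_or_no_company_filter is hypothetical, it targets the same older filter DSL as the answer, and it has not been run against a cluster. It relies on a bool filter with only should clauses requiring at least one of them to match, with must replacing the and filter and must_not replacing the not filter.

```python
import json


def company_or_no_company_filter(company):
    """Hypothetical helper: bool-filter version of the answer's or/and/not filter."""
    has_company = {"term": {"tags.type": "company"}}
    return {
        "bool": {
            "should": [
                # Clause 1: some nested tag is a company tag with the requested value.
                {"nested": {
                    "path": "tags",
                    "filter": {"bool": {"must": [
                        {"term": {"tags.value": company}},
                        has_company,
                    ]}},
                }},
                # Clause 2: no nested tag is a company tag at all.
                {"bool": {"must_not": [
                    {"nested": {"path": "tags", "filter": has_company}},
                ]}},
            ]
        }
    }


print(json.dumps(company_or_no_company_filter("google"), indent=2))
```

As in the original, the must_not wraps the whole nested filter rather than sitting inside it, so it excludes users who have any company tag.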

A similar question about querying for missing fields in nested documents is on Stack Overflow: https://stackoverflow.com/questions/16399464/

