lucene - Query for missing fields in nested documents

Tags: lucene, elasticsearch

I have a user document that contains many tags.
Here is the mapping:

{
  "user" : {
    "properties" : {
      "tags" : {
        "type" : "nested",
        "properties" : {
          "id" : {
            "type" : "string",
            "index" : "not_analyzed",
            "store" : "yes"
          },
          "current" : {
            "type" : "boolean"
          },
          "type" : {
            "type" : "string"
          },
          "value" : {
            "type" : "multi_field",
            "fields" : {
              "value" : {
                "type" : "string",
                "analyzer" : "name_analyzer"
              },
              "value_untouched" : {
                "type" : "string",
                "index" : "not_analyzed",
                "include_in_all" : false
              }
            }
          }
        }
      }
    }
  }
}

Here are some sample user documents:
User 1:
{
  "created_at": 1317484762000,
  "updated_at": 1367040856000,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
    },
    {
      "type": "company",
      "value": "alma connect",
      "id": "58ad4afcc8415216ea451339aaecf311ed40e132"
    },
    {
      "type": "company",
      "value": "Google",
      "id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5",
      "current": true
    },
    {
      "type": "discipline",
      "value": "B.Tech.",
      "id": "a7706af7f1477cbb1ac0ceb0e8531de8da4ef1eb",
      "institute_id": "4fb424a5addf32296f00013a"
    }
  ]
}

User 2:
{
  "created_at": 1318513355000,
  "updated_at": 1364888695000,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
    },
    {
      "type": "college",
      "value": "Bharatiya Vidya Bhavan's Public School, Jubilee hills, Hyderabad",
      "id": "d20730345465a974dc61f2132eb72b04e2f5330c"
    },
    {
      "type": "company",
      "value": "Alma Connect",
      "id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5"
    },
    {
      "type": "sector",
      "value": "Website and Software Development",
      "id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a4"
    }    
  ]
}

User 3:
{
  "created_at": 1318513355001,
  "updated_at": 1364888695010,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361821"
    },
    {
      "type": "sector",
      "value": "Website and Software Development",
      "id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a1"
    }    
  ]
}

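Using the ES documents above, I want to build a query that returns users who have a specific company tag in their nested tags, or users who have no company tag at all. What would my search query look like?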

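For example, with the documents above, a search for the Google tag should return "User 1" and "User 3": User 1 has the company tag Google, and User 3 has no company tag at all. User 2 should not be returned, because it does have a company tag, but not a Google one.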
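
To pin down the intended selection rule before writing any Elasticsearch query, here is a minimal plain-Python sketch that applies it to the three sample users. It is not Elasticsearch code: the dictionaries keep only the "type" and "value" fields of each tag, the case-insensitive comparison is an assumption standing in for whatever name_analyzer does, and the helper name wanted is hypothetical.

```python
# Plain-Python model of the selection rule (no Elasticsearch involved).
# Only the "type" and "value" fields of each nested tag are kept.
USERS = {
    "User 1": [
        {"type": "college", "value": "Dhirubhai Ambani Institute of Information and Communication Technology"},
        {"type": "company", "value": "alma connect"},
        {"type": "company", "value": "Google"},
        {"type": "discipline", "value": "B.Tech."},
    ],
    "User 2": [
        {"type": "college", "value": "Dhirubhai Ambani Institute of Information and Communication Technology"},
        {"type": "college", "value": "Bharatiya Vidya Bhavan's Public School, Jubilee hills, Hyderabad"},
        {"type": "company", "value": "Alma Connect"},
        {"type": "sector", "value": "Website and Software Development"},
    ],
    "User 3": [
        {"type": "college", "value": "Dhirubhai Ambani Institute of Information and Communication Technology"},
        {"type": "sector", "value": "Website and Software Development"},
    ],
}


def wanted(tags, company):
    """Return True if the user has `company` as a company tag, or no company tag at all.

    Lowercasing both sides is an assumption standing in for the analyzer.
    """
    companies = [t["value"].lower() for t in tags if t["type"] == "company"]
    return company.lower() in companies or not companies


selected = [name for name, tags in USERS.items() if wanted(tags, "google")]
print(selected)  # ['User 1', 'User 3']
```

Note that User 1 also has a non-Google company tag ("alma connect") and is still selected: the rule only asks whether the wanted company is among the user's company tags.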

Best answer

Not trivial at all, mainly because of the "no tag with type:company" clause. Here is what I came up with:

{
  "or" : {
    "filters" : [ {
      "nested" : {
        "filter" : {
          "and" : {
            "filters" : [ {
              "term" : {
                "tags.value" : "google"
              }
            }, {
              "term" : {
                "tags.type" : "company"
              }
            } ]
          }
        },
        "path" : "tags"
      }
    }, {
      "not" : {
        "filter" : {
          "nested" : {
            "filter" : {
              "term" : {
                "tags.type" : "company"
              }
            },
            "path" : "tags"
          }
        }
      }
    } ]
  }
}

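It is an or filter with two clauses, both built on nested filters: the first finds documents that have a tag with tags.type:company and tags.value:google, while the second finds all documents that have no tag with tags.type:company.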
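
To see why this filter returns exactly User 1 and User 3, the sketch below is a tiny in-memory evaluator for the subset of the filter DSL the answer uses (or, and, not, nested, term), run against the sample users. It is an illustration, not how Elasticsearch executes filters: filter_matches is a hypothetical helper, and matching a term against the lowercased whole field value is an assumption standing in for name_analyzer (a term filter is not analyzed, so the lowercase "google" only matches if the indexed token is lowercase; a real analyzer would also split values into tokens).

```python
# Tiny evaluator for the filter DSL subset used in the answer
# (or / and / not / nested / term), run against the sample users in memory.
QUERY = {
    "or": {
        "filters": [
            {"nested": {
                "path": "tags",
                "filter": {"and": {"filters": [
                    {"term": {"tags.value": "google"}},
                    {"term": {"tags.type": "company"}},
                ]}},
            }},
            {"not": {"filter": {"nested": {
                "path": "tags",
                "filter": {"term": {"tags.type": "company"}},
            }}}},
        ]
    }
}

USERS = {
    "User 1": {"tags": [
        {"type": "college", "value": "Dhirubhai Ambani Institute of Information and Communication Technology"},
        {"type": "company", "value": "alma connect"},
        {"type": "company", "value": "Google", "current": True},
        {"type": "discipline", "value": "B.Tech."},
    ]},
    "User 2": {"tags": [
        {"type": "college", "value": "Bharatiya Vidya Bhavan's Public School, Jubilee hills, Hyderabad"},
        {"type": "company", "value": "Alma Connect"},
        {"type": "sector", "value": "Website and Software Development"},
    ]},
    "User 3": {"tags": [
        {"type": "college", "value": "Dhirubhai Ambani Institute of Information and Communication Technology"},
        {"type": "sector", "value": "Website and Software Development"},
    ]},
}


def filter_matches(flt, doc, tag=None):
    """Evaluate `flt` against a root document; `tag` is the nested object in scope."""
    kind, body = next(iter(flt.items()))
    if kind == "or":
        return any(filter_matches(f, doc, tag) for f in body["filters"])
    if kind == "and":
        return all(filter_matches(f, doc, tag) for f in body["filters"])
    if kind == "not":
        return not filter_matches(body["filter"], doc, tag)
    if kind == "nested":
        # The root doc matches if at least ONE nested object under `path`
        # satisfies the whole inner filter on its own.
        return any(filter_matches(body["filter"], doc, t) for t in doc[body["path"]])
    if kind == "term":
        field, term = next(iter(body.items()))
        if tag is None:
            raise ValueError("this sketch only supports terms inside a nested filter")
        value = tag.get(field.split(".", 1)[1])  # "tags.value" -> "value"
        # Assumption: lowercased whole value stands in for the analyzed token.
        return value is not None and str(value).lower() == term
    raise ValueError(f"unsupported filter: {kind}")


print([name for name, doc in USERS.items() if filter_matches(QUERY, doc)])
# ['User 1', 'User 3']
```

Because nested evaluates the inner and against one tag at a time, a user whose only "Google" value sat on, say, a college tag would fail the first clause even if it had some other company tag. With a plain object mapping Elasticsearch flattens the array, so the two terms could be satisfied by different tags.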

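It still needs some optimization, though: the and/or/not filters don't use the filter cache, which works with bitsets, the way the term filter does. It would be better to spend some time finding a way to get the same result with a bool filter. Have a look at this article to learn more.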
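
One possible bool-based shape of the same filter is sketched below, built as a Python dict so it can be generated and printed without a cluster: should takes the place of or, must takes the place of the inner and, and a bool with only must_not takes the place of not. This is an untested sketch in the same filter DSL the answer uses; company_tag_filter is a hypothetical helper, and whether it actually performs better depends on the Elasticsearch version.

```python
import json


def company_tag_filter(company):
    """Bool-filter rewrite of the answer's or/and/not filter (untested sketch)."""
    # Clause 1: a single nested tag is a company tag with this value.
    has_this_company = {
        "nested": {
            "path": "tags",
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"tags.value": company}},
                        {"term": {"tags.type": "company"}},
                    ]
                }
            },
        }
    }
    # Clause 2: no nested tag has type "company".
    has_no_company = {
        "bool": {
            "must_not": [
                {"nested": {"path": "tags", "filter": {"term": {"tags.type": "company"}}}}
            ]
        }
    }
    # With only should clauses, at least one of them has to match.
    return {"bool": {"should": [has_this_company, has_no_company]}}


print(json.dumps(company_tag_filter("google"), indent=2))
```

Later Elasticsearch releases removed the standalone and/or/not filters; there the same structure is written as a bool query whose clauses are nested queries, which take query instead of filter.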

A similar question about lucene - query for missing fields in nested documents can be found on Stack Overflow: https://stackoverflow.com/questions/16399464/
