python - How to find the distinct values in an array across all indices using elasticsearch-dsl?

I am using elasticsearch-dsl in Django. I have defined a DocType document and a Keyword field that holds a list of values.

Here is my code.

from elasticsearch_dsl import DocType, Text, Keyword

class ProductIndex(DocType):
    """
    Index for products
    """
    id = Keyword()
    slug = Keyword()
    name = Text()
    filter_list = Keyword()

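Here filter_list is an array that holds multiple values. Now I have some values, say sample_filter_list, which are distinct, and some of them may be present in the filter_list array of some products. So, given this sample_filter_list, I want all the unique elements of filter_list across all products whose filter_list has a non-empty intersection with sample_filter_list.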

For example, I have 5 products whose filter_list values are:
1) ['a', 'b', 'c']
2) ['d', 'e', 'f']
3) ['g', 'h', 'i']
4) ['j', 'k', 'l']
5) ['m', 'n', 'o']
and if my sample_filter_list is ['a', 'd', 'g', 'j', 'm'],
then Elasticsearch should return an array containing the distinct elements,
i.e. ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o']
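
To make the requirement concrete, here is a minimal plain-Python sketch of the result being asked for, with no Elasticsearch involved. The function name distinct_filters is hypothetical and is used only for illustration; the data is the example above.

```python
def distinct_filters(products, sample_filter_list):
    """Return the sorted distinct values of every filter_list that
    shares at least one element with sample_filter_list."""
    sample = set(sample_filter_list)
    distinct = set()
    for filter_list in products:
        # Keep the product only if its filter_list intersects the sample.
        if sample.intersection(filter_list):
            distinct.update(filter_list)
    return sorted(distinct)


products = [
    ['a', 'b', 'c'],
    ['d', 'e', 'f'],
    ['g', 'h', 'i'],
    ['j', 'k', 'l'],
    ['m', 'n', 'o'],
]
result = distinct_filters(products, ['a', 'd', 'g', 'j', 'm'])
print(result)
```

A product whose filter_list shares nothing with the sample contributes no values to the result.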

Best answer

This answer is not specific to Django but general.
Suppose you have an Elasticsearch index some_index2 with the following mapping:

            PUT some_index2
            {
              "mappings": {
                "some_type": {
                  "dynamic_templates": [
                    {
                      "strings": {
                        "mapping": {
                          "type": "string"
                        },
                        "match_mapping_type": "string"
                      }
                    }
                  ],
                  "properties": {
                    "field1": {
                      "type": "string"
                    },
                    "field2": {
                      "type": "string"
                    }
                  }
                }
              }
            }

You have also indexed the following documents:
        {
            "field1":"id1",
            "field2":["a","b","c","d"]
        }
        {
            "field1":"id2",
            "field2":["e","f","g"]
        }
        {
            "field1":"id3",
            "field2":["e","l","k"]
        }

As you stated, you want all the distinct values of field2 (filter_list in your case). You can get them with an Elasticsearch terms aggregation:

    GET some_index2/_search
    {
      "aggs": {
        "some_name": {
          "terms": {
            "field": "field2",
            "size": 10000
          }
        }
      },
      "size": 0
    }
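
For use from Python, the same request body can be assembled as a plain dict. The sketch below is an illustration rather than part of the original answer: the helper name build_distinct_values_query is hypothetical, and the optional terms query that restricts the aggregation to documents matching sample_filter_list is an addition the answer does not include. It assumes the field is indexed so that exact values can be matched, as with a Keyword field.

```python
import json


def build_distinct_values_query(field, sample_values=None, size=10000):
    """Build a search body that aggregates the distinct values of `field`.

    If `sample_values` is given, only documents whose `field` contains at
    least one of those values are aggregated (assumed extension).
    """
    body = {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {
            "some_name": {"terms": {"field": field, "size": size}},
        },
    }
    if sample_values:
        # A terms query matches documents containing any of the listed values.
        body["query"] = {
            "bool": {"filter": [{"terms": {field: list(sample_values)}}]}
        }
    return body


body = build_distinct_values_query("filter_list", ["a", "d", "g", "j", "m"])
print(json.dumps(body, indent=2))
```

How the body is sent depends on the client in use; with elasticsearch-dsl, passing it to Search.from_dict is one likely option, but treat that as an assumption to check against the installed version.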

The search request above returns a result like:

    {
      "took": 2,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 3,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "some_name": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "e",
              "doc_count": 2
            },
            {
              "key": "a",
              "doc_count": 1
            },
            {
              "key": "b",
              "doc_count": 1
            },
            {
              "key": "c",
              "doc_count": 1
            },
            {
              "key": "d",
              "doc_count": 1
            },
            {
              "key": "f",
              "doc_count": 1
            },
            {
              "key": "g",
              "doc_count": 1
            },
            {
              "key": "k",
              "doc_count": 1
            },
            {
              "key": "l",
              "doc_count": 1
            }
          ]
        }
      }
    }

Here buckets contains the list of all the distinct values.
You can iterate through the buckets and read each value from its key field.
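
Iterating over the buckets could look like the following sketch. It assumes the response is available as a plain dict shaped like the result above (abridged here to three buckets); the helper name distinct_values is hypothetical.

```python
def distinct_values(response, agg_name="some_name"):
    """Collect the key of every bucket in a terms aggregation result."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Abridged copy of the response shown above.
response = {
    "aggregations": {
        "some_name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "e", "doc_count": 2},
                {"key": "a", "doc_count": 1},
                {"key": "b", "doc_count": 1},
            ],
        }
    }
}
values = distinct_values(response)
print(values)
```

The keys come back in bucket order, which is by document count rather than alphabetical, so sort the list if a stable order matters. If the response comes from elasticsearch-dsl rather than the low-level client, calling to_dict() on it should give a dict of this shape, though that is worth verifying for the version in use.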

Hope this is what you need.

Regarding "python - How to find the distinct values in an array across all indices using elasticsearch-dsl?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51056111/
