dataframe - How do I query Elasticsearch on certain fields with certain conditions?

Tags: dataframe apache-spark elasticsearch

I have Product data with several fields (_id, Shop, ProductVersion ...). It is already indexed in Elasticsearch. I want to find, for each shop, the products with the highest ProductVersion.

For example:

Shop Amazon has 3 crawled product versions: 111, 222, 333.
Shop Ebay has 2 versions: 222, 444.
Shop Alibaba has 2 versions: 111, 444.

The same version can appear in more than one shop.

Now I want to get the products that match:
Shop Amazon and ProductVersion 333
or Shop Ebay and ProductVersion 444
or Shop Alibaba and ProductVersion 444.

But I don't know how to write this query.
Please help!

Best answer

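I tried this with some sample documents. I kept the version field as a numeric field, so that sorting on it compares numbers rather than strings.

These are the sample documents I tried: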

[
  {
    "_index": "test",
    "_type": "doc",
    "_id": "12334",
    "_score": 1,
    "_source": {
      "shopName": "amazon",
      "version": 341
    }
  },
  {
    "_index": "test",
    "_type": "doc",
    "_id": "123",
    "_score": 1,
    "_source": {
      "shopName": "amazon",
      "version": 3412
    }
  },
  {
    "_index": "test",
    "_type": "doc",
    "_id": "1233",
    "_score": 1,
    "_source": {
      "shopName": "amazon",
      "version": 341
    }
  },
  {
    "_index": "test",
    "_type": "doc",
    "_id": "1238",
    "_score": 1,
    "_source": {
      "shopName": "alibaba",
      "version": 34120
    }
  },
  {
    "_index": "test",
    "_type": "doc",
    "_id": "1239",
    "_score": 1,
    "_source": {
      "shopName": "alibaba",
      "version": 3414
    }
  },
  {
    "_index": "test",
    "_type": "doc",
    "_id": "123910",
    "_score": 1,
    "_source": {
      "shopName": "alibaba",
      "version": 124
    }
  }
]
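
The choice of a numeric version field matters for the sort used later in this answer. As a small illustration (not part of the original answer), the sketch below takes the version values from the sample documents and compares the numeric maximum with the maximum obtained when the same values are compared as strings, which is roughly what would happen if the field were stored as text and sorted lexicographically. The variable names are made up for the example.

```python
# Version values taken from the six sample documents above.
versions = [341, 3412, 341, 34120, 3414, 124]

# Numeric comparison: the largest number wins.
numeric_max = max(versions)

# String comparison: values are compared character by character,
# so "3414" ranks above "34120" because '4' > '2' at the fourth character.
string_max = max(str(v) for v in versions)

print(numeric_max)  # 34120
print(string_max)   # 3414
```

With a string comparison the alibaba shop would appear to have 3414 as its latest version instead of 34120.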

As @demas suggested, I went with a terms aggregation combined with a top_hits aggregation:
indexName/_search

{
  "size": 0,
  "aggs": {
    "shop": {
      "terms": {
        "field": "shopName.keyword"
      },
      "aggs": {
        "product": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "version": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
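
If the query is issued from a script, the same body can be built as a plain dictionary. The following is a minimal sketch, not part of the original answer: the function name `build_latest_per_shop_query` and its parameters are hypothetical, and the field names default to the ones used above. One thing worth knowing is that a terms aggregation returns only the top 10 buckets by default, so the sketch exposes the bucket count as `max_shops` for indices with more than 10 shops.

```python
import json


def build_latest_per_shop_query(shop_field="shopName.keyword",
                                version_field="version",
                                max_shops=10):
    """Return a search body with one bucket per shop and, inside each
    bucket, the single document with the highest version."""
    return {
        "size": 0,  # no top-level hits, only aggregation results
        "aggs": {
            "shop": {
                # One bucket per distinct shop name (at most max_shops buckets).
                "terms": {"field": shop_field, "size": max_shops},
                "aggs": {
                    "product": {
                        # Keep only the document that sorts first by version, descending.
                        "top_hits": {
                            "size": 1,
                            "sort": [{version_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent to indexName/_search.
    print(json.dumps(build_latest_per_shop_query(), indent=2))
```

The resulting dictionary can be passed as the request body to whatever HTTP or Elasticsearch client is in use. Note that `top_hits` with `size` 1 returns one document per shop, so if several documents share the highest version only one of them comes back.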

This gives you, for each shop, the document with the highest product version, as shown below.
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "shop": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "alibaba",
          "doc_count": 3,
          "product": {
            "hits": {
              "total": 3,
              "max_score": null,
              "hits": [
                {
                  "_index": "test",
                  "_type": "doc",
                  "_id": "1238",
                  "_score": null,
                  "_source": {
                    "shopName": "alibaba",
                    "version": 34120
                  },
                  "sort": [
                    34120
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "amazon",
          "doc_count": 3,
          "product": {
            "hits": {
              "total": 3,
              "max_score": null,
              "hits": [
                {
                  "_index": "test",
                  "_type": "doc",
                  "_id": "123",
                  "_score": null,
                  "_source": {
                    "shopName": "amazon",
                    "version": 3412
                  },
                  "sort": [
                    3412
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
} 
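
To use the result in code, the top document of each shop has to be pulled out of the nested `buckets`. The sketch below is one possible way to do that, assuming the response has already been parsed into a Python dictionary with the shape shown above; the helper name `top_product_per_shop` is hypothetical.

```python
def top_product_per_shop(response):
    """Map each shop name to the _source of its highest-version document.

    Expects the aggregation names used in the query above:
    "shop" (terms) and "product" (top_hits).
    """
    result = {}
    for bucket in response["aggregations"]["shop"]["buckets"]:
        hits = bucket["product"]["hits"]["hits"]
        if hits:  # top_hits was requested with size 1, so at most one hit
            result[bucket["key"]] = hits[0]["_source"]
    return result


if __name__ == "__main__":
    # Trimmed copy of the response above, keeping only the fields the helper reads.
    sample = {
        "aggregations": {
            "shop": {
                "buckets": [
                    {"key": "alibaba", "product": {"hits": {"hits": [
                        {"_id": "1238", "_source": {"shopName": "alibaba", "version": 34120}}
                    ]}}},
                    {"key": "amazon", "product": {"hits": {"hits": [
                        {"_id": "123", "_source": {"shopName": "amazon", "version": 3412}}
                    ]}}},
                ]
            }
        }
    }
    for shop, source in top_product_per_shop(sample).items():
        print(shop, source["version"])
```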

Source: this question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/57743819/
