elasticsearch - Filter duplicate similar URLs from Elasticsearch

I'm working on ES 6.4.2. How can I filter the following results?

Title: Some TITLE
Description: A Storm application is designed as a "topology" in the shape of a directed acyclic graph (DAG) with spouts and bolts acting as the graph vertices. Edges on the graph are named streams and direct data from one node to another. Together, the topology acts as a data transformation pipeline. At a superficial level the general topology structure is similar to a MapReduce job, with the main difference being that data is processed in real time as opposed to in individual batches. Additionally, Storm topologies run indefinitely until killed, while a MapReduce job DAG must eventually end.
url: https://www.someurl.com


Title: Some TITLE 
Description: A Storm application is designed as a "topology" in the shape of a directed acyclic graph (DAG) with spouts and bolts acting as the graph vertices. Edges on the graph are named streams and direct data from one node to another. Together, the topology acts as a data transformation pipeline. At a superficial level the general topology structure is similar to a MapReduce job, with the main difference being that data is processed in real time as opposed to in individual batches. Additionally, Storm topologies run indefinitely until killed, while a MapReduce job DAG must eventually end.
url: http://www.someurl.com

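How can I get only one of these records? The title and description are identical. The only difference is the URL: one uses http and the other uses https. Is there any solution?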

Best answer

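There are probably several ways to solve this. The simplest one I can think of is to filter the results with a match_phrase query. In your case, the two query terms are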

http: and https:

Note that I deliberately put a colon after http so that the query does not also match https.

Here are your queries:

GET your_index_name/_search
{
  "query": {
    "match_phrase": {
      "url": "http:"
    }
  }
}

GET your_index_name/_search
{
  "query": {
    "match_phrase": {
      "url": "https:"
    }
  }
}
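The queries above return either the http copies or the https copies. If you want one record per pair regardless of scheme, a different approach, done client-side and not part of the answer above, is to normalize the URL before comparing hits.

The sketch below rests on a few assumptions:
- Each hit has a `_source` with fields named `title`, `description` and `url`. The real field names in your index may differ.
- `url_key`, `is_https` and `dedupe_hits` are hypothetical helpers.
- Keeping the https copy is an arbitrary preference.

```python
from urllib.parse import urlsplit


def url_key(url):
    """Hypothetical normalization: ignore the scheme, lowercase the host and
    drop a trailing slash, so http://x and https://x produce the same key."""
    parts = urlsplit(url.strip())
    key = parts.netloc.lower() + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def is_https(hit):
    """True if the hit's (assumed) `url` field uses the https scheme."""
    return urlsplit(hit["_source"]["url"].strip()).scheme.lower() == "https"


def dedupe_hits(hits):
    """Keep one hit per (title, description, scheme-less URL).

    Assumes Elasticsearch-style hits whose `_source` dict contains
    `title`, `description` and `url`. Titles and descriptions are stripped
    so a trailing space does not create a false difference. When both an
    http and an https copy exist, the https one is kept (arbitrary choice).
    First-seen order of the keys is preserved.
    """
    best = {}
    for hit in hits:
        src = hit["_source"]
        key = (src["title"].strip(), src["description"].strip(), url_key(src["url"]))
        if key not in best or (is_https(hit) and not is_https(best[key])):
            best[key] = hit
    return list(best.values())
```

You would pass it the `hits.hits` array from a search response. It only deduplicates the hits you actually fetched and does not change what is stored in the index.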

A similar question about "elasticsearch - Filter duplicate similar URLs from Elasticsearch" can be found on Stack Overflow: https://stackoverflow.com/questions/53685141/
