elasticsearch - Filtering duplicate, near-identical URLs out of Elasticsearch results

Tags: elasticsearch

I am working on ES 6.4.2. How can I filter the following results?

Title: Some TITLE
Description:A Storm application is designed as a "topology" in the shape of a directed acyclic graph (DAG) with spouts and bolts acting as the graph vertices. Edges on the graph are named streams and direct data from one node to another. Together, the topology acts as a data transformation pipeline. At a superficial level the general topology structure is similar to a MapReduce job, with the main difference being that data is processed in real time as opposed to in individual batches. Additionally, Storm topologies run indefinitely until killed, while a MapReduce job DAG must eventually end.
url: https://www.someurl.com


Title: Some TITLE 
Description:A Storm application is designed as a "topology" in the shape of a directed acyclic graph (DAG) with spouts and bolts acting as the graph vertices. Edges on the graph are named streams and direct data from one node to another. Together, the topology acts as a data transformation pipeline. At a superficial level the general topology structure is similar to a MapReduce job, with the main difference being that data is processed in real time as opposed to in individual batches. Additionally, Storm topologies run indefinitely until killed, while a MapReduce job DAG must eventually end.
url: http://www.someurl.com

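How can I get just one of these records? The title and content are identical, and the URLs differ only in http versus https, which is what needs to be filtered. Any solutions?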

Best answer

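There may be several solutions to this. The simplest one I can think of is to filter the results with a match_phrase query. In your case, the two query terms are

http: and https:

Note that I deliberately put the colon after http so that the https phrase does not match.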

Here are your queries:
GET yourIndexName/_search
{
  "query": {
    "match_phrase": {
      "url": "http:"
    }
  }
}

GET yourIndexName/_search
{
  "query": {
    "match_phrase": {
      "url": "https:"
    }
  }
}
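
If the goal is a single record per page regardless of scheme, rather than one query per scheme, another option is to deduplicate the hits on the client after the search returns. The following is only a minimal sketch under stated assumptions: the field names `title` and `description` are guesses based on the sample records (only `url` appears in the queries above), and `scheme_insensitive_key` and `dedupe_hits` are hypothetical helper names, not part of any Elasticsearch client.

```python
from urllib.parse import urlsplit


def scheme_insensitive_key(url):
    """Return the URL without its scheme, so http:// and https:// variants compare equal."""
    parts = urlsplit(url)
    # Keep host, path and query; drop the scheme, lower-case the host,
    # and ignore a trailing slash on the path.
    key = parts.netloc.lower() + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def dedupe_hits(hits):
    """Keep the first hit for each (title, description, scheme-less URL) combination.

    `hits` is assumed to be the list found under response["hits"]["hits"],
    where every hit carries its document under "_source".
    The field names "title" and "description" are assumptions.
    """
    seen = set()
    unique = []
    for hit in hits:
        source = hit["_source"]
        key = (
            source.get("title", "").strip(),
            source.get("description", "").strip(),
            scheme_insensitive_key(source["url"]),
        )
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique
```

With the two sample records above, both URLs reduce to the same key, so only the first hit is kept. Which of the two survives depends on the order in which the hits are returned.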

Source: a similar question on Stack Overflow about filtering duplicate, near-identical URLs out of Elasticsearch: https://stackoverflow.com/questions/53685141/
