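elasticsearch - Querying Elasticsearch with OR and wildcards

Tags: elasticsearch, wildcard

I'm trying to run a simple query against my elasticsearch _type and match multiple fields with wildcards. My first attempt looked like this: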

POST my_index/my_type/_search
{
  "sort" : { "date_field" : {"order" : "desc"}},
  "query" : {
    "filtered" : {
      "filter" : {
        "or" : [
          {
              "term" : { "field1" : "4848" }
          },
          {
              "term" : { "field2" : "6867" }
          }
        ]
      }
    }
  }
}

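This example successfully matches every record where field1 or field2 is exactly equal to 4848 or 6867, respectively.

What I want is to match any text containing 4848 in field1, or any text containing 6867 in field2, but I'm not sure how to go about it.

I'd appreciate any help I can get :)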

Best answer

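It sounds like your problem is mostly about analysis. The appropriate solution depends on the structure of your data and what you want to match. I'll provide a couple of examples.

First, let's assume your data is such that we can get what we want using just the standard analyzer. This analyzer tokenizes text fields on whitespace, punctuation, and symbols. So the text "1234-5678-90" is broken into the terms "1234", "5678", and "90", and a "term" query or filter for any of those terms will match that document. More concretely: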

DELETE /test_index

PUT /test_index
{
   "settings": {
      "number_of_shards": 1
   },
   "mappings": {
       "doc": {
           "properties": {
               "field1":{
                   "type": "string",
                   "analyzer": "standard"
               },
               "field2":{
                   "type": "string",
                   "analyzer": "standard"
               }
           }
       }
   }
}

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"field1": "1212-2323-4848","field2": "1234-5678-90"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"field1": "0000-0000-0000","field2": "0987-6543-21"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"field1": "1111-2222-3333","field2": "6867-4545-90"}

POST test_index/_search
{
   "query": {
      "filtered": {
         "filter": {
            "or": [
               {
                  "term": { "field1": "4848" }
               },
               {
                  "term": { "field2": "6867" }
               }
            ]
         }
      }
   }
}
...
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 1,
            "_source": {
               "field1": "1212-2323-4848",
               "field2": "1234-5678-90"
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": 1,
            "_source": {
               "field1": "1111-2222-3333",
               "field2": "6867-4545-90"
            }
         }
      ]
   }
}

(Explicitly writing "analyzer": "standard" is redundant, since that is the analyzer used by default if you don't specify one; I just wanted to make it obvious.)
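To see why the unchanged term filter matches here, a rough Python sketch of the tokenization may help. It is only an approximation: the real standard analyzer follows Unicode text segmentation rules, whereas the hypothetical `approx_standard_tokens` helper below simply lowercases the text and keeps runs of letters and digits. The field values are the ones from the bulk request above.

```python
import re


def approx_standard_tokens(text):
    """Crude stand-in for the standard analyzer (an assumption, not the real
    algorithm): lowercase, then keep runs of letters/digits as tokens."""
    return re.findall(r"[^\W_]+", text.lower())


def term_filter(doc, field, term):
    """A term filter looks the term up as-is among the indexed tokens."""
    return term in approx_standard_tokens(doc[field])


# Field values copied from the bulk request above.
docs = {
    1: {"field1": "1212-2323-4848", "field2": "1234-5678-90"},
    2: {"field1": "0000-0000-0000", "field2": "0987-6543-21"},
    3: {"field1": "1111-2222-3333", "field2": "6867-4545-90"},
}

# Emulate the "or" filter: field1 has term 4848 OR field2 has term 6867.
matching_ids = [
    doc_id
    for doc_id, doc in docs.items()
    if term_filter(doc, "field1", "4848") or term_filter(doc, "field2", "6867")
]

print(approx_standard_tokens("1234-5678-90"))  # ['1234', '5678', '90']
print(matching_ids)  # [1, 3]
```

Documents 1 and 3 come back, the same two hits as in the search response above, because "4848" and "6867" each exist as a whole token in the index.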

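On the other hand, if the text is embedded in such a way that standard analysis doesn't give you what you want, say "121223234848" where you want to match "4848", you will have to do something a little more sophisticated using ngrams. Here is an example (note the difference in the data):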
DELETE /test_index

PUT /test_index
{
   "settings": {
      "analysis": {
         "filter": {
            "nGram_filter": {
               "type": "nGram",
               "min_gram": 2,
               "max_gram": 20,
               "token_chars": [
                  "letter",
                  "digit",
                  "punctuation",
                  "symbol"
               ]
            }
         },
         "analyzer": {
            "nGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "nGram_filter"
               ]
            },
            "whitespace_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   },
   "mappings": {
      "doc": {
          "properties": {
               "field1":{
                   "type": "string",
                   "index_analyzer": "nGram_analyzer", 
                   "search_analyzer": "whitespace_analyzer"
               },
               "field2":{
                   "type": "string",
                   "index_analyzer": "nGram_analyzer", 
                   "search_analyzer": "whitespace_analyzer"
               }
           }
      }
   }
}

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"field1": "121223234848","field2": "1234567890"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"field1": "000000000000","field2": "0987654321"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"field1": "111122223333","field2": "6867454590"}


POST test_index/_search
{
   "query": {
      "filtered": {
         "filter": {
            "or": [
               {
                  "term": { "field1": "4848" }
               },
               {
                  "term": { "field2": "6867" }
               }
            ]
         }
      }
   }
}
...
{
   "took": 8,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 1,
            "_source": {
               "field1": "121223234848",
               "field2": "1234567890"
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": 1,
            "_source": {
               "field1": "111122223333",
               "field2": "6867454590"
            }
         }
      ]
   }
}

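There's a lot going on here, so I won't try to explain it in this post. If you want more explanation, I'd encourage you to read this blog post: http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams . Hopefully you'll forgive the shameless plug. ;)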
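As a rough intuition only, the index-time analysis in the second example can be imitated in a few lines of Python. This is a sketch under stated assumptions, not Elasticsearch's implementation: the hypothetical `ngrams` and `index_terms` helpers split on whitespace, lowercase, and emit every substring between `min_gram` and `max_gram` characters long, and the asciifolding step is left out because the sample data is plain digits.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """All substrings of `token` whose length lies in [min_gram, max_gram]."""
    grams = set()
    longest = min(max_gram, len(token))
    for size in range(min_gram, longest + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def index_terms(text):
    """Approximate index-time terms: whitespace split, lowercase, then ngrams.
    (asciifolding is skipped here; that is a simplification.)"""
    terms = set()
    for token in text.lower().split():
        terms |= ngrams(token)
    return terms


# Field values copied from the second bulk request above.
docs = {
    1: {"field1": "121223234848", "field2": "1234567890"},
    2: {"field1": "000000000000", "field2": "0987654321"},
    3: {"field1": "111122223333", "field2": "6867454590"},
}

# Emulate the "or" of two term filters against the ngram-indexed fields.
matching_ids = [
    doc_id
    for doc_id, doc in docs.items()
    if "4848" in index_terms(doc["field1"]) or "6867" in index_terms(doc["field2"])
]

print(matching_ids)  # [1, 3]
```

Because every substring of each value is stored as its own term, an exact term lookup for "4848" behaves like a "contains" match. The trade-off is a larger index, since each value produces many terms.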

Hope that helps.

This question, "Querying Elasticsearch with OR and wildcards", comes from Stack Overflow: https://stackoverflow.com/questions/28842106/
