Why doesn't my Elasticsearch query return text analyzed by the english analyzer?

Tags: elasticsearch

I have an index named test_blocks:

{
  "test_blocks" : {
    "aliases" : { },
    "mappings" : {
      "block" : {
        "dynamic" : "false",
        "properties" : {
          "content" : {
            "type" : "string",
            "fields" : {
              "content_en" : {
                "type" : "string",
                "analyzer" : "english"
              }
            }
          },
          "id" : {
            "type" : "long"
          },
          "title" : {
            "type" : "string",
            "fields" : {
              "title_en" : {
                "type" : "string",
                "analyzer" : "english"
              }
            }
          },
          "user_id" : {
            "type" : "long"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1438642440687",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "1070099"
        },
        "uuid" : "45vkIigXSCyvHN6g-w5kkg"
      }
    },
    "warmers" : { }
  }
}

When I search for killing (a word in the content), the results come back as expected.
http://localhost:9200/test_blocks/_search?q=killing&pretty=1


{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.07431685,
    "hits" : [ {
      "_index" : "test_blocks",
      "_type" : "block",
      "_id" : "218",
      "_score" : 0.07431685,
      "_source":{"block":{"id":218,"title":"The \u003ci\u003eparticle\u003c/i\u003e streak","content":"Barry Allen is a Central City police forensic scientist\n                        with a reasonably happy life, despite the childhood\n                        trauma of a mysterious red and yellow being killing his\n                        mother and framing his father. All that changes when a\n                        massive \u003cb\u003eparticle\u003c/b\u003e accelerator accident leads to Barry\n                        being struck by lightning in his lab.","user_id":82}}
    }, {
      "_index" : "test_blocks",
      "_type" : "block",
      "_id" : "219",
      "_score" : 0.07431685,
      "_source":{"block":{"id":219,"title":"The \u003ci\u003eparticle\u003c/i\u003e streak","content":"Barry Allen is a Central City police forensic scientist\n                        with a reasonably happy life, despite the childhood\n                        trauma of a mysterious red and yellow being killing his\n                        mother and framing his father. All that changes when a\n                        massive \u003cb\u003eparticle\u003c/b\u003e accelerator accident leads to Barry\n                        being struck by lightning in his lab.","user_id":83}}
    } ]
  }
}

However, since I have an english analyzer on the content field (content_en), I expected the query kill to return the same documents. It does not; I get 0 hits.
http://localhost:9200/test_blocks/_search?q=kill&pretty=1

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

From this analyze request, my understanding is that "killing" gets reduced to "kill":
http://localhost:9200/_analyze?analyzer=english&text=killing

{
  "tokens" : [ {
    "token" : "kill",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}

So why doesn't the query "kill" match this document? Is my mapping wrong, or is my search wrong?

I am using Elasticsearch v1.7.0.

Best answer

You can try a fuzzy search (an introduction is available here):

curl -XPOST 'http://localhost:9200/test_blocks/_search' -d '
{
  "query": {
    "match": {
      "title": {
        "query": "kill",
        "fuzziness": 2,
        "prefix_length": 1
      }
    }
  }
}'
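
To get a feel for what a fuzziness of 2 can bridge, here is a minimal sketch of plain Levenshtein edit distance. It is only an illustration: the helper name levenshtein is made up for this example, and Elasticsearch's own fuzzy matching is based on Damerau-Levenshtein distance and allows at most 2 edits.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# "killing" is "kill" plus three inserted characters.
print(levenshtein("kill", "killing"))  # prints 3
print(levenshtein("kill", "kills"))    # prints 1
```

Since kill and killing are 3 edits apart, a fuzziness of 2 on an unstemmed field would not be expected to connect them, which is one reason to prefer querying the stemmed sub-field.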

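UPD. A bare q=kill searches the _all field, which by default is analyzed with the standard analyzer, so no stemming is applied there. Since the content_en sub-field holds the terms produced by the stemmer, it makes sense to query that field explicitly: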
curl -XPOST 'http://localhost:9200/test_blocks/_search' -d '
{
  "query": {
    "multi_match": {
      "type": "most_fields",
      "query": "kill",
      "fields": ["block.content", "block.content.content_en"]
    }
  }
}'
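
Why the stemmed sub-field matches while the plain field does not can be sketched with a toy inverted index. Everything here is an assumption for illustration: standard_tokens and crude_stem are rough stand-ins written for this example, not the real standard analyzer or the stemmer behind the english analyzer, and the sample text is a shortened piece of the document content.

```python
import re


def standard_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    """Very crude suffix stripping (NOT the real English stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def english_tokens(text):
    """Stand-in for the english analyzer: tokenize, then stem each token."""
    return [crude_stem(t) for t in standard_tokens(text)]


content = "a mysterious red and yellow being killing his mother"

# Each field stores the terms produced by its own analyzer at index time.
index = {
    "content": set(standard_tokens(content)),
    "content.content_en": set(english_tokens(content)),
}


def matches(field, query, analyzer):
    """The query text is analyzed the same way as the field it targets."""
    return any(term in index[field] for term in analyzer(query))


print(matches("content", "kill", standard_tokens))                # False
print(matches("content.content_en", "kill", english_tokens))      # True
print(matches("content.content_en", "killing", english_tokens))   # True
```

The unstemmed field only contains the term killing, so kill finds nothing there, while the stemmed field stores kill and matches both forms of the query.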

Source: Stack Overflow, https://stackoverflow.com/questions/31786281/