elasticsearch - Does ElasticSearch support Unicode/Chinese?

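I am running text searches with ElasticSearch and have a problem when querying with the term type. What I do below is basically:

Add a document with a Chinese string (你好).
Query with the text method: the document is returned.
Query with the term method: nothing is returned.

So why does this happen, and how can I fix it?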

➜  curl -XPOST 'http://localhost:9200/test/test/' -d '{ "name" : "你好" }'

{
  "ok": true,
  "_index": "test",
  "_type": "test",
  "_id": "VdV8K26-QyiSCvDrUN00Nw",
  "_version": 1
}

➜  curl -XGET 'http://localhost:9200/test/test/_mapping?pretty=1'

{
  "test" : {
    "properties" : {
      "name" : {
        "type" : "string"
      }
    }
  }
}

➜  curl -XGET 'http://localhost:9200/test/test/_search?pretty=1'

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "VdV8K26-QyiSCvDrUN00Nw",
        "_score": 1.0,
        "_source": {
          "name": "你好"
        }
      }
    ]
  }
}

➜  curl -XGET 'http://localhost:9200/test/test/_search?pretty=1' -d '{
  "query": {
    "text": {
      "name": "你好"
    }
  }
}'

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.8838835,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "VdV8K26-QyiSCvDrUN00Nw",
        "_score": 0.8838835,
        "_source": {
          "name": "你好"
        }
      }
    ]
  }
}

➜  curl -XGET 'http://localhost:9200/test/test/_search?pretty=1' -d '{
  "query": {
    "term": {
      "name": "你好"
    }
  }
}'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Best answer

From the ElasticSearch documentation on the term query:

Matches documents that have fields that contain a term (not analyzed).

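The name field is analyzed by default, so a term query for the whole string cannot find it (a term query does not analyze the query string and only matches exact terms stored in the index). You can try indexing another document with a different, non-Chinese name, for example a multi-word or capitalized one, and a term query for the full name will not find it either. If you are now wondering why the following search query does return a result: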

curl -XGET 'http://localhost:9200/test/test/_search?pretty=1' -d '{"query" : {"term" : { "name" : "好" }}}'

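That is because each token produced by the analyzer is itself stored as a not-analyzed term. If you index a document whose name is “你好吗”, a term query for “好吗” or “你好” will not find it, but a term query for “你”, “好”, or “吗” will.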
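The behaviour described above can be illustrated with a toy inverted index. This is only a sketch of the idea, not ElasticSearch's actual implementation: `toy_analyze`, `ToyIndex`, `term_query`, and `match_query` are hypothetical names, and the tokenizer merely imitates per-character splitting of Chinese text.

```python
from collections import defaultdict


def toy_analyze(text):
    """Toy stand-in for an analyzer (an assumption, not the real one):
    lowercase the text, emit each CJK character as its own token, and
    emit each run of other letters/digits as a single token."""
    tokens, word = [], []
    for ch in text.lower():
        if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs
            if word:
                tokens.append("".join(word))
                word = []
            tokens.append(ch)
        elif ch.isalnum():
            word.append(ch)
        else:  # whitespace or punctuation ends the current word
            if word:
                tokens.append("".join(word))
                word = []
    if word:
        tokens.append("".join(word))
    return tokens


class ToyIndex:
    """Hypothetical miniature inverted index: term -> set of doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Indexing an analyzed field stores one term per token.
        for token in toy_analyze(text):
            self.postings[token].add(doc_id)

    def term_query(self, term):
        # The query string is looked up verbatim, without analysis.
        return set(self.postings.get(term, set()))

    def match_query(self, text):
        # The query string is analyzed first; any matching token counts.
        hits = set()
        for token in toy_analyze(text):
            hits |= self.postings.get(token, set())
        return hits


index = ToyIndex()
index.add(1, "你好吗")

print(index.term_query("你好"))   # set(): no stored term equals "你好"
print(index.term_query("好"))     # {1}: "好" is a stored term
print(index.match_query("你好"))  # {1}: analyzed into "你" and "好" first
```

The term lookup fails for “你好” because the index only ever saw the single-character tokens, while the analyzed lookup succeeds because the query goes through the same tokenization as the document.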

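For Chinese, you may need to pay special attention to the analyzer used. For me, the standard analyzer seems good enough (it tokenizes Chinese phrases per character rather than on whitespace).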
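If the goal is instead to make a term query match the whole string, one option is to leave the field unanalyzed. The sketch below only builds a mapping body; it assumes an ElasticSearch version that still has the `string` field type (as the mapping in the question does), where `"index": "not_analyzed"` disables analysis for a field. The variable names are illustrative, and sending the body to the mapping endpoint of a fresh index is left out.

```python
import json

# Hypothetical mapping for the "test" type from the question.
# Assumption: a pre-5.x ElasticSearch, where string fields accept
# "index": "not_analyzed" to store the value as a single term.
mapping = {
    "test": {
        "properties": {
            "name": {"type": "string", "index": "not_analyzed"}
        }
    }
}

# ensure_ascii=False keeps any non-ASCII text readable in the output.
body = json.dumps(mapping, ensure_ascii=False, indent=2)
print(body)
```

With such a mapping the whole value “你好” would be stored as one term, so the term query from the question would match it, but searches for individual characters such as “好” would no longer match.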

A similar question to "elasticsearch - Does ElasticSearch support Unicode/Chinese?" can be found on Stack Overflow: https://stackoverflow.com/questions/19903591/
