elasticsearch - Elasticsearch custom analyzer being ignored

Tags: elasticsearch, analyzer

I am using Elasticsearch 2.2.0 and trying to apply the lowercase + asciifolding filters to a field.

This is the output of http://localhost:9200/myindex/:

{
    "myindex": {
        "aliases": {}, 
        "mappings": {
            "products": {
                "properties": {
                    "fold": {
                        "analyzer": "folding", 
                        "type": "string"
                    }
                }
            }
        }, 
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "folding": {
                            "token_filters": [
                                "lowercase", 
                                "asciifolding"
                            ], 
                            "tokenizer": "standard", 
                            "type": "custom"
                        }
                    }
                }, 
                "creation_date": "1456180612715", 
                "number_of_replicas": "1", 
                "number_of_shards": "5", 
                "uuid": "vBMZEasPSAyucXICur3GVA", 
                "version": {
                    "created": "2020099"
                }
            }
        }, 
        "warmers": {}
    }
}

When I try to test the custom folding analyzer with the _analyze API, this is the output of http://localhost:9200/myindex/_analyze?analyzer=folding&text=%C3%89sta%20est%C3%A1%20loca:
{
    "tokens": [
        {
            "end_offset": 4, 
            "position": 0, 
            "start_offset": 0, 
            "token": "Ésta", 
            "type": "<ALPHANUM>"
        }, 
        {
            "end_offset": 9, 
            "position": 1, 
            "start_offset": 5, 
            "token": "está", 
            "type": "<ALPHANUM>"
        }, 
        {
            "end_offset": 14, 
            "position": 2, 
            "start_offset": 10, 
            "token": "loca", 
            "type": "<ALPHANUM>"
        }
    ]
}

As you can see, the tokens returned are Ésta, está, loca instead of esta, esta, loca. What is going on? It seems this folding analyzer is being ignored.

Best Answer

It looks like a simple typo made when you created the index.

In your "analysis":{"analyzer":{...}} block, this:

"token_filters": [...]

should be:
"filter": [...]

Check the documentation to confirm this. Because your filter array was incorrectly named, ES ignored it entirely and simply fell back to the standard analyzer. Here is a small example written with the Sense Chrome plugin. Execute the requests in order:
DELETE /test

PUT /test
{
      "analysis": {
         "analyzer": {
            "folding": {
               "type": "custom",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ],
               "tokenizer": "standard"
            }
         }
      }
}

GET /test/_analyze
{
    "analyzer":"folding",
    "text":"Ésta está loca"
}

And the result of the final GET /test/_analyze:
"tokens": [
      {
         "token": "esta",
         "start_offset": 0,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "esta",
         "start_offset": 5,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "loca",
         "start_offset": 10,
         "end_offset": 14,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
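For intuition, the combined effect of the lowercase and asciifolding filters can be roughly approximated in plain Python with the standard-library unicodedata module. This is only a sketch of what the filters do to the sample text, not how Elasticsearch implements them — asciifolding covers many more characters than NFKD decomposition alone, and the standard tokenizer is more sophisticated than a whitespace split:

```python
import unicodedata

def fold(text):
    """Rough approximation of lowercase + asciifolding:
    decompose accented characters, drop the combining marks,
    then lowercase what remains."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.lower()

# A whitespace split stands in for the standard tokenizer here.
tokens = [fold(t) for t in "Ésta está loca".split()]
print(tokens)  # ['esta', 'esta', 'loca']
```

This mirrors the expected output above: Ésta and está both fold to esta once the accents are stripped and the text is lowercased.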

A similar question about "elasticsearch - Elasticsearch custom analyzer being ignored" can be found on Stack Overflow: https://stackoverflow.com/questions/35565421/
