elasticsearch - Elasticsearch custom analyzer being ignored

Tags: elasticsearch, analyzer

I am using Elasticsearch 2.2.0 and trying to apply the lowercase + asciifolding filters to a field.

This is the output of http://localhost:9200/myindex/:

{
    "myindex": {
        "aliases": {}, 
        "mappings": {
            "products": {
                "properties": {
                    "fold": {
                        "analyzer": "folding", 
                        "type": "string"
                    }
                }
            }
        }, 
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "folding": {
                            "token_filters": [
                                "lowercase", 
                                "asciifolding"
                            ], 
                            "tokenizer": "standard", 
                            "type": "custom"
                        }
                    }
                }, 
                "creation_date": "1456180612715", 
                "number_of_replicas": "1", 
                "number_of_shards": "5", 
                "uuid": "vBMZEasPSAyucXICur3GVA", 
                "version": {
                    "created": "2020099"
                }
            }
        }, 
        "warmers": {}
    }
}

When I try to test the custom folding analyzer with the _analyze API, this is the output of http://localhost:9200/myindex/_analyze?analyzer=folding&text=%C3%89sta%20est%C3%A1%20loca:
{
    "tokens": [
        {
            "end_offset": 4, 
            "position": 0, 
            "start_offset": 0, 
            "token": "Ésta", 
            "type": "<ALPHANUM>"
        }, 
        {
            "end_offset": 9, 
            "position": 1, 
            "start_offset": 5, 
            "token": "está", 
            "type": "<ALPHANUM>"
        }, 
        {
            "end_offset": 14, 
            "position": 2, 
            "start_offset": 10, 
            "token": "loca", 
            "type": "<ALPHANUM>"
        }
    ]
}

As you can see, the tokens returned are Ésta, está, loca instead of esta, esta, loca. What is going on? It seems this folding analyzer is being ignored.

Best Answer

It looks like a simple typo made when you created the index.

In your "analysis":{"analyzer":{...}} block, this:

"token_filters": [...]

should be:
"filter": [...]

Check the documentation to confirm this. Because your filter array was incorrectly named, ES ignored it entirely and simply fell back to the standard analyzer. Here is a small example written with the Sense Chrome plugin. Execute the requests in order:
DELETE /test

PUT /test
{
      "analysis": {
         "analyzer": {
            "folding": {
               "type": "custom",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ],
               "tokenizer": "standard"
            }
         }
      }
}

GET /test/_analyze
{
    "analyzer":"folding",
    "text":"Ésta está loca"
}

And the result of the final GET /test/_analyze:
"tokens": [
      {
         "token": "esta",
         "start_offset": 0,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "esta",
         "start_offset": 5,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "loca",
         "start_offset": 10,
         "end_offset": 14,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
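For intuition, the combined effect of the lowercase and asciifolding filters can be roughly approximated in plain Python with the standard-library unicodedata module. This is only a sketch of what the filters do to the sample text, not how Elasticsearch implements them — asciifolding covers many more characters than NFKD decomposition alone, and the standard tokenizer is more sophisticated than a whitespace split:

```python
import unicodedata

def fold(text):
    """Rough approximation of lowercase + asciifolding:
    decompose accented characters, drop the combining marks,
    then lowercase what remains."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.lower()

# A whitespace split stands in for the standard tokenizer here.
tokens = [fold(t) for t in "Ésta está loca".split()]
print(tokens)  # ['esta', 'esta', 'loca']
```

This mirrors the expected output above: Ésta and está both fold to esta once the accents are stripped and the text is lowercased.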

A similar question about "elasticsearch - Elasticsearch custom analyzer being ignored" can be found on Stack Overflow: https://stackoverflow.com/questions/35565421/
