I'm using Elasticsearch 2.2.0 and trying to apply the lowercase + asciifolding filters to a field.
This is the output of http://localhost:9200/myindex/:
{
"myindex": {
"aliases": {},
"mappings": {
"products": {
"properties": {
"fold": {
"analyzer": "folding",
"type": "string"
}
}
}
},
"settings": {
"index": {
"analysis": {
"analyzer": {
"folding": {
"token_filters": [
"lowercase",
"asciifolding"
],
"tokenizer": "standard",
"type": "custom"
}
}
},
"creation_date": "1456180612715",
"number_of_replicas": "1",
"number_of_shards": "5",
"uuid": "vBMZEasPSAyucXICur3GVA",
"version": {
"created": "2020099"
}
}
},
"warmers": {}
}
}
When I test the custom folding analyzer with the _analyze API, this is the output of http://localhost:9200/myindex/_analyze?analyzer=folding&text=%C3%89sta%20est%C3%A1%20loca:
{
"tokens": [
{
"end_offset": 4,
"position": 0,
"start_offset": 0,
"token": "Ésta",
"type": "<ALPHANUM>"
},
{
"end_offset": 9,
"position": 1,
"start_offset": 5,
"token": "está",
"type": "<ALPHANUM>"
},
{
"end_offset": 14,
"position": 2,
"start_offset": 10,
"token": "loca",
"type": "<ALPHANUM>"
}
]
}
As you can see, the returned tokens are Ésta, está, loca instead of esta, esta, and loca. What is going on? It seems the folding analyzer is being ignored.
Best answer
This looks like a simple typo made when the index was created.
In your "analysis":{"analyzer":{...}} block, this:
"token_filters": [...]
should be
"filter": [...]
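A misspelled key like this is easy to miss because the index is still created without any error. As a rough sketch (not part of Elasticsearch or its clients), a small Python check could flag unexpected keys in custom analyzer definitions before the settings are sent. The function name `find_unknown_analyzer_keys` is hypothetical, and the set of accepted keys is an assumption that is deliberately minimal; extend it to match the options your Elasticsearch version documents.

```python
# Keys assumed to be valid in a "custom" analyzer definition.
# This set is an assumption and intentionally minimal; extend it as needed.
KNOWN_CUSTOM_ANALYZER_KEYS = {"type", "tokenizer", "filter", "char_filter"}


def find_unknown_analyzer_keys(index_settings):
    """Return {analyzer_name: [unexpected keys]} for custom analyzers."""
    analyzers = index_settings.get("analysis", {}).get("analyzer", {})
    problems = {}
    for name, definition in analyzers.items():
        # Only custom analyzers are checked; built-in types take other options.
        if definition.get("type", "custom") != "custom":
            continue
        unknown = sorted(set(definition) - KNOWN_CUSTOM_ANALYZER_KEYS)
        if unknown:
            problems[name] = unknown
    return problems


# The analyzer definition from the question, with the misspelled key.
broken = {"analysis": {"analyzer": {"folding": {
    "type": "custom",
    "tokenizer": "standard",
    "token_filters": ["lowercase", "asciifolding"],
}}}}

# The same definition with the key corrected to "filter".
fixed = {"analysis": {"analyzer": {"folding": {
    "type": "custom",
    "tokenizer": "standard",
    "filter": ["lowercase", "asciifolding"],
}}}}

print(find_unknown_analyzer_keys(broken))  # {'folding': ['token_filters']}
print(find_unknown_analyzer_keys(fixed))   # {}
```

The check only inspects the dictionary locally, so it can run before any request is made.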
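The documentation for custom analyzers confirms the key name. Because the filter array had the wrong name, ES ignored it entirely, so the analyzer ran with only the standard tokenizer and no token filters, which is why the tokens came back unchanged (not even lowercased). Here is a small example written with the Sense Chrome plugin. Run the requests in order:
DELETE /test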
PUT /test
{
"analysis": {
"analyzer": {
"folding": {
"type": "custom",
"filter": [
"lowercase",
"asciifolding"
],
"tokenizer": "standard"
}
}
}
}
GET /test/_analyze
{
"analyzer":"folding",
"text":"Ésta está loca"
}
And finally, the result of the last GET /test/_analyze:
"tokens": [
{
"token": "esta",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "esta",
"start_offset": 5,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "loca",
"start_offset": 10,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 2
}
]
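To see why esta, esta, loca is the expected result, the effect of the two filters can be approximated in plain Python. This is only an illustrative sketch, not how Elasticsearch implements them: the whitespace split stands in for the standard tokenizer, Unicode decomposition plus stripping combining marks stands in for asciifolding (which handles many more characters), and the names `fold_token` and `analyze` are hypothetical.

```python
import unicodedata


def fold_token(token):
    """Lowercase a token and strip combining accents (rough asciifolding stand-in)."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    # A whitespace split is a simplification of the standard tokenizer.
    return [fold_token(tok) for tok in text.split()]


print(analyze("Ésta está loca"))  # ['esta', 'esta', 'loca']
```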
Source: the Stack Overflow question "Elasticsearch custom analyzer being ignored": https://stackoverflow.com/questions/35565421/