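As a simple experiment with Elasticsearch 2.2, I want to remove the last character from any word that ends with a lowercase "s". For example, the word "sounds" would be indexed as "sound".
I am defining the analyzer like this: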
{
  "template": "document-index-template",
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "sFilter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z]+)([s]( |$))",
          "replacement": "$2"
        }
      },
      "analyzer": {
        "tight": {
          "type": "standard",
          "filter": [
            "sFilter",
            "lowercase"
          ]
        }
      }
    }
  }
}
Then, when I analyze the text "sounds of silences" with this request:
<index>/_analyze?analyzer=tight&text=sounds%20of%20silences
I get:
{
  "tokens": [
    {
      "token": "sounds",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "of",
      "start_offset": 7,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "silences",
      "start_offset": 10,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
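I expected "sounds" to become "sound" and "silences" to become "silence".
Best answer
The analyzer settings above are not valid: the filter list has no effect on an analyzer of type standard, which is why the tokens come back unchanged. I think what you intended is an analyzer of type custom with its tokenizer set to standard.
Example: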
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "sFilter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z]+)s",
          "replacement": "$1"
        }
      },
      "analyzer": {
        "tight": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "sFilter"
          ]
        }
      }
    }
  }
}
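To see what each pattern and replacement pair does to individual tokens, here is a rough sketch in Python. It is not Elasticsearch itself: Python's re module stands in for the Java regular expressions that pattern_replace uses (for patterns this simple the two should behave the same), whitespace splitting stands in for the standard tokenizer, and apply_filter is a hypothetical helper. The anchored pattern is my own variant, not part of the answer above. It is included because the unanchored pattern also seems to remove an "s" in the middle of a word.

```python
import re

# Patterns copied from the question and the answer; Java's "$1" is written "\1" in Python.
QUESTION_FILTER = (r"([a-zA-Z]+)([s]( |$))", r"\2")
ANSWER_FILTER = (r"([a-zA-Z]+)s", r"\1")
# Hypothetical variant (not from the answer): anchor the "s" to the end of the token.
ANCHORED_FILTER = (r"([a-zA-Z]+)s$", r"\1")


def apply_filter(token_filter, tokens):
    """Apply a (pattern, replacement) pair to every token, replacing all matches."""
    pattern, replacement = token_filter
    return [re.sub(pattern, replacement, token) for token in tokens]


# Whitespace splitting is an assumption standing in for the standard tokenizer.
tokens = "sounds of silences test".split()

print(apply_filter(QUESTION_FILTER, tokens))  # ['s', 'of', 's', 'test']
print(apply_filter(ANSWER_FILTER, tokens))    # ['sound', 'of', 'silence', 'tet']
print(apply_filter(ANCHORED_FILTER, tokens))  # ['sound', 'of', 'silence', 'test']
```

Two things stand out. The question's replacement "$2" keeps the trailing "s" group and discards the stem, so even inside a custom analyzer it would not produce "sound". And since each token is matched on its own, ending the pattern with "$" looks like a reasonable way to limit the replacement to a final "s".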
Regarding "elasticsearch - how to remove a trailing S from words", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37867015/