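How do I create a custom analyzer that tokenizes a field only on the '/' character?
My field holds URL strings, for example: "https://stackoverflow.com/questions/ask"
I want it tokenized like this: "http", "stackoverflow.com", "questions" and "ask"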
Best answer
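A pattern analyzer seems to do what you want. The pattern below splits on any run of '/' or ':' characters, so "://" acts as a single separator: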
PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "analysis": {
         "analyzer": {
            "slash_analyzer": {
               "type": "pattern",
               "pattern": "[/:]+",
               "lowercase": true
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "url": {
               "type": "string",
               "index_analyzer": "slash_analyzer",
               "search_analyzer": "standard",
               "term_vector": "yes"
            }
         }
      }
   }
}
PUT /test_index/doc/1
{
   "url": "http://stackoverflow.com/questions/ask"
}
I added term vectors to the mapping (you probably don't want to do that in production) so that we can see which terms were generated:
GET /test_index/doc/1/_termvector
...
{
   "_index": "test_index",
   "_type": "doc",
   "_id": "1",
   "_version": 1,
   "found": true,
   "took": 1,
   "term_vectors": {
      "url": {
         "field_statistics": {
            "sum_doc_freq": 4,
            "doc_count": 1,
            "sum_ttf": 4
         },
         "terms": {
            "ask": {
               "term_freq": 1
            },
            "http": {
               "term_freq": 1
            },
            "questions": {
               "term_freq": 1
            },
            "stackoverflow.com": {
               "term_freq": 1
            }
         }
      }
   }
}
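To sanity-check the pattern without a running cluster, the splitting can be approximated in plain Python. This is only a rough sketch of what the analyzer does (split on the regex, drop empty pieces, lowercase). It does not call Elasticsearch, the function name slash_tokens is made up for illustration, and it assumes Python's re module treats this simple character class the same way the Java regex engine does.

```python
import re


def slash_tokens(text, pattern=r"[/:]+", lowercase=True):
    """Approximate the pattern analyzer: split on the regex, drop empties."""
    # re.split can yield empty strings at the edges (e.g. a trailing "/"),
    # so they are filtered out here.
    tokens = [t for t in re.split(pattern, text) if t]
    return [t.lower() for t in tokens] if lowercase else tokens


if __name__ == "__main__":
    print(slash_tokens("http://stackoverflow.com/questions/ask"))
    # A port number is also split off, because ':' is part of the pattern.
    print(slash_tokens("http://localhost:9200/test_index/"))
```

The first call yields the same four terms as the term vector output above. The second shows a side effect of including ':' in the pattern: host and port become separate terms, which may or may not be what you want.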
Here is the code I used:
http://sense.qbox.io/gist/669fbdd681895d7e9f8db13799865c6e8be75b11
Source: the Stack Overflow question on an Elasticsearch custom analyzer that splits on a specific character: https://stackoverflow.com/questions/32951811/