I'm running term statistics in Elasticsearch and get this result:
"tevez's": {
  "doc_freq": 165,
  "ttf": 245,
  "term_freq": 1,
  "tokens": [
    {
      "position": 722,
      "start_offset": 4077,
      "end_offset": 4084
    }
  ],
  "score": 9.041515
How do I tell Elasticsearch to treat tevez's and tevez as the same term?
I also get:
"benched": {
  "doc_freq": 130,
  "ttf": 140,
  "term_freq": 1,
  "tokens": [
    {
      "position": 757,
      "start_offset": 4292,
      "end_offset": 4299
    }
  ],
  "score": 9.278306
How do I tell Elasticsearch to treat benched and bench as the same term?
Best answer
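The possessive_english stemmer removes the trailing 's.
The porter stemmer, or another stemmer, removes tense and other inflections. For English, the stemmer token filter documentation has the complete list of available stemmers.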
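To make the first step concrete, here is a minimal Python sketch of what possessive stripping does to a token. It is an illustration only, not Elasticsearch's own code: the function name `strip_possessive` is hypothetical, and the sketch assumes the filter simply drops a trailing apostrophe followed by "s". Tense removal (benched to bench) is the job of a real stemmer such as Porter and is deliberately not reimplemented here.

```python
# Apostrophe characters treated as possessive markers (an assumption for this sketch).
APOSTROPHES = ("'", "\u2019", "\uff07")


def strip_possessive(token: str) -> str:
    """Drop a trailing apostrophe + s, e.g. "tevez's" -> "tevez"."""
    if len(token) >= 2 and token[-1] in "sS" and token[-2] in APOSTROPHES:
        return token[:-2]
    return token


if __name__ == "__main__":
    for word in ["tevez's", "tevez", "benched"]:
        print(word, "->", strip_possessive(word))
```

Tokens without a possessive ending pass through unchanged, which is why a separate stemmer is still needed for "benched".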
You also need to create the index with settings like the following:
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "possessive": {
            "type": "stemmer",
            "language": "possessive_english"
          },
          "porter": {
            "type": "stemmer",
            "language": "english"
          }
        },
        "analyzer": {
          "custom_english": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "possessive",
              "porter"
            ]
          }
        }
      }
    }
  }
}
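As a rough sketch of how such settings could be applied from a script, the Python below builds the analysis settings as a dictionary and prepares (without sending) an index-creation request using only the standard library. The URL `http://localhost:9200`, the index name `articles`, and the helper `build_create_index_request` are assumptions for illustration. The filter chain lists `possessive` before `porter`, mirroring the order used by the built-in `english` analyzer, so that the 's is removed before stemming.

```python
import json
import urllib.request

# Analysis settings: possessive stripping runs before stemming.
SETTINGS = {
    "settings": {
        "index": {
            "analysis": {
                "filter": {
                    "possessive": {"type": "stemmer", "language": "possessive_english"},
                    "porter": {"type": "stemmer", "language": "english"},
                },
                "analyzer": {
                    "custom_english": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "possessive", "porter"],
                    }
                },
            }
        }
    }
}


def build_create_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Prepare a PUT request that creates `index` with SETTINGS (not sent here)."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}",
        data=json.dumps(SETTINGS).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_create_index_request("http://localhost:9200", "articles")
    print(req.get_method(), req.full_url)
    # To actually create the index: urllib.request.urlopen(req)
```

Analysis settings of this kind are normally supplied when the index is created, so documents indexed earlier with a different analyzer would need to be reindexed.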
Finally, request
$endpoint/$index/_analyze?analyzer=custom_english&text=$text
to see the effect of stemming.
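The same check can be scripted. The sketch below prepares an `_analyze` request with a JSON body (recent Elasticsearch versions document this form rather than query-string parameters) and pulls the token strings out of a response. The base URL, the index name `articles`, and the helper names `build_analyze_request` and `extract_tokens` are hypothetical; it assumes the analyzer is the `custom_english` one defined in the settings above.

```python
import json
import urllib.request
from typing import List


def build_analyze_request(base_url: str, index: str, analyzer: str, text: str) -> urllib.request.Request:
    """Prepare a POST to <index>/_analyze (not sent here)."""
    body = {"analyzer": analyzer, "text": text}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response_json: str) -> List[str]:
    """Return the token strings from an _analyze response body."""
    return [t["token"] for t in json.loads(response_json).get("tokens", [])]


if __name__ == "__main__":
    req = build_analyze_request(
        "http://localhost:9200", "articles", "custom_english", "tevez's benched"
    )
    print(req.get_method(), req.full_url)
    # With a running cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(extract_tokens(resp.read().decode("utf-8")))
```

If the analyzer is configured as intended, the tokens returned for "tevez's" and "tevez" should be identical, and likewise for "benched" and "bench".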
This question, "elasticsearch - Elasticsearch analyzer configuration", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/40973537/