elasticsearch - Elasticsearch analyzer to enable matching searches such as C#, C++, A+

Tags: elasticsearch

I am trying to create a custom analyzer in Elasticsearch to enable matches on terms such as C#, C++, and A+; currently it only matches C, C, and A.

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "keyword",
                    "type_table": [
                        "# => ALPHANUM",
                        "+ => ALPHANUM"
                    ],
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    }
}
I tried analyzing text against the index with the following request:
{
"analyzer": "my_custom_analyzer",
"text": "CSS, A++, C#.Net, ASP.Net Hospitals is Africa's leading and the fastest growing super specialty care and multi-organ transplantation hospital. Designed the User Interfaces, User Controls according the requirements\n· Developed Cascading Style Sheets (CSS) for User Interface uniformity throughout the   application\n· Involved in programming the business logic layer and data access layer\n· Involved in in developing pages in ASP.Net with C#.Net"
}
Result:
{
"tokens": [
    {
    "token": "CSS, A++, C#.Net, ASP.Net Hospitals is Africa's leading and the fastest growing super specialty care and multi-organ transplantation hospital. Designed the User Interfaces, User Controls according the requirements\n· Developed Cascading Style Sheets (CSS) for User Interface uniformity throughout the   application\n· Involved in programming the business logic layer and data access layer\n· Involved in in developing pages in ASP.Net with C#.Net",
    "start_offset": 0,
    "end_offset": 443,
    "type": "word",
    "position": 0
    }
]
}
Also, I'm not sure how to enable the analyzer; should that be done in the mapping?
{
    "properties": {
        "attachment.content": {
            "type": "my_custom_analyzer"
        }
    }
}
Response when trying to use it in the mapping:
{
"error": {
    "root_cause": [
    {
        "type": "mapper_parsing_exception",
        "reason": "No handler for type [my_custom_analyzer] declared on field [attachment.content]"
    }
    ],
    "type": "mapper_parsing_exception",
    "reason": "No handler for type [my_custom_analyzer] declared on field [attachment.content]"
},
"status": 400
}
Any help would be greatly appreciated.

Best answer

I managed to get a correct response from the ES API using the settings below. It's not 100% there yet, but it's on the right track: highlighting still isn't working, yet when testing with the Analyze API I now get a response that looks like the right direction.

{
    "settings": {
        "analysis": {
            "filter": {
                "my_delimeter": {
                    "type": "word_delimiter",
                    "type_table": [
                        "# => ALPHANUM",
                        "+ => ALPHANUM",
                        ". => ALPHANUM"
                    ]
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "my_delimeter"]
                }
            }
        }
    }
}
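For context, these analysis settings have to live on the index itself. A minimal sketch of one way to apply them to an existing index (the index name my_index is hypothetical; analysis settings can only be updated while the index is closed):

POST /my_index/_close

PUT /my_index/_settings
{
    "analysis": {
        "filter": {
            "my_delimeter": {
                "type": "word_delimiter",
                "type_table": [
                    "# => ALPHANUM",
                    "+ => ALPHANUM",
                    ". => ALPHANUM"
                ]
            }
        },
        "analyzer": {
            "my_analyzer": {
                "tokenizer": "whitespace",
                "filter": ["lowercase", "my_delimeter"]
            }
        }
    }
}

POST /my_index/_open

Alternatively, the same analysis block can be supplied under "settings" when the index is first created with PUT /my_index.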
The text I'm analyzing:
{
    "analyzer": "my_analyzer",
    "text": "CSS, A++, C#.Net, ASP.Net Hospitals is Africa's leading and the fastest growing super specialty care and multi-organ transplantation hospital. Designed the User Interfaces, User Controls according the requirements\n· Developed Cascading Style Sheets (CSS) for User Interface uniformity throughout the   application\n· Involved in programming the business logic layer and data access layer\n· Involved in in developing pages in ASP.Net with C#.Net"
}
Response:
{
"tokens": [
    {
    "token": "css",
    "start_offset": 0,
    "end_offset": 3,
    "type": "word",
    "position": 0
    },
    {
    "token": "a++",
    "start_offset": 5,
    "end_offset": 8,
    "type": "word",
    "position": 1
    },
    {
    "token": "c#.net",
    "start_offset": 10,
    "end_offset": 16,
    "type": "word",
    "position": 2
    },
    {
    "token": "asp.net",
    "start_offset": 18,
    "end_offset": 25,
    "type": "word",
    "position": 3
    },
    {
    "token": "hospitals",
    "start_offset": 26,
    "end_offset": 35,
    "type": "word",
    "position": 4
    },
    {
    "token": "is",
    "start_offset": 36,
    "end_offset": 38,
    "type": "word",
    "position": 5
    },
    {
    "token": "africa",
    "start_offset": 39,
    "end_offset": 45,
    "type": "word",
    "position": 6
    },
    {
    "token": "leading",
    "start_offset": 48,
    "end_offset": 55,
    "type": "word",
    "position": 7
    },
    {
    "token": "and",
    "start_offset": 56,
    "end_offset": 59,
    "type": "word",
    "position": 8
    },
    {
    "token": "the",
    "start_offset": 60,
    "end_offset": 63,
    "type": "word",
    "position": 9
    },
    {
    "token": "fastest",
    "start_offset": 64,
    "end_offset": 71,
    "type": "word",
    "position": 10
    },
    {
    "token": "growing",
    "start_offset": 72,
    "end_offset": 79,
    "type": "word",
    "position": 11
    },
    {
    "token": "super",
    "start_offset": 80,
    "end_offset": 85,
    "type": "word",
    "position": 12
    },
    {
    "token": "specialty",
    "start_offset": 86,
    "end_offset": 95,
    "type": "word",
    "position": 13
    },
    {
    "token": "care",
    "start_offset": 96,
    "end_offset": 100,
    "type": "word",
    "position": 14
    },
    {
    "token": "and",
    "start_offset": 101,
    "end_offset": 104,
    "type": "word",
    "position": 15
    },
    {
    "token": "multi",
    "start_offset": 105,
    "end_offset": 110,
    "type": "word",
    "position": 16
    },
    {
    "token": "organ",
    "start_offset": 111,
    "end_offset": 116,
    "type": "word",
    "position": 17
    },
    {
    "token": "transplantation",
    "start_offset": 117,
    "end_offset": 132,
    "type": "word",
    "position": 18
    },
    {
    "token": "hospital.",
    "start_offset": 133,
    "end_offset": 142,
    "type": "word",
    "position": 19
    },
    {
    "token": "designed",
    "start_offset": 143,
    "end_offset": 151,
    "type": "word",
    "position": 20
    },
    {
    "token": "the",
    "start_offset": 152,
    "end_offset": 155,
    "type": "word",
    "position": 21
    },
    {
    "token": "user",
    "start_offset": 156,
    "end_offset": 160,
    "type": "word",
    "position": 22
    },
    {
    "token": "interfaces",
    "start_offset": 161,
    "end_offset": 171,
    "type": "word",
    "position": 23
    },
    {
    "token": "user",
    "start_offset": 173,
    "end_offset": 177,
    "type": "word",
    "position": 24
    },
    {
    "token": "controls",
    "start_offset": 178,
    "end_offset": 186,
    "type": "word",
    "position": 25
    },
    {
    "token": "according",
    "start_offset": 187,
    "end_offset": 196,
    "type": "word",
    "position": 26
    },
    {
    "token": "the",
    "start_offset": 197,
    "end_offset": 200,
    "type": "word",
    "position": 27
    },
    {
    "token": "requirements",
    "start_offset": 201,
    "end_offset": 213,
    "type": "word",
    "position": 28
    },
    {
    "token": "developed",
    "start_offset": 216,
    "end_offset": 225,
    "type": "word",
    "position": 29
    },
    {
    "token": "cascading",
    "start_offset": 226,
    "end_offset": 235,
    "type": "word",
    "position": 30
    },
    {
    "token": "style",
    "start_offset": 236,
    "end_offset": 241,
    "type": "word",
    "position": 31
    },
    {
    "token": "sheets",
    "start_offset": 242,
    "end_offset": 248,
    "type": "word",
    "position": 32
    },
    {
    "token": "css",
    "start_offset": 250,
    "end_offset": 253,
    "type": "word",
    "position": 33
    },
    {
    "token": "for",
    "start_offset": 255,
    "end_offset": 258,
    "type": "word",
    "position": 34
    },
    {
    "token": "user",
    "start_offset": 259,
    "end_offset": 263,
    "type": "word",
    "position": 35
    },
    {
    "token": "interface",
    "start_offset": 264,
    "end_offset": 273,
    "type": "word",
    "position": 36
    },
    {
    "token": "uniformity",
    "start_offset": 274,
    "end_offset": 284,
    "type": "word",
    "position": 37
    },
    {
    "token": "throughout",
    "start_offset": 285,
    "end_offset": 295,
    "type": "word",
    "position": 38
    },
    {
    "token": "the",
    "start_offset": 296,
    "end_offset": 299,
    "type": "word",
    "position": 39
    },
    {
    "token": "application",
    "start_offset": 302,
    "end_offset": 313,
    "type": "word",
    "position": 40
    },
    {
    "token": "involved",
    "start_offset": 316,
    "end_offset": 324,
    "type": "word",
    "position": 41
    },
    {
    "token": "in",
    "start_offset": 325,
    "end_offset": 327,
    "type": "word",
    "position": 42
    },
    {
    "token": "programming",
    "start_offset": 328,
    "end_offset": 339,
    "type": "word",
    "position": 43
    },
    {
    "token": "the",
    "start_offset": 340,
    "end_offset": 343,
    "type": "word",
    "position": 44
    },
    {
    "token": "business",
    "start_offset": 344,
    "end_offset": 352,
    "type": "word",
    "position": 45
    },
    {
    "token": "logic",
    "start_offset": 353,
    "end_offset": 358,
    "type": "word",
    "position": 46
    },
    {
    "token": "layer",
    "start_offset": 359,
    "end_offset": 364,
    "type": "word",
    "position": 47
    },
    {
    "token": "and",
    "start_offset": 365,
    "end_offset": 368,
    "type": "word",
    "position": 48
    },
    {
    "token": "data",
    "start_offset": 369,
    "end_offset": 373,
    "type": "word",
    "position": 49
    },
    {
    "token": "access",
    "start_offset": 374,
    "end_offset": 380,
    "type": "word",
    "position": 50
    },
    {
    "token": "layer",
    "start_offset": 381,
    "end_offset": 386,
    "type": "word",
    "position": 51
    },
    {
    "token": "involved",
    "start_offset": 389,
    "end_offset": 397,
    "type": "word",
    "position": 52
    },
    {
    "token": "in",
    "start_offset": 398,
    "end_offset": 400,
    "type": "word",
    "position": 53
    },
    {
    "token": "in",
    "start_offset": 401,
    "end_offset": 403,
    "type": "word",
    "position": 54
    },
    {
    "token": "developing",
    "start_offset": 404,
    "end_offset": 414,
    "type": "word",
    "position": 55
    },
    {
    "token": "pages",
    "start_offset": 415,
    "end_offset": 420,
    "type": "word",
    "position": 56
    },
    {
    "token": "in",
    "start_offset": 421,
    "end_offset": 423,
    "type": "word",
    "position": 57
    },
    {
    "token": "asp.net",
    "start_offset": 424,
    "end_offset": 431,
    "type": "word",
    "position": 58
    },
    {
    "token": "with",
    "start_offset": 432,
    "end_offset": 436,
    "type": "word",
    "position": 59
    },
    {
    "token": "c#.net",
    "start_offset": 437,
    "end_offset": 443,
    "type": "word",
    "position": 60
    }
]
}
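For reference, the tokens above come from the Analyze API; a minimal sketch of the full request form (my_index is a hypothetical index name that already carries the settings above, and a shorter text is used for brevity):

POST /my_index/_analyze
{
    "analyzer": "my_analyzer",
    "text": "CSS, A++, C#.Net, ASP.Net"
}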
I then tried the following mapping:
{
    "properties": {
        "attachment.content": {
            "type": "text",
            "search_analyzer": "my_analyzer",
            "analyzer": "my_analyzer",
            "fields": {
                "content": {
                    "type": "text",
                    "analyzer": "my_analyzer"
                }
            }
        }
    }
}
The highlighting in the response is still:
"highlight": {
    "skills": [
    "<em>C</em>#",
    "Microsoft Visual Studio <em>C</em># (Windows Form and Web APP) and Java Eclipse"
    ]
}
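Note that the highlight above is on the skills field, while the mapping earlier only touched attachment.content, so skills is presumably still going through the standard analyzer, which strips the "#" and leaves only "C" to highlight. A sketch of the likely follow-up, under the assumption that skills should use the same analyzer (index name hypothetical; since the analyzer of an existing text field cannot be changed in place, the index would need to be recreated and the documents reindexed):

PUT /my_index/_mapping
{
    "properties": {
        "skills": {
            "type": "text",
            "analyzer": "my_analyzer"
        }
    }
}

GET /my_index/_search
{
    "query": {
        "match": { "skills": "C#" }
    },
    "highlight": {
        "fields": { "skills": {} }
    }
}

With both the index-time and search-time analysis going through my_analyzer, "C#" lowercases to the single token "c#", so the whole term should be wrapped in the <em> tags instead of just the "C".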

This question (elasticsearch - analyzer to enable matching searches such as C#, C++, A+) is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/63477160/
