elasticsearch - Elasticsearch analyzer to enable matching searches such as C#, C++, A+

Tags: elasticsearch

I am trying to create a custom analyzer in Elasticsearch to enable matches on terms such as C#, C++, and A+; currently it only matches C, C, and A.

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "keyword",
                    "type_table": [
                        "# => ALPHANUM",
                        "+ => ALPHANUM"
                    ],
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    }
}
I tried analyzing text against the index with the following request:
{
"analyzer": "my_custom_analyzer",
"text": "CSS, A++, C#.Net, ASP.Net Hospitals is Africa's leading and the fastest growing super specialty care and multi-organ transplantation hospital. Designed the User Interfaces, User Controls according the requirements\n· Developed Cascading Style Sheets (CSS) for User Interface uniformity throughout the   application\n· Involved in programming the business logic layer and data access layer\n· Involved in in developing pages in ASP.Net with C#.Net"
}
Result:
{
"tokens": [
    {
    "token": "CSS, A++, C#.Net, ASP.Net Hospitals is Africa's leading and the fastest growing super specialty care and multi-organ transplantation hospital. Designed the User Interfaces, User Controls according the requirements\n· Developed Cascading Style Sheets (CSS) for User Interface uniformity throughout the   application\n· Involved in programming the business logic layer and data access layer\n· Involved in in developing pages in ASP.Net with C#.Net",
    "start_offset": 0,
    "end_offset": 443,
    "type": "word",
    "position": 0
    }
]
}
Also, I'm not sure how to enable the analyzer; should that be done in the mapping?
{
    "properties": {
        "attachment.content": {
            "type": "my_custom_analyzer"
        }
    }
}
Response when trying to use it in the mapping:
{
"error": {
    "root_cause": [
    {
        "type": "mapper_parsing_exception",
        "reason": "No handler for type [my_custom_analyzer] declared on field [attachment.content]"
    }
    ],
    "type": "mapper_parsing_exception",
    "reason": "No handler for type [my_custom_analyzer] declared on field [attachment.content]"
},
"status": 400
}
Any help would be greatly appreciated.

Best answer

I managed to get a correct response from the ES API using the settings below. It's not 100% there yet, but it's on the right track: highlighting still isn't working, yet when testing with the Analyze API I now get a response that looks like the right direction.

{
    "settings": {
        "analysis": {
            "filter": {
                "my_delimeter": {
                    "type": "word_delimiter",
                    "type_table": [
                        "# => ALPHANUM",
                        "+ => ALPHANUM",
                        ". => ALPHANUM"
                    ]
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "my_delimeter"]
                }
            }
        }
    }
}
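For context, these analysis settings have to live on the index itself. A minimal sketch of one way to apply them to an existing index (the index name my_index is hypothetical; analysis settings can only be updated while the index is closed):

POST /my_index/_close

PUT /my_index/_settings
{
    "analysis": {
        "filter": {
            "my_delimeter": {
                "type": "word_delimiter",
                "type_table": [
                    "# => ALPHANUM",
                    "+ => ALPHANUM",
                    ". => ALPHANUM"
                ]
            }
        },
        "analyzer": {
            "my_analyzer": {
                "tokenizer": "whitespace",
                "filter": ["lowercase", "my_delimeter"]
            }
        }
    }
}

POST /my_index/_open

Alternatively, the same analysis block can be supplied under "settings" when the index is first created with PUT /my_index.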
The text I'm analyzing:
{
    "analyzer": "my_analyzer",
    "text": "CSS, A++, C#.Net, ASP.Net Hospitals is Africa's leading and the fastest growing super specialty care and multi-organ transplantation hospital. Designed the User Interfaces, User Controls according the requirements\n· Developed Cascading Style Sheets (CSS) for User Interface uniformity throughout the   application\n· Involved in programming the business logic layer and data access layer\n· Involved in in developing pages in ASP.Net with C#.Net"
}
Response:
{
"tokens": [
    {
    "token": "css",
    "start_offset": 0,
    "end_offset": 3,
    "type": "word",
    "position": 0
    },
    {
    "token": "a++",
    "start_offset": 5,
    "end_offset": 8,
    "type": "word",
    "position": 1
    },
    {
    "token": "c#.net",
    "start_offset": 10,
    "end_offset": 16,
    "type": "word",
    "position": 2
    },
    {
    "token": "asp.net",
    "start_offset": 18,
    "end_offset": 25,
    "type": "word",
    "position": 3
    },
    {
    "token": "hospitals",
    "start_offset": 26,
    "end_offset": 35,
    "type": "word",
    "position": 4
    },
    {
    "token": "is",
    "start_offset": 36,
    "end_offset": 38,
    "type": "word",
    "position": 5
    },
    {
    "token": "africa",
    "start_offset": 39,
    "end_offset": 45,
    "type": "word",
    "position": 6
    },
    {
    "token": "leading",
    "start_offset": 48,
    "end_offset": 55,
    "type": "word",
    "position": 7
    },
    {
    "token": "and",
    "start_offset": 56,
    "end_offset": 59,
    "type": "word",
    "position": 8
    },
    {
    "token": "the",
    "start_offset": 60,
    "end_offset": 63,
    "type": "word",
    "position": 9
    },
    {
    "token": "fastest",
    "start_offset": 64,
    "end_offset": 71,
    "type": "word",
    "position": 10
    },
    {
    "token": "growing",
    "start_offset": 72,
    "end_offset": 79,
    "type": "word",
    "position": 11
    },
    {
    "token": "super",
    "start_offset": 80,
    "end_offset": 85,
    "type": "word",
    "position": 12
    },
    {
    "token": "specialty",
    "start_offset": 86,
    "end_offset": 95,
    "type": "word",
    "position": 13
    },
    {
    "token": "care",
    "start_offset": 96,
    "end_offset": 100,
    "type": "word",
    "position": 14
    },
    {
    "token": "and",
    "start_offset": 101,
    "end_offset": 104,
    "type": "word",
    "position": 15
    },
    {
    "token": "multi",
    "start_offset": 105,
    "end_offset": 110,
    "type": "word",
    "position": 16
    },
    {
    "token": "organ",
    "start_offset": 111,
    "end_offset": 116,
    "type": "word",
    "position": 17
    },
    {
    "token": "transplantation",
    "start_offset": 117,
    "end_offset": 132,
    "type": "word",
    "position": 18
    },
    {
    "token": "hospital.",
    "start_offset": 133,
    "end_offset": 142,
    "type": "word",
    "position": 19
    },
    {
    "token": "designed",
    "start_offset": 143,
    "end_offset": 151,
    "type": "word",
    "position": 20
    },
    {
    "token": "the",
    "start_offset": 152,
    "end_offset": 155,
    "type": "word",
    "position": 21
    },
    {
    "token": "user",
    "start_offset": 156,
    "end_offset": 160,
    "type": "word",
    "position": 22
    },
    {
    "token": "interfaces",
    "start_offset": 161,
    "end_offset": 171,
    "type": "word",
    "position": 23
    },
    {
    "token": "user",
    "start_offset": 173,
    "end_offset": 177,
    "type": "word",
    "position": 24
    },
    {
    "token": "controls",
    "start_offset": 178,
    "end_offset": 186,
    "type": "word",
    "position": 25
    },
    {
    "token": "according",
    "start_offset": 187,
    "end_offset": 196,
    "type": "word",
    "position": 26
    },
    {
    "token": "the",
    "start_offset": 197,
    "end_offset": 200,
    "type": "word",
    "position": 27
    },
    {
    "token": "requirements",
    "start_offset": 201,
    "end_offset": 213,
    "type": "word",
    "position": 28
    },
    {
    "token": "developed",
    "start_offset": 216,
    "end_offset": 225,
    "type": "word",
    "position": 29
    },
    {
    "token": "cascading",
    "start_offset": 226,
    "end_offset": 235,
    "type": "word",
    "position": 30
    },
    {
    "token": "style",
    "start_offset": 236,
    "end_offset": 241,
    "type": "word",
    "position": 31
    },
    {
    "token": "sheets",
    "start_offset": 242,
    "end_offset": 248,
    "type": "word",
    "position": 32
    },
    {
    "token": "css",
    "start_offset": 250,
    "end_offset": 253,
    "type": "word",
    "position": 33
    },
    {
    "token": "for",
    "start_offset": 255,
    "end_offset": 258,
    "type": "word",
    "position": 34
    },
    {
    "token": "user",
    "start_offset": 259,
    "end_offset": 263,
    "type": "word",
    "position": 35
    },
    {
    "token": "interface",
    "start_offset": 264,
    "end_offset": 273,
    "type": "word",
    "position": 36
    },
    {
    "token": "uniformity",
    "start_offset": 274,
    "end_offset": 284,
    "type": "word",
    "position": 37
    },
    {
    "token": "throughout",
    "start_offset": 285,
    "end_offset": 295,
    "type": "word",
    "position": 38
    },
    {
    "token": "the",
    "start_offset": 296,
    "end_offset": 299,
    "type": "word",
    "position": 39
    },
    {
    "token": "application",
    "start_offset": 302,
    "end_offset": 313,
    "type": "word",
    "position": 40
    },
    {
    "token": "involved",
    "start_offset": 316,
    "end_offset": 324,
    "type": "word",
    "position": 41
    },
    {
    "token": "in",
    "start_offset": 325,
    "end_offset": 327,
    "type": "word",
    "position": 42
    },
    {
    "token": "programming",
    "start_offset": 328,
    "end_offset": 339,
    "type": "word",
    "position": 43
    },
    {
    "token": "the",
    "start_offset": 340,
    "end_offset": 343,
    "type": "word",
    "position": 44
    },
    {
    "token": "business",
    "start_offset": 344,
    "end_offset": 352,
    "type": "word",
    "position": 45
    },
    {
    "token": "logic",
    "start_offset": 353,
    "end_offset": 358,
    "type": "word",
    "position": 46
    },
    {
    "token": "layer",
    "start_offset": 359,
    "end_offset": 364,
    "type": "word",
    "position": 47
    },
    {
    "token": "and",
    "start_offset": 365,
    "end_offset": 368,
    "type": "word",
    "position": 48
    },
    {
    "token": "data",
    "start_offset": 369,
    "end_offset": 373,
    "type": "word",
    "position": 49
    },
    {
    "token": "access",
    "start_offset": 374,
    "end_offset": 380,
    "type": "word",
    "position": 50
    },
    {
    "token": "layer",
    "start_offset": 381,
    "end_offset": 386,
    "type": "word",
    "position": 51
    },
    {
    "token": "involved",
    "start_offset": 389,
    "end_offset": 397,
    "type": "word",
    "position": 52
    },
    {
    "token": "in",
    "start_offset": 398,
    "end_offset": 400,
    "type": "word",
    "position": 53
    },
    {
    "token": "in",
    "start_offset": 401,
    "end_offset": 403,
    "type": "word",
    "position": 54
    },
    {
    "token": "developing",
    "start_offset": 404,
    "end_offset": 414,
    "type": "word",
    "position": 55
    },
    {
    "token": "pages",
    "start_offset": 415,
    "end_offset": 420,
    "type": "word",
    "position": 56
    },
    {
    "token": "in",
    "start_offset": 421,
    "end_offset": 423,
    "type": "word",
    "position": 57
    },
    {
    "token": "asp.net",
    "start_offset": 424,
    "end_offset": 431,
    "type": "word",
    "position": 58
    },
    {
    "token": "with",
    "start_offset": 432,
    "end_offset": 436,
    "type": "word",
    "position": 59
    },
    {
    "token": "c#.net",
    "start_offset": 437,
    "end_offset": 443,
    "type": "word",
    "position": 60
    }
]
}
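For reference, the tokens above come from the Analyze API; a minimal sketch of the full request form (my_index is a hypothetical index name that already carries the settings above, and a shorter text is used for brevity):

POST /my_index/_analyze
{
    "analyzer": "my_analyzer",
    "text": "CSS, A++, C#.Net, ASP.Net"
}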
I then tried the following mapping:
{
    "properties": {
        "attachment.content": {
            "type": "text",
            "search_analyzer": "my_analyzer",
            "analyzer": "my_analyzer",
            "fields": {
                "content": {
                    "type": "text",
                    "analyzer": "my_analyzer"
                }
            }
        }
    }
}
The highlighting in the response is still:
"highlight": {
    "skills": [
    "<em>C</em>#",
    "Microsoft Visual Studio <em>C</em># (Windows Form and Web APP) and Java Eclipse"
    ]
}
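Note that the highlight above is on the skills field, while the mapping earlier only touched attachment.content, so skills is presumably still going through the standard analyzer, which strips the "#" and leaves only "C" to highlight. A sketch of the likely follow-up, under the assumption that skills should use the same analyzer (index name hypothetical; since the analyzer of an existing text field cannot be changed in place, the index would need to be recreated and the documents reindexed):

PUT /my_index/_mapping
{
    "properties": {
        "skills": {
            "type": "text",
            "analyzer": "my_analyzer"
        }
    }
}

GET /my_index/_search
{
    "query": {
        "match": { "skills": "C#" }
    },
    "highlight": {
        "fields": { "skills": {} }
    }
}

With both the index-time and search-time analysis going through my_analyzer, "C#" lowercases to the single token "c#", so the whole term should be wrapped in the <em> tags instead of just the "C".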

This question (elasticsearch - analyzer to enable matching searches such as C#, C++, A+) is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/63477160/
