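I am trying to search with a string that contains several comma-separated strings. [A term may not match the whole field value; it can be a partial value, as long as the passed term occurs somewhere in the text.]
Note: I have also tried n-grams, but they do not return the correct data.
(For example, the search term "data science" returns everything matching "data", "science", or "data science".)
Document in ES: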
{
"_index": "questions_dev",
"_type": "_doc",
"_id": "188",
"_score": 6.6311107,
"_source": {
"questionId": 188,
"questionText": "What other social media platforms do you use on your own time?",
"domainId": 2,
"subdomainId": 25,
"type": "TEXT",
"difficulty": 1,
"time": 600,
"domain": "Domain Specific",
"subdomain": "Social Media Specialist",
"skill": ["social media"]
}
}
What I have done so far:
Index:
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"default": {
"tokenizer": "custom_tokenizer",
"filter": ["lowercase"]
}
},
"tokenizer": {
"custom_tokenizer": {
"type": "pattern",
"pattern": ","
}
}
}
},
"mappings": {
"properties": {
"questionId": {
"type": "long"
},
"questionText": {
"type": "text"
},
"domain": {
"type": "text"
},
"subdomain": {
"type": "text"
},
"type":{
"type": "keyword"
},
"difficulty":{
"type": "keyword"
},
"totaltime":{
"type": "keyword"
},
"domainId":{
"type": "keyword"
},
"subdomainId":{
"type": "keyword"
}
}
}
}
Query:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"fields": ["questionText","skill"],
"query": "social media"
}
}
]
}
}
}
Output:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
Expected output:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 6.6311107,
"hits": [
{
"_index": "questions_development",
"_type": "_doc",
"_id": "188",
"_score": 6.6311107,
"_source": {
"questionId": 188,
"questionText": "What other social media platforms do you use on your own time?",
"domainId": 2,
"subdomainId": 25,
"type": "TEXT",
"difficulty": 1,
"time": 600,
"domain": "Domain Specific",
"subdomain": "Social Media Specialist",
"skill": []
}
}
]
}
}
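Goal: search with a string and get back every document that contains it.
Example:
If I search with "social media", the document above should be returned. (In my case, it is not.)
The search should also support comma-separated input.
That is, I can pass "social media, own time" and expect the output to contain the documents whose questionText includes any of these strings.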
Best answer
The data you are indexing, social media, own time, contains a space between the comma and own time. Because the tokenizer splits only on commas, that space stays attached to the token that follows it, so your previous mapping generates these tokens:
{
"tokens": [
{
"token": "social media",
"start_offset": 0,
"end_offset": 12,
"type": "word",
"position": 0
},
{
"token": " own time", <-- note the preceding whitespace here
"start_offset": 13,
"end_offset": 22,
"type": "word",
"position": 1
}
]
}
So a search with "query": "own time" (no leading space) returns no results, whereas "query": " own time" (with the leading space) does. To remove leading and trailing whitespace from every token in the stream, use the Trim token filter.
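The effect of the trim filter can be approximated outside Elasticsearch. The following is a rough Python sketch, not Elasticsearch code: the function name analyze and its trim flag are made up for illustration, and str.split, str.lower, and str.strip merely stand in for the pattern tokenizer, the lowercase filter, and the trim filter.

```python
def analyze(text, trim=False):
    """Approximate a comma pattern tokenizer + lowercase (+ optional trim)."""
    # Pattern tokenizer stand-in: split on commas only, skip empty pieces.
    tokens = [piece for piece in text.split(",") if piece]
    # Lowercase filter stand-in.
    tokens = [token.lower() for token in tokens]
    if trim:
        # Trim filter stand-in: drop leading and trailing whitespace.
        tokens = [token.strip() for token in tokens]
    return tokens


without_trim = analyze("social media, own time")
with_trim = analyze("social media, own time", trim=True)

print(without_trim)  # ['social media', ' own time']
print(with_trim)     # ['social media', 'own time']
```

Without trimming, the query term "own time" has no identical token to match; with trimming it does.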
Here is a working example with index data, mapping, and search query.
Index mapping:
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"default": {
"tokenizer": "custom_tokenizer",
"filter": [
"lowercase",
"trim" <-- note this
]
}
},
"tokenizer": {
"custom_tokenizer": {
"type": "pattern",
"pattern": ","
}
}
}
},
"mappings": {
"properties": {
"questionText": {
"type": "text"
}
}
}
}
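One way to check which tokens such a chain produces is the _analyze API, which accepts an inline tokenizer definition and a filter list. The sketch below only builds and prints a request body for it; the helper name analyze_request is hypothetical, and sending the body to a cluster (for example as a POST to /_analyze) is left out because no cluster URL or client is given in this thread.

```python
import json


def analyze_request(text, filters):
    """Build an _analyze request body for a comma pattern tokenizer."""
    return {
        # Inline tokenizer definition: split on commas only.
        "tokenizer": {"type": "pattern", "pattern": ","},
        # Token filters are applied in the order listed.
        "filter": filters,
        "text": text,
    }


body = analyze_request("social media, own time", ["lowercase", "trim"])
print(json.dumps(body, indent=2))
```

Comparing the response for ["lowercase"] with the one for ["lowercase", "trim"] should show whether the leading space is still present in the second token.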
Index data:
{ "questionText": "social media" }
{ "questionText": "social media, own time" }
Search query:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"fields": [
"questionText"
],
"query": "own time" <-- no leading whitespace here
}
}
]
}
}
}
Search result:
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "2",
"_score": 0.60996956,
"_source": {
"questionText": "social media, own time"
}
}
]
Update 1:
Index settings:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "pattern",
"pattern": ","
}
}
}
}
}
Index data:
{
"questionText": "What other platforms do you use on your ?"
}
{
"questionText": "What other social time platforms do you use on your?"
}
{
"questionText": "What other social media platforms do you use on your?"
}
{
"questionText": "What other platforms do you use on your own time?"
}
Search query:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"fields": "questionText",
"query": "social media, own time"
}
}
]
}
}
}
Search result:
"hits": [
{
"_index": "my-index3",
"_type": "_doc",
"_id": "1",
"_score": 2.5628972,
"_source": {
"questionText": "What other social media platforms do you use on your own time?"
}
},
{
"_index": "my-index3",
"_type": "_doc",
"_id": "2",
"_score": 1.3862944,
"_source": {
"questionText": "What other social media platforms do you use on your?"
}
},
{
"_index": "my-index3",
"_type": "_doc",
"_id": "3",
"_score": 1.3862944,
"_source": {
"questionText": "What other platforms do you use on your own time?"
}
}
]
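If the aim is "any of the comma-separated phrases, matched as a phrase", one possible alternative (not the approach shown in the answer above) is to split the input on the client side and send one match_phrase clause per term. The sketch below is a minimal illustration under assumptions: the helper name build_query is hypothetical, and questionText is assumed to be analyzed with the standard analyzer, which would be the case if the field was mapped dynamically, since the settings above define my_analyzer but no mapping refers to it.

```python
import json


def build_query(search, field="questionText"):
    """Turn 'a, b' into a bool query that matches phrase a OR phrase b."""
    # Split on commas and drop surrounding whitespace and empty terms.
    phrases = [part.strip() for part in search.split(",") if part.strip()]
    return {
        "query": {
            "bool": {
                # One phrase clause per comma-separated term.
                "should": [{"match_phrase": {field: phrase}} for phrase in phrases],
                # At least one of the phrases has to match.
                "minimum_should_match": 1,
            }
        }
    }


query = build_query("social media, own time")
print(json.dumps(query, indent=2))
```

Unlike a single multi_match over the whole string, a phrase clause requires the words of each term to appear together and in order, so "social media" would not match a text that contains only "social".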
Source: a similar question, about searching with comma-separated strings in Elasticsearch queries and indices, on Stack Overflow: https://stackoverflow.com/questions/63056802/