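I am fairly new to Elasticsearch, so please bear with me and let me know if I need to provide any additional information. I have inherited a project and need to implement new search functionality. The document/mapping structure is already in place, but it can be changed if it cannot support what I am trying to achieve. I am using Elasticsearch version 5.6.16.
A company can offer a number of services. Each service offering is grouped into a set. Each set is composed of 3 categories: products, processes, and materials.
The document structure looks like this: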
[{
  "id": 4485,
  "name": "Company A",
  // ...
  "services": {
    "595": {
      "1": [95, 97, 91],
      "3": [475, 476, 471],
      "4": [644, 645, 683]
    },
    "596": {
      "1": [91, 89, 76],
      "3": [476, 476, 301],
      "4": [644, 647, 555]
    },
    "597": {
      "1": [92, 93, 89],
      "3": [473, 472, 576],
      "4": [641, 645, 454]
    },
  }
}]
In the example above, 595, 596 and 597 are the IDs of the sets. 1, 3 and 4 are the category IDs (as mentioned above).
The mapping looks like this:
[{
  "id": {
    "type": "long"
  },
  "name": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  },
  "services": {
    "properties": {
      // ...
      "595": {
        "properties": {
          "1": {"type": "long"},
          "3": {"type": "long"},
          "4": {"type": "long"}
        }
      },
      "596": {
        "properties": {
          "1": {"type": "long"},
          "3": {"type": "long"},
          "4": {"type": "long"}
        }
      },
      // ...
    }
  },
}]
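When searching for companies that offer products (ID 1), a search for 91 and 95 would return Company A, because those IDs are within the same set. However, a search for 95 and 76 would not return Company A: the company does offer both of these products, but they are not in the same set. The same rules apply when searching processes and materials, or a combination of these.
I am looking for confirmation that the current document/mapping structure will facilitate this type of search.
Thanks for your help.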
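Best answer
It is a bad idea to have an ID appear as the name of a field, because that can lead to the creation of a very large number of inverted indexes (remember that in Elasticsearch an inverted index is created for every field), and I do not think it is reasonable to have something like that.
Instead, change your data model to something like the one below. I have also provided a sample document, a possible query you can apply, and how the response would appear.
Note that, for the sake of simplicity, I am focusing only on the services field mentioned in your mapping.
Mapping: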
PUT my_services_index
{
  "mappings": {
    "properties": {
      "services": {
        "type": "nested", // Note this
        "properties": {
          "service_key": {
            "type": "keyword" // keyword is used here; feel free to use text and keyword if you plan to implement partial + exact search.
          },
          "product_key": {
            "type": "keyword"
          },
          "product_values": {
            "type": "keyword"
          },
          "process_key": {
            "type": "keyword"
          },
          "process_values": {
            "type": "keyword"
          },
          "material_key": {
            "type": "keyword"
          },
          "material_values": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
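The mapping above declares services as nested rather than as a plain object. As a rough illustration of why that matters for the "same set" rule, here is a small pure-Python simulation (it does not talk to Elasticsearch). The two function names are hypothetical, and the data is a trimmed copy of the sample document used in this answer. With an object field the values of all sets are effectively pooled per field, so a search for two IDs that live in different sets would still match; with nested, each set is matched on its own.

```python
from typing import Dict, Iterable, List

Service = Dict[str, List[str]]


def matches_as_object(services: List[Service], field: str, wanted: Iterable[str]) -> bool:
    """Object-style matching: the values of `field` from every set are pooled."""
    pooled = {value for service in services for value in service.get(field, [])}
    return all(value in pooled for value in wanted)


def matches_as_nested(services: List[Service], field: str, wanted: Iterable[str]) -> bool:
    """Nested-style matching: all wanted values must sit in one and the same set."""
    wanted = list(wanted)
    return any(
        all(value in service.get(field, []) for value in wanted)
        for service in services
    )


# Trimmed copy of the sample document from this answer (process values only).
company_a = [
    {"service_key": "595", "process_values": ["95", "97", "91"]},
    {"service_key": "596", "process_values": ["91", "89", "75"]},
]

# "95" is in set 595 and "89" is in set 596: different sets.
print(matches_as_object(company_a, "process_values", ["95", "89"]))  # True
print(matches_as_nested(company_a, "process_values", ["95", "89"]))  # False
# "95" and "91" are both in set 595.
print(matches_as_nested(company_a, "process_values", ["95", "91"]))  # True
```

This is only a model of the matching behaviour, not of how Elasticsearch stores the data, but it shows the false positive that the nested type is meant to avoid.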
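Notice that I have used the nested datatype. I would suggest going through its documentation to understand why it is needed here instead of the plain object type.
Sample document: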
POST my_services_index/_doc/1
{
  "services": [
    {
      "service_key": "595",
      "process_key": "1",
      "process_values": ["95", "97", "91"],
      "product_key": "3",
      "product_values": ["475", "476", "471"],
      "material_key": "4",
      "material_values": ["644", "645", "643"]
    },
    {
      "service_key": "596",
      "process_key": "1",
      "process_values": ["91", "89", "75"],
      "product_key": "3",
      "product_values": ["476", "476", "301"],
      "material_key": "4",
      "material_values": ["644", "647", "555"]
    }
  ]
}
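Notice how the data can now be managed if it ends up having multiple combinations of product_key, process_key and material_key. The way to interpret the document above is that a single document in my_services_index contains two nested documents.
Sample query: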
POST my_services_index/_search
{
  "_source": "services.service_key",
  "query": {
    "bool": {
      "must": [
        {
          "nested": { // Note this
            "path": "services",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "services.service_key": "595"
                    }
                  },
                  {
                    "term": {
                      "services.process_key": "1"
                    }
                  },
                  {
                    "term": {
                      "services.process_values": "95"
                    }
                  }
                ]
              }
            },
            "inner_hits": {} // Note this
          }
        }
      ]
    }
  }
}
Note that I have used a nested query.
Response:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.828546,
    "hits" : [ // Note this: the original document is returned here.
      {
        "_index" : "my_services_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.828546,
        "_source" : {
          "services" : [
            {
              "service_key" : "595",
              "process_key" : "1",
              "process_values" : ["95", "97", "91"],
              "product_key" : "3",
              "product_values" : ["475", "476", "471"],
              "material_key" : "4",
              "material_values" : ["644", "645", "643"]
            },
            {
              "service_key" : "596",
              "process_key" : "1",
              "process_values" : ["91", "89", "75"],
              "product_key" : "3",
              "product_values" : ["476", "476", "301"],
              "material_key" : "4",
              "material_values" : ["644", "647", "555"]
            }
          ]
        },
        "inner_hits" : { // Note this: it tells you which inner document was a hit.
          "services" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 1.828546,
              "hits" : [
                {
                  "_index" : "my_services_index",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "services",
                    "offset" : 0
                  },
                  "_score" : 1.828546,
                  "_source" : {
                    "service_key" : "595",
                    "process_key" : "1",
                    "process_values" : ["95", "97", "91"],
                    "product_key" : "3",
                    "product_values" : ["475", "476", "471"],
                    "material_key" : "4",
                    "material_values" : ["644", "645", "643"]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
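The sample query pins a single value inside one known set. For the requirement in the question (several IDs that must all belong to the same set, whichever set that is), a minimal sketch could build one nested query with one term clause per ID, since every clause inside a nested query is evaluated against the same nested document. The helper names same_set_query and matched_service_keys are hypothetical, and the code only builds and reads plain dictionaries; sending the body to the cluster is left to whatever client you use.

```python
from typing import Any, Dict, List


def same_set_query(field: str, values: List[str], path: str = "services") -> Dict[str, Any]:
    """Build a search body that matches documents where ONE nested set holds every value."""
    clauses = [{"term": {f"{path}.{field}": value}} for value in values]
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"bool": {"must": clauses}},
                "inner_hits": {},  # ask for the matching nested sets as well
            }
        }
    }


def matched_service_keys(response: Dict[str, Any], path: str = "services") -> Dict[str, List[str]]:
    """Map each hit's _id to the service_key of every nested set that matched."""
    result: Dict[str, List[str]] = {}
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        result[hit["_id"]] = [h["_source"]["service_key"] for h in inner]
    return result


# Example: both process IDs must be present in the same set.
body = same_set_query("process_values", ["91", "95"])
```

Applied to the response shown above, matched_service_keys would report set 595 for document 1. Note that the response above has the shape I would expect from a recent Elasticsearch version; on 5.6 I believe hits.total is a plain number and the mapping request needs a type name under mappings, so please adjust the requests for your version.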
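Note that I have used the keyword datatype. Feel free to use whichever datatype suits your business requirements for each field. The idea I have provided is only meant to help you understand the document model.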
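If you do change the model, the existing documents have to be reshaped. As a sketch only, the function below converts the services object from the question into the list of nested entries used in this answer. The function name to_nested_services and the CATEGORY_NAMES table are hypothetical; the table follows the sample document in this answer (1 as process, 3 as product, 4 as material), which is an assumption you should swap for your real category meanings.

```python
from typing import Any, Dict, List

# Assumption: category ID -> field prefix, as used in this answer's sample document.
CATEGORY_NAMES = {"1": "process", "3": "product", "4": "material"}


def to_nested_services(
    services: Dict[str, Dict[str, List[int]]],
    category_names: Dict[str, str] = CATEGORY_NAMES,
) -> List[Dict[str, Any]]:
    """Turn {set_id: {category_id: [ids]}} into a list of nested service entries."""
    nested: List[Dict[str, Any]] = []
    for service_key, categories in services.items():
        entry: Dict[str, Any] = {"service_key": str(service_key)}
        for category_id, values in categories.items():
            name = category_names[str(category_id)]  # KeyError on an unknown category
            entry[f"{name}_key"] = str(category_id)
            # keyword fields hold strings, so the numeric IDs are stringified
            entry[f"{name}_values"] = [str(value) for value in values]
        nested.append(entry)
    return nested


# Example with one set from the question's document.
old_services = {"595": {"1": [95, 97, 91], "3": [475, 476, 471], "4": [644, 645, 683]}}
new_services = to_nested_services(old_services)
```

The resulting list can be used as the value of the services field when reindexing each company document.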
Hope this helps!
Regarding "json - Elasticsearch query with nested sets", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60913318/