elasticsearch - Elasticsearch: full-text search and filtering by nested arrays of objects

Tags: elasticsearch elasticsearch-dsl elasticsearch-query

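I have a task to build a GUI table from data in PostgreSQL that is assembled from N joined tables.
The GUI table needs sorting and filtering, including full-text search.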

I want to use Elasticsearch for this. I have prepared the following data structure for Elasticsearch:

{
  did_user_read: true,
  view_info: {
      total: 1,
      users: [
          { name: 'John Smith', read_at: '2020-02-04 11:00:01', is_current_user: false },
          { name: 'Samuel Jackson', read_at: '2020-02-04 11:00:01', is_current_user: true },
      ],
  },
  is_favorite: true,
  has_attachments: true,
  from: { 
      short_name: 'You',  
      full_name: 'Chuck Norris',
      email: 'ch.norris@example.com', 
      is_current_user: true 
  },
  subject: 'The secret of the appearance of navel lints',
  received_at: '2020-02-04 11:00:01'
}

How should I index this structure so that I can filter and search by nested objects and by arrays of nested objects?

For example, I want to get all records that match the following conditions:
is_favorite IS false

AND

FULL_TEXT_SEARCH("sam jackson") 
   BY FIELDS 
    users.name,        -- inside of array(!) 
    from.full_name,
    from.short_name

AND

users.is_current_user IS NOT false

AND

ORDER BY received_at DESC

Best answer

For the data structure above, your Elasticsearch index mapping could look like this:

Mapping

{
    "mappings": {
        "properties": {
            "did_user_read": {
                "type": "boolean"
            },
            "view_info": {
                "properties": {
                    "total": {
                        "type": "integer"
                    },
                    "users": {
                        "properties": {
                            "name": {
                                "type": "text"
                            },
                            "read_at": {
                                "type": "date",
                                "format": "date_hour_minute_second"
                            },
                            "is_current_user": {
                                "type": "boolean"
                            }
                        }
                    }
                }
            },
            "is_favorite": {
                "type": "boolean"
            },
            "has_attachments": {
                "type": "boolean"
            },
            "from": {
                "properties": {
                    "short_name": {
                        "type": "text"
                    },
                    "full_name": {
                        "type": "text"
                    },
                    "email": {
                        "type": "keyword"
                    },
                    "is_current_user": {
                        "type": "boolean"
                    }
                }
            },
            "subject": {
                "type": "text"
            },
            "received_at": {
                "type": "date",
                "format": "date_hour_minute_second"
            }
        }
    }
}
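The mapping above leaves view_info.users as a regular object field. Elasticsearch flattens arrays of objects, so each sub-field becomes one list of values for the whole document and the link between a user's name and that same user's is_current_user is lost: a name match plus is_current_user: true also matches a document where Samuel Jackson is not the current user but someone else is. If conditions must hold for the same array element, one option (a sketch, not part of the original answer) is to declare users with "type": "nested" and query it with a nested query. The Python below only builds the request bodies as dicts; nested_users_mapping and current_user_named are hypothetical helpers, only the view_info part of the mapping is shown, and changing an existing field from object to nested requires creating a new index and reindexing.

```python
import copy
import json

# users sub-mapping as in the answer above (plain object type).
BASE_USERS_MAPPING = {
    "properties": {
        "name": {"type": "text"},
        "read_at": {"type": "date", "format": "date_hour_minute_second"},
        "is_current_user": {"type": "boolean"},
    }
}


def nested_users_mapping():
    """Return the view_info part of the mapping with users declared as nested."""
    users = copy.deepcopy(BASE_USERS_MAPPING)
    # "nested" indexes every array element as its own hidden document,
    # so all conditions inside one nested query must match the same element.
    users["type"] = "nested"
    return {
        "mappings": {
            "properties": {
                "view_info": {
                    "properties": {
                        "total": {"type": "integer"},
                        "users": users,
                    }
                }
            }
        }
    }


def current_user_named(text):
    """Nested query: one users element matches `text` AND is the current user."""
    return {
        "nested": {
            "path": "view_info.users",
            "query": {
                "bool": {
                    "must": [{"match": {"view_info.users.name": text}}],
                    "filter": [{"term": {"view_info.users.is_current_user": True}}],
                }
            },
        }
    }


print(json.dumps(nested_users_mapping(), indent=2))
print(json.dumps(current_user_named("sam jackson"), indent=2))
```

The nested clause can be placed inside the must or filter array of the bool query in place of the separate term filter on view_info.users.is_current_user.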

I then indexed a few documents in the same format as in your example.
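A few documents can be indexed in one request with the _bulk API, whose body is newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. A minimal standard-library sketch that builds such a body; bulk_body is a hypothetical helper, and the sample document is the one returned as _id 2 in the output further down. The resulting string would be sent as POST /_bulk with Content-Type: application/x-ndjson.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON body for the _bulk API from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: index (create or overwrite) this document under the given id.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Document from the question, with dates already in yyyy-MM-ddTHH:mm:ss form.
sample = {
    "did_user_read": True,
    "view_info": {
        "total": 1,
        "users": [
            {"name": "John Smith", "read_at": "2020-02-04T11:00:01", "is_current_user": False},
            {"name": "Samuel Jackson", "read_at": "2020-02-04T11:00:01", "is_current_user": True},
        ],
    },
    "is_favorite": False,
    "has_attachments": True,
    "from": {
        "short_name": "You",
        "full_name": "Chuck Norris",
        "email": "ch.norris@example.com",
        "is_current_user": True,
    },
    "subject": "The secret of the appearance of navel lints",
    "received_at": "2020-02-04T11:00:01",
}

body = bulk_body("topics", [(2, sample)])
print(body)
```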

The search query for the requested conditions is:

Search query:
{
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "is_favorite": false
                    }
                },
                {
                    "term": {
                        "view_info.users.is_current_user": true  
                    }
                }
            ],
            "must": {
                "multi_match": {
                    "query": "sam jackson",
                    "fields": [
                        "view_info.users.name",
                        "from.full_name",
                        "from.short_name"
                    ]
                }
            }


        }

    },
    "sort": [
    {
      "received_at": {
        "order": "desc"
      }
    }
  ]
}
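If the GUI passes the search text and the is_favorite flag as parameters, the same body can be generated in code. A minimal sketch: build_topics_query is a hypothetical helper that returns exactly the JSON above with those two values substituted. Because the results are sorted by received_at, Elasticsearch does not compute relevance scores by default (track_scores is false), which is why the hits show "_score": null.

```python
import json


def build_topics_query(text, is_favorite=False):
    """Return the search body above with the search text and flag as parameters."""
    return {
        "query": {
            "bool": {
                # Filter context: yes/no conditions, not scored.
                "filter": [
                    {"term": {"is_favorite": is_favorite}},
                    {"term": {"view_info.users.is_current_user": True}},
                ],
                # Query context: full-text match over the three name fields.
                "must": {
                    "multi_match": {
                        "query": text,
                        "fields": [
                            "view_info.users.name",
                            "from.full_name",
                            "from.short_name",
                        ],
                    }
                },
            }
        },
        # Newest first, like ORDER BY received_at DESC.
        "sort": [{"received_at": {"order": "desc"}}],
    }


print(json.dumps(build_topics_query("sam jackson"), indent=2))
```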

Output
"hits": [
      {
        "_index": "topics",
        "_type": "_doc",
        "_id": "3",
        "_score": null,
        "_source": {
          "did_user_read": true,
          "view_info": {
            "total": 1,
            "users": [
              {
                "name": "John Smith",
                "read_at": "2020-02-04T11:00:01",
                "is_current_user": false
              },
              {
                "name": "Samuel Jackson",
                "read_at": "2020-02-04T11:00:01",
                "is_current_user": true
              }
            ]
          },
          "is_favorite": false,
          "has_attachments": true,
          "from": {
            "short_name": "You",
            "full_name": "Chuck Norris",
            "email": "ch.norris@example.com",
            "is_current_user": true
          },
          "subject": "The secret of the appearance of navel lints",
          "received_at": "2020-02-04T11:00:03"
        },
        "sort": [
          1580814003000
        ]
      },
      {
        "_index": "topics",
        "_type": "_doc",
        "_id": "2",
        "_score": null,
        "_source": {
          "did_user_read": true,
          "view_info": {
            "total": 1,
            "users": [
              {
                "name": "John Smith",
                "read_at": "2020-02-04T11:00:01",
                "is_current_user": false
              },
              {
                "name": "Samuel Jackson",
                "read_at": "2020-02-04T11:00:01",
                "is_current_user": true
              }
            ]
          },
          "is_favorite": false,
          "has_attachments": true,
          "from": {
            "short_name": "You",
            "full_name": "Chuck Norris",
            "email": "ch.norris@example.com",
            "is_current_user": true
          },
          "subject": "The secret of the appearance of navel lints",
          "received_at": "2020-02-04T11:00:01"
        },
        "sort": [
          1580814001000
        ]
      }
    ]
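The sort values in the hits are the received_at dates as epoch milliseconds; date values without a time zone are treated as UTC. A small standard-library check converts them back and confirms the descending order: document 3 (11:00:03) comes before document 2 (11:00:01). sort_value_to_utc is a hypothetical helper.

```python
from datetime import datetime, timezone


def sort_value_to_utc(millis):
    """Convert a date sort value (epoch milliseconds) to an ISO-8601 UTC string."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat()


# Sort values of the two hits above, in the order they were returned.
sort_values = [1580814003000, 1580814001000]
for value in sort_values:
    print(value, sort_value_to_utc(value))
```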

Explanation:

This is how the search query is built from your requirements:
  • is_favorite IS false and users.is_current_user IS NOT false
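    This is done with filter clauses. Filters are used when documents must satisfy a condition but the condition should not contribute to the relevance score. Both conditions here are yes/no checks on boolean fields, so there is nothing meaningful to score, and running them in filter context also lets Elasticsearch cache them.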
  • FULL_TEXT_SEARCH("sam jackson") BY FIELDS users.name, -- inside of array(!) from.full_name, from.short_name
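    Here we search for "sam jackson" across all 3 fields, so a multi_match query is used. With its default type (best_fields), a document matches if any of the listed fields matches. Note that with the standard analyzer "sam" does not match the token "samuel", so the sample documents match on "jackson".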

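  • All three conditions sit in one bool query: the two filter clauses plus the multi_match in must. Every clause in filter and must has to match, which gives the AND between them.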
  • ORDER BY received_at DESC
    For this, the sort parameter is used on received_at with "order": "desc".

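  • Note: you need to change the datetime values such as read_at and received_at. Right now they look like 2020-02-04 11:00:01; change them to 2020-02-04T11:00:01 (a T instead of the space) before indexing, because the date_hour_minute_second format used in the mapping above expects yyyy-MM-dd'T'HH:mm:ss. Alternatively, keep the values as they are and declare a custom format in the mapping, e.g. "format": "yyyy-MM-dd HH:mm:ss". The built-in and custom date formats are described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html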
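If the timestamps come out of PostgreSQL as text like 2020-02-04 11:00:01, one way to follow the note above is to convert them on the client before indexing. A minimal standard-library sketch; to_es_datetime and normalize_dates are hypothetical helpers that assume the document shape from the question.

```python
import copy
from datetime import datetime

PG_FORMAT = "%Y-%m-%d %H:%M:%S"  # e.g. 2020-02-04 11:00:01, as in the question
ES_FORMAT = "%Y-%m-%dT%H:%M:%S"  # matches the date_hour_minute_second mapping format


def to_es_datetime(value):
    """Convert 'YYYY-MM-DD HH:MM:SS' into 'YYYY-MM-DDTHH:MM:SS'."""
    return datetime.strptime(value, PG_FORMAT).strftime(ES_FORMAT)


def normalize_dates(doc):
    """Return a copy of a document with received_at and every users[].read_at converted."""
    out = copy.deepcopy(doc)
    out["received_at"] = to_es_datetime(out["received_at"])
    for user in out.get("view_info", {}).get("users", []):
        user["read_at"] = to_es_datetime(user["read_at"])
    return out


print(normalize_dates({
    "received_at": "2020-02-04 11:00:01",
    "view_info": {"users": [{"name": "John Smith", "read_at": "2020-02-04 11:00:01"}]},
}))
```

The same conversion could instead be done in the SQL that reads the data, or skipped entirely by declaring a custom date format in the mapping.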

A similar question about "elasticsearch - Elasticsearch: full-text search and filtering by nested arrays of objects" can be found on Stack Overflow: https://stackoverflow.com/questions/61142483/
