json - Group ElasticSearch documents by an XML tag value (inside a string field)

Tags: json xml elasticsearch aggregate

My ElasticSearch index contains documents like these:

{
    "took" : 31,
    "timed_out" : false,
    "_shards" : {
        "total" : 68,
        "successful" : 68,
        "failed" : 0
    },
    "hits" : {
        "total" : 9103,
        "max_score" : 8.823501,
        "hits" : [{
                "_index" : "ESB",
                "_type" : "MDOrderFO",
                "_id" : "AVaxDzEGBclOg4W8YiW1",
                "_score" : 8.823501,
                "_source" : {
                    "message" : "<root><flux>MyFlux</flux><requestId>123</requestId><timeStamp>2016-26-08T09:37:17</timeStamp><step>1</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                    "timestamp" : "2016-08-22T07:02:57.085Z",
                    "logger_name" : "MDOrderFOToFO"
                }
            }, {
                "_index" : "ESB",
                "_type" : "MDOrderFO",
                "_id" : "AVaxDzEGBclOg4W8YiW1",
                "_score" : 8.823501,
                "_source" : {
                    "message" : "<root><flux>MyFlux</flux><requestId>123</requestId><timeStamp>2016-26-08T09:37:17</timeStamp><step>2</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                    "timestamp" : "2016-08-22T07:02:57.085Z",
                    "logger_name" : "MDOrderFOToFO"
                }
            }, {
                "_index" : "ESB",
                "_type" : "MDOrderFO",
                "_id" : "AVaxDzEGBclOg4W8YiW1",
                "_score" : 8.823501,
                "_source" : {
                    "message" : "<root><flux>MyFlux</flux><requestId>123</requestId><timeStamp>2016-26-08T09:37:18</timeStamp><step>3</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                    "timestamp" : "2016-08-22T07:02:57.085Z",
                    "logger_name" : "MDOrderFOToFO"
                }
            }, {
                "_index" : "ESB",
                "_type" : "MDOrderFO",
                "_id" : "AVaxDzEGBclOg4W8YiW1",
                "_score" : 8.823501,
                "_source" : {
                    "message" : "<root><flux>MyFlux</flux><requestId>123</requestId><timeStamp>2016-26-08T09:37:26</timeStamp><step>1</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                    "timestamp" : "2016-08-22T07:02:57.085Z",
                    "logger_name" : "MDOrderFOToFO"
                }
            }, {
                "_index" : "ESB",
                "_type" : "MDOrderFO",
                "_id" : "AVaxDzEGBclOg4W8YiW1",
                "_score" : 8.823501,
                "_source" : {
                    "message" : "<root><flux>MyFlux</flux><requestId>456</requestId><timeStamp>2016-26-08T09:37:27</timeStamp><step>2</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                    "timestamp" : "2016-08-22T07:02:57.085Z",
                    "logger_name" : "MDOrderFOToFO"
                }
            }, {
                "_index" : "ESB",
                "_type" : "MDOrderFO",
                "_id" : "AVaxDzEGBclOg4W8YiW1",
                "_score" : 8.823501,
                "_source" : {
                    "message" : "<root><flux>MyFlux</flux><requestId>456</requestId><timeStamp>2016-26-08T09:37:27</timeStamp><step>3</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                    "timestamp" : "2016-08-22T07:02:57.085Z",
                    "logger_name" : "MDOrderFOToFO"
                }
            }, {
                "_index" : "ESB",
                "_type" : "MDOrderFO",
                "_id" : "AVaxDzEGBclOg4W8YiW1",
                "_score" : 8.823501,
                "_source" : {
                    "message" : "<root><flux>MyFlux</flux><requestId>456</requestId><timeStamp>2016-26-08T09:37:17</timeStamp><step>2</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                    "timestamp" : "2016-08-22T07:02:57.085Z",
                    "logger_name" : "MDOrderFOToFO"
                }
            }
        ]
    }
}

This is the XML format of the message field:
<root>
    <flux>MyFlux</flux>
    <requestId>123</requestId>
    <timeStamp>2016-26-08T09:37:17</timeStamp>
    <step>2</step>
    <status>ok</status>
    <body><xml><myobject><field1>value1</field1></myobject></xml></body>
</root>

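I would like to build a query that groups my documents by the requestId value (located in the XML content of the message field).
I am hoping for a response like this: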
{
    "took" : 31,
    "timed_out" : false,
    "_shards" : {
        "total" : 68,
        "successful" : 68,
        "failed" : 0
    },
    "hits" : {
        "total" : 9103,
        "max_score" : 8.823501,
        "hits" : [...],
        "aggregations" : {
            "myaggs" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [{
                        "key" : "123",
                        "documents" : [{
                                "_index" : "ESB",
                                "_type" : "MDOrderFO",
                                "_id" : "AVaxDzEGBclOg4W8YiW1",
                                "_score" : 8.823501,
                                "_source" : {
                                    "message" : "<root><flux>MyFlux</flux><requestId>123</requestId><timeStamp>2016-26-08T09:37:17</timeStamp><step>1</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                                    "timestamp" : "2016-08-22T07:02:57.085Z",
                                    "logger_name" : "MDOrderFOToFO"
                                }
                            }, {
                                "_index" : "ESB",
                                "_type" : "MDOrderFO",
                                "_id" : "AVaxDzEGBclOg4W8YiW1",
                                "_score" : 8.823501,
                                "_source" : {
                                    "message" : "<root><flux>MyFlux</flux><requestId>123</requestId><timeStamp>2016-26-08T09:37:17</timeStamp><step>2</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                                    "timestamp" : "2016-08-22T07:02:57.085Z",
                                    "logger_name" : "MDOrderFOToFO"
                                }
                            }, {
                                "_index" : "ESB",
                                "_type" : "MDOrderFO",
                                "_id" : "AVaxDzEGBclOg4W8YiW1",
                                "_score" : 8.823501,
                                "_source" : {
                                    "message" : "<root><flux>MyFlux</flux><requestId>123</requestId><timeStamp>2016-26-08T09:37:18</timeStamp><step>3</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                                    "timestamp" : "2016-08-22T07:02:57.085Z",
                                    "logger_name" : "MDOrderFOToFO"
                                }
                            }
                        ]
                    }, {
                        "key" : "456",
                        "documents" : [{
                                "_index" : "ESB",
                                "_type" : "MDOrderFO",
                                "_id" : "AVaxDzEGBclOg4W8YiW1",
                                "_score" : 8.823501,
                                "_source" : {
                                    "message" : "<root><flux>MyFlux</flux><requestId>123</requestId><timeStamp>2016-26-08T09:37:26</timeStamp><step>1</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                                    "timestamp" : "2016-08-22T07:02:57.085Z",
                                    "logger_name" : "MDOrderFOToFO"
                                }
                            }, {
                                "_index" : "ESB",
                                "_type" : "MDOrderFO",
                                "_id" : "AVaxDzEGBclOg4W8YiW1",
                                "_score" : 8.823501,
                                "_source" : {
                                    "message" : "<root><flux>MyFlux</flux><requestId>456</requestId><timeStamp>2016-26-08T09:37:27</timeStamp><step>2</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                                    "timestamp" : "2016-08-22T07:02:57.085Z",
                                    "logger_name" : "MDOrderFOToFO"
                                }
                            }, {
                                "_index" : "ESB",
                                "_type" : "MDOrderFO",
                                "_id" : "AVaxDzEGBclOg4W8YiW1",
                                "_score" : 8.823501,
                                "_source" : {
                                    "message" : "<root><flux>MyFlux</flux><requestId>456</requestId><timeStamp>2016-26-08T09:37:27</timeStamp><step>3</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                                    "timestamp" : "2016-08-22T07:02:57.085Z",
                                    "logger_name" : "MDOrderFOToFO"
                                }
                            }, {
                                "_index" : "ESB",
                                "_type" : "MDOrderFO",
                                "_id" : "AVaxDzEGBclOg4W8YiW1",
                                "_score" : 8.823501,
                                "_source" : {
                                    "message" : "<root><flux>MyFlux</flux><requestId>456</requestId><timeStamp>2016-26-08T09:37:17</timeStamp><step>2</step><status>ok</status><body><xml><myobject><field1>value1</field1></myobject></xml></body></root>",
                                    "timestamp" : "2016-08-22T07:02:57.085Z",
                                    "logger_name" : "MDOrderFOToFO"
                                }
                            }
                        ]
                    }
                ]
            }
        }
    }
}

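I am new to ElasticSearch and have spent a week on this... At the moment, I do not even know whether it is possible.

I really hope you can help me.
Thanks in advance.

PS: I am a native French speaker, so sorry for my English.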

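EDIT
- Unfortunately, I cannot edit the mapping. I do not have access to the part of the process that saves the logs to E.S.
- In fact, the format I gave is quite simplified compared to reality. A lot of other technical information is logged, both at the mapping level and in the XML content.
Context: the BUS application that pushes the logs to E.S. has 3 steps (1: receive, 2: route, 3: send). It logs information about the status of the request (ok, failed) and about the object being transferred in that request.
The purpose of the application I am working on is to display business information about all the requests (of the BUS application) transferred within a date range.
So, in my query, I would like to:
1. Aggregate my logs by requestId (each group should contain 1 log for the receive step, 0 or 1 log for the route step, and 0 or 1 log for the send step)
2. Filter the resulting groups by the date of the log at the receive step
3. Take the first 10 groups, sorted by date in descending order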
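
To make the three requirements above concrete, here is a minimal client-side sketch in Python. It is not an ElasticSearch query: it only illustrates the intended grouping on hits that have already been fetched. The function names, the `received_at` key, and the `start`/`end`/`size` parameters are hypothetical. The timestamp format is an assumption read off the sample data (it looks like year-day-month), and taking the earliest receive log when a group has several is also an assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Assumption: sample values such as "2016-26-08T09:37:17" are year-day-month.
TS_FORMAT = "%Y-%d-%mT%H:%M:%S"
RECEIVE_STEP = "1"  # 1: receive, 2: route, 3: send


def parse_message(message):
    """Extract the fields needed for grouping from one XML log message."""
    root = ET.fromstring(message)
    return {
        "requestId": root.findtext("requestId"),
        "step": root.findtext("step"),
        "timeStamp": datetime.strptime(root.findtext("timeStamp"), TS_FORMAT),
    }


def group_requests(hits, start, end, size=10):
    """Group hits by requestId, keep the groups whose receive log is dated
    within [start, end], and return the `size` most recent groups."""
    # 1. Aggregate the logs by requestId.
    groups = defaultdict(list)
    for hit in hits:
        fields = parse_message(hit["_source"]["message"])
        groups[fields["requestId"]].append((fields, hit))

    # 2. Filter the groups on the date of their receive-step log.
    selected = []
    for request_id, entries in groups.items():
        receive_dates = [f["timeStamp"] for f, _ in entries
                         if f["step"] == RECEIVE_STEP]
        if not receive_dates:
            continue  # no receive log: the group cannot be dated
        received_at = min(receive_dates)  # assumption: earliest one wins
        if start <= received_at <= end:
            selected.append({
                "key": request_id,
                "received_at": received_at,
                "documents": [h for _, h in entries],
            })

    # 3. Sort by date, most recent first, and keep the first `size` groups.
    selected.sort(key=lambda g: g["received_at"], reverse=True)
    return selected[:size]
```

Calling `group_requests(response["hits"]["hits"], start, end)` on a search response shaped like the one above would return a list of `{"key", "received_at", "documents"}` groups. This only works on the hits actually returned, so for 9103 matching documents it would need paging and is meant as a statement of the expected result rather than as a scalable solution.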

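Best answer

One approach is to modify the schema. Since your XML schema is fixed, you can store each XML node in a separate field in Elastic instead of storing the whole XML in a single field. For example, flux, requestId, timeStamp, and so on would map to separate fields in Elastic (possibly with the same names).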
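
As a rough illustration of that idea, the sketch below promotes the top-level XML nodes of `message` to their own fields before a document is indexed. It assumes you control some step that runs before indexing, which the question says is not currently the case. The `flatten_message` name and the choice to keep nested nodes such as `<body>` as an XML string are hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten_message(source):
    """Return a copy of a log document in which the top-level XML nodes of
    `message` (flux, requestId, timeStamp, ...) are separate fields."""
    root = ET.fromstring(source["message"])
    doc = dict(source)  # leave the original document untouched
    for child in root:
        if len(child) == 0:
            # Leaf node: store its text under the tag name.
            doc[child.tag] = child.text
        else:
            # Nested node such as <body>: keep the inner XML as a string.
            doc[child.tag] = "".join(
                ET.tostring(inner, encoding="unicode") for inner in child
            )
    return doc
```

Once `requestId` is a field of its own, a standard terms aggregation can group on it directly, with no need to parse the XML at query time.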

This question about grouping ElasticSearch documents by an XML tag value (inside a string field) comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/39163008/
