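We ingest simple device readings into ADX at regular intervals. However, 0.28% of the ingested readings may later be updated with a new reading value. This can be handled with a simple materialized view using arg_max. However, we would also like to optimize summarizing the readings per day, month, and year. The problem is that I can't create a materialized view on top of a materialized view, nor is it possible to create a materialized view with two summarize clauses.
Example: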
.create table Readings (Timestamp:datetime, DeviceName:string, IngestTime:datetime, Reading:decimal)
.ingest inline into table Readings <|
"2022-10-31 23:00:00.0000000", "EX", "2022-11-06 11:02:29.5690000",0.733
"2022-10-31 22:00:00.0000000", "EX", "2022-11-05 11:05:36.4230000",0.763
"2022-10-31 22:00:00.0000000", "EX", "2022-11-01 07:08:55.9580000",0.5
"2022-10-31 22:00:00.0000000", "EX", "2022-11-02 11:04:42.7050000",0.5
"2022-10-31 21:00:00.0000000", "EX", "2022-11-05 11:05:36.4230000",0.856
"2022-10-31 20:00:00.0000000", "EX", "2022-11-05 11:05:36.4230000",0.827
"2022-10-31 20:00:00.0000000", "EX", "2022-11-02 11:04:42.7050000",0
"2022-10-31 20:00:00.0000000", "EX", "2022-11-01 07:08:55.9580000",0
"2022-10-31 19:00:00.0000000", "EX", "2022-11-05 11:05:36.4230000",0.935
.create materialized-view with (backfill=true, DocString="Only latest measures by arg_max",
effectiveDateTime=datetime(2019-01-01),
MaxSourceRecordsForSingleIngest=10000000,
Concurrency=5
) ReadingsLatest on table Readings { Readings
// keep the most recently ingested row per timestamp and device
| summarize arg_max(IngestTime, *) by Timestamp, DeviceName }
This gives the output:
2022-10-31T23:00:00Z "EX" 2022-11-06T11:02:29.569Z 0.733
2022-10-31T22:00:00Z "EX" 2022-11-05T11:05:36.423Z 0.763
2022-10-31T21:00:00Z "EX" 2022-11-05T11:05:36.423Z 0.856
2022-10-31T20:00:00Z "EX" 2022-11-05T11:05:36.423Z 0.827
2022-10-31T19:00:00Z "EX" 2022-11-05T11:05:36.423Z 0.935
The problem is the performance when we run aggregation queries:
ReadingsLatest
| summarize Reading_Day = sum(Reading) by Day = startofday(datetime_utc_to_local(Timestamp, 'Europe/Oslo')), DeviceName
So we would like a materialized view for this, but we can't figure out how to create it:
.create materialized-view with (backfill=true, DocString="Only latest measures by arg_max",
effectiveDateTime=datetime(2019-01-01),
MaxSourceRecordsForSingleIngest=10000000,
Concurrency=5
) ReadingsLatestDay on materialized-view ReadingsLatest { ReadingsLatest
| summarize Reading_Day = sum(Reading) by Day = startofday(datetime_utc_to_local(Timestamp, 'Europe/Oslo')), DeviceName }
//fails with:
Cannot create materialized view 'ReadingsLatestDay': Materialized view can only be created on top of another materialized view which includes a single any()/anyif()/take_any()/take_anyif() aggregation.
Attempting both in a single view:
.create materialized-view with (backfill=true, DocString="Only latest measures by arg_max",
effectiveDateTime=datetime(2019-01-01),
MaxSourceRecordsForSingleIngest=10000000,
Concurrency=5
) ReadingsLatestDay on table Readings { Readings
| summarize arg_max(IngestTime, *) by Timestamp, DeviceName
| summarize Reading_Day = sum(Reading) by Day = startofday(datetime_utc_to_local(Timestamp, 'Europe/Oslo')), DeviceName }
//fails with:
Cannot create materialized view 'ReadingsLatestDay': Materialized views query can only include a single summarize operator over the source table.
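Another option we have considered is to delete the readings that are updated in a later ingestion, but that also seems difficult. We couldn't figure out the syntax.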
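For reference, a minimal sketch of the soft-delete command syntax, assuming the superseded rows are identified by hand. The filter values are taken from the sample data above. With whatif=true the command only reports what would be deleted, so nothing is removed until that property is dropped. How deletes on a source table interact with a materialized view built on it is not covered here and should be checked in the documentation.

```kusto
.delete table Readings records with (whatif=true) <|
    Readings
    | where DeviceName == "EX"
    | where Timestamp == datetime(2022-10-31 22:00:00)
    | where IngestTime < datetime(2022-11-05 11:05:36.423)   // matches only the older ingestions of this reading
```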
Best answer
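The errors you're getting are expected: a materialized view over a materialized view is only supported when the first view is of type take_any(*) (see the docs). Multiple aggregations in the same materialized view aren't supported either. ADX has no built-in way to pre-calculate these two aggregations. You can create the first materialized view (the arg_max() one) and then orchestrate your own pipeline with any of the orchestration tools: query the materialized view periodically and persist the daily aggregation using ingest from query commands.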
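A minimal sketch of what the periodic step in such a pipeline might look like. The table name ReadingsDaily is hypothetical. Rebuilding the whole table with .set-or-replace on every run is an assumption made for simplicity: it keeps late updates correct at the cost of recomputing everything, and an incremental scheme is equally possible. Scheduling is left to whichever orchestration tool is used.

```kusto
.set-or-replace ReadingsDaily <|
    // materialized_view() reads only the already-materialized part of the view,
    // so records ingested since the last materialization cycle are not included
    materialized_view("ReadingsLatest")
    | summarize Reading_Day = sum(Reading)
        by Day = startofday(datetime_utc_to_local(Timestamp, 'Europe/Oslo')), DeviceName

// Monthly (or yearly) totals can then be derived from the much smaller daily table
ReadingsDaily
| summarize Reading_Month = sum(Reading_Day) by Month = startofmonth(Day), DeviceName
```

The first statement is the one to schedule. The second is an ordinary query over the persisted daily sums.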
Regarding azure-data-explorer - How to optimize repeated and summarizing queries on ADX, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/74530167/