azure-data-explorer - How to optimize deduplication and summarization queries on ADX

Tags: azure-data-explorer kql

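We regularly ingest simple device readings into ADX. However, 0.28% of the ingested readings may later be updated with a new reading value. That can be corrected with a simple materialized view using arg_max. We also want to optimize the daily, monthly, and yearly summaries of the readings. The problem is that I cannot create a materialized view on top of a materialized view, and it is not possible to create a materialized view with two summarize clauses either.

Example: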

.create table Readings (Timestamp:datetime, DeviceName:string, IngestTime:datetime, Reading:decimal)

.ingest inline into table Readings <|
    "2022-10-31 23:00:00.0000000", "EX", "2022-11-06 11:02:29.5690000",0.733 
    "2022-10-31 22:00:00.0000000", "EX", "2022-11-05 11:05:36.4230000",0.763
    "2022-10-31 22:00:00.0000000", "EX", "2022-11-01 07:08:55.9580000",0.5
    "2022-10-31 22:00:00.0000000", "EX", "2022-11-02 11:04:42.7050000",0.5
    "2022-10-31 21:00:00.0000000", "EX", "2022-11-05 11:05:36.4230000",0.856
    "2022-10-31 20:00:00.0000000", "EX", "2022-11-05 11:05:36.4230000",0.827
    "2022-10-31 20:00:00.0000000", "EX", "2022-11-02 11:04:42.7050000",0
    "2022-10-31 20:00:00.0000000", "EX", "2022-11-01 07:08:55.9580000",0
    "2022-10-31 19:00:00.0000000", "EX", "2022-11-05 11:05:36.4230000",0.935

.create materialized-view with (backfill=true, DocString="Only latest measures by arg_max", 
        effectiveDateTime=datetime(2019-01-01), 
        MaxSourceRecordsForSingleIngest=10000000, 
        Concurrency=5 
) ReadingsLatest on table Readings { Readings 
| summarize arg_max(IngestTime, *) by DeviceName, Timestamp }

This gives the output:

2022-10-31T23:00:00Z     "EX"  2022-11-06T11:02:29.569Z   0.733
2022-10-31T22:00:00Z    "EX"  2022-11-05T11:05:36.423Z   0.763
2022-10-31T21:00:00Z    "EX"  2022-11-05T11:05:36.423Z   0.856
2022-10-31T20:00:00Z    "EX"  2022-11-05T11:05:36.423Z   0.827
2022-10-31T19:00:00Z    "EX"  2022-11-05T11:05:36.423Z   0.935
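
The deduplication can be checked in isolation, without creating a table or a view, by running the same summarize over an inline datatable. This is only a sketch using a subset of the sample rows above; it assumes the intended grouping is one row per DeviceName and Timestamp, which is what the output above shows.

```kusto
// Inline copy of four sample rows: one reading for 23:00 and three versions of the 22:00 reading.
datatable(Timestamp:datetime, DeviceName:string, IngestTime:datetime, Reading:decimal)
[
    datetime(2022-10-31 23:00:00), "EX", datetime(2022-11-06 11:02:29.569), decimal(0.733),
    datetime(2022-10-31 22:00:00), "EX", datetime(2022-11-05 11:05:36.423), decimal(0.763),
    datetime(2022-10-31 22:00:00), "EX", datetime(2022-11-01 07:08:55.958), decimal(0.5),
    datetime(2022-10-31 22:00:00), "EX", datetime(2022-11-02 11:04:42.705), decimal(0.5)
]
// Keep, per device and reading hour, the row with the latest IngestTime.
| summarize arg_max(IngestTime, *) by DeviceName, Timestamp
| order by Timestamp desc
```

For the 22:00 reading the row ingested on 2022-11-05 has the latest IngestTime, so its value 0.763 is the one that survives.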

The problem is the performance when we run aggregation queries:

ReadingsLatest
| summarize Reading_Day = sum(Reading) by Day = startofday(datetime_utc_to_local(Timestamp, 'Europe/Oslo')), DeviceName

So we would like a materialized view for this, but we do not know how to create one:

.create materialized-view with (backfill=true, DocString="Only latest measures by arg_max", 
        effectiveDateTime=datetime(2019-01-01), 
        MaxSourceRecordsForSingleIngest=10000000, 
        Concurrency=5 
) ReadingsLatestDay on materialized-view ReadingsLatest { ReadingsLatest 
| summarize Reading_Day = sum(Reading) by Day = startofday(datetime_utc_to_local(Timestamp, 'Europe/Oslo')), DeviceName }

//fails with:
Cannot create materialized view 'ReadingsLatestDay': Materialized view can only be created on top of another materialized view which includes a single any()/anyif()/take_any()/take_anyif() aggregation.

Trying both steps in a single view:

.create materialized-view with (backfill=true, DocString="Only latest measures by arg_max", 
        effectiveDateTime=datetime(2019-01-01), 
        MaxSourceRecordsForSingleIngest=10000000, 
        Concurrency=5 
) ReadingsLatestDay on table Readings { Readings 
| summarize arg_max(IngestTime, *) by DeviceName, Timestamp
| summarize Reading_Day = sum(Reading) by Day = startofday(datetime_utc_to_local(Timestamp, 'Europe/Oslo')), DeviceName }

//fails with:
Cannot create materialized view 'ReadingsLatestDay': Materialized views query can only include a single summarize operator over the source table.

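Another option we considered is deleting the readings that are superseded by a later ingestion, but that also seems difficult. We could not figure out the syntax.

Best answer

The errors you are getting are expected. A materialized view on top of another materialized view is supported only when the first view is of the take_any(*) type (see the materialized views docs). Multiple aggregations in the same materialized view are not supported either. There is no built-in way in ADX to precompute both aggregations. You can create the first materialized view (arg_max()) and then orchestrate your own pipeline, with any of the orchestration tools, that periodically queries the materialized view using ingest from query commands and persists the daily aggregates.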
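
A minimal sketch of the command such a pipeline could run on a schedule is shown below. The table name ReadingsDaily is hypothetical, and rebuilding the whole table with .set-or-replace on every run is an assumption made here for simplicity, not something the answer prescribes: because readings can be corrected days after the fact, a full rebuild keeps already-aggregated days consistent at the cost of recomputing everything.

```kusto
// Hypothetical target table ReadingsDaily: created on the first run, its data replaced on later runs.
// The query reads the deduplicated view, so each reading is counted once with its latest value.
.set-or-replace ReadingsDaily <|
    ReadingsLatest
    | summarize Reading_Day = sum(Reading)
        by Day = startofday(datetime_utc_to_local(Timestamp, 'Europe/Oslo')), DeviceName
```

Daily queries would then read ReadingsDaily directly, and monthly or yearly totals could be rolled up from it, for example with `summarize sum(Reading_Day) by startofmonth(Day), DeviceName`. On large tables a full rebuild may be too expensive; restricting each run to a recent time window is an alternative, but it needs its own handling of rows that were already written.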

Regarding azure-data-explorer - How to optimize deduplication and summarization queries on ADX, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/74530167/
