sql - KQL: how to merge result rows after mv-apply to get correct counts

Tags: sql database azure azure-data-explorer kql

Using KQL.

Given this dataset:

let MyTable = datatable (VMID:int, ID:string, Type:string, features:dynamic, scanner:dynamic)
[
    1, "ID-1", "Windows", dynamic([
        {"name": "name1", "value": true},
        {"name": "name2", "value": false},
        {"name": "name3", "value": true}
    ]), dynamic([
        {"name": "s1", "expiry": false},
        {"name": "s2", "expiry": true},
        {"name": "s3", "expiry": true},
        {"name": "s4", "expiry": false}
    ]),
    2, "ID-1", "Windows", dynamic([
        {"name": "name1", "value": true},
        {"name": "name2", "value": false},
        {"name": "name3", "value": true}
    ]), dynamic([
        {"name": "s1", "expiry": false},
        {"name": "s2", "expiry": true},
        {"name": "s3", "expiry": false},
        {"name": "s4", "expiry": true}
    ]),
    3, "ID-1", "Linux", dynamic([
        {"name": "name1", "value": true},
        {"name": "name2", "value": false},
        {"name": "name3", "value": true}
    ]), dynamic([
        {"name": "s1", "expiry": false},
        {"name": "s2", "expiry": false},
        {"name": "s3", "expiry": true},
        {"name": "s4", "expiry": false}
    ]),
    4, "ID-2", "Windows", dynamic([
        {"name": "name1", "value": true},
        {"name": "name2", "value": false},
        {"name": "name3", "value": true}
    ]), dynamic([
        {"name": "s1", "expiry": false},
        {"name": "s2", "expiry": true},
        {"name": "s3", "expiry": false},
        {"name": "s4", "expiry": true}
    ]),
    5, "ID-2", "Windows", dynamic([
        {"name": "name1", "value": true},
        {"name": "name2", "value": false},
        {"name": "name3", "value": true}
    ]), dynamic([
        {"name": "s1", "expiry": false},
        {"name": "s2", "expiry": true},
        {"name": "s3", "expiry": true},
        {"name": "s4", "expiry": true}
    ])
];


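I want to filter features by name1 and name2 and the scanner states by s1 and s3, then summarize the number of VMIDs per Type and give the list of VMIDs for each of these conditions:

  1. feature.name = name1 with value == true
  2. feature.name = name2 with value == false
  3. scanState.name = s1 with expiry == false
  4. scanState.name = s3 with expiry == true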

My main problem is that mv-apply splits the JSON into multiple rows, so count() produces extra results.

MyTable
| mv-apply features, scanner on (where  features.name == "name1" or features.name == "name2" or scanner.name == "s1" or scanner.name == "s3" )
| extend feature1State = tobool(features.name == "name1" and features.value == true)
| extend feature2State = tobool(features.name == "name2" and features.value == false)
| extend scan1State = tobool(scanner.name == "s1" and scanner.expiry == false)
| extend scan2State = tobool(scanner.name == "s3" and scanner.expiry == true)
| summarize vmCount = count(VMID),
    f1count = countif(feature1State == true),
    f2count= countif(feature2State== true),
    scan1count = countif(scan1State == true),
    scan2count = countif(scan2State == true),
    f1FailVm = make_set_if(VMID, feature1State == false and isnotempty( VMID)),
    f2FailVm = make_set_if(VMID, feature2State== false and isnotempty( VMID)),
    scan1FailVm = make_set_if(VMID, scan1State == false and isnotempty( VMID)),
    scan2FailVm = make_set_if(VMID, scan2State == false and isnotempty( VMID))
    by ID, Type

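In its output, note that the VM count I get for ID-1 is 6, which is incorrect. This is because mv-apply essentially creates multiple rows per VM. Since features and scanner contain different numbers of entries, it expands to the larger of the two.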
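The row multiplication can be seen in isolation with a minimal sketch. The single-row table and the column names featureName and scannerName below are made up for illustration; the arrays mirror the shape of the question's data.

```kql
// Hypothetical one-VM table: 3 feature entries, 4 scanner entries.
datatable (VMID:int, features:dynamic, scanner:dynamic)
[
    1, dynamic([{"name": "name1"}, {"name": "name2"}, {"name": "name3"}]),
       dynamic([{"name": "s1"}, {"name": "s2"}, {"name": "s3"}, {"name": "s4"}])
]
// Expanding two arrays together pairs them by position;
// the shorter array is padded with nulls.
| mv-apply features, scanner on (
    extend featureName = tostring(features.name),
           scannerName = tostring(scanner.name)
)
| project VMID, featureName, scannerName
// Expect one row per array position (four here), all with VMID 1;
// the last row has no feature, only scanner s4.
```

With the question's where clause, the s4 row is filtered out, which leaves three rows per VM. That is how two Windows VMs under ID-1 turn into a count of 6.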

Is there a better way to solve this?

Update 1: Using count_distinct fixes the VMID count, but how do I get the correct values for the fail lists?

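Best answer

  • Use vmCount = count_distinct(VMID) instead of count(VMID). The count_distinct function counts the number of distinct VMID values for each combination of the Type and ID fields.
  • The f1FailVm field should list the VMIDs, within each Type and ID group, that have no row where feature1State is true. To get this, build a first set of the VMIDs that have a false value in feature1State and a second set of the VMIDs that have a true value in feature1State, then take the set_difference between them.

Code: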
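The set_difference step can be checked on its own with literal arrays. This sketch uses the values that arise for scan2State in the ID-1 / Windows group: after mv-apply, both VM 1 and VM 2 have at least one row where the state is false, but only VM 1 has a row where it is true.

```kql
// First argument: VMIDs with some false row; second: VMIDs with some true row.
// Only VMIDs that never had a true row survive the difference.
print scan2FailVm = set_difference(dynamic([1, 2]), dynamic([1]))
// scan2FailVm is [2]
```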

MyTable
| mv-apply features, scanner on (where features.name == "name1" or features.name == "name2" or scanner.name == "s1" or scanner.name == "s3" )
| extend feature1State = tobool(features.name == "name1" and features.value == true)
| extend feature2State = tobool(features.name == "name2" and features.value == false)
| extend scan1State = tobool(scanner.name == "s1" and scanner.expiry == false)
| extend scan2State = tobool(scanner.name == "s3" and scanner.expiry == true)
| summarize vmCount = count_distinct( VMID),
f1count = countif(feature1State == true),
f2count= countif(feature2State== true),
scan1count = countif(scan1State == true),
scan2count = countif(scan2State == true),
f1failVM= set_difference(make_set_if(VMID, feature1State == false), make_set_if(VMID, feature1State == true)),
f2FailVM = set_difference(make_set_if(VMID, feature2State == false), make_set_if(VMID, feature2State == true)),
scan1FailVm = set_difference(make_set_if(VMID, scan1State == false), make_set_if(VMID, scan1State == true)),
scan2FailVm = set_difference(make_set_if(VMID, scan2State == false), make_set_if(VMID, scan2State == true))
by ID, Type

Output:

ID    Type     vmCount  f1count  f2count  scan1count  scan2count  f1failVM  f2FailVM  scan1FailVm  scan2FailVm
ID-1  Windows  2        2        2        2           1           []        []        []           [2]
ID-1  Linux    1        1        1        1           1           []        []        []           []
ID-2  Windows  2        2        2        2           1           []        []        []           [4]
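As an alternative that is not part of the answer above, each array could be collapsed back to a single row per VM inside its own mv-apply, so that the outer summarize never sees duplicated VMIDs. This is only a sketch: the aliases f and s are illustrative, and it assumes MyTable is defined as in the question and that both arrays are non-empty for every VM.

```kql
MyTable
// Reduce the features array to two booleans per VM.
| mv-apply f = features on (
    summarize feature1State = countif(f.name == "name1" and f.value == true) > 0,
              feature2State = countif(f.name == "name2" and f.value == false) > 0
)
// Reduce the scanner array the same way; still one row per VM.
| mv-apply s = scanner on (
    summarize scan1State = countif(s.name == "s1" and s.expiry == false) > 0,
              scan2State = countif(s.name == "s3" and s.expiry == true) > 0
)
// With one row per VM, plain count() and make_set_if() are enough.
| summarize vmCount = count(),
    f1count = countif(feature1State),
    f2count = countif(feature2State),
    scan1count = countif(scan1State),
    scan2count = countif(scan2State),
    f1FailVm = make_set_if(VMID, not(feature1State)),
    f2FailVm = make_set_if(VMID, not(feature2State)),
    scan1FailVm = make_set_if(VMID, not(scan1State)),
    scan2FailVm = make_set_if(VMID, not(scan2State))
    by ID, Type
```

Because every VM contributes exactly one row to the final summarize, neither count_distinct nor set_difference is needed in this variant.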


Regarding "sql - KQL: how to merge result rows after mv-apply to get correct counts", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/76472613/
