relational-database - Kusto: removing partial duplicates

Tags: relational-database azure-data-explorer kql

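Using the table storedata, I am trying to remove the row "Target TargetCheese 4". The logic is this: if a given store has two or more entries for the same product, the StoreNumber that best fits that store is chosen based on the store's other rows. If a StoreNumber does not match the rest of the store but the product is not duplicated, the number is left unchanged. For example, SafewayEggs keeps StoreNumber 1 even though more Safeway entries have StoreNumber 6, because SafewayEggs has only one row.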

let storedata=
datatable (Store:string,    Product:string  ,StoreNumber:string)
["Target",  "TargetCheese", "4",
"Target",   "TargetCheese", "5",
"Target",   "TargetApple",  "5",
"Target",   "TargetCorn",   "5",
"Target",   "TargetEggs",   "5",
"Kroger",   "KrogerApple",  "2",
"Kroger",   "KrogerCorn",   "2",
"Kroger",   "KrogerEggs",   "2",
"Safeway",  "SafewayApple", "6",
"Safeway",  "SafewayCorn",  "6",
"Safeway",   "SafewayEggs", "1"
];

I expect to get this result table from storedata:

Store    Product       StoreNumber
Target   TargetCheese  5
Target   TargetApple   5
Target   TargetCorn    5
Target   TargetEggs    5
Kroger   KrogerApple   2
Kroger   KrogerCorn    2
Kroger   KrogerEggs    2
Safeway  SafewayApple  6
Safeway  SafewayCorn   6
Safeway  SafewayEggs   1

Best answer

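You will probably need several steps:

  1. Find the "best-fit" StoreNumber for each store. In the example below this is the StoreNumber that occurs most often; use arg_max.
  2. Build the dataset that has to be cleaned up with (1): every store and product combination that occurs more than once; use count.
  3. Build the dataset that needs no cleanup: every store and product combination that occurs exactly once.
  4. Take the union of (3) and the corrected dataset.

let storedata=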
datatable (Store:string,    Product:string  ,StoreNumber:string)
["Target",  "TargetCheese", "5",
"Target",   "TargetCheese", "4",
"Target",   "TargetApple",  "5",
"Target",   "TargetCorn",   "5",
"Target",   "TargetEggs",   "5",
"Kroger",   "KrogerApple",  "2",
"Kroger",   "KrogerCorn",   "2",
"Kroger",   "KrogerEggs",   "2",
"Safeway",  "SafewayApple", "6",
"Safeway",  "SafewayCorn",  "6",
"Safeway",   "SafewayEggs", "1"
];
// (1) evaluate the best-fit StoreNumber: the most frequent one per store
let storenumber =
storedata
| summarize occ = count() by Store, StoreNumber
| summarize arg_max(occ, *) by Store;
// (2) dataset to be cleaned = more than one occurrence per store and product
let cleanup =
storedata
| summarize occ = count () by Store,  Product
| where occ > 1
| project-away occ;
// (3) dataset with only one occurrence 
let okdata =
storedata
| summarize occ= count () by Store, Product
| where occ==1
| project-away occ;
// (4) final dataset 
let res1 =storenumber
| join cleanup on Store
| project Store, Product, StoreNumber;
let res2 = storedata
| join okdata on Store, Product
| project-away Store1, Product1;
res1
| union res2;
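
As a possible alternative, the four steps can be folded into a single pass over storedata. This is a minimal sketch, not part of the original answer. It assumes that a tie between equally frequent StoreNumber values may be resolved arbitrarily (arg_max simply picks one), and the names best, BestStoreNumber and n are introduced here only for illustration.

```kusto
let storedata =
datatable (Store:string, Product:string, StoreNumber:string)
["Target",  "TargetCheese", "5",
 "Target",  "TargetCheese", "4",
 "Target",  "TargetApple",  "5",
 "Target",  "TargetCorn",   "5",
 "Target",  "TargetEggs",   "5",
 "Kroger",  "KrogerApple",  "2",
 "Kroger",  "KrogerCorn",   "2",
 "Kroger",  "KrogerEggs",   "2",
 "Safeway", "SafewayApple", "6",
 "Safeway", "SafewayCorn",  "6",
 "Safeway", "SafewayEggs",  "1"
];
// Most frequent StoreNumber per store (hypothetical helper table)
let best =
storedata
| summarize occ = count() by Store, StoreNumber
| summarize arg_max(occ, StoreNumber) by Store
| project Store, BestStoreNumber = StoreNumber;
storedata
// Collapse each (Store, Product) to one row and remember how many rows it had
| summarize n = count(), StoreNumber = take_any(StoreNumber) by Store, Product
// Attach the store-level best fit
| lookup kind=leftouter best on Store
// Duplicated products take the best fit; single rows keep their own number
| project Store, Product, StoreNumber = iff(n > 1, BestStoreNumber, StoreNumber)
```

On the sample data this should give one row per store and product, with TargetCheese set to 5 and SafewayEggs left at 1. Row order is not guaranteed, so add a sort if you need a particular order.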

This entry on relational-database - Kusto: removing partial duplicates is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/68830647/
