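Using the table storedata, I'm trying to remove the row "Target TargetCheese 4". The logic is: if a given store has two or more entries for the same product, the query should pick the StoreNumber that best fits that store, judging by the store's other rows. If a StoreNumber doesn't match the rest of the store but the product is not duplicated, the number is left unchanged; for example, SafewayEggs keeps StoreNumber 1 even though more Safeway entries have StoreNumber 6, because there is only one row for SafewayEggs.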
let storedata=
datatable (Store:string, Product:string ,StoreNumber:string)
["Target", "TargetCheese", "4",
"Target", "TargetCheese", "5",
"Target", "TargetApple", "5",
"Target", "TargetCorn", "5",
"Target", "TargetEggs", "5",
"Kroger", "KrogerApple", "2",
"Kroger", "KrogerCorn", "2",
"Kroger", "KrogerEggs", "2",
"Safeway", "SafewayApple", "6",
"Safeway", "SafewayCorn", "6",
"Safeway", "SafewayEggs", "1"
];
This is the result table I would like to get from storedata:
Store Product StoreNumber
Target TargetCheese 5
Target TargetApple 5
Target TargetCorn 5
Target TargetEggs 5
Kroger KrogerApple 2
Kroger KrogerCorn 2
Kroger KrogerEggs 2
Safeway SafewayApple 6
Safeway SafewayCorn 6
Safeway SafewayEggs 1
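Best answer
You will probably need several steps:
(1) Find the "best-fit" StoreNumber for each store; in the example below this is the StoreNumber that occurs most often, found with arg_max.
(2) Identify the data that has to be cleaned up using (1): store/product pairs that occur more than once, found with count.
(3) Identify the data that needs no cleanup: store/product pairs that occur exactly once.
(4) Take the union of (3) and the corrected data from (2).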
let storedata=
datatable (Store:string, Product:string ,StoreNumber:string)
["Target", "TargetCheese", "5",
"Target", "TargetCheese", "4",
"Target", "TargetApple", "5",
"Target", "TargetCorn", "5",
"Target", "TargetEggs", "5",
"Kroger", "KrogerApple", "2",
"Kroger", "KrogerCorn", "2",
"Kroger", "KrogerEggs", "2",
"Safeway", "SafewayApple", "6",
"Safeway", "SafewayCorn", "6",
"Safeway", "SafewayEggs", "1"
];
// (1) evaluate best-fit StoreNumber: the most frequent StoreNumber per Store
let storenumber =
storedata
| summarize occ = count() by Store, StoreNumber
| summarize arg_max(occ, *) by Store;
// (2) dataset to be cleaned = more than one occurrence per store and product
let cleanup =
storedata
| summarize occ = count () by Store, Product
| where occ > 1
| project-away occ;
// (3) dataset with only one occurrence
let okdata =
storedata
| summarize occ= count () by Store, Product
| where occ==1
| project-away occ;
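// (4) final dataset
// duplicated products get the store's best-fit StoreNumber from (1)
let res1 = storenumber
| join cleanup on Store
| project Store, Product, StoreNumber;
// products that occur only once keep their original StoreNumber
let res2 = storedata
| join okdata on Store, Product
| project-away Store1, Product1;
res1
| union res2;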
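The same four steps can probably be folded into a single pass over storedata. The following is only a sketch under that assumption, not part of the original answer: the names bestfit, BestStoreNumber and Entries are made up for illustration, and it assumes that a product occurring once should keep its own StoreNumber while a duplicated product takes the store's most frequent one.

```kusto
let storedata =
datatable (Store:string, Product:string, StoreNumber:string)
["Target", "TargetCheese", "5",
"Target", "TargetCheese", "4",
"Target", "TargetApple", "5",
"Target", "TargetCorn", "5",
"Target", "TargetEggs", "5",
"Kroger", "KrogerApple", "2",
"Kroger", "KrogerCorn", "2",
"Kroger", "KrogerEggs", "2",
"Safeway", "SafewayApple", "6",
"Safeway", "SafewayCorn", "6",
"Safeway", "SafewayEggs", "1"
];
// most frequent StoreNumber per Store (hypothetical helper table)
let bestfit =
storedata
| summarize occ = count() by Store, StoreNumber
| summarize arg_max(occ, StoreNumber) by Store
| project Store, BestStoreNumber = StoreNumber;
storedata
// collapse to one row per Store/Product and remember how many rows there were
| summarize Entries = count(), StoreNumber = take_any(StoreNumber) by Store, Product
| join kind=inner bestfit on Store
// duplicated products take the best-fit number, single rows keep their own
| project Store, Product, StoreNumber = iff(Entries > 1, BestStoreNumber, StoreNumber)
```

Compared with the union of res1 and res2, this version reads storedata once for the counts and once for the best-fit lookup. Row order in the output is not guaranteed, so add a sort if a specific order is needed.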
Regarding relational-database - Kusto remove partial duplicates, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/68830647/