python - Find the lowest value in an array of dicts with a matching attribute, return the largest grouping

This is easy to do with a couple of loops, but I'm sure there is a more efficient way to accomplish it, and I'd love to learn it.

Consider the following array of dicts, which represents data pulled from a NoSQL database.

x = [
    {
        "loc" : "alpha",
        "tag" : 1,
        "dist" : 5
    },
    {
        "loc" : "bravo",
        "tag" : 0,
        "dist" : 2
    },
    {
        "loc" : "charlie",
        "tag" : 5,
        "dist" : 50
    },
    {
        "loc" : "delta",
        "tag" : 4,
        "dist" : 2
    },
    {
        "loc" : "echo",
        "tag" : 2,
        "dist" : 30
    },
    {
        "loc" : "foxtrot",
        "tag" : 4,
        "dist" : 2
    },
    {
        "loc" : "gamma",
        "tag" : 4,
        "dist" : 2
    },
    {
        "loc" : "hotel",
        "tag" : 0,
        "dist" : 2
    },
]

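I want to find all the items with the lowest "dist" value, and if several dicts share that lowest value, I want to group them by the "tag" attribute and keep the group that contains the most dicts with that lowest value.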

For example, the desired return data for the above would be:

r = [
    {
        "loc" : "delta",
        "tag" : 4,
        "dist" : 2
    },
    {
        "loc" : "foxtrot",
        "tag" : 4,
        "dist" : 2
    },
    {
        "loc" : "gamma",
        "tag" : 4,
        "dist" : 2
    }
]

To summarize: dist: 2 is the lowest value; [bravo, delta, foxtrot, gamma, hotel] all have a dist of 2; [bravo, hotel] have tag: 0 and [delta, foxtrot, gamma] have tag: 4. Return the array of dicts [delta, foxtrot, gamma], because more dicts share that tag at the lowest dist.

I am using Python 3.6.

Thanks for your help and interest!

Best answer

You can pass a key (i.e. a lambda function) to max() and min(), which helps with this problem. For your first test,

lowest_single_dist = min(x, key=lambda i: i["dist"])

returns the element of x with the lowest "dist" value. If you then want all the elements that have that dist value, you can use a list comprehension:

lowest_dists = [i for i in x if i["dist"] == lowest_single_dist["dist"]]

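To get the largest grouping, I would first build the set of possible "tag" values in that subset, then check how many elements of lowest_dists carry each one, and take the one with the highest count: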

tags = [i["tag"] for i in lowest_dists]              # get a list of just the tags
ct = {t: tags.count(t) for t in set(tags)}           # make a dict of tag:count for each unique tag
max_tag = max(ct, key=ct.get)                        # get the tag with the highest count
r = [i for i in lowest_dists if i["tag"] == max_tag] # keep every element that carries that tag
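The counting step above can also be handed to collections.Counter from the standard library. The following is only a sketch of that variant: the list `rows` is a shortened, hypothetical stand-in for the question's `x`, and the names `closest`, `tag_counts`, and `largest_group` are made up for illustration.

```python
from collections import Counter

# Shortened, hypothetical stand-in for the question's list `x`.
rows = [
    {"loc": "bravo", "tag": 0, "dist": 2},
    {"loc": "charlie", "tag": 5, "dist": 50},
    {"loc": "delta", "tag": 4, "dist": 2},
    {"loc": "foxtrot", "tag": 4, "dist": 2},
    {"loc": "gamma", "tag": 4, "dist": 2},
    {"loc": "hotel", "tag": 0, "dist": 2},
]

# Keep only the rows that share the minimum "dist".
min_dist = min(row["dist"] for row in rows)
closest = [row for row in rows if row["dist"] == min_dist]

# Counter tallies every tag in a single pass over `closest`.
tag_counts = Counter(row["tag"] for row in closest)

# most_common(1) returns a one-element list holding the
# (tag, count) pair with the highest count.
top_tag, top_count = tag_counts.most_common(1)[0]

largest_group = [row for row in closest if row["tag"] == top_tag]
print([row["loc"] for row in largest_group])  # ['delta', 'foxtrot', 'gamma']
```

If two tags tie for the highest count, this sketch simply keeps whichever one most_common() lists first; the question does not say how ties should be resolved.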

If you want to shorten it all to two lines, you can be less pythonic and do this:

m = min(x, key=lambda i: (i["dist"], -1 * [j["tag"] for j in x if j["dist"] == i["dist"]].count(i["tag"])))
r = [i for i in x if i["tag"] == m["tag"] and i["dist"] == m["dist"]]

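This takes advantage of the fact that you can return a tuple as the key for sorting, and the second value of the tuple is checked only if the first values are equal. I'll expand the first line a bit and explain what each part does: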

m = min(x, key=lambda i: (
    i["dist"],
    -1 * [j["tag"] for j in x if j["dist"] == i["dist"]].count(i["tag"])
))

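The innermost list comprehension builds a list of the tags of all elements in x that have the same "dist" value as i
Then, count how many of those tags are the same as i's tag
Multiply by -1 to make the count negative, so that min() behaves correctly
Build a tuple of i["dist"] and the value we just computed (how often i["tag"] occurs among the elements of x at that distance), and return it as the key for each element
Assign to m the element of the list with the lowest "dist" value and the most common "tag" value
Assign to r the sublist of elements in x that have the same "dist" and "tag" values as m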
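To make the tuple comparison concrete, here is a small sketch that computes the key for every element of a four-row sample. The sample `rows` and the helper name `sort_key` are hypothetical, chosen only for illustration; the helper does the same work as the lambda, written out as a named function.

```python
# Hypothetical four-row sample, not the question's full data.
rows = [
    {"loc": "alpha", "tag": 1, "dist": 5},
    {"loc": "bravo", "tag": 0, "dist": 2},
    {"loc": "delta", "tag": 4, "dist": 2},
    {"loc": "gamma", "tag": 4, "dist": 2},
]

def sort_key(row):
    # Tags of all rows at the same distance as `row`.
    same_dist_tags = [r["tag"] for r in rows if r["dist"] == row["dist"]]
    # Negate the count so that a more frequent tag yields a smaller key.
    return (row["dist"], -same_dist_tags.count(row["tag"]))

keys = {row["loc"]: sort_key(row) for row in rows}
print(keys)
# {'alpha': (5, -1), 'bravo': (2, -1), 'delta': (2, -2), 'gamma': (2, -2)}

# Tuples compare element by element, so (2, -2) < (2, -1) < (5, -1).
best = min(rows, key=sort_key)
print(best["loc"])  # delta (the first row whose key is the smallest)
```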

So it is basically the same process as above, but shorter, less efficient, and a bit more convoluted.
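The two-line version is less efficient because it rebuilds the list of tags for every element of x, so the work grows quadratically with the length of the list. If that matters, one possible alternative is to group the lowest-dist elements by tag in a single extra pass. This is a sketch only: the function name `largest_group_at_min_dist` and the sample `rows` are hypothetical and not part of the original answer.

```python
from collections import defaultdict

def largest_group_at_min_dist(rows):
    """Return the rows with the lowest "dist" that share the most common "tag"."""
    if not rows:
        return []
    min_dist = min(row["dist"] for row in rows)

    # Bucket the lowest-dist rows by tag in one pass.
    groups = defaultdict(list)
    for row in rows:
        if row["dist"] == min_dist:
            groups[row["tag"]].append(row)

    # Pick the biggest bucket; on a tie, max() keeps the first one it sees.
    return max(groups.values(), key=len)

# Shortened, hypothetical stand-in for the question's list `x`.
rows = [
    {"loc": "alpha", "tag": 1, "dist": 5},
    {"loc": "bravo", "tag": 0, "dist": 2},
    {"loc": "delta", "tag": 4, "dist": 2},
    {"loc": "foxtrot", "tag": 4, "dist": 2},
    {"loc": "hotel", "tag": 0, "dist": 2},
    {"loc": "gamma", "tag": 4, "dist": 2},
]

result = largest_group_at_min_dist(rows)
print([row["loc"] for row in result])  # ['delta', 'foxtrot', 'gamma']
```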

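Regarding "python - Find the lowest value in an array of dicts with a matching attribute, return the largest grouping", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53244865/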
