python - Generating a fill rate for varying nested dicts in Python?

I have a process that receives nested dicts in Python:

Example nested dict schema (pseudocode)

key1: value1,
key2: dict(
  key3: value2,
  key4: value3,
),
key5: list(value4,value5) # any value is fine, just not empty or null

Example nested dict data (pseudocode)

key1: 'value',
key2: dict(
  key3: '',
  key4: 12345,
),
key5: list()

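I want to iterate over this dict and check whether each key has a value (not null or blank; False/0 are fine). I need to scan a batch of identically structured dicts to get an overall "fill rate" for that set. The process sees sets of dicts in a different format on each run, so the fill-rate report has to be generated automatically: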

Example fill rate for the single nested example above (ideally a flat dict):

key1: 1
key2: 1
key2-key3: 0
key2-key4: 1
key5: 0

For example, if we scanned ten dicts with the same structure, we might see a "fill rate" like this:

key1: 5
key2: 6
key2-key3: 6
key2-key4: 4
key5: 3

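Questions

What is the most Pythonic way to scan dicts of varying structure to generate a fill rate? Is there a more efficient way if I have to do this millions of times?
What is the most Pythonic way to create a flat dict to store the counts, and how do I update it?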

Best answer

Here's my take:

What is the most pythonic way to scan dicts of varying structure to gen a fill rate?

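Recursively. In particular, I return the result of traversing a subtree to the caller. The caller is responsible for merging the results of multiple subtrees into its own result.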

Is there a more efficient way if I have to do this millions of times?

Maybe. Try one solution and see whether it is A) correct and B) fast enough. If it is both, don't bother searching for the most efficient way.
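One way to check the "fast enough" part is to time the scan on a realistic batch. The sketch below is only an illustration: it repeats a compact version of the recursive counter so that it runs on its own, and the workload (10,000 copies of one small record) and the helper name scan_all are made up for the example, so substitute real data before drawing conclusions.

```python
import timeit
from collections import Counter

def get_fill_rate(d, path=()):
    """Compact recursive counter, repeated here so the sketch is self-contained."""
    result = Counter()
    for k, v in d.items():
        key = path + (k,)
        if isinstance(v, dict):
            result[key] += 1
            result.update(get_fill_rate(v, key))
        else:
            # False and 0 count as filled; '', None and [] do not.
            result[key] += 1 if (v or v in (False, 0)) else 0
    return result

# Hypothetical workload: 10,000 copies of one small sample record.
sample = {'key1': 'value', 'key2': {'key3': '', 'key4': 12345}, 'key5': []}
records = [sample] * 10_000

def scan_all():
    """Scan every record and merge the per-record counts."""
    total = Counter()
    for record in records:
        total.update(get_fill_rate(record))
    return total

# Take the best of several runs to reduce noise from other processes.
best = min(timeit.repeat(scan_all, number=1, repeat=3))
print(f'{len(records)} dicts scanned in {best:.3f} s')
```

If the measured time is acceptable for the real volume, there is little reason to optimize further; if not, the timing gives a baseline to compare alternatives against.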

What is the most pythonic way to create a flat dict to store the counts and how do I update it?

By using one of the libraries that ship with Python. In this case, collections.Counter(), and by calling its .update() method.
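A small sketch of why .update() suits this job: on a Counter it adds the incoming counts to the existing ones instead of replacing them, and it keeps keys whose count is 0, which matters for a report that should list never-filled keys. The tuple keys and sample counts here are illustrative only.

```python
from collections import Counter

totals = Counter()
totals.update({('key1',): 1, ('key5',): 0})   # first dict: key5 was empty
totals.update({('key1',): 1, ('key5',): 1})   # second dict: key5 was filled
print(totals[('key1',)], totals[('key5',)])   # 2 1

# A key that was never filled stays in the result with a count of 0 ...
never_filled = Counter()
never_filled.update({('key3',): 0})
print(('key3',) in never_filled)              # True

# ... whereas Counter addition discards counts that are not positive.
print(('key3',) in (never_filled + Counter()))  # False
```

Merging with + instead of .update() would therefore silently drop the zero-count keys from the report.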

from collections import Counter
from pprint import pprint

example1_dict = {
    'key1': 'value',
    'key2': {
        'key3': '',
        'key4': 12345,
    },
    'key5': list()
}

example2_dict = {
    'key1': 'value',
    'key7': {
        'key3': '',
        'key4': 12345,
    },
    'key5': [1]
}

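def get_fill_rate(d, path=()):
    """Count, per key path, whether each key of one nested dict is filled."""
    result = Counter()
    for k, v in d.items():
        if isinstance(v, dict):
            # A nested dict counts as filled; recurse and merge its counts.
            result[path+(k,)] += 1
            result.update(get_fill_rate(v, path+(k,)))
        elif v in (False, 0):
            # False and 0 are falsy but still count as filled.
            result[path+(k,)] += 1
        elif v:
            result[path+(k,)] += 1
        else:
            # Empty or None: add 0 so the key still appears in the report.
            result[path+(k,)] += 0
    return result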

def get_fill_rates(l):
    result = Counter()
    for d in l:
        result.update(get_fill_rate(d))
    return dict(result)

result = get_fill_rates([example1_dict, example2_dict])

# Raw result
pprint(result)

# Formatted result
print('\n'.join(
    '-'.join(key) + ': ' + str(value)
    for key, value in sorted(result.items())))

Result:

{('key1',): 2,
 ('key2',): 1,
 ('key2', 'key3'): 0,
 ('key2', 'key4'): 1,
 ('key5',): 1,
 ('key7',): 1,
 ('key7', 'key3'): 0,
 ('key7', 'key4'): 1}
key1: 2
key2: 1
key2-key3: 0
key2-key4: 1
key5: 1
key7: 1
key7-key3: 0
key7-key4: 1
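If a proportion is wanted rather than a raw count, the tuple-keyed counts can be flattened and divided by the number of dicts scanned. This is a sketch under assumptions, not part of the answer: the helper name to_fill_fractions is hypothetical, every key is assumed to be a string (otherwise '-'.join fails), and nested keys are measured against all scanned dicts rather than only those that contain the parent key.

```python
def to_fill_fractions(counts, total_dicts):
    """Turn tuple-keyed counts into a flat dict of 'a-b' keys and fractions."""
    if total_dicts <= 0:
        raise ValueError('total_dicts must be positive')
    return {'-'.join(path): count / total_dicts
            for path, count in counts.items()}

# Counts in the same shape as the raw result shown above (subset for brevity).
counts = {('key1',): 2, ('key2',): 1, ('key2', 'key3'): 0, ('key5',): 1}
fractions = to_fill_fractions(counts, total_dicts=2)

for name, fraction in sorted(fractions.items()):
    print(f'{name}: {fraction:.0%}')
```

With two dicts scanned, key1 comes out at 100%, key2 and key5 at 50%, and key2-key3 at 0%.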

A similar question to this one can be found on Stack Overflow: https://stackoverflow.com/questions/48452708/
