Python: recursively transform a dict according to a predefined mapping

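I have a use case where I need to traverse a dict (whose nested values can be strings, dicts and lists) and build a new dict according to a mapping predefined by my business team. This was simple while the requirements were:

1:1 transformation
Dropping some key-value pairs

My code looks like this: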

import collections.abc


# mapping, populate, key_required and do_something_and_return are assumed
# to be defined elsewhere (they are not shown in the question).
def recursively_transform(parent_keys='', current_key='', container=None):
    container_class = container.__class__
    new_container_value = None
    if container is not None:
        if isinstance(container, str):  # basestring on Python 2
            new_container_value = do_something_and_return(parent_keys, current_key, container)
            if current_key in mapping:
                populate(parent_keys + current_key, new_container_value)
        elif isinstance(container, collections.abc.Mapping):
            if parent_keys:
                parent_keys = ''.join([parent_keys, ":"])
            # rebuild the mapping with the same class, transforming each child
            new_container_value = container_class(
                (x, recursively_transform(parent_keys + x, x, container[x]))
                for x in container if key_required(parent_keys, current_key))
        elif isinstance(container, collections.abc.Iterable):
            # "[]" marks that the parent key holds a list
            new_container_value = container_class(recursively_transform(
                parent_keys + "[]", current_key, x) for x in container)
        else:
            raise TypeError("unsupported container type: %r" % container_class)
    return new_container_value

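As you can see, inside do_something_and_return I use the parent_keys and current_key arguments to transform the value and return the new one. The steps for each parent_keys plus current_key combination are specified in an external mapping database.

Now, however, the requirements have changed to allow complex transformations (no longer 1:1). That is, my mapping database will specify a new path for each key, and that path can have any structure. For example, key/value pairs may have to be flattened, often the reverse has to happen, and sometimes there is no direct correspondence between the two structures at all.

For example,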

key1:key2:[]:key3 => key2:[]:key3
key1:key2:[]:key4 => key2:[]:key5

This means that an input dict like this:

{key1:{key2:[{key3: "value3", key4: "value4"}, {key3:None}]}}

would become

{key2:[{key3:"value3_after_transformation", key5:"value4_after_transformation"}, {key3:None}]}

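In my description language, : is the separator between a parent key and a child key, and [] indicates that the parent key has a list as its value.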
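
To make the path notation concrete, here is a minimal sketch of how such a path string could be split into tokens. The function name parse_path is hypothetical, and it assumes that keys never contain ":" and that an explicit index such as [1] may also appear in a path.

```python
def parse_path(path):
    """Split a path like 'key1:key2:[]:key3' into a list of tokens.

    Dict keys stay strings, '[]' is kept as the marker for "every list
    element", and an explicit index such as '[1]' becomes the integer 1.
    Assumption: keys never contain ':'.
    """
    tokens = []
    for part in path.split(":"):
        if part == "[]":
            tokens.append("[]")
        elif part.startswith("[") and part.endswith("]"):
            tokens.append(int(part[1:-1]))
        elif part:  # skip empty parts caused by a leading or trailing ':'
            tokens.append(part)
    return tokens


wildcard_tokens = parse_path("key1:key2:[]:key3")
# wildcard_tokens == ['key1', 'key2', '[]', 'key3']
indexed_tokens = parse_path("key2:[1]:key5")
# indexed_tokens == ['key2', 1, 'key5']
```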

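I am confused about what approach to take here. The only way I can think of to handle all these cases is to traverse all keys recursively and fill another global dict on the fly, checking whether each target key exists and populating it appropriately. But that is not easy when nested lists are involved. It also does not sound as elegant as the solution above, which works with containers and their children. What is the best way to do this in a generic and elegant way?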
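
The traversal half of that idea can be sketched as a generator that yields every leaf together with its concrete path. This is only an illustration under stated assumptions: iter_leaves and generic_path are hypothetical names, only dict and list are treated as containers, and dict keys are assumed to be strings.

```python
def iter_leaves(container, prefix=()):
    """Yield (path, value) for every leaf; path is a tuple of keys and indexes."""
    if isinstance(container, dict):
        for key, child in container.items():
            yield from iter_leaves(child, prefix + (key,))
    elif isinstance(container, list):
        for index, child in enumerate(container):
            yield from iter_leaves(child, prefix + (index,))
    else:
        yield prefix, container


def generic_path(concrete_path):
    """Turn ('key1', 'key2', 0, 'key3') into 'key1:key2:[]:key3'."""
    return ":".join("[]" if isinstance(p, int) else p for p in concrete_path)


data = {"key1": {"key2": [{"key3": "value3", "key4": "value4"}, {"key3": None}]}}
leaves = list(iter_leaves(data))
# leaves[0] == (('key1', 'key2', 0, 'key3'), 'value3')
```

The generic form of each path is what would be looked up in the mapping, while the list indexes in the concrete path say where the value belongs in the target structure.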

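Thanks!

Best answer

OK, I got it working. This passes the test case you gave, but it is quite long. It finds all possible paths for the given template, then populates a new dict according to the new paths.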

import re


def prepare_path(path):
    # split path
    path = re.findall(r"[^:]+?(?=\[|:|$)|\[\d*?\]", path)
    # prepare path
    for i, element in enumerate(path):
        if element[0] == "[" and element[-1] == "]":
            element = int(element[1:-1])
        path[i] = element
    return path


def prepare_template(template):
    # split path template
    template = re.findall(r"[^:]+?(?=\[|:|$)|\[\d*?\]", template)
    # prepare path template
    counter = 0
    for i, element in enumerate(template):
        if element[0] == "[" and element[-1] == "]":
            if len(element) > 2:
                element = int(element[1:-1])
            else:
                # number each "[]" so templates with several lists
                # pick up the matching list index in fill_template
                element = ("ListIndex", counter)
                counter += 1
        template[i] = element
    return template


def fill_template(template, list_indexes):
    out = []
    for element in template:
        if isinstance(element, tuple):
            element = f"[{list_indexes[element[1]]}]"
        out.append(element)
    return ":".join(out)


def populate(result_dict, target_path, value):
    target_path = prepare_path(target_path)
    current = result_dict
    for i, element in enumerate(target_path[:-1]):
        if isinstance(element, str):  # dict index
            if element not in current:  # create new entry
                if isinstance(target_path[i + 1], str):  # next is a dict
                    current[element] = {}
                else:  # next is a list
                    current[element] = []
        elif isinstance(element, int):  # list index
            if element >= len(current):  # create new entry
                current.extend(None for _ in range(element - len(current) + 1))
            if current[element] is None:
                if isinstance(target_path[i + 1], str):  # next is a dict
                    current[element] = {}
                else:  # next is a list
                    current[element] = []
        current = current[element]
    if isinstance(target_path[-1], int):
        current.append(value)
    else:
        current[target_path[-1]] = value


def get_value(container, target_path):
    target_path = prepare_path(target_path)
    current = container
    for key in target_path:
        current = current[key]
    return current


def transform(old_path, new_path, old_container, new_container, transform_value=lambda op, np, value: value):
    value = get_value(old_container, old_path)
    new_value = transform_value(old_path, new_path, value)
    populate(new_container, new_path, new_value)


def get_all_paths(prepared_template, container):
    if not prepared_template:
        return [("",())]
    key, *rest = prepared_template
    if isinstance(key, tuple):
        if not isinstance(container, list):
            raise ValueError(container, key)
        paths = [(f"[{i}]:" + path, (i,) + per) for i, child in enumerate(container) for path, per in get_all_paths(rest, child)]
    elif isinstance(key, str):
        if key not in container:
            return []
        child = container[key]
        paths = [(f"{key}:" + path, per) for path, per in get_all_paths(rest, child)]
    elif isinstance(key, int):
        child = container[key]
        paths = [(f"[{key}]:" + path, per) for path, per in get_all_paths(rest, child)]
    else:
        raise ValueError
    return paths


def transform_all(old_template, new_template, old_container, new_container, transform_value=lambda op, np, value: value):
    new_template = prepare_template(new_template)
    old_template = prepare_template(old_template)
    all_paths = get_all_paths(old_template, old_container)
    for path, per in all_paths:
        transform(path, fill_template(new_template, per), old_container, new_container, transform_value)

input_dict = {"key1": {"key2": [{"key3": "value3", "key4": "value4"}, {"key3": None}]}}
output_dict = {}
transform_all("key1:key2:[]:key3", "key2:[]:key3", input_dict, output_dict)
transform_all("key1:key2:[]:key4", "key2:[]:key5", input_dict, output_dict)
print(output_dict)
# {'key2': [{'key3': 'value3', 'key5': 'value4'}, {'key3': None}]}
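
For comparison, here is a more compact, self-contained sketch of the same idea, driven by a mapping table instead of one call per template. It is not the code above: apply_mapping and set_by_path are hypothetical names, and the sketch assumes that every target template starts with a dict key, contains no more "[]" markers than the source path, and uses no explicit indexes such as [3].

```python
def set_by_path(root, template, indexes, value):
    """Write value into root at template, filling each '[]' with the next index."""
    remaining = iter(indexes)
    steps = [next(remaining) if part == "[]" else part
             for part in template.split(":")]
    current = root
    for step, following in zip(steps, steps[1:]):
        empty = [] if isinstance(following, int) else {}
        if isinstance(step, int):
            while len(current) <= step:  # pad the list up to the index
                current.append(None)
            if current[step] is None:
                current[step] = empty
        else:
            current.setdefault(step, empty)
        current = current[step]
    last = steps[-1]
    if isinstance(last, int):
        while len(current) <= last:
            current.append(None)
    current[last] = value


def apply_mapping(data, mapping, transform_value=lambda value: value):
    """Build a new dict from data using a {old_template: new_template} table."""
    result = {}

    def walk(node, path, indexes):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, path + [key], indexes)
        elif isinstance(node, list):
            for i, child in enumerate(node):
                walk(child, path + ["[]"], indexes + [i])
        else:
            new_template = mapping.get(":".join(path))
            if new_template is not None:  # unmapped leaves are dropped
                set_by_path(result, new_template, indexes, transform_value(node))

    walk(data, [], [])
    return result


mapping = {
    "key1:key2:[]:key3": "key2:[]:key3",
    "key1:key2:[]:key4": "key2:[]:key5",
}
source = {"key1": {"key2": [{"key3": "value3", "key4": "value4"}, {"key3": None}]}}
converted = apply_mapping(
    source, mapping,
    lambda v: v if v is None else v + "_after_transformation")
```

With the example mapping from the question, converted should have the structure shown there: key1 is gone, key4 is renamed to key5, and the None value is left untouched.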

If you have any questions or find other cases that fail, please ask! These are fun challenges you are giving us.

The original question about recursively transforming a dict according to a predefined mapping in Python is on Stack Overflow: https://stackoverflow.com/questions/49996618/
