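I have a use case where I need to traverse a dict (which can contain strings, dicts, and lists as nested values) and build a new dict based on a mapping predefined by my business team. When the requirements were:
- 1:1 transformation
- dropping some key/value pairs
my code looked like this: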
def recursively_transform(parent_keys='', current_key='', container=None):
    container_class = container.__class__
    new_container_value = None
    if container is not None:
        if isinstance(container, basestring):
            new_container_value = do_something_and_return(parent_keys, current_key, container)
            if current_key in mapping:
                populate(parent_keys + current_key, new_container_value)
        elif isinstance(container, collections.Mapping):
            if parent_keys:
                parent_keys = ''.join([parent_keys, ":"])
            new_container_value = container_class(
                (x, recursively_transform(parent_keys + x, x, container[x]))
                for x in container if key_required(parent_keys, current_key))
        elif isinstance(container, collections.Iterable):
            new_container_value = container_class(recursively_transform(
                parent_keys + "[]", current_key, x) for x in container)
        else:
            raise Exception("")
    return new_container_value
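As you can see, inside do_something_and_return I use the parent_keys and current_key arguments to transform the value and return the new one. The steps for each parent_keys plus current_key combination are specified in an external mapping database.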
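Now, however, the requirements have changed to include complex transformations (no longer 1:1). That is, my mapping database will specify a new path for each key, and that path can have any structure. For example, key/value pairs sometimes have to be flattened, often the reverse has to happen, and sometimes there is no direct correspondence between the two structures at all.
For example: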
key1:key2:[]:key3 => key2:[]:key3
key1:key2:[]:key4 => key2:[]:key5
This means that an input dict like this:
{key1:{key2:[{key3: "value3", key4: "value4"}, {key3:None}]}}
would become:
{key2:[{key3:"value3_after_transformation", key5:"value4_after_transformation"}, {key3:None}]}
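In my description language, : is the separator between a parent key and a child key, and [] indicates that the parent key has a list as its value.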
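To make the notation concrete, here is a minimal sketch of how such rules could be parsed. It assumes the rules are available as text lines in the `old => new` form shown above; the names `parse_mapping` and `rules` are hypothetical and do not come from the real mapping database.

```python
def parse_mapping(lines):
    """Parse 'old:path => new:path' lines into a dict of path tuples.

    Assumption: each rule is one text line with '=>' between the old
    and the new path, and ':' between path segments.
    """
    mapping = {}
    for line in lines:
        old, new = (side.strip() for side in line.split("=>"))
        # '[]' stays a plain segment that marks "any list item"
        mapping[tuple(old.split(":"))] = tuple(new.split(":"))
    return mapping


rules = parse_mapping([
    "key1:key2:[]:key3 => key2:[]:key3",
    "key1:key2:[]:key4 => key2:[]:key5",
])
print(rules[("key1", "key2", "[]", "key4")])  # ('key2', '[]', 'key5')
```

Keeping paths as tuples makes them usable as dict keys, so looking up the target path for a given source path is a single dict access.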
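I am confused about which approach to take here. The only way I can think of to handle all of these cases is to recursively traverse all the keys and then populate another, global dict on the fly, checking whether each target key already exists and filling it in as appropriate. But that is not easy when nested lists are involved. It also does not sound as elegant as my solution above, which works with containers and their children. What is the best way to do this generically and elegantly?
Thanks!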
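Best answer
OK, I got it working. It passes the test case you gave, but it is rather long. It finds every concrete path that matches a given template, then populates the new dict according to the new path.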
import re

def prepare_path(path):
    # split path
    path = re.findall(r"[^:]+?(?=\[|:|$)|\[\d*?\]", path)
    # prepare path
    for i, element in enumerate(path):
        if element[0] == "[" and element[-1] == "]":
            element = int(element[1:-1])
        path[i] = element
    return path

def prepare_template(template):
    # split path template
    template = re.findall(r"[^:]+?(?=\[|:|$)|\[\d*?\]", template)
    # prepare path template
    counter = 0
    for i, element in enumerate(template):
        if element[0] == "[" and element[-1] == "]":
            if len(element) > 2:
                element = int(element[1:-1])
            else:
                element = ("ListIndex", counter)
                counter += 1  # number each "[]" so nested lists get their own index
        template[i] = element
    return template
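
def fill_template(template, list_indexes):
    out = []
    for element in template:
        if isinstance(element, tuple):  # "[]" placeholder: use the matching index
            element = f"[{list_indexes[element[1]]}]"
        elif isinstance(element, int):  # fixed index such as "[2]": render it back as text
            element = f"[{element}]"
        out.append(element)
    return ":".join(out)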

def populate(result_dict, target_path, value):
    target_path = prepare_path(target_path)
    current = result_dict
    for i, element in enumerate(target_path[:-1]):
        if isinstance(element, str):  # dict index
            if element not in current:  # create new entry
                if isinstance(target_path[i + 1], str):  # next is a dict
                    current[element] = {}
                else:  # next is a list
                    current[element] = []
        elif isinstance(element, int):  # list index
            if element >= len(current):  # create new entry
                current.extend(None for _ in range(element - len(current) + 1))
            if current[element] is None:
                if isinstance(target_path[i + 1], str):  # next is a dict
                    current[element] = {}
                else:  # next is a list
                    current[element] = []
        current = current[element]
    if isinstance(target_path[-1], int):
        current.append(value)
    else:
        current[target_path[-1]] = value

def get_value(container, target_path):
    target_path = prepare_path(target_path)
    current = container
    for key in target_path:
        current = current[key]
    return current

def transform(old_path, new_path, old_container, new_container, transform_value=lambda *args: ' '.join(args)):
    value = get_value(old_container, old_path)
    new_value = transform_value(old_path, new_path, value)
    populate(new_container, new_path, new_value)

def get_all_paths(prepared_template, container):
    if not prepared_template:
        return [("", ())]
    key, *rest = prepared_template
    if isinstance(key, tuple):
        if not isinstance(container, list):
            raise ValueError(container, key)
        paths = [(f"[{i}]:" + path, (i,) + per)
                 for i, child in enumerate(container)
                 for path, per in get_all_paths(rest, child)]
    elif isinstance(key, str):
        if key not in container:
            return []
        child = container[key]
        paths = [(f"{key}:" + path, per) for path, per in get_all_paths(rest, child)]
    elif isinstance(key, int):
        child = container[key]
        paths = [(f"[{key}]:" + path, per) for path, per in get_all_paths(rest, child)]
    else:
        raise ValueError
    return paths

def transform_all(old_template, new_template, old_container, new_container, transform_value=lambda op, np, value: value):
    new_template = prepare_template(new_template)
    old_template = prepare_template(old_template)
    all_paths = get_all_paths(old_template, old_container)
    for path, per in all_paths:
        transform(path, fill_template(new_template, per), old_container, new_container, transform_value)

input_dict = {"key1": {"key2": [{"key3": "value3", "key4": "value4"}, {"key3": None}]}}
output_dict = {}
transform_all("key1:key2:[]:key3", "key2:[]:key3", input_dict, output_dict)
transform_all("key1:key2:[]:key4", "key2:[]:key5", input_dict, output_dict)
print(output_dict)
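For comparison, here is a more compact sketch of a related but different technique: instead of matching one template at a time, it walks the input once, turns every leaf into a (template path, list indexes, value) triple, and writes mapped leaves into the new dict. It is a minimal sketch under stated assumptions, not a replacement for the code above: the names `flatten`, `set_path`, `remap`, and `rules` are hypothetical, dict keys are assumed to be strings other than "[]", and a new path is assumed to contain no more "[]" segments than the old one.

```python
def flatten(container, path=(), indexes=()):
    """Yield (template_path, list_indexes, leaf_value) for every leaf."""
    if isinstance(container, dict):
        for key, child in container.items():
            yield from flatten(child, path + (key,), indexes)
    elif isinstance(container, list):
        for i, child in enumerate(container):
            yield from flatten(child, path + ("[]",), indexes + (i,))
    else:
        yield path, indexes, container


def set_path(root, template, indexes, value):
    """Write value at template, replacing each '[]' with the next list index."""
    indexes = iter(indexes)
    steps = [next(indexes) if t == "[]" else t for t in template]
    current = root
    for step, nxt in zip(steps, steps[1:]):
        empty = [] if isinstance(nxt, int) else {}  # container for the next level
        if isinstance(step, int):
            while len(current) <= step:  # pad the list up to the index
                current.append(None)
            if current[step] is None:
                current[step] = empty
        else:
            current.setdefault(step, empty)
        current = current[step]
    last = steps[-1]
    if isinstance(last, int):
        while len(current) <= last:
            current.append(None)
    current[last] = value


def remap(source, rules, transform=lambda value: value):
    """Build a new dict from source; leaves whose path is not in rules are dropped."""
    result = {}
    for template, indexes, value in flatten(source):
        if template in rules:
            set_path(result, rules[template], indexes, transform(value))
    return result


rules = {
    ("key1", "key2", "[]", "key3"): ("key2", "[]", "key3"),
    ("key1", "key2", "[]", "key4"): ("key2", "[]", "key5"),
}
source = {"key1": {"key2": [{"key3": "value3", "key4": "value4"}, {"key3": None}]}}
result = remap(source, rules,
               lambda v: v if v is None else v + "_after_transformation")
print(result)
```

Because the list indexes travel with each leaf, nested lists need no special handling beyond consuming one index per "[]" in the target path. The trade-off is that the whole input is traversed even when only a few paths are mapped.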
If you have any questions, or find other cases where this fails, just ask! These are fun challenges you are giving us.
On recursively transforming a dict in Python according to a predefined mapping, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49996618/