python - Replace object ids in a complex data structure with newly created object ids

I have a data structure that can be deeply nested, like this:

{
 'field1' : 'id1',
 'field2':{'f1':'id1', 'f2':'id2', 'f3':'id3'},
 'field3':['id1','id2', 'id3' ,' id4'],
 'field4':[{'f1': 'id3', 'f2': 'id4'}, ...]
 .....
}

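and so on. The nesting can be arbitrarily deep and can be any combination of these data structures.

Here id1, id2, id3 are the string equivalents of ObjectIds generated with the bson library, and the records are obtained by querying MongoDB. I want to replace every occurrence of these ids (id1, id2, ...) with newly created ones.

The replacement must be consistent: id1 must be replaced by the same newly created id everywhere it occurs, and likewise for each of the other ids.

To clarify: if id5 is a newly generated id, then id5 must appear everywhere id1 appeared, and so on.

Here is my solution for doing the above: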

import re
from bson import ObjectId
from collections import defaultdict
import datetime  # needed so that eval() below can resolve datetime.datetime(...) reprs


class MutableString(object):
    '''
    class that represents a mutable string
    '''
    def __init__(self, data):
        # store the characters in a list so they can be modified in place
        self.data = list(data)

    def __repr__(self):
        return "".join(self.data)

    def __setitem__(self, index, value):
        self.data[index] = value

    def __getitem__(self, index):
        if isinstance(index, slice):
            return "".join(self.data[index])
        return self.data[index]

    def __delitem__(self, index):
        del self.data[index]

    def __add__(self, other):
        # extends in place; return self so that `a + b` does not evaluate to None
        self.data.extend(list(other))
        return self

    def __len__(self):
        return len(self.data)


def get_object_id_position_mapping(string):
    '''
    obtains the mapping of start and end positions of object ids in the record from DB
    :param string: string representation of record from DB
    :return: mapping of start and end positions of object ids in record from DB (dict)
    '''
    object_id_pattern = r'[0-9a-f]{24}'
    mapping = defaultdict(list)
    for match in re.finditer(object_id_pattern, string):
        start = match.start()
        end = match.end()
        mapping[string[start:end]].append((start,end))
    return mapping


def replace_with_new_object_ids(mapping, string):
    '''
    replaces the old object ids in record with new ones
    :param mapping: mapping of start and end positions of object ids in record from DB (dict)
    :param string: string representation of record from DB
    :return: new record with replaced object ids (result of eval on the modified string)
    '''
    mutable_string = MutableString(string)
    for indexes in mapping.values():
        new_object_id = str(ObjectId())
        for index in indexes:
            start,end = index
            mutable_string[start:end] = new_object_id
    return eval(str(mutable_string))


def create_new(record):
    '''
    create a new record with replaced object ids
    :param record: record from DB
    :return: new record (dict)
    '''
    string = str(record)
    mapping = get_object_id_position_mapping(string)
    new_record = replace_with_new_object_ids(mapping, string)
    return new_record

In short, I convert the dictionary to a string, replace the ids in that string, and evaluate the result back into a dictionary.
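The string-level substitution step described above can be written more compactly with `re.sub` and a callback that remembers which new id was assigned to each old one. This is only a sketch of that one step: `replace_ids_in_text` is a hypothetical helper, and `secrets.token_hex(12)` is a stand-in for `str(ObjectId())` so that the example runs without bson installed. Like the original pattern, the regex has no boundary checks, so it would also match 24 hex characters inside a longer string.

```python
import re
import secrets

# Same pattern as in the question: 24 lowercase hex characters.
OBJECT_ID_RE = re.compile(r'[0-9a-f]{24}')


def replace_ids_in_text(text):
    """Replace every ObjectId-like substring, reusing one new id per old id."""
    new_ids = {}

    def substitute(match):
        old_id = match.group(0)
        if old_id not in new_ids:
            # Assumption: stand-in for str(bson.ObjectId()).
            new_ids[old_id] = secrets.token_hex(12)
        return new_ids[old_id]

    return OBJECT_ID_RE.sub(substitute, text), new_ids


old_a = '5a0b1c2d3e4f5a6b7c8d9e0f'
old_b = '0123456789abcdef01234567'
text = str({'field1': old_a, 'field3': [old_a, old_b]})

new_text, mapping = replace_ids_in_text(text)
print(new_text)
print(mapping)
```

This removes the need for position bookkeeping and `MutableString`, but the result is still a string that has to be turned back into a dictionary.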

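But I feel this is definitely not the best way to do it: eval fails if I don't have the right imports (datetime in this case), and I may not know in advance which object types (such as datetime) are stored in the database.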
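A small sketch of the failure mode mentioned here. The record below is made up for illustration; the empty dict passed to `eval` simulates a module in which `datetime` was never imported.

```python
import datetime

# Hypothetical record containing a non-string value.
record = {'created': datetime.datetime(2017, 11, 1, 0, 0)}

# str() embeds the repr of the datetime, which is a Python expression
# that refers to the name `datetime`.
text = str(record)
print(text)

try:
    # Evaluate in a namespace that does not contain `datetime`.
    eval(text, {})
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__
    print('eval failed:', exc)
```

The same problem would arise for any other stored type whose repr refers to a name that is not in scope, which is why the round trip through a string is fragile.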

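I even tried the nested_lookup approach described here: https://github.com/russellballestrini/nested-lookup/blob/master/nested_lookup/nested_lookup.py

but couldn't get it to work quite the way I wanted. Is there a better way to do this?

Note: efficiency is not a concern for me. All I want is to automate replacing these ids with new ones, to save the time of doing it by hand.

Edit 1: I will call create_new() with a record fetched from MongoDB as its argument.

Edit 2: the structure can have other objects (such as datetime) as values, for example: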

 {
 'field1' : 'id1',
 'field2':{'f1':datetime.datetime(2017, 11, 1, 0, 0), 'f2':'id2', 'f3':'id3'},
 'field3':['id1','id2', 'id3' ,' id4'],
 'field4':[{'f1': 'id3', 'f2': datetime.datetime(2017,11, 1, 0 , 0)}, ...]
 .....
}

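The other objects must remain unchanged; only the ids must be replaced.

Best answer

You can use a recursive function that walks the input data structure and replaces the strings nested inside it.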

def replace_ids(obj, new_ids=None):
  if new_ids is None:
    new_ids = {}
  if isinstance(obj, dict):
    return {key: replace_ids(value, new_ids) for key, value in obj.items()}
  if isinstance(obj, list):
    return [replace_ids(item, new_ids) for item in obj]
  if isinstance(obj, str):
    if obj not in new_ids:
      new_ids[obj] = generate_new_id()
    return new_ids[obj]
  return obj

generate_new_id is a function that determines how you want to generate the new ids.
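As written, the function above replaces every string value it finds. A possible variant, sketched below, only touches strings that look like ObjectIds and leaves everything else (other strings, datetimes, numbers) as it is. The name `replace_object_ids`, the regex check, and the sample record are assumptions made for illustration; `generate_new_id` uses `secrets.token_hex(12)` as a stand-in so the example runs without bson, and with bson installed it could return `str(ObjectId())` instead.

```python
import datetime
import re
import secrets

# A string counts as an id only if it is exactly 24 lowercase hex characters.
OBJECT_ID_RE = re.compile(r'[0-9a-f]{24}')


def generate_new_id():
    # Assumption: stand-in for str(bson.ObjectId()).
    return secrets.token_hex(12)


def replace_object_ids(obj, new_ids=None):
    """Return a copy of obj with each ObjectId-like string consistently replaced."""
    if new_ids is None:
        new_ids = {}
    if isinstance(obj, dict):
        return {key: replace_object_ids(value, new_ids) for key, value in obj.items()}
    if isinstance(obj, list):
        return [replace_object_ids(item, new_ids) for item in obj]
    if isinstance(obj, str) and OBJECT_ID_RE.fullmatch(obj):
        if obj not in new_ids:
            new_ids[obj] = generate_new_id()
        return new_ids[obj]
    # Anything else (datetime, int, non-id strings, ...) is returned unchanged.
    return obj


old_a = '5a0b1c2d3e4f5a6b7c8d9e0f'
old_b = '0123456789abcdef01234567'
when = datetime.datetime(2017, 11, 1, 0, 0)
record = {
    'field1': old_a,
    'field2': {'f1': when, 'f2': old_b, 'name': 'not an id'},
    'field3': [old_a, old_b],
}

new_record = replace_object_ids(record)
print(new_record)
```

Because the `new_ids` dictionary is shared across the recursive calls, the same old id maps to the same new id wherever it occurs, and no `eval` is involved, so non-string values pass through untouched.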

Regarding "python - Replace object ids in a complex data structure with newly created object ids", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47873033/
