python - How do I match data across lists of dictionaries and remove duplicates?

I have two lists of dictionaries, and I'm trying to remove the duplicates in their structure.

My list codes has the keys code_id and groups:

codes = [{'code_id': '57025', 'groups': '1234'}, 
{'code_id': '57025', 'groups': '4567'}, 
{'code_id': '57025', 'groups': '8910'},
{'code_id': '1', 'groups': '4321'},
{'code_id': '1', 'groups': '9876'}]

Each record in my list of dictionaries has a code_id with one or more groups attached to it.
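A natural first step is to collect, for each code_id in codes, the list of groups attached to it. A minimal sketch using collections.defaultdict; the helper name groups_by_code is hypothetical:

```python
from collections import defaultdict


def groups_by_code(codes):
    """Map each code_id to the list of its groups, in input order (hypothetical helper)."""
    mapping = defaultdict(list)
    for record in codes:
        mapping[record['code_id']].append(record['groups'])
    return dict(mapping)


codes = [{'code_id': '57025', 'groups': '1234'},
         {'code_id': '57025', 'groups': '4567'},
         {'code_id': '57025', 'groups': '8910'},
         {'code_id': '1', 'groups': '4321'},
         {'code_id': '1', 'groups': '9876'}]

print(groups_by_code(codes))
# {'57025': ['1234', '4567', '8910'], '1': ['4321', '9876']}
```

Note that '8910' shows up here although no record in data_master carries it, which is why the expected result lists only ['1234', '4567'] for 57025. Building the lists from data_master itself avoids that mismatch.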

My data_master has the keys code_id and groups, plus more data related to each record with that same code_id.

print(data_master)

But in this case, records that share a code_id come out as duplicates:

Output:

[{'code_id': '57025', 
'groups': '1234', 
'initials': 'XXXXX', 
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET', 
'number_1': '',
'number_2': ''},

{'code_id': '57025', 
'groups': '4567', 
'initials': 'XXXXX', 
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET', 
'number_1': '',
'number_2': ''},

{'code_id': '1', 
'groups': '4321', 
'initials': 'YYYY', 
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER', 
'number_1': '',
'number_2': ''},

{'code_id': '1', 
'groups': '9876', 
'initials': 'YYYY', 
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER', 
'number_1': '',
'number_2': ''}
]

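In this result, every record in the list that has the same code_id but a different group comes back as a separate dictionary.

I've tried many approaches; this is what I'm currently trying: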
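Records that share a code_id differ only in their groups field, which is what makes them look like duplicates. A quick way to confirm that, assuming the records are plain dicts with the same keys (differing_keys is a hypothetical helper):

```python
def differing_keys(a, b):
    """Return the keys whose values differ between two dicts (hypothetical helper)."""
    return {key for key in a if a[key] != b.get(key)}


first = {'code_id': '57025', 'groups': '1234', 'initials': 'XXXXX', 'name': 'XXXX',
         'city': 'LOS SANTOS', 'postal_code': '02938402-9093', 'uf': 'US',
         'address': 'GROOVE STREET', 'number_1': '', 'number_2': ''}
second = dict(first, groups='4567')  # same record, only the group changes

print(differing_keys(first, second))  # {'groups'}
```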

group_list = []
for item in data_master:
  group_list.append(item['groups']) 
  for data in [data for data in codes if data['code_id'] == item['code_id'] and data['groups'] != item['groups']]:
    item['groups'] = group_list


new_data_master = []

for data in data_master:
  if (item["groups"] != data["groups"] for item in new_data_master):
    new_data_master.append(data)

print(new_data_master)
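The second loop keeps every record because `if (... for item in new_data_master):` tests a generator object, and a generator object is always truthy, regardless of the values it would yield. Passing the generator to all() or any() is what actually evaluates the condition. A minimal illustration:

```python
# A generator expression is an object; its truthiness ignores the values it would yield.
gen = (x != x for x in [1, 2, 3])  # would only ever yield False
print(bool(gen))  # True

# all()/any() consume the generator and evaluate each condition.
print(all(x != x for x in [1, 2, 3]))  # False
```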

Result:

[{'code_id': '57025', 
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},
 
{'code_id': '57025',
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'XXXXX',
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET',
'number_1': '',
'number_2': ''},

{'code_id': '1',
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''},

{'code_id': '1',
'groups': ['1234', '4567', '4321', '9876'],
'initials': 'YYYY',
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER',
'number_1': '',
'number_2': ''}]

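This way, every record gets all the groups, including ones that are not related to its code_id.

For each code_id, I need to return a single dictionary with a list of its groups. This is the result I'm expecting:

Expected result: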
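Every record ends up with the same four groups because the first loop assigns one shared group_list object to item['groups'], so all records refer to the same list while it keeps growing. A simplified demonstration of that aliasing (the inner filter over codes is left out, since in the sample data every record matches it):

```python
records = [{'code_id': '57025', 'groups': '1234'},
           {'code_id': '1', 'groups': '4321'}]

group_list = []
for record in records:
    group_list.append(record['groups'])
    record['groups'] = group_list  # every record now references the SAME list object

print(records[0]['groups'])  # ['1234', '4321'], so code 57025 "sees" group 4321
print(records[0]['groups'] is records[1]['groups'])  # True
```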

[{'code_id': '57025', 
'groups': ['1234', '4567'],
'initials': 'XXXXX', 
'name': 'XXXX',
'city': 'LOS SANTOS',
'postal_code': '02938402-9093',
'uf': 'US',
'address': 'GROOVE STREET', 
'number_1': '',
'number_2': ''},

{'code_id': '1', 
'groups': ['4321', '9876'],
'initials': 'YYYY', 
'name': 'YYYY',
'city': 'GOTHAM',
'postal_code': '930489038-5679',
'uf': 'US',
'address': 'WAYNE TOWER', 
'number_1': '',
'number_2': ''}]
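One way to get this result is to keep one output dict per code_id and append each record's group to it. A minimal sketch, assuming (as in the sample data) that all records with the same code_id agree on every field except groups; merge_groups is a hypothetical name, the sample records are trimmed to a few fields for brevity, and the input list is not modified:

```python
def merge_groups(data_master):
    """Collapse records sharing a code_id into one dict whose 'groups' is a list."""
    merged = {}  # code_id -> merged record; dicts keep insertion order (Python 3.7+)
    for record in data_master:
        code = record['code_id']
        if code not in merged:
            # Copy the first record for this code_id and start a fresh groups list.
            merged[code] = {**record, 'groups': [record['groups']]}
        else:
            merged[code]['groups'].append(record['groups'])
    return list(merged.values())


data_master = [
    {'code_id': '57025', 'groups': '1234', 'city': 'LOS SANTOS'},
    {'code_id': '57025', 'groups': '4567', 'city': 'LOS SANTOS'},
    {'code_id': '1', 'groups': '4321', 'city': 'GOTHAM'},
    {'code_id': '1', 'groups': '9876', 'city': 'GOTHAM'},
]

print(merge_groups(data_master))
# [{'code_id': '57025', 'groups': ['1234', '4567'], 'city': 'LOS SANTOS'},
#  {'code_id': '1', 'groups': ['4321', '9876'], 'city': 'GOTHAM'}]
```

Because `{**record, 'groups': ...}` overrides an existing key, groups keeps its original position in each dict.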

Best answer

I think the following should work:


keys = list(data_master[0].keys())  # remember the original key order

id_s = {}    # code_id -> list of its groups
blocks = []  # unique records, stored without "groups"
for block in data_master:
    code = block["code_id"]
    # pop "groups" so the remaining fields can be compared (note: this modifies data_master)
    if code in id_s:
        id_s[code].append(block.pop("groups"))
    else:
        id_s[code] = [block.pop("groups")]

    b = dict(block)  # copy of the record without "groups"
    if b not in blocks:  # "groups" was the only differing field, so duplicates now compare equal
        blocks.append(b)

# rebuild each record, putting the collected list back under "groups"
new_master = [{key: block[key] if key != "groups" else id_s[block["code_id"]]
               for key in keys} for block in blocks]

Output

[{'code_id': '57025',
  'groups': ['1234', '4567'],
  'initials': 'XXXXX',
  'name': 'XXXX',
  'city': 'LOS SANTOS',
  'postal_code': '02938402-9093',
  'uf': 'US',
  'address': 'GROOVE STREET',
  'number_1': '',
  'number_2': ''},
 {'code_id': '1',
  'groups': ['4321', '9876'],
  'initials': 'YYYY',
  'name': 'YYYY',
  'city': 'GOTHAM',
  'postal_code': '930489038-5679',
  'uf': 'US',
  'address': 'WAYNE TOWER',
  'number_1': '',
  'number_2': ''}]
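The answer merges records that are identical in every field except groups (not by code_id alone), and `b not in blocks` scans the whole list for each record. A sketch of the same idea using a dict keyed on a frozenset of the remaining fields, which also leaves data_master untouched; it assumes every value other than groups is hashable (true for the strings in the sample), and merge_by_other_fields is a hypothetical name. The METROPOLIS record below is invented purely to show that a record with the same code_id but different other data stays separate:

```python
def merge_by_other_fields(data_master):
    """Merge records identical except for 'groups', without mutating the input."""
    merged = {}  # frozenset of (field, value) pairs excluding 'groups' -> merged record
    for record in data_master:
        key = frozenset((k, v) for k, v in record.items() if k != 'groups')
        if key not in merged:
            merged[key] = {**record, 'groups': []}
        merged[key]['groups'].append(record['groups'])
    return list(merged.values())


data_master = [
    {'code_id': '1', 'groups': '4321', 'city': 'GOTHAM'},
    {'code_id': '1', 'groups': '9876', 'city': 'GOTHAM'},
    {'code_id': '1', 'groups': '5555', 'city': 'METROPOLIS'},  # hypothetical record
]

print(merge_by_other_fields(data_master))
# [{'code_id': '1', 'groups': ['4321', '9876'], 'city': 'GOTHAM'},
#  {'code_id': '1', 'groups': ['5555'], 'city': 'METROPOLIS'}]
```

A frozenset key makes the comparison independent of key order, and dict lookups replace the linear `not in` scan.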

A similar question, "python - How do I match data across lists of dictionaries and remove duplicates?", was found on Stack Overflow: https://stackoverflow.com/questions/73802394/
