我在数组中有以下三个字典:
items = [
{
'FirstName': 'David',
'LastName': 'Smith',
'Language': set(['en'])
},
{
'FirstName': 'David',
'LastName': 'Smith',
'Language': set(['fr'])
},
{
'FirstName': 'Bob',
'LastName': 'Jones',
'Language': set(['en'])
} ]
如果两个词典相同减去指定的键,我想将这些词典合并在一起:并将该键加在一起。如果使用 "Language"
键,它会将数组合并为以下内容:
[ {
'FirstName': 'David',
'LastName': 'Smith',
'Language': set(['en','fr'])
},{
'FirstName': 'Bob',
'LastName': 'Jones',
'Language': set(['en'])
} ]
这是我目前正在做的事情:
from copy import deepcopy
def _merge_items_on_field(items, field):
'''Given an array of dicts, merge the
dicts together if they are the same except for the 'field'.
If merging dicts, add the unique values of that field together.'''
items = deepcopy(items)
items_merged_on_field = []
for num, item in enumerate(items):
# Remove that key/value from the dict
field_value = item.pop(field)
# Get an array of items *without* that field to compare against
items_without_field = deepcopy(items_merged_on_field)
map(lambda d: d.pop(field), items_without_field)
# If the dict item is found ("else"), add the fields together
# If not ("except"), then add in the dict item to the array
try:
index = items_without_field.index(item)
except ValueError:
item[field] = field_value
items_merged_on_field.append(item)
else:
items_merged_on_field[index][field] = items_merged_on_field[index][field].union(field_value)
return items_merged_on_field
>>> items = [{'LastName': 'Smith', 'Language': set(['en']), 'FirstName': 'David'}, {'LastName': 'Smith', 'Language': set(['fr']), 'FirstName': 'David'}, {'LastName': 'Jones', 'Language': set(['en']), 'FirstName': 'Bob'}]
>>> _merge_items_on_field(items, 'Language')
[{'LastName': 'Smith', 'Language': set(['fr', 'en']), 'FirstName': 'David'}, {'LastName': 'Jones', 'Language': set(['en']), 'FirstName': 'Bob'}]
这似乎有点复杂——有更好的方法吗?
最佳答案
有几种方法可以做到这一点。据我所知,最轻松的方法是使用 pandas 库——特别是 groupby
+ apply
。
import pandas as pd
merged = (
pd.DataFrame(items)
.groupby(['FirstName', 'LastName'], sort=False)
.Language
.apply(lambda x: set.union(*x))
.reset_index()
.to_dict(orient='records')
)
print(merged)
[
{'FirstName': 'David', 'LastName': 'Smith', 'Language': {'en', 'fr'}},
{'FirstName': 'Bob', 'LastName': 'Jones', 'Language': {'en'}}
]
另一种方法(我提到的)使用 itertools.groupby
,但鉴于您有 30 列要分组,我只建议坚持使用 pandas。
如果你想把它变成一个函数,
def merge(items, field):
df = pd.DataFrame(items)
columns = df.columns.difference([field]).tolist()
return (
df.groupby(columns, sort=False)[field]
.apply(lambda x: set.union(*x))
.reset_index()
.to_dict(orient='records')
)
merged = merge(items, 'Language')
print(merged)
[
{'FirstName': 'David', 'LastName': 'Smith', 'Language': {'en', 'fr'}},
{'FirstName': 'Bob', 'LastName': 'Jones', 'Language': {'en'}}
]
关于python - 根据排除键的相似性合并两个字典,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50260261/