python - How do I count the frequency of dict elements and remove duplicate dicts from a Python list?

I have a list whose elements are dicts.

For example:

da_list = [
    {'Surface':'APPLE','BaseForm':'apple','PN':0.5},
    {'Surface':'BANANA','BaseForm':'banana','PN':0.4},
    {'Surface':'ORANGE','BaseForm':'orange','PN':-0.1},
    {'Surface':'APPLE','BaseForm':'apple','PN':0.5},
    {'Surface':'BANANA','BaseForm':'banana','PN':0.4} 
]

I want to define a new list named db_list that stores the dict elements like this:

db_list = [
    {'Surface':'APPLE','BaseForm':'apple','PN':0.5,'Frequency':2},
    {'Surface':'BANANA','BaseForm':'banana','PN':0.4,'Frequency':2},
    {'Surface':'ORANGE','BaseForm':'orange','PN':-0.1,'Frequency':1} 
]

db_list drops the duplicate elements of da_list and adds a 'Frequency' entry recording how many times each dict occurred.

How can I do this?

Best answer

You can use itertools.groupby. It only merges adjacent items, so the list has to be sorted by the same key first:

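import itertools
da_list = [{'Surface':'APPLE','BaseForm':'apple','PN':0.5}, {'Surface':'BANANA','BaseForm':'banana','PN':0.4}, {'Surface':'ORANGE','BaseForm':'orange','PN':-0.1}, {'Surface':'APPLE','BaseForm':'apple','PN':0.5}, {'Surface':'BANANA','BaseForm':'banana','PN':0.4}]
# Sort by 'Surface', then group adjacent items sharing that value.
# Note: this compares 'Surface' only, so it assumes equal 'Surface' means an identical dict.
new_result = [list(b) for _, b in itertools.groupby(sorted(da_list, key=lambda x:x['Surface']), key=lambda x:x['Surface'])]
# Keep the first dict of each group and attach the group size as 'Frequency'.
final_result = [{**i[0], 'Frequency':len(i)} for i in new_result]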

Output:

[{'Surface': 'APPLE', 'BaseForm': 'apple', 'PN': 0.5, 'Frequency': 2}, {'Surface': 'BANANA', 'BaseForm': 'banana', 'PN': 0.4, 'Frequency': 2}, {'Surface': 'ORANGE', 'BaseForm': 'orange', 'PN': -0.1, 'Frequency': 1}]
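
The groupby approach above treats two dicts as duplicates whenever their 'Surface' values match, and it returns the results in sorted order. If duplicates should instead mean "every key and value is equal", one possible alternative is to count with collections.Counter. The sketch below is not part of the original answer: the helper name count_duplicates is hypothetical, and it assumes every value in the dicts is hashable (strings and numbers, as in da_list).

```python
from collections import Counter

da_list = [
    {'Surface': 'APPLE', 'BaseForm': 'apple', 'PN': 0.5},
    {'Surface': 'BANANA', 'BaseForm': 'banana', 'PN': 0.4},
    {'Surface': 'ORANGE', 'BaseForm': 'orange', 'PN': -0.1},
    {'Surface': 'APPLE', 'BaseForm': 'apple', 'PN': 0.5},
    {'Surface': 'BANANA', 'BaseForm': 'banana', 'PN': 0.4},
]


def count_duplicates(dicts):
    """Return the unique dicts in first-seen order, each with a 'Frequency' count.

    Hypothetical helper; assumes all dict values are hashable.
    """
    counts = Counter()
    first_seen = {}
    for d in dicts:
        # A dict is unhashable, so use a frozenset of its items as the key.
        key = frozenset(d.items())
        counts[key] += 1
        # Remember the first dict seen for this key to keep its key order.
        first_seen.setdefault(key, d)
    # Counter is a dict subclass, so it iterates in insertion order (Python 3.7+).
    return [{**first_seen[key], 'Frequency': n} for key, n in counts.items()]


db_list = count_duplicates(da_list)
print(db_list)
```

Because the whole dict forms the key, two entries that share a 'Surface' but differ in 'PN' would be counted separately, and no sorting is needed.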

For more on this topic, see the similar question on Stack Overflow: https://stackoverflow.com/questions/51164989/
