I am struggling with the following problem. Imagine I have a lot of data like this:
one = {'A':'m','B':'n','C':'o'}
two = {'A':'m','B':'n','C':'p'}
three = {'A':'x','B':'n','C':'p'}
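and so on; the data does not have to be stored in dicts. How can I get the subsets of the data that share the most common entries?
In the example above I would like to get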
one, two with same A and B = m,n
two, three with same B and C = n,p
one, two, three with same B = n
one, two with same A = m
Best answer
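One approach, which is not efficient for long dictionaries, is to use itertools.combinations
to build the combinations of your dictionaries, then loop over those combinations and compute the intersection of the item sets of the dictionaries in each one: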
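from itertools import combinations

# Python 2 code. Each record is wrapped in a single-key dict that carries its name.
one = {'one': {'A': 'm', 'B': 'n', 'C': 'o'}}
two = {'two': {'A': 'm', 'B': 'n', 'C': 'p'}}
three = {'three': {'A': 'x', 'B': 'n', 'C': 'p'}}
dict_list = [one, two, three]
# items() on a single-key dict returns a one-element list: [(name, record)]
v_item = [i.items() for i in dict_list]
# all combinations of 2 and of 3 records
l = [combinations(v_item, i) for i in range(2, 4)]
# unwrap each one-element list into its (name, record) tuple
flat = [[[t[0] for t in k] for k in j] for j in l]
"""The line above flattens the combinations. items() on a single-key dict returns a one-element list, so every element of a combination is wrapped in a list: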
>>> l
[[([('one', {'A': 'm', 'C': 'o', 'B': 'n'})], [('two', {'A': 'm', 'C': 'p', 'B': 'n'})]),
([('one', {'A': 'm', 'C': 'o', 'B': 'n'})], [('three', {'A': 'x', 'C': 'p', 'B': 'n'})]),
([('two', {'A': 'm', 'C': 'p', 'B': 'n'})], [('three', {'A': 'x', 'C': 'p', 'B': 'n'})])],
[([('one', {'A': 'm', 'C': 'o', 'B': 'n'})], [('two', {'A': 'm', 'C': 'p', 'B': 'n'})], [('three', {'A': 'x', 'C': 'p', 'B': 'n'})])]]"""
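for comb in flat:
    for pair in comb:
        # split the (name, record) tuples into a tuple of names and a tuple of records
        names, items = zip(*pair)
        # set-like views of each record's (key, value) pairs
        items = [i.viewitems() for i in items]
        # intersect all views to keep only the pairs shared by every record
        print names, reduce(lambda x, y: x & y, items)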
Result:
('one', 'two') set([('B', 'n'), ('A', 'm')])
('one', 'three') set([('B', 'n')])
('two', 'three') set([('B', 'n'), ('C', 'p')])
('one', 'two', 'three') set([('B', 'n')])
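The code above is Python 2 (print statement, viewitems, built-in reduce). As a rough sketch of the same idea in Python 3, assuming the records are kept in a single dict keyed by name (the names `records` and `shared_entries` are illustrative and not part of the original answer), something like the following should work:

```python
from functools import reduce
from itertools import combinations

# Assumed layout: one dict mapping each record name to its data.
records = {
    'one': {'A': 'm', 'B': 'n', 'C': 'o'},
    'two': {'A': 'm', 'B': 'n', 'C': 'p'},
    'three': {'A': 'x', 'B': 'n', 'C': 'p'},
}


def shared_entries(records):
    """Map every combination of 2+ record names to the (key, value) pairs they all share."""
    result = {}
    for size in range(2, len(records) + 1):
        for names in combinations(records, size):
            # In Python 3, dict.items() is already a set-like view.
            views = [records[name].items() for name in names]
            result[names] = reduce(lambda x, y: x & y, views)
    return result


common = shared_entries(records)
for names, shared in common.items():
    print(names, sorted(shared))
```

Like the original, this visits every combination of records, so the number of intersections grows exponentially with the number of records. The values must be hashable for the view intersection to work.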
Regarding the following lines:
items = [i.viewitems() for i in items]
print names, reduce(lambda x, y: x & y, items)
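You need to create a view object of each record's items, which behaves like a set; you can then compute the intersection of the items with the & operator, applied across all of the views with the reduce function.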
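To make the view and reduce steps concrete, here is a minimal Python 3 sketch (in Python 3, items() plays the role of Python 2's viewitems(); the variable names `pair_common` and `all_common` are only illustrative):

```python
from functools import reduce

one = {'A': 'm', 'B': 'n', 'C': 'o'}
two = {'A': 'm', 'B': 'n', 'C': 'p'}
three = {'A': 'x', 'B': 'n', 'C': 'p'}

# items() views are set-like, so & yields the (key, value) pairs both dicts share.
pair_common = one.items() & two.items()

# reduce folds & over any number of views: (one & two) & three.
all_common = reduce(lambda x, y: x & y, [d.items() for d in (one, two, three)])

print(sorted(pair_common))
print(sorted(all_common))
```

The result of & on two views is an ordinary set, which is why the fold can keep intersecting it with the next view.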
Regarding Python: subsets of data with the most common entries, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29963285/