Python: subsets of data with the most common entries

I am struggling with the following problem. Imagine I have a lot of data like this:

one = {'A':'m','B':'n','C':'o'}
two = {'A':'m','B':'n','C':'p'}
three = {'A':'x','B':'n','C':'p'}

and so on; the data does not have to be stored in dicts. How can I get the subsets of the data that share the most common entries?

For the example above, I would like to get:

one, two           with same A and B = m,n
two, three         with same B and C = n,p
one, two, three    with same B       = n
one, two           with same A       = m

Best answer

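One approach, which is not efficient when you have many dictionaries (the number of combinations grows exponentially), is to use itertools.combinations to build the combinations of your dicts, then loop over the combinations and over the dicts in each one, and take the intersection of their items: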

from functools import reduce
from itertools import combinations

one = {'one': {'A': 'm', 'B': 'n', 'C': 'o'}}
two = {'two': {'A': 'm', 'B': 'n', 'C': 'p'}}
three = {'three': {'A': 'x', 'B': 'n', 'C': 'p'}}

dict_list = [one, two, three]
# Each outer dict has a single key, so every entry of v_item is a one-element
# list such as [('one', {'A': 'm', 'B': 'n', 'C': 'o'})].
v_item = [list(d.items()) for d in dict_list]

# All combinations of 2 and of 3 dicts.
l = [combinations(v_item, i) for i in range(2, 4)]
# Unwrap those one-element lists, so each combination becomes a list of
# (name, dict) pairs.
flat = [[[t[0] for t in k] for k in j] for j in l]

for comb in flat:
    for pair in comb:
        names, items = zip(*pair)
        items = [i.items() for i in items]
        print(names, sorted(reduce(lambda x, y: x & y, items)))

Result:

('one', 'two') [('A', 'm'), ('B', 'n')]
('one', 'three') [('B', 'n')]
('two', 'three') [('B', 'n'), ('C', 'p')]
('one', 'two', 'three') [('B', 'n')]

Regarding the following lines:

        items = [i.items() for i in items]
        print(names, sorted(reduce(lambda x, y: x & y, items)))

You need to create a view object of your items (dict.items() in Python 3, dict.viewitems() in Python 2), which behaves like a set. You can then compute the intersection of the items with the & operator, applied across all dicts in a combination with the reduce function (imported from functools in Python 3). sorted() only prints the shared items in a stable order.
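The same idea can be wrapped in a reusable function that also skips groups sharing nothing. This is a minimal sketch, not the answer's own code: it assumes the records are kept in a single dict mapping a name to its dict of entries, and the names `shared_entries` and `data` are hypothetical. It still enumerates every combination, so it remains exponential in the number of records.

```python
from functools import reduce
from itertools import combinations


def shared_entries(records, min_size=2):
    """Return (names, common_items) for every combination of at least
    min_size records that share at least one (key, value) pair.

    Hypothetical helper: records is assumed to map a name to a dict.
    """
    names = list(records)
    results = []
    for size in range(min_size, len(names) + 1):
        for group in combinations(names, size):
            # dict.items() views are set-like, so & keeps only the shared pairs.
            common = reduce(lambda acc, view: acc & view,
                            (records[n].items() for n in group))
            if common:  # drop groups with an empty intersection
                results.append((group, dict(sorted(common))))
    return results


data = {
    'one': {'A': 'm', 'B': 'n', 'C': 'o'},
    'two': {'A': 'm', 'B': 'n', 'C': 'p'},
    'three': {'A': 'x', 'B': 'n', 'C': 'p'},
}

for group, common in shared_entries(data):
    print(group, common)
```

With the example data this prints ('one', 'two') {'A': 'm', 'B': 'n'}, then the groups ('one', 'three'), ('two', 'three') and ('one', 'two', 'three') with their shared entries. To list the largest overlaps first, the results can be sorted by len(common).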

A similar question about this topic was found on Stack Overflow: https://stackoverflow.com/questions/29963285/
