This is a follow-up to the question at the following link:
Python - Find a tuple in list of lists
I have been using the following solution:
# Your input data.
tuples = [(2,3), (3,6), (1,2)]
lists = [[1,2,3,4],[2,3,4,5],[2,3],[4,5,6]]
# Convert to sets just once, rather than repeatedly
# within the nested for-loops.
subsets = {t : set(t) for t in tuples}
mainsets = [set(xs) for xs in lists]
# Same as your algorithm, but written differently.
tallies = {
    tup: sum(s.issubset(m) for m in mainsets)
    for tup, s in subsets.items()
}
print(tallies)
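It works well for the given example, but when the number of lists is 541909 and the number of tuples is 3363671, it takes a very long time. It has been running for 30 minutes and I still have no output. The elements in each list and each tuple are in ascending order, and I am willing to change the data structure that holds them. What is the fastest way to do this?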
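To put these sizes in perspective: the nested approach calls issubset() once for every (tuple, list) pair, so the work grows with the product of the two sizes. The back-of-the-envelope calculation below uses the sizes from the question; the throughput of 10 million checks per second is a made-up figure for illustration only, not a measurement.

```python
# Size figures from the question.
n_lists = 541909
n_tuples = 3363671

# The nested approach performs one issubset() call per (tuple, list) pair.
checks = n_lists * n_tuples
print(f"{checks:,}")  # 1,822,803,587,939

# Hypothetical throughput, for illustration only: 10 million checks per second.
assumed_rate = 10_000_000
print(f"{checks / assumed_rate / 3600:.1f} hours")  # 50.6 hours
```

Even under that optimistic assumption the run would take about two days, which is consistent with no output after 30 minutes.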
Best answer
By building the dictionary with collections.defaultdict, I saw some performance improvement:
from collections import defaultdict
# Your input data.
tuples = [(i, i+1) for i in range(1000)]
lists = [[1,2,3,4],[2,3,4,5],[2,3],[4,5,6]] * 1000
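def original(tuples, lists):
    # The question's approach: for each tuple, scan every list.
    subsets = {t : set(t) for t in tuples}
    mainsets = [set(xs) for xs in lists]
    return { tup : sum(s.issubset(m) for m in mainsets) for tup, s in subsets.items() }

def jp(tuples, lists):
    # frozenset is hashable, so it can be used directly as a dict key.
    subsets = list(map(frozenset, tuples))
    mainsets = list(map(set, lists))
    d = defaultdict(int)  # missing keys start at 0
    # Outer loop over lists, inner loop over tuples; only matches are counted.
    for item in mainsets:
        for sub in subsets:
            if sub.issubset(item):
                d[sub] += 1
    # Note: keys are frozensets, and tuples with no match are absent from d.
    return d

# IPython/Jupyter %timeit magic; timings as reported in the answer.
%timeit original(tuples, lists)  # 707 ms per loop
%timeit jp(tuples, lists)        # 431 ms per loop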
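A different technique, not taken from the answer above: instead of testing every (tuple, list) pair, enumerate the k-element combinations of each list and count only those that appear among the tuples. This is a sketch under stated assumptions: each tuple holds distinct elements in ascending order (the question says the elements are sorted), and the lists are short enough that the number of combinations per list stays small. For tuples of length 2 and a list of length n, that is n(n-1)/2 hash lookups for that list, independent of how many tuples there are. The function name count_by_combinations is hypothetical. Unlike jp, it keeps the original tuples as keys and reports zero counts.

```python
from collections import Counter
from itertools import combinations


def count_by_combinations(tuples, lists):
    """Count, for each tuple, how many lists contain all of its elements.

    Assumption: each tuple holds distinct elements in ascending order.
    """
    wanted = set(tuples)              # O(1) membership tests
    sizes = {len(t) for t in wanted}  # tuple lengths to enumerate
    counts = Counter()
    for xs in lists:
        # Deduplicate and sort so that combinations() yields ascending
        # tuples, matching the (sorted) tuple keys.
        items = sorted(set(xs))
        for k in sizes:
            for combo in combinations(items, k):
                if combo in wanted:
                    counts[combo] += 1
    # Keep the original tuples as keys, including those with zero matches.
    return {t: counts[t] for t in tuples}


tuples = [(2, 3), (3, 6), (1, 2)]
lists = [[1, 2, 3, 4], [2, 3, 4, 5], [2, 3], [4, 5, 6]]
print(count_by_combinations(tuples, lists))  # {(2, 3): 3, (3, 6): 0, (1, 2): 1}
```

Whether this beats the pairwise approach depends on the data: it wins when lists are short and tuples are many, but if individual lists are long the number of combinations grows quickly, so it is worth timing on a sample of the real input first.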
Original Stack Overflow question, "Python - Compare two lists to find counts": https://stackoverflow.com/questions/49854734/