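I want to write a function that returns a list of Counters containing only the dictionary keys that appear in at least df of the input dictionaries.
Example:
prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)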
[Counter({'a': 1}), Counter({'a': 1})]
prune([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], min_df=2)
[Counter({'a': 1}), Counter({'a': 2})]
As we can see, 'a' appears in two of the dictionaries, so it is kept in the output.
My approach:
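from collections import Counter

def prune(dicto, df=2):
    # Count in how many dictionaries each key appears
    new = Counter()
    for d in dicto:
        new += Counter(d.keys())
    # Keep only the keys that appear in at least df dictionaries
    x = {}
    for key, value in new.items():
        if value >= df:
            x[key] = value
    print(Counter(x))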
Output:
Counter({'a': 2})
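This prints the combined Counter rather than one Counter per dictionary. As we can see, the key 'a' occurs 2 times overall, so it meets the df condition and is listed in the output. Can anyone correct my code so that it produces the desired output?
Best answer
I suggest: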
from collections import Counter

def prune(dicto, min_df=2):
    # Create one counter of keys per dictionary
    counters = [Counter(d.keys()) for d in dicto]
    # Sum all counters to get the number of dictionaries containing each key
    total = sum(counters, Counter())
    # Create set with keys of high frequency
    keys = set(k for k, v in total.items() if v >= min_df)
    # Reconstruct counters using high frequency keys
    counters = (Counter({k: v for k, v in d.items() if k in keys}) for d in dicto)
    # With filter(None, ...) we take only the non-empty counters.
    # list(...) is needed on Python 3, where filter returns a lazy iterator.
    return list(filter(None, counters))
Result:
>>> prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
[Counter({'a': 1}), Counter({'a': 1})]
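To check the second example from the question as well, here is a minimal sketch of the same idea, assuming Python 3. The name prune_py3 is hypothetical and only used to keep it apart from the function above. It accumulates the per-key dictionary counts with Counter.update in a loop instead of sum(counters, Counter()), and it returns a list explicitly, since in Python 3 filter yields a lazy iterator rather than a list.

```python
from collections import Counter


def prune_py3(dicts, min_df=2):
    """Keep only keys that occur in at least min_df of the input dicts.

    Hypothetical name; a sketch of the approach above, not a drop-in API.
    """
    # Document frequency: each dict contributes at most 1 per key.
    doc_freq = Counter()
    for d in dicts:
        doc_freq.update(d.keys())
    frequent = {k for k, n in doc_freq.items() if n >= min_df}
    # Rebuild one Counter per dict with the original values,
    # then drop the Counters that ended up empty.
    pruned = (Counter({k: v for k, v in d.items() if k in frequent}) for d in dicts)
    return [c for c in pruned if c]


docs = [{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}]
print(prune_py3(docs, min_df=2))
# [Counter({'a': 1}), Counter({'a': 2})]
```

The values inside each returned Counter are the original per-dictionary values (1 and 2 here), not the number of dictionaries the key appears in. That difference is what separates this result from the combined Counter({'a': 2}) printed by the attempt in the question.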
Regarding "python - dictionaries in a list using Counter", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29638361/