python - Dictionaries in a list using Counter

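I want to write a function that returns a list of Counters containing only the dictionary items whose keys appear in at least df of the dictionaries.

Example: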

prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
[Counter({'a': 1}), Counter({'a': 1})]
prune([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], min_df=2)
[Counter({'a': 1}), Counter({'a': 2})]

As we can see, 'a' appears in two of the dictionaries, so it is kept in the output.

My approach:

from collections import Counter
def prune(dicto,df=2):
   new = Counter()
   for d in dicto:
       new += Counter(d.keys())
   x = {}
   for key,value in new.items():
       if value >= df:
           x[key] = value
   print(Counter(x))

Output:

Counter({'a': 2})

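This prints the combined counter. As we can see, the key 'a' appears 2 times overall, so it satisfies the df condition and is listed in the output. Can anyone correct my code so that it produces the desired output?

Best answer

I suggest: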

from collections import Counter
def prune(dicto, min_df=2):
    # Create all counters
    counters = [Counter(d.keys()) for d in dicto]

    # Sum all counters
    total = sum(counters, Counter()) 

    # Create set with keys of high frequency
    keys = set(k for k, v in total.items() if v >= min_df)

    # Reconstruct counters using high frequency keys
    counters = (Counter({k: v for k, v in d.items() if k in keys}) for d in dicto)

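    # An empty Counter is falsy, so filter(None, ...) keeps only the non-empty ones.
    # list(...) makes the result a list on Python 3 as well, where filter is lazy.
    return list(filter(None, counters))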

Result:

>>> prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
[Counter({'a': 1}), Counter({'a': 1})]
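
A more compact variant, offered only as a sketch: it counts each key once per dictionary in a single pass and returns a plain list, so the result prints the same way on Python 2 and Python 3. The name prune_by_doc_freq is illustrative and not part of the original code; it assumes every element of the input is a dict.

```python
from collections import Counter


def prune_by_doc_freq(dicts, min_df=2):
    # Hypothetical helper name, used here for illustration only.
    # Document frequency: the number of dictionaries that contain each key.
    # Iterating a dict yields each key once, so a key counts once per dictionary.
    doc_freq = Counter(key for d in dicts for key in d)

    # Rebuild each dictionary as a Counter, keeping only the frequent keys.
    pruned = (
        Counter({k: v for k, v in d.items() if doc_freq[k] >= min_df})
        for d in dicts
    )

    # An empty Counter is falsy, so dictionaries left with no keys are dropped.
    return [c for c in pruned if c]


print(prune_by_doc_freq([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2))
# [Counter({'a': 1}), Counter({'a': 1})]
print(prune_by_doc_freq([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], min_df=2))
# [Counter({'a': 1}), Counter({'a': 2})]
```

Note that the original values are preserved ('a': 2 in the second call); only the decision of which keys to keep is based on how many dictionaries contain them.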

Regarding python - Dictionaries in a list using Counter, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29638361/
