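I have a huge list of words, e.g. ['abc', 'def', 'python', 'abc', 'python', ...]
How can I generate a list/dict that can be plotted as a histogram/Pareto chart, e.g.:
{'python': 10, 'abc': 8, 'def': 2, ...}
Also, what would be a suitable charting library for visualizing these word frequencies sorted from highest to lowest?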
Best answer
collections.Counter provides a convenient and relatively fast way to build a dictionary like the one you show:
from collections import Counter
x = ['spam', 'ham', 'eggs', 'ham', 'chips', 'eggs', 'spam', 'spam', 'spam']
counts = Counter(x)
print(counts)
# Counter({'spam': 4, 'ham': 2, 'eggs': 2, 'chips': 1})
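Since the question asks for the words ordered from most to least frequent, Counter.most_common() may be the simplest route: it returns (word, count) pairs sorted by count, highest first. The following is a minimal sketch reusing the example list; the variable names (words, ranked, top_two, ranked_dict) are only illustrative.

```python
from collections import Counter

words = ['spam', 'ham', 'eggs', 'ham', 'chips', 'eggs', 'spam', 'spam', 'spam']
counts = Counter(words)

# most_common() returns (word, count) pairs, highest count first
ranked = counts.most_common()

# most_common(n) keeps only the n most frequent words
top_two = counts.most_common(2)

# a plain dict built from the ranked pairs keeps that order (Python 3.7+)
ranked_dict = dict(ranked)

print(ranked)
print(top_two)
```

For a very large vocabulary, passing a limit such as most_common(20) keeps the chart readable.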
To visualize the counts, you can use a matplotlib bar chart:
from matplotlib import pyplot as plt
import numpy as np
# sort words by count, highest first (most_common() returns (word, count) pairs)
labels, heights = zip(*counts.most_common())
# x position of the center of each bar
positions = np.arange(len(heights))
fig, ax = plt.subplots(1, 1)
ax.bar(positions, heights, 1)
# bars are center-aligned by default, so the ticks go at the bar positions
ax.set_xticks(positions)
ax.set_xticklabels(labels, fontsize='large')
plt.show()
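The question also mentions a Pareto chart, which adds a cumulative-percentage line on top of the sorted bars. Below is a rough sketch of one way to do that, not a definitive implementation: pareto_rows and plot_pareto are hypothetical helper names introduced here for illustration, and the second y-axis comes from matplotlib's Axes.twinx().

```python
from collections import Counter


def pareto_rows(words):
    """Return (word, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(words)
    total = sum(counts.values())
    rows = []
    running = 0
    for word, count in counts.most_common():
        running += count
        rows.append((word, count, 100.0 * running / total))
    return rows


def plot_pareto(rows):
    """Draw bars for the counts and a line for the cumulative percentage."""
    # matplotlib is imported here so pareto_rows() works without it installed
    from matplotlib import pyplot as plt

    labels = [word for word, _, _ in rows]
    heights = [count for _, count, _ in rows]
    cumulative = [pct for _, _, pct in rows]
    positions = list(range(len(rows)))

    fig, ax = plt.subplots(1, 1)
    ax.bar(positions, heights, 1)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, fontsize='large')
    ax.set_ylabel('count')

    # second y-axis sharing the same x-axis, for the cumulative percentage
    ax2 = ax.twinx()
    ax2.plot(positions, cumulative, color='C1', marker='o')
    ax2.set_ylim(0, 105)
    ax2.set_ylabel('cumulative %')
    return fig


words = ['spam', 'ham', 'eggs', 'ham', 'chips', 'eggs', 'spam', 'spam', 'spam']
rows = pareto_rows(words)
for row in rows:
    print(row)
```

Calling fig = plot_pareto(rows) and then fig.savefig('pareto.png') (the file name is just an example) should render the chart; the table printed by the loop can be used to check the numbers first.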
For more on generating a keyword histogram/Pareto chart in Python, see this similar question on Stack Overflow: https://stackoverflow.com/questions/33370669/