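I have the following code snippet. I'd like to extend it so that the data from every iteration of the loop is drawn on the same canvas, instead of each iteration being drawn on a separate canvas.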
import pandas as pd
from collections import Counter

for level in range(len(result)):
    sizes = result[level].values()
    distribution = pd.DataFrame(list(Counter(sizes).items()), columns=['community size', 'number of communities'])
    distribution.plot(kind='scatter', x='community size', y='number of communities')
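Ideally, I would also like the points in the scatter plot to be color-coded by the data they came from (points belonging to the same iteration get the same color).
I'm fairly new to both matplotlib and pandas, so any help is much appreciated.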
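If you would rather keep the original loop, a minimal sketch of the direct fix is to create one Axes before the loop and pass it to every DataFrame.plot call through the ax keyword, so that each iteration draws on the same canvas. The small result list below is hypothetical sample data, the per-level colors are taken from matplotlib's default color cycle ('C0', 'C1', ...), and the Agg backend is forced only so the sketch runs without a display.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data in the same shape as the question's `result`:
# one dict per level, mapping community id -> community size.
result = [
    {0: 3, 1: 1, 2: 1, 3: 2},
    {0: 2, 1: 2, 2: 1},
]

# Create a single Axes up front; every pass through the loop draws on it.
fig, ax = plt.subplots()
for level in range(len(result)):
    sizes = result[level].values()
    distribution = pd.DataFrame(list(Counter(sizes).items()),
                                columns=['community size', 'number of communities'])
    distribution.plot(kind='scatter', x='community size', y='number of communities',
                      ax=ax,                  # reuse the shared Axes instead of opening a new figure
                      color=f'C{level}',      # one color per level from the default color cycle
                      label=f'level {level}')  # legend entry for this level
ax.legend(loc='best')
# plt.show()  # uncomment (with an interactive backend) to display the figure
```

Because every call receives the same ax, all levels end up on one figure, and the color and label arguments give each iteration its own color and legend entry.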
Best answer
Instead of calling plot multiple times, you can build the entire dataset as a single DataFrame; then you only need to call plot once.
Starting with
result = [{0: 21, 1: 7, 2: 67, 3: 12, 4: 15, 5: 7, 6: 54, 7: 49, 8: 50, 9: 31,
10: 6, 11: 2, 12: 8, 13: 2, 14: 2, 15: 1, 16: 35, 17: 2, 18: 1, 19:
4, 20: 2, 21: 4, 22: 3, 23: 1, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1,
29: 1},
{0: 2, 1: 5, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3, 7: 2, 8: 1, 9: 1, 10: 1,
11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1}]
you can build a DataFrame with the columns level and size:
df = pd.DataFrame([(level,val) for level, dct in enumerate(result)
for val in dct.values()],
columns=['level', 'size'])
which looks like this:
level size
0 0 21
1 0 7
2 0 67
...
45 1 1
46 1 1
47 1 1
Now we can group by level and count how many items of each size there are in each group:
size_count = df.groupby('level')['size'].value_counts()
# level  size
# 0      1        9
#        2        5
#        7        2
# ...
# 1      1       11
#        2        4
#        3        2
#        5        1
# dtype: int64
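The groupby above returns a pd.Series. To turn it into a DataFrame, we call reset_index() to move the index-level values into columns. Passing name= gives the counts column its own name, so it cannot clash with the size index level (in some pandas versions the counts Series is itself named size). Then we assign the final column names:
size_count = size_count.reset_index(name='number of communities')
size_count.columns = ['level', 'community size', 'number of communities']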
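For comparison, a sketch of an equivalent one-step route to the same three-column table: group on both level and size at once and let .size() count the rows in each pair. This swaps in groupby(...).size() for the value_counts step; it is not the answer's original code, and the row order may differ (groups come out sorted by level, then size).

```python
import pandas as pd

# Same data as above: one dict per level (community id -> community size).
result = [{0: 21, 1: 7, 2: 67, 3: 12, 4: 15, 5: 7, 6: 54, 7: 49, 8: 50, 9: 31,
           10: 6, 11: 2, 12: 8, 13: 2, 14: 2, 15: 1, 16: 35, 17: 2, 18: 1, 19: 4,
           20: 2, 21: 4, 22: 3, 23: 1, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1, 29: 1},
          {0: 2, 1: 5, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3, 7: 2, 8: 1, 9: 1, 10: 1,
           11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1}]

df = pd.DataFrame([(level, val) for level, dct in enumerate(result)
                   for val in dct.values()],
                  columns=['level', 'size'])

# .size() counts rows per (level, size) pair; reset_index(name=...) turns the
# resulting Series into a DataFrame with an explicitly named count column.
size_count = (df.groupby(['level', 'size'])
                .size()
                .reset_index(name='number of communities')
                .rename(columns={'size': 'community size'}))
```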
The desired plot can now be generated:
size_count.plot(kind='scatter', x='community size', y='number of communities',
s=100, c='level')
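s=100 controls the size of the points, and c='level' tells plot to color the points according to the values in the level column. Putting it all together: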
import pandas as pd
import matplotlib.pyplot as plt
result = [{0: 21, 1: 7, 2: 67, 3: 12, 4: 15, 5: 7, 6: 54, 7: 49, 8: 50, 9: 31,
10: 6, 11: 2, 12: 8, 13: 2, 14: 2, 15: 1, 16: 35, 17: 2, 18: 1, 19:
4, 20: 2, 21: 4, 22: 3, 23: 1, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1,
29: 1},
{0: 2, 1: 5, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3, 7: 2, 8: 1, 9: 1, 10: 1,
11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1}]
df = pd.DataFrame([(level,val) for level, dct in enumerate(result)
for val in dct.values()],
columns=['level', 'size'])
size_count = df.groupby('level')['size'].value_counts()
# name= keeps the counts column from clashing with the 'size' index level
size_count = size_count.reset_index(name='number of communities')
size_count.columns = ['level', 'community size', 'number of communities']
cmap = plt.get_cmap('jet')
size_count.plot(kind='scatter', x='community size', y='number of communities',
s=100, c='level', cmap=cmap)
plt.show()
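The same colorbar plot can also be sketched with matplotlib's Axes.scatter directly, which makes the colorbar an explicit step (fig.colorbar) instead of something pandas adds for you. The small table below is a hypothetical excerpt using a few of the counts computed above, and the Agg backend is forced only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical excerpt in the same shape as size_count above.
size_count = pd.DataFrame({
    'level': [0, 0, 0, 1, 1],
    'community size': [1, 2, 7, 1, 2],
    'number of communities': [9, 5, 2, 11, 4],
})

fig, ax = plt.subplots()
# c= receives the numeric level values; cmap maps them to colors.
sc = ax.scatter(size_count['community size'], size_count['number of communities'],
                s=100, c=size_count['level'], cmap='jet')
cbar = fig.colorbar(sc, ax=ax)  # the colorbar is drawn in its own, newly created Axes
cbar.set_label('level')
ax.set_xlabel('community size')
ax.set_ylabel('number of communities')
# plt.show()  # uncomment (with an interactive backend) to display the figure
```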
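If there are dozens of levels, a colorbar may be appropriate.
On the other hand, if there are only a few levels, a legend makes more sense. In that case it is more convenient to call plot once for each level value, because the matplotlib code that builds legends is set up to make one legend entry per plot call: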
import pandas as pd
import matplotlib.pyplot as plt
result = [{0: 21, 1: 7, 2: 67, 3: 12, 4: 15, 5: 7, 6: 54, 7: 49, 8: 50, 9: 31,
10: 6, 11: 2, 12: 8, 13: 2, 14: 2, 15: 1, 16: 35, 17: 2, 18: 1, 19:
4, 20: 2, 21: 4, 22: 3, 23: 1, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1,
29: 1},
{0: 2, 1: 5, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3, 7: 2, 8: 1, 9: 1, 10: 1,
11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1}]
df = pd.DataFrame([(level,val) for level, dct in enumerate(result)
for val in dct.values()],
columns=['level', 'size'])
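# group by the column name (not a one-element list) so that level is a scalar, not a 1-tuple
groups = df.groupby('level')
fig, ax = plt.subplots()
for level, grp in groups:
    # counts of each community size within this level
    size_count = grp['size'].value_counts()
    ax.plot(size_count.index, size_count, markersize=12, marker='o',
            linestyle='', label='level {}'.format(level))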
ax.legend(loc='best', numpoints=1)
ax.set_xlabel('community size')
ax.set_ylabel('number of communities')
ax.grid(True)
plt.show()
Regarding "python - How to dynamically update a plot in a loop?", a similar question on Stack Overflow: https://stackoverflow.com/questions/31380737/