我有一个与下表大致相似的数据集。
我需要为每列创建一个条形图 TS1
至 TS5
计算该列中每个项目的数量。这些项目是以下之一:NOT_SEEN
NOT_ABLE
HIGH_BAR
和 110
之间的数值和 140
由 2 分隔(所以 110
、 112
、 114
等)。
我找到了一种方法可以很好地做到这一点,但我要问的是是否有办法创建循环或其他东西,这样我就不必将相同的代码复制粘贴 5 次(对于 5 列)?
这是我尝试过和工作的:
num_range = list(range(110,140, 2))
OUTCOMES = ['NOT_SEEN', 'NOT_ABLE', 'HIGH_BAR']
OUTCOMES.extend([str(num) for num in num_range])
OUTCOMES = CategoricalDtype(OUTCOMES, ordered = True)
fig, ax =plt.subplots(2, 3, sharey=True)
fig.tight_layout(pad=3)
下面是我复制了 5 次并且只更改了标题( Testing 1
、 Testing 2
等)和 TS1
的内容TS2
..(在第一行)。df["outcomes"] = df["TS1"].astype(OUTCOMES)
bpt=sns.countplot(x= "outcomes", data=df, palette='GnBu', ax=ax[0,0])
plt.setp(bpt.get_xticklabels(), rotation=60, size=6, ha='right')
bpt.set(xlabel='')
bpt.set_title('Testing 1')
然后下面的代码在上面的“5”个实例下面。ax[1,2].set_visible(False)
plt.show()
我相信有一种方法可以更好地做到这一点,但我对这一切都不熟悉。另外,我需要确保条形图的条形按从左到右的顺序排列为:
NOT_SEEN
NOT_ABLE
HIGH_BAR
和 110
, 112
, 114
等等使用python 2.7(不幸的是不是我的选择)和pandas 0.24.2。
+----+------+------+----------+----------+----------+----------+----------+
| ID | VIEW | YEAR | TS1 | TS2 | TS3 | TS4 | TS5 |
+----+------+------+----------+----------+----------+----------+----------+
| AA | NO | 2005 | | 134 | | HIGH_BAR | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | YES | 2015 | | | NOT_SEEN | | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | YES | 2010 | 118 | | | | NOT_ABLE |
+----+------+------+----------+----------+----------+----------+----------+
| BB | NO | 2020 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | YES | 2020 | | | | NOT_SEEN | |
+----+------+------+----------+----------+----------+----------+----------+
| AA | NO | 2010 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | NO | 2015 | | | | | 132 |
+----+------+------+----------+----------+----------+----------+----------+
| BB | YES | 2010 | | HIGH_BAR | | 140 | NOT_ABLE |
+----+------+------+----------+----------+----------+----------+----------+
| AA | YES | 2020 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | NO | 2010 | | | | 112 | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | YES | 2015 | | | NOT_ABLE | | HIGH_BAR |
+----+------+------+----------+----------+----------+----------+----------+
| BB | NO | 2020 | | | | 145 | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | NO | 2015 | | 110 | | | |
+----+------+------+----------+----------+----------+----------+----------+
| AA | YES | 2010 | HIGH_BAR | | | NOT_SEEN | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | YES | 2015 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| AA | NO | 2020 | | | | 118 | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | YES | 2015 | | 180 | NOT_ABLE | | |
+----+------+------+----------+----------+----------+----------+----------+
| BB | YES | 2020 | | NOT_SEEN | | | 126 |
+----+------+------+----------+----------+----------+----------+----------+
最佳答案
您可以将绘图线放在一个函数中,并在 for 循环中调用它,在每次迭代中自动更改列、标题和轴:
fig, axes =plt.subplots(2, 3, sharey=True)
fig.tight_layout(pad=3)
def plotting(column, title, ax):
df["outcomes"] = df[column].astype(OUTCOMES)
bpt=sns.countplot(x= "outcomes", data=df, palette='GnBu', ax=ax)
plt.setp(bpt.get_xticklabels(), rotation=60, size=6, ha='right')
bpt.set(xlabel='')
bpt.set_title(title)
columns = ['TS1', 'TS2', 'TS3', 'TS4', 'TS5']
titles = ['Testing 1', 'Testing 2', 'Testing 3', 'Testing 4', 'Testing 5']
for column, title, ax in zip(columns, titles, axes.flatten()):
plotting(column, title, ax)
axes[1,2].set_visible(False)
plt.show()
关于python - 为多列绘制多个条形图,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/69065620/