python - Plotting three dimensions of categorical data in Python

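My data contains three categorical variables that I'm trying to visualize:

City (one of five cities)
Occupation (one of four)
Blood type (one of four)

So far I've managed to group the data in a way that I think will be easy to work with: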

import numpy as np, pandas as pd

# Make data
cities = ['Tijuana','Las Vegas','Los Angeles','Anaheim','Atlantis']
occupations = ['Doctor','Lawyer','Engineer','Drone security officer']
bloodtypes = ['A','B','AB','O']
df = pd.DataFrame({'City': np.random.choice(cities,500),
                   'Occupation': np.random.choice(occupations,500),
                   'Blood Type':np.random.choice(bloodtypes,500)})

# You need to make a dummy column, otherwise the groupby returns an empty df
df['Dummy'] = np.ones(500)

# This is now what I'd like to plot
df.groupby(by=['City','Occupation','Blood Type']).count().unstack(level=1)

                       Dummy
Occupation             Doctor Drone security officer Engineer Lawyer
City        Blood Type
Anaheim     A               7                      7        7      7
            AB              6                     10        8      5
            B               2                     10        4      2
            O               4                      3        3      6
Atlantis    A               6                      5        5      7
            AB             12                      7        7     10
            B               7                      4        7      3
            O               7                      4        6      4
Las Vegas   A               8                      4        8      5
            AB              5                      6        8      9
            B               6                     10        6      6
            O               6                      9        5      9
Los Angeles A               7                      4        8      8
            AB              9                      8        8      8
            B               3                      6        4      1
            O               9                     11       11      9
Tijuana     A               3                      4        5      3
            AB              9                      5        5      7
            B               3                      6        4      9
            O               3                      5        5      8
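
As an aside, the dummy column is only needed because count() tallies non-null values in the remaining columns. A minimal sketch of the same table without it, assuming the same randomly generated data (the fixed seed and the `counts` name are illustrative choices, not part of the original question):

```python
import numpy as np
import pandas as pd

# Same kind of random data as above; the seed is an assumption for reproducibility
rng = np.random.default_rng(0)
cities = ['Tijuana', 'Las Vegas', 'Los Angeles', 'Anaheim', 'Atlantis']
occupations = ['Doctor', 'Lawyer', 'Engineer', 'Drone security officer']
bloodtypes = ['A', 'B', 'AB', 'O']
df = pd.DataFrame({'City': rng.choice(cities, 500),
                   'Occupation': rng.choice(occupations, 500),
                   'Blood Type': rng.choice(bloodtypes, 500)})

# size() counts rows per group directly, so no helper column is required;
# fill_value=0 replaces missing combinations with a zero count
counts = (df.groupby(['City', 'Occupation', 'Blood Type'])
            .size()
            .unstack(level='Occupation', fill_value=0))
print(counts)
```

The result has one row per (City, Blood Type) pair and one column per occupation, like the table above.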

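My goal is to create something like the swarm plot in the Seaborn documentation. Seaborn applies jitter to the quantitative data so that you can see the individual data points and their hues.

With my data, I'd like to plot City on the x-axis and Occupation on the y-axis, apply jitter to each point, and then color by blood type. However, sns.swarmplot requires one of the axes to be quantitative: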

import seaborn as sns

sns.swarmplot(data=df,x='City',y='Occupation',hue='Blood Type')

returns an error.
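
One possible workaround, sketched here under assumptions rather than taken from the original thread, is to convert both categorical axes to integer codes and add the jitter by hand with a plain matplotlib scatter. This is a jittered scatter plot, not a true swarm plot (points may overlap), and the jitter width of 0.3 and the fixed seed are arbitrary choices:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
cities = ['Tijuana', 'Las Vegas', 'Los Angeles', 'Anaheim', 'Atlantis']
occupations = ['Doctor', 'Lawyer', 'Engineer', 'Drone security officer']
bloodtypes = ['A', 'B', 'AB', 'O']
df = pd.DataFrame({'City': rng.choice(cities, 500),
                   'Occupation': rng.choice(occupations, 500),
                   'Blood Type': rng.choice(bloodtypes, 500)})

# Map each category to an integer position, then add uniform jitter
city_codes = pd.Categorical(df['City'], categories=cities).codes
occ_codes = pd.Categorical(df['Occupation'], categories=occupations).codes
x = city_codes + rng.uniform(-0.3, 0.3, len(df))
y = occ_codes + rng.uniform(-0.3, 0.3, len(df))

# One color per blood type, drawn as a separate scatter so the legend works
colors = [plt.get_cmap('viridis')(v) for v in np.linspace(0, 1, len(bloodtypes))]
fig, ax = plt.subplots(figsize=(10, 6))
for blood, color in zip(bloodtypes, colors):
    mask = (df['Blood Type'] == blood).to_numpy()
    ax.scatter(x[mask], y[mask], color=color, label=blood, s=20)

# Put the category names back on the integer tick positions
ax.set_xticks(range(len(cities)))
ax.set_xticklabels(cities)
ax.set_yticks(range(len(occupations)))
ax.set_yticklabels(occupations)
ax.set_xlabel('City')
ax.set_ylabel('Occupation')
ax.legend(title='Blood Type', bbox_to_anchor=(1.02, 1), loc='upper left')
fig.tight_layout()
# Call plt.show() or fig.savefig(...) to view the result
```

Each City/Occupation cell then shows a cloud of points whose density reflects the count and whose colors reflect blood type.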

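An acceptable alternative might be to create 20 categorical bar charts, one for each intersection of City and Occupation. I would do this by running a for loop over each category, but I can't figure out how to feed that to matplotlib subplots so that they end up in a 4x5 grid.

The most similar question I could find is in R, and the asker only wanted to indicate the most common value of the third variable, so I didn't get any good ideas from there.

Thanks for any help you can provide.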

Best answer

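Well, I started working on the "acceptable alternative" today, and I found a solution that uses essentially pure matplotlib (though I put the Seaborn style on top of it, just because).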

import numpy as np, pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import get_cmap  # matplotlib.cm.get_cmap is gone in newer Matplotlib releases
from matplotlib.patches import Patch
import seaborn as sns

# Make data
cities = ['Tijuana','Las Vegas','Los Angeles','Anaheim','Atlantis']
occupations = ['Doctor','Lawyer','Engineer','Drone security officer']
bloodtypes = ['A','B','AB','O']
df = pd.DataFrame({'City': np.random.choice(cities,500),
                   'Occupation': np.random.choice(occupations,500),
                   'Blood Type':np.random.choice(bloodtypes,500)})

# Make a dummy column, otherwise the groupby returns an empty df
df['Dummy'] = np.ones(500)

# This is now what I'd like to plot
grouped = df.groupby(by=['City','Occupation','Blood Type']).count().unstack()

# List of blood types, to use later as categories in subplots
kinds = grouped.columns.levels[1]

# colors for bar graph
colors = [get_cmap('viridis')(v) for v in np.linspace(0,1,len(kinds))]

sns.set(context="talk")
nxplots = len(grouped.index.levels[0])
nyplots = len(grouped.index.levels[1])
fig, axes = plt.subplots(nxplots,
                         nyplots,
                         sharey=True,
                         sharex=True,
                         figsize=(10,12))

fig.suptitle('City, occupation, and blood type')

# plot the data
for a, b in enumerate(grouped.index.levels[0]):
    for i, j in enumerate(grouped.index.levels[1]):
        axes[a,i].bar(kinds,grouped.loc[b,j],color=colors)
        axes[a,i].xaxis.set_ticks([])

axeslabels = fig.add_subplot(111, frameon=False)
plt.tick_params(labelcolor='none', top=False, bottom=False, left=False, right=False)
plt.grid(False)
axeslabels.set_ylabel('City',rotation='horizontal',y=1,weight="bold")
axeslabels.set_xlabel('Occupation',weight="bold")

# x- and y-axis labels
for i, j in enumerate(grouped.index.levels[1]):
    # Label the bottom row; axes[nyplots, i] only worked because nxplots - 1 == nyplots
    axes[-1,i].set_xlabel(j)
for i, j in enumerate(grouped.index.levels[0]):
    axes[i,0].set_ylabel(j)

# Tune this manually to make room for the legend
fig.subplots_adjust(right=0.82)

fig.legend([Patch(facecolor = i) for i in colors],
           kinds,
           title="Blood type",
           loc="center right")

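This produces a 5x4 grid of small bar charts, one per City/Occupation pair, with the bars colored by blood type.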

I'd appreciate any feedback, and I'd still be happy if someone could offer a preferred solution.
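
For comparison, a similar grid can probably be obtained more compactly with Seaborn's figure-level catplot, which facets by row and column. This is a sketch under assumptions (same kind of random data, a fixed seed, and arbitrary panel sizes), not something taken from the original answer:

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
cities = ['Tijuana', 'Las Vegas', 'Los Angeles', 'Anaheim', 'Atlantis']
occupations = ['Doctor', 'Lawyer', 'Engineer', 'Drone security officer']
bloodtypes = ['A', 'B', 'AB', 'O']
df = pd.DataFrame({'City': rng.choice(cities, 500),
                   'Occupation': rng.choice(occupations, 500),
                   'Blood Type': rng.choice(bloodtypes, 500)})

# One count plot per (City, Occupation) pair: rows are cities, columns are occupations
g = sns.catplot(data=df, x='Blood Type', kind='count', order=bloodtypes,
                row='City', col='Occupation',
                row_order=cities, col_order=occupations,
                height=1.8, aspect=1.2)
g.set_titles('{row_name} | {col_name}')
# Call g.savefig(...) or matplotlib.pyplot.show() to view the result
```

Here catplot builds the subplot grid and shares the axes itself, so no explicit loop over categories is needed.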

Regarding "python - Plotting three dimensions of categorical data in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58303175/
