python - How to have clusters of stacked bars

This is what my dataset looks like:

In [1]: df1=pd.DataFrame(np.random.rand(4,2),index=["A","B","C","D"],columns=["I","J"])

In [2]: df2=pd.DataFrame(np.random.rand(4,2),index=["A","B","C","D"],columns=["I","J"])

In [3]: df1
Out[3]: 
          I         J
A  0.675616  0.177597
B  0.675693  0.598682
C  0.631376  0.598966
D  0.229858  0.378817

In [4]: df2
Out[4]: 
          I         J
A  0.939620  0.984616
B  0.314818  0.456252
C  0.630907  0.656341
D  0.020994  0.538303

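I want a stacked bar plot for each dataframe, but since they share the same index, I'd like to have 2 stacked bars per index.

I tried plotting both on the same axes: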

In [5]: ax = df1.plot(kind="bar", stacked=True)

In [6]: ax2 = df2.plot(kind="bar", stacked=True, ax = ax)

But the bars overlap.

Then I tried concatenating the two datasets first:

pd.concat(dict(df1 = df1, df2 = df2),axis = 1).plot(kind="bar", stacked=True)

But here everything is stacked into a single bar per index.

My best attempt is:

 pd.concat(dict(df1 = df1, df2 = df2),axis = 0).plot(kind="bar", stacked=True)

Which gives:

[image]

This is basically what I want, except that I want the bars ordered as

(df1,A) (df2,A) (df1,B) (df2,B) etc...

I guess there is a trick, but I can't find it!

After @bgschiller's answer, I got this:

[image]
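For reference, one way to get the interleaved (df1,A) (df2,A) ordering from the concatenated frame is to swap the index levels and sort. This is only a sketch with made-up random data, and it is not necessarily what @bgschiller's answer did. It fixes the order of the bars but does not visually cluster them.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

idx = ["A", "B", "C", "D"]
df1 = pd.DataFrame(np.random.rand(4, 2), index=idx, columns=["I", "J"])
df2 = pd.DataFrame(np.random.rand(4, 2), index=idx, columns=["I", "J"])

# Stack the frames vertically; the outer index level holds the frame name.
stacked = pd.concat({"df1": df1, "df2": df2}, axis=0)

# Put the original index first, then sort so that the rows read
# (A, df1), (A, df2), (B, df1), (B, df2), ...
interleaved = stacked.swaplevel(0, 1).sort_index()

ax = interleaved.plot(kind="bar", stacked=True)
```

The bars are still evenly spaced here, which is why the clustering itself needs the extra work described in the answer.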

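This is almost what I want. I would like the bars to be clustered by index, so that the plot is visually clear.

Bonus: having x-labels that are not redundant, something like: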

df1 df2    df1 df2
_______    _______ ...
   A          B

Best answer

I eventually found a trick (edit: see below for a version using seaborn and a long-form dataframe):

Solution with pandas and matplotlib

Here it is with a more complete example:

import pandas as pd
import matplotlib.cm as cm
import numpy as np
import matplotlib.pyplot as plt

def plot_clustered_stacked(dfall, labels=None, title="multiple stacked bar plot",  H="/", **kwargs):
    """Given a list of dataframes with identical columns and index, create a clustered stacked bar plot.
    labels is a list of the names of the dataframes, used for the legend
    title is a string for the title of the plot
    H is the hatch used to identify the different dataframes"""

    n_df = len(dfall)
    n_col = len(dfall[0].columns) 
    n_ind = len(dfall[0].index)
    axe = plt.subplot(111)

    for df in dfall : # for each data frame
        axe = df.plot(kind="bar",
                      linewidth=0,
                      stacked=True,
                      ax=axe,
                      legend=False,
                      grid=False,
                      **kwargs)  # make bar plots

    h,l = axe.get_legend_handles_labels() # get the handles we want to modify
    for i in range(0, n_df * n_col, n_col): # len(h) = n_col * n_df
        for j, pa in enumerate(h[i:i+n_col]):
            for rect in pa.patches: # for each index
                rect.set_x(rect.get_x() + 1 / float(n_df + 1) * i / float(n_col))
                rect.set_hatch(H * int(i / n_col)) #edited part     
                rect.set_width(1 / float(n_df + 1))

    axe.set_xticks((np.arange(0, 2 * n_ind, 2) + 1 / float(n_df + 1)) / 2.)
    axe.set_xticklabels(dfall[0].index, rotation = 0)
    axe.set_title(title)

    # Add invisible data to add another legend
    n=[]        
    for i in range(n_df):
        n.append(axe.bar(0, 0, color="gray", hatch=H * i))

    l1 = axe.legend(h[:n_col], l[:n_col], loc=[1.01, 0.5])
    if labels is not None:
        l2 = plt.legend(n, labels, loc=[1.01, 0.1]) 
    axe.add_artist(l1)
    return axe

# create fake dataframes
df1 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df2 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df3 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"], 
                   columns=["I", "J", "K", "L", "M"])

# Then, just call :
plot_clustered_stacked([df1, df2, df3],["df1", "df2", "df3"])

It gives:

[image: multiple stacked bar plot]

You can change the colors of the bars by passing a cmap argument:
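To make the repositioning inside plot_clustered_stacked easier to follow, here is the same arithmetic on its own, without any plotting. It is a sketch that assumes pandas draws each bar with its default width of 0.5, centered on the integer positions 0, 1, 2, ...; the names default_width and lefts_by_index are made up for this illustration.

```python
import numpy as np

n_df = 3             # number of dataframes, as in the example call above
n_ind = 4            # number of index entries (A, B, C, D)
default_width = 0.5  # assumption: default bar width used by pandas

# Every bar is narrowed to this width by rect.set_width(...).
new_width = 1 / float(n_df + 1)

lefts_by_index = {}
for x in range(n_ind):
    # Left edge of the bar as first drawn by pandas (centered on x).
    default_left = x - default_width / 2
    # Dataframe number k is shifted right by k * new_width, which is what
    # 1 / float(n_df + 1) * i / float(n_col) evaluates to when i = k * n_col.
    lefts_by_index[x] = [default_left + k * new_width for k in range(n_df)]

# Tick positions, computed exactly as in the function.
ticks = (np.arange(0, 2 * n_ind, 2) + 1 / float(n_df + 1)) / 2.

print(lefts_by_index[0])  # left edges of the three bars in the first cluster
print(ticks)
```

With three dataframes, the first cluster spans -0.25 to 0.5 and its tick sits at 0.125, the middle of the cluster. Under the same width assumption this centering is exact only for n_df = 3; with another number of dataframes the tick is slightly off center.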

plot_clustered_stacked([df1, df2, df3],
                       ["df1", "df2", "df3"],
                       cmap=plt.cm.viridis)

Solution with seaborn:

Given the same df1, df2, df3 as above, I convert them to long form:

df1["Name"] = "df1"
df2["Name"] = "df2"
df3["Name"] = "df3"
dfall = pd.concat([pd.melt(i.reset_index(),
                           id_vars=["Name", "index"]) # transform each df into tidy format
                   for i in [df1, df2, df3]],
                   ignore_index=True)
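To see what the long form looks like, here is a minimal sketch on a tiny hand-written frame (the frame small and its values are invented for illustration). After reset_index the unnamed index becomes a column called "index", and pd.melt turns the remaining columns into variable/value pairs.

```python
import pandas as pd

# Tiny stand-in for df1: two index entries, two value columns.
small = pd.DataFrame({"I": [1.0, 2.0], "J": [3.0, 4.0]}, index=["A", "B"])
small["Name"] = "df1"

# One row per (Name, index, variable) combination.
long_form = pd.melt(small.reset_index(), id_vars=["Name", "index"])
print(long_form)
```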

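The problem with seaborn is that it doesn't stack bars natively, so the trick is to plot the cumulative sum of each bar, one on top of the other:

dfall.set_index(["Name", "index", "variable"], inplace=True)
dfall["vcs"] = dfall.groupby(level=["Name", "index"])["value"].cumsum()
dfall.reset_index(inplace=True)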

>>> dfall.head(6)
  Name index variable     value       vcs
0  df1     A        I  0.717286  0.717286
1  df1     B        I  0.236867  0.236867
2  df1     C        I  0.952557  0.952557
3  df1     D        I  0.487995  0.487995
4  df1     A        J  0.174489  0.891775
5  df1     B        J  0.332001  0.568868
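The cumulative sum is easier to check on round numbers. The following sketch uses an invented four-row long-form frame and groups on ordinary columns instead of index levels, which is an equivalent formulation for this purpose.

```python
import pandas as pd

long_form = pd.DataFrame({
    "Name": ["df1", "df1", "df1", "df1"],
    "index": ["A", "B", "A", "B"],
    "variable": ["I", "I", "J", "J"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Running total within each bar, i.e. within each (Name, index) pair.
# The total after "J" is the height at which the J segment must end.
long_form["vcs"] = long_form.groupby(["Name", "index"])["value"].cumsum()
print(long_form)
```

For bar A the values 1.0 and 3.0 become 1.0 and 4.0; for bar B the values 2.0 and 4.0 become 2.0 and 6.0. Drawing the taller totals behind the shorter ones produces the stacked look.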

Then loop over each group of variable and plot the cumulative sum:

import seaborn as sns

c = ["blue", "purple", "red", "green", "pink"]
for i, g in enumerate(dfall.groupby("variable")):
    ax = sns.barplot(data=g[1],
                     x="index",
                     y="vcs",
                     hue="Name",
                     color=c[i],
                     zorder=-i, # so first bars stay on top
                     edgecolor="k")
ax.legend_.remove() # remove the redundant legends

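It lacks a legend, which I think can be added easily. The problem is that, instead of hatches (which can be added easily), the dataframes are distinguished by a gradient of lightness, and it is a bit too light for the first one. I don't really know how to change that without modifying each rectangle one by one (as in the first solution).

Tell me if you don't understand something in the code.

Feel free to re-use this code, which is under CC0.

Regarding python - How to have clusters of stacked bars, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22787209/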
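As a possible way to add the missing legend for the variables, matplotlib proxy patches can be used. This is only a sketch: it creates an empty axes as a stand-in for the one returned by sns.barplot, and it assumes the colors are assigned to the variables in alphabetical order, which is the order in which groupby("variable") yields its groups.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

variables = ["I", "J", "K", "L", "M"]
colors = ["blue", "purple", "red", "green", "pink"]  # same list as c above

fig, ax = plt.subplots()  # stand-in for the axes returned by sns.barplot

# One colored patch per variable; these are legend-only artists and are
# never drawn on the plot itself.
handles = [Patch(facecolor=col, edgecolor="k", label=var)
           for var, col in zip(variables, colors)]
legend = ax.legend(handles=handles, title="variable", loc=[1.01, 0.5])
```

This only covers the colors. The lightness gradient that distinguishes the dataframes is left untouched.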
