python - Half (not split!) violin plots in seaborn

Tags: python python-3.x pandas seaborn

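Currently seaborn offers functionality for split violinplots by setting split=True, according to a hue variable. I would like to make a "half" violin plot, i.e. a plot in which one half of each violin is omitted. Such a plot depicts something similar to a pdf for each continuous variable, drawn on only one side of the vertical line of each categorical value.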
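
For reference, the built-in split behaviour mentioned above can be sketched as follows. This is a minimal illustration on synthetic data; the column names `group`, `side` and `value` are made up for the example and are not part of the question.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data (hypothetical column names): two groups, each with a
# two-level "side" variable that is used as the hue.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b"], 100),
    "side": np.tile(["left", "right"], 100),
    "value": rng.normal(size=200),
})

# With a two-level hue and split=True, each violin shows one hue level
# on its left half and the other on its right half.
ax = sns.violinplot(data=df, x="group", y="value", hue="side", split=True)
# Call plt.show() to display the figure.
```

Both halves are always drawn here, which is why the question asks for a way to omit one of them.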

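I have managed to trick seaborn into drawing this by adding one extra data point outside the plotted value range together with an extra dummy hue, but I would like to know whether it can be done without actually altering the dataset, e.g. through the arguments of sns.violinplot().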

For example, this plot (image not included here) was created by the following snippet:

# imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# load dataset from seaborn
dataset_name = 'iris'
if dataset_name not in sns.get_dataset_names():
    raise ValueError("Dataset with name: " + dataset_name + " was not found in the available datasets online by seaborn.")
df = sns.load_dataset(dataset_name)

# prepare data: append one dummy row far outside the plotted range and give
# it its own hue value, so that split=True leaves one half of each violin empty
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
dummy = pd.DataFrame([[-999, -999, -999, -999, 'setosa']], columns=df.columns)
df2 = pd.concat([df, dummy], ignore_index=True)
df2['huecol'] = 0.0
df2.loc[df2.index[-1], 'huecol'] = -999  # avoids chained assignment

# plot
fig = plt.figure(figsize=(6, 6))
ax = sns.violinplot(x='species', y='sepal_width',
                    split=True, hue='huecol', inner='quartile',
                    palette='pastel', data=df2)
plt.title('iris')

# remove hue legend
leg = ax.get_legend()
if leg is not None:
    leg.remove()
plt.ylim([1, 5.0])
plt.show()

Best answer

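I was looking for a solution similar to this but did not find anything satisfactory. I ended up calling seaborn.kdeplot multiple times, because a violin plot is essentially a mirrored kernel density plot, so half a violin is just a one-sided KDE.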

Example

The definition of the function categorical_kde_plot is given in the Code section below. It is called like this:

categorical_kde_plot(
    df,
    variable="tip",
    category="day",
    category_order=["Thur", "Fri", "Sat", "Sun"],
    horizontal=False,
)

With horizontal=True, the density plots are stacked in rows instead of being placed side by side.

Code

import seaborn as sns
from matplotlib import pyplot as plt


def categorical_kde_plot(
    df,
    variable,
    category,
    category_order=None,
    horizontal=False,
    rug=True,
    figsize=None,
):
    """Draw a categorical KDE plot

    Parameters
    ----------
    df: pd.DataFrame
        The data to plot
    variable: str
        The column in the `df` to plot (continuous variable)
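    category: str
        The column in the `df` to use for grouping (categorical variable)
    category_order: list or None
        The categories to plot, in plotting order. If None, use the
        unique values of `df[category]` in order of appearance.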
    horizontal: bool
        If True, draw density plots horizontally. Otherwise, draw them
        vertically.
    rug: bool
        If True, add also a sns.rugplot.
    figsize: tuple or None
        If None, use default figsize of (7, 1*len(categories))
        If tuple, use that figsize. Given to plt.subplots as an argument.
    """
    if category_order is None:
        categories = list(df[category].unique())
    else:
        categories = category_order[:]

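    if figsize is None:
        # The default size is meant for the horizontal layout;
        # swap width and height for the vertical one.
        figsize = (7, 1.0 * len(categories))
        if not horizontal:
            figsize = figsize[::-1]

    fig, axes = plt.subplots(
        nrows=len(categories) if horizontal else 1,
        ncols=1 if horizontal else len(categories),
        figsize=figsize,
        sharex=horizontal,
        sharey=not horizontal,
        squeeze=False,  # always return a 2D array, even for one category
    )
    axes = axes.ravel()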

    for i, (cat, ax) in enumerate(zip(categories, axes)):
        sns.kdeplot(
            data=df[df[category] == cat],
            x=variable if horizontal else None,
            y=None if horizontal else variable,
            # kde kwargs
            bw_adjust=0.5,
            clip_on=False,
            fill=True,
            alpha=1,
            linewidth=1.5,
            ax=ax,
            color="lightslategray",
        )

        keep_variable_axis = (i == len(fig.axes) - 1) if horizontal else (i == 0)

        if rug:
            sns.rugplot(
                data=df[df[category] == cat],
                x=variable if horizontal else None,
                y=None if horizontal else variable,
                ax=ax,
                color="black",
                height=0.025 if keep_variable_axis else 0.04,
            )

        _format_axis(
            ax,
            cat,
            horizontal,
            keep_variable_axis=keep_variable_axis,
        )

    plt.tight_layout()
    plt.show()


def _format_axis(ax, category, horizontal=False, keep_variable_axis=True):

    # Remove the axis lines
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

    if horizontal:
        ax.set_ylabel(None)
        lim = ax.get_ylim()
        ax.set_yticks([(lim[0] + lim[1]) / 2])
        ax.set_yticklabels([category])
        if not keep_variable_axis:
            ax.get_xaxis().set_visible(False)
            ax.spines["bottom"].set_visible(False)
    else:
        ax.set_xlabel(None)
        lim = ax.get_xlim()
        ax.set_xticks([(lim[0] + lim[1]) / 2])
        ax.set_xticklabels([category])
        if not keep_variable_axis:
            ax.get_yaxis().set_visible(False)
            ax.spines["left"].set_visible(False)


if __name__ == "__main__":
    df = sns.load_dataset("tips")

    categorical_kde_plot(
        df,
        variable="tip",
        category="day",
        category_order=["Thur", "Fri", "Sat", "Sun"],
        horizontal=True,
    )
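
If one subplot per category is not wanted, the same idea can be sketched on a single axes by evaluating a KDE per category and filling only one side of it. This is a rough sketch rather than part of the answer above: it swaps seaborn.kdeplot for scipy.stats.gaussian_kde, and the helper name `half_violins` and its `width` parameter are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde


def half_violins(groups, width=0.8, ax=None):
    """Draw one right-facing half violin per entry of `groups`.

    groups: dict mapping a category label to a 1-D sequence of values.
    width:  widest extent of a half violin, in category-axis units.
    """
    if ax is None:
        _, ax = plt.subplots()
    for pos, values in enumerate(groups.values()):
        values = np.asarray(values, dtype=float)
        grid = np.linspace(values.min(), values.max(), 200)
        density = gaussian_kde(values)(grid)
        # Rescale so the density peak reaches `width` to the right of `pos`.
        scaled = density / density.max() * width
        ax.fill_betweenx(grid, pos, pos + scaled, alpha=0.7)
    ax.set_xticks(range(len(groups)))
    ax.set_xticklabels(list(groups))
    return ax


# Synthetic data, used only for illustration.
rng = np.random.default_rng(0)
groups = {"a": rng.normal(0, 1, 200), "b": rng.normal(1, 0.5, 200)}
ax = half_violins(groups)
# Call plt.show() to display the figure.
```

Each density is normalised to the same peak width, so the shapes are comparable but their areas are not; dropping the division by `density.max()` would keep the areas comparable instead.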

Regarding "python - Half (not split!) violin plots in seaborn", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53872439/
