python - Strange behavior of matplotlib's boxplot when using the notch shape

Tags: python matplotlib boxplot

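I am seeing some strange behavior in matplotlib's boxplot function when I use the "notch" shape. I am reusing code I wrote a while ago and never ran into this problem before, so I am wondering what is going wrong. Any ideas?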

weird behaviour on notched boxplots

When I turn the notch shape off, the plot looks normal:

unnotched boxplots look normal

Here is the code:

import matplotlib.pyplot as plt

def boxplot_modified(data):

    fig = plt.figure(figsize=(8,6))
    ax = plt.subplot(111) 

    bplot = plt.boxplot(data,
            #notch=True,          # notch shape (disabled here)
            vert=True,           # vertical box alignment
            sym='ko',            # black circles for outliers
            patch_artist=True,   # fill with color
            )

    # choosing custom colors to fill the boxes
    colors = 3*['lightgreen'] + 3*['lightblue'] + ['lightblue', 'lightblue', 'lightblue']
    for patch, color in zip(bplot['boxes'], colors):
        patch.set_facecolor(color)

    # modifying the whiskers: straight lines, black, wider
    for whisker in bplot['whiskers']:
        whisker.set(color='black', linewidth=1.2, linestyle='-')    

    # making the cap lines a little bit thicker
    for cap in bplot['caps']:
        cap.set(linewidth=1.2)

    # hiding axis ticks
    plt.tick_params(axis="both", which="both", bottom=False, top=False,
            labelbottom=True, left=False, right=False, labelleft=True)

    # adding horizontal grid lines 
    ax.yaxis.grid(True) 

    # remove axis spines
    ax.spines["top"].set_visible(False)  
    ax.spines["right"].set_visible(False) 
    ax.spines["bottom"].set_visible(True) 
    ax.spines["left"].set_visible(True)

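    # one tick label per box (boxplot draws one box per column of a 2D array)
    n_boxes = len(bplot['boxes'])
    plt.xticks(list(range(1, n_boxes + 1)), n_boxes*['x'])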

    # raised title
    #plt.text(2, 1, 'Modified',
    #     horizontalalignment='center',
    #     fontsize=18)

    plt.tight_layout()
    plt.show()

# df is the asker's pandas DataFrame (not shown in the question)
boxplot_modified(df.values)

The problem still appears when I make a plain plot without any customization:

def boxplot(data):

    fig = plt.figure(figsize=(8,6))
    ax = plt.subplot(111) 

    bplot = plt.boxplot(data,
            notch=True,          # notch shape
            vert=True,           # vertical box alignment
            sym='ko',            # black circles for outliers
            patch_artist=True,   # fill with color
            )

    plt.show()
boxplot(df.values)

notch plot without customization still looks weird

Best answer

Well, it turns out that this is actually correct behavior ;)

From Wikipedia:

Notched box plots apply a "notch" or narrowing of the box around the median. Notches are useful in offering a rough guide to significance of difference of medians; if the notches of two boxes do not overlap, this offers evidence of a statistically significant difference between the medians. The width of the notches is proportional to the interquartile range of the sample and inversely proportional to the square root of the size of the sample. However, there is uncertainty about the most appropriate multiplier (as this may vary depending on the similarity of the variances of the samples). One convention is to use +/-1.58*IQR/sqrt(n).

This has also been discussed in an issue on GitHub; R produces similar output, which supports the view that this behavior is "correct".

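So if a notched box plot shows this strange "flipped" appearance, it simply means that the confidence interval around the median extends beyond the box: its lower bound falls below the first quartile, or its upper bound rises above the third quartile. It looks ugly, but it is actually useful information about how (un)certain the median is.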
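To make the condition concrete, here is a minimal sketch that computes the notch bounds from the convention quoted above and checks whether they leave the box. It uses only the standard library; the helper names `notch_bounds` and `notch_flips` are made up for illustration, and the multiplier 1.57 is an assumption about matplotlib's default (the Wikipedia quote mentions 1.58). The quartile method may also differ slightly from what a plotting library uses.

```python
import math
import statistics

def notch_bounds(sample, multiplier=1.57):
    """Return (q1, median, q3, notch_lo, notch_hi) for one box.

    The notch is median +/- multiplier * IQR / sqrt(n).
    """
    # "inclusive" quartiles interpolate linearly between data points
    q1, med, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    half_width = multiplier * (q3 - q1) / math.sqrt(len(sample))
    return q1, med, q3, med - half_width, med + half_width

def notch_flips(sample, multiplier=1.57):
    """True if the notch extends beyond the box (the 'flipped' look)."""
    q1, _, q3, lo, hi = notch_bounds(sample, multiplier)
    return lo < q1 or hi > q3

# Hypothetical data: a tiny sample and a larger one
small = [1.0, 2.0, 3.0, 4.0, 10.0]
large = [float(i) for i in range(1, 101)]

print(notch_flips(small))  # few points: wide notch relative to the box
print(notch_flips(large))  # many points: notch stays inside the box
```

Because the notch width shrinks with the square root of the sample size, the flipped shape is mostly a small-sample effect.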

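Bootstrapping (random sampling with replacement to estimate a parameter of the sampling distribution, here the confidence interval) may reduce this effect:

From the plt.boxplot documentation: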

bootstrap : None (default) or integer Specifies whether to bootstrap the confidence intervals around the median for notched boxplots. If bootstrap==None, no bootstrapping is performed, and notches are calculated using a Gaussian-based asymptotic approximation (see McGill, R., Tukey, J.W., and Larsen, W.A., 1978, and Kendall and Stuart, 1967). Otherwise, bootstrap specifies the number of times to bootstrap the median to determine it's 95% confidence intervals. Values between 1000 and 10000 are recommended.
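For intuition, here is a rough standard-library sketch of a percentile bootstrap for the median. It is not matplotlib's implementation; the function name `bootstrap_median_ci`, the sample values, and the simple index-based percentile lookup are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_median_ci(sample, n_resamples=5000, seed=0):
    """Approximate 95% confidence interval for the median.

    Resamples `sample` with replacement, records each resample's
    median, and reads off the ~2.5th and ~97.5th percentiles.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    medians = sorted(
        statistics.median(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lo = medians[int(0.025 * n_resamples)]
    hi = medians[int(0.975 * n_resamples) - 1]
    return lo, hi

# Hypothetical sample of ten values
sample = [2.1, 2.5, 2.9, 3.0, 3.2, 3.8, 4.4, 5.0, 7.5, 9.9]
lo, hi = bootstrap_median_ci(sample)
print(f"median={statistics.median(sample)}, 95% CI=({lo}, {hi})")
```

In matplotlib itself the bootstrapped notches are requested through the parameter quoted above, for example `plt.boxplot(data, notch=True, bootstrap=5000)`. Whether the notches then stay inside the box still depends on the data.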

Source: the corresponding question on Stack Overflow: https://stackoverflow.com/questions/26291082/
