python - Plotting a bar chart from a MultiIndex DataFrame with bar colors determined by a category

Tags: python pandas matplotlib plot

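I have a DataFrame with MultiIndex columns that looks like the data printed after this question text. I can plot it as a bar chart, but I cannot control the colors the way I want.

How can I plot a bar chart in which the color of each bar is determined by a category of my choice (for example, "City")? All bars that belong to the same city should have the same color regardless of the year. For example, all ATL bars should be red and all MIA bars should be blue.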

City            ATL                                    MIA
Year           2010         2011         2012         2010         2011         2012
Taste
Bitter  3159.861983  3149.806667  2042.348937  3124.586470  3119.541240  1070.925695
Sour    1078.897032  3204.689424  3065.818991  2084.322056  2108.568495  3178.131540
Spicy   5280.847114  3134.597728  1015.311288  2036.494136  1001.532560  3164.382635
Sweet   1056.169267  1015.368646  4217.145165  3134.734027  4144.826118  3173.919338

Here is my code:

import sys
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import random

matplotlib.style.use('ggplot')

def main():

    taste = ['Sweet','Spicy','Sour','Bitter']
    store = ['Asian','Italian','American','Greek','Mexican']

    df1 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})

    df2 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})

    df3 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})

    df4 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})

    df5 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})


    df6 = pd.DataFrame({'Taste':[random.choice(taste) for x in range(10)],
                       'Store':[random.choice(store) for x in range(10)],
                       'Sold':1000+100*np.random.rand(10)})



    df1['Year'] = '2010'
    df1['City'] = 'MIA'

    df2['Year'] = '2011'
    df2['City'] = 'MIA'

    df3['Year'] = '2012'
    df3['City'] = 'MIA'

    df4['Year'] = '2010'
    df4['City'] = 'ATL'

    df5['Year'] = '2011'
    df5['City'] = 'ATL'

    df6['Year'] = '2012'
    df6['City'] = 'ATL'


    DF = pd.concat([df1,df2,df3,df4,df5,df6])
    DFG = DF.groupby(['Taste', 'Year', 'City'])
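    # Sum 'Sold' per (Taste, Year, City), then move City and Year into the columns.
    # The level= argument of DataFrame.sum() is not available in recent pandas
    # versions, so unstack directly and sort the (City, Year) columns instead.
    DFGSum = DFG['Sold'].sum().unstack(['City', 'Year']).sort_index(axis=1)
    print(DFGSum)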

    '''
    In my plot, I want the color of the bars to be determined by the "City".
    For example: All "ATL" bar colors will be the same regardless of the year.
    '''
    DFGSum.plot(kind='bar')


    plt.show()

if __name__ == '__main__':
    main()

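Best answer

Edited to include color cycling and an arbitrary number of cities.

You will need to specify a few extra parameters to make it look good, but something like this may work: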

import itertools  # for color cycling
import numpy as np
import matplotlib.pyplot as plt

# specify the color you want for each city
# ('axes.color_cycle' is not available in current matplotlib; 'axes.prop_cycle' replaces it)
color_cycle = itertools.cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])
colors = {cty: next(color_cycle) for cty in DF.City.unique()}

# specify the relative position of each bar
n = len(list(DFGSum))
positions = np.linspace(-n/2., n/2., n)

# plot each column individually; each col is a (City, Year) tuple
for i, col in enumerate(list(DFGSum)):
    c = colors[col[0]]
    pos = positions[i]
    DFGSum[col].plot(kind='bar', color=c,
                     position=pos, width=0.05)

plt.legend()
plt.show()


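With this plot, however, you cannot tell which bar corresponds to which year.

Alternative

You can also make a slightly different plot that keeps the year information in the tick labels. This generalizes to any number of cities and keeps the default color style.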
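A shorter route to the same coloring, sketched here as an assumption rather than as part of the original answer, is to pass one color per column through the `color` argument of `DataFrame.plot`. The frame `demo` and the mapping `city_colors` are hypothetical stand-ins for `DFGSum` and the desired colors; the sketch only assumes that the columns are (City, Year) tuples.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from matplotlib.colors import to_rgba
import numpy as np
import pandas as pd

# Hypothetical stand-in for DFGSum: columns are (City, Year) tuples.
cols = pd.MultiIndex.from_product(
    [["ATL", "MIA"], ["2010", "2011", "2012"]], names=["City", "Year"]
)
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    1000 + 100 * rng.random((4, 6)),
    index=pd.Index(["Bitter", "Sour", "Spicy", "Sweet"], name="Taste"),
    columns=cols,
)

# One color per city (hypothetical choice), expanded to one entry per column.
city_colors = {"ATL": "red", "MIA": "blue"}
bar_colors = [city_colors[city] for city, _year in demo.columns]

# pandas draws one group of bars per column and colors them in column order.
ax = demo.plot(kind="bar", color=bar_colors)
```

Because the bars stay in column order inside each group, the year can still be read from the legend entries, although bars of the same city share a color.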

from functools import reduce  # reduce is not a builtin in Python 3

df = DFG.sum().reset_index().set_index(['Taste','Year'])
u_cty = df.City.unique()  # array(['ATL', 'MIA'], dtype=object)
df_list = []
for cty in u_cty:
    # keep only this city's rows and name the 'Sold' column after the city
    d = df.loc[df.City == cty]
    d = d[['Sold']].rename(columns={'Sold': cty}).reset_index()
    df_list.append(d)

# outer-merge the per-city frames on (Taste, Year)
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['Taste','Year'], how='outer'), df_list)
df_merged.set_index(['Taste','Year'], inplace=True)
print(df_merged)
                     ATL          MIA
Taste  Year                          
Bitter 2010  3211.239754  2070.907629
       2011  2158.068222  2145.373251
       2012  2138.624730  1062.306874
Sour   2010  4188.024600          NaN
       2011  4323.003409          NaN
       2012  1042.772615  2136.742869
Spicy  2010  1018.737977  3155.450265
       2012  4171.954201  2096.569762
Sweet  2010  2098.679545  5324.078957
       2011  4215.376670  2115.964824
       2012  3152.998667  5277.410536
Spicy  2011          NaN  6295.032147

df_merged.plot(kind='bar')
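The same (Taste, Year) by city layout can probably be reached without the loop and merge by unstacking only the City level. The following is a minimal sketch under that assumption; `long_df` is hypothetical random data with the same columns as `DF` in the question, and `by_city` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data with the same columns as DF in the question.
rng = np.random.default_rng(1)
n = 60
long_df = pd.DataFrame({
    "Taste": rng.choice(["Sweet", "Spicy", "Sour", "Bitter"], n),
    "Year": rng.choice(["2010", "2011", "2012"], n),
    "City": rng.choice(["ATL", "MIA"], n),
    "Sold": 1000 + 100 * rng.random(n),
})

# Sum 'Sold' per (Taste, Year, City), then move City into the columns.
# Missing (Taste, Year, City) combinations become NaN, as with the outer merge.
by_city = (
    long_df.groupby(["Taste", "Year", "City"])["Sold"]
    .sum()
    .unstack("City")
)
```

Calling `by_city.plot(kind='bar')` should then give one default color per city, with the (Taste, Year) pairs as tick labels.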


Source: the original question on Stack Overflow: https://stackoverflow.com/questions/31949769/
