python - Show only the top n categories on the X axis when plotting two categorical variables (X, Y), e.g. the top 10 or 20 unique X-axis values

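I have a pandas DataFrame with 13,961 rows. The X-axis category (city name) has more than 30 unique values, and the Y-axis feature, a retention flag, has only two levels (Retained / Non Retained).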

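When I plot with pd.crosstab, the X axis shows all 30+ unique cities, which is too cluttered and dense to read. Can I instead show only the top 10 or 20 unique levels on the X axis and leave out the rest (or fold them into an "Other" category)? Any help is appreciated; the solution does not have to use pd.crosstab.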

I created a pd.crosstab with BORROWER_CITY on the X axis and 'Retention_Flag' on the Y axis.

This shows all 30+ cities on the X axis, whereas I only need the top n (20/30) as X-axis labels. My first try:

    df2 = data.groupby("BORROWER_CITY")['Retention_Flag'].value_counts().groupby(level=1).nlargest(4).unstack(fill_value=0)
    df2.plot(kind='bar')


The output obtained is shown below:

Retention_Flag                Non Retained  Retained
Retention_Flag BORROWER_CITY                        
Non Retained   Bangalore               837         0
               Delhi                  1477         0
               Mumbai                 2507         0
               Pune                    838         0
Retained       Bangalore                 0        52
               Chennai                   0       106
               Mumbai                    0       168
               Pune                      0        67

The resulting plot (picture 2) has 'Retention_Flag,BORROWER_CITY' on the X axis, with 8 entries.

Instead of two X-axis entries per city (one for Retained, one for Non Retained) as in picture 2, can I have a single entry per city name, since the legend already shows the flag?

Second try:
When I use head(4) instead of nlargest, the plot looks the way I expected (picture 3), but it does not pick the largest value_counts(); the cities come out in alphabetical order instead.

    df3 = data.groupby("BORROWER_CITY")['Retention_Flag'].value_counts().groupby(level=1).head(4).unstack(fill_value=0)
    print(df3)

Retention_Flag  Non Retained  Retained
BORROWER_CITY                         
Adilabad                   2         0
Agra                      17         0
Ahmedabad                434        21
Ahmednagar                19         1
Alappuzha                  0         1
Ambala                     0         2
df3.plot(kind='bar')

The resulting plot (picture 3) has 'BORROWER_CITY' on the X axis, with 6 entries.

Best answer

You can count the values per city with SeriesGroupBy.value_counts, keep the first N rows for each flag with GroupBy.head, and then reshape with Series.unstack:

import pandas as pd

# Toy data: four cities (a-d) and two retention levels.
data = pd.DataFrame({
    'BORROWER_CITY': list('abcdabaaadab'),
    'Retention_Flag': ['Ret', 'Non ret'] * 6,
})

print (data)
   BORROWER_CITY Retention_Flag
0              a            Ret
1              b        Non ret
2              c            Ret
3              d        Non ret
4              a            Ret
5              b        Non ret
6              a            Ret
7              a        Non ret
8              a            Ret
9              d        Non ret
10             a            Ret
11             b        Non ret


df1 = pd.crosstab(data['BORROWER_CITY'],data['Retention_Flag'])
print (df1)
Retention_Flag  Non ret  Ret
BORROWER_CITY               
a                     1    5
b                     3    0
c                     0    1
d                     2    0


N = 2
df2 = (data.groupby('BORROWER_CITY')['Retention_Flag']
           .value_counts()
           .groupby(level=1)
           .head(N)
           .unstack(fill_value=0))
print (df2)
Retention_Flag  Non ret  Ret
BORROWER_CITY               
a                     1    5
b                     3    0
c                     0    1


df2.plot(kind='bar')
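
If the goal is one X-axis entry per city, ranked by how many rows each city has overall, one possible alternative is to pick the top N cities first and cross-tabulate only those rows. The following is a minimal sketch on the toy data from this answer, not part of the original answer; the names top_cities, subset and top_tab are illustrative, and ranking by total row count is an assumption about what "top" means.

```python
import pandas as pd

# Same toy data as in the answer.
data = pd.DataFrame({
    'BORROWER_CITY': list('abcdabaaadab'),
    'Retention_Flag': ['Ret', 'Non ret'] * 6,
})

N = 2
# Cities ranked by total number of rows, largest first (assumed ranking).
top_cities = data['BORROWER_CITY'].value_counts().nlargest(N).index

# Keep only the rows for those cities, then cross-tabulate as before.
subset = data[data['BORROWER_CITY'].isin(top_cities)]
top_tab = pd.crosstab(subset['BORROWER_CITY'], subset['Retention_Flag'])

# Reorder the rows so the city with the most rows comes first on the X axis.
top_tab = top_tab.loc[top_cities]
print(top_tab)
```

Calling top_tab.plot(kind='bar') on the result should give one group of bars per city, with both flag counts kept for every city shown.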

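Edit:

Solution using nlargest. Unlike head, SeriesGroupBy.nlargest prepends the group key (the flag) as an extra index level, which is why the plot in the question showed two X-axis entries per city. Dropping that level before unstack leaves one entry per city:

N = 2
df3 = (data.groupby('BORROWER_CITY')['Retention_Flag']
           .value_counts()
           .groupby(level=1)
           .nlargest(N)
           # drop the flag level that nlargest adds in front of the index
           .reset_index(level=0, drop=True)
           .unstack(fill_value=0))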
print (df3)
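
The question also asks whether the remaining cities can be folded into a single "Other" category. A minimal sketch of one way to do that, assuming cities are ranked by total row count and BORROWER_CITY is a plain string (object) column; the 'Other' label and the names top_cities, city_grouped and other_tab are hypothetical and not part of the original answer.

```python
import pandas as pd

# Same toy data as in the answer.
data = pd.DataFrame({
    'BORROWER_CITY': list('abcdabaaadab'),
    'Retention_Flag': ['Ret', 'Non ret'] * 6,
})

N = 2
# Cities ranked by total number of rows, largest first (assumed ranking).
top_cities = data['BORROWER_CITY'].value_counts().nlargest(N).index

# Keep the city name for the top N cities and replace every other city with 'Other'.
city_grouped = data['BORROWER_CITY'].where(
    data['BORROWER_CITY'].isin(top_cities), 'Other')

# Cross-tabulate the collapsed city column against the flag.
other_tab = pd.crosstab(city_grouped, data['Retention_Flag'])
print(other_tab)
```

Because no rows are dropped, the counts in other_tab still add up to the number of rows in data, and other_tab.plot(kind='bar') shows N + 1 groups on the X axis.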

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/57458328/
