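I have a pandas DataFrame with 13,961 rows. The category on the X axis (city name) has more than 30 unique values, and on the Y axis there is another feature, "Retention Flag", with only two levels (retained / not retained).
When I plot with pd.crosstab, the X axis shows all 30+ unique cities, which is too cluttered and dense to read. Can I instead show only the top 10 or 20 unique levels on the X axis and drop the rest (or merge them into an "Other" category)? Any approach is welcome; it does not have to use pd.crosstab.
I created a pd.crosstab with BORROWER_CITY on the X axis and 'Retention_Flag' on the Y axis.
This shows all 30+ cities on the X axis, whereas I only need the top n (20/30) in the X-axis labels.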
df2=data.groupby("BORROWER_CITY") ['Retention_Flag'].value_counts().groupby(level=1).nlargest(4).unstack(fill_value=0)
df2.plot(kind='bar')
The output obtained is shown below:
Retention_Flag                Non Retained  Retained
Retention_Flag BORROWER_CITY
Non Retained   Bangalore               837         0
               Delhi                  1477         0
               Mumbai                 2507         0
               Pune                    838         0
Retained       Bangalore                 0        52
               Chennai                   0       106
               Mumbai                    0       168
               Pune                      0        67
The plot (picture 2) has 'Retention_Flag,BORROWER_CITY' on the X axis, with 8 entries.
Instead of two entries per city on the X axis (one for retained, one for non-retained), can I have a single entry per city name, since the legend already shows the flag?
Second try:
Using head(4) instead of nlargest(4) gives the plot I expected (picture 3), but it does not pick the largest value_counts(); the cities come out in alphabetical order instead.
df3=data.groupby("BORROWER_CITY")['Retention_Flag'].value_counts().groupby(level=1).head(4).unstack(fill_value=0)
print(df3)
Retention_Flag  Non Retained  Retained
BORROWER_CITY
Adilabad                   2         0
Agra                      17         0
Ahmedabad                434        21
Ahmednagar                19         1
Alappuzha                  0         1
Ambala                     0         2
df3.plot(kind='bar')
The plot has 'BORROWER_CITY' on the X axis, with 6 entries.
Best answer
You can use SeriesGroupBy.value_counts to count the values for both categories, GroupBy.head to keep the first N rows per category, and then reshape with Series.unstack:
import pandas as pd

data = pd.DataFrame({
    'BORROWER_CITY': list('abcdabaaadab'),
    'Retention_Flag': ['Ret', 'Non ret'] * 6,
})
print (data)
   BORROWER_CITY Retention_Flag
0              a            Ret
1              b        Non ret
2              c            Ret
3              d        Non ret
4              a            Ret
5              b        Non ret
6              a            Ret
7              a        Non ret
8              a            Ret
9              d        Non ret
10             a            Ret
11             b        Non ret
df1 = pd.crosstab(data['BORROWER_CITY'],data['Retention_Flag'])
print (df1)
Retention_Flag  Non ret  Ret
BORROWER_CITY
a                     1    5
b                     3    0
c                     0    1
d                     2    0
N = 2
df2 = (data.groupby('BORROWER_CITY')['Retention_Flag']
           .value_counts()
           .groupby(level=1)
           .head(N)
           .unstack(fill_value=0))
print (df2)
Retention_Flag  Non ret  Ret
BORROWER_CITY
a                     1    5
b                     3    0
c                     0    1
df2.plot(kind='bar')
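Note that head(N) keeps the first N rows of each flag in index order, which is alphabetical by city, not the N largest counts. As a minimal sketch of one alternative (not part of the original answer), the cities can be ranked by their total row count and the crosstab restricted to the top N. The names top_cities and top_table are made up for illustration, and the sample data is the same toy frame as above.

```python
import pandas as pd

# Same toy data as in the answer
data = pd.DataFrame({
    'BORROWER_CITY': list('abcdabaaadab'),
    'Retention_Flag': ['Ret', 'Non ret'] * 6,
})

N = 2
# Rank cities by total number of rows and keep the N most frequent
# (top_cities is an illustrative name)
top_cities = data['BORROWER_CITY'].value_counts().nlargest(N).index

# Cross-tabulate all rows, then select only the top-N cities,
# in ranked order (most frequent city first)
top_table = pd.crosstab(data['BORROWER_CITY'],
                        data['Retention_Flag']).loc[top_cities]
print(top_table)
```

Calling top_table.plot(kind='bar') should then draw one group of bars per city, with the flag in the legend.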
Edit:
Solution with nlargest:
N = 2
df3 = (data.groupby('BORROWER_CITY')['Retention_Flag']
           .value_counts()
           # group_keys=False avoids adding Retention_Flag as an extra index level
           .groupby(level=1, group_keys=False)
           .nlargest(N)
           .unstack(fill_value=0))
print (df3)
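If the remaining cities should be kept and not dropped, one possible approach is to relabel every city outside the top N as 'Other' before cross-tabulating. This is only a sketch, assuming that ranking by total row count is what is wanted. The 'Other' label and the names top_cities, city_grouped and other_table are illustrative and do not come from the original answer.

```python
import pandas as pd

# Same toy data as in the answer
data = pd.DataFrame({
    'BORROWER_CITY': list('abcdabaaadab'),
    'Retention_Flag': ['Ret', 'Non ret'] * 6,
})

N = 2
# The N most frequent cities overall (illustrative name)
top_cities = data['BORROWER_CITY'].value_counts().nlargest(N).index

# Keep the city name where it is in the top N, otherwise use 'Other'
city_grouped = data['BORROWER_CITY'].where(
    data['BORROWER_CITY'].isin(top_cities), 'Other')

# Cross-tabulate the relabelled cities against the flag
other_table = pd.crosstab(city_grouped, data['Retention_Flag'])
print(other_table)
```

Because every row is still counted, the totals in other_table match the full data, and other_table.plot(kind='bar') shows N + 1 groups on the X axis.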
Regarding python - showing only the top (n) bars for two categorical axes (X, Y), possibly the top 10 or 20 unique (X-axis) values, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57458328/