Looking for a fast and elegant way to bin records based on 2 columns in Pandas.
Here is my data frame:
filename height width
0 shopfronts_23092017_3_285.jpg 750.0 560.0
1 shopfronts_200.jpg 4395.0 6020.0
2 shopfronts_25092017_eateries_98.jpg 414.0 621.0
3 shopfronts_101.jpg 480.0 640.0
4 shopfronts_138.jpg 3733.0 8498.0
5 shopfronts_25092017_eateries_95.jpg 187.0 250.0
6 shopfronts_25092017_neon_33.jpg 100.0 200.0
7 shopfronts_322.jpg 682.0 1024.0
8 shopfronts_171.jpg 800.0 600.0
9 shopfronts_23092017_3_35.jpg 120.0 210.0
I need to classify the records based on 2 columns, height and width (the image resolution).
I am looking for something like this:
filename height width group
0 shopfronts_23092017_3_285.jpg 750.0 560.0 g3
1 shopfronts_200.jpg 4395.0 6020.0 g4
2 shopfronts_25092017_eateries_98.jpg 414.0 621.0 others
3 shopfronts_101.jpg 480.0 640.0 others
4 shopfronts_138.jpg 3733.0 8498.0 g4
5 shopfronts_25092017_eateries_95.jpg 187.0 250.0 g1
6 shopfronts_25092017_neon_33.jpg 100.0 200.0 g1
7 shopfronts_322.jpg 682.0 1024.0 others
8 shopfronts_171.jpg 800.0 600.0 g3
9 shopfronts_23092017_3_35.jpg 120.0 210.0 g1
where
g1: <= 400x300
g2: (400x300, 640x480]
g3: (640x480, 800x600]
g4: > 800x600
others: records that do not meet the requirement (e.g. records 7, 2, 3: either height or width falls in one of the defined categories, but not both in the same one)
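I would like to get frequency counts using the group column. If this is not the best way to approach the problem and there is a better one, please let me know.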
Best answer
You can apply pd.cut twice, once per column, and keep the label only where the two results agree, i.e.
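import numpy as np
import pandas as pd

# Bin height and width separately; pd.cut intervals are closed on the right by default.
bins = [0, 400, 640, 800, np.inf]
df['group'] = pd.cut(df['height'].values, bins, labels=["g1", "g2", "g3", "g4"])
nbin = [0, 300, 480, 600, np.inf]
t = pd.cut(df['width'].values, nbin, labels=["g1", "g2", "g3", "g4"])
# Keep the label where both bins agree; otherwise mark the row as 'others'.
df['group'] = np.where(df['group'] == t, df['group'], 'others')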
filename height width group
0 shopfronts_23092017_3_285.jpg 750.0 560.0 g3
1 shopfronts_200.jpg 4395.0 6020.0 g4
2 shopfronts_25092017_eateries_98.jpg 414.0 621.0 others
3 shopfronts_101.jpg 480.0 640.0 others
4 shopfronts_138.jpg 3733.0 8498.0 g4
5 shopfronts_25092017_eateries_95.jpg 187.0 250.0 g1
6 shopfronts_25092017_neon_33.jpg 100.0 200.0 g1
7 shopfronts_322.jpg 682.0 1024.0 others
8 shopfronts_171.jpg 800.0 600.0 g3
9 shopfronts_23092017_3_35.jpg 120.0 210.0 g1
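The question also asks for frequency counts per group. Here is a minimal self-contained sketch of one way to get them, assuming the sample data from the question (the filename column is left out for brevity). Converting the binned columns to object dtype before comparing is my own choice, not part of the original answer; it is meant to make rows that fall outside every bin (NaN) end up in "others". The variable names `labels`, `h`, `w` and `counts` are illustrative only.

```python
import numpy as np
import pandas as pd

# Height/width values copied from the question; filenames omitted for brevity.
df = pd.DataFrame({
    "height": [750.0, 4395.0, 414.0, 480.0, 3733.0, 187.0, 100.0, 682.0, 800.0, 120.0],
    "width": [560.0, 6020.0, 621.0, 640.0, 8498.0, 250.0, 200.0, 1024.0, 600.0, 210.0],
})

labels = ["g1", "g2", "g3", "g4"]

# Bin each dimension on its own; intervals are right-closed, e.g. (640, 800].
# astype(object) turns the categorical result into plain labels (NaN if unbinned).
h = pd.cut(df["height"], [0, 400, 640, 800, np.inf], labels=labels).astype(object)
w = pd.cut(df["width"], [0, 300, 480, 600, np.inf], labels=labels).astype(object)

# A row keeps its label only when both dimensions land in the same group.
# NaN never compares equal, so unbinned rows fall through to "others".
df["group"] = np.where(h == w, h, "others")

# Frequency counts; reindex so that empty groups show up with a count of 0.
counts = df["group"].value_counts().reindex(labels + ["others"], fill_value=0)
print(counts)
```

With the sample data this should report 3 rows in g1, none in g2, 2 each in g3 and g4, and 3 in others. Drop the `reindex` call if empty groups do not need to be listed.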
Source: a similar question on Stack Overflow about binning based on 2 columns in Pandas: https://stackoverflow.com/questions/46472809/