I am trying to sort the unique values in a pandas DataFrame using group by:
import pandas as pd

df = pd.DataFrame({
    'gr1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'gr1_sum': [100, 100, 100, 100, 200, 200, 200, 200],
    'rank_gr1': [2, 2, 2, 2, 1, 1, 1, 1],
    'gr2': ['a1', 'a1', 'a2', 'a2', 'b1', 'b1', 'b2', 'b2'],
    'gr2_sum': [30, 30, 40, 40, 20, 20, 10, 10]})
#df.sort_values(by=['col2'],inplace = True)
rank_gr1_sort = pd.unique(df['rank_gr1'].values)
rank_gr2_sort = df.sort_values(['rank_gr1']).groupby(['gr1','gr2'])['gr2_sum'].unique()
rank_gr1_sort
array([2, 1], dtype=int64)
rank_gr2_sort
gr1  gr2
A    a1     [30]
     a2     [40]
B    b1     [20]
     b2     [10]
Name: gr2_sum, dtype: object
What I need is this:
gr1  gr2
B    b1     [20]
     b2     [10]
A    a1     [30]
     a2     [40]
Name: gr2_sum, dtype: object
How can I achieve this output?
Thanks!
pandas groupby sort within groups
Pandas Number of Unique Values and sort by the number of unique
Best answer
Pass sort=False to groupby.
From the documentation:
sort : bool, default True Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
rank_gr2_sort = df.sort_values(['rank_gr1']).groupby(
['gr1','gr2'],sort=False)['gr2_sum'].unique()
Output:
gr1  gr2
B    b1     [20]
     b2     [10]
A    a1     [30]
     a2     [40]
Name: gr2_sum, dtype: object
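To see the effect in isolation, here is a minimal sketch comparing the group-key order with and without sort=False, using a trimmed copy of the question's frame. The kind='mergesort' argument is my own addition, not part of the answer above: it requests a stable sort so that rows tied on rank_gr1 keep their original relative order. The names ordered, default_keys and unsorted_keys are illustrative only.

```python
import pandas as pd

# Trimmed copy of the question's data (gr1_sum is left out, it is not needed here).
df = pd.DataFrame({
    'gr1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'rank_gr1': [2, 2, 2, 2, 1, 1, 1, 1],
    'gr2': ['a1', 'a1', 'a2', 'a2', 'b1', 'b1', 'b2', 'b2'],
    'gr2_sum': [30, 30, 40, 40, 20, 20, 10, 10],
})

# Assumption: a stable sort, so rows with equal rank_gr1 stay in their original order.
ordered = df.sort_values('rank_gr1', kind='mergesort')

# Default (sort=True): group keys are sorted, so the A groups come first again.
default_keys = list(ordered.groupby(['gr1', 'gr2'])['gr2_sum'].unique().index)

# sort=False: group keys appear in the order they are first seen in `ordered`.
unsorted_keys = list(
    ordered.groupby(['gr1', 'gr2'], sort=False)['gr2_sum'].unique().index
)

print(default_keys)
print(unsorted_keys)
```

With sort=False the result follows the row order produced by sort_values, which is why the sort has to happen before the groupby.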
Regarding "python - Sort unique values based on another column in pandas", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59522655/