I am working on rankings in domestic football leagues, and I have the following DataFrame:
import pandas as pd

df = pd.DataFrame()
df['Season'] = ['1314','1314','1314','1314','1314','1314','1314','1314','1314','1415','1415','1415','1415','1415','1415','1415','1415','1415']
df['Team'] = ['A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B','C']
df['GW'] = [1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3]
df['Position'] = [1,2,3,3,1,2,2,3,1,2,1,3,2,1,3,3,2,1]
# Sort by GW as well, so each team's rows are in game-week order before diffing
df = df.sort_values(['Season','Team','GW'])
# Change from the previous GW (negative = moved up the table); 0 for the first GW
df['Position_Change'] = df.groupby(['Season','Team'])['Position'].diff().fillna(0)
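The code above tracks each team's position and its change from one GW to the next. Now I want to assign a status to the team that finishes the season as champion: that team should get the status Champion in every GW. The other teams should get a status based on the position they finished in at the end of the last GW of the season (in this example, the last GW is 3). My expected output is as follows: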
Here is the original dataset: Click to download the dataset
Any advice would be greatly appreciated. Thanks,
泽普
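As a rough sketch of the requirement, assuming that the positions recorded in each season's highest GW are the final standings, the season-end status can be looked up from those final-GW rows and attached to every row of the same team. The label mapping (Champion/Second/Third for a hypothetical 3-team league) is an assumption for illustration, not something the question specifies beyond the example.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    'Season': ['1314'] * 9 + ['1415'] * 9,
    'Team': ['A', 'B', 'C'] * 6,
    'GW': [1, 1, 1, 2, 2, 2, 3, 3, 3] * 2,
    'Position': [1, 2, 3, 3, 1, 2, 2, 3, 1, 2, 1, 3, 2, 1, 3, 3, 2, 1],
})

# Last GW of each season, broadcast back to every row of that season
last_gw = df.groupby('Season')['GW'].transform('max')

# Keep only the rows from each season's final GW (the final standings)
final = df.loc[df['GW'] == last_gw, ['Season', 'Team', 'Position']]

# Hypothetical label table for a 3-team league (assumption for this example)
labels = {1: 'Champion', 2: 'Second', 3: 'Third'}
final = final.assign(Status=final['Position'].map(labels)).drop(columns='Position')

# Attach the season-end status to every GW row of that team
out = df.merge(final, on=['Season', 'Team'], how='left')
print(out)
```

Unlike taking each group's last row, filtering on the season's maximum GW only uses positions from the final GW itself, so a team with no row in that GW would get NaN instead of a status from an earlier week.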
Best answer
IIUC, you need something like this:
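import numpy as np
# Label each (Season, Team) by its position in the last GW (rows are sorted by GW)
df1 = df.groupby(['Season','Team'])['Position'].apply(lambda x: np.select([x.iloc[-1] == 1, x.iloc[-1] == 2, x.iloc[-1] == 3], ['Champion','Second','Third'], default='Other').item()).reset_index().rename(columns={'Position':'Status'})
print(df.merge(df1, on=['Team','Season']))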
    Season  Team  GW  Position  Position_Change    Status
0     1314     A   1         1              0.0    Second
1     1314     A   2         3              2.0    Second
2     1314     A   3         2             -1.0    Second
3     1314     B   1         2              0.0     Third
4     1314     B   2         1             -1.0     Third
5     1314     B   3         3              2.0     Third
6     1314     C   1         3              0.0  Champion
7     1314     C   2         2             -1.0  Champion
8     1314     C   3         1             -1.0  Champion
9     1415     A   1         2              0.0     Third
10    1415     A   2         2              0.0     Third
11    1415     A   3         3              1.0     Third
12    1415     B   1         1              0.0    Second
13    1415     B   2         1              0.0    Second
14    1415     B   3         2              1.0    Second
15    1415     C   1         3              0.0  Champion
16    1415     C   2         3              0.0  Champion
17    1415     C   3         1             -2.0  Champion
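A possible alternative to groupby().apply() followed by merge is to broadcast each team's final position to all of its rows with transform('last') and then map it to a label. This is a sketch, assuming the same 3-team labels as the answer and rows sorted by GW within each team, so that the last value in each group is the final-GW position.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Season': ['1314'] * 9 + ['1415'] * 9,
    'Team': ['A', 'B', 'C'] * 6,
    'GW': [1, 1, 1, 2, 2, 2, 3, 3, 3] * 2,
    'Position': [1, 2, 3, 3, 1, 2, 2, 3, 1, 2, 1, 3, 2, 1, 3, 3, 2, 1],
})

# Within each (Season, Team) group, rows now run in GW order
df = df.sort_values(['Season', 'Team', 'GW']).reset_index(drop=True)

# Last position of each group, repeated on every row of that group
final_pos = df.groupby(['Season', 'Team'])['Position'].transform('last')

# Map final positions to labels; positions missing from the dict become NaN
df['Status'] = final_pos.map({1: 'Champion', 2: 'Second', 3: 'Third'})
print(df)
```

Because transform returns a Series aligned to the original index, the status can be assigned directly as a new column without a merge.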
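Following the discussion in chat, replace the df1 line in the code above with the version below, which maps the final position to league outcomes (Champion, UCL for the Champions League places, UEL for the Europa League places, Other, Relegation):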
# Status by final position: 1 = Champion, 2-4 = UCL, 5-6 = UEL, 7-17 = Other, 18+ = Relegation
df1 = df.groupby(['Season','Team'])['Position'].apply(lambda x: np.select([x.iloc[-1] == 1, 2 <= x.iloc[-1] <= 4, 5 <= x.iloc[-1] <= 6, 7 <= x.iloc[-1] <= 17, x.iloc[-1] > 17], ['Champion','UCL','UEL','Other','Relegation'], default='Other').item()).reset_index().rename(columns={'Position':'Status'})
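The same position ranges can also be expressed as bins with pd.cut, which keeps the thresholds in one list instead of a chain of conditions. This sketch assumes the ranges from the chat version (1, 2-4, 5-6, 7-17, above 17) and reuses the transform('last') idea to get each team's final position; on the 3-team sample, positions 2 and 3 both fall in the UCL bin.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Season': ['1314'] * 9 + ['1415'] * 9,
    'Team': ['A', 'B', 'C'] * 6,
    'GW': [1, 1, 1, 2, 2, 2, 3, 3, 3] * 2,
    'Position': [1, 2, 3, 3, 1, 2, 2, 3, 1, 2, 1, 3, 2, 1, 3, 3, 2, 1],
})
df = df.sort_values(['Season', 'Team', 'GW']).reset_index(drop=True)

# Final (last-GW) position of each team, repeated on all of its rows
final_pos = df.groupby(['Season', 'Team'])['Position'].transform('last')

# Right-closed bins: (0,1] (1,4] (4,6] (6,17] (17,inf]
bins = [0, 1, 4, 6, 17, float('inf')]
labels = ['Champion', 'UCL', 'UEL', 'Other', 'Relegation']
df['Status'] = pd.cut(final_pos, bins=bins, labels=labels)
print(df)
```

pd.cut includes each bin's upper edge by default (right=True), which is why the edges are 1, 4, 6 and 17 rather than the lower bounds of the ranges. The resulting column is categorical; call .astype(str) if plain strings are needed.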
Regarding python - pandas final ranking status, see the similar question on Stack Overflow: https://stackoverflow.com/questions/54255230/