I'm new to Pandas. I have a dataset of horse racing results. A sample is below:
RaceID RaceDate RaceMeet Position Horse Jockey Trainer RaceLength race win HorseWinPercentage
446252 01/01/2008 Southwell (AW) 1 clear reef tom mclaughlin jane chapple-hyam 3101 1 1 0
447019 14/01/2008 Southwell (AW) 5 clear reef tom mclaughlin jane chapple-hyam 2654 1 0 100
449057 21/02/2008 Southwell (AW) 2 clear reef tom mclaughlin jane chapple-hyam 3101 1 0 50
463805 26/08/2008 Chelmsford (AW) 6 clear reef tom mclaughlin jane chapple-hyam 3080 1 0 33.33333333
469220 27/11/2008 Chelmsford (AW) 3 clear reef tom mclaughlin jane chapple-hyam 3080 1 0 25
470195 11/12/2008 Chelmsford (AW) 5 clear reef tom mclaughlin jane chapple-hyam 3080 1 0 20
471052 26/12/2008 Wolhampton (AW) 1 clear reef andrea atzeni jane chapple-hyam 2690 1 1 16.66666667
471769 07/01/2009 Wolhampton (AW) 6 clear reef ian mongan jane chapple-hyam 2690 1 0 28.57142857
472137 13/01/2009 Chelmsford (AW) 2 clear reef jamie spencer jane chapple-hyam 3080 1 0 25
472213 20/01/2009 Southwell (AW) 5 clear reef jamie spencer jane chapple-hyam 2654 1 0 22.22222222
476595 25/03/2009 Kempton (AW) 4 clear reef pat cosgrave jane chapple-hyam 2639 1 0 20
477674 08/04/2009 Kempton (AW) 5 clear reef pat cosgrave jane chapple-hyam 2639 1 0 18.18181818
479098 21/04/2009 Kempton (AW) 3 clear reef andrea atzeni jane chapple-hyam 2639 1 0 16.66666667
492913 14/11/2009 Wolhampton (AW) 1 clear reef andrea atzeni jane chapple-hyam 3639 1 1 15.38461538
493720 25/11/2009 Kempton (AW) 3 clear reef andrea atzeni jane chapple-hyam 3518 1 0 21.42857143
495863 29/12/2009 Southwell (AW) 1 clear reef shane kelly jane chapple-hyam 3101 1 1 20
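I want to be able to groupby() on multiple columns so that I can count wins and build win percentages or results for specific combinations, such as course and distance.
When I only need to group by a single column, it works fine: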
df['horse_win_count'] = df.groupby(['Horse'])['win'].cumsum()
df['horse_race_count'] = df.groupby(['Horse'])['race'].cumsum()
df['HorseWinPercentage2'] = df['horse_win_count'] / df['horse_race_count'] * 100
df['HorseWinPercentage'] = df.groupby('Horse')['HorseWinPercentage2'].shift(+1)
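However, when I need to group by more than one column, I get some very strange results.
For example, I want a win percentage for when a specific jockey rides a specific trainer's horses: groupby(['Jockey', 'Trainer']). I then need the percentage as it changes with each row (race).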
df['jt_win_count'] = df.groupby(['Jockey','Trainer'])['win'].cumsum()
df['jt_race_count'] = df.groupby(['Jockey','Trainer'])['race'].cumsum()
df['JTWinPercentage2'] = df['jt_win_count'] / df['jt_race_count'] * 100
df['JTWinPercentage'] = df.groupby(['Jockey','Trainer'])['JTWinPercentage2'].shift(+1)
df['JTWinPercentage'].fillna(0, inplace=True)
Or I want to count the number of times a horse has won over a given course and distance, so I need groupby(['Horse', 'RaceMeet', 'RaceLength']):
df['CD'] = df.groupby(['RaceMeet','RaceLength','Horse'])['win'].cumsum()
df['CD'] = df.groupby(["RaceMeet","RaceLength","Horse"]).shift(+1)
I get results ranging from 10 to 1000.
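One thing that stands out in the two CD lines above is the second one: it calls shift() on the whole grouped frame rather than on the CD column, so it returns a DataFrame holding every non-key column instead of a single Series. Below is a minimal sketch that selects the column before shifting. The four rows and the `prior_cd_wins` name are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy data, invented for illustration; rows are assumed to be in date order.
df = pd.DataFrame({
    "Horse": ["clear reef"] * 4,
    "RaceMeet": ["Southwell (AW)", "Southwell (AW)", "Kempton (AW)", "Southwell (AW)"],
    "RaceLength": [3101, 3101, 2639, 3101],
    "win": [1, 0, 1, 1],
})

keys = ["Horse", "RaceMeet", "RaceLength"]

# Running win count within each horse/course/distance group.
df["CD"] = df.groupby(keys)["win"].cumsum()

# Shifting the grouped frame without picking a column returns a DataFrame
# with several columns, which does not fit a single-column assignment.
shifted_frame = df.groupby(keys).shift(1)

# Selecting "CD" first yields one Series: wins in the group before this race.
df["prior_cd_wins"] = df.groupby(keys)["CD"].shift(1).fillna(0).astype(int)
```

The first race in each group has no earlier result, so shift() leaves NaN there; filling with 0 is one reasonable choice, not the only one.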
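How do I group by multiple columns, run the calculation, and shift the result back by one entry within each group?
Could you also explain why my code above doesn't work? As I said, I'm new to Pandas and keen to learn.
Cheers.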
Best answer
This has been asked before: "Pandas DataFrame Groupby two columns and get counts", and here: "python pandas groupby() result".
I'm not really sure what your goal is.
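I think you should first add another column that contains the new key you want to group by. For example: df['jockeyTrainer'] = df['Jockey'] + df['Trainer']
Then you can group by it. Alternatively, you can follow the information in the links.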
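For what it's worth, a combined key column is not strictly required: groupby() also accepts a list of column names, and both routes produce the same groups. The sketch below compares the two and then computes a win percentage that only reflects earlier races. The rows, the " | " separator, and the `jt_pct` / `jt_prior_pct` names are assumptions for illustration, and it counts races with cumcount() rather than the question's `race` column.

```python
import pandas as pd

# Invented rows, assumed to be in date order; not the question's dataset.
df = pd.DataFrame({
    "Jockey": ["tom mclaughlin", "tom mclaughlin", "andrea atzeni",
               "tom mclaughlin", "andrea atzeni"],
    "Trainer": ["jane chapple-hyam"] * 5,
    "win": [1, 0, 1, 0, 1],
})

# Option 1: build one combined key (the separator avoids accidental collisions).
df["jockeyTrainer"] = df["Jockey"] + " | " + df["Trainer"]
wins_by_key = df.groupby("jockeyTrainer")["win"].cumsum()

# Option 2: pass both columns to groupby directly.
pair = df.groupby(["Jockey", "Trainer"])["win"]
wins_by_list = pair.cumsum()

# Races run so far by each pair, counting the current one.
races = pair.cumcount() + 1

# Percentage after each race, then shifted so a row only sees earlier races.
df["jt_pct"] = wins_by_list / races * 100
df["jt_prior_pct"] = df.groupby(["Jockey", "Trainer"])["jt_pct"].shift(1).fillna(0)
```

Both options give identical running counts here; the list form simply avoids the extra column.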
Regarding "python - Pandas - cumsum() on a multi-axis groupby()", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58097508/