python - Is there a faster way to compute a historical ratio on a pandas groupby object, conditioned on a value?

Tags: python pandas pandas-groupby

Each game's winning ratio should be computed from that player's previous games only:
| Player |   Result   |    Winning ratio (historical)     |
| K2000  | Lose       | 0%  # first game, no history      |
| K2000  | Lose       | 0%  # 0 games won out of 1 played |
| K2000  | Win        | 0%  # 0 games won out of 2 played |
| K2000  | Not ranked | 33% # 1 game won out of 3 played  |
| K2000  | Lose       | 25% # and so on                   |
| K2000  | Win        | 20%                               |
| K2000  | Win        | 33%                               |
| Kssis  | Win        | 0%                                |
| Kssis  | Win        | 100%                              |
| Kssis  | Not ranked | 100%                              |
| Kssis  | Lose       | 66%                               |
| Kssis  | Win        | 50%                               |
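To pin down the definition the table encodes, here is a slow pure-Python reference: for each row, wins divided by games over that player's earlier rows only, with 0 when there is no history. This is a sketch under stated assumptions: the DataFrame is rebuilt from the table, rows are assumed to be in game order within each player, and `historical_ratio_reference` is a hypothetical name, not something from the question.

```python
import pandas as pd

# Sample data rebuilt from the table above (rows assumed to be in game order)
df = pd.DataFrame({
    'Player': ['K2000'] * 7 + ['Kssis'] * 5,
    'Result': ['Lose', 'Lose', 'Win', 'Not ranked', 'Lose', 'Win', 'Win',
               'Win', 'Win', 'Not ranked', 'Lose', 'Win'],
})


def historical_ratio_reference(frame):
    """Hypothetical slow reference: wins / games over each player's EARLIER rows."""
    wins, games = {}, {}
    ratios = []
    for player, result in zip(frame['Player'], frame['Result']):
        w, g = wins.get(player, 0), games.get(player, 0)
        ratios.append(w / g if g else 0.0)    # no history yet -> 0
        wins[player] = w + (result == 'Win')  # only 'Win' counts as a success
        games[player] = g + 1                 # 'Lose' and 'Not ranked' still count as played
    return pd.Series(ratios, index=frame.index)


print(historical_ratio_reference(df).round(2).tolist())
# [0.0, 0.0, 0.0, 0.33, 0.25, 0.2, 0.33, 0.0, 1.0, 1.0, 0.67, 0.5]
```

A loop like this is useful only as a correctness check for the vectorized versions; it is exactly the kind of per-row Python work the question wants to avoid.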


# flag each win with 1, and count every game as one contest
df['sucess'] = df.apply(lambda row: 1 if row['Result'] == 'Win' else 0, axis=1)
df['nb_of_contests'] = df.apply(lambda row: 1, axis=1)
# df now gives
| Player |   Result   | Sucess | Nb_of_contests |
| K2000  | Lose       |      0 |              1 |
| K2000  | Lose       |      0 |              1 |
| K2000  | Win        |      1 |              1 |
| K2000  | Not ranked |      0 |              1 |
| K2000  | Lose       |      0 |              1 |
| K2000  | Win        |      1 |              1 |
| K2000  | Win        |      1 |              1 |
| Kssis  | Win        |      1 |              1 |
| Kssis  | Win        |      1 |              1 |
| Kssis  | Not ranked |      0 |              1 |
| Kssis  | Lose       |      0 |              1 |
| Kssis  | Win        |      1 |              1 |
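The two `apply(..., axis=1)` calls above run a Python lambda once per row, which is usually the slow part. A minimal sketch (sample data rebuilt from the tables, an assumption) showing that a vectorized comparison produces the same flag column, and that a constant column needs no `apply` at all:

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['K2000'] * 7 + ['Kssis'] * 5,
    'Result': ['Lose', 'Lose', 'Win', 'Not ranked', 'Lose', 'Win', 'Win',
               'Win', 'Win', 'Not ranked', 'Lose', 'Win'],
})

# Row-wise version from the question: one Python call per row
slow = df.apply(lambda row: 1 if row['Result'] == 'Win' else 0, axis=1)

# Vectorized version: a single element-wise comparison over the whole column
fast = (df['Result'] == 'Win').astype(int)

# Compare values only; the integer dtype of astype(int) can differ by platform
assert slow.tolist() == fast.tolist()

df['sucess'] = fast
df['nb_of_contests'] = 1  # a scalar is broadcast to every row
```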

# then the cumulative sums per player
cumul = df.groupby('Player')[['sucess', 'nb_of_contests']].cumsum()
# cumul gives (Player and Result are shown for reference; cumul itself holds only the two summed columns)
| Player |   Result   | Sucess | Nb_of_contests |
| K2000  | Lose       |      0 |              1 |
| K2000  | Lose       |      0 |              2 |
| K2000  | Win        |      1 |              3 |
| K2000  | Not ranked |      1 |              4 |
| K2000  | Lose       |      1 |              5 |
| K2000  | Win        |      2 |              6 |
| K2000  | Win        |      3 |              7 |
| Kssis  | Win        |      1 |              1 |
| Kssis  | Win        |      2 |              2 |
| Kssis  | Not ranked |      2 |              3 |
| Kssis  | Lose       |      2 |              4 |
| Kssis  | Win        |      3 |              5 |
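A runnable sketch of the cumulative-sum step, assuming the sample data from the tables. It selects the two columns with a list (double brackets); the tuple form `groupby(...)['a', 'b']` was deprecated in pandas 1.0 and is rejected by pandas 2.x, while the list form works in both old and new versions.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['K2000'] * 7 + ['Kssis'] * 5,
    'Result': ['Lose', 'Lose', 'Win', 'Not ranked', 'Lose', 'Win', 'Win',
               'Win', 'Win', 'Not ranked', 'Lose', 'Win'],
})
df['sucess'] = (df['Result'] == 'Win').astype(int)
df['nb_of_contests'] = 1

# Running totals restart for each player; the grouping column is not in the result
cumul = df.groupby('Player')[['sucess', 'nb_of_contests']].cumsum()

print(cumul['sucess'].tolist())
# [0, 0, 1, 1, 1, 2, 3, 1, 2, 2, 2, 3]
print(cumul['nb_of_contests'].tolist())
# [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5]
```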

# then compute the ratio
winning_ratio = cumul['sucess'] / cumul['nb_of_contests']
# finally, shift inside each group: cumul is a plain DataFrame, not a groupby object,
# and winning_ratio has no 'Player' column, so regroup by the aligned df['Player'] column
gb_winning_ratio = winning_ratio.groupby(df['Player'])
winning_ratio_shifted = gb_winning_ratio.shift(1)
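Why the shift has to happen per group: a plain `Series.shift()` ignores player boundaries, so the first game of one player inherits the last ratio of the previous player. A small sketch on the sample data (an assumption) contrasting the two; grouping by `df['Player']` works because that column shares the index of `winning_ratio`.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['K2000'] * 7 + ['Kssis'] * 5,
    'Result': ['Lose', 'Lose', 'Win', 'Not ranked', 'Lose', 'Win', 'Win',
               'Win', 'Win', 'Not ranked', 'Lose', 'Win'],
})
df['sucess'] = (df['Result'] == 'Win').astype(int)
df['nb_of_contests'] = 1
cumul = df.groupby('Player')[['sucess', 'nb_of_contests']].cumsum()
winning_ratio = cumul['sucess'] / cumul['nb_of_contests']

# Plain shift: row 7 (Kssis's first game) receives K2000's final ratio, 3 wins / 7 games
leaky = winning_ratio.shift()

# Per-group shift: each player's first row becomes NaN instead
per_player = winning_ratio.groupby(df['Player']).shift()

print(leaky[7], per_player[7])
```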


Pandas version: 0.23.4, Python version: 3.7.4




ValueError: cannot reindex from a duplicate axis


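This error means the DataFrame index contains duplicate labels, so first replace it with a default unique index:

df = df.reset_index(drop=True)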
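The question does not show how the duplicate labels arose. One common cause, assumed here purely for illustration, is concatenating frames without `ignore_index=True`. A minimal sketch that creates such an index, checks it with `Index.is_unique`, and repairs it with `reset_index(drop=True)`:

```python
import pandas as pd

# Hypothetical: two per-player frames concatenated without ignore_index,
# which leaves the labels 0, 1, 2, 0, 1 in the index
a = pd.DataFrame({'Player': ['K2000'] * 3, 'Result': ['Lose', 'Lose', 'Win']})
b = pd.DataFrame({'Player': ['Kssis'] * 2, 'Result': ['Win', 'Win']})
df = pd.concat([a, b])
print(df.index.is_unique)  # False

# drop=True discards the old labels instead of keeping them as a column
df = df.reset_index(drop=True)
print(df.index.is_unique)  # True
```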


df['sucess'] = (df['Result'] == 'Win').astype(int)
df['nb_of_contests'] = 1

cumul = df.groupby('Player')[['sucess', 'nb_of_contests']].cumsum()
winning_ratio = cumul['sucess'].div(cumul['nb_of_contests'])

winning_ratio_shifted = winning_ratio.groupby(df['Player']).shift().fillna(0)

print (winning_ratio_shifted)
0     0.000000
1     0.000000
2     0.000000
3     0.333333
4     0.250000
5     0.200000
6     0.333333
7     0.000000
8     1.000000
9     1.000000
10    0.666667
11    0.500000
dtype: float64
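The steps of this answer can be collected into one reusable function. This is a sketch, not part of the original answer: `historical_win_ratio` is a hypothetical name, and it assumes the rows are already in chronological order within each player. Because it resets the index, the result is indexed 0..n-1 rather than by the caller's original labels.

```python
import pandas as pd


def historical_win_ratio(df):
    """Hypothetical helper: share of each player's earlier games that were wins.

    Returns 0 for a player's first game. Assumes chronological row order per player.
    """
    df = df.reset_index(drop=True)  # unique labels, so alignment cannot hit duplicates
    helper = df.assign(sucess=(df['Result'] == 'Win').astype(int),
                       nb_of_contests=1)
    cumul = helper.groupby('Player')[['sucess', 'nb_of_contests']].cumsum()
    ratio = cumul['sucess'].div(cumul['nb_of_contests'])
    # shift within each player so a row only sees strictly earlier games
    return ratio.groupby(df['Player']).shift().fillna(0)


df = pd.DataFrame({
    'Player': ['K2000'] * 7 + ['Kssis'] * 5,
    'Result': ['Lose', 'Lose', 'Win', 'Not ranked', 'Lose', 'Win', 'Win',
               'Win', 'Win', 'Not ranked', 'Lose', 'Win'],
})
print(historical_win_ratio(df).round(6).tolist())
```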

Or you can use a one-line solution with DataFrame.assign, chaining cumsum and shift per group:

winning_ratio_shifted = (df.assign(sucess=(df['Result'] == 'Win').astype(int),
                                   nb_of_contests=1)
                           # group_keys=False keeps the original row index
                           .groupby('Player', group_keys=False)[['sucess', 'nb_of_contests']]
                           .apply(lambda x: x.cumsum().shift())
                           .assign(new=lambda x: x['sucess'] / x['nb_of_contests'])['new']
                           .fillna(0))

print (winning_ratio_shifted)
0     0.000000
1     0.000000
2     0.000000
3     0.333333
4     0.250000
5     0.200000
6     0.333333
7     0.000000
8     1.000000
9     1.000000
10    0.666667
11    0.500000
Name: new, dtype: float64
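A variant that is not part of the original answer, offered as an assumption-labeled alternative: the number of earlier games is just the position within the group (`GroupBy.cumcount`), and the earlier wins are the running total minus the current row. This avoids both the helper column of ones and the second groupby with `shift`; whether it is measurably faster on real data would need timing. Sample data and rows in game order are assumed.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['K2000'] * 7 + ['Kssis'] * 5,
    'Result': ['Lose', 'Lose', 'Win', 'Not ranked', 'Lose', 'Win', 'Win',
               'Win', 'Win', 'Not ranked', 'Lose', 'Win'],
})

win = (df['Result'] == 'Win').astype(int)
by_player = win.groupby(df['Player'])

prior_wins = by_player.cumsum() - win         # wins strictly before the current row
prior_games = by_player.cumcount()            # 0, 1, 2, ... = games before the current row
ratio = (prior_wins / prior_games).fillna(0)  # 0/0 on a first game is NaN -> 0
```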
