python - Pandas groupby: ewm calculation not working as expected

Tags: python pandas pandas-groupby

Suppose I have a DataFrame like the following:

import pandas as pd

data = {'team': ['team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1',
              'team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2',],
     'score': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,1,2,3,4,5,6,7,8,9,10,11,12,13,14],
     'yards': [10,20,30,40,50,60,70,80,90,100,110,120,130,140,10,20,30,40,50,60,70,80,90,100,110,120,130,140]}

df = pd.DataFrame.from_dict(data)

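I am trying to compute the ewm of the 'score' and 'yards' columns using the manual method from this post (Does Pandas calculate ewm wrong?), but I noticed that my span is not being applied as expected within each team group. This is the code I have so far: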

ema_features = df[['team']].copy()

for feature_name in df[['score','yards']]:
    span=10
    feature_ema = (df.groupby('team')[feature_name].rolling(window=span, min_periods=span).mean()[:span])
    rest = df[feature_name][span:]
    x = pd.concat([feature_ema, rest]).ewm(span=span, adjust=False).mean()


    ema_features[feature_name] = x

The output is as follows:

ema_features

    team    score   yards
0   team1   NaN NaN
1   team1   NaN NaN
2   team1   NaN NaN
3   team1   NaN NaN
4   team1   NaN NaN
5   team1   NaN NaN
6   team1   NaN NaN
7   team1   NaN NaN
8   team1   NaN NaN
9   team1   NaN NaN
10  team1   6.500000    65.000000
11  team1   7.500000    75.000000
12  team1   8.500000    85.000000
13  team1   9.500000    95.000000
14  team2   7.954545    79.545455
15  team2   6.871901    68.719008
16  team2   6.167919    61.679189
17  team2   5.773752    57.737518
18  team2   5.633070    56.330696
19  team2   5.699784    56.997843
20  team2   5.936187    59.361871
21  team2   6.311426    63.114258
22  team2   6.800257    68.002575
23  team2   7.382029    73.820289
24  team2   8.039842    80.398418
25  team2   8.759871    87.598706
26  team2   9.530803    95.308032
27  team2   10.343384   103.433844

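My question is: how do I get the span to apply to team 2 as well? In the output above, team 2's ewm is computed together with team 1's. I want each team's ewm to be calculated separately from the other, which means applying the span correctly within each team before the calculation, as in the expected output below.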

   ema_features

        team    score   yards
    0   team1   NaN NaN
    1   team1   NaN NaN
    2   team1   NaN NaN
    3   team1   NaN NaN
    4   team1   NaN NaN
    5   team1   NaN NaN
    6   team1   NaN NaN
    7   team1   NaN NaN
    8   team1   NaN NaN
    9   team1   NaN NaN
    10  team1   6.500000    65.000000
    11  team1   7.500000    75.000000
    12  team1   8.500000    85.000000
    13  team1   9.500000    95.000000
    14  team2   NaN NaN
    15  team2   NaN NaN
    16  team2   NaN NaN
    17  team2   NaN NaN
    18  team2   NaN NaN
    19  team2   NaN NaN
    20  team2   NaN NaN
    21  team2   NaN NaN
    22  team2   NaN NaN
    23  team2   6.500000    65.000000
    24  team2   7.500000    75.000000
    25  team2   8.500000    85.000000
    26  team2   9.500000    95.000000

Best answer

You can try GroupBy.apply with a custom function. Adapting your for loop, try something like this:

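def team_ema(team, span=10):
    # Simple moving average over the first `span` rows seeds the EMA
    feature_ema = team.rolling(window=span, min_periods=span).mean()[:span]
    # The remaining rows of this team are kept as raw values
    rest = team[span:]
    return pd.concat([feature_ema, rest]).ewm(span=span, adjust=False).mean()

# Select the numeric columns so the 'team' strings are not passed to rolling()
df.groupby('team')[['score', 'yards']].apply(team_ema)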
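To attach the per-team result back to the original rows, here is a minimal, self-contained sketch. It assumes a recent pandas version. The `group_keys=False` argument, the explicit `[['score', 'yards']]` column selection, the `.iloc` slicing, and the names `seed` and `ema` are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Same shape as the question's data: two teams, 14 rows each
df = pd.DataFrame({
    "team": ["team1"] * 14 + ["team2"] * 14,
    "score": list(range(1, 15)) * 2,
    "yards": list(range(10, 150, 10)) * 2,
})

def team_ema(team, span=10):
    # SMA over the first `span` rows of this team seeds the EMA;
    # the rows before the seed stay NaN
    seed = team.rolling(window=span, min_periods=span).mean().iloc[:span]
    # Remaining rows of this team only (positional slice within the group)
    rest = team.iloc[span:]
    return pd.concat([seed, rest]).ewm(span=span, adjust=False).mean()

# group_keys=False keeps the original row index, so the result aligns with df
ema = df.groupby("team", group_keys=False)[["score", "yards"]].apply(team_ema)

# Join on the row index to get team, score and yards side by side
ema_features = df[["team"]].join(ema)
print(ema_features)
```

Because the slicing happens inside each group, team 2 gets its own NaN warm-up and its own SMA seed instead of continuing from team 1's values. With this data, the first non-NaN value for each team is the SMA seed on that team's tenth row.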

Regarding python - Pandas groupby: ewm calculation not working as expected, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52459397/
