python - Create new rows based on the max row of a column

Tags: python pandas dataframe

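I'm trying to create new data in a time series based on past data. For example, I have player data here, where each row holds the stats accumulated at a given age. I want to create new rows in the DataFrame in which I increase the max age by one and then take the average of the sa and ga columns over the previous two years.

Here is the data: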

import pandas as pd

data = [['Adam Wilcox', 8476330, 25, 14.0, 0.0],
        ['Adin Hill', 8478499, 21, 129.0, 14.0],
        ['Adin Hill', 8478499, 22, 322.0, 32.0],
        ['Adin Hill', 8478499, 23, 343.0, 28.0],
        ['Adin Hill', 8478499, 24, 530.0, 46.0],
        ['Adin Hill', 8478499, 25, 237.0, 26.0],
        ['Al Montoya', 8471219, 24, 120.0, 9.0],
        ['Al Montoya', 8471219, 26, 585.0, 46.0],
        ['Al Montoya', 8471219, 27, 832.0, 89.0],
        ['Al Montoya', 8471219, 28, 168.0, 17.0]]

model_df = pd.DataFrame(data, 
                         columns=['player', 'player_id', 'season_age', 'sa', 'ga'])

For example, what I'd like to create is ['Al Montoya', 8471219, 29, 500, 53] (keep in mind that the last two values are the averages of the sa and ga columns for ages 28 and 27).
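To make the arithmetic concrete, here is a minimal sketch that reproduces just that one row. It assumes "the previous two years" means the player's two most recent recorded seasons; the names montoya, last_two, next_age and averages are only illustrative.

```python
import pandas as pd

# Al Montoya's rows only, copied from the data above.
montoya = pd.DataFrame(
    [[24, 120.0, 9.0], [26, 585.0, 46.0], [27, 832.0, 89.0], [28, 168.0, 17.0]],
    columns=['season_age', 'sa', 'ga'],
)

# Two most recent seasons (ages 27 and 28), then the column-wise mean.
last_two = montoya.sort_values('season_age').tail(2)
next_age = int(last_two['season_age'].max()) + 1
averages = last_two[['sa', 'ga']].mean()

print(next_age, averages['sa'], averages['ga'])  # 29 500.0 53.0
```

So sa is (832 + 168) / 2 = 500 and ga is (89 + 17) / 2 = 53.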

I've done this using iterrows, building a new DataFrame and appending it like this:

max_ages = model_df.groupby(['player', 'player_id'])[['season_age']].max().reset_index()
added_ages = []
for _, player in max_ages.iterrows():
    # Rows for this player at the max age and at the age one year before it.
    # (Offsets of -1 and -2 would skip the most recent season.)
    is_player = model_df['player_id'] == player['player_id']
    latest = model_df[is_player & (model_df['season_age'] == player['season_age'])]
    previous = model_df[is_player & (model_df['season_age'] == player['season_age'] - 1)]

    row = [player['player'],
           player['player_id'],
           player['season_age'] + 1,
           (latest['sa'].sum() + previous['sa'].sum()) / 2,
           (latest['ga'].sum() + previous['ga'].sum()) / 2]
    added_ages.append(row)

added_ages_df = pd.DataFrame(added_ages,
                             columns=['player', 'player_id', 'season_age', 'sa', 'ga'])
model_df = pd.concat([model_df, added_ages_df])

Obviously this is a very brittle, ad hoc solution. My question: is there a built-in way in pandas to do this without using iterrows?

The expected DataFrame is easier to show in list form:

data = [['Adam Wilcox', 8476330, 25, 14.0, 0.0],
        ['Adin Hill', 8478499, 21, 129.0, 14.0],
        ['Adin Hill', 8478499, 22, 322.0, 32.0],
        ['Adin Hill', 8478499, 23, 343.0, 28.0],
        ['Adin Hill', 8478499, 24, 530.0, 46.0],
        ['Adin Hill', 8478499, 25, 237.0, 26.0],
        ['Adin Hill', 8478499, 26, 383.5, 36.0],
        ['Al Montoya', 8471219, 24, 120.0, 9.0],
        ['Al Montoya', 8471219, 26, 585.0, 46.0],
        ['Al Montoya', 8471219, 27, 832.0, 89.0],
        ['Al Montoya', 8471219, 28, 168.0, 17.0],
        ['Al Montoya', 8471219, 29, 500, 53]]

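Best answer

You can define a function called add_row and pass it to groupby. I'm assuming that if a player doesn't have two years of data, you want the sa and ga columns filled with NaN: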

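def add_row(x):
    # Assumes each player's rows are already sorted by season_age.
    # Copy the last row as a one-row DataFrame so the original is not modified.
    last_row = x.iloc[[-1]].copy()
    last_row['season_age'] = last_row['season_age'] + 1
    if len(x) < 2:
        last_row['sa'], last_row['ga'] = float("nan"), float("nan")
    else:
        last_row['sa'], last_row['ga'] = x[['sa', 'ga']].iloc[-2:].mean()
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
    return pd.concat([x, last_row])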

new_model_df = model_df.groupby("player").apply(add_row).reset_index(drop=True)

Output:

>>> new_model_df
         player  player_id  season_age     sa    ga
0   Adam Wilcox    8476330          25   14.0   0.0
1   Adam Wilcox    8476330          26    NaN   NaN
2     Adin Hill    8478499          21  129.0  14.0
3     Adin Hill    8478499          22  322.0  32.0
4     Adin Hill    8478499          23  343.0  28.0
5     Adin Hill    8478499          24  530.0  46.0
6     Adin Hill    8478499          25  237.0  26.0
7     Adin Hill    8478499          26  383.5  36.0
8    Al Montoya    8471219          24  120.0   9.0
9    Al Montoya    8471219          26  585.0  46.0
10   Al Montoya    8471219          27  832.0  89.0
11   Al Montoya    8471219          28  168.0  17.0
12   Al Montoya    8471219          29  500.0  53.0
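As an alternative to groupby.apply, the same rows can be built with ordinary grouped aggregations. The following is only a sketch under two assumptions: "the previous two years" means each player's two most recent rows (Al Montoya has no age-25 season, so strictly consecutive ages are not required), and players with fewer than two seasons get NaN, as in add_row above. The names ordered, last_two, new_rows, n_seasons and result are introduced here for illustration.

```python
import pandas as pd

data = [['Adam Wilcox', 8476330, 25, 14.0, 0.0],
        ['Adin Hill', 8478499, 21, 129.0, 14.0],
        ['Adin Hill', 8478499, 22, 322.0, 32.0],
        ['Adin Hill', 8478499, 23, 343.0, 28.0],
        ['Adin Hill', 8478499, 24, 530.0, 46.0],
        ['Adin Hill', 8478499, 25, 237.0, 26.0],
        ['Al Montoya', 8471219, 24, 120.0, 9.0],
        ['Al Montoya', 8471219, 26, 585.0, 46.0],
        ['Al Montoya', 8471219, 27, 832.0, 89.0],
        ['Al Montoya', 8471219, 28, 168.0, 17.0]]
model_df = pd.DataFrame(
    data, columns=['player', 'player_id', 'season_age', 'sa', 'ga'])

# Assumption: take each player's two most recent rows after sorting by age.
ordered = model_df.sort_values(['player_id', 'season_age'])
last_two = ordered.groupby('player_id').tail(2)

# One new row per player: max age, plus the mean of sa and ga over those rows.
new_rows = last_two.groupby(['player', 'player_id'], as_index=False).agg(
    season_age=('season_age', 'max'),
    sa=('sa', 'mean'),
    ga=('ga', 'mean'),
    n_seasons=('season_age', 'count'),  # helper column, dropped below
)
new_rows['season_age'] += 1

# Players with fewer than two seasons get NaN instead of a one-row "mean".
new_rows.loc[new_rows['n_seasons'] < 2, ['sa', 'ga']] = float('nan')
new_rows = new_rows.drop(columns='n_seasons')

result = (pd.concat([model_df, new_rows])
          .sort_values(['player', 'season_age'])
          .reset_index(drop=True))
print(result)
```

On the sample data this should produce the same 13 rows as the output above. Because it sorts explicitly, it does not rely on the input already being ordered by season_age.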

Regarding python - Create new rows based on the max row of a column, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/70131912/
