Python pandas: group DataFrame columns by name and use them to calculate new columns in an Excel sheet

Tags: python pandas pandas-groupby

My DataFrame, read from dataset1.xlsx, looks like this:

   TimePoint  Object 0  Object 1  Object 2  Object 3  Object 4  Object 0  Object 1  Object 2  Object 3  Object 4
0         10   4642.99   2000.71   4869.52   4023.69   3008.99  11188.15   2181.62  12493.47  10275.15   8787.99
1         20   4640.09   2005.17   4851.07   4039.73   3007.16  11129.38   2172.37  12438.31  10218.92   8723.45

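Problem: several header columns in the data share the same name. I need to group them by name to find each occurrence, and then set the IDA and IAA values for each object from those occurrences.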
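The duplicated headers do not reach the DataFrame unchanged. When pandas reads a header row that repeats a name, it renames the second occurrence with a `.1` suffix. That is why the script below strips everything after the first `.` before grouping. The sketch below uses `read_csv` on a hypothetical two-object CSV string so that it runs without an Excel engine. It assumes `read_excel` renames duplicates the same way.

```python
import io

import pandas as pd

# Hypothetical CSV with the same kind of duplicated headers as the sheet.
csv_text = (
    "TimePoint,Object 0,Object 1,Object 0,Object 1\n"
    "10,4642.99,2000.71,11188.15,2181.62\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# pandas de-duplicates repeated headers by appending ".1", ".2", ...
print(list(df.columns))
# ['TimePoint', 'Object 0', 'Object 1', 'Object 0.1', 'Object 1.1']

# Stripping the suffix (as the question's script does) recovers the object name,
# but the column labels are then duplicated again.
base_names = df.columns.str.split(".").str[0]
print(list(base_names))
# ['TimePoint', 'Object 0', 'Object 1', 'Object 0', 'Object 1']
```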

From these new values, Fc and EAPP have to be calculated, so the final Excel output should look like this:

TimePoint  Objects   IDA      IAA       Fc (IDA - a * IAA)  EAPP (Fc / (Fc + G * Fc))
10         Object 0  4642.99  11188.15  3300.4120           0.463177
10         Object 1  2000.71  2181.62   1738.9156           0.463177
10         Object 2  4869.52  12493.47  3370.3036           0.463177
10         Object 3  4023.69  10275.15  2790.6720           0.463177
10         Object 4  3008.99  8787.99   1954.4312           0.463177
20         Object 0  4640.09  11129.38  3304.5644           0.463177
20         Object 1  2005.17  2172.37   1744.4856           0.463177
20         Object 2  4851.07  12438.31  3358.4728           0.463177
20         Object 3  4039.73  10218.92  2813.4596           0.463177
20         Object 4  3007.16  8723.45   1960.3460           0.463177
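Working the two formulas through for the first row shows what to expect. Fc depends on both IDA and IAA. EAPP does not depend on the data at all: for any non-zero Fc, Fc / (Fc + G * Fc) reduces to 1 / (1 + G). It is therefore the same for every row, given the constants `a = 0.12` and `G = 1.159` used in the script below.

```python
a = 0.12
G = 1.159

# First row of the expected output: TimePoint 10, Object 0.
ida = 4642.99
iaa = 11188.15

fc = ida - a * iaa
eapp = fc / (fc + G * fc)

print(round(fc, 4))    # 3300.412
print(round(eapp, 6))  # 0.463177

# Fc cancels out, so EAPP is simply 1 / (1 + G) whenever Fc != 0.
print(round(1 / (1 + G), 6))  # 0.463177
```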

I tried to solve this with the following Python script:

import glob

import pandas as pd


def main():
    all_data = pd.DataFrame()
    a = 0.12
    G = 1.159

    for f in glob.glob("data/dataset1.xlsx"):
        df = pd.read_excel(f, 'Sheet1') # , header=[1]
        all_data = all_data.append(df, ignore_index=True, sort=False)

        all_data.columns = all_data.columns.str.split('.').str[0]
        print(all_data)

        object_df = all_data.groupby(all_data.columns, axis=1)
        print(object_df)
        for k in object_df.groups.keys():
            if k != 'TimePoint':
                for row_index, row in object_df.get_group(k).iterrows():
                    print(row)
                    # This logic is not working to group by Object and then apply the Following formula

                # TODO: Calculation for the new added  columns Assumption every time there will be two occurrence of any
                # Object i.e. Object 0...4 in this example but Object count can varies sometime only one Object can
                # appear
                # IDA is the first occurrence value of the Object
                all_data['IDA'] = row[0] # This is NOT correct
                # IAA is the second occurrence value of the Object
                all_data['IAA'] = row[1]

                all_data['Fc'] = all_data.IDA.fillna(0) - (a * all_data.IAA.fillna(0))
                all_data['EAPP'] = all_data.Fc.fillna(0) / (all_data.Fc.fillna(0) + (G * all_data.Fc.fillna(0)))


    # now save the data frame
    writer = pd.ExcelWriter('data/dataset1.xlsx')
    all_data.to_excel(writer, 'Sheet2', index=True)
    writer.save()


if __name__ == '__main__':
    main()

Please let me know how to use groupby in pandas to assign the IDA and IAA values for each object, based on my code above.
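One way to use groupby here is on the column labels rather than on the rows. Transpose the frame so that each original column becomes a row. Group those rows by the base object name, and `cumcount()` then numbers the occurrences 0, 1, ... within each name. The sketch below assumes that occurrence 0 is IDA and occurrence 1 is IAA. It also assumes that the second header has pandas' `.1` suffix and that object names contain no `.` of their own. The two-object `raw` frame is hypothetical stand-in data. Occurrences beyond the second are dropped, and an object with no second occurrence gets NaN for IAA.

```python
import pandas as pd

# Hypothetical stand-in for the sheet (two objects for brevity); the second
# occurrence of each header carries pandas' ".1" suffix.
raw = pd.DataFrame(
    [[10, 4642.99, 2000.71, 11188.15, 2181.62],
     [20, 4640.09, 2005.17, 11129.38, 2172.37]],
    columns=["TimePoint", "Object 0", "Object 1", "Object 0.1", "Object 1.1"],
)
a = 0.12
G = 1.159

# One row per original column, one column per TimePoint.
wide = raw.set_index("TimePoint").T
# Base object name, as in the question's str.split('.') step.
objects = wide.index.str.split(".").str[0]
# groupby + cumcount numbers each name's occurrences: 0 for the first, 1 for the second.
occurrence = wide.groupby(objects).cumcount()
# Assumption: first occurrence -> IDA, second -> IAA (any others become NaN and are dropped).
roles = occurrence.map({0: "IDA", 1: "IAA"})

wide.index = pd.MultiIndex.from_arrays([objects, roles], names=["Objects", "role"])

# Long format: one row per (Objects, role, TimePoint), then pivot roles into columns.
long = wide.reset_index().melt(id_vars=["Objects", "role"], var_name="TimePoint")
result = (
    long.pivot_table(index=["TimePoint", "Objects"], columns="role", values="value")
    .reset_index()[["TimePoint", "Objects", "IDA", "IAA"]]
)
result.columns.name = None

result["Fc"] = result["IDA"] - a * result["IAA"]
result["EAPP"] = result["Fc"] / (result["Fc"] + G * result["Fc"])
print(result)
```

The rows come out sorted by TimePoint and then by object, which matches the layout of the expected output above.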

Best answer

I think melt could help you a lot here:

import pandas as pd

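# read_clipboard splits on whitespace by default, so the copied headers must not
# contain spaces (e.g. Object_0); duplicated headers come back as Object_0.1, ...
df = pd.read_clipboard()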

# How you split the df in two may differ depending on how you read it into memory.
# Here the first 6 columns are TimePoint plus the first occurrence of Object 0..4.
df1 = df[df.columns[:6]]
df2 = df[['TimePoint'] + df.columns.tolist()[6:]]

tdf1 = df1.melt(['TimePoint']).assign(key=lambda d: range(len(d)))
tdf2 = df2.melt(['TimePoint']).assign(key=lambda d: range(len(d))).drop(['TimePoint', 'variable'], axis=1)

df = tdf1.merge(tdf2, on='key', how='left').drop(['key'], axis=1).rename(columns={'value_x': 'IDA', 'value_y': 'IAA'})

a = 0.12
G = 1.159

df['Fc'] = df['IDA'] - a * df['IAA']
df['EAPP'] = df['Fc'].div(df['Fc'] + (G * df['Fc']))
print(df)

   TimePoint  variable      IDA       IAA         Fc      EAPP
0         10  Object_0  4642.99  11188.15  3300.4120  0.463177
1         20  Object_0  4640.09  11129.38  3304.5644  0.463177
2         10  Object_1  2000.71   2181.62  1738.9156  0.463177
3         20  Object_1  2005.17   2172.37  1744.4856  0.463177
4         10  Object_2  4869.52  12493.47  3370.3036  0.463177
5         20  Object_2  4851.07  12438.31  3358.4728  0.463177
6         10  Object_3  4023.69  10275.15  2790.6720  0.463177
7         20  Object_3  4039.73  10218.92  2813.4596  0.463177
8         10  Object_4  3008.99   8787.99  1954.4312  0.463177
9         20  Object_4  3007.16   8723.45  1960.3460  0.463177
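The answer depends on the clipboard, so it is hard to rerun as written. Below is a self-contained sketch of the same melt-and-merge idea on a hypothetical two-object frame built in memory. Its column names are assumed to match what the clipboard read produces. The positional `key` works because `melt` walks the columns in order, so row i of both long frames refers to the same (TimePoint, object) pair. The sketch derives the key from the frame length instead of hard-coding it. `n_objects` is an illustrative parameter, not something from the original answer.

```python
import pandas as pd

# In-memory stand-in for pd.read_clipboard() (hypothetical, two objects only).
df = pd.DataFrame(
    [[10, 4642.99, 2000.71, 11188.15, 2181.62],
     [20, 4640.09, 2005.17, 11129.38, 2172.37]],
    columns=["TimePoint", "Object_0", "Object_1", "Object_0.1", "Object_1.1"],
)
n_objects = 2  # assumption: the first n_objects value columns hold IDA, the rest IAA

df1 = df[df.columns[: 1 + n_objects]]
df2 = df[["TimePoint"] + df.columns.tolist()[1 + n_objects:]]

# Both melts list rows in the same (object, TimePoint) order, so a positional key aligns them.
tdf1 = df1.melt(["TimePoint"]).assign(key=lambda d: range(len(d)))
tdf2 = (
    df2.melt(["TimePoint"])
    .assign(key=lambda d: range(len(d)))
    .drop(["TimePoint", "variable"], axis=1)
)

out = (
    tdf1.merge(tdf2, on="key", how="left")
    .drop(["key"], axis=1)
    .rename(columns={"value_x": "IDA", "value_y": "IAA"})
)

a = 0.12
G = 1.159
out["Fc"] = out["IDA"] - a * out["IAA"]
out["EAPP"] = out["Fc"].div(out["Fc"] + G * out["Fc"])
print(out)
```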

A similar question, "Python pandas to groupby dataframe columns and use them to calculate new columns in Excel sheet", can be found on Stack Overflow: https://stackoverflow.com/questions/59708753/
