从 dataset1.xlsx 收集的我的 DataFrame 如下所示:
TimePoint Object 0 Object 1 Object 2 Object 3 Object 4 Object 0 Object 1 Object 2 Object 3 Object 4
0 10 4642.99 2000.71 4869.52 4023.69 3008.99 11188.15 2181.62 12493.47 10275.15 8787.99
1 20 4640.09 2005.17 4851.07 4039.73 3007.16 11129.38 2172.37 12438.31 10218.92 8723.45
问题: 数据包含具有重复名称的标题列,需要聚合它们以查找出现的情况,然后初始化每个对象的 IDA 和 IAA 值。
根据这些新值需要计算Fc和EAPP值。因此,最终的 Excel 输出应如下所示:
TimePoint Objects IDA IAA Fc (using IDA- (a * IAA)) EAPP (using Fc/ (Fc + (G *Fc)))
10 Object 0 4642.99 11188.15 3300.412 0.463177397
10 Object 1 2000.71 2181.62 -527.78758 1
10 Object 2 4869.52 12493.47 4869.52 1
10 Object 3 4023.69 10275.15 4023.69 1
10 Object 4 3008.99 8787.99 3008.99 1
20 Object 0 4640.09 11129.38 4640.09 1
20 Object 1 2005.17 2172.37 2005.17 1
20 Object 2 4851.07 12438.31 4851.07 1
20 Object 3 4039.73 10218.92 4039.73 1
20 Object 4 3007.16 8723.45 3007.16 1
我尝试使用以下 python 脚本解决这个问题:
def main():
all_data = pd.DataFrame()
a = 0.12
G = 1.159
for f in glob.glob("data/dataset1.xlsx"):
df = pd.read_excel(f, 'Sheet1') # , header=[1]
all_data = all_data.append(df, ignore_index=True, sort=False)
all_data.columns = all_data.columns.str.split('.').str[0]
print(all_data)
object_df = all_data.groupby(all_data.columns, axis=1)
print(object_df)
for k in object_df.groups.keys():
if k != 'TimePoint':
for row_index, row in object_df.get_group(k).iterrows():
print(row)
# This logic is not working to group by Object and then apply the Following formula
# TODO: Calculation for the new added columns Assumption every time there will be two occurrence of any
# Object i.e. Object 0...4 in this example but Object count can varies sometime only one Object can
# appear
# IDA is the first occurrence value of the Object
all_data['IDA'] = row[0] # This is NOT correct
# IAA is the second occurrence value of the Object
all_data['IAA'] = row[1]
all_data['Fc'] = all_data.IDA.fillna(0) - (a * all_data.IAA.fillna(0))
all_data['EAPP'] = all_data.Fc.fillna(0) / (all_data.Fc.fillna(0) + (G * all_data.Fc.fillna(0)))
# now save the data frame
writer = pd.ExcelWriter('data/dataset1.xlsx')
all_data.to_excel(writer, 'Sheet2', index=True)
writer.save()
if __name__ == '__main__':
main()
请让我知道如何在 pandas 中使用 groupby 为每个对象分配 IDA 和 IAA 值,引用我上面的代码。
最佳答案
我认为melt可能对你有很大帮助
import pandas as pd
df = pd.read_clipboard()
# This part of breaking the df into 2 might be different based on how your reading the dataframe into memory
df1 = df[df.columns[:6]]
df2 = df[['TimePoint'] + df.columns.tolist()[6:]]
tdf1 = df1.melt(['TimePoint']).assign(key=range(10))
tdf2 = df2.melt(['TimePoint']).assign(key=range(10)).drop(['TimePoint', 'variable'], axis=1)
df = tdf1.merge(tdf2, on='key', how='left').drop(['key'], axis=1).rename(columns={'value_x': 'IDA', 'value_y': 'IAA'})
a = 0.12
G = 1.159
df['Fc'] = df['IDA'] - a * df['IAA']
df['EAPP'] = df['Fc'].div(df['Fc']+(G*df['Fc']))
TimePoint variable IDA IAA Fc EAPP
0 10 Object_0 4642.99 11188.15 3300.4120 0.463177
1 20 Object_0 4640.09 11129.38 3304.5644 0.463177
2 10 Object_1 2000.71 2181.62 1738.9156 0.463177
3 20 Object_1 2005.17 2172.37 1744.4856 0.463177
4 10 Object_2 4869.52 12493.47 3370.3036 0.463177
5 20 Object_2 4851.07 12438.31 3358.4728 0.463177
6 10 Object_3 4023.69 10275.15 2790.6720 0.463177
7 20 Object_3 4039.73 10218.92 2813.4596 0.463177
8 10 Object_4 3008.99 8787.99 1954.4312 0.463177
9 20 Object_4 3007.16 8723.45 1960.3460 0.463177
关于Python pandas 到 groupby 数据框列并使用它们计算 Excel 工作表中的新列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59708753/