I have a spreadsheet containing data in the following format:
Brand | Model | Year | Cost | Tax
--------------------------------------
Apple | iPhone 7 | 2017 | $1000 | $100
Apple | iphone 7 | 2018 | $800 | $80
Xiomi | Note 5 | 2017 | $300 | $30
Xiomi | Note 5 | 2018 | $200 | $20
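I want to transform the dataset above into the one below, which adds a Mean column showing the mean of the Cost column when rows are grouped by ['Brand', 'Model'], and a Result column that is the sum of the Mean and Tax column values: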
Brand | Model | Year | Cost | Mean | Tax | Result
------------------------------------------------------------
Apple | iPhone 7 | 2017 | $1000 | $900 | $100 | $1000
Apple | iphone 7 | 2018 | $800 | $900 | $80 | $980
Xiomi | Note 5 | 2017 | $300 | $250 | $30 | $280
Xiomi | Note 5 | 2018 | $200 | $250 | $20 | $270
I have been trying to use the groupby function, but I can't get the desired result shown above.
Looking forward to your reply. Thanks.
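Best answer
First use replace to strip the dollar signs and convert the values to integers, get the per-group mean with transform, then sum, and finally convert back to strings if necessary: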
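cols = ['Cost', 'Tax']
# Strip the leading '$' and convert both columns to integers
df[cols] = df[cols].replace(r'\$', '', regex=True).astype(int)
# transform('mean') broadcasts each group's mean back onto its rows
df['Mean'] = df.groupby(['Brand', 'Model'])['Cost'].transform('mean')
# Row-wise sum of Mean and Tax
df['Result'] = df[['Mean', 'Tax']].sum(axis=1)
print(df)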
   Brand     Model  Year  Cost  Tax  Mean  Result
0  Apple  iPhone 7  2017  1000  100  1000    1100
1  Apple  iphone 7  2018   800   80   800     880
2  Xiomi    Note 5  2017   300   30   250     280
3  Xiomi    Note 5  2018   200   20   250     270
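Note that the two Apple rows get different means here (1000 and 800) because groupby compares keys case-sensitively, so 'iPhone 7' and 'iphone 7' end up in separate groups. If they are meant to be the same model, as the question's expected Mean of $900 suggests, one option is to group on a case-normalized key. A minimal sketch, assuming the question's sample data with Cost and Tax already converted to integers; the DataFrame construction and the name model_key are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample data from the question, with Cost and Tax already as integers.
df = pd.DataFrame({
    'Brand': ['Apple', 'Apple', 'Xiomi', 'Xiomi'],
    'Model': ['iPhone 7', 'iphone 7', 'Note 5', 'Note 5'],
    'Year': [2017, 2018, 2017, 2018],
    'Cost': [1000, 800, 300, 200],
    'Tax': [100, 80, 30, 20],
})

# Hypothetical helper key: lower-cased Model, so 'iPhone 7' and 'iphone 7' match.
# The Model column itself is left unchanged.
model_key = df['Model'].str.lower().rename('model_key')

# groupby accepts a list mixing column labels and Series as grouping keys.
df['Mean'] = df.groupby(['Brand', model_key])['Cost'].transform('mean')
df['Result'] = df['Mean'] + df['Tax']
print(df)
```

With this key both Apple rows share one group, so their Mean is the average of 1000 and 800.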
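Then convert the numeric columns back to strings with a '$' prefix:
cols1 = cols + ['Result', 'Mean']
# Prepend '$' to the string form of each value
df[cols1] = '$' + df[cols1].astype(str)
print(df)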
   Brand     Model  Year   Cost   Tax   Mean  Result
0  Apple  iPhone 7  2017  $1000  $100  $1000   $1100
1  Apple  iphone 7  2018   $800   $80   $800    $880
2  Xiomi    Note 5  2017   $300   $30   $250    $280
3  Xiomi    Note 5  2018   $200   $20   $250    $270
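One caveat: in current pandas releases, transform('mean') generally returns a float column even when every mean is a whole number, so the string conversion may produce values such as $1000.0 instead of $1000. A minimal end-to-end sketch that casts back to int before formatting, assuming the question's sample data and that all means are whole numbers (the cast would otherwise truncate); the DataFrame construction and the name num_cols are illustrative:

```python
import pandas as pd

# Sample data from the question, with currency stored as strings.
df = pd.DataFrame({
    'Brand': ['Apple', 'Apple', 'Xiomi', 'Xiomi'],
    'Model': ['iPhone 7', 'iphone 7', 'Note 5', 'Note 5'],
    'Year': [2017, 2018, 2017, 2018],
    'Cost': ['$1000', '$800', '$300', '$200'],
    'Tax': ['$100', '$80', '$30', '$20'],
})

cols = ['Cost', 'Tax']
# Strip the leading '$' and convert to integers.
df[cols] = df[cols].replace(r'\$', '', regex=True).astype(int)

# Per-group mean broadcast to every row, then the row-wise sum with Tax.
df['Mean'] = df.groupby(['Brand', 'Model'])['Cost'].transform('mean')
df['Result'] = df[['Mean', 'Tax']].sum(axis=1)

# Assumption: values are whole numbers, so casting to int loses nothing.
# For fractional means, round or format explicitly instead of casting.
num_cols = cols + ['Mean', 'Result']
df[num_cols] = '$' + df[num_cols].astype(int).astype(str)
print(df)
```

The grouping here is still case-sensitive, so the two Apple rows keep separate means, as in the output above.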
Regarding "python - How to perform a groupby in Pandas and compute the mean for each row of the original dataset", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54591062/