我正在尝试通过 assign()
将列总和列表添加回我的 pandas 数据帧。但我不太确定当有多于一列时该怎么做。鉴于我之前已经执行过其他步骤,最好的方法是什么或以链式方式执行此操作的任何其他方法是什么?
data2.assign(data2[rate_name].abs() / data2.groupby(level = 'date')[rate_name].transform('sum'))
rate_water rate_fire rate_wood
id date
apple 2019-01-01 -0.500000 -0.500000 0.000000
orange 2019-01-01 -0.636364 -0.963636 3.000000
melon 2019-01-01 -0.333333 5.666667 27.888889
apple 2020-01-01 -0.100000 7.900000 76.000000
orange 2020-01-01 0.363636 -0.963636 26.500000
melon 2020-01-01 0.166667 6.166667 27.235043
apple 2021-01-01 0.328571 26.261702 84.220779
orange 2021-01-01 0.363636 28.036364 28.683673
melon 2021-01-01 0.611111 39.944444 27.679487
可重现:
from pandas import Timestamp
data2 = pd.DataFrame.from_dict({'rate_water': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.6363636363636364, ('melon', Timestamp('2019-01-01 00:00:00')): -0.33333333333333337, ('apple', Timestamp('2020-01-01 00:00:00')): -0.10000000000000009, ('orange', Timestamp('2020-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2020-01-01 00:00:00')): 0.16666666666666663, ('apple', Timestamp('2021-01-01 00:00:00')): 0.3285714285714285, ('orange', Timestamp('2021-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2021-01-01 00:00:00')): 0.611111111111111}, 'rate_fire': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2019-01-01 00:00:00')): 5.666666666666667, ('apple', Timestamp('2020-01-01 00:00:00')): 7.9, ('orange', Timestamp('2020-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2020-01-01 00:00:00')): 6.166666666666667, ('apple', Timestamp('2021-01-01 00:00:00')): 26.261702127659575, ('orange', Timestamp('2021-01-01 00:00:00')): 28.036363636363635, ('melon', Timestamp('2021-01-01 00:00:00')): 39.94444444444444}, 'rate_wood': {('apple', Timestamp('2019-01-01 00:00:00')): 0.0, ('orange', Timestamp('2019-01-01 00:00:00')): 3.0, ('melon', Timestamp('2019-01-01 00:00:00')): 27.88888888888889, ('apple', Timestamp('2020-01-01 00:00:00')): 76.0, ('orange', Timestamp('2020-01-01 00:00:00')): 26.5, ('melon', Timestamp('2020-01-01 00:00:00')): 27.235042735042736, ('apple', Timestamp('2021-01-01 00:00:00')): 84.22077922077922, ('orange', Timestamp('2021-01-01 00:00:00')): 28.683673469387756, ('melon', Timestamp('2021-01-01 00:00:00')): 27.67948717948718}})
rate_water rate_fire rate_wood sum_water sum_fire sum_wood
id date
apple 2019-01-01 -0.500000 -0.500000 0.000000 -1.469697 4.20303 30.888889
orange 2019-01-01 -0.636364 -0.963636 3.000000 -1.469697 4.20303 30.888889
melon 2019-01-01 -0.333333 5.666667 27.888889 -1.469697 4.20303 30.888889
apple 2020-01-01 -0.100000 7.900000 76.000000 0.430303 13.10303 129.735043
orange 2020-01-01 0.363636 -0.963636 26.500000 0.430303 13.10303 129.735043
melon 2020-01-01 0.166667 6.166667 27.235043 0.430303 13.10303 129.735043
apple 2021-01-01 0.328571 26.261702 84.220779 1.303319 94.24251 140.583940
orange 2021-01-01 0.363636 28.036364 28.683673 1.303319 94.24251 140.583940
melon 2021-01-01 0.611111 39.944444 27.679487 1.303319 94.24251 140.583940
最佳答案
对 Series 的 dict 使用字典理解,并使用 unpack **
将其添加到 dataFrame 中:
from pandas import Timestamp
data2 = pd.DataFrame.from_dict({'rate_water': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.6363636363636364, ('melon', Timestamp('2019-01-01 00:00:00')): -0.33333333333333337, ('apple', Timestamp('2020-01-01 00:00:00')): -0.10000000000000009, ('orange', Timestamp('2020-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2020-01-01 00:00:00')): 0.16666666666666663, ('apple', Timestamp('2021-01-01 00:00:00')): 0.3285714285714285, ('orange', Timestamp('2021-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2021-01-01 00:00:00')): 0.611111111111111}, 'rate_fire': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2019-01-01 00:00:00')): 5.666666666666667, ('apple', Timestamp('2020-01-01 00:00:00')): 7.9, ('orange', Timestamp('2020-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2020-01-01 00:00:00')): 6.166666666666667, ('apple', Timestamp('2021-01-01 00:00:00')): 26.261702127659575, ('orange', Timestamp('2021-01-01 00:00:00')): 28.036363636363635, ('melon', Timestamp('2021-01-01 00:00:00')): 39.94444444444444}, 'rate_wood': {('apple', Timestamp('2019-01-01 00:00:00')): 0.0, ('orange', Timestamp('2019-01-01 00:00:00')): 3.0, ('melon', Timestamp('2019-01-01 00:00:00')): 27.88888888888889, ('apple', Timestamp('2020-01-01 00:00:00')): 76.0, ('orange', Timestamp('2020-01-01 00:00:00')): 26.5, ('melon', Timestamp('2020-01-01 00:00:00')): 27.235042735042736, ('apple', Timestamp('2021-01-01 00:00:00')): 84.22077922077922, ('orange', Timestamp('2021-01-01 00:00:00')): 28.683673469387756, ('melon', Timestamp('2021-01-01 00:00:00')): 27.67948717948718}})
data2.index.names=['id','date']
cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.assign(**{rate_name.replace('rate','sum'):
data2[rate_name].abs() / data2.groupby(level = 'date')[rate_name].transform('sum')
for rate_name in cols})
print (data2)
rate_water rate_fire rate_wood sum_water sum_fire \
id date
apple 2019-01-01 -0.500000 -0.500000 0.000000 -0.340206 0.118962
orange 2019-01-01 -0.636364 -0.963636 3.000000 -0.432990 0.229272
melon 2019-01-01 -0.333333 5.666667 27.888889 -0.226804 1.348234
apple 2020-01-01 -0.100000 7.900000 76.000000 0.232394 0.602914
orange 2020-01-01 0.363636 -0.963636 26.500000 0.845070 0.073543
melon 2020-01-01 0.166667 6.166667 27.235043 0.387324 0.470629
apple 2021-01-01 0.328571 26.261702 84.220779 0.252104 0.278661
orange 2021-01-01 0.363636 28.036364 28.683673 0.279008 0.297492
melon 2021-01-01 0.611111 39.944444 27.679487 0.468888 0.423847
sum_wood
id date
apple 2019-01-01 0.000000
orange 2019-01-01 0.097122
melon 2019-01-01 0.902878
apple 2020-01-01 0.585809
orange 2020-01-01 0.204262
melon 2020-01-01 0.209928
apple 2021-01-01 0.599078
orange 2021-01-01 0.204032
melon 2021-01-01 0.196889
另一种方法是一起处理所有列:
cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.join(data2[cols].abs().div(data2.groupby(level = 'date')[cols].transform('sum') )
.rename(columns=lambda x: x.replace('rate','sum')))
cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.assign(**data2[cols].abs().div(data2.groupby(level = 'date')[cols].transform('sum') )
.rename(columns=lambda x: x.replace('rate','sum')))
print (data2)
rate_water rate_fire rate_wood sum_water sum_fire \
id date
apple 2019-01-01 -0.500000 -0.500000 0.000000 -0.340206 0.118962
orange 2019-01-01 -0.636364 -0.963636 3.000000 -0.432990 0.229272
melon 2019-01-01 -0.333333 5.666667 27.888889 -0.226804 1.348234
apple 2020-01-01 -0.100000 7.900000 76.000000 0.232394 0.602914
orange 2020-01-01 0.363636 -0.963636 26.500000 0.845070 0.073543
melon 2020-01-01 0.166667 6.166667 27.235043 0.387324 0.470629
apple 2021-01-01 0.328571 26.261702 84.220779 0.252104 0.278661
orange 2021-01-01 0.363636 28.036364 28.683673 0.279008 0.297492
melon 2021-01-01 0.611111 39.944444 27.679487 0.468888 0.423847
sum_wood
id date
apple 2019-01-01 0.000000
orange 2019-01-01 0.097122
melon 2019-01-01 0.902878
apple 2020-01-01 0.585809
orange 2020-01-01 0.204262
melon 2020-01-01 0.209928
apple 2021-01-01 0.599078
orange 2021-01-01 0.204032
melon 2021-01-01 0.196889
关于python - 通过链式将总和列添加回 pandas 数据框,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/72110839/