python - Adding sum columns back to a pandas DataFrame via chaining

Tags: python pandas

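I am trying to add a list of column sums back to my pandas DataFrame via assign(), but I am not sure how to do it when there is more than one column. Given that I have already performed other steps earlier in the chain, what is the best way to do this, or is there another way to do it in a chained fashion?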

data2.assign(data2[rate_name].abs() / data2.groupby(level = 'date')[rate_name].transform('sum'))
                   rate_water  rate_fire  rate_wood
id     date                                        
apple  2019-01-01   -0.500000  -0.500000   0.000000
orange 2019-01-01   -0.636364  -0.963636   3.000000
melon  2019-01-01   -0.333333   5.666667  27.888889
apple  2020-01-01   -0.100000   7.900000  76.000000
orange 2020-01-01    0.363636  -0.963636  26.500000
melon  2020-01-01    0.166667   6.166667  27.235043
apple  2021-01-01    0.328571  26.261702  84.220779
orange 2021-01-01    0.363636  28.036364  28.683673
melon  2021-01-01    0.611111  39.944444  27.679487

Reproducible example:

import pandas as pd
from pandas import Timestamp
data2 = pd.DataFrame.from_dict({'rate_water': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.6363636363636364, ('melon', Timestamp('2019-01-01 00:00:00')): -0.33333333333333337, ('apple', Timestamp('2020-01-01 00:00:00')): -0.10000000000000009, ('orange', Timestamp('2020-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2020-01-01 00:00:00')): 0.16666666666666663, ('apple', Timestamp('2021-01-01 00:00:00')): 0.3285714285714285, ('orange', Timestamp('2021-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2021-01-01 00:00:00')): 0.611111111111111}, 'rate_fire': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2019-01-01 00:00:00')): 5.666666666666667, ('apple', Timestamp('2020-01-01 00:00:00')): 7.9, ('orange', Timestamp('2020-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2020-01-01 00:00:00')): 6.166666666666667, ('apple', Timestamp('2021-01-01 00:00:00')): 26.261702127659575, ('orange', Timestamp('2021-01-01 00:00:00')): 28.036363636363635, ('melon', Timestamp('2021-01-01 00:00:00')): 39.94444444444444}, 'rate_wood': {('apple', Timestamp('2019-01-01 00:00:00')): 0.0, ('orange', Timestamp('2019-01-01 00:00:00')): 3.0, ('melon', Timestamp('2019-01-01 00:00:00')): 27.88888888888889, ('apple', Timestamp('2020-01-01 00:00:00')): 76.0, ('orange', Timestamp('2020-01-01 00:00:00')): 26.5, ('melon', Timestamp('2020-01-01 00:00:00')): 27.235042735042736, ('apple', Timestamp('2021-01-01 00:00:00')): 84.22077922077922, ('orange', Timestamp('2021-01-01 00:00:00')): 28.683673469387756, ('melon', Timestamp('2021-01-01 00:00:00')): 27.67948717948718}})

Desired output:

                   rate_water  rate_fire  rate_wood  sum_water  sum_fire    sum_wood
id     date                                                                         
apple  2019-01-01   -0.500000  -0.500000   0.000000  -1.469697   4.20303   30.888889
orange 2019-01-01   -0.636364  -0.963636   3.000000  -1.469697   4.20303   30.888889
melon  2019-01-01   -0.333333   5.666667  27.888889  -1.469697   4.20303   30.888889
apple  2020-01-01   -0.100000   7.900000  76.000000   0.430303  13.10303  129.735043
orange 2020-01-01    0.363636  -0.963636  26.500000   0.430303  13.10303  129.735043
melon  2020-01-01    0.166667   6.166667  27.235043   0.430303  13.10303  129.735043
apple  2021-01-01    0.328571  26.261702  84.220779   1.303319  94.24251  140.583940
orange 2021-01-01    0.363636  28.036364  28.683673   1.303319  94.24251  140.583940
melon  2021-01-01    0.611111  39.944444  27.679487   1.303319  94.24251  140.583940

Best answer

Use a dictionary comprehension to build a dict of Series, then unpack it with ** to add the columns to the DataFrame:

import pandas as pd
from pandas import Timestamp
data2 = pd.DataFrame.from_dict({'rate_water': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.6363636363636364, ('melon', Timestamp('2019-01-01 00:00:00')): -0.33333333333333337, ('apple', Timestamp('2020-01-01 00:00:00')): -0.10000000000000009, ('orange', Timestamp('2020-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2020-01-01 00:00:00')): 0.16666666666666663, ('apple', Timestamp('2021-01-01 00:00:00')): 0.3285714285714285, ('orange', Timestamp('2021-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2021-01-01 00:00:00')): 0.611111111111111}, 'rate_fire': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2019-01-01 00:00:00')): 5.666666666666667, ('apple', Timestamp('2020-01-01 00:00:00')): 7.9, ('orange', Timestamp('2020-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2020-01-01 00:00:00')): 6.166666666666667, ('apple', Timestamp('2021-01-01 00:00:00')): 26.261702127659575, ('orange', Timestamp('2021-01-01 00:00:00')): 28.036363636363635, ('melon', Timestamp('2021-01-01 00:00:00')): 39.94444444444444}, 'rate_wood': {('apple', Timestamp('2019-01-01 00:00:00')): 0.0, ('orange', Timestamp('2019-01-01 00:00:00')): 3.0, ('melon', Timestamp('2019-01-01 00:00:00')): 27.88888888888889, ('apple', Timestamp('2020-01-01 00:00:00')): 76.0, ('orange', Timestamp('2020-01-01 00:00:00')): 26.5, ('melon', Timestamp('2020-01-01 00:00:00')): 27.235042735042736, ('apple', Timestamp('2021-01-01 00:00:00')): 84.22077922077922, ('orange', Timestamp('2021-01-01 00:00:00')): 28.683673469387756, ('melon', Timestamp('2021-01-01 00:00:00')): 27.67948717948718}})
data2.index.names=['id','date']

cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.assign(**{rate_name.replace('rate','sum'): 
                        data2[rate_name].abs() / data2.groupby(level = 'date')[rate_name].transform('sum') 
                        for rate_name in cols})
print(data2)
                   rate_water  rate_fire  rate_wood  sum_water  sum_fire  \
id     date                                                                
apple  2019-01-01   -0.500000  -0.500000   0.000000  -0.340206  0.118962   
orange 2019-01-01   -0.636364  -0.963636   3.000000  -0.432990  0.229272   
melon  2019-01-01   -0.333333   5.666667  27.888889  -0.226804  1.348234   
apple  2020-01-01   -0.100000   7.900000  76.000000   0.232394  0.602914   
orange 2020-01-01    0.363636  -0.963636  26.500000   0.845070  0.073543   
melon  2020-01-01    0.166667   6.166667  27.235043   0.387324  0.470629   
apple  2021-01-01    0.328571  26.261702  84.220779   0.252104  0.278661   
orange 2021-01-01    0.363636  28.036364  28.683673   0.279008  0.297492   
melon  2021-01-01    0.611111  39.944444  27.679487   0.468888  0.423847   

                   sum_wood  
id     date                  
apple  2019-01-01  0.000000  
orange 2019-01-01  0.097122  
melon  2019-01-01  0.902878  
apple  2020-01-01  0.585809  
orange 2020-01-01  0.204262  
melon  2020-01-01  0.209928  
apple  2021-01-01  0.599078  
orange 2021-01-01  0.204032  
melon  2021-01-01  0.196889  
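
Since the question mentions earlier chained steps, it may help to pass callables to assign(), so that each new column is computed from the intermediate frame rather than from a variable bound beforehand. Below is a minimal sketch under stated assumptions: the small frame `df` and the stand-in `.mul(2)` step are made up for illustration and are not from the original post. The `c=c` default argument binds the column name at definition time, so every lambda in the comprehension keeps its own column.

```python
import pandas as pd

# Hypothetical toy data (not from the original post) with the same index layout.
idx = pd.MultiIndex.from_tuples(
    [("apple", "2019"), ("orange", "2019"), ("apple", "2020"), ("orange", "2020")],
    names=["id", "date"],
)
df = pd.DataFrame(
    {"rate_water": [1.0, 3.0, -2.0, 6.0], "rate_fire": [2.0, 2.0, 5.0, 5.0]},
    index=idx,
)
cols = ["rate_water", "rate_fire"]

out = (
    df
    .mul(2)  # stand-in for whatever earlier steps the chain contains
    .assign(**{
        # assign() calls each lambda with the intermediate frame `d`;
        # `c=c` freezes the current column name for that lambda.
        c.replace("rate", "sum"): (
            lambda d, c=c: d[c].abs() / d.groupby(level="date")[c].transform("sum")
        )
        for c in cols
    })
)
print(out)
```

Because the lambdas receive the frame produced by the preceding step, the ratios here are based on the doubled values, not on the original `df`.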

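Another approach is to process all the columns together (starting again from the original data2, since join raises an error if the sum_ columns already exist):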

cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.join(data2[cols].abs().div(data2.groupby(level = 'date')[cols].transform('sum') )
                     .rename(columns=lambda x: x.replace('rate','sum')))

Or, as an alternative that uses assign instead of join:

cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.assign(**data2[cols].abs().div(data2.groupby(level = 'date')[cols].transform('sum') )
                       .rename(columns=lambda x: x.replace('rate','sum')))
print(data2)
                   rate_water  rate_fire  rate_wood  sum_water  sum_fire  \
id     date                                                                
apple  2019-01-01   -0.500000  -0.500000   0.000000  -0.340206  0.118962   
orange 2019-01-01   -0.636364  -0.963636   3.000000  -0.432990  0.229272   
melon  2019-01-01   -0.333333   5.666667  27.888889  -0.226804  1.348234   
apple  2020-01-01   -0.100000   7.900000  76.000000   0.232394  0.602914   
orange 2020-01-01    0.363636  -0.963636  26.500000   0.845070  0.073543   
melon  2020-01-01    0.166667   6.166667  27.235043   0.387324  0.470629   
apple  2021-01-01    0.328571  26.261702  84.220779   0.252104  0.278661   
orange 2021-01-01    0.363636  28.036364  28.683673   0.279008  0.297492   
melon  2021-01-01    0.611111  39.944444  27.679487   0.468888  0.423847   

                   sum_wood  
id     date                  
apple  2019-01-01  0.000000  
orange 2019-01-01  0.097122  
melon  2019-01-01  0.902878  
apple  2020-01-01  0.585809  
orange 2020-01-01  0.204262  
melon  2020-01-01  0.209928  
apple  2021-01-01  0.599078  
orange 2021-01-01  0.204032  
melon  2021-01-01  0.196889  
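
The table in the question with the sum_water, sum_fire and sum_wood columns shows plain per-date sums rather than ratios. If that is what is needed, a sketch along the following lines might work. It uses a hypothetical toy frame `df` (not the data from the post), drops the abs() and the division, and joins the transformed sums inside pipe() so the step stays chainable.

```python
import pandas as pd

# Hypothetical toy data (not from the original post) with the same index layout.
idx = pd.MultiIndex.from_tuples(
    [("apple", "2019"), ("orange", "2019"), ("apple", "2020"), ("orange", "2020")],
    names=["id", "date"],
)
df = pd.DataFrame(
    {"rate_water": [1.0, 3.0, -2.0, 7.0], "rate_fire": [2.0, 2.0, 5.0, 5.0]},
    index=idx,
)
cols = ["rate_water", "rate_fire"]

out = df.pipe(
    lambda d: d.join(
        # transform("sum") broadcasts each date's total back to every row
        d.groupby(level="date")[cols]
        .transform("sum")
        .rename(columns=lambda name: name.replace("rate", "sum"))
    )
)
print(out)
```

pipe() hands the intermediate frame to the lambda, so this can follow other chained steps without assigning a temporary variable.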

For python - adding sum columns back to a pandas DataFrame via chaining, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/72110839/
