As the code below shows, I want to group the data by account_id, sum system_value and rename the result to total_value, while still showing the value for each date.
s = [
{'account_id': '1166470734', 'entity': 'entity1', 'system_value': 10.2, 'date': "2010-01-02", 'sale': 'sale1'},
{'account_id': '1166470734', 'entity': 'entity1', 'system_value': 2.2, 'date': "2010-01-03", 'sale': 'sale1'},
{'account_id': '123232323', 'entity': 'entity2', 'system_value': 4.2, 'date': "2010-01-03", 'sale': 'sale2'},
{'account_id': '123232323', 'entity': 'entity2', 'system_value': 5.2, 'date': "2010-01-04", 'sale': 'sale2'},
{'account_id': '4342343', 'entity': 'entity3', 'system_value': 10.2, 'date': "2010-01-04", 'sale': 'sale3'},
]
import pandas as pd
df = pd.DataFrame.from_records(s)
print(df)
# account_id entity system_value date sale
# 0 1166470734 entity1 10.2 2010-01-02 sale1
# 1 1166470734 entity1 2.2 2010-01-03 sale1
# 2 123232323 entity2 4.2 2010-01-03 sale2
# 3 123232323 entity2 5.2 2010-01-04 sale2
# 4 4342343 entity3 10.2 2010-01-04 sale3
The expected output is:
#    account_id   entity  2010-01-02  2010-01-03  2010-01-04  total_value   sale
# 0  1166470734  entity1        10.2         2.2                     12.4  sale1
# 1   123232323  entity2                     4.2         5.2          9.4  sale2
# 2     4342343  entity3                                10.2         10.2  sale3
Sorry, I'm new to pandas. How can I get the expected result?
Update for my question based on @Ch3steR's answer:
I tried it on my real data and got the following error:
import datetime
from decimal import Decimal
import pandas as pd
s = [
{'account_id': '21312312', 'entity': 'entityname1', 'ae': 'lwe', 'is_pc': 0, 'type': 2, 'medium': 0, 'our_side_entity': 3, 'settlement_title': 'settlementd', 'settlement_short_title': 'kim', 'settlement_type': 0, 'date': datetime.date(2020, 4, 9), 'sale': 'sale1' ,'system_value': Decimal('1038.36')},
{'account_id': '21312312', 'entity': 'entityname1', 'ae': 'lwe', 'is_pc': 0, 'type': 2, 'medium': 0, 'our_side_entity': 3, 'settlement_title': 'settlementd', 'settlement_short_title': 'kim', 'settlement_type': 0, 'date': datetime.date(2020, 4, 10), 'sale': 'sale1' ,'system_value': Decimal('1038.36')},
{'account_id': '21312312', 'entity': 'entityname1', 'ae': 'lwe', 'is_pc': 0, 'type': 2, 'medium': 0, 'our_side_entity': 3, 'settlement_title': 'settlementd', 'settlement_short_title': 'kim', 'settlement_type': 0, 'date': datetime.date(2020, 4, 11), 'sale': 'sale1' ,'system_value': Decimal('1038.36')},
{'account_id': '21312312', 'entity': 'entityname1', 'ae': 'lwe', 'is_pc': 0, 'type': 2, 'medium': 0, 'our_side_entity': 3, 'settlement_title': 'settlementd', 'settlement_short_title': 'kim', 'settlement_type': 0, 'date': datetime.date(2020, 4, 12), 'sale': 'sale1' ,'system_value': Decimal('1038.36')},
{'account_id': '21312312', 'entity': 'entityname1', 'ae': 'lwe', 'is_pc': 0, 'type': 2, 'medium': 0, 'our_side_entity': 3, 'settlement_title': 'settlementd', 'settlement_short_title': 'kim', 'settlement_type': 0, 'date': datetime.date(2020, 4, 13), 'sale': 'sale1' ,'system_value': Decimal('1038.36')},
]
df = pd.DataFrame.from_records(s)
df = df.pivot_table(index=['account_id', 'entity', 'ae', 'is_pc', 'type', 'medium', 'our_side_entity', 'settlement_title', 'settlement_short_title', 'settlement_type', 'sale'],columns='date',values='system_value').\
assign(total_sum=lambda x:x.sum(axis=1)).\
reset_index()
print(df)
# raise DataError("No numeric types to aggregate")
# pandas.core.base.DataError: No numeric types to aggregate
Best answer
You can use df.pivot_table together with df.assign:
df.pivot_table(index=['account_id','entity','sale'],columns='date',values='system_value').\
assign(total_sum=lambda x:x.sum(axis=1)).\
reset_index()
date account_id entity sale 2010-01-02 2010-01-03 2010-01-04 total_sum
0 1166470734 entity1 sale1 10.2 2.2 NaN 12.4
1 123232323 entity2 sale2 NaN 4.2 5.2 9.4
2 4342343 entity3 sale3 NaN NaN 10.2 10.2
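For reference, here is a self-contained sketch of the approach above applied to the sample data from the question. Two details are assumptions of mine rather than part of the original answer: aggfunc="sum" is passed explicitly (pivot_table averages by default, which only matters when an account has several rows for the same date), and the total column is named total_value to match the name asked for in the question.

```python
import pandas as pd

records = [
    {"account_id": "1166470734", "entity": "entity1", "system_value": 10.2, "date": "2010-01-02", "sale": "sale1"},
    {"account_id": "1166470734", "entity": "entity1", "system_value": 2.2, "date": "2010-01-03", "sale": "sale1"},
    {"account_id": "123232323", "entity": "entity2", "system_value": 4.2, "date": "2010-01-03", "sale": "sale2"},
    {"account_id": "123232323", "entity": "entity2", "system_value": 5.2, "date": "2010-01-04", "sale": "sale2"},
    {"account_id": "4342343", "entity": "entity3", "system_value": 10.2, "date": "2010-01-04", "sale": "sale3"},
]
df = pd.DataFrame.from_records(records)

wide = (
    df.pivot_table(
        index=["account_id", "entity", "sale"],  # one output row per account
        columns="date",                          # one output column per date
        values="system_value",
        aggfunc="sum",                           # assumption: sum duplicates instead of averaging
    )
    .assign(total_value=lambda x: x.sum(axis=1))  # row total; NaN cells are skipped
    .reset_index()
)
print(wide)
```

Dates on which an account has no row show up as NaN. If zeros are preferable, pivot_table accepts a fill_value argument (for example fill_value=0).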
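Edit:
Checking df.dtypes shows that system_value has dtype object, because its values are decimal.Decimal instances rather than floats. That is why the error occurs.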
df.dtypes
account_id object
entity object
. .
. .
. .
date object
sale object
system_value object
dtype: object
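A small sketch of how this diagnosis could be reproduced in isolation. The two-row frame and the names sample, value_dtype and all_decimal are hypothetical and only serve the illustration; the point is that pandas stores decimal.Decimal values in an object column.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical two-row frame mimicking the column from the question.
sample = pd.DataFrame(
    {
        "account_id": ["21312312", "21312312"],
        "system_value": [Decimal("1038.36"), Decimal("1038.36")],
    }
)

# Decimal values are kept as Python objects, so the column dtype is object.
value_dtype = str(sample["system_value"].dtype)
print(value_dtype)

# Confirm that every cell in the column really is a Decimal.
all_decimal = bool(sample["system_value"].map(lambda v: isinstance(v, Decimal)).all())
print(all_decimal)
```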
Set the dtype of system_value to float:
df = pd.DataFrame.from_records(s).astype({'system_value':'float'})
This gives the output:
date account_id entity sale 2020-04-09 2020-04-10 2020-04-11 2020-04-12 2020-04-13 total_sum
0 21312312 entityname1 sale1 1038.36 1038.36 1038.36 1038.36 1038.36 5191.8
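Putting the fix together, the following is a minimal sketch under a few assumptions: the records are a trimmed-down version of the ones in the question (only account_id, entity, sale, date and system_value are kept), aggfunc="sum" is passed explicitly, and the total column is called total_value. None of these choices come from the original answer.

```python
import datetime
from decimal import Decimal

import pandas as pd

# Trimmed-down records: five consecutive days starting 2020-04-09.
records = [
    {
        "account_id": "21312312",
        "entity": "entityname1",
        "sale": "sale1",
        "date": datetime.date(2020, 4, 9) + datetime.timedelta(days=i),
        "system_value": Decimal("1038.36"),
    }
    for i in range(5)
]

# Convert the Decimal column to float before aggregating.
df = pd.DataFrame.from_records(records).astype({"system_value": "float"})

wide = (
    df.pivot_table(
        index=["account_id", "entity", "sale"],
        columns="date",
        values="system_value",
        aggfunc="sum",  # assumption: sum rather than the default mean
    )
    .assign(total_value=lambda x: x.sum(axis=1))
    .reset_index()
)
print(wide)
```

Converting to float gives up the exact decimal arithmetic that Decimal provides, so the totals are subject to ordinary floating-point rounding. For display purposes that is usually acceptable.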