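I have a DataFrame of transactions. In Python, I want to group similar transactions together and count how many times each one occurs for a single customer. The data looks like this: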
account  transaction_date  transaction description  transaction_amt
55625    15/may/19         POS: McDonalds           $15
55625    01/may/19         Netflix                  $31.5
55625    28/may/19         POS:H&M                  $150
55625    6/apr/19          Netflix                  $9
55625    30/may            McDonalds                $6
55625    25/may/19         POS:H&M                  $32
55625    6/mar/19          POS:H&M                  $32
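I want to group the data so that, for each month, I get the number of visits to each store and the total amount spent. The result should look like this: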
account  trans_date  trans_description  total_spent
55625    may/19      McDonalds          $21
55625    may/19      H&M                $182
55625    mar/19      H&M                $32
55625    may/19      Netflix            $31.5
55625    apr/19      Netflix            $9
Best answer
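First convert the transaction_date column to datetimes, remove everything before the : in the transaction description column, then strip the $ from transaction_amt and convert it to numeric: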
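import pandas as pd

# Parse day/month/two-digit year; values that do not match (e.g. '30/may') become NaT
df['transaction_date'] = (pd.to_datetime(df['transaction_date'],
                                         format='%d/%b/%y', errors='coerce')
                            .dt.strftime('%b/%y'))
# Keep the text after the last ':' and trim the space left by 'POS: McDonalds'
df['transaction description'] = df['transaction description'].str.split(':').str[-1].str.strip()
# Drop the leading '$' and convert the amounts to floats
df['transaction_amt'] = df['transaction_amt'].str.lstrip('$').astype(float)
print(df)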
   account  transaction_date  transaction description  transaction_amt
0  55625    May/19            McDonalds                15.0
1  55625    May/19            Netflix                  31.5
2  55625    May/19            H&M                      150.0
3  55625    Apr/19            Netflix                  9.0
4  55625    NaT               McDonalds                6.0
5  55625    May/19            H&M                      32.0
6  55625    Mar/19            H&M                      32.0
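Row 4 ends up as NaT because its source value, 30/may, has no year and so does not match the %d/%b/%y format. A minimal sketch for listing such rows before aggregating, assuming a small hypothetical sample series named raw:

```python
import pandas as pd

# Hypothetical sample: the middle value lacks a year, as in the question
raw = pd.Series(['15/may/19', '30/may', '6/mar/19'], name='transaction_date')

# errors='coerce' turns values that do not match the format into NaT
parsed = pd.to_datetime(raw, format='%d/%b/%y', errors='coerce')

# Select the original strings whose parse failed
unparsed = raw[parsed.isna()]
print(unparsed)
```

Inspecting the unparsed values first makes it easier to decide whether to fix them in the source or drop them.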
Then aggregate with sum:
# Sum the amounts per account, month and merchant
df1 = (df.groupby(['account', 'transaction_date', 'transaction description'])['transaction_amt']
         .sum()
         .reset_index(name='total_spent'))
print(df1)
   account  transaction_date  transaction description  total_spent
0  55625    Apr/19            Netflix                  9.0
1  55625    Mar/19            H&M                      32.0
2  55625    May/19            H&M                      182.0
3  55625    May/19            McDonalds                15.0
4  55625    May/19            Netflix                  31.5
5  55625    NaT               McDonalds                6.0
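Because the month is stored as text, the groups sort alphabetically (Apr, Mar, May) rather than by date. If chronological order matters, one possible variation, not part of the original answer, is to group on a monthly Period instead. The sketch below assumes a reduced hypothetical sample and a new column named month:

```python
import pandas as pd

# Reduced hypothetical sample with already-numeric amounts
df = pd.DataFrame({
    'account': [55625, 55625, 55625],
    'transaction_date': ['28/may/19', '6/apr/19', '6/mar/19'],
    'transaction_amt': [150.0, 9.0, 32.0],
})

# Parse the dates, then reduce each one to its calendar month
dates = pd.to_datetime(df['transaction_date'], format='%d/%b/%y', errors='coerce')
df['month'] = dates.dt.to_period('M')

# Period values sort chronologically, so the groups come out in date order
monthly = (df.groupby(['account', 'month'])['transaction_amt']
             .sum()
             .reset_index(name='total_spent'))
print(monthly)
```

The month column can still be rendered as text afterwards with monthly['month'].dt.strftime('%b/%y') if the original display format is needed.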
If every value in the date column includes the year, you can instead split off the day and keep the month/year text:
print(df)
   account  transaction_date  transaction description  transaction_amt
0  55625    15/may/19         POS:McDonalds            $15
1  55625    01/may/19         Netflix                  $31.5
2  55625    28/may/19         POS:H&M                  $150
3  55625    6/apr/19          Netflix                  $9
4  55625    30/may/19         McDonalds                $6
5  55625    25/may/19         POS:H&M                  $32
6  55625    6/mar/19          POS:H&M                  $32
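# Split once on the first '/' and keep the 'month/year' part
df['transaction_date'] = df['transaction_date'].str.split('/', n=1).str[1]
# Keep only the text after the last ':'
df['transaction description'] = df['transaction description'].str.split(':').str[-1]
# Drop the leading '$' and convert the amounts to floats
df['transaction_amt'] = df['transaction_amt'].str.lstrip('$').astype(float)
print(df)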
   account  transaction_date  transaction description  transaction_amt
0  55625    may/19            McDonalds                15.0
1  55625    may/19            Netflix                  31.5
2  55625    may/19            H&M                      150.0
3  55625    apr/19            Netflix                  9.0
4  55625    may/19            McDonalds                6.0
5  55625    may/19            H&M                      32.0
6  55625    mar/19            H&M                      32.0
# Sum the amounts per account, month and merchant
df1 = (df.groupby(['account', 'transaction_date', 'transaction description'])['transaction_amt']
         .sum()
         .reset_index(name='total_spent'))
print(df1)
   account  transaction_date  transaction description  total_spent
0  55625    apr/19            Netflix                  9.0
1  55625    mar/19            H&M                      32.0
2  55625    may/19            H&M                      182.0
3  55625    may/19            McDonalds                21.0
4  55625    may/19            Netflix                  31.5
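The question also asks for the number of visits per month, which sum alone does not give. A minimal sketch using pandas named aggregation to get both the count and the total in one pass. It rebuilds the sample data from the question, assumes the 30/may row carries the year 19, and uses the output names visits and total_spent, of which visits is introduced here for illustration:

```python
import pandas as pd

# Sample data from the question; '30/may/19' assumes the missing year is 19
df = pd.DataFrame({
    'account': [55625] * 7,
    'transaction_date': ['15/may/19', '01/may/19', '28/may/19', '6/apr/19',
                         '30/may/19', '25/may/19', '6/mar/19'],
    'transaction description': ['POS: McDonalds', 'Netflix', 'POS:H&M', 'Netflix',
                                'McDonalds', 'POS:H&M', 'POS:H&M'],
    'transaction_amt': ['$15', '$31.5', '$150', '$9', '$6', '$32', '$32'],
})

# Same cleaning steps as in the answer, plus a strip for 'POS: McDonalds'
df['transaction_date'] = df['transaction_date'].str.split('/', n=1).str[1]
df['transaction description'] = (df['transaction description']
                                 .str.split(':').str[-1].str.strip())
df['transaction_amt'] = df['transaction_amt'].str.lstrip('$').astype(float)

# Named aggregation: each keyword is an output column built from (column, function)
summary = (df.groupby(['account', 'transaction_date', 'transaction description'])
             .agg(visits=('transaction_amt', 'count'),
                  total_spent=('transaction_amt', 'sum'))
             .reset_index())
print(summary)
```

Named aggregation requires pandas 0.25 or later; on older versions, calling .agg(['count', 'sum']) on the selected column is a comparable alternative.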
Source: the Stack Overflow question on grouping transaction descriptions and counting them in Python: https://stackoverflow.com/questions/57306565/