python - Searching and processing data from a multi-index DataFrame

Tags: python python-3.x pandas dataframe multi-index

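I have two DataFrames: df2, which holds payment statistics (how likely a client is to pay a given share of their debt), and df3, which holds data on new clients.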

import pandas as pd

d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'], 
     'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
     'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
     'Client Number':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
     'DebtPaid':[0.8,0.1,0.5,0.30,0,0.2,0.4,1,0.60,1,0.5,0.2,0,0.3,0,0,0.2,0,0.1,0.70,0.5,0.1]}

df = pd.DataFrame(data=d)

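# Share of past clients in each repayment bracket, per (City, Card, Colateral) group
df2 = (df.groupby(['City', 'Card', 'Colateral'])['DebtPaid']
         .value_counts(bins=[-0.001, 0, 0.25, 0.5, 0.75, 1, 1.001, 2], normalize=True))

# New clients and their total debt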
d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'], 
     'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
     'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
     'Client Number':[23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44],
     'Total Debt':[100,240,200,1000,50,20,345,10,600,40,50,20,100,30,100,600,200,200,150,700,50,120]}

df3 = pd.DataFrame(data=d)

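I want to estimate how much each client will pay. For example: a client from Lisbon with a Visa card and collateral has a 0.333333 chance of paying 0% of the debt, a 0.333333 chance of paying ]0-25%] of the debt, and a 0.333333 chance of paying ]75-100%] of the debt. So if this client has a debt of 100, the expected value ranges from

[(0.33 * 0 * 100) + (0.33 * 0 * 100) + (0.33 * 0.75 * 100)] to [(0.33 * 0 * 100) + (0.33 * 0.25 * 100) + (0.33 * 1 * 100)].

So this client will pay between 24.75€ and 41.25€.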
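
The arithmetic for this one client can be sketched in plain Python. The `brackets` list and variable names are illustrative assumptions; the sketch uses the exact probability 1/3 rather than the rounded 0.33, so it gives roughly 25 and 41.67 instead of 24.75 and 41.25.

```python
# Repayment brackets for the example client (Lisbon, Visa, with collateral),
# written as (lower share, upper share, probability).
brackets = [
    (0.00, 0.00, 1 / 3),  # pays exactly 0% of the debt
    (0.00, 0.25, 1 / 3),  # pays ]0%, 25%] of the debt
    (0.75, 1.00, 1 / 3),  # pays ]75%, 100%] of the debt
]
total_debt = 100

# Expected payment if every bracket resolves at its lower / upper edge.
lower = sum(lo * p * total_debt for lo, hi, p in brackets)
upper = sum(hi * p * total_debt for lo, hi, p in brackets)

print(round(lower, 2), round(upper, 2))
```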

I then want to compute the same range for all the other clients.

Any ideas on how to solve this?

Best answer

You can do it like this:

#First merge dataframes
df_out = df2.rename('Prob').reset_index().merge(df3, on=['City', 'Card', 'Colateral'])

#Use the right and left attributes of pd.Interval
df_out['lower'] = [x.left for x in df_out['DebtPaid']]
df_out['upper'] = [x.right for x in df_out['DebtPaid']]

#Calculate lower and upper partial prices
df_out['l_partial'] = df_out[['lower', 'Prob', 'Total Debt']].prod(axis=1)
df_out['u_partial'] = df_out[['upper', 'Prob', 'Total Debt']].prod(axis=1)

#Sum partial prices to get lower and upper price grouped on Client Number
df_out.groupby('Client Number')[['l_partial', 'u_partial']]\
      .agg(lower_price=('l_partial', 'sum'),
           upper_price=('u_partial', 'sum')).clip(lower=0)

Output:

              lower_price  upper_price
Client Number                          
23               37.500000    62.500000
24                0.000000    15.000000
25               50.000000   100.000000
26              187.000000   375.000000
27                0.000000     3.125000
28                0.000000     5.000000
29              143.750000   230.000000
30                2.493333     4.166667
31              225.000000   375.000000
32               16.666667    26.666667
33               18.750000    31.250000
34                7.500000    12.500000
35                0.000000     6.250000
36                7.500000    15.000000
37               18.700000    37.500000
38                0.000000    37.500000
39                0.000000    50.000000
40               49.866667    83.333333
41               37.400000    62.500000
42              130.900000   262.500000
43               20.833333    33.333333
44               22.440000    45.000000
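
In the output above, client 30 (debt 10) gets a lower price of 2.493333 rather than the 2.5 that 1/3 * 0.75 * 10 would give. That is consistent with the first bin having a slightly negative left edge, which the final clip on the summed price does not remove. One possible variant, sketched below for a single group, clips the interval edges at 0 before multiplying. Using `pd.cut` directly on one group and a `total_debt` of 10 are assumptions made to keep the example small.

```python
import pandas as pd

bins = [-0.001, 0, 0.25, 0.5, 0.75, 1, 1.001, 2]
paid = pd.Series([1.0, 0.0, 0.1])  # repayment history for Lisbon / Visa / Yes

# Probability of each bracket for this single group.
prob = pd.cut(paid, bins=bins, include_lowest=True).value_counts(normalize=True)

out = pd.DataFrame({
    'lower': [iv.left for iv in prob.index],
    'upper': [iv.right for iv in prob.index],
    'Prob': prob.to_numpy(),
})

# A repaid share cannot be negative, so clip the bracket edge itself,
# not only the final sum.
out['lower'] = out['lower'].clip(lower=0)

total_debt = 10  # illustrative client with a debt of 10
lower_price = (out['lower'] * out['Prob'] * total_debt).sum()
upper_price = (out['upper'] * out['Prob'] * total_debt).sum()
print(lower_price, upper_price)
```

With the edge clipped, the lower price for this group comes out at about 2.5 and the upper price is unchanged.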

Regarding python - searching and processing data from a multi-index DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62721923/
