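python - Cross-referencing DataFrames to extract specific values in Python

Please help me write a Python function. I have two DataFrames, DF1 and DF2. I want to add a column to DF1, DF1['Score'], whose value depends on where each value in DF1 falls among the ranges listed in DF2.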

import pandas as pd
DF1 = pd.DataFrame({
     'Age':[25, 54, 33],
     'Income' :[10203, 23822, 84823],
     'Contract Length':[18, 12, 36],
     #'Score':[]
          })

DF2 = pd.DataFrame({
     'variable':['Age', 'Age', 'Age', 'Age',
                 'Income', 'Income', 'Income', 'Income',
                 'Contract Length', 'Contract Length', 'Contract Length', 'Contract Length'],
     'LQ':[ 25, 32.25, 39.5, 46.75, 10203, 28858, 47513, 66168, 12, 18, 24, 30],
     'UQ':[ 32.25, 39.5, 46.75, 54, 28858, 47513, 66168, 84823, 18, 24, 30, 36],
     'Score':[5, 10, 15, 20, 10, 15, 20, 25, 15, 20, 25, 30]
          })

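Take customer UID 1 in DF1 as an example: the customer is 25 years old, has an income of 10,203, and a contract length of 18. Based on DF2, I want to put 30 points in DF1['Score'] for customer 1, computed as 5 (age between 25 and 32.25) + 10 (income between 10,203 and 28,858) + 15 (contract length between 12 and 18).

Please help me write a Python function that adds the correct score to DF1['Score'] for every customer.

Best answer

You can use pandas.DataFrame.apply to iterate over the rows of the first DataFrame and, for each row, pick out the rows of the second DataFrame whose conditions match.

Create the data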
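
As a minimal illustration of the row-wise pattern, here is a sketch on toy data that is not from the question (the frames `orders` and `prices` and the function `order_total` are made-up names). With `axis=1`, `apply` calls the function once per row and passes that row as a Series, so the function can use the row's values to filter a second DataFrame.

```python
import pandas as pd

# Toy data, not from the question: one row per order, one price per product.
orders = pd.DataFrame({'product': ['pen', 'ink', 'pen'], 'qty': [2, 1, 5]})
prices = pd.DataFrame({'product': ['pen', 'ink'], 'price': [3, 7]})

def order_total(row):
    # With axis=1, `row` is a Series holding one row of `orders`.
    # Filter the second frame with the row's value and take the first match.
    price = prices.loc[prices['product'] == row['product'], 'price'].iloc[0]
    return row['qty'] * price

orders['total'] = orders.apply(order_total, axis=1)
print(orders['total'].tolist())  # [6, 7, 15]
```

The answer's code applies the same idea, with range conditions in place of the equality test.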

import pandas as pd

dict1 = {'customer UID': {0: 1, 1: 2, 2: 3}, 'Age': {0: 25, 1: 54, 2: 33}, 'Income': {0: 10203, 1: 23822, 2: 84823}, 'Contract Length': {0: 18, 1: 12, 2: 36}, 'Score': {0: '', 1: '', 2: ''}}

dict2 = {'variable': {0: 'Age', 1: 'Age', 2: 'Age', 3: 'Age', 4: 'Income', 5: 'Income', 6: 'Income', 7: 'Income', 8: 'Contract Length', 9: 'Contract Length', 10: 'Contract Length', 11: 'Contract Length'}, 'LQ': {0: 25.0, 1: 32.25, 2: 39.5, 3: 46.75, 4: 10203.0, 5: 28858.0, 6: 47513.0, 7: 66168.0, 8: 12.0, 9: 18.0, 10: 24.0, 11: 30.0}, 'UQ': {0: 32.25, 1: 39.5, 2: 46.75, 3: 54.0, 4: 28858.0, 5: 47513.0, 6: 66168.0, 7: 84823.0, 8: 18.0, 9: 24.0, 10: 30.0, 11: 36.0}, 'Score': {0: 5, 1: 10, 2: 15, 3: 20, 4: 10, 5: 15, 6: 20, 7: 25, 8: 15, 9: 20, 10: 25, 11: 30}}

df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

Generate the output

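def get_values(row):
    # One boolean mask over df2 per variable: the band(s) whose [LQ, UQ] range contains this row's value.
    age_condition = (row.Age >= df2['LQ']) & (row.Age <= df2['UQ']) & (df2.variable == 'Age')
    income_condition = (row.Income >= df2['LQ']) & (row.Income <= df2['UQ']) & (df2.variable == 'Income')
    contract_condition = (row['Contract Length'] >= df2['LQ']) & (row['Contract Length'] <= df2['UQ']) & (df2.variable == 'Contract Length')
    # Adjacent bands share an endpoint, so a boundary value can match two bands; values[0] keeps the first (lower) one.
    return df2[age_condition].Score.values[0] + df2[income_condition].Score.values[0] + df2[contract_condition].Score.values[0]

# axis=1 passes each row of df1 to get_values as a Series.
df1['Score'] = df1.apply(get_values, axis=1)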

This gives us:

df1
   customer UID  Age  Income  Contract Length  Score
0             1   25   10203               18     30
1             2   54   23822               12     45
2             3   33   84823               36     65
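
The function above hardcodes the three column names, and `values[0]` raises an IndexError when a value falls outside every band. The following is a minimal sketch of a more general variant, not part of the original answer: `score_row` and its `bands` parameter are hypothetical names, and treating an unmatched value as contributing 0 is an assumption you may want to change. It uses the data from the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Age': [25, 54, 33],
    'Income': [10203, 23822, 84823],
    'Contract Length': [18, 12, 36],
})
df2 = pd.DataFrame({
    'variable': ['Age'] * 4 + ['Income'] * 4 + ['Contract Length'] * 4,
    'LQ': [25, 32.25, 39.5, 46.75, 10203, 28858, 47513, 66168, 12, 18, 24, 30],
    'UQ': [32.25, 39.5, 46.75, 54, 28858, 47513, 66168, 84823, 18, 24, 30, 36],
    'Score': [5, 10, 15, 20, 10, 15, 20, 25, 15, 20, 25, 30],
})

def score_row(row, bands):
    """Sum the first matching band score for every variable listed in `bands`."""
    total = 0
    # sort=False keeps the variables and their bands in df2's original order.
    for variable, group in bands.groupby('variable', sort=False):
        value = row[variable]
        match = group[(group['LQ'] <= value) & (value <= group['UQ'])]
        if match.empty:
            # Assumption: a value outside every band contributes nothing.
            continue
        # A boundary value can match two bands; keep the first, as values[0] does.
        total += match['Score'].iloc[0]
    return total

# Extra keyword arguments to apply are forwarded to the function.
df1['Score'] = df1.apply(score_row, axis=1, bands=df2)
print(df1['Score'].tolist())  # [30, 45, 65]
```

Because the loop is driven by the `variable` column of `df2`, adding a new scored variable only requires new rows in `df2` and a matching column in `df1`.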

Regarding "python - Cross-referencing DataFrames to extract specific values in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/73019689/
