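Please help me create a Python function. I have two DataFrames, DF1 and DF2. I want to add a column, DF1['Score'], to DF1, computed by matching the values in DF1 against the ranges in DF2.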
import pandas as pd
DF1 = pd.DataFrame({
    'Age': [25, 54, 33],
    'Income': [10203, 23822, 84823],
    'Contract Length': [18, 12, 36],
    #'Score': []
})
DF2 = pd.DataFrame({
    'variable': ['Age', 'Age', 'Age', 'Age',
                 'Income', 'Income', 'Income', 'Income',
                 'Contract Length', 'Contract Length', 'Contract Length', 'Contract Length'],
    'LQ': [25, 32.25, 39.5, 46.75, 10203, 28858, 47513, 66168, 12, 18, 24, 30],
    'UQ': [32.25, 39.5, 46.75, 54, 28858, 47513, 66168, 84823, 18, 24, 30, 36],
    'Score': [5, 10, 15, 20, 10, 15, 20, 25, 15, 20, 25, 30]
})
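Take customer UID 1 in DF1 as an example: they are 25 years old, have an income of 10,203, and a contract length of 18. Based on DF2, I want DF1['Score'] for customer 1 to be 30 points, calculated as 5 (age between 25 and 32.25) + 10 (income between 10,203 and 28,858) + 15 (contract length between 12 and 18).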
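The arithmetic for customer 1 can be checked with a small lookup. This is a minimal sketch, not part of the original post: `band_score` is a hypothetical helper, and it assumes that a value sitting on a shared boundary (such as a contract length of 18, which falls in both 12 to 18 and 18 to 24) takes the first, lower band, which is what the expected total of 30 implies.

```python
import pandas as pd

DF2 = pd.DataFrame({
    'variable': ['Age'] * 4 + ['Income'] * 4 + ['Contract Length'] * 4,
    'LQ': [25, 32.25, 39.5, 46.75, 10203, 28858, 47513, 66168, 12, 18, 24, 30],
    'UQ': [32.25, 39.5, 46.75, 54, 28858, 47513, 66168, 84823, 18, 24, 30, 36],
    'Score': [5, 10, 15, 20, 10, 15, 20, 25, 15, 20, 25, 30],
})

def band_score(bands, variable, value):
    """Hypothetical helper: score of the first band of `variable` whose
    inclusive [LQ, UQ] range contains `value`."""
    rows = bands[(bands['variable'] == variable)
                 & (bands['LQ'] <= value)
                 & (value <= bands['UQ'])]
    # Assumption: on a shared boundary the first (lower) band wins.
    return int(rows['Score'].iloc[0])

customer_1 = {'Age': 25, 'Income': 10203, 'Contract Length': 18}
parts = {name: band_score(DF2, name, value) for name, value in customer_1.items()}
total = sum(parts.values())
print(parts, total)  # {'Age': 5, 'Income': 10, 'Contract Length': 15} 30
```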
Please help me create a Python function that adds the correct score to DF1['Score'] for every customer.
Best answer
You can use pandas.DataFrame.apply to iterate over the rows of the first DataFrame and, for each row, pick the rows of the second DataFrame whose range matches.
Create the data
dict1 = {'customer UID': {0: 1, 1: 2, 2: 3}, 'Age': {0: 25, 1: 54, 2: 33}, 'Income': {0: 10203, 1: 23822, 2: 84823}, 'Contract Length': {0: 18, 1: 12, 2: 36}, 'Score': {0: '', 1: '', 2: ''}}
dict2 = {'variable': {0: 'Age', 1: 'Age', 2: 'Age', 3: 'Age', 4: 'Income', 5: 'Income', 6: 'Income', 7: 'Income', 8: 'Contract Length', 9: 'Contract Length', 10: 'Contract Length', 11: 'Contract Length'}, 'LQ': {0: 25.0, 1: 32.25, 2: 39.5, 3: 46.75, 4: 10203.0, 5: 28858.0, 6: 47513.0, 7: 66168.0, 8: 12.0, 9: 18.0, 10: 24.0, 11: 30.0}, 'UQ': {0: 32.25, 1: 39.5, 2: 46.75, 3: 54.0, 4: 28858.0, 5: 47513.0, 6: 66168.0, 7: 84823.0, 8: 18.0, 9: 24.0, 10: 30.0, 11: 36.0}, 'Score': {0: 5, 1: 10, 2: 15, 3: 20, 4: 10, 5: 15, 6: 20, 7: 25, 8: 15, 9: 20, 10: 25, 11: 30}}
df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)
Generate the output
def get_values(row):
    # Boolean masks over df2: for each variable, the band(s) containing this row's value.
    age_condition = (row.Age >= df2['LQ']) & (row.Age <= df2['UQ']) & (df2.variable == 'Age')
    income_condition = (row.Income >= df2['LQ']) & (row.Income <= df2['UQ']) & (df2.variable == 'Income')
    contract_condition = (row['Contract Length'] >= df2['LQ']) & (row['Contract Length'] <= df2['UQ']) & (df2.variable == 'Contract Length')
    # Both bounds are inclusive, so a boundary value (e.g. 18) matches two bands; values[0] takes the first.
    return (df2[age_condition].Score.values[0]
            + df2[income_condition].Score.values[0]
            + df2[contract_condition].Score.values[0])

# Apply the function to every row to fill the Score column.
df1['Score'] = df1.apply(get_values, axis=1)
This gives us:
df1
   customer UID  Age  Income  Contract Length  Score
0             1   25   10203               18     30
1             2   54   23822               12     45
2             3   33   84823               36     65
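The answer hard-codes the three variable names. As a hedged variation, the same lookup can loop over whatever variables appear in the second DataFrame. This is only a sketch under stated assumptions: `score_customers` is a hypothetical name, a value that falls outside every band contributes 0 (the original would raise an IndexError instead), and a value on a shared boundary takes the lower band, as in the answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'customer UID': [1, 2, 3],
    'Age': [25, 54, 33],
    'Income': [10203, 23822, 84823],
    'Contract Length': [18, 12, 36],
})
df2 = pd.DataFrame({
    'variable': ['Age'] * 4 + ['Income'] * 4 + ['Contract Length'] * 4,
    'LQ': [25, 32.25, 39.5, 46.75, 10203, 28858, 47513, 66168, 12, 18, 24, 30],
    'UQ': [32.25, 39.5, 46.75, 54, 28858, 47513, 66168, 84823, 18, 24, 30, 36],
    'Score': [5, 10, 15, 20, 10, 15, 20, 25, 15, 20, 25, 30],
})

def score_customers(customers, bands):
    """Hypothetical helper: sum, per customer, the score of the band
    that contains each variable's value."""
    total = pd.Series(0, index=customers.index)
    for variable, group in bands.groupby('variable', sort=False):
        group = group.sort_values('LQ')  # lowest band first

        def lookup(value, group=group):
            hits = group[(group['LQ'] <= value) & (value <= group['UQ'])]
            # Assumption: no matching band means no points for this variable.
            return hits['Score'].iloc[0] if len(hits) else 0

        total = total + customers[variable].map(lookup)
    return total

scored = df1.assign(Score=score_customers(df1, df2))
print(scored['Score'].tolist())  # [30, 45, 65]
```

Because the loop reads the variable names from `df2`, adding a new banded variable only requires new rows in `df2` and a matching column in `df1`.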
Regarding "python - cross-referencing DataFrames to extract specific values in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/73019689/