pandas - Assigning scores based on order of occurrence in pandas

Below are the dataframes I have:

score_df

col1_id col2_id score
1 2 10
5 6 20

records_df

date col_id 
D1    6
D2    4
D3    1
D4    2
D5    5
D6    7

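I want to compute the score according to the following criteria:

When 2 appears after 1, the score should be 10; likewise, when 1 appears after 2, the score should be 10.

That is, if (1,2) gives a score of 10, then (2,1) gets the same score of 10.

Consider (1,2). When 1 appears first, we do not assign a score. We mark the row and wait for 2 to occur. When 2 appears in the column, we give a score of 10.

Consider (2,1). When 2 appears first, we assign 0 and wait for 1 to occur. When 1 appears, we give a score of 10.

So, on the first occurrence, do not assign a score; wait for the corresponding id to occur and assign the score then.

So my resulting dataframe should look like this:

Result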

date col_id score
D1    6     0 -- Even though 6 is in the score list, it occurred for the first time, so 0
D2    4     0 -- 4 is not in the list at all
D3    1     0 -- 1 occurred for the first time, so 0
D4    2     10 -- 1 occurred previously and 2 occurs now, so we can assign 10
D5    5     20 -- 6 occurred previously, so we can assign 20
D6    7     0 -- 7 is not in the list

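I have about 100k rows in both score_df and records_df. Looping over them and assigning scores takes time. Can someone help with the logic without looping over the whole dataframe?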
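For reference, the rule described above can be written as a plain loop. This is only a sketch of the slow baseline, not code from the original post: the function name `score_with_loop` is hypothetical, and it assumes each id belongs to at most one pair in score_df and that an id is scored whenever its partner has already appeared on an earlier row.

```python
import pandas as pd

score_df = pd.DataFrame({"col1_id": [1, 5], "col2_id": [2, 6], "score": [10, 20]})
records_df = pd.DataFrame({
    "date": ["D1", "D2", "D3", "D4", "D5", "D6"],
    "col_id": [6, 4, 1, 2, 5, 7],
})


def score_with_loop(score_df, records_df):
    """Hypothetical baseline: score a row only if its partner id was seen earlier."""
    # Map each id to (partner id, pair score); assumes one pair per id.
    partner = {}
    for a, b, s in score_df[["col1_id", "col2_id", "score"]].itertuples(index=False):
        partner[a] = (b, s)
        partner[b] = (a, s)

    seen = set()
    scores = []
    for cid in records_df["col_id"]:
        other = partner.get(cid)
        # Assign the pair score only when the partner already occurred.
        scores.append(other[1] if other is not None and other[0] in seen else 0)
        seen.add(cid)
    return records_df.assign(score=scores)


loop_result = score_with_loop(score_df, records_df)
print(loop_result)
```

On the sample data this gives scores 0, 0, 0, 10, 20, 0, matching the expected result, and it is handy as a correctness check for any vectorized version.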

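Best answer

As I understand it, you can try melt to unpivot score_df and then merge it with records_df. Keeping the original index of the melted df, we check where that index is duplicated, meaning the other id of the pair has already appeared, and return the score from the merge there, otherwise 0. Note that the code assumes both dataframes also carry a uid column, as the printed output shows.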

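# Unpivot col1_id/col2_id into one col_id column, keeping the score_df row index
m = score_df.reset_index().melt(['index', 'uid', 'score'],
                                var_name='col_name', value_name='col_id')

# Left merge keeps every record; ids that are not in score_df get NaN
final = records_df.merge(m.drop(columns='col_name'), on=['uid', 'col_id'], how='left')

# A pair is complete when its score_df index shows up again on a later row
c = final.duplicated(['index']) & final['index'].notna()
final = final.drop(columns='index').assign(score=lambda x: x['score'].where(c, 0))

print(final)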

   uid date  col_id  score
0  123   D1       6    0.0
1  123   D2       4    0.0
2  123   D3       1    0.0
3  123   D4       2   10.0
4  123   D5       5   20.0
5  123   D6       7    0.0
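The same idea can be tried on the exact sample frames from the question, which have no uid column. The following is a minimal, self-contained sketch under that assumption; the names `pairs`, `merged`, `completed` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

score_df = pd.DataFrame({"col1_id": [1, 5], "col2_id": [2, 6], "score": [10, 20]})
records_df = pd.DataFrame({
    "date": ["D1", "D2", "D3", "D4", "D5", "D6"],
    "col_id": [6, 4, 1, 2, 5, 7],
})

# Long format: one row per (pair index, id), with the pair's score attached.
# melt names the variable column "variable" by default; it is not needed here.
pairs = (
    score_df.reset_index()
    .melt(id_vars=["index", "score"], value_name="col_id")
    .drop(columns="variable")
)

# Left merge preserves the order of records_df; unmatched ids get NaN.
merged = records_df.merge(pairs, on="col_id", how="left")

# True where a matched pair index has already appeared on an earlier row.
completed = merged["index"].notna() & merged.duplicated("index")

result = merged.drop(columns="index").assign(
    score=merged["score"].where(completed, 0).astype(int)
)
print(result)
```

Two caveats apply to this sketch. Because duplicated flags every repeat of a pair index, a row is also scored when the same id simply occurs twice (for example 1 followed by 1) or when a pair is hit a third time, so extra handling is likely needed if that can happen in the data. It also assumes each id appears in only one row of score_df; otherwise the merge produces more than one row per record.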

For this question (pandas - Assigning scores based on order of occurrence in pandas), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60706956/
