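I have a data frame with 3 columns, and I want to create a fourth column based on the values in the other columns. To build the new_rank column, every user starts at 1, and whenever a row has matric_1 greater than 15 and matric_2 greater than 20, the rank of the rows that follow it increases by 1.
I think I need the cumsum function in R, but I am struggling with the ifelse condition. The code for the data frame is below (new_rank holds the expected result):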
df <- data.frame(
  user_id  = c("a","a","a","a","b","b","b","c","c","c","c","c","d","d","d","d"),
  matric_1 = c(10,23,4,5,17,5,40,1,2,18,19,5,18,2,19,2),
  matric_2 = c(10,25,10,13,21,10,7,3,4,22,21,4,23,4,21,4),
  new_rank = c(1,1,2,2,1,2,2,1,1,1,2,3,1,2,2,3)
)
user_id  matric_1  matric_2  new_rank
a        10%       10%       1
a        23%       25%       1
a        4%        10%       2
a        5%        13%       2
b        17%       21%       1
b        5%        10%       2
b        40%       7%        2
c        1%        3%        1
c        2%        4%        1
c        18%       22%       1
c        19%       21%       2
c        5%        6%        3
d        18%       23%       1
d        2%        4%        2
d        19%       21%       2
d        2%        4%        3
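Best answer
After grouping by "user_id", create "new_rank1" by taking the lag of the cumsum of the logical vector (matric_1 > 15 & matric_2 > 20). The lag shifts the running count down one row, so a matching row raises the rank of the rows after it rather than its own.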
library(dplyr)
df %>%
group_by(user_id) %>%
mutate(new_rank1 = lag(cumsum(matric_1 > 15 & matric_2 > 20) + 1, default = 1))
# A tibble: 16 x 5
# Groups: user_id [4]
# user_id matric_1 matric_2 new_rank new_rank1
# <fctr> <dbl> <dbl> <dbl> <dbl>
# 1 a 10.0 10.0 1.00 1.00
# 2 a 23.0 25.0 1.00 1.00
# 3 a 4.00 10.0 2.00 2.00
# 4 a 5.00 13.0 2.00 2.00
# 5 b 17.0 21.0 1.00 1.00
# 6 b 5.00 10.0 2.00 2.00
# 7 b 40.0 7.00 2.00 2.00
# 8 c 1.00 3.00 1.00 1.00
# 9 c 2.00 4.00 1.00 1.00
#10 c 18.0 22.0 1.00 1.00
#11 c 19.0 21.0 2.00 2.00
#12 c 5.00 4.00 3.00 3.00
#13 d 18.0 23.0 1.00 1.00
#14 d 2.00 4.00 2.00 2.00
#15 d 19.0 21.0 2.00 2.00
#16 d 2.00 4.00 3.00 3.00
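For readers who prefer not to load dplyr, the same lag-of-cumsum idea can be sketched in base R. This is only an illustrative variant, not part of the original answer: the helper name shifted_rank and the output column new_rank_base are made up for this example, and it assumes the rows are already in the intended order within each user.

```r
# Example data from the question (expected ranks kept for comparison).
df <- data.frame(
  user_id  = c("a","a","a","a","b","b","b","c","c","c","c","c","d","d","d","d"),
  matric_1 = c(10,23,4,5,17,5,40,1,2,18,19,5,18,2,19,2),
  matric_2 = c(10,25,10,13,21,10,7,3,4,22,21,4,23,4,21,4),
  new_rank = c(1,1,2,2,1,2,2,1,1,1,2,3,1,2,2,3)
)

# TRUE on rows where both thresholds are exceeded.
hit <- df$matric_1 > 15 & df$matric_2 > 20

# Hypothetical helper: running count of hits, shifted down one row so that
# a hit raises the rank of the rows after it, not of the hit row itself.
shifted_rank <- function(x) 1 + c(0, head(cumsum(x), -1))

# ave() applies the helper separately within each user_id group.
df$new_rank_base <- ave(as.numeric(hit), df$user_id, FUN = shifted_rank)

# Check against the expected column supplied in the question.
stopifnot(all(df$new_rank_base == df$new_rank))
```

Dropping the last cumulative value and prepending 0 plays the role of lag(..., default = 1) in the dplyr version, so each group starts at rank 1.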
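This question, "r - How to create a rank-based column from conditions on multiple columns in an R data frame", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/48055962/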