Suppose I have this data frame df in R:
UserID <- c(1, 1, 1, 5, 5, 7, 7, 9, 9, 9)
PathID <- c(1,2,3,1,2,1,2,1,2,3)
Page <- c("home", "about", "services", "home", "pricing", "pricing", "home", "about", "home", "services")
df <- data.frame(UserID, PathID, Page)
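I want to add a column named "Set" that indexes the combination of pages in each user's sequence, so that users who visited the same pages, in any order, get the same value.
So my output should look like this: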
UserID <- c(1, 1, 1, 5, 5, 7, 7, 9, 9, 9)
PathID <- c(1,2,3,1,2,1,2,1,2,3)
Page <- c("home", "about", "services", "home", "pricing", "pricing", "home", "about", "home", "services")
Set <- c(1,1,1,2,2,2,2,1,1,1)
df1 <- data.frame(UserID, PathID, Page, Set)
I would really appreciate some help with this.
Best answer
A data.table option using as.factor:
> setDT(df)[, Set := toString(sort(Page)), UserID][, Set := as.integer(as.factor(Set))][]
    UserID PathID     Page Set
 1:      1      1     home   1
 2:      1      2    about   1
 3:      1      3 services   1
 4:      5      1     home   2
 5:      5      2  pricing   2
 6:      7      1  pricing   2
 7:      7      2     home   2
 8:      9      1    about   1
 9:      9      2     home   1
10:      9      3 services   1
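To see why this works, the one-liner can be split into its two steps. The following is only a sketch: it assumes the data.table package is installed, and the intermediate column name Key is my own, not part of the answer.

```r
library(data.table)

df <- data.frame(
  UserID = c(1, 1, 1, 5, 5, 7, 7, 9, 9, 9),
  PathID = c(1, 2, 3, 1, 2, 1, 2, 1, 2, 3),
  Page   = c("home", "about", "services", "home", "pricing",
             "pricing", "home", "about", "home", "services"),
  stringsAsFactors = FALSE
)
dt <- as.data.table(df)

# Step 1: per user, sort the pages and collapse them into one string,
# so the same pages visited in a different order give the same key.
dt[, Key := toString(sort(Page)), by = UserID]

# Step 2: convert the keys to integer codes. Factor levels are sorted,
# so the codes follow the sorted order of the keys, not first appearance.
dt[, Set := as.integer(as.factor(Key))]

# One row per user, showing which key produced which Set value.
unique(dt[, .(UserID, Key, Set)])
```

Users 5 and 7 end up with the same key ("home, pricing") because the pages are sorted before being collapsed, which is what makes the index order-insensitive.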
A similar base R implementation is:
> transform(df, Set = as.integer(as.factor(ave(Page,UserID,FUN = function(x) toString(sort(x))))))
   UserID PathID     Page Set
1       1      1     home   1
2       1      2    about   1
3       1      3 services   1
4       5      1     home   2
5       5      2  pricing   2
6       7      1  pricing   2
7       7      2     home   2
8       9      1    about   1
9       9      2     home   1
10      9      3 services   1
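Both versions number the sets by the sorted order of the collapsed page strings, which happens to match the desired output here. If the sets should instead be numbered in order of first appearance, one possible base R sketch (the variable name key is my own, not from the answer) uses match() against unique():

```r
df <- data.frame(
  UserID = c(1, 1, 1, 5, 5, 7, 7, 9, 9, 9),
  PathID = c(1, 2, 3, 1, 2, 1, 2, 1, 2, 3),
  Page   = c("home", "about", "services", "home", "pricing",
             "pricing", "home", "about", "home", "services"),
  stringsAsFactors = FALSE
)

# One key per row: that user's pages, sorted and collapsed into a string.
key <- ave(df$Page, df$UserID, FUN = function(x) toString(sort(x)))

# Position of each key among the distinct keys, in order of first appearance.
df$Set <- match(key, unique(key))
df
```

For this data the result is the same as with as.factor, since the first user's key also sorts first; the two approaches differ only when that is not the case.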
Regarding "r - Indexing combinations of sequences in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/73499304/