I have data on customers and the stores they have visited (at least once).
Customer | Store
1          A
1          B
2          A
2          C
3          A
4          A
4          B
4          C
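I want to know how many customers visited each combination of two stores.
How can I transform the data structure above (using R) to obtain the following structure?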
Store 1 | Store 2 | Nb_Customer
A         B         2   (Customers 1 & 4 visited stores A & B)
A         C         2   (Customers 2 & 4 visited stores A & C)
Edit, regarding Henrik's solution: as you can see, I have a problem with the pairs.
> df <- data.frame(Customer=c(1,1,2,2,3,4,4,4), Store=c('A', 'B', 'A', 'C', 'A', 'A', 'B', 'C'))
> # number of visits for each customer in each store
> tt <- with(df, table(df$Customer, df$Store))
> tt
    A B C
  1 1 1 0
  2 1 0 1
  3 1 0 0
  4 1 1 1
>
> # number of stores
> n <- with(df, length(unique(df$Store)))
> n
[1] 3
>
> # all pairs of column numbers, to be selected from the table tt
> cols <- with(df, combn(n, 2))
> cols
     [,1] [,2] [,3]
[1,]    1    1    2
[2,]    2    3    3
>
> # pairs of stores
> pair <- t(with(df, combn(unique(df$Store), 2)))
> pair
     [,1] [,2]
[1,] "A"  "B"
[2,] "1"  "3"
[3,] "2"  "3"
Best answer
Another possibility:
# number of visits for each customer in each store
tt <- with(df, table(Customer, Store))
tt
# number of stores
n <- with(df, length(unique(Store)))
n
# all pairs of column numbers, to be selected from the table tt
cols <- with(df, combn(n, 2))
cols
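# pairs of stores, taken from the column names of tt so that they are
# character labels in the same order as the columns indexed by cols
pair <- t(combn(colnames(tt), 2))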
pair
# for each pair of columns of tt, count the customers (rows) who visited
# both stores, then combine the count with the store names from 'pair'
ll <- lapply(seq_len(ncol(cols)), function(x) {
  tt2 <- tt[ , cols[ , x]]
  # a customer counts only if both stores were visited at least once
  n_cust <- sum(rowSums(tt2 > 0) == 2)
  data.frame(store1 = pair[x, 1], store2 = pair[x, 2], n_cust = n_cust)
})
ll
# convert list to data frame
df2 <- do.call(rbind, ll)
df2
# store1 store2 n_cust
# 1 A B 2
# 2 A C 2
# 3 B C 1
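For comparison, the same pair counts can also be read off a store-by-store co-occurrence matrix. The sketch below is not part of the answer above; it assumes the example data from the question, and the names `visited`, `co`, `idx` and `pairs_df` are only illustrative. It relies on base R's `crossprod()`, which multiplies the transposed customer-by-store indicator matrix by itself, so each off-diagonal entry is the number of customers who visited both stores.

```r
# Example data from the question; Store kept as character on purpose
df <- data.frame(Customer = c(1, 1, 2, 2, 3, 4, 4, 4),
                 Store = c("A", "B", "A", "C", "A", "A", "B", "C"),
                 stringsAsFactors = FALSE)

# Customer-by-store indicator matrix: 1 if the customer visited the store
# at least once, 0 otherwise (repeat visits are collapsed to 1)
visited <- 1 * (table(df$Customer, df$Store) > 0)

# t(visited) %*% visited: entry [i, j] counts customers who visited both
# store i and store j; the diagonal holds per-store customer counts
co <- crossprod(visited)

# Keep each unordered pair once: upper triangle, diagonal excluded
idx <- which(upper.tri(co), arr.ind = TRUE)
pairs_df <- data.frame(store1 = rownames(co)[idx[, "row"]],
                       store2 = colnames(co)[idx[, "col"]],
                       n_cust = co[idx],
                       stringsAsFactors = FALSE)
pairs_df
```

Because the store names come from the dimnames of the matrix, this variant does not depend on whether `Store` is stored as a factor or as character.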
Regarding "r - frequency of occurrence by combination (2 × 2)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21379992/