R data.table: grouping and iterating by two columns

I'm new to R and am trying to solve the following problem:

I have a table with two columns, books and readers, which hold book IDs and reader IDs respectively:

> library(data.table)
> books = c(1,2,3,1,1,2)
> readers = c(30, 10, 20, 20, 10, 30)
> bt = data.table(books, readers)
> bt
   books readers
1:     1      30
2:     2      10
3:     3      20
4:     1      20
5:     1      10
6:     2      30

For each pair of books, I need to count the readers who have read both books, using the following algorithm:

for each book
  for each reader of the book
    for each other_book in books of the reader
      increment common_reader_count ((book, other_book), cnt)
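
As a rough sketch, and not part of the original question, the pseudocode above could be written in base R roughly as follows. The matrix name `common` is made up for illustration, and the sketch assumes that each (book, reader) pair appears at most once and that a book is not counted against itself.

```r
books   <- c(1, 2, 3, 1, 1, 2)
readers <- c(30, 10, 20, 20, 10, 30)

# Square count matrix with book IDs as both row and column names
ids <- as.character(sort(unique(books)))
common <- matrix(0L, nrow = length(ids), ncol = length(ids),
                 dimnames = list(ids, ids))

for (book in unique(books)) {                      # for each book
  for (reader in readers[books == book]) {         # for each reader of the book
    for (other_book in books[readers == reader]) { # for each book of that reader
      if (other_book != book) {                    # skip pairing a book with itself
        i <- as.character(book)
        j <- as.character(other_book)
        common[i, j] <- common[i, j] + 1L
      }
    }
  }
}
common
```

Because every pair is visited from both sides, the resulting matrix is symmetric. The triple loop is fine for small data but will probably be slow on large tables.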

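To implement this algorithm, I need to split the data into two lists: 1) a book list containing the readers of each book, and 2) a reader list containing the books each reader has read, for example: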

> bookList = list( 
+ list(1, list(30, 20, 10)),
+         list(2, list(10, 30)),
+         list(3, list(20))
+       )
> 
> readerList = list (
+ list(30, list(1,2)),
+ list(20, list(3,1)),
+ list(10, list(2,1))
+ )
>
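
One possible way to get lists of this kind is base R's `split()`, which groups one vector by the values of another. This is only a sketch: it returns named lists of plain vectors (names are the IDs as strings) rather than the nested `list(id, list(...))` shape shown above, which is assumed to be acceptable here.

```r
books   <- c(1, 2, 3, 1, 1, 2)
readers <- c(30, 10, 20, 20, 10, 30)

# One element per book, holding the IDs of that book's readers
bookList <- split(readers, books)

# One element per reader, holding the IDs of the books that reader has read
readerList <- split(books, readers)

bookList[["1"]]     # readers of book 1
readerList[["30"]]  # books read by reader 30
```

With a data.table, the same call would take the columns, for example `split(bt$readers, bt$books)`.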

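Questions:

1) Which function should I use to build these lists from the books table?

2) How do I generate, from bookList and readerList, the pairs of books together with the number of readers who have read both? For the bt table above, the result should be: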

((1, 2), 2)
((1,3), 1)
((2,3), 0)

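The order of the books within a pair does not matter, so, for example, (1,2) and (2,1) should be reduced to just one of them.

Please suggest functions and data structures for solving this problem. Thanks!

Update:

Ideally, I need a matrix with book IDs as both rows and columns. Each cell holds the count of readers who have read both books. So for the example above the matrix should be: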

books | 1 | 2 | 3 |
   1  | 1 | 2 | 1 |
   2  | 2 | 1 | 0 |
   3  | 1 | 0 | 1 |

   Which means:

   book 1 and 2 are read together by 2 readers 
   book 1 and 3 are read together by 1 reader
   book 2 and 3 are read together by 0 readers

How can I build such a matrix?

Best answer

Here is another option:

combs <- combn(unique(bt$books), 2)  # all unordered pairs of books, one pair per column
setkey(bt, books)
both.read <- bt[                     # Cartesian join of every pair to our data
  data.table(books = c(combs), combo.id = c(col(combs))), allow.cartesian = TRUE
][,
  .(                                 # a reader who shows up twice within a pair has read both books
    read.both = sum(duplicated(readers)),
    book1 = min(books), book2 = max(books)
  ),
  by = combo.id
]
dcast(                               # reshape to the desired wide format
  both.read, book1 ~ book2, value.var = "read.both", fun.aggregate = sum
)

This produces:

   book1 2 3
1:     1 2 1
2:     2 0 0

Note that this only covers non-equivalent combinations (i.e., we show books 1-2 but not 2-1, since they are the same pair).
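
If the full symmetric matrix from the question's update is wanted, one alternative (swapped in here, not the method used in this answer) is to cross-tabulate readers against books and take the cross product in base R. This is a sketch under the assumption that each (book, reader) pair appears at most once; the names `incidence` and `co_read` are made up for illustration.

```r
books   <- c(1, 2, 3, 1, 1, 2)
readers <- c(30, 10, 20, 20, 10, 30)

# Reader-by-book incidence table: 1 where a reader has read a book, 0 otherwise
incidence <- table(readers, books)

# t(incidence) %*% incidence: cell [i, j] counts the readers shared by books i and j
co_read <- crossprod(incidence)
co_read
```

Under that assumption, the off-diagonal cells give the number of common readers for each pair, while the diagonal holds each book's own reader count, so it differs from the diagonal of 1s shown in the question.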

Source: a similar question on Stack Overflow about grouping and iterating by two columns with R data.table: https://stackoverflow.com/questions/30395073/
