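I have implemented a simple group-by operation with the ?stats::aggregate
function. It collects the elements of each group into a vector. I would like to make it faster with the data.table package, but I cannot reproduce the desired behavior with data.table.
Example dataset: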
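# stringsAsFactors = TRUE matches the factor columns in the output shown further down
# (it was the default before R 4.0)
df <- data.frame(group = c("a","a","a","b","b","b","b","c","c"), val = c("A","B","C","A","B","C","D","A","B"), stringsAsFactors = TRUE)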
Output I want to reproduce with data.table:
by_group_aggregate <- aggregate(x = df$val, by = list(df$group), FUN = c)
What I have tried:
library(data.table)
data_t <- data.table(df)
# working, but not what I want
by_group_datatable <- data_t[,j = paste(val,collapse=","), by = group]
# no grouping done when using c or as.vector
by_group_datatable <- data_t[,j = c(val), by = group]
by_group_datatable <- data_t[,j = as.vector(val), by = group]
# grouping leads to error when using as.list
by_group_datatable <- data_t[,j = as.list(val), by = group]
Is it possible to have vectors of different sizes in a data.table column? If so, how can I achieve it?
Best answer
Here is one way:
data_t[, list(list(val)), by = group]
# group V1
#1: a A,B,C
#2: b A,B,C,D
#3: c A,B
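The outer list() is there because j has to return a list of result columns when you aggregate. The inner list() is there because you want each group's val values collected into a single list element, which makes V1 a list column whose entries can have different lengths.
Check the structure: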
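As a minimal sketch of the same idea, the result column can be given a name and its per-group vectors pulled back out. The names dt, res and vals are illustrative and not part of the original answer, and the data is built with character columns (an assumption that keeps the printed values simple):

```r
library(data.table)

# Same values as the question's df, built directly as a data.table.
# Assumption: character columns instead of factors.
dt <- data.table(
  group = c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
  val   = c("A", "B", "C", "A", "B", "C", "D", "A", "B")
)

# Outer .() is data.table's alias for list() and defines the output columns;
# the inner list() wraps each group's values into one list element.
res <- dt[, .(vals = list(val)), by = group]

lengths(res$vals)             # 3 4 2: number of elements stored per group
res$vals[[2]]                 # "A" "B" "C" "D": the vector stored for group "b"
res[group == "c", vals[[1]]]  # "A" "B": the same lookup, selected by group
```

Naming the column in j avoids the default V1 and makes later lookups easier to read.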
str(data_t[, list(list(val)), by = group])
#Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
# $ group: Factor w/ 3 levels "a","b","c": 1 2 3
# $ V1 :List of 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2
# - attr(*, ".internal.selfref")=<externalptr>
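The "List of 3" entry above shows that each row of V1 holds a whole vector. If the long format is needed again later, one possible way back is to unlist per group. This is a sketch under the assumption of character columns, and the names dt, nested and flat are illustrative:

```r
library(data.table)

# Same values as the question's df, with character columns (assumption).
dt <- data.table(
  group = c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
  val   = c("A", "B", "C", "A", "B", "C", "D", "A", "B")
)

# Collect each group's values into one list element (a list column).
nested <- dt[, .(val = list(val)), by = group]

# Expand the list column again: one row per stored element.
flat <- nested[, .(val = unlist(val)), by = group]

nrow(nested)                   # 3: one row per group
nrow(flat)                     # 9: back to one row per value
identical(flat$val, dt$val)    # TRUE: the round trip keeps values and order
```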
With dplyr, you can do the following:
library(dplyr)
df %>% group_by(group) %>% summarise(val = list(val))
#Source: local data frame [3 x 2]
#
# group val
# (fctr) (chr)
#1 a <S3:factor>
#2 b <S3:factor>
#3 c <S3:factor>
Check the structure:
df %>% group_by(group) %>% summarise(val = list(val)) %>% str
#Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 2 variables:
# $ group: Factor w/ 3 levels "a","b","c": 1 2 3
# $ val :List of 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# ..$ : Factor w/ 4 levels "A","B","C","D": 1 2
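For comparison, here is a small sketch of reading the stored vectors back from the dplyr result. The names df_chr and res are illustrative, and the data uses character columns (an assumption) so the extracted values print as plain strings:

```r
library(dplyr)

# Same values as the question's df, with character columns (assumption).
df_chr <- data.frame(
  group = c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
  val   = c("A", "B", "C", "A", "B", "C", "D", "A", "B"),
  stringsAsFactors = FALSE
)

# One row per group; val becomes a list column of per-group vectors.
res <- df_chr %>%
  group_by(group) %>%
  summarise(val = list(val))

lengths(res$val)   # 3 4 2: number of elements stored per group
res$val[[3]]       # "A" "B": the vector stored for group "c"
```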
Source: "r - How to put vectors of different sizes in a data.table column", a question on Stack Overflow: https://stackoverflow.com/questions/35600416/