r - How to put vectors of different sizes in a data.table column

Tags: r data.table

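I have implemented a simple group-by operation with the stats::aggregate function. It collects the elements of each group in a vector. I would like to make it faster with the data.table package, but I cannot reproduce the desired behavior with data.table.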

Example dataset:

df <- data.frame(group = c("a","a","a","b","b","b","b","c","c"), val = c("A","B","C","A","B","C","D","A","B"))

Output to reproduce with data.table:

by_group_aggregate <- aggregate(x = df$val, by = list(df$group), FUN = c)

What I have tried:

library(data.table)

data_t <- data.table(df)
# working, but not what I want
by_group_datatable <- data_t[,j = paste(val,collapse=","), by = group]
# no grouping done when using c or as.vector
by_group_datatable <- data_t[,j = c(val), by = group]
by_group_datatable <- data_t[,j = as.vector(val), by = group]
# grouping leads to error when using as.list
by_group_datatable <- data_t[,j = as.list(val), by = group]

Is it possible to have vectors of different sizes in a data.table column? If so, how can I achieve it?

Best answer

Here is one way:

data_t[, list(list(val)), by = group]
#   group      V1
#1:     a   A,B,C
#2:     b A,B,C,D
#3:     c     A,B

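The outer list() builds the aggregated result returned by j, where each element becomes a column. The inner list() wraps the val values of each group into a single list element, so each group contributes one cell that holds its whole vector. Without the inner list(), as with c(val) or as.vector(val), data.table expands the vector into one row per element, which is why those attempts appeared not to group.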
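As a small sketch of working with the resulting list column, assuming the df from the question: the column name vals and the variable res are illustrative names that do not appear in the original answer.

```r
library(data.table)

df <- data.frame(group = c("a","a","a","b","b","b","b","c","c"),
                 val   = c("A","B","C","A","B","C","D","A","B"))
data_t <- data.table(df)

# Name the list column explicitly instead of relying on the default V1.
# `vals` and `res` are illustrative names, not part of the original answer.
res <- data_t[, list(vals = list(val)), by = group]

# Each cell of `vals` holds one vector; lengths() reports the size of each.
lengths(res$vals)

# Selecting the column for one group returns a list of length one,
# so [[1]] extracts the vector stored for group "b".
res[group == "b", vals][[1]]
```

Naming the column in the outer list() is optional, but it avoids having to refer to the auto-generated V1 later on.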

Check the structure:

str(data_t[, list(list(val)), by = group])
#Classes ‘data.table’ and 'data.frame': 3 obs. of  2 variables:
# $ group: Factor w/ 3 levels "a","b","c": 1 2 3
# $ V1   :List of 3
#  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3
#  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
#  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2
# - attr(*, ".internal.selfref")=<externalptr> 
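The Factor entries in the output above arise because data.frame() converted strings to factors by default in R versions before 4.0.0. If plain character vectors are preferred inside the list column, one option is the sketch below, which passes stringsAsFactors = FALSE explicitly (this has been the default since R 4.0.0). The names df_chr and res_chr are illustrative and not part of the original answer.

```r
library(data.table)

# Build the example data with character columns instead of factors.
# `df_chr` and `res_chr` are illustrative names.
df_chr <- data.frame(group = c("a","a","a","b","b","b","b","c","c"),
                     val   = c("A","B","C","A","B","C","D","A","B"),
                     stringsAsFactors = FALSE)

# Same grouping as in the answer; each cell now holds a character vector.
res_chr <- data.table(df_chr)[, list(val = list(val)), by = group]

# Inspect the list column only.
str(res_chr$val)
```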

With dplyr, you can do the following:

library(dplyr)
df %>% group_by(group) %>% summarise(val = list(val))
#Source: local data frame [3 x 2]
#
#   group         val
#  (fctr)       (chr)
#1      a <S3:factor>
#2      b <S3:factor>
#3      c <S3:factor>

Check the structure:

df %>% group_by(group) %>% summarise(val = list(val)) %>% str
#Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  3 obs. of  2 variables:
# $ group: Factor w/ 3 levels "a","b","c": 1 2 3
# $ val  :List of 3
#  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3
#  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
#  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2

Regarding "r - How to put vectors of different sizes in a data.table column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35600416/
