r - Error when selecting columns after grouping a data frame with group_by in dplyr 0.3.02

After grouping a data.frame, I can no longer select the second column:

library(dplyr)

d <- data.frame(x = 1:10, y = runif(1))
d[, 2] # selects the second column
d <- group_by(d, x)
d[, 2] # produces the error: index out of bounds

Best answer

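I think this is the intended behaviour for grouped_df objects in dplyr: the logic is that you cannot drop a grouping variable while the data is still grouped. Consider this example, where I use dplyr's select function to extract a variable from a grouped_df: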

library(dplyr)
d <- data.frame(x = 1:10, y = runif(1), z = rnorm(2))
d <- group_by(d, x)

select(d, y)  
#Source: local data frame [10 x 2]
#Groups: x
#
#    x         y
#1   1 0.5861766
#2   2 0.5861766
#3   3 0.5861766
#4   4 0.5861766
#5   5 0.5861766
#6   6 0.5861766
#7   7 0.5861766
#8   8 0.5861766
#9   9 0.5861766
#10 10 0.5861766

You can see that the result includes the grouping variable, even though it was not specified in the select call.

select(d, z) # would work the same way

Even if you explicitly exclude the grouping variable "x", select still returns it:

select(d, -x)
#Source: local data frame [10 x 3]
#Groups: x
#
#    x         y         z
#1   1 0.2110696 2.4393919
#2   2 0.2110696 0.8400083
#3   3 0.2110696 2.4393919
#4   4 0.2110696 0.8400083
#5   5 0.2110696 2.4393919
#6   6 0.2110696 0.8400083
#7   7 0.2110696 2.4393919
#8   8 0.2110696 0.8400083
#9   9 0.2110696 2.4393919
#10 10 0.2110696 0.8400083
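
To check which variables select will hold on to, it can help to inspect the grouping first. The following is a minimal sketch, assuming a dplyr version that exports groups() (the name g is only for illustration, and newer dplyr versions may also print a message about adding missing grouping variables):

```r
library(dplyr)

d <- data.frame(x = 1:10, y = runif(1), z = rnorm(2))
g <- group_by(d, x)

# groups() returns the grouping variables as a list of symbols
grouping <- as.character(groups(g))

# the grouped object carries the class "grouped_df"
is_grouped <- inherits(g, "grouped_df")

# excluding x is not enough to drop it while the data is grouped
cols_after_exclude <- names(select(g, -x))
```

Here grouping should be "x", and cols_after_exclude should still list x together with y and z.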

To get only the "y" column, you need to ungroup the data first:

ungroup(d) %>% select(y)
#Source: local data frame [10 x 1]
#
#           y
#1  0.5861766
#2  0.5861766
#3  0.5861766
#4  0.5861766
#5  0.5861766
#6  0.5861766
#7  0.5861766
#8  0.5861766
#9  0.5861766
#10 0.5861766
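
If all you need is the values of a single column rather than a one-column table, extracting it as a vector sidesteps the grouping question. A short sketch under that assumption (the variable names are made up for the example):

```r
library(dplyr)

d <- data.frame(x = 1:10, y = runif(1), z = rnorm(2))
g <- group_by(d, x)

# select() on grouped data keeps the grouping variable next to y
kept <- names(select(g, y))

# after ungroup(), select() returns y on its own
only_y <- names(select(ungroup(g), y))

# $ and [[ ]] return the column as a plain vector, grouped or not
y_vec <- g$y
y_vec2 <- g[["y"]]
```

The first two results show the difference grouping makes to select; the last two return the same numeric vector either way.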

Note that any subset taken with [ works as long as it includes the grouping variable, for example:

d[, 1:2]

or

d[, c(1,3)]
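
A rough sketch of both routes follows. Whether d[, 2] itself raises an error may depend on the dplyr version, so the second half converts to a plain data frame first, where positional indexing behaves as in base R (g, sub and second are illustrative names):

```r
library(dplyr)

d <- data.frame(x = 1:10, y = runif(1), z = rnorm(2))
g <- group_by(d, x)

# a subset that keeps the grouping variable x stays grouped
sub <- g[, 1:2]
sub_names <- names(sub)
still_grouped <- inherits(sub, "grouped_df")

# converting to a plain data.frame drops the grouping,
# so the second column comes back as an ordinary vector
second <- as.data.frame(g)[, 2]
```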

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/26969365/
