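In dplyr 0.5.0, calling summarise on a grouped data frame does not guarantee any particular order for the resulting rows (currently it reorders rows by group, and I am not sure how it handles duplicate grouping levels).
To work around this, I would like to replace every summarise(x = ...) operation with mutate(x = ...) %>% filter(row_number() == 1). Are there any downsides or drawbacks to doing this?
An example of the two operations: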
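library(dplyr)

tmp_df <-
  data.frame(group = rep(c(2L, 1L), each = 5), b = rep(c(-1, 1), each = 5)) %>%
  group_by(group)

# summarise: one row per group, rows reordered by group
tmp_df %>%
  summarise(b = sum(b))

# mutate + filter: keeps the first row of each group in its original position
tmp_df %>%
  mutate(b = sum(b)) %>%
  filter(row_number() == 1)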
This produces:
> tmp_df %>%
+ summarise(b = sum(b))
# A tibble: 2 × 2
group b
<int> <dbl>
1 1 5
2 2 -5
> tmp_df %>%
+ mutate(b = sum(b)) %>%
+ filter(row_number() == 1)
Source: local data frame [2 x 2]
Groups: group [2]
group b
<int> <dbl>
1 2 -5
2 1 5
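Row order is not the only visible difference between the two outputs above: the mutate + filter result is still grouped (note the "Groups: group [2]" line). The sketch below illustrates that point and one more, under the assumption that the data carries an extra column. The column id and the name demo_df are hypothetical and not part of the original example.

```r
library(dplyr)

# Hypothetical data: same shape as tmp_df, plus an extra column `id`
# that does not appear in the original example.
demo_df <- data.frame(
  group = rep(c(2L, 1L), each = 5),
  b     = rep(c(-1, 1), each = 5),
  id    = 1:10
) %>%
  group_by(group)

via_summarise <- demo_df %>%
  summarise(b = sum(b))

via_mutate <- demo_df %>%
  mutate(b = sum(b)) %>%
  filter(row_number() == 1)

# summarise keeps only the grouping column and the summary column;
# mutate + filter also carries along `id` from the first row of each group.
names(via_summarise)
names(via_mutate)

# With a single grouping variable, summarise returns an ungrouped result,
# while mutate + filter leaves the grouping in place.
inherits(via_summarise, "grouped_df")
inherits(via_mutate, "grouped_df")
```

If the extra columns or the retained grouping are unwanted, a select() and an ungroup() after the filter step would probably be needed, which makes the replacement less of a drop-in than it first appears.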
Edit: In response to the comments, for readability I could define the function:
summarise_o <- function (.data, ...) {
  # order preserving summarise
  mutate_(.data, .dots = lazyeval::lazy_dots(...)) %>%
    filter(row_number() == 1) %>%
    return
}
and simply call:
tmp_df %>%
summarise_o(b = sum(b))
Best answer
One option is to create 'group' as a factor whose levels follow the order of first appearance:
tmp_df <- data.frame(group = rep(c(2L, 1L), each = 5), b = rep(c(-1, 1), each = 5)) %>%
group_by(group = factor(group, levels = unique(group)))
tmp_df %>%
summarise(b = sum(b))
# A tibble: 2 x 2
# group b
# <fctr> <dbl>
#1 2 -5
#2 1 5
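If converting the grouping column to a factor is not desirable, a possible alternative (not taken from the answer above) is to summarise as usual and then reorder the rows by each group's first appearance in the input. This is only a minimal sketch; the names raw_df, summarised, first_seen and summarised_ordered are made up for illustration.

```r
library(dplyr)

# Same data as in the question, kept ungrouped so the original order is available.
raw_df <- data.frame(group = rep(c(2L, 1L), each = 5), b = rep(c(-1, 1), each = 5))

summarised <- raw_df %>%
  group_by(group) %>%
  summarise(b = sum(b))

# Position of each summarised group among the groups as first seen in the input.
first_seen <- match(summarised$group, unique(raw_df$group))

# Reorder the summary rows accordingly; `group` stays an integer column.
summarised_ordered <- summarised[order(first_seen), ]
summarised_ordered
```

This keeps the column type unchanged but needs access to the data before summarising, so it fits less neatly into a single pipeline than the factor approach.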
Source: Stack Overflow, "r - Are there any downsides to mutate + filter compared with summarise on a grouped data frame?": https://stackoverflow.com/questions/44193543/