r - dplyr mutate: create column using first occurrence of another column

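I'm wondering whether there is a more elegant way to take a data frame, group it by x to count how many times each x occurs in the dataset, and then mutate to find the first occurrence (y) of each x.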

test <- data.frame(x = c("a", "b", "c", "d", 
                         "c", "b", "e", "f", "g"),
                   y = c(1,1,1,1,2,2,2,2,2))

Current output

output <- test %>% 
  group_by(x) %>%
  summarise(count = n())

  x     count
  <fct> <int>
1 a         1
2 b         2
3 c         2
4 d         1
5 e         1
6 f         1
7 g         1

Desired output

  x     count first_seen
  <fct> <int> <dbl>
1 a         1     1
2 b         2     1
3 c         2     1
4 d         1     1
5 e         1     2
6 f         1     2
7 g         1     2

I can filter the test data frame for first occurrences and then use left_join, but I was hoping for a more elegant solution using mutate?

# filter for first occurrences of y
right <- test %>% 
  group_by(x) %>% 
  filter(y == min(y)) %>% 
  slice(1) %>%
  ungroup()

# bind to the output dataframe
left_join(output, right, by = "x")

Best answer

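After grouping by 'x', we can use first to create a new column inside a second group_by call, so that it is kept as a grouping variable, and then get the count with n().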

library(dplyr)
test %>% 
   group_by(x) %>%
   # dplyr >= 1.0.0 names this argument `.add`; older versions used `add`
   group_by(first_seen = first(y), .add = TRUE) %>% 
   summarise(count = n())
# A tibble: 7 x 3
# Groups:   x [7]
#  x     first_seen count
#  <fct>      <dbl> <int>
#1 a              1     1
#2 b              1     2
#3 c              1     2
#4 d              1     1
#5 e              2     1
#6 f              2     1
#7 g              2     1
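
As a hedged alternative to the nested group_by above, the sketch below computes both columns directly, once with a single summarise call and once with mutate, since the question asked about mutate. It assumes the rows of test are already ordered so that the first y within each x is the value wanted; if that ordering is not guaranteed, min(y) could be swapped in for first(y). The names by_summarise and by_mutate are illustrative only and do not come from the original answer.

```r
library(dplyr)

test <- data.frame(x = c("a", "b", "c", "d", "c", "b", "e", "f", "g"),
                   y = c(1, 1, 1, 1, 2, 2, 2, 2, 2))

# Variant 1: one summarise() call computes the count and the first y per x.
by_summarise <- test %>%
  group_by(x) %>%
  summarise(count = n(), first_seen = first(y))

# Variant 2: mutate() keeps every input row, so distinct() is used
# afterwards to collapse the result to one row per x.
by_mutate <- test %>%
  group_by(x) %>%
  mutate(count = n(), first_seen = first(y)) %>%
  ungroup() %>%
  distinct(x, count, first_seen)

by_summarise
```

Both variants should return the columns in the order shown in the desired output (x, count, first_seen). The summarise form is usually the shorter choice when only one row per group is needed.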

Regarding r - dplyr mutate: create column using first occurrence of another column, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59199189/
