r - Unexpected dplyr::bind_rows() behavior

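Short version:

I'm hitting an error with dplyr::bind_rows() that I don't understand. I want to split my data on some condition (e.g. a == 1), operate on one part (e.g. b = b * 10), and bind it back to the other part with dplyr::bind_rows(), all in a single pipe chain. It works fine if I supply the input to both parts explicitly, but if I pipe it in with . instead, it complains about the data type of argument 2.

Here is an MRE of the problem: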

library(tidyverse)

# sim data
d <- tibble(a = 1:4, b = 1:4)

# works when 'd' is supplied directly to bind_rows()
bind_rows(d %>% filter(a == 1),
          d %>% filter(!a == 1) %>% mutate(b = b * 10))
#> # A tibble: 4 x 2
#>       a     b
#>   <int> <dbl>
#> 1     1     1
#> 2     2    20
#> 3     3    30
#> 4     4    40


# fails when 'd' is piped in to bind_rows()
d %>%
  bind_rows(. %>% filter(a == 1),
            . %>% filter(!a == 1) %>% mutate(b = b * 10))
#> Error: Argument 2 must be a data frame or a named atomic vector.

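Long version:

If I capture what the bind_rows() call receives as input by swapping in list(), I can see two things happening that were unexpected (to me).

It does not seem to evaluate the pipe chains I supply; it captures them as functional sequences instead.
The piped input (.) is also supplied invisibly in addition to the two explicit arguments, so I get 3 items in the list rather than 2.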

# capture intermediate values for diagnostics
d %>%
  list(. %>% filter(a == 1),
       . %>% filter(!a == 1) %>% mutate(b = b * 10))
#> [[1]]
#> # A tibble: 4 x 2
#>       a     b
#>   <int> <int>
#> 1     1     1
#> 2     2     2
#> 3     3     3
#> 4     4     4
#> 
#> [[2]]
#> Functional sequence with the following components:
#> 
#>  1. filter(., a == 1)
#> 
#> Use 'functions' to extract the individual functions. 
#> 
#> [[3]]
#> Functional sequence with the following components:
#> 
#>  1. filter(., !a == 1)
#>  2. mutate(., b = b * 10)
#> 
#> Use 'functions' to extract the individual functions.
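
Both observations can be reproduced in isolation. The sketch below relies on two magrittr rules: a chain whose left-hand side is the bare placeholder . is returned as a function rather than evaluated, and the piped value is still inserted as the first argument whenever . appears only inside nested calls. The names keep_first and x are illustrative and not part of the original question.

```r
library(dplyr)

d <- tibble(a = 1:4, b = 1:4)

# A chain that starts with the bare `.` is not evaluated; it builds a function.
keep_first <- . %>% filter(a == 1)
is.function(keep_first)
#> [1] TRUE

# The resulting function can be applied later like any other.
keep_first(d)

# Here `.` appears only inside a nested call, so the pipe still inserts `d`
# as the first argument: the call becomes list(d, <functional sequence>).
x <- d %>% list(. %>% filter(a == 1))
length(x)
#> [1] 2
```

This would explain the error in the short version: bind_rows() receives d as argument 1 and a function, not a data frame, as argument 2.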

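This led me to the following inelegant solution. I solve the first problem by piping into the inner function, which seems to force correct evaluation (for reasons I don't understand), and the second problem by subsetting the list before performing the bind_rows() operation.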

# hack solution to force eval and clean duplicated input
d %>%
  list(filter(., a == 1),
       filter(., !a == 1) %>% mutate(b = b * 10)) %>%
  .[-1] %>% 
  bind_rows()
#> # A tibble: 4 x 2
#>       a     b
#>   <int> <dbl>
#> 1     1     1
#> 2     2    20
#> 3     3    30
#> 4     4    40

Created on 2022-01-24 by the reprex package (v2.0.1)
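
A minimal sketch of why the workaround behaves this way, assuming the same magrittr rules: filter(., a == 1) is an ordinary call, so it is evaluated immediately with . bound to the piped data, while the first-argument insertion still happens because . is only used inside a nested call. The name res is illustrative.

```r
library(dplyr)

d <- tibble(a = 1:4, b = 1:4)

# filter(., a == 1) is an ordinary call, so it is evaluated right away
# with `.` bound to `d`; nothing is captured as a function.
res <- d %>% list(filter(., a == 1))

# `.` occurs only inside a nested call, so `d` is still inserted as the
# first argument of list(): element 1 is `d`, element 2 is the filtered rows.
length(res)
#> [1] 2
nrow(res[[1]])
#> [1] 4
nrow(res[[2]])
#> [1] 1
```

That extra first element is what the .[-1] step in the workaround removes.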

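It looks like this might be related to another existing question, but I can't quite see how. It would be great to understand why this happens and to find a way of coding it that does not require assigning an intermediate variable or resorting to this odd hack of subsetting the intermediate list.

Edit:

Knowing that this has something to do with curly braces ({}) enabled me to find some more helpful links: 1 , 2 , 3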

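Best answer

If we want to use ., wrap the whole call in braces ({}). The braces stop the pipe from also inserting the left-hand side as the first argument, and writing {.} instead of a bare . at the start of each inner chain keeps that chain from being captured as a functional sequence.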

library(dplyr)
d %>%
  {
    bind_rows({.} %>% filter(a == 1),
              {.} %>% filter(!a == 1) %>% mutate(b = b * 10))
  }

-output

# A tibble: 4 × 2
      a     b
  <int> <dbl>
1     1     1
2     2    20
3     3    30
4     4    40
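
As a hedged alternative in the same spirit, the right-hand side of the pipe can be a parenthesized anonymous function, which sidesteps the placeholder rules altogether. This is a sketch and not part of the original answer; the names x, expected and lambda are illustrative.

```r
library(dplyr)

d <- tibble(a = 1:4, b = 1:4)

# Reference result: `d` supplied explicitly to both branches.
expected <- bind_rows(d %>% filter(a == 1),
                      d %>% filter(!a == 1) %>% mutate(b = b * 10))

# A parenthesized anonymous function on the right-hand side is evaluated
# first and then applied to `d`, so no `.` placeholder is involved.
lambda <- d %>%
  (function(x) bind_rows(x %>% filter(a == 1),
                         x %>% filter(!a == 1) %>% mutate(b = b * 10)))

# Compare the piped result with the explicit one.
identical(lambda$a, expected$a)
identical(lambda$b, expected$b)
```

Both comparisons should return TRUE if the piped form matches the explicit one.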

Regarding r - Unexpected dplyr::bind_rows() behavior, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/70840077/
