r - How to pass "everything possible" to i, j and by in nested functions?

Tags: r data.table non-standard-evaluation

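I am developing a package that uses data.table. In this package I have a function count_by that counts, by group, the number of distinct IDs for a particular variable in a data.table. With some help (R data.table: How to pass "everything possible" to by in a function?), I got it to work as expected: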

library(data.table)
#> Warning: package 'data.table' was built under R version 3.6.2

# create example data
sample_dt <- data.table(
    id = sort(rep(LETTERS[1:3], 3)),
    year = rep(2018L:2020L),
    x = runif(9)
)
sample_dt[id == "B" & year < 2020, x := NA_real_]

# define inner function
count_by <- function(DT, id_var, val_var, by = NULL) {
    id_var <- as.character(substitute(id_var))
    val_var <- as.character(substitute(val_var))

    eval(substitute(
        DT[!is.na(get(val_var)), .(distinct_ids = uniqueN(get(id_var))), by = by]
    ))
}

# test inner function -> works as expected
(reference <- count_by(sample_dt, id_var = id, val_var = x, by = year))
#>    year distinct_ids
#> 1: 2018            2
#> 2: 2019            2
#> 3: 2020            3

identical(count_by(sample_dt, "id", x, year)       , reference)
#> [1] TRUE
identical(count_by(sample_dt, "id", "x", year)     , reference)
#> [1] TRUE
identical(count_by(sample_dt, "id", x, "year")     , reference)
#> [1] TRUE
identical(count_by(sample_dt, "id", x, c("year"))  , reference)
#> [1] TRUE
identical(count_by(sample_dt, "id", "x", "year")   , reference)
#> [1] TRUE
identical(count_by(sample_dt, "id", "x", c("year")), reference)
#> [1] TRUE
identical(count_by(sample_dt, id, "x", year)       , reference)
#> [1] TRUE
identical(count_by(sample_dt, id, "x", "year")     , reference)
#> [1] TRUE
identical(count_by(sample_dt, id, "x", c("year"))  , reference)
#> [1] TRUE
identical(count_by(sample_dt, id, x, "year")       , reference)
#> [1] TRUE
identical(count_by(sample_dt, id, x, c("year"))    , reference)
#> [1] TRUE

Created on 2020-02-20 by the reprex package (v0.3.0)

Now I want to use count_by() inside another function (minimal example below):

# define wrapper function
wrapper <- function(data, id_var, val_var, by = NULL) {
    data <- as.data.table(data)
    count_by(data, id_var, val_var, by)
}

# test wrapper function
wrapper(sample_dt, id_var = id, val_var = x, by = year)
#> Error in .(distinct_ids = uniqueN(get("id_var"))): could not find function "."

Created on 2020-02-20 by the reprex package (v0.3.0)

Debugging count_by() shows that when count_by() is called from wrapper(), substitute(DT[...]) also replaces DT with data:

Browse[2]> substitute(
+         DT[!is.na(get(val_var)), .(distinct_ids = uniqueN(get(id_var))), by = by]
+     )
data[!is.na(get("val_var")), .(distinct_ids = uniqueN(get("id_var"))), 
    by = by]

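Because data is not available in the function environment of count_by(), it is evaluated as utils::data, which then causes the error. This makes the problem clear, but I cannot think of a solution.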
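The substitution described above can be reproduced without data.table. The following is only a minimal sketch: inner() and outer() are hypothetical stand-ins for count_by() and wrapper(), and the last line assumes that utils is attached and that nothing else named data masks it.

```r
# hypothetical names: inner() stands in for count_by(), outer() for wrapper()
inner <- function(DT) {
    # substitute() swaps DT for the expression the caller wrote, here `data`
    substitute(DT[1, ])
}
outer <- function(data) inner(data)

df <- data.frame(a = 1:3)
expr <- outer(df)
print(expr)

# the symbol inside the captured expression is `data`, not `DT`
as.character(expr[[2]])

# inner() has no variable called `data`, so the lookup falls through to the
# search path, where (by assumption) `data` is the function utils::data
is.function(get("data", envir = environment(inner)))
```

Evaluating the captured expression inside inner() therefore subsets whatever `data` resolves to there, rather than the table that was passed in.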

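I need to substitute the whole DT[...] expression for by to work properly (see R data.table: How to pass "everything possible" to by in a function? and pass variables and names to data.table function). But I cannot substitute the whole expression without DT being substituted as well.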

What is the way out of this dilemma?

Best answer

Leaving NSE out works for this particular example and simplifies things considerably. However, you then have to pass the arguments as strings:

count_by <- function(DT, id_var, val_var, by = NULL) {
    DT[!is.na(get(val_var)), .(distinct_ids = uniqueN(get(id_var))), by = by]
}

wrapper <- function(data, id_var, val_var, by = NULL) {
    count_by(data, id_var, val_var, by)
}

wrapper(sample_dt, id_var = "id", val_var = "x", by = "year")

#    year distinct_ids
# 1: 2018            2
# 2: 2019            2
# 3: 2020            3
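If bare column names are still wanted at the outermost level, one possible variation (not part of the answer above) is to convert them to strings once in the wrapper and keep the inner function purely string-based. This is a sketch under assumptions: wrapper_nse is a hypothetical name, and by is only handled as a single bare name or a single string, so a vector such as c("year") is not supported here.

```r
library(data.table)

# deterministic part of the example data: x is NA for id "B" before 2020
sample_dt <- data.table(
    id = sort(rep(LETTERS[1:3], 3)),
    year = rep(2018:2020, times = 3),
    x = runif(9)
)
sample_dt[id == "B" & year < 2020, x := NA_real_]

# string-based inner function, as in the answer
count_by <- function(DT, id_var, val_var, by = NULL) {
    DT[!is.na(get(val_var)), .(distinct_ids = uniqueN(get(id_var))), by = by]
}

# hypothetical wrapper: bare names become strings once, at the top level
wrapper_nse <- function(data, id_var, val_var, by) {
    id_var  <- as.character(substitute(id_var))
    val_var <- as.character(substitute(val_var))
    # assumption: `by` is missing, a single bare name, or a single string
    by <- if (missing(by)) NULL else as.character(substitute(by))
    count_by(as.data.table(data), id_var, val_var, by)
}

res <- wrapper_nse(sample_dt, id, x, year)
print(res)
```

Because the conversion happens only in the function the user calls directly, no inner function ever sees a promise whose expression refers to a variable from another frame.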

Regarding r - How to pass "everything possible" to i, j and by in nested functions?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60315465/
