r - Change in behavior of closures stored in data.table between R 3.4.3 and R 3.6.0

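After upgrading from R 3.4.3 to R 3.6.0 (both with data.table 1.12.6), I noticed the following peculiar behavior. In 3.4.3 the code below makes the all.equal statement return TRUE, but in 3.6.0 it reports a mean relative difference: even though we try to access the approxfun computed from group "a", the values from group "b" are used (possibly because of lazy evaluation). In 3.6.0 this can be fixed by adding copy statements inside the approxfun call, as suggested in this question:
Handling of closures in data.table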

What intrigues me is that I did not get this error in 3.4.3. Any idea what changed?

library(data.table)
data <- data.table(
  group = c(rep("a", 4), rep("b", 4)),
  x = rep(c(.02, .04, .12, .21), 2),
  y = c(
    0.0122, 0.01231, 0.01325, 0.01374, 0.01218, 0.01229, 0.0133, 0.01379)
)

dtFuncs <- data[ , list(
  func = list(stats::approxfun(x, y, rule = 2))
), by = group]

f <- function(group, x) {
  dtResults <- CJ(group = group, x = x)
  dtResults <- dtResults[ , {
    .g <- group
    f2 <- dtFuncs[group == .g, func][[1]]
    list(x = x, y = f2(x))
  }, by = group] 
  dtResults
}

x0 <- .07
g <- "a"
all.equal(
  with(data[group == g], approx(x, y, x0, rule = 2)$y),
  f(group = g, x = x0)$y
)
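
For reference, the copy workaround mentioned above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code from the linked question; the names dtFuncsCopied, fromClosure and direct are made up here. The idea is that copy() hands each closure its own x and y vectors rather than vectors that data.table may reuse between groups.

```r
library(data.table)

data <- data.table(
  group = c(rep("a", 4), rep("b", 4)),
  x = rep(c(.02, .04, .12, .21), 2),
  y = c(
    0.0122, 0.01231, 0.01325, 0.01374, 0.01218, 0.01229, 0.0133, 0.01379)
)

# copy() is evaluated inside each group, so every closure captures
# freshly allocated x and y vectors (dtFuncsCopied is an illustrative name)
dtFuncsCopied <- data[ , list(
  func = list(stats::approxfun(copy(x), copy(y), rule = 2))
), by = group]

x0 <- .07
# value from the stored closure for group "a"
fromClosure <- dtFuncsCopied[group == "a", func][[1]](x0)
# reference value computed directly from the rows of group "a"
direct <- with(data[group == "a"], approx(x, y, x0, rule = 2)$y)
all.equal(direct, fromClosure)
```

The extra copies cost some memory per group, which seems a reasonable price when the closures need to outlive the grouped call.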

Best answer

After running git bisect on r-source, I was able to deduce that this commit caused the behavior: https://github.com/wch/r-source/commit/adcf18b773149fa20f289f2c8f2e45e6f7b0dbfe

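Fundamentally, when x is already sorted, approxfun no longer makes an internal copy of it. If the data are in random order, the code continues to work! (See the snippet below.)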

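The lesson for me is that it is best not to mix complex objects with data.table, because the same environment is reused for every "by" group (or else to be very careful with data.table::copy).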

## should be run under R >= 3.6.0 to see the disparity
library(data.table)

## original sorted x (does not work)
data <- data.table(
  group = c(rep("a", 4), rep("b", 4)),
  x = rep(c(.02, .04, .12, .21), 2),
  y = c(
    0.0122, 0.01231, 0.01325, 0.01374, 0.01218, 0.01229, 0.0133, 0.01379)
)

dtFuncs <- data[ , {
    print(environment())
    list(
        func = list(stats::approxfun(x, y, rule = 2))
    )
}, by = group]

f <- function(group, x) {
  dtResults <- CJ(group = group, x = x)
  dtResults <- dtResults[ , {
    .g <- group
    f2 <- dtFuncs[group == .g, func][[1]]
    list(x = x, y = f2(x))
  }, by = group] 
  dtResults
}

get("y", environment(dtFuncs$func[[1]]))
get("y", environment(dtFuncs$func[[2]]))

x0 <- .07
g <- "a"
all.equal(
  with(data[group == g], approx(x, y, x0, rule = 2)$y),
  f(group = g, x = x0)$y
)

## unsorted x (works)
data <- data.table(
  group = c(rep("a", 4), rep("b", 4)),
  x = rep(c(.02, .04, .12, .21), 2),
  y = c(
    0.0122, 0.01231, 0.01325, 0.01374, 0.01218, 0.01229, 0.0133, 0.01379)
)
set.seed(10)
data <- data[sample(1:.N, .N)]
dtFuncs <- data[ , {
    print(environment())
    list(
        func = list(stats::approxfun(x, y, rule = 2))
    )
}, by = group]

f <- function(group, x) {
  dtResults <- CJ(group = group, x = x)
  dtResults <- dtResults[ , {
    .g <- group
    f2 <- dtFuncs[group == .g, func][[1]]
    list(x = x, y = f2(x))
  }, by = group] 
  dtResults
}

get("y", environment(dtFuncs$func[[1]]))
get("y", environment(dtFuncs$func[[2]]))

x0 <- .07
g <- "a"
all.equal(
  with(data[group == g], approx(x, y, x0, rule = 2)$y),
  f(group = g, x = x0)$y
)

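## better approach: maybe safer to avoid mixing objects treated by reference
## (data.table & closures) altogether...
## each element of split() is a separate data.table, here called d so that
## it does not shadow the column x
fList <- lapply(split(data, by = "group"), function(d) {
    with(d, stats::approxfun(x, y, rule = 2))
})
fList
## TRUE when the two groups yield different interpolated values
fList[[1]](.07) != fList[[2]](.07)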
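
A quick sanity check along these lines can confirm that every stored closure still matches its own group. This is only a sketch: checkClosures is a hypothetical helper, not part of data.table, and it assumes the function list is named by group (as split() with by = "group" produces).

```r
library(data.table)

data <- data.table(
  group = c(rep("a", 4), rep("b", 4)),
  x = rep(c(.02, .04, .12, .21), 2),
  y = c(
    0.0122, 0.01231, 0.01325, 0.01374, 0.01218, 0.01229, 0.0133, 0.01379)
)

fList <- lapply(split(data, by = "group"), function(d) {
    with(d, stats::approxfun(x, y, rule = 2))
})

## hypothetical helper: for each group name, compare the closure's value
## at x0 with approx() computed directly on that group's rows
checkClosures <- function(funcs, data, x0) {
    vapply(names(funcs), function(g) {
        direct <- with(data[group == g], approx(x, y, x0, rule = 2)$y)
        isTRUE(all.equal(direct, funcs[[g]](x0)))
    }, logical(1))
}

checkClosures(fList, data, x0 = .07)
```

A FALSE entry for any group would suggest that its closure captured data belonging to another group.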

Source: this question on Stack Overflow: https://stackoverflow.com/questions/59013362/
