r - Example of modifying a list in Advanced R

Tags: r list dataframe for-loop memory

I can't seem to understand the following example in Advanced R.

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

y <- as.list(x)
cat(tracemem(y), "\n")
#> <0x7f80c5c3de20>

for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}
#> tracemem[0x7f80c5c3de20 -> 0x7f80c48de210]: 
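I don't understand why a copy is made in this case. The book says "if an object has a single name bound to it, R will modify it in place", and the object referenced by y does have only one name, y, bound to it.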
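The rule quoted above can be seen on a plain numeric vector. This is an illustrative sketch, not code from the book: the names `v`, `w`, and `can_trace` are made up, and tracemem() only works in R builds with memory profiling enabled (checked here with `capabilities("profmem")`). The first modification usually happens in place, although RStudio's Environment pane may hold an extra reference; the second one has to copy because two names are bound to the same vector.

```r
# Illustrative sketch: v, w and can_trace are hypothetical names.
# tracemem() needs an R build with memory profiling enabled.
can_trace <- capabilities("profmem")

v <- c(1, 2, 3)
if (can_trace) cat(tracemem(v), "\n")

# Only the name `v` refers to this vector, so R may modify it in place;
# normally no tracemem message is printed here.
v[[1]] <- 10

# Bind a second name to the same vector ...
w <- v

# ... so this modification must copy first: tracemem reports a copy,
# and `w` keeps the old contents.
v[[1]] <- 20

if (can_trace) untracemem(v)

v  # now c(20, 2, 3)
w  # still c(10, 2, 3)
```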

Best answer

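While the comment about RStudio holding references may be correct, the book appears to be out of date.
The last commit to the source of that page is dated 2019-06-25, which predates the release of R v4.0.0.
If you look at the change log for R, you will find the following change listed under v4.0.0: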

Reference counting is now used instead of the NAMED mechanism for determining when objects can be safely mutated in base C code. This reduces the need for copying in some cases and should allow further optimizations in the future. It should help make the internal code easier to maintain.


R v3.6.3
Indeed, if you run the example code under R v3.6.3 (the release preceding v4.0.0):
#> R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
#> Copyright (C) 2020 The R Foundation for Statistical Computing
#> Platform: x86_64-w64-mingw32/x64 (64-bit)

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}

cat(tracemem(x), "\n")
#> <000000002457F7D0> 

for (i in 1:5) {
  x[[i]] <- x[[i]] - medians[[i]]
}
#> tracemem[0x000000002457f7d0 -> 0x0000000024697c90]: 
#> tracemem[0x0000000024697c90 -> 0x0000000024697c20]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697c20 -> 0x0000000024697bb0]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697bb0 -> 0x0000000024697b40]: 
#> tracemem[0x0000000024697b40 -> 0x0000000024697ad0]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697ad0 -> 0x0000000024697a60]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697a60 -> 0x00000000246979f0]: 
#> tracemem[0x00000000246979f0 -> 0x0000000024697980]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697980 -> 0x0000000024697910]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697910 -> 0x00000000246978a0]: 
#> tracemem[0x00000000246978a0 -> 0x0000000024697830]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697830 -> 0x00000000246977c0]: [[<-.data.frame [[<- 
#> tracemem[0x00000000246977c0 -> 0x0000000024697750]: 
#> tracemem[0x0000000024697750 -> 0x00000000246976e0]: [[<-.data.frame [[<- 
#> tracemem[0x00000000246976e0 -> 0x0000000024697670]: [[<-.data.frame [[<- 

untracemem(x)

y <- as.list(x)
cat(tracemem(y), "\n")
#> <0000000024697600> 
 
for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}
#> tracemem[0x0000000024697600 -> 0x00000000247ec708]:

untracemem(y)
As the book describes, we observe 15 copies for the data frame and one copy for the list.
R v4.0.0
However, if we run the same example code under R v4.0.0:
#> R version 4.0.0 (2020-04-24) -- "Arbor Day"
#> Copyright (C) 2020 The R Foundation for Statistical Computing
#> Platform: x86_64-w64-mingw32/x64 (64-bit)

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}

cat(tracemem(x), "\n")
#> <00000000236B0C50> 

for (i in 1:5) {
  x[[i]] <- x[[i]] - medians[[i]]
}
#> tracemem[0x00000000236b0c50 -> 0x00000000237a7a90]: 
#> tracemem[0x00000000237a7a90 -> 0x00000000237a7a20]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7a20 -> 0x00000000237a79b0]: 
#> tracemem[0x00000000237a79b0 -> 0x00000000237a7940]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7940 -> 0x00000000237a78d0]: 
#> tracemem[0x00000000237a78d0 -> 0x00000000237a7860]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7860 -> 0x00000000237a77f0]: 
#> tracemem[0x00000000237a77f0 -> 0x00000000237a7780]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7780 -> 0x00000000237a7710]: 
#> tracemem[0x00000000237a7710 -> 0x00000000237a76a0]: [[<-.data.frame [[<- 

untracemem(x)

y <- as.list(x)
cat(tracemem(y), "\n")
#> <00000000237A7630> 

for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}

untracemem(y)
We can see that the change reduces the number of copies: the data frame now gets 10 copies instead of 15, and the list is no longer copied at all.
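To compare the two loops without counting tracemem lines by hand, a rough sketch like the one below can be used. It assumes an R build with memory profiling enabled (tracemem() signals an error otherwise) and relies on tracemem() printing its reports to standard output, which capture.output() collects. `count_trace_lines` is a hypothetical helper, not part of base R. The exact counts depend on the R version and on whether the code runs inside RStudio, so only the difference between the data frame and the list is meaningful.

```r
# Rough sketch: count tracemem copy messages for both loops.
# count_trace_lines() is a hypothetical helper, not part of base R.
# Requires memory profiling (capabilities("profmem") must be TRUE).
count_trace_lines <- function(lines) sum(grepl("^tracemem\\[", lines))

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

# Data frame: every x[[i]] <- ... goes through the R-level `[[<-.data.frame`.
invisible(tracemem(x))
df_log <- capture.output(
  for (i in seq_along(medians)) {
    x[[i]] <- x[[i]] - medians[[i]]
  }
)
untracemem(x)

# List: `[[<-` on a plain list is handled in C and can modify the list
# in place when no other name refers to it.
y <- as.list(x)
invisible(tracemem(y))
list_log <- capture.output(
  for (i in seq_along(medians)) {
    y[[i]] <- y[[i]] - medians[[i]]
  }
)
untracemem(y)

df_copies <- count_trace_lines(df_log)
list_copies <- count_trace_lines(list_log)
cat("data frame copies:", df_copies, "\n")
cat("list copies:", list_copies, "\n")
```

Running it under R 3.6.x and R 4.x should reflect the pattern described above, with the list copy count dropping from one to zero.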
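To answer the OP's question directly: under the NAMED mechanism, the copy was made unnecessarily. The switch to reference counting in R v4.0.0 avoids that unnecessary copy, and the list is now modified in place as expected.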

A similar question about this example from Advanced R can be found on Stack Overflow: https://stackoverflow.com/questions/61844416/
