r - ddply : push or pull?

Does ddply push or pull when it groups data? That is, does it make multiple passes over the data frame, or just one?

Best answer

If you look at the code, you can see the general structure of the function:

function (.data, .variables, .fun = NULL, ..., .progress = "none", 
    .drop = TRUE, .parallel = FALSE) 
{
    .variables <- as.quoted(.variables)
    pieces <- splitter_d(.data, .variables, drop = .drop)
    ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, 
        .parallel = .parallel)
}
<environment: namespace:plyr>

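So it basically rearranges the variables into a format that is easier to use, then splits the data into pieces, and then calls ldply on those pieces. The pieces are generated by the function splitter_d. pieces is actually a bit more sophisticated than a list: it is a pointer to the original data plus a list of indices. Whenever you ask for one piece of the list, it looks up the matching indices and extracts the appropriate data. This avoids having multiple copies of the data floating around. You can see how it works with getAnywhere("splitter_d") or plyr:::splitter_d.

ldply makes one pass over each piece of data. Afterwards, it combines everything back into a data frame. In fact, this is what the help file for ldply says: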
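The "pointer plus index list" idea can be illustrated in base R. This is only a minimal sketch of the concept, not plyr's actual implementation: the data frame `df`, its columns `g` and `x`, and the helper `get_piece` are all hypothetical names made up for the example.

```r
# Toy data frame; the column names are illustrative only.
df <- data.frame(g = c("a", "b", "a", "b", "a"), x = 1:5)

# Store the row indices of each group instead of copying the rows.
idx <- split(seq_len(nrow(df)), df$g)

# Hypothetical helper: extract one piece only when it is requested,
# by looking up its indices in the original data.
get_piece <- function(data, indices, key) {
  data[indices[[key]], , drop = FALSE]
}

piece_a <- get_piece(df, idx, "a")
print(piece_a)
```

Until `get_piece` is called, only the integer indices are held for each group, so the original data is not duplicated per group.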

All plyr functions use the same split-apply-combine strategy: they split the input into simpler pieces, apply .fun to each piece, and then combine the pieces into a single data structure. This function splits lists by elements and combines the result into a data frame. If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).

I couldn't put it better myself. And, wonder of wonders, the first sentence can also be found on the help page for ddply.
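The split-apply-combine strategy quoted above can be sketched with base R alone. Again, this is an illustration under assumptions rather than what plyr does internally: the toy data frame and the per-group sum are invented for the example.

```r
df <- data.frame(g = c("a", "b", "a", "b", "a"), x = 1:5)

# Split: one vector of row indices per value of g.
idx <- split(seq_len(nrow(df)), df$g)

# Apply: visit each piece exactly once and summarise it.
results <- lapply(names(idx), function(key) {
  piece <- df[idx[[key]], , drop = FALSE]
  data.frame(g = key, total = sum(piece$x))
})

# Combine: bind the per-piece results into a single data frame.
out <- do.call(rbind, results)
print(out)
```

With plyr loaded, a call such as `ddply(df, "g", summarise, total = sum(x))` should give a comparable result.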

Source: this question and answer come from Stack Overflow: https://stackoverflow.com/questions/4191296/
