Can anyone explain why there is a period before playerID in the R statement below?
dataframe.AB<-ddply(Batting, .(playerID), summarize, Career.AB=sum(AB, na.rm=TRUE))
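I have seen this argument passed something via the c(...) function, so I'm not sure what is going on here. I should mention that playerID is a variable in the Batting data frame.
Thanks.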
Best answer
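It is only really necessary if you want to use an expression (for example .(playerID + 1)). The .( function tells ddply to capture the expression and evaluate it in the context of the data (Batting, in your case). If you just want to group by an unmodified column, you can pass the column name as a character vector instead (for example "playerID", or c("playerID", "someOtherColumnName") to group by multiple columns). From Hadley's vignette on plyr (p6-7), with my comments: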
When operating on a data frame, you usually want to split it up into groups based on combinations of variables in the data set. For d*ply you specify which variables (or functions of variables) to use. These variables are specified in a special way to highlight that they are computed first from the data frame, then the global environment (in which case it is your responsibility to ensure that their length is equal to the number of rows in the data frame).
.(var1) will split the data frame into groups defined by the value of the var1 variable. If you use multiple variables, .(a, b, c), the groups will be formed by the interaction of the variables, and output will be labelled with all three variables...
You can also use functions of variables: .(round(a)), .(a * b). When outputting to a data frame, ugly names (produced by make.names()) may result, but you can override them by specifying names in the call: .(product = a * b).
Alternatively, you can use two more familiar ways of describing the splits: As a character vector of column names: c("var1", "var2").
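To make the difference concrete, here is a minimal sketch, assuming the plyr package is installed. The Batting data frame below is a tiny made-up stand-in for the real one (the yearID column and all values are invented for illustration), and the names by_quoted, by_string, and by_decade are just labels for this example.

```r
library(plyr)

# Toy stand-in for the Batting data frame; all values are made up.
Batting <- data.frame(
  playerID = c("a", "a", "b", "b"),
  yearID   = c(1999, 2001, 1999, 2001),
  AB       = c(10, NA, 5, 7)
)

# .( ) captures its arguments unevaluated; ddply evaluates them later
# inside the data frame.
grouping <- .(playerID)
class(grouping)

# Grouping by an unmodified column: the quoted form and the
# character-vector form describe the same split.
by_quoted <- ddply(Batting, .(playerID), summarize,
                   Career.AB = sum(AB, na.rm = TRUE))
by_string <- ddply(Batting, "playerID", summarize,
                   Career.AB = sum(AB, na.rm = TRUE))

# Grouping by an expression is where .( ) is needed. Naming the
# expression gives the output column a readable name.
by_decade <- ddply(Batting, .(decade = 10 * (yearID %/% 10)), summarize,
                   Total.AB = sum(AB, na.rm = TRUE))
```

Here by_quoted and by_string should contain the same rows, while by_decade is grouped on a value computed from yearID rather than on a column that exists in the data.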
This question about ddply syntax in R originally appeared on Stack Overflow: https://stackoverflow.com/questions/21649919/