r - Problems using dplyr in a function (group_by)

Tags: r function plyr dplyr

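I want to use dplyr for some data manipulation. Background: I have a survey weight and a bunch of variables (mostly Likert items). For each category of a variable, I want to compute the frequency and the percentage, both with and without the survey weight.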

As an example, let us just use the frequencies for the variable gender. The result should look like this:

gender  freq  freq.weighted
1        292       922.2906
2        279       964.7551
9          6        21.7338

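I will do this for many variables. So I decided to put the dplyr code inside a function, so that I only have to change the variable and can type less.

# example data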
gender<-c("2","2","1","2","2","2","2","2","2","2","2","2","1","1","2","2","2","2","2","2","1","2","2","2","2","2","2","2","2","2")
survey_weight<-c("2.368456","2.642901","2.926698","3.628653","3.247463","3.698195","2.776772","2.972387","2.686365","2.441820","3.494899","3.133106","3.253514","3.138839","3.430597","3.769577","3.367952","2.265350","2.686365","3.189538","3.029999","3.024567","2.972387","2.730978","4.074495","2.921552","3.769577","2.730978","3.247463","3.230097")
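# the weights above are typed as strings; convert them to numeric so that sum() can add them
test_dataframe<-data.frame(gender,survey_weight=as.numeric(survey_weight))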

#function
weighting.function<-function(dataframe,variable){
  test_weighted<- dataframe %>% 
    group_by_(variable) %>% 
    summarise_(interp(freq=count(~weight)),
               interp(freq_weighted=sum(~weight)))
  return(test_weighted)
}

result_dataframe<-weighting.function(test_dataframe,"gender")

#this second step was left out in this example:
#mutate_(perc=interp(~freq/sum(~freq)*100),perc_weighted=interp(~freq_weighted/sum(~freq_weighted)*100))

This leads to the following error message:
Error in UseMethod("group_by_") : 
  no applicable method for 'group_by_' applied to an object of class "formula" 

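I tried a lot of different things. At first I used freq=n() to count the frequencies, but I always got an error. (I checked: plyr was loaded before dplyr, not after it, and that did not help either.)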

Any ideas? I read the vignette on standard evaluation, but I keep running into problems and do not know how to solve them.

Best answer

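I think you have a few nested mistakes that are causing you problems. The biggest one is using count() instead of summarise(): count() is a verb that operates on a whole data frame, not a summary function you can call inside summarise_(). I'm guessing you wanted n():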

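weighting.function <- function(dataframe, variable){
  dataframe %>% 
    # variable is handed to the standard-evaluation verb, e.g. as ~gender
    group_by_(variable) %>% 
    summarise_(
      # n() counts the rows in each group
      freq = ~n(),
      # the weight column name is hard-coded; the column must be numeric for sum()
      freq_weighted = ~sum(survey_weight)
    )
}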

weighting.function(test_dataframe, ~gender)
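
For readers on a current dplyr, the same idea can be sketched without the underscore verbs, which have been deprecated since dplyr 0.7.0. This is only a sketch under assumptions, not part of the original answer: it assumes dplyr 1.0.0 or later, the function name weighted_freq and its weight argument are made up for illustration, and the toy data frame is hypothetical rather than the survey data from the question. It also includes the percentage step that the question left out.

```r
library(dplyr)

# Hypothetical toy data, not the survey data from the question
toy <- data.frame(
  gender = c("1", "2", "2", "1", "2"),
  survey_weight = c(2.5, 3.0, 1.5, 0.5, 2.0)
)

# Hypothetical helper: both column names are passed as strings
weighted_freq <- function(dataframe, variable, weight = "survey_weight") {
  dataframe %>%
    # all_of() selects the grouping column named by the string in `variable`
    group_by(across(all_of(variable))) %>%
    summarise(
      freq = n(),                            # unweighted count per category
      freq_weighted = sum(.data[[weight]]),  # sum of weights per category
      .groups = "drop"
    ) %>%
    # percentages are taken over all categories, so the grouping is dropped first
    mutate(
      perc = freq / sum(freq) * 100,
      perc_weighted = freq_weighted / sum(freq_weighted) * 100
    )
}

result <- weighted_freq(toy, "gender")
print(result)
```

Passing the column names as plain strings keeps the call site simple when the function is applied to many variables in a loop.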

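You also have some unnecessary uses of interp(). If you do use interp(), the call should look like freq = interp(~n()), i.e. the name goes outside the call to interp(), and the thing being interpolated starts with ~.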
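
To make that shape concrete, here is a small sketch of interp() on its own, using only the lazyeval package. The toy data frame and the names weight_name, weighted_sum and summaries are made up for illustration; the sketch shows one way the weight column could be supplied as a string, which the answer above does not do.

```r
library(lazyeval)

# Hypothetical toy data, not the survey data from the question
toy <- data.frame(
  gender = c("1", "2", "2", "1", "2"),
  survey_weight = c(2.5, 3.0, 1.5, 0.5, 2.0)
)

# The weight column arrives as a string, as it would inside a wrapper function
weight_name <- "survey_weight"

# interp() takes a one-sided formula (note the leading ~) and replaces the
# placeholder w with the symbol built from weight_name
weighted_sum <- interp(~sum(w), w = as.name(weight_name))

# The result name lives outside the interp() call, here as a list name
summaries <- list(freq_weighted = weighted_sum)

# Evaluating the interpolated formula against the data gives the total weight
total <- f_eval(summaries$freq_weighted, data = toy)
print(weighted_sum)
print(total)
```

A named list like summaries is the kind of object the standard-evaluation verbs accepted through their .dots argument.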

Source: the corresponding question on Stack Overflow: https://stackoverflow.com/questions/28157919/
