r - 将描述性统计值行从 R 导出到 Excel 工作表

标签 r excel

我有一个大型数据库,其中包含 100 多家不同公司的 85,000 多个值,并标记了 100 多个变量。我的目标是确定与多个变量相对应的描述性统计数据(平均值、标准差、最小值、最大值和值的数量)。

以下是有关一家给定公司(我将其称为公司 F)的一组信息。

Attendance   Number of representatives   Number of Presenters     Company Audience  
29           2                            30                      2
20           3                            30                      4   
30           10                           20                      5
40           20                           10                      5
10           30                           13                      5

我想做的是让 R 计算这些特定列中每一列的描述性统计数据 [平均值、标准差、最小值和最大值],并按以下方式将其导出到 Excel 中:

Company F  Average Number of Attendance Standard Deviation of Number of Attendance Min  Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives   Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience 

因为这是很长的一行,所以我将通过说我正在尝试查找这些列中每一列的描述性统计数据 [平均值、标准差、最小值、最大值和 n] 来总结它。这些应该都对应于F公司。

我如何尝试解决这个问题:

我使用了 R 中的描述性统计功能来获取数据帧来为我识别代码。为此,我使用了 psych 包:

 library(psych)
 describe(CompanyF$Attendance)
 describe(CompanyF$NumberofRepresentatives)
 describe(CompanyF$Number_of_Presenters
 describe(CompanyF$Company Audience)

通过使用该包,我能够获取数据帧,然后进入 Excel 并手动构建行,输入我收到的值并忽略 psych 库包提供的与我的内容不符的任何信息感兴趣。以下是我从 psych 包中获取的信息类型的示例:

vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
1    1 559 2.02 2.21      1    1.75 1.48   0   9     9 0.78    -0.65 0.09

这个过程非常耗时并且容易出错。完成 F 公司的工作后,我在 Excel 中在 F 公司的行下方创建一个新行,但这次是为另一家公司(例如 G 公司)继续查找描述性统计数据的过程 [平均值、标准差、最小值、 max 和 n] 对于这些感兴趣的变量中的每一个(出席率、代表人数、演示者人数和公司观众)。

我寻找了各种解决方案,其中一个来自堆栈溢出帖子 Export data from R into Excel但我无法找到如何将 R 逐行信息导入 Excel 以及如何指定它识别我上面列出的描述性统计数据的解释。

理想情况下,我会将以下输出放入 Excel 中:

Company F  Average Number of Attendance Standard Deviation of Number of Attendance Min  Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives   Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience 
Company G  Average Number of Attendance Standard Deviation of Number of Attendance Min  Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives   Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience 
Company H  Average Number of Attendance Standard Deviation of Number of Attendance Min  Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives   Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience 

等等。

我的数据的原始子集如下:

structure(list(sn = structure(c(2L, 2L, 3L, 5L, 2L, 7L, 1L, 9L, 
1L, 9L, NA, 9L, 1L, 26L, 11L, 9L, 7L, NA, NA, 7L, 17L, 9L, NA, 
21L, 7L, 17L, 7L, 7L, 16L, 7L, 7L, 7L, 7L, 26L, 7L, 6L, 26L, 
22L, NA, NA, 11L, 23L, 23L, 26L, NA, 7L, 23L, 1L, NA, 1L, 7L, 
11L, 12L, 13L, 9L, NA, 15L, NA, 20L, 15L, NA, 17L, 5L, NA, 22L, 
15L, NA, NA, 5L, 8L, 32L, 29L, 23L, 33L, 1L, 23L, 14L, 6L, 7L, 
15L), .Label = c("Broome Street", "Company A", "Company B", "Company BC",
"Company C", "Company CC", "Company D Clinton", "Company DD", 
"Company E", "Company ED BroadCompany", "Company G", "Company H     
BroadCompany", 
"Company I BroadCompany", "Company I Studio", "Company J", "Company K", 
"Company L", "Company M", "Company M BroadCompany", "Company M HS    
 BroadCompany", 
"Company MCC BroadCompany", "Company N", "Company P", "Company Q", 
"Company Q Company N", "Company Q Company ZZ", "Company R - Company ZZ", 
"Company SLab", "Company Z", "Company ZE", "Company ZED", "Company ZEQ", 
"Company ZZ", "Company ZZQ", "Company ZZQ Company N"), class = "factor"), 
earn_tot = c(21.85, 20.8, NA, 8.16, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 7.16, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 43.32, NA, 30.48, NA, NA, 34.9, NA, NA, NA, NA, NA, 25.82, 
40.75, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, 
30, NA, NA, NA, NA, NA, NA, 39.1, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 52.29, 44.32, NA, 7, 38.32, 0, NA, NA, 8.25, 
NA, NA), earn_and_current_tot = c(29.43, 20.8, NA, 8.16, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 7.16, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, 49.9, NA, 37.56, NA, NA, 41.98, 
NA, NA, NA, NA, NA, 37.32, 49, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, 0, NA, NA, NA, 37, NA, NA, NA, NA, NA, NA, 47.68, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 57.29, 48.48, NA, 
7, 45.9, 0, NA, NA, 15.75, NA, NA), pass_99 = c(0L, 0L, NA, 
NA, NA, NA, 1L, NA, NA, NA, NA, 5L, NA, 0L, NA, 5L, NA, NA, 
NA, 0L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 0L, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 4L, 0L, 
NA, NA, NA, 4L, 4L, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, 
NA, 1L, NA, NA, NA, NA, 1L, NA, NA, 0L, 4L, 0L, NA, NA, 0L, 
NA, NA), pass_65 = c(0L, 0L, 5L, 0L, 6L, NA, 0L, 5L, NA, 
5L, NA, 6L, NA, 0L, 5L, 2L, NA, NA, NA, 0L, 5L, 5L, NA, NA, 
NA, 0L, NA, 1L, 4L, 7L, 5L, 5L, 7L, 0L, 5L, NA, 0L, 1L, NA, 
NA, NA, 2L, 0L, 6L, NA, 8L, 2L, 0L, NA, 4L, 0L, 1L, 3L, NA, 
NA, NA, NA, NA, 4L, 0L, NA, 5L, 7L, NA, 0L, NA, NA, NA, 5L, 
0L, 5L, 4L, 0L, 2L, 0L, 0L, 7L, 0L, NA, 5L)), .Names = c("sn", 
"earn_tot", "earn_and_current_tot", "pass_99", "pass_65"), row.names = c(NA, 
80L), class = "data.frame")

有四个最重要的子集列。这些列是“earn_tot”、“earn_and_current_tot”、“pass_99”和“pass_65”。这里列出了许多已匿名的公司。我正在与大约 100 家公司合作。 “sn”栏下有许多公司名称。整个子集化数据集的名称称为 Subset.MergedEx.So。

对于没有提供一个良好的可重现示例,我深表歉意。感谢您的耐心等待。我一直在阅读如何构建一个并使用了以下代码: dput((head(Subset.MergedEx.SO, 80)))

最佳答案

这可能不是最佳解决方案,但它只使用了 basepsych 包。

这是数据

df <- data.frame(company = rep(c("A","B", "C","D"), each = 5),
              attendance = sample(5:10,20,TRUE),
              representatives = sample(2:30,20,TRUE),
              presenters = sample(20:30,20,TRUE),
              audience = sample(50:70,20,TRUE))

我编写了一个函数来获取您需要的值。 我假设你只有 5 类信息:公司名称、出席情况、代表、演讲者、观众。

    get.values<-function(x){
    require(psych)
    info<-describeBy(x[,2:5], group = x[,1])
    n.companies<-length(levels(df[,1]))
    n<-list()
    mean<-list()
    sd<-list()
    min<-list()
    max<-list()
    for(i in 1:n.companies){
      n[[i]]<-info[[i]][,2]
      mean[[i]]<-info[[i]][,3]
      sd[[i]]<-info[[i]][,4]
      min[[i]]<-info[[i]][,8]
      max[[i]]<-info[[i]][,9]
    }
  l<-Map(c, mean, sd, min, max, n)
  valuedf<-do.call(rbind, l)
return(valuedf)
}

我还编写了一个函数来生成您想要的列名称,您可以将它们命名为您想要的任何名称:

get.names<-function(x){
      require(psych)
      names<-rownames(describe(x[,2:5]))
      avg<-character()
      sd<-character()
      min<-character()
      max<-character()
      total<-character()
  for(i in 1:length(names)){
      avg[i]<-paste("average number of", names[i])
      sd[i]<-paste("standard deviation of", names[i])
      min[i]<-paste("min number of", names[i])
      max[i]<-paste("max number of", names[i])
      total[i]<-paste("total number of", names[i])
  }
  cnames<-c(avg,sd,min,max,total)
return(cnames)
}

将值和名称合并到一个新的数据框中:

output<-get.values(df)
col.names<-get.names(df)
colnames(output)<-col.names
rownames(output)<-levels(df[,1]) 

导出到 Excel:

library(xlsx)
write.xlsx(output, "descriptives.xlsx")

关于r - 将描述性统计值行从 R 导出到 Excel 工作表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/35435770/

相关文章:

excel - 将标题分成单词并在标题中搜索

R arulesSequences 查找序列支持哪些模式

r - 用ggplot2画一个圆

excel - Excel VBA 中的 While 循环问题

VBA - 循环和低效的 If 语句

xml - Excel 2013 FILTERXML() 返回错误值?

r - 基于 ifelse 在旧数据表上创建新数据表

r - 如何正确解释具有大 z 统计数据/冲突错误消息的 glmmTMB 模型?

r - 我可以将 R 中的两个彩色数据集与 ggplot2 组合起来并在组合图上保留它们的颜色吗?

c# - 我想打开一个excel文件并等待用户编辑并保存它