r - Create more informative table output

Tags: r data.table reshape mean

I have a data table as follows:

panelID = c(1:50)
year = c(2001:2010)
country = c("NLD", "BEL", "GER")
urban = c("A", "B", "C")
indust = c("D", "E", "F")
sizes = c(1, 2, 3, 4, 5)
n <- 2                                   # rows (years) per panel ID
library(data.table)
set.seed(123)
DT <- data.table(panelID = rep(sample(panelID), each = n),
                 country = rep(sample(country, length(panelID), replace = TRUE), each = n),
                 year = c(replicate(length(panelID), sample(year, n))),
                 some_NA = sample(0:5, 6),
                 some_NA_factor = sample(0:5, 6),
                 industry = rep(sample(indust, length(panelID), replace = TRUE), each = n),
                 urbanisation = rep(sample(urban, length(panelID), replace = TRUE), each = n),
                 size = rep(sample(sizes, length(panelID), replace = TRUE), each = n),
                 norm = round(runif(100)/10, 2),
                 sales = round(rnorm(10, 10, 10), 2),
                 Happiness = sample(10, 10),
                 Sex = round(rnorm(10, 0.75, 0.3), 2),
                 Age = sample(100, 100),
                 Educ = round(rnorm(10, 0.75, 0.3), 2))
DT[, uniqueID := .I]                     # creates a unique row ID
DT[DT == 0] <- NA                        # recode zeros as missing
DT$sales[DT$sales < 0] <- NA             # treat negative sales as missing
DT <- as.data.frame(DT)

setDT(DT)[,Mean_Sales_pergroup := mean(sales, na.rm=TRUE),  by=c("industry", "year")]
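Worth making explicit: `:=` with `by` writes each group's mean onto every row of that group, so `DT` keeps all of its rows and each industry-by-year mean is repeated several times. That repetition is what trips up the attempts that follow. A minimal sketch on a small invented data set (`toy` and `summ` are hypothetical names, the values are made up) contrasts this with summarising via `.()`, which returns one row per group:

```r
library(data.table)

# Hypothetical toy data, invented for illustration
toy <- data.table(
  industry = c("D", "D", "D", "E", "E", "E"),
  year     = c(2001, 2001, 2002, 2001, 2002, 2002),
  sales    = c(2, 4, 6, 10, NA, 8)
)

# := with by adds the group mean to every row: still 6 rows, values repeated
toy[, Mean_Sales_pergroup := mean(sales, na.rm = TRUE), by = .(industry, year)]

# Summarising with .() instead returns one row per industry x year group
summ <- toy[, .(Mean_Sales = mean(sales, na.rm = TRUE)), by = .(industry, year)]
summ
```

On the question's data, `DT[, .(Mean_Sales = mean(sales, na.rm = TRUE)), by = .(industry, year)]` would likewise give one row per industry and year, which can be cast wide without any duplicates to deal with.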

Now I want to compare how Mean_Sales_pergroup differs across years for each industry, so I tried:

table(DT$Mean_Sales_pergroup, DT$year)

But this gives me:

                   2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
  2.11                0    0    0    0    0    0    1    0    0    0
  2.18                0    0    0    0    0    0    0    0    0    1
  2.61                2    0    0    0    0    0    0    1    0    0
  3.6775              0    0    0    0    4    0    0    0    0    0
  ...
  14.19               0    0    0    0    0    0    0    2    0    0

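This is, of course, completely uninformative: every distinct value of Mean_Sales_pergroup becomes its own row, and the cells only count how often that value occurs in each year.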
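To get means rather than counts, an aggregating function is needed. In base R, `tapply()` applies `mean()` within each industry-by-year cell and returns a matrix shaped like the desired output (industries as rows, years as columns). A minimal sketch on a small hypothetical data frame (`toy` and `means` are invented names, the values are made up):

```r
# Hypothetical toy data: two industries observed in two years
toy <- data.frame(
  industry = c("D", "D", "D", "E", "E", "E"),
  year     = c(2001, 2001, 2002, 2001, 2002, 2002),
  sales    = c(2, 4, 6, 10, NA, 8)
)

# table() only counts co-occurrences of the two variables
table(toy$industry, toy$year)

# tapply() applies mean() within each industry x year cell;
# na.rm = TRUE is forwarded to mean() through tapply's ...
means <- tapply(toy$sales, list(toy$industry, toy$year), mean, na.rm = TRUE)
means
```

Assuming the question's `DT`, the analogous call would be `tapply(DT$sales, list(DT$industry, DT$year), mean, na.rm = TRUE)`, which aggregates `sales` directly instead of going through `Mean_Sales_pergroup`.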

How can I get something like this instead:

           2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Industry D  ..
Industry E
Industry F

Edit:

The suggestion in @rg255's comment gives:

dcast(DT, industry ~ year, value.var = "Mean_Sales_pergroup")
Aggregate function missing, defaulting to 'length'
   industry 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
1:        D    1    1    5    5    3    4    1    1    6    1
2:        E    2    5    5    3    4    3    3    1    3    5
3:        F    1    6    2    3    4    7    5    2    4    4
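The message explains these numbers: `DT` has several rows for each industry-by-year cell, and without a `fun.aggregate`, `dcast()` falls back to `length`, i.e. it counts them. Passing an aggregating function avoids this. A sketch on hypothetical toy data (`toy`, `counts`, and `wide` are invented names), aggregating `sales` with `mean` and forwarding `na.rm = TRUE` through `dcast()`'s `...`:

```r
library(data.table)

# Hypothetical toy data with repeated rows per industry x year cell
toy <- data.table(
  industry = c("D", "D", "D", "E", "E", "E"),
  year     = c(2001, 2001, 2002, 2001, 2002, 2002),
  sales    = c(2, 4, 6, 10, NA, 8)
)
toy[, Mean_Sales_pergroup := mean(sales, na.rm = TRUE), by = .(industry, year)]

# Explicit length reproduces what the question's call defaulted to: rows per cell
counts <- dcast(toy, industry ~ year, value.var = "Mean_Sales_pergroup",
                fun.aggregate = length)

# Aggregating sales with mean gives one mean per industry x year cell
wide <- dcast(toy, industry ~ year, value.var = "sales",
              fun.aggregate = mean, na.rm = TRUE)
wide
```

Because every row in a cell carries the same `Mean_Sales_pergroup`, using `fun.aggregate = mean` with `value.var = "Mean_Sales_pergroup"` should give the same numbers.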

Best answer

Make the rows unique, then cast:

dcast(unique(DT[, .(industry, year, Mean_Sales_pergroup)]), ... ~ year)

which gives the desired output:

   industry  2001  2002   2003     2004    2005     2006     2007  2008
1:        D  2.61 4.260  6.204 9.650000 10.7050 8.625000 2.110000  2.61
2:        E 13.24 6.766  9.940 5.156667  3.6775 9.225000 4.606667 13.24
3:        F  2.61 8.000  ...
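This works because, after `unique()`, each industry-by-year combination occurs only once, so `dcast()` has nothing to aggregate and places each mean directly in its cell. In the answer, `... ~ year` puts every remaining column except the value column on the left-hand side (here only `industry`), and `dcast()` guesses the value column. A sketch on invented toy data (`toy`, `u`, and `wide` are hypothetical names) that spells out the formula and `value.var` explicitly and checks the uniqueness assumption first:

```r
library(data.table)

# Hypothetical toy data (same shape as the question's DT, invented values)
toy <- data.table(
  industry = c("D", "D", "D", "E", "E", "E"),
  year     = c(2001, 2001, 2002, 2001, 2002, 2002),
  sales    = c(2, 4, 6, 10, NA, 8)
)
toy[, Mean_Sales_pergroup := mean(sales, na.rm = TRUE), by = .(industry, year)]

# Keep one row per industry x year; the group mean is constant within a cell
u <- unique(toy[, .(industry, year, Mean_Sales_pergroup)])

# Sanity check: no industry x year combination appears twice,
# so dcast has nothing left to aggregate
stopifnot(anyDuplicated(u, by = c("industry", "year")) == 0L)

# Spread years into columns, naming the value column explicitly
wide <- dcast(u, industry ~ year, value.var = "Mean_Sales_pergroup")
wide
```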

A similar question, "r - Create more informative table output", can be found on Stack Overflow: https://stackoverflow.com/questions/60866886/
