r - pivot_longer: Multiple rows to columns in R

Tags: r pivot tidyverse tidyr reshape2

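I'm currently trying to work out how to pivot my data frame (small dput below). At the moment one column contains information on the country, ISO code, industry, and sector. I need to split this information into 4 columns, with a corresponding value column. I have used the melt and pivot_longer functions before, but I'm not sure how to generate the 4 new columns along with the value column.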

DI_SMALL <- structure(list(V1 = structure(c(NA, NA, NA, NA, 1L, 1L, 1L, 1L
), .Label = "Energy Usage (TJ)", class = "factor"), V2 = structure(c(NA, 
NA, NA, NA, 2L, 1L, 4L, 3L), .Label = c("Coal", "Natural Gas", 
"Nuclear Electricity", "Petroleum"), class = "factor"), V3 = structure(c(5L, 
4L, 7L, 6L, 3L, 2L, 1L, 1L), .Label = c("0", "1.29327085460648e-05", 
"1.59504500372979e-05", "AFG", "Afghanistan", "Agriculture", 
"Industries"), class = "factor"), V4 = structure(c(5L, 4L, 7L, 
6L, 3L, 2L, 1L, 1L), .Label = c("0", "6.53466630114587e-06", 
"8.05944706428482e-06", "AFG", "Afghanistan", "Fishing", "Industries"
), class = "factor"), V5 = structure(c(5L, 4L, 6L, 7L, 3L, 2L, 
1L, 1L), .Label = c("0", "1.88562621206664e-05", "2.32557880912235e-05", 
"AFG", "Afghanistan", "Industries", "Mining and Quarrying"), class = "factor"), 
    V6 = structure(c(5L, 4L, 7L, 6L, 3L, 2L, 1L, 1L), .Label = c("0", 
    "2.00284547443433e-05", "2.47018365704401e-05", "AFG", "Afghanistan", 
    "Food & Beverages", "Industries"), class = "factor")), row.names = c("V1", 
"V2", "V3", "V4", "X", "X.1", "X.2", "X.3"), class = "data.frame")

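Ideally, the output would contain 7 columns: the two existing leading columns, then "Country", "ISO", "Industry", and "Sector", followed by "Value". Something like this: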

Output <- structure(list(NA. = structure(c(1L, 1L, 1L, 1L), .Label = "Energy Usage (TJ)", class = "factor"), 
    NA..1 = structure(c(2L, 1L, 4L, 3L), .Label = c("Coal ", 
    "Natural Gas", "Nuclear Electricity", "Petroleum"), class = "factor"), 
    Country = structure(c(1L, 1L, 1L, 1L), .Label = "Afghanistan", class = "factor"), 
    ISO = structure(c(1L, 1L, 1L, 1L), .Label = "AFG", class = "factor"), 
    Industry = structure(c(1L, 1L, 1L, 1L), .Label = "Industries", class = "factor"), 
    Sector = structure(c(1L, 1L, 1L, 1L), .Label = "Agriculture", class = "factor"), 
    Value = c(1.595045004, 1.2932706, 0, 0)), class = "data.frame", row.names = c(NA, 
-4L))

Hopefully this makes sense; any ideas would be greatly appreciated!

Thanks

Best answer

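This isn't a case that pivot_longer fits directly, because your variables are mapped onto both rows and columns, and they are not the names of the columns or rows. Instead, you have to extract those attributes from the variables and then build the data.frame "manually". Here is an example; I suggest inspecting the variable values at each step to better understand the process: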

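library(dplyr)

# Convert every factor column to character so labels and numbers can be handled as text
df <- DI_SMALL %>% 
  mutate_all(as.character) 

# Row attributes: one "V1/V2" label per data row; the four header rows give "NA/NA" and are dropped
row_attr <- paste0(df$V1, "/", df$V2)
row_attr <- row_attr[row_attr != "NA/NA"]

# Column attributes: collapse the four header rows (country, ISO, industry, sector) of each value column
col_attr <- df[1:4, -(1:2)] %>%
  apply(MARGIN = 2, function(x) paste0(x, collapse = "/"))

# Values: the remaining cells, flattened column by column into one numeric vector
values <- df[-(1:4), -(1:2)] %>%
  mutate_all(as.numeric) %>%
  as.matrix() %>%
  c()

# expand.grid() varies row_attr fastest, which matches the column-by-column order of values
out <- expand.grid(row_attr, col_attr)
out <- cbind(out, values)

# Split the combined labels back into separate columns
out <- out %>% 
  tidyr::separate(col = "Var1", into = c("NA.", "NA..1"), sep = "/") %>%
  tidyr::separate(col = "Var2", 
                  into = c("Country", "ISO", "Industry", "Sector"),
                  sep = "/")

# Show the first four rows (the Agriculture sector); note the comma, out[1:4] would select columns
out[1:4, ]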
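
The code above relies on expand.grid() and c(as.matrix(...)) producing their results in the same order, which is easy to get wrong. Here is a minimal sketch of that alignment on a toy 2 x 3 matrix; the names row_lab, col_lab, m, flat, grid and lookup are made up for illustration and do not come from the question's data.

```r
# Toy layout: two row labels and three column labels (all hypothetical).
row_lab <- c("r1", "r2")
col_lab <- c("c1", "c2", "c3")
m <- matrix(1:6, nrow = 2, dimnames = list(row_lab, col_lab))

# c() on a matrix drops the dimensions and flattens column by column:
# m["r1", "c1"], m["r2", "c1"], m["r1", "c2"], ...
flat <- c(m)

# expand.grid() varies its first argument fastest, so passing the row labels
# first yields the same column-by-column order.
grid <- expand.grid(row = row_lab, col = col_lab, stringsAsFactors = FALSE)
grid$value <- flat

# Check: every row of the grid matches a direct lookup in the matrix.
lookup <- m[cbind(grid$row, grid$col)]
stopifnot(identical(grid$value, lookup))
```

In the answer's code, row_attr plays the role of row_lab and col_attr the role of col_lab. Swapping the two arguments of expand.grid() would silently pair values with the wrong labels.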

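I think the values in Output and in DI_SMALL are on different scales, but apart from that the first four rows of out appear to match the desired output: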

                NA.               NA..1     Country ISO   Industry      Sector       values
1 Energy Usage (TJ)         Natural Gas Afghanistan AFG Industries Agriculture 1.595045e-05
2 Energy Usage (TJ)                Coal Afghanistan AFG Industries Agriculture 1.293271e-05
3 Energy Usage (TJ)           Petroleum Afghanistan AFG Industries Agriculture 0.000000e+00
4 Energy Usage (TJ) Nuclear Electricity Afghanistan AFG Industries Agriculture 0.000000e+00
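
As a possible alternative (not the approach used above), the four header rows can first be collapsed into column names, after which pivot_longer can split them again through names_sep. The sketch below assumes a character-only excerpt of DI_SMALL (columns V1 to V4), retyped so that it runs on its own; the names toy, header, body and long are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Character-only excerpt of DI_SMALL (columns V1 to V4), retyped by hand.
toy <- data.frame(
  V1 = c(NA, NA, NA, NA, rep("Energy Usage (TJ)", 4)),
  V2 = c(NA, NA, NA, NA,
         "Natural Gas", "Coal", "Petroleum", "Nuclear Electricity"),
  V3 = c("Afghanistan", "AFG", "Industries", "Agriculture",
         "1.59504500372979e-05", "1.29327085460648e-05", "0", "0"),
  V4 = c("Afghanistan", "AFG", "Industries", "Fishing",
         "8.05944706428482e-06", "6.53466630114587e-06", "0", "0"),
  stringsAsFactors = FALSE
)

# Header rows of the value columns, and the data rows below them.
header <- toy[1:4, -(1:2)]
body <- toy[-(1:4), ]

# Turn each header column into a single name such as
# "Afghanistan/AFG/Industries/Agriculture".
names(body) <- c("NA.", "NA..1",
                 vapply(header, paste, character(1), collapse = "/"))

# pivot_longer() now splits those names back into four columns.
long <- body %>%
  pivot_longer(
    cols = -(1:2),
    names_to = c("Country", "ISO", "Industry", "Sector"),
    names_sep = "/",
    values_to = "Value"
  ) %>%
  mutate(Value = as.numeric(Value))
```

The result is a tibble with the same seven columns, but the rows come out grouped by fuel rather than by sector, so it would need to be sorted (for example with arrange(Sector)) to match the row order of out. This also assumes that none of the labels contain a "/".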

Source: a similar question, "pivot_longer: Multiple rows to columns in R", on Stack Overflow: https://stackoverflow.com/questions/60450617/
