Reshaping groups of variables from wide to long format

Tags: r dplyr reshape tidyr tidyverse

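This question is very similar to an existing question.

However, I can't work out how to extend that approach to several groups of variables. This is the dataset I'm working with: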

# A tibble: 12 x 9
   Month Cabo_BU_PCT Acapulco_BU_PCT Cabo_LOS_AVG Acapulco_LOS_AVG BED_BUGS_Cabo BED_BUGS_Acapulco TOTAL_OCCUPIED_Cabo TOTAL_OCCUPIED_Acapulco
       1   0.6470034       0.6260116     5.223000         4.307667             5                 3               19216                    6498
       2   0.6167027       0.6777457     5.893571         4.247500             3                 0               17095                    6566
       3   0.6372108       0.6348126     5.229677         4.327742             5                 1               19556                    6809
       4   0.6357912       0.6548170     5.356667         4.220000             4                 6               18883                    6797
       5   0.6449006       0.6409659     5.344194         4.162903             2                 5               19792                    6875
       6   0.6747811       0.6935453     5.812667         4.362000             4                 3               20041                    7199
       7   0.6697947       0.6932687     5.544516         4.462903             5                 6               20556                    7436
       8   0.6595960       0.6777923     5.260323         4.135806             0                 7               20243                    7270
       9   0.6792256       0.6863198     5.424333         4.133333             5                 0               20173                    7124
      10   0.6976214       0.7370875     5.419677         4.350000             3                 3               21410                    7906
      11   0.6600337       0.6615607     5.450000         4.184333             3                 2               19603                    6867
      12   0.6761812       0.6773261     5.347097         4.318710             2                 2               20752                    7265

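My goal is to reshape it into the long format shown below: the columns Cabo_BU_PCT and Acapulco_BU_PCT are stacked into a single column named BU_PCT, the columns Cabo_LOS_AVG and Acapulco_LOS_AVG are stacked into a single column named LOS_AVG, and so on for the remaining pairs, with a new Location column recording which location each row came from.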

  Month    Location    BU_PCT      LOS_AVG     BED_BUGS       TOTAL_OCCUPIED
  1        Cabo        0.6470034   5.223000    5              19216
  1        Acapulco    0.6260116   4.307667    3              6498
  2        Cabo        0.6167027   5.893571    3              17095
  2        Acapulco    0.6777457   4.247500    0              6566
  .
  .
  .
  12       Cabo        0.6761812   5.347097    2              20752
  12       Acapulco    0.6773261   4.318710    2              7265  

Any help reshaping this data frame would be much appreciated. Thanks.

======== Dataset ========

df_wide <- structure(list(Month = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
), Cabo_BU_PCT = c(0.647003367003367, 0.616702741702742, 0.637210817855979, 
0.635791245791246, 0.644900619094168, 0.674781144781145, 0.669794721407625, 
0.65959595959596, 0.679225589225589, 0.69762137504073, 0.66003367003367, 
0.676181166503747), Acapulco_BU_PCT = c(0.626011560693642, 0.677745664739884, 
0.634812604885325, 0.654816955684008, 0.640965877307477, 0.69354527938343, 
0.693268692895767, 0.677792280440052, 0.686319845857418, 0.737087451053515, 
0.661560693641619, 0.677326123438374), Cabo_LOS_AVG = c(5.223, 
5.89357142857143, 5.22967741935484, 5.35666666666667, 5.3441935483871, 
5.81266666666667, 5.54451612903226, 5.26032258064516, 5.42433333333333, 
5.41967741935484, 5.45, 5.34709677419355), Acapulco_LOS_AVG = c(4.30766666666667, 
4.2475, 4.32774193548387, 4.22, 4.16290322580645, 4.362, 4.46290322580645, 
4.1358064516129, 4.13333333333333, 4.35, 4.18433333333333, 4.31870967741935
), BED_BUGS_Cabo = c(5, 3, 5, 4, 2, 4, 5, 0, 5, 3, 3, 2), BED_BUGS_Acapulco = c(3, 
0, 1, 6, 5, 3, 6, 7, 0, 3, 2, 2), TOTAL_OCCUPIED_Cabo = c(19216, 
17095, 19556, 18883, 19792, 20041, 20556, 20243, 20173, 21410, 
19603, 20752), TOTAL_OCCUPIED_Acapulco = c(6498, 6566, 6809, 
6797, 6875, 7199, 7436, 7270, 7124, 7906, 6867, 7265)), class = c("tbl_df", 
"tbl", "data.frame"), .Names = c("Month", "Cabo_BU_PCT", "Acapulco_BU_PCT", 
"Cabo_LOS_AVG", "Acapulco_LOS_AVG", "BED_BUGS_Cabo", "BED_BUGS_Acapulco", 
"TOTAL_OCCUPIED_Cabo", "TOTAL_OCCUPIED_Acapulco"), row.names = c(NA, 
-12L))

Best answer

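If you only have two locations, you can hard-code them in a regular expression, allowing for the fact that a location may appear either at the start or at the end of a column name: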

library(tidyverse)

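df_wide %>% 
    # Stack every column except Month into variable/value pairs
    gather(variable, value, -Month) %>% 
    # Extract the location from the name, then strip it (and its underscore)
    mutate(location = sub('.*(Cabo|Acapulco).*', '\\1', variable), 
           variable = sub('_?(Cabo|Acapulco)_?', '', variable)) %>% 
    # Spread the cleaned measure names back out, one column per measure
    spread(variable, value)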
#> # A tibble: 24 x 6
#>    Month location BED_BUGS    BU_PCT  LOS_AVG TOTAL_OCCUPIED
#>  * <dbl>    <chr>    <dbl>     <dbl>    <dbl>          <dbl>
#>  1     1 Acapulco        3 0.6260116 4.307667           6498
#>  2     1     Cabo        5 0.6470034 5.223000          19216
#>  3     2 Acapulco        0 0.6777457 4.247500           6566
#>  4     2     Cabo        3 0.6167027 5.893571          17095
#>  5     3 Acapulco        1 0.6348126 4.327742           6809
#>  6     3     Cabo        5 0.6372108 5.229677          19556
#>  7     4 Acapulco        6 0.6548170 4.220000           6797
#>  8     4     Cabo        4 0.6357912 5.356667          18883
#>  9     5 Acapulco        5 0.6409659 4.162903           6875
#> 10     5     Cabo        2 0.6449006 5.344194          19792
#> # ... with 14 more rows
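To see what the two sub() calls in the answer do on their own, here is a quick check against the measure column names of df_wide, typed in as a character vector so the snippet runs standalone (vars, locations and measures are illustrative names). The first pattern replaces the whole name with the captured location; the second deletes the location together with the underscore that joins it to the measure, whichever side that underscore is on.

```r
# Measure columns of df_wide (everything except Month)
vars <- c("Cabo_BU_PCT", "Acapulco_BU_PCT", "Cabo_LOS_AVG", "Acapulco_LOS_AVG",
          "BED_BUGS_Cabo", "BED_BUGS_Acapulco",
          "TOTAL_OCCUPIED_Cabo", "TOTAL_OCCUPIED_Acapulco")

# Replace the whole name with the captured location
locations <- sub(".*(Cabo|Acapulco).*", "\\1", vars)

# Delete the location plus an optional underscore before or after it
measures <- sub("_?(Cabo|Acapulco)_?", "", vars)

print(data.frame(vars, locations, measures))
```

Because both measures of a pair reduce to the same cleaned name, spread() can then place them in one column, keyed by the location.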

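gather() and spread() still work but are superseded in current tidyr by pivot_longer() and pivot_wider(). A possible alternative, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0, is sketched below. Because the location is a prefix in some names and a suffix in others, the sketch first rewrites every name into one consistent Location_measure form, then lets pivot_longer() split the names, using the special ".value" entry in names_to so that each measure becomes its own column. wide_excerpt and long_excerpt are hypothetical names; the excerpt holds only the first two months, typed in from the printed (rounded) tibble.

```r
library(tibble)
library(dplyr)  # rename_with() needs dplyr >= 1.0.0
library(tidyr)  # pivot_longer() needs tidyr >= 1.0.0

# Hypothetical two-month excerpt of df_wide (rounded values from the printout).
# With the full dataset, use df_wide in place of wide_excerpt.
wide_excerpt <- tibble(
  Month                   = c(1, 2),
  Cabo_BU_PCT             = c(0.6470034, 0.6167027),
  Acapulco_BU_PCT         = c(0.6260116, 0.6777457),
  Cabo_LOS_AVG            = c(5.223000, 5.893571),
  Acapulco_LOS_AVG        = c(4.307667, 4.247500),
  BED_BUGS_Cabo           = c(5, 3),
  BED_BUGS_Acapulco       = c(3, 0),
  TOTAL_OCCUPIED_Cabo     = c(19216, 17095),
  TOTAL_OCCUPIED_Acapulco = c(6498, 6566)
)

long_excerpt <- wide_excerpt %>%
  # Move a trailing location to the front, e.g. "BED_BUGS_Cabo" -> "Cabo_BED_BUGS".
  # Names that already start with the location do not match and stay unchanged.
  rename_with(~ sub("^(.*)_(Cabo|Acapulco)$", "\\2_\\1", .x), -Month) %>%
  # Split each name into Location and measure; ".value" turns every distinct
  # measure (BU_PCT, LOS_AVG, ...) into its own output column.
  pivot_longer(
    cols = -Month,
    names_to = c("Location", ".value"),
    names_pattern = "^(Cabo|Acapulco)_(.*)$"
  )

print(long_excerpt)
```

Like the answer above, this hard-codes the two location names in its regular expressions, so it shares the same limitation if more locations are added.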
For reshaping groups of variables from wide to long, see the similar question on Stack Overflow: https://stackoverflow.com/questions/47425451/
