r - How to use spread() when your data has multiple "key" variables?

Edit: Apologies for the less-than-minimal example. I have redone it with a more concise example, and it looks like aosmith's answer did the trick!

This is the next step after this question, as part of the same process. It has been quite a journey.

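I have a dataset containing a series of variables, each of which has a low, mid, and high value. There are also multiple identifying variables, which in this example I call "scenario" and "month". I am doing a calculation involving 3 different values, some of which have low, mid, or high values that differ in each scenario and each month.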

# generating a practice dataset

library(dplyr)
library(tidyr)
set.seed(123)

pracdf <- bind_cols(expand.grid(ID = letters[1:2], 
                                month = 1:2, 
                                scenario = c("a", "b")),
                    # tibble() replaces the deprecated data_frame()
                    tibble(p.mid = runif(8, 100, 1000),
                           a = rep(runif(2), 4),
                           b = rep(runif(2), 4),
                           c = rep(runif(2), 4)))

pracdf <- pracdf %>% mutate(p.low = p.mid * 0.75,
                            p.high = p.mid * 1.25) %>%
  gather(p.low, p.mid, p.high, key = "ptype", value = "p") 

# all of that is just to generate the practice dataset.
# 2 IDs * 2 months * 2 scenarios * 3 different values of p = 24 total rows in this dataset

# Do the calculation

pracdf2 <- pracdf %>%
  mutate(result = p * a * b * c)

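This fully "gathered" dataset has the results I want. Now let's do a spread-type operation to present the result in a more readable form, where each combination of month, scenario, and p type gets its own column. An example column name would be "month1_scenario.a_p.low". The resulting dataset would have 2 months * 3 p types * 2 scenarios = 12 value columns.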
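The column count above can be checked before reshaping. This is a minimal sketch, assuming a hypothetical stand-in table `long` with the same key columns as pracdf2 (the names `long`, `n_value_cols`, and `n_rows` are not from the question):

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the long dataset: 2 IDs x 2 months x 2 scenarios x 3 p types
long <- crossing(ID = c("a", "b"),
                 month = 1:2,
                 scenario = c("a", "b"),
                 ptype = c("p.low", "p.mid", "p.high"))

# Each distinct combination of the key columns becomes one column after spreading
n_value_cols <- long %>% distinct(month, scenario, ptype) %>% nrow()

# Each distinct ID becomes one row
n_rows <- n_distinct(long$ID)
```

Counting the distinct key combinations this way gives the number of value columns to expect; the ID column comes on top of that.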

# this fully "gathered" dataset is exactly what I want. 
# Let's put it in a format that the supervisor for this project will be happy with
# ID, month, scenario, and p.type are all "key" variables
# spread() only allows one key variable at a time, so...

pracdf2.spread1 <- pracdf2 %>% spread(ptype, result, sep = ".")
# Produces NA's. Looks like it's messing up with the different values of p

pracdf2.spread2 <-  pracdf2 %>% select(-p) %>% spread(ptype, result, sep = ".")
# that's better, now let's spread across scenarios

pracdf2.spread2.spread2low <- pracdf2.spread2 %>% select(-ptype.p.high, -ptype.p.mid) %>% spread(scenario, ptype.p.low, sep = ".")
pracdf2.spread2.spread2mid <- pracdf2.spread2 %>% select(-ptype.p.low, -ptype.p.high) %>% spread(scenario, ptype.p.mid, sep = ".")
pracdf2.spread2.spread2high <- pracdf2.spread2 %>% select(-ptype.p.mid, -ptype.p.low) %>% spread(scenario, ptype.p.high, sep = ".")

pracdf2.spread2.spread2 <- pracdf2.spread2.spread2low %>% left_join(pracdf2.spread2.spread2mid)

# Ok, that was rough and will clearly spiral out of control quickly
# what am I still doing with my life?
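The NA values from the first attempt can be reproduced on a much smaller table. This is a sketch on a hypothetical toy data frame (`toy` and its values are made up for illustration): spread() treats every column other than the key and the value as an identifier, so a leftover column that varies with the key, such as p, keeps the rows apart.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one ID, two p types, and a p column that varies with ptype
toy <- tibble(ID = c("a", "a"),
              ptype = c("p.low", "p.high"),
              p = c(75, 125),
              result = c(7.5, 12.5))

# p is still present, so each (ID, p) pair forms its own row and the gaps are filled with NA
with_p <- toy %>% spread(ptype, result)

# Dropping p first leaves ID as the only identifier, so the two rows collapse into one
without_p <- toy %>% select(-p) %>% spread(ptype, result)
```

This is why removing p before spreading, as in the second attempt, gets rid of the NAs.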

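I could use spread() on each key column and then redo the spread for every subsequent value column, but that would take a long time and is likely to be error-prone.

Is there a cleaner, tidier way to do this?

Thanks!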

Best answer

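You can use unite from tidyr to combine the three columns into one before spreading.

Then you can spread, using the new column as the key and "result" as the value.

Before spreading, I also removed columns "a" through "p", since they do not seem to be needed for the desired result.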

pracdf2 %>%
     unite("allgroups", month, scenario, ptype) %>%
     select(-(a:p)) %>%
     spread(allgroups, result)

# A tibble: 2 x 13
  ID    `1_a_p.high` `1_a_p.low` `1_a_p.mid` `1_b_p.high` `1_b_p.low` `1_b_p.mid` `2_a_p.high` `2_a_p.low`
  <fct>        <dbl>       <dbl>       <dbl>        <dbl>       <dbl>       <dbl>        <dbl>       <dbl>
1 a              160        96.2       128          423         254         338            209       126  
2 b              120        72.0        96.0         20.9        12.5        16.7          133        79.5
# ... with 4 more variables: `2_a_p.mid` <dbl>, `2_b_p.high` <dbl>, `2_b_p.low` <dbl>, `2_b_p.mid` <dbl>
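The same steps can be tried on a small self-contained table. The following is a sketch, not part of the original answer: `long` is a hypothetical stand-in for pracdf2 with made-up result values. The prefixes added with paste0() are one possible way to get names like "month1_scenario.a_p.low". The pivot_wider() call assumes tidyr 1.0.0 or later, where spread() is superseded.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for pracdf2 after dropping a, b, c and p (result values are made up)
long <- crossing(ID = c("a", "b"),
                 month = 1:2,
                 scenario = c("a", "b"),
                 ptype = c("p.low", "p.mid", "p.high")) %>%
  mutate(result = row_number())

# Approach from the answer: unite the three keys into one column, then spread on it
wide_spread <- long %>%
  unite("allgroups", month, scenario, ptype) %>%
  spread(allgroups, result)

# Same approach with prefixes added first, giving names like "month1_scenario.a_p.low"
wide_named <- long %>%
  mutate(month = paste0("month", month),
         scenario = paste0("scenario.", scenario)) %>%
  unite("allgroups", month, scenario, ptype) %>%
  spread(allgroups, result)

# Assumed alternative (tidyr >= 1.0.0): pivot_wider() takes several key columns at once
wide_pivot <- long %>%
  pivot_wider(names_from = c(month, scenario, ptype),
              values_from = result,
              names_sep = "_")
```

All three results have one row per ID and 12 value columns. The column order of wide_pivot may differ from that of wide_spread, so compare them by name rather than by position.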

Regarding "r - How to use spread() when your data has multiple "key" variables?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48311834/
