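Edit: Apologies for the beyond-minimal example. I have redone this with a simpler example, and it looks like aosmith's answer has it covered!
This follows on from this question, as the next step in the same process. It has really been something.
I have a dataset containing a series of variables, each of which has a low, mid, and high value. There are also several identifying variables, which I will call "scenario" and "month" in this example. I am doing a calculation that involves 3 different values, some of which have low, mid, or high variants, and these differ for each scenario and each month.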
# generating a practice dataset
library(dplyr)
library(tidyr)
set.seed(123)
pracdf <- bind_cols(expand.grid(ID = letters[1:2],
                                month = 1:2,
                                scenario = c("a", "b")),
                    # tibble() replaces the deprecated data_frame()
                    tibble(p.mid = runif(8, 100, 1000),
                           a = rep(runif(2), 4),
                           b = rep(runif(2), 4),
                           c = rep(runif(2), 4)))
pracdf <- pracdf %>%
  mutate(p.low = p.mid * 0.75,
         p.high = p.mid * 1.25) %>%
  gather(p.low, p.mid, p.high, key = "ptype", value = "p")
# all of that is just to generate the practice dataset.
# 2 IDs * 2 months * 2 scenarios * 3 different values of p = 24 total rows in this dataset
# Do the calculation
pracdf2 <- pracdf %>%
  mutate(result = p * a * b * c)
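This fully "gathered" dataset contains the results I want. Now let's do a spread-type operation to present them in a more readable way, with one column for each combination of month, scenario, and p type. An example column name would be "month1_scenario.a_p.low". The resulting dataset would have 2 months * 3 p types * 2 scenarios = 12 value columns in total.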
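To make the target layout concrete, here is a minimal base-R sketch that enumerates the 12 column names. The naming pattern is taken from the single example name given above; the object names `combos` and `target_names` are illustrative only and are not part of the original code.

```r
# Enumerate every month / scenario / p-type combination.
# expand.grid() varies its first argument fastest.
combos <- expand.grid(ptype = c("p.low", "p.mid", "p.high"),
                      scenario = c("a", "b"),
                      month = 1:2,
                      stringsAsFactors = FALSE)

# Build names following the assumed pattern "month<m>_scenario.<s>_<ptype>"
target_names <- with(combos,
                     paste0("month", month, "_scenario.", scenario, "_", ptype))

length(target_names)  # 2 months * 3 p types * 2 scenarios
head(target_names, 3)
```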
# this fully "gathered" dataset is exactly what I want.
# Let's put it in a format that the supervisor for this project will be happy with
# ID, month, scenario, and p.type are all "key" variables
# spread() only allows one key variable at a time, so...
pracdf2.spread1 <- pracdf2 %>% spread(ptype, result, sep = ".")
# Produces NAs. p is still in the data and differs for each ptype,
# so spread() cannot collapse the three ptype rows into one
pracdf2.spread2 <- pracdf2 %>% select(-p) %>% spread(ptype, result, sep = ".")
# that's better, now let's spread across scenarios
pracdf2.spread2.spread2low <- pracdf2.spread2 %>%
  select(-ptype.p.high, -ptype.p.mid) %>%
  spread(scenario, ptype.p.low, sep = ".")
pracdf2.spread2.spread2mid <- pracdf2.spread2 %>%
  select(-ptype.p.low, -ptype.p.high) %>%
  spread(scenario, ptype.p.mid, sep = ".")
pracdf2.spread2.spread2high <- pracdf2.spread2 %>%
  select(-ptype.p.mid, -ptype.p.low) %>%
  spread(scenario, ptype.p.high, sep = ".")
pracdf2.spread2.spread2 <- pracdf2.spread2.spread2low %>% left_join(pracdf2.spread2.spread2mid)
# Ok, that was rough and will clearly spiral out of control quickly
# what am I still doing with my life?
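I could use spread() on each key column in turn and then redo the spread for every resulting value column, but that would take a long time and would be error-prone.
Is there a cleaner, tidier way to do this?
Thanks!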
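Best answer
You can use unite from tidyr to combine the three columns into one before spreading.
Then you can spread, using the new column as the key and "result" as the value.
Before spreading, I also removed columns "a" through "p", since they do not appear to be needed in the desired result.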
pracdf2 %>%
  unite("allgroups", month, scenario, ptype) %>%
  select(-(a:p)) %>%
  spread(allgroups, result)
# A tibble: 2 x 13
ID `1_a_p.high` `1_a_p.low` `1_a_p.mid` `1_b_p.high` `1_b_p.low` `1_b_p.mid` `2_a_p.high` `2_a_p.low`
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 160 96.2 128 423 254 338 209 126
2 b 120 72.0 96.0 20.9 12.5 16.7 133 79.5
# ... with 4 more variables: `2_a_p.mid` <dbl>, `2_b_p.high` <dbl>, `2_b_p.low` <dbl>, `2_b_p.mid` <dbl>
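As a possible alternative to the unite() + spread() approach above, newer versions of tidyr provide pivot_wider(), which accepts several key columns at once through `names_from`. The sketch below assumes tidyr >= 1.1.0 (for the `names_glue` argument) and uses a small made-up data frame, `toy`, as a stand-in for pracdf2; its `result` values are placeholders, not the numbers from the question.

```r
library(tidyr)  # assumes tidyr >= 1.1.0 for names_glue

# Hypothetical stand-in for pracdf2: one row per ID / month / scenario / ptype
toy <- expand.grid(ID = c("a", "b"),
                   month = 1:2,
                   scenario = c("a", "b"),
                   ptype = c("p.low", "p.mid", "p.high"),
                   stringsAsFactors = FALSE)
toy$result <- seq_len(nrow(toy))  # placeholder values

# Spread over all three key columns in one step.
# id_cols keeps only ID as the row identifier, and names_glue builds
# column names in the "month1_scenario.a_p.low" style from the question.
wide <- pivot_wider(toy,
                    id_cols = ID,
                    names_from = c(month, scenario, ptype),
                    values_from = result,
                    names_glue = "month{month}_scenario.{scenario}_{ptype}")

dim(wide)    # one row per ID, one column per key combination plus ID
names(wide)
```

Setting `id_cols = ID` plays the same role as dropping the extra columns with select() before spreading: any column left in as an identifier would otherwise keep rows from collapsing.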
Source: "r - How to use spread() when your data has multiple 'key' variables?" on Stack Overflow: https://stackoverflow.com/questions/48311834/