R: Converting wide format to long format with multiple variables measured at 3 time periods

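Apologies if this is a simple question, but I could not find a straightforward solution after searching. I am fairly new to R and am having trouble converting wide format to long format with either the melt (reshape2) or gather (tidyr) functions. The dataset I am working with contains 22 different time variables, each measured at 3 time periods. The problem arises when I try to convert all of them from wide to long format in one go. I have managed to convert them one at a time, but that is a slow and inefficient process, so I was wondering whether anyone could suggest a simpler solution. Below is a sample dataset I created, formatted like the one I am working with: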

Subject <- c(1, 2, 3)
BlueTime1 <- c(2, 5, 6)
BlueTime2 <- c(4, 6, 7)
BlueTime3 <- c(1, 2, 3)
RedTime1 <- c(2, 5, 6)
RedTime2 <- c(4, 6, 7)
RedTime3 <- c(1, 2, 3)
GreenTime1 <- c(2, 5, 6)
GreenTime2 <- c(4, 6, 7)
GreenTime3 <- c(1, 2, 3)

sample.df <- data.frame(Subject, BlueTime1, BlueTime2, BlueTime3,
                    RedTime1, RedTime2, RedTime3,
                    GreenTime1,GreenTime2, GreenTime3)

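The solution that worked for me was to use tidyr's gather function, arrange the data by Subject (so that each subject's data is grouped together), and then select only the subject, time period, and rating. I did this for each variable (22 in my case).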

install.packages("dplyr")
install.packages("tidyr")
library(dplyr)
library(tidyr)

BlueGather <- gather(sample.df, Time_Blue, Rating_Blue, c(BlueTime1,
                                                          BlueTime2,
                                                          BlueTime3))
BlueSorted <- arrange(BlueGather, Subject)

BlueSubtracted <- select(BlueSorted, Subject, Time_Blue, Rating_Blue)

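After this code, I combine everything into a single data frame. This seems very slow and inefficient to me, and I am hoping someone can help me find a simpler solution. Thank you!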

Best answer

The idea here is to gather() all the time variables (every column except Subject), use separate() on key to split it into label and time, and then spread() label and value to get the desired output.

library(dplyr)
library(tidyr)

sample.df %>%
  gather(key, value, -Subject) %>%
  separate(key, into = c("label", "time"), "(?<=[a-z])(?=[0-9])") %>%
  spread(label, value)

This gives:

#  Subject time BlueTime GreenTime RedTime
#1       1    1        2         2       2
#2       1    2        4         4       4
#3       1    3        1         1       1
#4       2    1        5         5       5
#5       2    2        6         6       6
#6       2    3        2         2       2
#7       3    1        6         6       6
#8       3    2        7         7       7
#9       3    3        3         3       3
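
As a possible alternative, the same reshape can be sketched in a single call with pivot_longer(). This assumes tidyr 1.0.0 or later, which is not something the original answer relies on. The names long.df and the pattern "^([A-Za-z]+)([0-9]+)$" are chosen here for illustration; the special ".value" entry in names_to tells pivot_longer() to turn the first captured group into output column names.

```r
library(tidyr)

# Same sample data as in the question, built in one call.
sample.df <- data.frame(
  Subject    = c(1, 2, 3),
  BlueTime1  = c(2, 5, 6), BlueTime2  = c(4, 6, 7), BlueTime3  = c(1, 2, 3),
  RedTime1   = c(2, 5, 6), RedTime2   = c(4, 6, 7), RedTime3   = c(1, 2, 3),
  GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3)
)

# First capture group (letters, e.g. "BlueTime") becomes a column name via ".value";
# second capture group (trailing digits) is stored in the "time" column.
long.df <- pivot_longer(
  sample.df,
  cols = -Subject,
  names_to = c(".value", "time"),
  names_pattern = "^([A-Za-z]+)([0-9]+)$"
)

long.df
```

The result has one row per Subject and time, with one column per variable, so it should scale to 22 variables without any per-variable code. Note that time comes back as a character column; convert it with as.integer() if you need numbers.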

Note

Here, in separate(), we use the regex from the answer by @RichardScriven to split the column at the first digit encountered.
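
To see what that regex does on its own, here is a small base R sketch that does not involve tidyr. The pattern is a zero-width match: a lookbehind for a lowercase letter followed by a lookahead for a digit, so it marks the boundary between the name and the number without consuming any characters. The vector col_names and the variables pos, label and time are illustrative names only.

```r
pattern <- "(?<=[a-z])(?=[0-9])"
col_names <- c("BlueTime1", "RedTime2", "GreenTime3")

# regexpr() returns the position of the zero-width match,
# which is the position of the first digit in each name.
pos <- as.integer(regexpr(pattern, col_names, perl = TRUE))

# Everything before the boundary is the label, everything from it on is the time.
label <- substr(col_names, 1, pos - 1)
time  <- substr(col_names, pos, nchar(col_names))

data.frame(col_names, label, time)
```

Because the lookbehind only accepts [a-z], a column whose letters end in an uppercase character (for example a hypothetical "BlueTIME1") would not be split; the character class would need to be widened for such names.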

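Edit

From your comment I understand that the column names in your dataset actually take the form ColorTime_Pre, ColorTime_Post, ColorTime_Final. If that is the case, you do not need to specify a regex in separate(), because the default sep = "[^[:alnum:]]+" will match your _ and split key into label and time accordingly: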

sample.df %>%
  gather(key, value, -Subject) %>%
  separate(key, into = c("label", "time")) %>%
  spread(label, value)

This will give:

#  Subject  time BlueTime GreenTime RedTime
#1       1 Final        1         1       1
#2       1  Post        4         4       4
#3       1   Pre        2         2       2
#4       2 Final        2         2       2
#5       2  Post        6         6       6
#6       2   Pre        5         5       5
#7       3 Final        3         3       3
#8       3  Post        7         7       7
#9       3   Pre        6         6       6
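
In the output above, time is sorted alphabetically (Final, Post, Pre) rather than chronologically. If the chronological order matters, one option is to convert time to a factor with explicit levels and sort on it. The sketch below is an assumption-laden illustration: named.df is hypothetical data following the ColorTime_Pre/_Post/_Final naming with only two colours, and the level order Pre, Post, Final is assumed from the names.

```r
library(dplyr)
library(tidyr)

# Hypothetical data using the ColorTime_Pre/_Post/_Final naming scheme.
named.df <- data.frame(
  Subject = c(1, 2, 3),
  BlueTime_Pre = c(2, 5, 6), BlueTime_Post = c(4, 6, 7), BlueTime_Final = c(1, 2, 3),
  RedTime_Pre  = c(2, 5, 6), RedTime_Post  = c(4, 6, 7), RedTime_Final  = c(1, 2, 3)
)

ordered.df <- named.df %>%
  gather(key, value, -Subject) %>%
  separate(key, into = c("label", "time")) %>%   # default sep splits on "_"
  spread(label, value) %>%
  # Assumed chronological order of the three periods.
  mutate(time = factor(time, levels = c("Pre", "Post", "Final"))) %>%
  arrange(Subject, time)

ordered.df
```

Storing time as an ordered set of levels also keeps plots and summaries in Pre, Post, Final order without further sorting.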

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/38505035/
