r - "number of items to replace is not a multiple of replacement length" when reshaping from semi-long to long

Tags: r dataframe reshape

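I want to convert a semi-long data frame to long format. However, after the reshape command I get several warnings saying "number of items to replace is not a multiple of replacement length". When I open the new data frame, the format looks basically correct, but R reports that the data frame is corrupt.

What is going on here?

This is the command I used. It explicitly required me to supply a value for idvar: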

df2 = reshape(df,
              direction="long",
              varying=3:ncol(df),
              ids="id",
              idvar="newid",
              timevar="category")

This is the structure of my original data frame (in reality there are more categories than just car and tree):

id  trial  resp.car rt.car color.car resp.tree rt.tree color.tree
 1      1         1    500    "blue"         3     765    "green"
 1      1         3    534   "green"         1     455   "yellow"
 1      2         2    553  "yellow"         2     794      "red"
 1      2         3    577   "black"         3     834     "blue"
 2      1         1    598   "green"         1     756      "red"
 2      1         3    355  "yellow"         3     457    "black"
 2      2         3    876    "blue"         1     767   "yellow"
 2      2         2    466   "black"         1     439    "green"

Desired result:

id  trial  category   resp        rt     color
 1      1     "car"      1       500    "blue"    
 1      1     "car"      3       534   "green"  
 1      2     "car"      2       553  "yellow"     
 1      2     "car"      3       577   "black"    
 1      1    "tree"      3       765   "green"     
 1      1    "tree"      1       455  "yellow"    
 1      2    "tree"      2       794     "red"     
 1      2    "tree"      3       834    "blue"     
 2      1     "car"      1       598   "green"
 ...

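Best answer

It may be easier with pivot_longer. In cols, specify the columns to reshape to long format; in names_pattern, capture the pieces of each column name; and in names_to, give the names those pieces map to. The special name .value means the first captured piece (resp, rt, color) becomes the name of an output column holding the values, while category becomes the name of the column holding the second captured piece, the suffix (car, tree). The regex matches zero or more characters (.*) from the start of the column name (^) as the first capture group ((.*)), followed by a literal dot (\\., escaped because an unescaped . is a metacharacter that matches any character), followed by a second capture group ((.*)) that matches the remaining characters.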

library(tidyr)
pivot_longer(df, cols = -c(id, trial), 
  names_to = c(".value", "category"), names_pattern = "^(.*)\\.(.*)")

-output

# A tibble: 16 × 6
      id trial category  resp    rt color 
   <int> <int> <chr>    <int> <int> <chr> 
 1     1     1 car          1   500 blue  
 2     1     1 tree         3   765 green 
 3     1     1 car          3   534 green 
 4     1     1 tree         1   455 yellow
 5     1     2 car          2   553 yellow
 6     1     2 tree         2   794 red   
 7     1     2 car          3   577 black 
 8     1     2 tree         3   834 blue  
 9     2     1 car          1   598 green 
10     2     1 tree         1   756 red   
11     2     1 car          3   355 yellow
12     2     1 tree         3   457 black 
13     2     2 car          3   876 blue  
14     2     2 tree         1   767 yellow
15     2     2 car          2   466 black 
16     2     2 tree         1   439 green 
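To see what the names_pattern regex captures, the same pattern can be applied to the column names with base R's sub(). This is only an illustrative sketch: the vector cols and the names value_part and category_part are made up for the example, and tidyr does its own matching internally.

```r
# Column names to be reshaped, copied from the example data
cols <- c("resp.car", "rt.car", "color.car",
          "resp.tree", "rt.tree", "color.tree")

# Same pattern as in names_pattern
pattern <- "^(.*)\\.(.*)"

# First capture group: mapped to ".value", so it names the output columns
value_part <- sub(pattern, "\\1", cols)

# Second capture group: mapped to "category", so it becomes a column value
category_part <- sub(pattern, "\\2", cols)

print(unique(value_part))
print(unique(category_part))
```

The first group yields resp, rt and color, and the second yields car and tree, which matches the columns and the category values in the output above.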

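With reshape, we may have to pass varying as a list in which each element groups the columns that belong to one variable, and use a unique row index as idvar:

# idnew is a unique row index, used as idvar
out <- reshape(transform(df, idnew = seq_along(id)),
               idvar = "idnew",
               # one list element per output variable: resp, rt, color
               varying = list(c(3, 6), c(4, 7), c(5, 8)),
               v.names = c("resp", "rt", "color"),
               timevar = "category",
               direction = "long")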

row.names(out) <- NULL
out
   id trial idnew category resp  rt  color
1   1     1     1        1    1 500   blue
2   1     1     2        1    3 534  green
3   1     2     3        1    2 553 yellow
4   1     2     4        1    3 577  black
5   2     1     5        1    1 598  green
6   2     1     6        1    3 355 yellow
7   2     2     7        1    3 876   blue
8   2     2     8        1    2 466  black
9   1     1     1        2    3 765  green
10  1     1     2        2    1 455 yellow
11  1     2     3        2    2 794    red
12  1     2     4        2    3 834   blue
13  2     1     5        2    1 756    red
14  2     1     6        2    3 457  black
15  2     2     7        2    1 767 yellow
16  2     2     8        2    1 439  green
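In the output above, category holds 1 and 2 rather than the labels from the desired result. One possible variation, offered as a sketch rather than as part of the original answer, passes the labels through reshape's times argument and drops the helper index afterwards. The data frame is rebuilt inline so the block runs on its own; the name out_labeled is made up for the example.

```r
# Example data, same values as the data frame in the question
df <- data.frame(
  id         = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  trial      = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
  resp.car   = c(1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L),
  rt.car     = c(500L, 534L, 553L, 577L, 598L, 355L, 876L, 466L),
  color.car  = c("blue", "green", "yellow", "black",
                 "green", "yellow", "blue", "black"),
  resp.tree  = c(3L, 1L, 2L, 3L, 1L, 3L, 1L, 1L),
  rt.tree    = c(765L, 455L, 794L, 834L, 756L, 457L, 767L, 439L),
  color.tree = c("green", "yellow", "red", "blue",
                 "red", "black", "yellow", "green")
)

# Unique row index to serve as idvar
df$idnew <- seq_len(nrow(df))

out_labeled <- reshape(
  df,
  direction = "long",
  idvar     = "idnew",
  # one list element per output variable, named columns instead of positions
  varying   = list(c("resp.car", "resp.tree"),
                   c("rt.car", "rt.tree"),
                   c("color.car", "color.tree")),
  v.names   = c("resp", "rt", "color"),
  timevar   = "category",
  times     = c("car", "tree")  # labels, in the same order as in varying
)

# Drop the helper index and the generated row names
out_labeled$idnew <- NULL
row.names(out_labeled) <- NULL
out_labeled
```

Using column names instead of positions in varying keeps the call valid if columns are reordered. With more categories, each element of varying and the times vector would need one entry per category, in the same order.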

Data

df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), trial = c(1L, 
1L, 2L, 2L, 1L, 1L, 2L, 2L), resp.car = c(1L, 3L, 2L, 3L, 1L, 
3L, 3L, 2L), rt.car = c(500L, 534L, 553L, 577L, 598L, 355L, 876L, 
466L), color.car = c("blue", "green", "yellow", "black", "green", 
"yellow", "blue", "black"), resp.tree = c(3L, 1L, 2L, 3L, 1L, 
3L, 1L, 1L), rt.tree = c(765L, 455L, 794L, 834L, 756L, 457L, 
767L, 439L), color.tree = c("green", "yellow", "red", "blue", 
"red", "black", "yellow", "green")), class = "data.frame", row.names = c(NA, 
-8L))

Source: a similar question on Stack Overflow, r - "number of items to replace is not a multiple of replacement length" when reshaping from semi-long to long: https://stackoverflow.com/questions/71427031/
