This is the table I need to convert to wide format:
V1 V2 V3 V4
1 A0 numeric string
1 A1 . .
1 A2 . .
1 A3 . .
1 A4 . .
1 A5 . .
1 A6 . .
1 A7 . .
2 A0 . .
2 A1 . .
... ... . .
I have been trying something like this:
reshape(variable.name, timevar = "V2", idvar = "V1", direction = "wide")
This produces the following result, which appears to be what I want:
V1 V3.A0 V4.A0 V3.A1 ...
1 Numeric String Numeric ...
2 ... ... ... ...
But I get a warning message:
Warning message:
In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, :
multiple rows match for V2 = blah: first taken
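Why does this warning occur, and how can I avoid it? I don't want to ignore it, because I have to do the same thing for several data files. Thanks! Any help is much appreciated.
Best answer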
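As some have pointed out, you need to decide what you want to do with the extra values. dcast lets you specify an aggregation function, and is essentially the same as reshape with direction = "wide", except that you can specify what to do when you have multiple values. Here is an example where essentially every combination is duplicated, and we show the full vector for each combination as a deparsed string (e.g., 1:2 shown as c(1, 2)).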
library(reshape2)
# Make up data
df <- data.frame(
  V1=rep(1:3, 14),
  V2=rep(paste0("A", 0:6), 6),
  V3=sample(1:100, 42),
  V4=paste0(sample(letters, 42, replace=TRUE), sample(letters, 42, replace=TRUE))
)
# Need to melt V3 and V4 together first because
# dcast does not allow multiple value variables;
# unfortunately, this also coerces V3 to character
df.melt <- melt(df, id.vars=c("V1", "V2"))
# Function to handle multiple items for one V1 - V2
# pair. In this case we just deparse the vectors,
# but if you wanted, you could convert the numerics
# back to integers, or do whatever you want (e.g.
# paste if character, median if numeric).
my_func <- function(x) {
  paste0(deparse(x), collapse="")
}
# Now convert to wide format with dcast
dcast(
  df.melt,
  V1 ~ V2 + variable,
  value.var="value",
  fun.aggregate=my_func
)
This gives the following result:
V1 A0_V3 A0_V4 A1_V3 A1_V4
1 1 c("86", "93") c("yf", "pr") c("5", "76") c("py", "aj")
2 2 c("53", "71") c("as", "mi") c("42", "12") c("ho", "la")
3 3 c("69", "16") c("lm", "un") c("66", "100") c("xk", "px")
A2_V3 A2_V4 A3_V3 A3_V4 A4_V3
1 c("43", "67") c("xh", "bk") c("79", "94") c("ix", "cx") c("51", "50")
2 c("14", "68") c("nq", "sr") c("25", "19") c("dw", "ay") c("28", "35")
3 c("21", "24") c("wu", "il") c("39", "88") c("vz", "yw") c("74", "65")
A4_V4 A5_V3 A5_V4 A6_V3 A6_V4
1 c("hv", "uw") c("85", "34") c("cn", "ql") c("73", "87") c("px", "vy")
2 c("qb", "dc") c("2", "72") c("ci", "du") c("81", "49") c("sd", "rx")
3 c("jk", "fv") c("6", "90") c("sr", "yr") c("62", "97") c("rg", "dv")
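To see which V1/V2 pairs carry the extra values behind the reshape warning, it can help to count rows per pair before reshaping. The following is only a minimal sketch in base R; the data frame dat and its values are made up for illustration, and keeping the first row per pair is just one possible choice (it makes explicit what reshape does silently when it reports "first taken").

```r
# Made-up data (hypothetical): each V1-V2 pair occurs exactly twice
dat <- data.frame(
  V1 = rep(1:3, 14),
  V2 = rep(paste0("A", 0:6), 6),
  V3 = 1:42
)

# Count rows per V1-V2 combination
pair_counts <- as.data.frame(table(V1 = dat$V1, V2 = dat$V2))

# Combinations with more than one row are the ones reshape() warns about
dup_pairs <- pair_counts[pair_counts$Freq > 1, ]
dup_pairs

# One explicit choice: keep only the first row of each pair,
# then reshape; with unique pairs there is nothing to warn about
dat_first <- dat[!duplicated(dat[, c("V1", "V2")]), ]
wide_first <- reshape(dat_first, timevar = "V2", idvar = "V1", direction = "wide")
wide_first
```

If dup_pairs comes back empty for a given file, reshape alone should be enough for that file; otherwise you have to pick a rule for the duplicates, such as the one above or an aggregation function.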
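The perfect solution would be a combination of reshape and dcast. Unfortunately, dcast (AFAIK) does not allow multiple Z columns, whereas reshape does (hence the need for the melt step and the coercion to character), and reshape does not allow aggregation functions (AFAIK). You could work around this by running dcast twice, once with V3 and once with V4, and then merging the results, or by adding more intelligence to the aggregation function.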
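The "run dcast twice and merge" idea could look roughly like the sketch below. The data values, the choice of median for V3 and paste for V4, and the _V3/_V4 column suffixes are all assumptions made for illustration, not something the original post prescribes.

```r
library(reshape2)

# Made-up data (hypothetical): each V1-V2 pair occurs exactly twice
df2 <- data.frame(
  V1 = rep(1:3, 14),
  V2 = rep(paste0("A", 0:6), 6),
  V3 = 1:42,
  V4 = rep(letters[1:21], 2),
  stringsAsFactors = FALSE
)

# Numeric column: collapse duplicates with the median.
# as.numeric() keeps the return type double even for integer input.
wide_v3 <- dcast(df2, V1 ~ V2, value.var = "V3",
                 fun.aggregate = function(x) as.numeric(median(x)))

# Character column: paste duplicates together
wide_v4 <- dcast(df2, V1 ~ V2, value.var = "V4",
                 fun.aggregate = function(x) paste(x, collapse = ", "))

# Suffix the value columns so they stay distinguishable after merging
names(wide_v3)[-1] <- paste0(names(wide_v3)[-1], "_V3")
names(wide_v4)[-1] <- paste0(names(wide_v4)[-1], "_V4")

# Combine both wide tables on the id column
wide <- merge(wide_v3, wide_v4, by = "V1")
wide
```

Because V3 never passes through melt here, it stays numeric, so no coercion to character is involved.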
Regarding "r - Why is a warning returned when reshaping in R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20795290/