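I'm using a nested R loop to create a new data frame from a large 3-dimensional array. I tried running the code, but the job failed after about 48 hours. The current code for the nested loop is shown below. I would really like to vectorize the loop to make it more efficient, but I'm not sure how, or whether this can be done on a multidimensional array. Any suggestions on how to make the job more efficient are much appreciated. For reference, my_array is a small portion of my array with two slices. The data in the array are probability values, and the loop finds the founder with the max probability value at a specific mouse and marker. The final output is a data frame with mice names as rows, markers as columns, and the founder as the data. Example code is below.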
# Dimension names: mice (dim 1), founders (dim 2), markers (dim 3)
founder_names <- rownames(my_array[1,,])
mice_names <- rownames(my_array[,1,])
marker_names <- colnames(my_array[1,,])
# Create empty data frame
probs.df <- data.frame()
## Instructions for nested loop
for(marker in marker_names) {
  for(mouse in mice_names){
    probs.df[mouse, marker] = names(which.max(my_array[mouse,,marker]))
  }
}
Sample data from dput(my_array):
structure(c(1.86334813592728e-08, 2.02070595143633e-10, 2.1558577630356e-08,
2.1558577630356e-08, 2.04388477395613e-10, 2.04388477395593e-10,
2.04388477395613e-10, 2.031707697502e-10, 2.04388477395593e-10,
2.0317076975018e-10, 0.999999939150967, 1.19701878645413e-10,
2.94522644878888e-08, 2.94522644878888e-08, 1.20988752710968e-10,
1.20988752710968e-10, 1.20988752710968e-10, 1.20313358746148e-10,
1.20988752710968e-10, 1.20313358746148e-10, 2.41632503275453e-12,
2.53195197455819e-08, 2.89630046322804e-12, 2.89630046322804e-12,
2.46380958026699e-08, 2.46380958026699e-08, 2.46380958026724e-08,
2.44127737551662e-08, 2.46380958026699e-08, 2.44127737551638e-08,
1.08633475857376e-12, 0.999999925628544, 1.30167423493078e-12,
1.30167423493078e-12, 2.49445205965502e-08, 2.49445205965502e-08,
2.49445205965527e-08, 2.47171256696929e-08, 2.49445205965502e-08,
2.47171256696904e-08, 1.84322523200704e-08, 6.29795050516582e-11,
2.13175870442828e-08, 2.13175870442849e-08, 6.40871335417646e-11,
6.40871335417646e-11, 6.40871335417646e-11, 6.35035199711943e-11,
6.40871335417646e-11, 6.3503519971188e-11, 0.999999939821495,
2.75475678555388e-11, 2.91247770927105e-08, 2.91247770927134e-08,
2.80325925630150e-11, 2.80325925630123e-11, 2.80325925630150e-11,
2.77773153893157e-11, 2.80325925630123e-11, 2.77773153893129e-11,
6.56947829427486e-13, 2.50477863870057e-08, 7.89281798086196e-13,
7.89281798086277e-13, 2.43639980473783e-08, 2.43639980473783e-08,
2.43639980473783e-08, 2.41399147887054e-08, 2.43639980473783e-08,
2.4139914788703e-08, 1.7742262257411e-13, 0.999999926913761,
2.13166988220277e-13, 2.13166988220277e-13, 2.46686866862984e-08,
2.46686866862984e-08, 2.46686866863009e-08, 2.44425383948499e-08,
2.46686866862984e-08, 2.44425383948499e-08), .Dim = c(10L, 4L,
2L), .Dimnames = list(c("B6HER2", "X100", "X1002", "X1005", "X1006",
"X1007", "X1010", "X1011", "X1012", "X1014"), c("AI", "BI", "CI",
"DI"), c("UNC6", "JAX00000010")))
Best answer
the loop finds the founder with max probability value at a specific mouse&marker.
I'd probably do...
# assign the dim names directly to the array:
names(dimnames(my_array)) <- c("founder", "mouse", "marker")
# enumerate combos with expand.grid(), not data.frame()
resdf = expand.grid(mouse = dimnames(my_array)$mouse, marker = dimnames(my_array)$marker)
# take maxes within slices
resdf$founder_max = dimnames(my_array)$founder[
  c(apply(my_array, c("mouse", "marker"), which.max))
]
  mouse      marker founder_max
1    AI        UNC6       X1002
2    BI        UNC6      B6HER2
3    CI        UNC6        X100
4    DI        UNC6        X100
5    AI JAX00000010       X1005
6    BI JAX00000010      B6HER2
7    CI JAX00000010        X100
8    DI JAX00000010        X100
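The reason the flattened apply() result lines up with the expand.grid() rows is that both run through the first index fastest. A minimal sketch of that check, using a small made-up matrix (the names m1, m2, k1, k2, k3 are hypothetical and not from the question's data):

```r
# Hypothetical 2 x 3 matrix standing in for the result of apply()
m <- matrix(1:6, nrow = 2,
            dimnames = list(mouse = c("m1", "m2"),
                            marker = c("k1", "k2", "k3")))

# expand.grid() varies its first argument fastest ...
grid <- expand.grid(mouse = rownames(m), marker = colnames(m),
                    stringsAsFactors = FALSE)

# ... and c() flattens a matrix column by column, i.e. row index fastest
grid$value <- c(m)

# Look each (mouse, marker) pair up directly and compare
lookup <- m[cbind(grid$mouse, grid$marker)]
stopifnot(all(lookup == grid$value))
```

If the arguments to expand.grid() were given in the opposite order, the flattened values would no longer match their labels, so the order of the margins in apply() and the order of the columns in expand.grid() should be kept the same.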
Or, using reshape2:
library(reshape2)
resdf2 = melt(apply(my_array, c("mouse", "marker"), function(x)
  dimnames(my_array)$founder[which.max(x)]
))
  mouse      marker  value
1    AI        UNC6  X1002
2    BI        UNC6 B6HER2
3    CI        UNC6   X100
4    DI        UNC6   X100
5    AI JAX00000010  X1005
6    BI JAX00000010 B6HER2
7    CI JAX00000010   X100
8    DI JAX00000010   X100
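If you're still running into speed problems, there are alternatives to apply, e.g. the matrixStats package, or you could write your own custom fast code with Rcpp. There may also be some way to massage your problem so that you can use the fast max.col function in base... though I don't immediately see it.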
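One possible way to bring max.col into play, offered as a sketch rather than a tested solution for the full data: permute the array so the founder dimension comes last, flatten it into a matrix with one row per mouse/marker pair and one column per founder, and let max.col pick the winning column. The toy array below is invented for illustration (names m1 to m3, k1, k2 and the random values are hypothetical), and it assumes the dimensions are ordered mouse, founder, marker.

```r
# Hypothetical toy array: 3 mice x 4 founders x 2 markers
set.seed(1)
arr <- array(runif(3 * 4 * 2), dim = c(3, 4, 2),
             dimnames = list(mouse = paste0("m", 1:3),
                             founder = c("AI", "BI", "CI", "DI"),
                             marker = c("k1", "k2")))

# Move founder to the last dimension, then flatten to a
# (mouse * marker) x founder matrix
flat <- matrix(aperm(arr, c(1, 3, 2)), ncol = dim(arr)[2])

# Column index of each row's maximum; "first" mirrors which.max(),
# since the default ties.method breaks ties at random
best <- max.col(flat, ties.method = "first")

# Reshape the founder names back into a mouse x marker matrix
res <- matrix(dimnames(arr)$founder[best], nrow = dim(arr)[1],
              dimnames = dimnames(arr)[c(1, 3)])

# Cross-check against the apply() approach
ref <- apply(arr, c(1, 3), function(x) dimnames(arr)$founder[which.max(x)])
stopifnot(all(res == ref))
```

This replaces one R-level function call per mouse/marker pair with a single aperm and a single max.col call; whether that is fast enough for the full array would need to be timed.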
The final output is a dataframe with mice names as rows, markers with columns, and the founder as the data.
If you really do want that format, you can stop after the apply:
apply(my_array, c("mouse", "marker"), function(x)
  dimnames(my_array)$founder[which.max(x)]
)
     marker
mouse UNC6     JAX00000010
   AI "X1002"  "X1005"
   BI "B6HER2" "B6HER2"
   CI "X100"   "X100"
   DI "X100"   "X100"
This is a matrix, not a data frame. I don't think it should be converted to a data.frame (except in the way melt does), but if you somehow need it, you can wrap it in as.data.frame.
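For the wide mouse-by-marker data frame the question asks for, the following is a minimal sketch. In the dput() sample the first dimension carries the mouse IDs (B6HER2, X100, ...) and the second the founders (AI to DI), which is also what my_array[mouse,,marker] in the question implies, so the dimension labels here are assigned in that order. The toy array and its values are made up for illustration; only the names are borrowed from the sample.

```r
# Toy stand-in with the same layout as the dput() sample:
# dim 1 = mice, dim 2 = founders, dim 3 = markers (values invented)
toy <- array(
  c(0.1, 0.7,  0.9, 0.2,  0.0, 0.1,   # marker UNC6: founders AI, BI, CI
    0.3, 0.2,  0.3, 0.5,  0.4, 0.3),  # marker JAX00000010
  dim = c(2, 3, 2),
  dimnames = list(mouse = c("B6HER2", "X100"),
                  founder = c("AI", "BI", "CI"),
                  marker = c("UNC6", "JAX00000010"))
)

# Founder with the highest probability for each mouse/marker pair
best <- apply(toy, c("mouse", "marker"),
              function(p) dimnames(toy)$founder[which.max(p)])

# Mice as rows, markers as columns
probs.df <- as.data.frame(best, stringsAsFactors = FALSE)
print(probs.df)
```

The character MARGIN in apply() relies on the dimnames being named, as in the list passed to array() above.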
Regarding r - Vectorizing a nested loop in R on a 3-dimensional array, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51753398/