I have a 1246 x 60,660 data frame in R. A snippet of the data:
            gene1     gene2     gene3
sample1 1615.7529 41.932474  697.9728
sample2  663.2001  8.602831 1198.1398
sample3 2406.1532 12.622443 1033.4625
sample4  836.3808 60.144235  259.3720
sample5 1217.8192 22.775497  695.9924
sample6  865.0344 15.350298  683.5397
sample7  935.3658 20.380676  540.6242
sample8  667.3883 56.939874 1056.6981
For each gene, I want to put each sample's value into one of the following groups:
none = 0
ultra low = 1 - 4
low = 5 - 100
medium = 101 - 1000
high = 1000 and above
The final product would be another matrix/data frame like this:
          gene1 gene2 gene3
none          0     0     0
ultra low     0     0     0
low           0     8     0
medium        5     0     5
high          3     0     3
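How can I do this? After some searching, I think I may end up using count or aggregate, but I'm not sure how to apply either to every column. Most examples I've seen only count a single column.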
Best answer
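It may be easier with cut: loop over the columns (with lapply, or sapply as here), apply cut to each column with the breaks and the corresponding labels, and count the result with table.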
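# right = FALSE makes the bins left-closed, [0,1), [1,5), ..., so an exact 0 is counted as "none"
sapply(df1, \(x) table(cut(x, breaks = c(0, 1, 5, 101, 1001, Inf),
    labels = c("none", "ultra low", "low", "medium", "high"), right = FALSE)))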
-output
          gene1 gene2 gene3
none          0     0     0
ultra low     0     0     0
low           0     8     0
medium        5     0     5
high          3     0     3
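One detail worth checking with cut is which side of each interval is closed. By default (right = TRUE) the bins are (0,1], (1,5], and so on, so a value of exactly 0 falls in no bin and becomes NA, which table then drops silently. The sketch below compares the default with right = FALSE on a few boundary values; the vector x is hypothetical and chosen only to probe the bin edges.

```r
# Hypothetical boundary values, not taken from the question's data.
x <- c(0, 0.5, 1, 4, 5, 100, 101, 1000, 1001)
brks <- c(0, 1, 5, 101, 1001, Inf)
lbls <- c("none", "ultra low", "low", "medium", "high")

# Default (right = TRUE): bins are (0,1], (1,5], ...; exactly 0 is in no bin.
default_bins <- cut(x, breaks = brks, labels = lbls)

# right = FALSE: bins are [0,1), [1,5), ...; 0 lands in "none", 1001 in "high".
left_closed_bins <- cut(x, breaks = brks, labels = lbls, right = FALSE)

# Side-by-side view of where each value ends up.
data.frame(x, default_bins, left_closed_bins)
```

For the sample data in this question both settings give the same counts, because no value sits exactly on a break.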
Or, as @ZheyuanLi mentioned, tabulate may be faster:
lbls <- c("none", "ultra low", "low", "medium", "high")
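# tabulate() counts the integer codes of the factor; right = FALSE keeps an exact 0 in "none"
out <- sapply(df1, \(x) tabulate(cut(x, breaks = c(0, 1, 5, 101, 1001, Inf),
    labels = lbls, right = FALSE), nbins = length(lbls)))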
row.names(out) <- lbls
-output
> out
          gene1 gene2 gene3
none          0     0     0
ultra low     0     0     0
low           0     8     0
medium        5     0     5
high          3     0     3
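To check the "may be faster" claim on your own machine, something like the following could be used. It is only a sketch: the data frame sim is simulated (its size and distribution are arbitrary assumptions, not the real 1246 x 60,660 data), and the helper names count_table and count_tabulate are made up for this example.

```r
set.seed(1)
# Simulated stand-in for the real data: 200 samples x 500 genes.
sim <- as.data.frame(matrix(rexp(200 * 500, rate = 1 / 300), nrow = 200))
brks <- c(0, 1, 5, 101, 1001, Inf)
lbls <- c("none", "ultra low", "low", "medium", "high")

# Approach 1: table() on the factor returned by cut().
count_table <- function(d) {
  sapply(d, \(x) table(cut(x, breaks = brks, labels = lbls)))
}

# Approach 2: tabulate() on the same factor, with row names added afterwards.
count_tabulate <- function(d) {
  out <- sapply(d, \(x) tabulate(cut(x, breaks = brks, labels = lbls),
                                 nbins = length(lbls)))
  rownames(out) <- lbls
  out
}

# Both approaches should give identical counts.
res_table <- count_table(sim)
res_tabulate <- count_tabulate(sim)
stopifnot(all(res_table == res_tabulate))

# Rough timings; the numbers depend on the machine and R version.
system.time(count_table(sim))
system.time(count_tabulate(sim))
```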
Data
df1 <- structure(list(gene1 = c(1615.7529, 663.2001, 2406.1532, 836.3808,
1217.8192, 865.0344, 935.3658, 667.3883), gene2 = c(41.932474,
8.602831, 12.622443, 60.144235, 22.775497, 15.350298, 20.380676,
56.939874), gene3 = c(697.9728, 1198.1398, 1033.4625, 259.372,
695.9924, 683.5397, 540.6242, 1056.6981)), class = "data.frame",
row.names = c("sample1",
"sample2", "sample3", "sample4", "sample5", "sample6", "sample7",
"sample8"))
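A further variant, not part of the answer above, skips the factor altogether: findInterval returns the index of the left-closed interval each value falls into, and tabulate counts those indices. The sketch below uses a small data frame small_df (a hypothetical name) holding the first four samples of gene1 and gene2 from the data above; the vector lower holds the lower bound of each bin.

```r
# First four samples of gene1 and gene2 from the question's data.
small_df <- data.frame(
  gene1 = c(1615.7529, 663.2001, 2406.1532, 836.3808),
  gene2 = c(41.932474, 8.602831, 12.622443, 60.144235)
)

lbls <- c("none", "ultra low", "low", "medium", "high")
# Lower bound of each bin: [0,1), [1,5), [5,101), [101,1001), [1001, Inf).
lower <- c(0, 1, 5, 101, 1001)

# findInterval() gives the bin index (1 to 5); tabulate() counts each index.
# Values below 0 would get index 0, which tabulate() ignores.
out2 <- sapply(small_df, \(x) tabulate(findInterval(x, lower),
                                       nbins = length(lbls)))
rownames(out2) <- lbls
out2
```

If a data frame is preferred over a matrix, as.data.frame(out2) converts the result while keeping the row names.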
Regarding "r - Counting the values in each column of an R data frame by the range they fall into", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/73055861/