Given the following data.table in R:
library(data.table)
set.seed(123666)
dt <- data.table(sample1 = sample(10),
                 sample2 = sample(10),
                 sample3 = sample(10),
                 sample4 = sample(10),
                 sample5 = sample(10),
                 sample6 = sample(10))
dt
    sample1 sample2 sample3 sample4 sample5 sample6
 1:       2       6       3       9       1       2
 2:      10       9      10       3       7       5
 3:       6      10       8       5       5       1
 4:       8       2       9       8       6       6
 5:       5       4       5      10      10       8
 6:       7       1       7       4       4      10
 7:       4       3       1       6       3       7
 8:       1       5       6       1       2       3
 9:       3       7       2       2       8       9
10:       9       8       4       7       9       4
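Suppose the first 3 samples belong to group_a and the last 3 samples belong to group_b. We now want to keep only the rows in which, within each group, at least 2 of the 3 samples are greater than 2. For this fixed case, the following code does the job: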
group_a <- paste0('sample', seq(1, 3))
group_b <- paste0('sample', seq(4, 6))
# Keep rows where at least 2 of the 3 samples in each group exceed 2.
# The .. prefix already selects the columns named in the variable,
# so with = FALSE is not needed alongside it.
dt[rowSums(dt[, ..group_a] > 2) >= 2 & rowSums(dt[, ..group_b] > 2) >= 2]
   sample1 sample2 sample3 sample4 sample5 sample6
1:      10       9      10       3       7       5
2:       6      10       8       5       5       1
3:       8       2       9       8       6       6
4:       5       4       5      10      10       8
5:       7       1       7       4       4      10
6:       4       3       1       6       3       7
7:       3       7       2       2       8       9
8:       9       8       4       7       9       4
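As a side note on why this works: comparing a table with a number yields a logical matrix, and rowSums() treats TRUE as 1, so it returns the number of passing samples per row. A minimal sketch on the first three rows of group_b, copied from the table above (the names m, passes and n_pass are only illustrative):

```r
# First three rows of sample4..sample6, typed in from the printed table.
m <- matrix(c(9, 1, 2,
              3, 7, 5,
              5, 5, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, paste0("sample", 4:6)))

passes <- m > 2            # logical matrix: TRUE where a value exceeds 2
n_pass <- rowSums(passes)  # TRUE counts as 1, giving a per-row count
keep   <- n_pass >= 2      # rows with at least two passing samples

n_pass
keep
```

Only the first of these three rows has fewer than two values above 2 in group_b, which is why it is missing from the filtered result.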
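Now consider a data.table in which each column still represents a sample, but the number of samples is not known in advance. An additional variable, group, describes how the samples are grouped: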
group <- paste0('sample', seq(1,6))
group_id <- c(rep('group_a', 3), rep('group_b', 3))
names(group) <- group_id
group
  group_a   group_a   group_a   group_b   group_b   group_b
"sample1" "sample2" "sample3" "sample4" "sample5" "sample6"
How can this task be done with data.table syntax, using the most concise code possible?
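Best answer
You can split the column names by group, iterate over the resulting list to subset the columns and check the condition, and then reduce the per-group results to subset the rows: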
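library(data.table)
# split() yields one vector of column names per group; each group is tested
# separately and Reduce(`&`, ...) keeps the rows that pass in every group.
# \(x) is the R >= 4.1 shorthand for function(x).
dt[Reduce(`&`, lapply(split(group, names(group)),
                      \(x) rowSums(dt[, .SD, .SDcols = x] > 1) >= 2)), ]
    sample1 sample2 sample3 sample4 sample5 sample6
 1:       2       6       3       9       1       2
 2:      10       9      10       3       7       5
 3:       6      10       8       5       5       1
 4:       8       2       9       8       6       6
 5:       5       4       5      10      10       8
 6:       7       1       7       4       4      10
 7:       4       3       1       6       3       7
 8:       1       5       6       1       2       3
 9:       3       7       2       2       8       9
10:       9       8       4       7       9       4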
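With a threshold of 1, every row satisfies the condition, so nothing is filtered out. Changing it to at least two values greater than 2 in each group shows that the filter works: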
dt[Reduce(`&`, lapply(split(group, names(group)),
                      \(x) rowSums(dt[, .SD, .SDcols = x] > 2) >= 2)), ]
   sample1 sample2 sample3 sample4 sample5 sample6
1:      10       9      10       3       7       5
2:       6      10       8       5       5       1
3:       8       2       9       8       6       6
4:       5       4       5      10      10       8
5:       7       1       7       4       4      10
6:       4       3       1       6       3       7
7:       3       7       2       2       8       9
8:       9       8       4       7       9       4
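If the threshold and the minimum count also need to vary, the same idea can be wrapped in a small function. This is only a sketch: filter_by_group, threshold and min_n are hypothetical names that do not appear in the answer, and the data are typed in from the table printed in the question rather than regenerated with sample().

```r
library(data.table)

# Hypothetical helper: keep rows where, in every group, at least `min_n`
# columns exceed `threshold`. `groups` is a named character vector of
# column names whose names are the group labels, as in the question.
filter_by_group <- function(dt, groups, threshold = 2, min_n = 2) {
  cols_by_group <- split(unname(groups), names(groups))
  keep <- Reduce(`&`, lapply(cols_by_group, function(cols) {
    rowSums(dt[, cols, with = FALSE] > threshold) >= min_n
  }))
  dt[keep]
}

# Values copied from the table printed in the question.
dt <- data.table(
  sample1 = c(2, 10, 6, 8, 5, 7, 4, 1, 3, 9),
  sample2 = c(6, 9, 10, 2, 4, 1, 3, 5, 7, 8),
  sample3 = c(3, 10, 8, 9, 5, 7, 1, 6, 2, 4),
  sample4 = c(9, 3, 5, 8, 10, 4, 6, 1, 2, 7),
  sample5 = c(1, 7, 5, 6, 10, 4, 3, 2, 8, 9),
  sample6 = c(2, 5, 1, 6, 8, 10, 7, 3, 9, 4)
)
group <- setNames(paste0("sample", 1:6),
                  rep(c("group_a", "group_b"), each = 3))

res <- filter_by_group(dt, group)   # defaults: more than 2, at least 2 per group
nrow(res)
```

With threshold = 1 the same call keeps all 10 rows, matching the first result above.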
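@r2evans suggested an alternative that can perform better when there are many groups. It flags the rows that fail in any group and keeps the rows with no flags: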
dt[rowSums(sapply(split(group, names(group)), \(x) rowSums(dt[, .SD, .SDcols = x] <= 2) >= 2 )) == 0, ]
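One caveat: the count-the-failures test above matches the original condition only when every group has exactly 3 samples, because at least 2 values at or below 2 is then the same as fewer than 2 values above 2. A sketch of a variant that should hold for groups of any size, assuming the same dt and group as in the question (fails and cols_by_group are illustrative names; vapply() is used here instead of sapply() so that the result stays a matrix even with a single group):

```r
library(data.table)

# Values copied from the table printed in the question.
dt <- data.table(
  sample1 = c(2, 10, 6, 8, 5, 7, 4, 1, 3, 9),
  sample2 = c(6, 9, 10, 2, 4, 1, 3, 5, 7, 8),
  sample3 = c(3, 10, 8, 9, 5, 7, 1, 6, 2, 4),
  sample4 = c(9, 3, 5, 8, 10, 4, 6, 1, 2, 7),
  sample5 = c(1, 7, 5, 6, 10, 4, 3, 2, 8, 9),
  sample6 = c(2, 5, 1, 6, 8, 10, 7, 3, 9, 4)
)
group <- setNames(paste0("sample", 1:6),
                  rep(c("group_a", "group_b"), each = 3))

cols_by_group <- split(unname(group), names(group))

# One logical column per group: TRUE where the row fails that group,
# i.e. fewer than 2 of its samples exceed 2. Counting passes instead of
# failures keeps the test independent of the group size.
fails <- vapply(cols_by_group,
                function(cols) rowSums(dt[, cols, with = FALSE] > 2) < 2,
                logical(nrow(dt)))

# Keep the rows that fail in no group.
res <- dt[rowSums(fails) == 0]
nrow(res)
```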
Regarding "r - How to filter a data.table based on an uncertain number of conditions?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/76967470/