I have 2 data frames:
at1 = data.frame(ID = c("A", "B", "C", "D", "E"), Sample1 = rnorm(5, 50000, 2500),
Sample2 = rnorm(5, 50000, 2500), Sample3 = rnorm(5, 50000, 2500),
row.names = "ID")
Sample1 Sample2 Sample3
A 52626.55 51924.51 50919.90
B 51430.51 49100.38 51005.92
C 50038.27 52254.73 50014.78
D 48644.46 53926.53 51590.05
E 46462.01 45097.48 50963.39
bt1 = data.frame(ID = c("A", "B", "C", "D", "E"), Sample1 = c(0,1,1,1,1),
Sample2 = c(0,0,0,1,0), Sample3 = c(1,0,1,1,0),
row.names = "ID")
Sample1 Sample2 Sample3
A 0 0 1
B 1 0 0
C 1 0 1
D 1 1 1
E 1 0 0
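I want to filter each cell in at1 according to the value (0 or 1) of the corresponding cell in bt1, and store the result in a new data frame ct1. For example, if bt1[1, "Sample1"] = 1, then ct1[1, "Sample1"] = at1[1, "Sample1"]. If bt1[1, "Sample1"] = 0, then ct1[1, "Sample1"] = 0. My original data frames have more than 100 columns and more than 30,000 rows.
I am wondering whether there is an easier way than writing if statements inside loops (for example, using "apply"?).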
Best answer
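Here is a data.table solution, followed by a second, simpler solution.
Note that I have set ID as a regular column in the data.frame rather than as the row.names, for ideological and practical reasons (data.table does not have row names).
library(data.table)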
library(reshape2)
bt1 <- data.frame(ID = c("A", "B", "C", "D", "E"), Sample1 = c(0,1,1,1,1),
Sample2 = c(0,0,0,1,0), Sample3 = c(1,0,1,1,0))
at1 <- data.frame(ID = c("A", "B", "C", "D", "E"), Sample1 = rnorm(5, 50000, 2500),
Sample2 = rnorm(5, 50000, 2500), Sample3 = rnorm(5, 50000, 2500))
# place in long form
at_long <- data.table(melt(at1, id.var = 1))
bt_long <- data.table(melt(bt1, value.name = 'bt_value', id.var = 1))
# set keys for easy merging with data.table
setkeyv(at_long, c('ID','variable'))
setkeyv(bt_long, c('ID','variable'))
# merge
combined <- at_long[bt_long]
# set those where 'bt_value == 0' as 0
set(combined, which(combined[['bt_value']] == 0), 'value', 0)
# or, equivalently (using the fact that the `bt` data is only 0 or 1)
combined[, value := value * bt_value]
# then reshape to wide format
dcast(combined, ID~variable, value.var = 'value')
## ID Sample1 Sample2 Sample3
## 1 A 0.00 0.00 50115.24
## 2 B 50173.16 0.00 0.00
## 3 C 48216.31 0.00 51952.30
## 4 D 52387.53 50889.95 44043.66
## 5 E 50982.56 0.00 0.00
The second, simple approach
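If you know that the row order is the same in bt1 and at1 (your data sets), you can simply multiply the appropriate components of the data.frames (* works element-wise):
sample_cols <- paste0('Sample', 1:3)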
at1[,sample_cols] * bt1[,sample_cols]
## Sample1 Sample2 Sample3
## 1 0.00 0.00 50115.24
## 2 50173.16 0.00 0.00
## 3 48216.31 0.00 51952.30
## 4 52387.53 50889.95 44043.66
## 5 50982.56 0.00 0.00
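If the row order is not guaranteed to match, one possible guard is to align the rows by ID before multiplying. The following is only a minimal sketch, not part of the original answer: the shuffled copy bt_shuffled, the index idx, and the fixed seed are assumptions introduced for illustration.

```r
# Toy data in the same shape as at1/bt1 above (ID kept as a column).
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
at1 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                  Sample1 = rnorm(5, 50000, 2500),
                  Sample2 = rnorm(5, 50000, 2500),
                  Sample3 = rnorm(5, 50000, 2500))
bt1 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                  Sample1 = c(0, 1, 1, 1, 1),
                  Sample2 = c(0, 0, 0, 1, 0),
                  Sample3 = c(1, 0, 1, 1, 0))

# Hypothetical situation: bt1 arrives with its rows in a different order.
bt_shuffled <- bt1[c(3, 1, 5, 2, 4), ]

sample_cols <- paste0("Sample", 1:3)

# For each ID in at1, find the row of bt_shuffled that holds the same ID.
idx <- match(at1$ID, bt_shuffled$ID)
stopifnot(!anyNA(idx))  # every ID in at1 must be present in bt_shuffled

# Element-wise product on the aligned rows.
ct1 <- at1[, sample_cols] * bt_shuffled[idx, sample_cols]
```

Here match() does the same job as the keyed merge in the data.table solution, but without reshaping to long form.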
You could then cbind this to the ID column from either at1 or bt1; alternatively, if you keep ID as the row.names, the row.names will be retained.
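Both options can be sketched roughly as follows. This is an illustration under assumptions rather than code from the original answer: the names ct1, at_rn, bt_rn and ct_rn are made up here, and the seed is fixed only for reproducibility.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
at1 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                  Sample1 = rnorm(5, 50000, 2500),
                  Sample2 = rnorm(5, 50000, 2500),
                  Sample3 = rnorm(5, 50000, 2500))
bt1 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                  Sample1 = c(0, 1, 1, 1, 1),
                  Sample2 = c(0, 0, 0, 1, 0),
                  Sample3 = c(1, 0, 1, 1, 0))
sample_cols <- paste0("Sample", 1:3)

# Option 1: ID is a regular column, so bind it back onto the product.
ct1 <- cbind(ID = at1$ID, at1[, sample_cols] * bt1[, sample_cols])

# Option 2: ID is stored as row names (as in the question), so the whole
# data frames can be multiplied and the row names carry over to the result.
at_rn <- at1[, sample_cols]
rownames(at_rn) <- at1$ID
bt_rn <- bt1[, sample_cols]
rownames(bt_rn) <- bt1$ID
ct_rn <- at_rn * bt_rn
```

Either way, the result has one row per ID and a 0 wherever bt1 holds a 0.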
Regarding "r - filtering a data frame based on values in a second data frame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11996856/