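I have a large dataset and want to add a new column of binary values (0 and 1) based on the following conditions.
If a row has df1$seg.mean >= 0.5 and df1$id == "gain", or df1$seg.mean <= -0.5 and df1$id == "loss", insert 1 in df1$Occurance.
For rows that meet neither condition, set df1$Occurance to 0.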
df1 <-
Chr start end num.mark seg.mean id
1 68580000 68640000 8430 0.7 gain
1 115900000 116260000 8430 0.0039 loss
1 173500000 173680000 5 -1.7738 loss
1 173500000 173680000 12 0.011 loss
1 173840000 174010000 6 -1.6121 loss
Desired output
Chr start end num.mark seg.mean id Occurance
1 68580000 68640000 8430 0.7 gain 1
1 115900000 116260000 8430 0.0039 loss 0
1 173500000 173680000 5 -1.7738 loss 1
1 173500000 173680000 12 0.011 loss 0
1 173840000 174010000 6 -1.6121 loss 1
Best answer
Try using ifelse:
df1$Occurance <- ifelse((df1$seg.mean >= 0.5 & df1$id == "gain") |
(df1$seg.mean <= -0.5 & df1$id == "loss"), 1, 0)
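For a reproducible check, the sample data from the question can be rebuilt and the ifelse line applied to it. This is only a minimal sketch: loading the data with read.table from inline text is an assumption, since the question does not say how df1 was created.

```r
# Rebuild the sample data from the question
# (loading it via read.table(text = ...) is an assumption for illustration)
df1 <- read.table(text = "
Chr start end num.mark seg.mean id
1 68580000 68640000 8430 0.7 gain
1 115900000 116260000 8430 0.0039 loss
1 173500000 173680000 5 -1.7738 loss
1 173500000 173680000 12 0.011 loss
1 173840000 174010000 6 -1.6121 loss
", header = TRUE, stringsAsFactors = FALSE)

# 1 when a gain is at or above 0.5 or a loss is at or below -0.5, otherwise 0
df1$Occurance <- ifelse((df1$seg.mean >= 0.5 & df1$id == "gain") |
                        (df1$seg.mean <= -0.5 & df1$id == "loss"), 1, 0)

print(df1$Occurance)
```

The printed vector can be compared row by row with the Occurance column in the desired output.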
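Edit: you can avoid ifelse and use transform instead, so you don't have to keep writing df1$. Note that transform returns a modified copy, so assign the result back to df1 if you want to keep the new column.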
transform(df1, Occurance = as.numeric((seg.mean >= 0.5 & id == "gain") |
(seg.mean <= -0.5 & id == "loss")))
Comment: if TRUE/FALSE is acceptable instead of 1/0, you can skip as.numeric.
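To illustrate that comment, the condition can be stored as a logical column and converted later if needed. This is a minimal sketch on a small hypothetical data frame `demo` that reuses three rows of the question's data; the column name `Hit` is made up for illustration.

```r
# Hypothetical data frame with the two columns the condition depends on
demo <- data.frame(seg.mean = c(0.7, 0.0039, -1.7738),
                   id = c("gain", "loss", "loss"),
                   stringsAsFactors = FALSE)

# Store the condition itself as TRUE/FALSE (no as.numeric)
demo <- transform(demo, Hit = (seg.mean >= 0.5 & id == "gain") |
                              (seg.mean <= -0.5 & id == "loss"))

# Logical values count as 1/0 in arithmetic, so sum() gives the number of matching rows
n_hits <- sum(demo$Hit)

# Convert later if a 1/0 column turns out to be needed
demo$Occurance <- as.integer(demo$Hit)
```

Keeping the column logical also lets it be used directly for subsetting, for example demo[demo$Hit, ].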
Edit #2: if you want more than two outcomes, for example -1, 0, 1, you can do the following:
df1$Occurance = 0
within(df1, {Occurance[seg.mean >= 0.5 & id == "gain"] <- 1;
Occurance[seg.mean <= -0.5 & id == "loss"] <- -1})
Result
Chr start end num.mark seg.mean id Occurance
1 1 68580000 68640000 8430 0.7000 gain 1
2 1 115900000 116260000 8430 0.0039 loss 0
3 1 173500000 173680000 5 -1.7738 loss -1
4 1 173500000 173680000 12 0.0110 loss 0
5 1 173840000 174010000 6 -1.6121 loss -1
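A possible alternative for the three-valued case, which is not part of the original answer: because logical values behave as 1/0 in arithmetic, subtracting the loss condition from the gain condition yields -1, 0, or 1 in one step. A sketch on a hypothetical data frame `demo` holding the question's seg.mean and id values; `is_gain` and `is_loss` are names introduced here for illustration.

```r
# Hypothetical data frame with the seg.mean and id values from the question
demo <- data.frame(seg.mean = c(0.7, 0.0039, -1.7738, 0.011, -1.6121),
                   id = c("gain", "loss", "loss", "loss", "loss"),
                   stringsAsFactors = FALSE)

is_gain <- demo$seg.mean >= 0.5 & demo$id == "gain"
is_loss <- demo$seg.mean <= -0.5 & demo$id == "loss"

# TRUE - FALSE is 1, FALSE - TRUE is -1, FALSE - FALSE is 0
demo$Occurance <- is_gain - is_loss

print(demo$Occurance)
```

This relies on the two conditions being mutually exclusive, which holds here because a row's id cannot be both "gain" and "loss".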
Source: a similar question on Stack Overflow, "r - How to insert a new column with values into a dataset if a statement is satisfied": https://stackoverflow.com/questions/29674930/