r - Find overlapping regions and extract the respective values

Tags: r bioinformatics bioconductor

How can I find the overlapping coordinates and extract the seg.mean value of each overlapping region?

data1
Rl    pValue  chr      start        end   CNA
 2  2.594433    6  129740000  129780000  gain
 2  3.941399    6  130080000  130380000  gain
 1  1.992114   10   80900000   81100000  gain
 1  7.175750   16   44780000   44920000  gain

data2

  ID  chrom  loc.start    loc.end  num.mark  seg.mean
8410      6  129750000  129760000      8430    0.0039
8410     10   80907000   81000000         5   -1.7738
8410     16   44790000   44910000        12    0.0110

Desired output

Rl    pValue  chr      start        end   CNA  seg.mean
 2  2.594433    6  129750000  129760000  gain    0.0039
 1  1.992114   10   80907000   81000000  gain   -1.7738
 1  7.175750   16   44790000   44910000  gain    0.0110

Best answer

As @Roland correctly suggested, here is a possible solution using data.table::foverlaps:

library(data.table)
setDT(data1) ; setDT(data2) # Convert data sets to data.table objects
setnames(data2, c("loc.start", "loc.end"), c("start", "end")) # Rename columns so they will match in both sets
setkey(data2, start, end) # key the smaller data so foverlaps will work
foverlaps(data1, data2, nomatch = 0L)[, 1:5 := NULL][] # run foverlaps and remove the unnecessary columns
#    seg.mean Rl   pValue chr   i.start     i.end  CNA
# 1:   0.0039  2 2.594433   6 129740000 129780000 gain
# 2:  -1.7738  1 1.992114  10  80900000  81100000 gain
# 3:   0.0110  1 7.175750  16  44780000  44920000 gain
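
Note that the call above keys data2 on start and end only, so the chromosome takes no part in the match; with larger data, intervals on different chromosomes that happen to share coordinates would be joined. A minimal sketch of a chromosome-aware variant follows. It assumes that renaming chrom to chr and adding chr to the key is acceptable, and it rebuilds the question's tables by hand so that it runs on its own. The name res is illustrative and not part of the original answer.

```r
library(data.table)

# Toy copies of the question's tables (values copied from the question)
data1 <- data.table(
  Rl     = c(2L, 2L, 1L, 1L),
  pValue = c(2.594433, 3.941399, 1.992114, 7.175750),
  chr    = c(6L, 6L, 10L, 16L),
  start  = c(129740000L, 130080000L, 80900000L, 44780000L),
  end    = c(129780000L, 130380000L, 81100000L, 44920000L),
  CNA    = "gain"
)
data2 <- data.table(
  ID        = 8410L,
  chrom     = c(6L, 10L, 16L),
  loc.start = c(129750000L, 80907000L, 44790000L),
  loc.end   = c(129760000L, 81000000L, 44910000L),
  num.mark  = c(8430L, 5L, 12L),
  seg.mean  = c(0.0039, -1.7738, 0.0110)
)

# Assumption: give data2 the same column names as data1, including chr
setnames(data2, c("chrom", "loc.start", "loc.end"), c("chr", "start", "end"))

# Key on chr first; the last two key columns are the interval bounds
setkey(data2, chr, start, end)

# Overlap join restricted to rows that share a chromosome
res <- foverlaps(data1, data2, nomatch = 0L)

# Select columns by name rather than by position; i.start / i.end are data1's bounds
res <- res[, .(Rl, pValue, chr, start = i.start, end = i.end, CNA, seg.mean)]
print(res)
```

Selecting the columns by name avoids depending on the column order of the join result, which `1:5 := NULL` relies on.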

Alternatively:

indx <- foverlaps(data1, data2, nomatch = 0L, which = TRUE) # run foverlaps in order to find indexes using `which`
data1[indx$xid][, seg.mean := data2[indx$yid]$seg.mean][] # update matches
#    Rl   pValue chr     start       end  CNA seg.mean
# 1:  2 2.594433   6 129740000 129780000 gain   0.0039
# 2:  1 1.992114  10  80900000  81100000 gain  -1.7738
# 3:  1 7.175750  16  44780000  44920000 gain   0.0110
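
Both results above report the data1 coordinates, whereas the desired output in the question lists the bounds of the overlap itself (the later start and the earlier end). The following is a rough base R sketch of that intersection, written without data.table so that it has no package dependencies. The names pairs, hits, and out are illustrative, intervals are assumed to be closed, and the per-chromosome merge pairs every data1 row with every data2 row on the same chromosome, so it may not scale to large inputs.

```r
# Toy copies of the question's tables (values copied from the question)
data1 <- data.frame(
  Rl     = c(2, 2, 1, 1),
  pValue = c(2.594433, 3.941399, 1.992114, 7.175750),
  chr    = c(6, 6, 10, 16),
  start  = c(129740000, 130080000, 80900000, 44780000),
  end    = c(129780000, 130380000, 81100000, 44920000),
  CNA    = "gain"
)
data2 <- data.frame(
  ID        = 8410,
  chrom     = c(6, 10, 16),
  loc.start = c(129750000, 80907000, 44790000),
  loc.end   = c(129760000, 81000000, 44910000),
  num.mark  = c(8430, 5, 12),
  seg.mean  = c(0.0039, -1.7738, 0.0110)
)

# Pair every data1 row with every data2 row on the same chromosome
pairs <- merge(data1, data2, by.x = "chr", by.y = "chrom")

# Two closed intervals overlap when each one starts before the other ends
hits <- pairs[pairs$start <= pairs$loc.end & pairs$loc.start <= pairs$end, ]

# The overlap runs from the later start to the earlier end
out <- data.frame(
  Rl       = hits$Rl,
  pValue   = hits$pValue,
  chr      = hits$chr,
  start    = pmax(hits$start, hits$loc.start),
  end      = pmin(hits$end, hits$loc.end),
  CNA      = hits$CNA,
  seg.mean = hits$seg.mean
)
print(out)
```

On the sample data this should yield the three rows of the desired output, with start and end taken from the overlapping part of each pair.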

Regarding "r - Find overlapping regions and extract the respective values", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29648127/
