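I have a very long data frame with nearly 56 columns. One of those columns (transcript_name) takes many different values, while the rest of the data is the same for every row that shares the ID in the first column. Here is an example: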
ID chrom left right ref_seq var_type zygosity transcript_name
0 chr1 1590327 1590328 a SNP Hom NM_033486
0 chr1 1590327 1590328 a SNP Hom NM_033487
0 chr1 1590327 1590328 a SNP Hom NM_033488
0 chr1 1590327 1590328 a SNP Hom NM_033489
0 chr1 1590327 1590328 a SNP Hom NM_033492
0 chr1 1590327 1590328 a SNP Hom NM_033493
1 chr1 1590526 1590527 g SNP Hom NM_033486
1 chr1 1590526 1590527 g SNP Hom NM_033487
1 chr1 1590526 1590527 g SNP Hom NM_033488
1 chr1 1590526 1590527 g SNP Hom NM_033489
1 chr1 1590526 1590527 g SNP Hom NM_033492
The desired result concatenates all the repeated values into a single comma-separated string while keeping each ID only once, like this:
ID chrom left right ref_seq var_type zygosity transcript_name
0 chr1 1590327 1590328 a SNP Hom NM_033486,NM_033487,NM_033488,NM_033489,NM_033492,NM_033493
1 chr1 1590526 1590527 g SNP Hom NM_033486,NM_033487,NM_033488,NM_033489,NM_033492
I have searched for similar questions, but the solutions I found have not worked so far. Instead, they returned a data frame with zero rows.
Best answer
One approach using data.table:
library(data.table)
# setDT() converts the data.frame into a data.table by reference
# 'by' defines the groups; within each group, paste() collapses transcript_name into one string
setDT(df)[, list(transcript_name = paste(transcript_name, collapse = ', ')),
by = c('ID', 'chrom', 'left', 'right', 'ref_seq', 'var_type', 'zygosity')]
# ID chrom left right ref_seq var_type zygosity transcript_name
#1: 0 chr1 1590327 1590328 a SNP Hom NM_033486, NM_033487, NM_033488, NM_033489, NM_033492, NM_033493
#2: 1 chr1 1590526 1590527 g SNP Hom NM_033486, NM_033487, NM_033488, NM_033489, NM_033492
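Since the real data has close to 56 columns, typing every grouping column by hand gets tedious. The following is a minimal base R sketch of the same idea, not part of the original answer: it derives the grouping columns with setdiff() and collapses transcript_name with aggregate(). The names group_cols and res are made up for illustration, and the sketch assumes that every column other than transcript_name is constant within an ID and contains no NA values (aggregate() omits rows with missing values in the grouping columns).

```r
# Same example data as in the question
df <- read.table(header = TRUE, text = 'ID chrom left right ref_seq var_type zygosity transcript_name
0 chr1 1590327 1590328 a SNP Hom NM_033486
0 chr1 1590327 1590328 a SNP Hom NM_033487
0 chr1 1590327 1590328 a SNP Hom NM_033488
0 chr1 1590327 1590328 a SNP Hom NM_033489
0 chr1 1590327 1590328 a SNP Hom NM_033492
0 chr1 1590327 1590328 a SNP Hom NM_033493
1 chr1 1590526 1590527 g SNP Hom NM_033486
1 chr1 1590526 1590527 g SNP Hom NM_033487
1 chr1 1590526 1590527 g SNP Hom NM_033488
1 chr1 1590526 1590527 g SNP Hom NM_033489
1 chr1 1590526 1590527 g SNP Hom NM_033492')

# Group by every column except the one being collapsed (hypothetical helper name)
group_cols <- setdiff(names(df), "transcript_name")

# collapse = "," is passed through to paste(), giving one string per group
res <- aggregate(df["transcript_name"], by = df[group_cols],
                 FUN = paste, collapse = ",")
print(res)
```

Using collapse = "," gives the separator shown in the desired output; switch to ", " if a space after each comma is preferred.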
Data
df <- read.table(header = TRUE, text = 'ID chrom left right ref_seq var_type zygosity transcript_name
0 chr1 1590327 1590328 a SNP Hom NM_033486
0 chr1 1590327 1590328 a SNP Hom NM_033487
0 chr1 1590327 1590328 a SNP Hom NM_033488
0 chr1 1590327 1590328 a SNP Hom NM_033489
0 chr1 1590327 1590328 a SNP Hom NM_033492
0 chr1 1590327 1590328 a SNP Hom NM_033493
1 chr1 1590526 1590527 g SNP Hom NM_033486
1 chr1 1590526 1590527 g SNP Hom NM_033487
1 chr1 1590526 1590527 g SNP Hom NM_033488
1 chr1 1590526 1590527 g SNP Hom NM_033489
1 chr1 1590526 1590527 g SNP Hom NM_033492')
For more on r - concatenating repeated data frame values in R, see the similar question on Stack Overflow: https://stackoverflow.com/questions/38268378/