I want to transform the DataFrame below so that rows with duplicated data are collapsed into a single row. For example:
import pandas as pd

data_dict = {'FromTo_U': {0: 'L->R', 1: 'L->R', 2: 'S->I'},
             'GeneName': {0: 'EGFR', 1: 'EGFR', 2: 'EGFR'},
             'MutationAA_C': {0: 'p.L858R', 1: 'p.L858R', 2: 'p.S768I'},
             'MutationDescription': {0: 'Substitution - Missense',
                                     1: 'Substitution - Missense',
                                     2: 'Substitution - Missense'},
             'PubMed': {0: '22523351', 1: '23915069', 2: '26862733'},
             'VariantID': {0: 'COSM12979', 1: 'COSM12979', 2: 'COSM18486'},
             'VariantPos_U': {0: '858', 1: '858', 2: '768'},
             'VariantSource': {0: 'COSMIC', 1: 'COSMIC', 2: 'COSMIC'}}
df1 = pd.DataFrame(data_dict)
The transformed DataFrame should be:
data_dict_t={'FromTo_U': {0: 'L->R', 2: 'S->I'},
'GeneName': {0: 'EGFR', 2: 'EGFR'},
'MutationAA_C': {0: 'p.L858R', 2: 'p.S768I'},
'MutationDescription': {0: 'Substitution - Missense',2: 'Substitution - Missense'},
'PubMed': {0: '22523351,23915069', 2: '26862733'},
'VariantID': {0: 'COSM12979', 2: 'COSM18486'},
'VariantPos_U': {0: '858', 2: '768'},
'VariantSource': {0: 'COSMIC', 2: 'COSMIC'}}
I only want to merge two rows of df1 when their PubMed IDs differ and all the other columns hold identical data. Thanks in advance!
Best answer
Use groupby + agg, with str.join as the aggregation function.
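# Every column except PubMed becomes a grouping key.
c = df1.columns.difference(['PubMed']).tolist()
# Within each group of otherwise identical rows, join the PubMed IDs with commas.
df1.groupby(c, as_index=False).PubMed.agg(','.join)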
  FromTo_U GeneName MutationAA_C      MutationDescription  VariantID  \
0     L->R     EGFR      p.L858R  Substitution - Missense  COSM12979
1     S->I     EGFR      p.S768I  Substitution - Missense  COSM18486

  VariantPos_U VariantSource             PubMed
0          858        COSMIC  22523351,23915069
1          768        COSMIC           26862733
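As a self-contained variation on the same groupby + agg idea, the sketch below wraps it in a helper. The function name collapse_pubmed, its parameters, and the small demo frame are hypothetical and only for illustration. It differs from the answer above in three ways: it drops repeated PubMed IDs within a group before joining, it passes sort=False so groups keep their order of first appearance, and it restores the original column order. Note that groupby drops rows whose key columns contain NaN by default, so this assumes the key columns have no missing values.

```python
import pandas as pd


def collapse_pubmed(df, id_col="PubMed", sep=","):
    """Collapse rows that match on every column except `id_col`.

    Hypothetical helper: the distinct `id_col` values of each group are
    joined into one string separated by `sep`.
    """
    # All columns other than the ID column act as grouping keys.
    key_cols = df.columns.difference([id_col]).tolist()
    merged = (
        df.groupby(key_cols, as_index=False, sort=False)[id_col]
          # unique() keeps first-seen order, so repeated IDs appear once.
          .agg(lambda s: sep.join(s.unique()))
    )
    # Index.difference sorts the key names; put columns back as they were.
    return merged[df.columns.tolist()]


# Small demo frame (made up for illustration, mirroring the question's shape).
demo = pd.DataFrame({
    "GeneName": ["EGFR", "EGFR", "EGFR"],
    "VariantID": ["COSM12979", "COSM12979", "COSM18486"],
    "PubMed": ["22523351", "23915069", "26862733"],
})
result = collapse_pubmed(demo)
print(result)
```

If the PubMed column is not already of string type, it would need converting first (for example with astype(str)), since str.join only accepts strings.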
Regarding "python - transforming a DataFrame with duplicate data in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48483330/