I have two CSV files:
1.csv
    id,noteId,text
    id2,idNote19,This is my old text 2
    id5,idNote13,This is my old text 5
    id1,idNote12,This is my old text 1
    id3,idNote10,This is my old text 3
    id4,idNote11,This is my old text 4
2.csv
    id,noteId,text,other
    id3,idNote10,new text 3,On1
    id2,idNote19,My new text 2,Pre8
I load them like this:
    >>> import pandas as pd
    >>> df1 = pd.read_csv('1.csv', encoding='utf-8').set_index('id')
    >>> df2 = pd.read_csv('2.csv', encoding='utf-8').set_index('id')
    >>>
    >>> print(df1)
           noteId                   text
    id
    id2  idNote19  This is my old text 2
    id5  idNote13  This is my old text 5
    id1  idNote12  This is my old text 1
    id3  idNote10  This is my old text 3
    id4  idNote11  This is my old text 4
    >>> print(df2)
            noteId            text  other
    id
    id3   idNote10      new text 3    On1
    id2   idNote19   My new text 2   Pre8
    id5        NaN   My new text 2    Hl0
    id22  idNote22  My new text 22     M1
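To make the example reproducible without files on disk, here is a minimal sketch that builds the same two frames from inline CSV text with `io.StringIO`. The rows for the second frame are taken from the printed `df2` above (which includes id5 and id22), not from the shorter 2.csv listing; the names `CSV1` and `CSV2` are illustrative only.

```python
import io

import pandas as pd

# Inline copies of the data. CSV2 follows the printed df2 above, which has
# two more rows (id5 with an empty noteId, and id22) than the 2.csv listing.
CSV1 = """id,noteId,text
id2,idNote19,This is my old text 2
id5,idNote13,This is my old text 5
id1,idNote12,This is my old text 1
id3,idNote10,This is my old text 3
id4,idNote11,This is my old text 4
"""

CSV2 = """id,noteId,text,other
id3,idNote10,new text 3,On1
id2,idNote19,My new text 2,Pre8
id5,,My new text 2,Hl0
id22,idNote22,My new text 22,M1
"""

# read_csv accepts any file-like object, so StringIO stands in for a file;
# the empty noteId field for id5 is parsed as NaN.
df1 = pd.read_csv(io.StringIO(CSV1)).set_index('id')
df2 = pd.read_csv(io.StringIO(CSV2)).set_index('id')

print(df1)
print(df2)
```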
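I need to merge the two DataFrames like this (values in df2 override those in df1, even when the df2 value is empty, and columns and rows that exist only in df2 are added):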
            noteId                   text  other
    id
    id2   idNote19          My new text 2   Pre8
    id5        NaN          My new text 2    Hl0
    id1   idNote12  This is my old text 1    NaN
    id3   idNote10             new text 3    On1
    id4   idNote11  This is my old text 4    NaN
    id22  idNote22         My new text 22     M1
My real DataFrames have other columns that should be merged as well, not only text.
I tried using merge and got something like this:
    >>> df1 = pd.read_csv('1.csv', encoding='utf-8')
    >>> df2 = pd.read_csv('2.csv', encoding='utf-8')
    >>>
    >>> print(df1)
        id    noteId                   text
    0  id2  idNote19  This is my old text 2
    1  id5  idNote13  This is my old text 5
    2  id1  idNote12  This is my old text 1
    3  id3  idNote10  This is my old text 3
    4  id4  idNote11  This is my old text 4
    >>> print(df2)
        id    noteId           text
    0  id3  idNote10     new text 3
    1  id2  idNote19  My new text 2
    >>>
    >>> print(pd.merge(df1, df2, how='left', on=['id']))
        id  noteId_x                 text_x  noteId_y         text_y
    0  id2  idNote19  This is my old text 2  idNote19  My new text 2
    1  id5  idNote13  This is my old text 5       NaN            NaN
    2  id1  idNote12  This is my old text 1       NaN            NaN
    3  id3  idNote10  This is my old text 3  idNote10     new text 3
    4  id4  idNote11  This is my old text 4       NaN            NaN
    >>>
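If you do want to stay with merge, the suffixed columns can be combined afterwards. A minimal sketch, assuming df2's values should win for every id that exists in df2 (even when the df2 value is NaN): an outer merge with `indicator=True` records which frame each row came from, and `Series.where` picks the new or the old value per row. The `'_old'` suffix and the helper name `merge_prefer_right` are hypothetical choices, not part of the question; the data follows the printed frames above.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'id': ['id2', 'id5', 'id1', 'id3', 'id4'],
    'noteId': ['idNote19', 'idNote13', 'idNote12', 'idNote10', 'idNote11'],
    'text': [f'This is my old text {i}' for i in (2, 5, 1, 3, 4)],
})
df2 = pd.DataFrame({
    'id': ['id3', 'id2', 'id5', 'id22'],
    'noteId': ['idNote10', 'idNote19', np.nan, 'idNote22'],
    'text': ['new text 3', 'My new text 2', 'My new text 2', 'My new text 22'],
    'other': ['On1', 'Pre8', 'Hl0', 'M1'],
})


def merge_prefer_right(left, right, key='id'):
    """Hypothetical helper: outer-merge on `key`; right's values win for
    every key present in right, including NaN values."""
    # Overlapping columns from `left` get '_old'; `right` keeps plain names.
    merged = pd.merge(left, right, how='outer', on=key,
                      suffixes=('_old', ''), indicator=True)
    # True for rows that came from `right` (alone or matched with `left`).
    in_right = merged['_merge'] != 'left_only'
    shared = [c for c in left.columns if c in right.columns and c != key]
    for col in shared:
        # Keep the right-hand value where the row exists in `right`,
        # otherwise fall back to the old left-hand value.
        merged[col] = merged[col].where(in_right, merged[col + '_old'])
    dropped = [c + '_old' for c in shared] + ['_merge']
    return merged.drop(columns=dropped).set_index(key)


result = merge_prefer_right(df1, df2)
print(result)
```

Columns that exist only in df2 (here `other`) come through the outer merge untouched and are NaN for ids found only in df1, which matches the desired output.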
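But this is not what I need. I don't know whether I'm on the right track and should combine the suffixed columns, or whether there is a better way to do this.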
Thanks!
Update: added values in df1 that are empty in df2, an extra column in df2 that should appear in df1 after the "merge", and rows in df2 that should be appended to df1.
--
Solution
Based on @U2EF1's comment (thanks!), I found a solution:
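    # Use 'None' as the explicit "empty" value in both frames
    df1.fillna(value='None', inplace=True)
    df2.fillna(value='None', inplace=True)
    # Stack df2 after df1 so last() picks df2's value for ids found in both
    pd.concat([df1, df2]).groupby('id').last().fillna(value='None')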
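In my case it was important to define a default "empty" value, hence the fillna calls: groupby().last() returns the last non-null value in each column, so without a placeholder an empty value in df2 would be skipped and the old value from df1 would be kept.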
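A minimal sketch that checks the fillna reasoning on the id5 row, where df2 holds NaN for noteId. It also shows an alternative that is not from the comment: keeping only the last row per id with `Index.duplicated(keep='last')`, which lets df2's NaN through without a placeholder string. The frames are built inline and indexed by id, as in the question; the names `stacked`, `naive` and `latest` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {'noteId': ['idNote19', 'idNote13', 'idNote12', 'idNote10', 'idNote11'],
     'text': [f'This is my old text {i}' for i in (2, 5, 1, 3, 4)]},
    index=pd.Index(['id2', 'id5', 'id1', 'id3', 'id4'], name='id'))
df2 = pd.DataFrame(
    {'noteId': ['idNote10', 'idNote19', np.nan, 'idNote22'],
     'text': ['new text 3', 'My new text 2', 'My new text 2', 'My new text 22'],
     'other': ['On1', 'Pre8', 'Hl0', 'M1']},
    index=pd.Index(['id3', 'id2', 'id5', 'id22'], name='id'))

# df2's rows come after df1's, so "last" means "from df2" for shared ids.
stacked = pd.concat([df1, df2], sort=False)

# last() skips NaN, so id5 falls back to df1's noteId ('idNote13').
naive = stacked.groupby(level='id').last()

# Keep only the final occurrence of each id: df2's row wins as a whole,
# NaN included, and ids that appear only once are kept unchanged.
latest = stacked[~stacked.index.duplicated(keep='last')]

print(naive.loc['id5', 'noteId'])
print(latest.loc['id5', 'noteId'])
```

The trade-off: `duplicated` replaces whole rows, so any column that exists only in df1 comes back as NaN for ids that also appear in df2, whereas `groupby().last()` combines values column by column.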
Best answer
Normally you would solve this by using the right index:
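    df1.set_index(['id', 'noteId'], inplace=True)
    # update() aligns on the index, so df2 needs the same index as df1
    df2.set_index(['id', 'noteId'], inplace=True)
    df1.update(df2)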
(If you don't want that index afterwards, just call df1.reset_index(inplace=True).)
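For reference, `DataFrame.update` modifies the caller in place, aligns on index and columns, overwrites only with non-NA values from the other frame, and does not add rows or columns. So on its own it covers the "override existing values" part of the question but not the extra id22 row or the `other` column. A minimal sketch illustrating this, assuming both frames are indexed by id alone (the answer uses `['id', 'noteId']`; with that index, `noteId` is an index level rather than a column, so `update` cannot change it).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {'noteId': ['idNote19', 'idNote13', 'idNote12', 'idNote10', 'idNote11'],
     'text': [f'This is my old text {i}' for i in (2, 5, 1, 3, 4)]},
    index=pd.Index(['id2', 'id5', 'id1', 'id3', 'id4'], name='id'))
df2 = pd.DataFrame(
    {'noteId': ['idNote10', 'idNote19', np.nan, 'idNote22'],
     'text': ['new text 3', 'My new text 2', 'My new text 2', 'My new text 22'],
     'other': ['On1', 'Pre8', 'Hl0', 'M1']},
    index=pd.Index(['id3', 'id2', 'id5', 'id22'], name='id'))

# Modifies df1 in place and returns None. NaN in df2 (id5's noteId) does
# not overwrite; id22 and the 'other' column are ignored because update
# only touches labels that already exist in df1.
df1.update(df2)
print(df1)
```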
Regarding python - Merge two DataFrames with the same columns, a similar question on Stack Overflow: https://stackoverflow.com/questions/24746628/