I have two CSV files:

1.csv

id,noteId,text
id2,idNote19,This is my old text 2
id5,idNote13,This is my old text 5
id1,idNote12,This is my old text 1
id3,idNote10,This is my old text 3
id4,idNote11,This is my old text 4

2.csv

id,noteId,text,other
id3,idNote10,new text 3,On1
id2,idNote19,My new text 2,Pre8
id5,,My new text 2,Hl0
id22,idNote22,My new text 22,M1

I load them like this:

>>> import pandas as pd
>>> df1 = pd.read_csv('1.csv', encoding='utf-8').set_index('id')
>>> df2 = pd.read_csv('2.csv', encoding='utf-8').set_index('id')
>>>
>>> print(df1)
       noteId                   text
id
id2  idNote19  This is my old text 2
id5  idNote13  This is my old text 5
id1  idNote12  This is my old text 1
id3  idNote10  This is my old text 3
id4  idNote11  This is my old text 4
>>> print(df2)
        noteId            text other
id
id3   idNote10      new text 3   On1
id2   idNote19   My new text 2  Pre8
id5        NaN   My new text 2   Hl0
id22  idNote22  My new text 22    M1

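I need to merge the two DataFrames like this (df2's values overwrite df1's, even where the df2 value is empty, and the extra columns and rows that do not exist in df1 are added):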

        noteId                   text other
id
id2   idNote19          My new text 2  Pre8
id5        NaN          My new text 2   Hl0
id1   idNote12  This is my old text 1   NaN
id3   idNote10             new text 3   On1
id4   idNote11  This is my old text 4   NaN
id22  idNote22         My new text 22    M1

My real DataFrames have other columns that should be merged as well, not just `text`.

I tried using merge to get something similar:

>>> df1 = pd.read_csv('1.csv', encoding='utf-8')
>>> df2 = pd.read_csv('2.csv', encoding='utf-8')
>>>
>>> print(df1)
    id    noteId                   text
0  id2  idNote19  This is my old text 2
1  id5  idNote13  This is my old text 5
2  id1  idNote12  This is my old text 1
3  id3  idNote10  This is my old text 3
4  id4  idNote11  This is my old text 4
>>> print(df2)
    id    noteId           text
0  id3  idNote10     new text 3
1  id2  idNote19  My new text 2
>>>
>>> print(pd.merge(df1, df2, how='left', on=['id']))
    id  noteId_x                 text_x  noteId_y         text_y
0  id2  idNote19  This is my old text 2  idNote19  My new text 2
1  id5  idNote13  This is my old text 5       NaN            NaN
2  id1  idNote12  This is my old text 1       NaN            NaN
3  id3  idNote10  This is my old text 3  idNote10     new text 3
4  id4  idNote11  This is my old text 4       NaN            NaN
>>>

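But this is not what I need. I don't know whether I am on the right track, whether I should combine the suffixed columns, or whether there is a better way to do this.

Thanks!

Update: added values on df1 that are empty on df2, an extra column on df2 that should be present in df1 after the "merge", and rows that should be appended to df1.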

Solution

Following the comment from @U2EF1 (thanks!), I found a solution:

df1.fillna(value='None', inplace=True)
df2.fillna(value='None', inplace=True)

pd.concat([df1, df2]).groupby('id').last().fillna(value='None')

In my case it is important to define a default "empty" value, which is the reason for the fillna calls.
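The sentinel matters because `groupby(...).last()` returns the last non-null value in each group, so a NaN coming from df2 would be skipped and df1's old value would survive. Below is a minimal, self-contained sketch of this approach, assuming the sample data from the question (including the id5 and id22 rows shown in the printed df2). The CSV contents are inlined with `io.StringIO`, and the names `csv1`, `csv2`, and `merged` are only illustrative.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs on its own.
csv1 = """id,noteId,text
id2,idNote19,This is my old text 2
id5,idNote13,This is my old text 5
id1,idNote12,This is my old text 1
id3,idNote10,This is my old text 3
id4,idNote11,This is my old text 4
"""
csv2 = """id,noteId,text,other
id3,idNote10,new text 3,On1
id2,idNote19,My new text 2,Pre8
id5,,My new text 2,Hl0
id22,idNote22,My new text 22,M1
"""

# Replace real gaps with a sentinel BEFORE concatenating, so that an empty
# value in df2 counts as a value and can overwrite df1.
df1 = pd.read_csv(io.StringIO(csv1)).fillna('None')
df2 = pd.read_csv(io.StringIO(csv2)).fillna('None')

# df2 comes last, so last() picks its values wherever it has a row for that id.
# The final fillna covers cells created by the concat itself
# (df1 has no 'other' column).
merged = pd.concat([df1, df2]).groupby('id').last().fillna('None')
print(merged)
```

The result is indexed by `id` and sorted by it, so the row order differs from the order in 1.csv. The string 'None' is an arbitrary sentinel; any value that cannot occur in the real data would do.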

Best answer

Usually you can solve this with an appropriate index:

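# Assumes df1 and df2 were loaded without set_index('id').
df1.set_index(['id', 'noteId'], inplace=True)
# update() aligns on the index, so df2 needs the same index as df1.
df2.set_index(['id', 'noteId'], inplace=True)
df1.update(df2)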

(If you don't want that index afterwards, just call df1.reset_index(inplace=True).)
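Here is a self-contained sketch of the index-based approach, using only the id3 and id2 rows of 2.csv. Keep in mind that `DataFrame.update` works in place and only overwrites cells that already exist in df1 with non-NA values from df2. It therefore does not add the `other` column, does not append new rows, and does not copy empty values across. The names `csv1`, `csv2`, and `result` are only illustrative.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs on its own.
csv1 = """id,noteId,text
id2,idNote19,This is my old text 2
id5,idNote13,This is my old text 5
id1,idNote12,This is my old text 1
id3,idNote10,This is my old text 3
id4,idNote11,This is my old text 4
"""
csv2 = """id,noteId,text,other
id3,idNote10,new text 3,On1
id2,idNote19,My new text 2,Pre8
"""

# Both frames get the same index so that update() can align their rows.
df1 = pd.read_csv(io.StringIO(csv1)).set_index(['id', 'noteId'])
df2 = pd.read_csv(io.StringIO(csv2)).set_index(['id', 'noteId'])

# In-place update: matching rows of df1 take the non-NA values from df2.
df1.update(df2)

# Turn the index levels back into ordinary columns.
result = df1.reset_index()
print(result)
```

Because the row order and the column set of df1 are kept, this fits the original form of the question, where df2 only carries replacement text for rows that already exist.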

Regarding "python - Merge two DataFrames with the same columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24746628/
