python - How do I compare two dataframes with different indexes and print out the duplicate rows?

I am trying to compare two dataframes by their respective UniqueID columns. The code for the dataframes is shown below.

import numpy as np
import pandas as pd

# Define first dataframe
list1 = {'UniqueID': [13579, 24680, 54678, 1169780, 1195847, 23572],
        'Name': ['Joe', 'Pete', 'Jessica', 'Jackson', 'Griffin', 'Katie'],
        'Level': ['Beginner', 'Beginner', 'Intermediate', 'Advanced', 'Intermediate', 'Advanced']}
df1 = pd.DataFrame(list1, columns=['UniqueID','Name','Level'])

# Define second dataframe
list2 = {'UniqueID': (88922,13579, 24680, 54678, 1169780, 1195847, 23572, 54895, 478952, 45921),
        'Name': ('Zain','Joe', 'Pete', 'Jessica','Griffin','Jackson','Katie', 'Gaby', 'Haley', 'Caden'),
        'Level': ('Beginner', 'Intermediate', 'Intermediate', 'Advanced', 'Intermediate','Advanced','Advanced',
                  'Beginner', 'Intermediate', 'Novice')}
df2 = pd.DataFrame(list2, columns=['UniqueID','Name','Level'])

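As you can see above, the two dataframes have indexes of different lengths. That is what leads to my next problem. My process for finding duplicates is as follows.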

# Define new column which displays Match iff the UniqueID of the first dataframe is equal to that of the second
df1['UniqueMatch'] = np.where(df1.UniqueID == df2.UniqueID, 'Match','Ignore') #Create

# Simplify the list to only display rows that are duplicates
df_match = df1[df1['UniqueMatch'] =='Match']

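Whenever I try to find where the UniqueID values of the two dataframes are equal, I get an error: "ValueError: Can only compare identically-labeled Series objects". As I understand it, this means the approach I am using only works when the two dataframes have identical indexes. I assume there must be a way around this; if not, how would you ever compare dataframes of different sizes?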

Best answer

Update based on your comment:

After I find the duplicated, I would then like to iterate through each cell of level, and update df1 from the updated level listed in df2. For example, Joe goes from beginner to intermediate from df1 to df2. I would like to auto update those instances.

Concatenate the 2 dataframes and drop the duplicates, keeping the last value (the one from df2):

df3 = pd.concat([df1, df2], ignore_index=True) \
        .drop_duplicates(['UniqueID', 'Name'], keep='last')

>>> df3
    UniqueID     Name         Level
3    1169780  Jackson      Advanced
4    1195847  Griffin  Intermediate
6      88922     Zain      Beginner
7      13579      Joe  Intermediate  # Joe is now Intermediate
8      24680     Pete  Intermediate  # Pete is now Intermediate
9      54678  Jessica      Advanced  # Jessica is now Advanced
10   1169780  Griffin  Intermediate
11   1195847  Jackson      Advanced
12     23572    Katie      Advanced
13     54895     Gaby      Beginner
14    478952    Haley  Intermediate
15     45921    Caden        Novice
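
If the goal is to keep df1's original rows and only refresh their Level, rather than end up with every row of df2 as well, a left merge on the same two key columns is one possible approach. The following is a minimal sketch, not part of the original answer; the `_new` suffix and the names `merged` and `updated` are assumptions made for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'UniqueID': [13579, 24680, 54678, 1169780, 1195847, 23572],
    'Name': ['Joe', 'Pete', 'Jessica', 'Jackson', 'Griffin', 'Katie'],
    'Level': ['Beginner', 'Beginner', 'Intermediate', 'Advanced', 'Intermediate', 'Advanced'],
})
df2 = pd.DataFrame({
    'UniqueID': [88922, 13579, 24680, 54678, 1169780, 1195847, 23572, 54895, 478952, 45921],
    'Name': ['Zain', 'Joe', 'Pete', 'Jessica', 'Griffin', 'Jackson', 'Katie', 'Gaby', 'Haley', 'Caden'],
    'Level': ['Beginner', 'Intermediate', 'Intermediate', 'Advanced', 'Intermediate',
              'Advanced', 'Advanced', 'Beginner', 'Intermediate', 'Novice'],
})

# Left merge keeps every df1 row, in df1's order, and attaches df2's Level
# as 'Level_new' (hypothetical suffix) where UniqueID and Name both match.
merged = df1.merge(df2, on=['UniqueID', 'Name'], how='left', suffixes=('', '_new'))

# Prefer the df2 value; fall back to the df1 value when there was no match.
merged['Level'] = merged['Level_new'].fillna(merged['Level'])
updated = merged.drop(columns='Level_new')

print(updated)
```

With this data, Joe, Pete and Jessica pick up their df2 levels, while Jackson and Griffin keep their df1 levels because their UniqueID/Name pairs differ between the two frames.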

Old answer

Use merge and query to find the duplicates:

dup = pd.merge(df1, df2, on='UniqueID') \
        .query("(Name_x == Name_y) & (Level_x == Level_y)")

>>> dup
   UniqueID Name_x   Level_x Name_y   Level_y
5     23572  Katie  Advanced  Katie  Advanced
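
The question's original aim, a 'Match'/'Ignore' column on df1 based on UniqueID alone, can also be reached without aligning indexes at all. The sketch below is an alternative to the merge above rather than the answerer's method: it swaps the element-wise `==` for `Series.isin`, which tests membership by value and so does not care that the two frames have different lengths. The name `flagged` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'UniqueID': [13579, 24680, 54678, 1169780, 1195847, 23572],
    'Name': ['Joe', 'Pete', 'Jessica', 'Jackson', 'Griffin', 'Katie'],
    'Level': ['Beginner', 'Beginner', 'Intermediate', 'Advanced', 'Intermediate', 'Advanced'],
})
df2 = pd.DataFrame({
    'UniqueID': [88922, 13579, 24680, 54678, 1169780, 1195847, 23572, 54895, 478952, 45921],
    'Name': ['Zain', 'Joe', 'Pete', 'Jessica', 'Griffin', 'Jackson', 'Katie', 'Gaby', 'Haley', 'Caden'],
    'Level': ['Beginner', 'Intermediate', 'Intermediate', 'Advanced', 'Intermediate',
              'Advanced', 'Advanced', 'Beginner', 'Intermediate', 'Novice'],
})

# isin() checks each df1 UniqueID against the set of df2 UniqueIDs,
# so no index alignment between the two frames is involved.
flagged = df1.copy()
flagged['UniqueMatch'] = np.where(flagged['UniqueID'].isin(df2['UniqueID']), 'Match', 'Ignore')

# Keep only the rows whose UniqueID also appears in df2.
df_match = flagged[flagged['UniqueMatch'] == 'Match']

print(df_match)
```

Note that this matches on UniqueID only, so every df1 row is flagged here, including Jackson and Griffin, whose names are attached to different IDs in df2.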

Source: this question and answer come from a similar question found on Stack Overflow: https://stackoverflow.com/questions/68200045/
