I am trying to compare two dataframes by their respective UniqueID columns. The code for the dataframes is shown below.
import numpy as np
import pandas as pd

# Define first dataframe
list1 = {'UniqueID': [13579, 24680, 54678, 1169780, 1195847, 23572],
'Name': ['Joe', 'Pete', 'Jessica', 'Jackson', 'Griffin', 'Katie'],
'Level': ['Beginner', 'Beginner', 'Intermediate', 'Advanced', 'Intermediate', 'Advanced']}
df1 = pd.DataFrame(list1, columns=['UniqueID','Name','Level'])
# Define second dataframe
list2 = {'UniqueID': (88922,13579, 24680, 54678, 1169780, 1195847, 23572, 54895, 478952, 45921),
'Name': ('Zain','Joe', 'Pete', 'Jessica','Griffin','Jackson','Katie', 'Gaby', 'Haley', 'Caden'),
'Level': ('Beginner', 'Intermediate', 'Intermediate', 'Advanced', 'Intermediate','Advanced','Advanced',
'Beginner', 'Intermediate', 'Novice')}
df2 = pd.DataFrame(list2, columns=['UniqueID','Name','Level'])
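As you can see above, the two dataframes have indexes of different lengths. This is what causes my problem. My process for finding duplicates is as follows.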
# Define new column which displays Match iff the UniqueID of the first dataframe is equal to that of the second
df1['UniqueMatch'] = np.where(df1.UniqueID == df2.UniqueID, 'Match','Ignore') #Create
# Simplify the list to only display rows that are duplicates
df_match = df1[df1['UniqueMatch'] =='Match']
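Whenever I try to find where the UniqueIDs of the two dataframes are equal, I get an error: "ValueError: Can only compare identically-labeled Series objects". As I understand it, this means the process I am using only works when the indexes of the two dataframes are equal. I think there must be a way around this; if not, how can you compare dataframes of different sizes?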
Best answer
Update based on your comment:
After I find the duplicated, I would then like to iterate through each cell of level, and update df1 from the updated level listed in df2. For example, Joe goes from beginner to intermediate from df1 to df2. I would like to auto update those instances.
Concatenate the two dataframes and drop the duplicates, keeping the last occurrence (the one from df2):
df3 = pd.concat([df1, df2], ignore_index=True) \
.drop_duplicates(['UniqueID', 'Name'], keep='last')
>>> df3
UniqueID Name Level
3 1169780 Jackson Advanced
4 1195847 Griffin Intermediate
6 88922 Zain Beginner
7 13579 Joe Intermediate # Joe is now Intermediate
8 24680 Pete Intermediate # Pete is now Intermediate
9 54678 Jessica Advanced # Jessica is now Advanced
10 1169780 Griffin Intermediate
11 1195847 Jackson Advanced
12 23572 Katie Advanced
13 54895 Gaby Beginner
14 478952 Haley Intermediate
15 45921 Caden Novice
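If the goal is to keep only the rows of df1 and refresh their Level from df2, one possible approach is a left merge on the key columns. The following is a minimal sketch, not part of the original answer: it assumes (UniqueID, Name) pairs are unique in df2 and that df1 has the default 0..n-1 index, and the names new_levels and updated are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'UniqueID': [13579, 24680, 54678, 1169780, 1195847, 23572],
    'Name': ['Joe', 'Pete', 'Jessica', 'Jackson', 'Griffin', 'Katie'],
    'Level': ['Beginner', 'Beginner', 'Intermediate', 'Advanced', 'Intermediate', 'Advanced'],
})
df2 = pd.DataFrame({
    'UniqueID': [88922, 13579, 24680, 54678, 1169780, 1195847, 23572, 54895, 478952, 45921],
    'Name': ['Zain', 'Joe', 'Pete', 'Jessica', 'Griffin', 'Jackson', 'Katie', 'Gaby', 'Haley', 'Caden'],
    'Level': ['Beginner', 'Intermediate', 'Intermediate', 'Advanced', 'Intermediate', 'Advanced',
              'Advanced', 'Beginner', 'Intermediate', 'Novice'],
})

# Left merge keeps every row of df1 in its original order; rows with no
# (UniqueID, Name) match in df2 get NaN in the Level_new column.
new_levels = df1.merge(df2, on=['UniqueID', 'Name'], how='left',
                       suffixes=('', '_new'))['Level_new']

# Assumption: df1 has a default RangeIndex, so fillna aligns row by row.
updated = df1.copy()
updated['Level'] = new_levels.fillna(df1['Level']).to_numpy()

print(updated)
```

Here Joe and Pete move to Intermediate and Jessica to Advanced, while Jackson and Griffin keep their df1 levels because their UniqueID/Name pairs do not match in df2.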
Old answer
Use merge and query to find the duplicates:
dup = pd.merge(df1, df2, on='UniqueID') \
.query("(Name_x == Name_y) & (Level_x == Level_y)")
>>> dup
UniqueID Name_x Level_x Name_y Level_y
5 23572 Katie Advanced Katie Advanced
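As for the ValueError in the question: == between two Series compares them label by label, so it needs identically labeled objects. If only the UniqueID match matters, a membership test with isin may be enough, since it compares by value and ignores the index. This is a sketch under that assumption, not part of the original answer; the names mask, df1_flagged and only_in_df2 are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'UniqueID': [13579, 24680, 54678, 1169780, 1195847, 23572],
    'Name': ['Joe', 'Pete', 'Jessica', 'Jackson', 'Griffin', 'Katie'],
    'Level': ['Beginner', 'Beginner', 'Intermediate', 'Advanced', 'Intermediate', 'Advanced'],
})
df2 = pd.DataFrame({
    'UniqueID': [88922, 13579, 24680, 54678, 1169780, 1195847, 23572, 54895, 478952, 45921],
    'Name': ['Zain', 'Joe', 'Pete', 'Jessica', 'Griffin', 'Jackson', 'Katie', 'Gaby', 'Haley', 'Caden'],
    'Level': ['Beginner', 'Intermediate', 'Intermediate', 'Advanced', 'Intermediate', 'Advanced',
              'Advanced', 'Beginner', 'Intermediate', 'Novice'],
})

# isin tests membership by value, so the frames may differ in length and index
mask = df1['UniqueID'].isin(df2['UniqueID'])

# Same 'Match' / 'Ignore' flag as in the question, without the ValueError
df1_flagged = df1.assign(UniqueMatch=np.where(mask, 'Match', 'Ignore'))
df_match = df1_flagged[mask]

# Rows of df2 whose UniqueID does not appear in df1
only_in_df2 = df2[~df2['UniqueID'].isin(df1['UniqueID'])]

print(df_match)
print(only_in_df2)
```

Unlike the merge and query version, this flags a row as soon as its UniqueID exists in the other frame, regardless of whether Name and Level agree.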
Source: Stack Overflow, "python - How to compare two dataframes with different indexes and print out the duplicated rows?": https://stackoverflow.com/questions/68200045/