python - pandas drop_duplicates using a comparison function

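Is it possible to use pandas.drop_duplicates with a comparison operator that compares two objects in a particular column to identify duplicates? If not, what are the alternatives?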

Here is an example where this could be used:

I have a pandas DataFrame whose values in a particular column are lists, and I want to drop duplicates based on column A:

import pandas as pd

df = pd.DataFrame( {'A': [[1,2],[2,3],[1,2]]} )
print(df)

gives me

        A
0  [1, 2]
1  [2, 3]
2  [1, 2]

df.drop_duplicates('A')

gives me a TypeError:

[...]
TypeError: type object argument after * must be a sequence, not itertools.imap

However, the result I want is:

        A
0  [1, 2]
1  [2, 3]
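A minimal sketch of one workaround for the plain-equality case (not part of the original question or answer): drop_duplicates has to hash each cell, and Python lists are unhashable, which is why the call fails (recent pandas versions report the error as `TypeError: unhashable type: 'list'`). Converting each list to a tuple gives a hashable key that compares element by element, so it reproduces the `x==y` comparison. It does not help with arbitrary comparison functions.

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [2, 3], [1, 2]]})

# Tuples are hashable and compare element by element, like the lists they came from.
keys = df['A'].map(tuple)

# duplicated() marks every repeat after the first occurrence as True.
result = df[~keys.duplicated()]
print(result)
```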

Here my comparison function would be:

def cmp(x,y):
    return x==y

but in principle it could be something else, for example:

def cmp(x,y):
    return x==y and len(x)>1

How can I efficiently drop duplicates based on a comparison function?
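pandas has no parameter for passing a custom comparison to drop_duplicates, so one option is a plain Python loop. The sketch below uses a hypothetical helper, `drop_duplicates_by` (not a pandas API), that keeps a row unless the comparison function returns True against some value already kept. It makes O(n^2) calls to the function, so it is only practical for moderate sizes, and for a non-transitive comparison the result depends on row order.

```python
import pandas as pd


def drop_duplicates_by(df, column, same):
    """Keep the first row of every group that `same` judges equal.

    Hypothetical helper, not a pandas API. Each value is compared with the
    values already kept, so this costs O(n^2) calls to `same`.
    """
    kept = []        # values of the rows kept so far
    keep_mask = []   # one boolean per row of df
    for value in df[column]:
        is_dup = any(same(value, seen) for seen in kept)
        if not is_dup:
            kept.append(value)
        keep_mask.append(not is_dup)
    return df.loc[keep_mask]


def cmp(x, y):
    # The second comparison function from the question.
    return x == y and len(x) > 1


df = pd.DataFrame({'A': [[1], [1], [1, 2], [2, 3], [1, 2]]})
result = drop_duplicates_by(df, 'A', cmp)
print(result)
```

With this `cmp`, the two `[1]` rows both survive because single-element lists never count as equal, while the second `[1, 2]` is dropped.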

More importantly, what should I do if I have several columns, each of which should be compared with a different comparison function?
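The same loop extends to several columns. In this sketch (again a hypothetical helper, `drop_duplicates_multi`, not a pandas API) `comparators` maps each column name to its own two-argument function, and a row counts as a duplicate only if every listed column matches an already kept row. The column `B` and the 0.1 tolerance are made-up illustration data.

```python
import pandas as pd


def drop_duplicates_multi(df, comparators):
    """Drop rows that match an earlier kept row in every listed column.

    Hypothetical helper, not a pandas API. `comparators` maps a column name
    to a function returning True when two values count as equal; columns
    not listed are ignored. Cost is O(n^2) row comparisons.
    """
    columns = list(comparators)
    kept = []        # dicts of column -> value for the rows kept so far
    keep_mask = []
    for record in df[columns].to_dict('records'):
        is_dup = any(
            all(comparators[c](record[c], seen[c]) for c in columns)
            for seen in kept
        )
        if not is_dup:
            kept.append(record)
        keep_mask.append(not is_dup)
    return df.loc[keep_mask]


df = pd.DataFrame({
    'A': [[1, 2], [2, 3], [1, 2]],
    'B': [1.0, 2.0, 1.05],
})
comparators = {
    'A': lambda x, y: x == y,              # exact list equality
    'B': lambda x, y: abs(x - y) < 0.1,    # numeric tolerance (illustrative)
}
result = drop_duplicates_multi(df, comparators)
print(result)
```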

Best answer

Option 1

df[~pd.DataFrame(df.A.values.tolist()).duplicated()]
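Option 1 expands each list into its own row of a new DataFrame (one column per list position) and calls duplicated() on those rows, so it covers plain equality rather than a custom comparison. The new frame gets a default 0..n-1 index, so the mask lines up with `df` only if `df` also has a default index; otherwise pandas aligns the boolean Series on labels and may raise an error. A small variation, an assumption of this edit rather than part of the answer, passes `index=df.index`:

```python
import pandas as pd

# Non-default index labels, to illustrate the alignment issue.
df = pd.DataFrame({'A': [[1, 2], [2, 3], [1, 2]]}, index=[10, 20, 30])

# One column per list position; reuse df's index so the mask aligns with df.
expanded = pd.DataFrame(df['A'].tolist(), index=df.index)
result = df[~expanded.duplicated()]
print(result)
```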

Option 2

df[~df.A.apply(pd.Series).duplicated()]

The original question, python - pandas drop_duplicates using a comparison function, is on Stack Overflow: https://stackoverflow.com/questions/39506438/
