python - Conditionally extract Pandas rows based on another Pandas dataframe

Tags: python pandas indexing dataframe conditional-statements

I have two dataframes:

df1:

col1    col2
1       2
1       3
2       4

df2:

col1
2
3

I want to extract all rows of df1 where the value in df1's col2 does not appear in df2's col1. In this case that would be:

col1    col2
2       4

My first attempt was:

df1[df1['col2'] not in df2['col1']]

But it returned:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

Then I tried:

df1[df1['col2'] not in df2['col1'].tolist]

But it returned:

TypeError: argument of type 'instancemethod' is not iterable

Best answer

You can use isin to build a boolean mask, and ~ to invert it:

print (df1['col2'].isin(df2['col1']))
0     True
1     True
2    False
Name: col2, dtype: bool

print (~df1['col2'].isin(df2['col1']))
0    False
1    False
2     True
Name: col2, dtype: bool

print (df1[~df1['col2'].isin(df2['col1'])])
   col1  col2
2     2     4
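
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the sample data in the question; the variable names mask and result are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed dtypes: plain integers)
df1 = pd.DataFrame({"col1": [1, 1, 2], "col2": [2, 3, 4]})
df2 = pd.DataFrame({"col1": [2, 3]})

# Element-wise membership test: True where df1.col2 appears in df2.col1
mask = df1["col2"].isin(df2["col1"])

# ~ flips the mask, so only rows without a match in df2.col1 are kept
result = df1[~mask]
print(result)
```

Unlike Python's `not in`, which tries to evaluate the whole Series as a single value, isin is evaluated per element and returns a Series that can be used directly for boolean indexing.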

Timings:

In [8]: %timeit (df1.query('col2 not in @df2.col1'))
1000 loops, best of 3: 1.57 ms per loop

In [9]: %timeit (df1[~df1['col2'].isin(df2['col1'])])
1000 loops, best of 3: 466 µs per loop
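
The two timed expressions should select the same rows. A small sketch to check that on the question's sample data follows; the DataFrame construction and the names by_query and by_isin are assumptions for illustration. The timings above come from a three-row frame and may not carry over to larger data.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed)
df1 = pd.DataFrame({"col1": [1, 1, 2], "col2": [2, 3, 4]})
df2 = pd.DataFrame({"col1": [2, 3]})

# query() variant: @ refers to a variable in the calling scope
by_query = df1.query("col2 not in @df2.col1")

# Boolean-mask variant from the answer
by_isin = df1[~df1["col2"].isin(df2["col1"])]

# Both approaches are expected to return the same rows
print(by_query.equals(by_isin))
```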

Source: the original question, "python - Conditionally extract Pandas rows based on another Pandas dataframe", is on Stack Overflow: https://stackoverflow.com/questions/39477328/
