python - How to exclude values from a pandas DataFrame?

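I have two DataFrames:

1) customer_id, gender 2) customer_id, ...[other fields]

The first dataset is the answer dataset (gender is the answer). I therefore want to take from the second dataset the rows whose customer_id appears in the first dataset (the ones whose gender we know) and call them "train". The remaining records should become the "test" dataset.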

Best answer

I think you need boolean indexing with an isin condition, inverting the boolean Series with ~:

import pandas as pd

df1 = pd.DataFrame({'customer_id':[1,2,3],
                   'gender':['m','f','m']})

print (df1)
   customer_id gender
0            1      m
1            2      f
2            3      m

df2 = pd.DataFrame({'customer_id':[1,7,5],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df2)
   B  C  D  E  F  customer_id
0  4  7  1  5  7            1
1  5  8  3  3  4            7
2  6  9  5  6  3            5

mask = df2.customer_id.isin(df1.customer_id)
print (mask)
0     True
1    False
2    False
Name: customer_id, dtype: bool

print (~mask)
0    False
1     True
2     True
Name: customer_id, dtype: bool

train = df2[mask]
print (train)
   B  C  D  E  F  customer_id
0  4  7  1  5  7            1

test  = df2[~mask]
print (test)
   B  C  D  E  F  customer_id
1  5  8  3  3  4            7
2  6  9  5  6  3            5
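
The same split can be wrapped in a small helper that also attaches the known gender to the training rows. This is only a sketch under a few assumptions: the function name `split_by_known_ids` and its `key` parameter are hypothetical, the sample frames are trimmed versions of the ones above, and each customer_id is assumed to appear at most once in the labels frame (duplicates there would duplicate rows in the merge).

```python
import pandas as pd


def split_by_known_ids(features, labels, key="customer_id"):
    """Split `features` into rows whose key is in `labels` and all other rows.

    `split_by_known_ids` is a hypothetical helper name, not a pandas API.
    """
    # True where the row's key also occurs in the labels frame
    mask = features[key].isin(labels[key])
    # Training rows: known ids, with the label columns joined on
    train = features[mask].merge(labels, on=key, how="left")
    # Test rows: everything else, selected with the inverted mask
    test = features[~mask].reset_index(drop=True)
    return train, test


df1 = pd.DataFrame({"customer_id": [1, 2, 3], "gender": ["m", "f", "m"]})
df2 = pd.DataFrame({"customer_id": [1, 7, 5], "B": [4, 5, 6], "C": [7, 8, 9]})

train, test = split_by_known_ids(df2, df1)
print(train)
print(test)
```

Merging the labels in at this point is a design choice rather than part of the answer above; if only the row split is needed, `features[mask]` and `features[~mask]` are enough.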

A similar question to "python - How to exclude values from a pandas DataFrame?" can be found on Stack Overflow: https://stackoverflow.com/questions/39958646/