python - Pandas filtering on a string column gives unexpected results

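I have a DataFrame with a column ClientAccount that contains a lot of test data I want to filter out.

To find the number of rows that contain test clients, I do the following: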

test_users = order_data[order_data['ClientAccount'].str.contains("DEMO|test")==True]

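This returns Name: ClientAccount, Length: 2493

Cool, 2,493 rows out of the 71,458 original rows.

So to get everything except those 2,493 rows, shouldn't I just do the opposite?

order_data = order_data[order_data['ClientAccount'].str.contains("DEMO|test")==False]

But this gives 48,046 rows. How does that make sense? What am I missing?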

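Best answer

I think the column contains NaN or None values. For those rows str.contains returns NaN, and both NaN == True and NaN == False evaluate to False, so they are dropped by both filters. You can handle this with the na parameter of str.contains. Also, to invert a boolean mask (a Series of True and False), use ~: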

mask = order_data['ClientAccount'].str.contains("DEMO|test", na=False)

test_users1 = order_data[mask]
test_users2 = order_data[~mask]
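
To see why the two filters in the question do not add up to the full row count, here is a minimal sketch on a small hypothetical frame. It assumes the column has object dtype (forced here with dtype=object); newer pandas string dtypes may treat missing values differently. With the question's numbers, 71458 - 2493 - 48046 = 20919 rows are unaccounted for, which would be consistent with that many missing values in ClientAccount, though the source does not confirm it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; dtype=object is an assumption that keeps the
# classic behaviour where str.contains returns NaN for missing values.
order_data = pd.DataFrame(
    {"ClientAccount": ["DEMO ss", "test f", "dfd", None, np.nan, "test"]},
    dtype=object,
)

# No na= argument: rows with a missing ClientAccount get NaN, not True/False.
raw = order_data["ClientAccount"].str.contains("DEMO|test")

matched = order_data[raw == True]      # NaN == True  -> False, row dropped
unmatched = order_data[raw == False]   # NaN == False -> False, row dropped too
missing = order_data[order_data["ClientAccount"].isna()]

# The three groups together cover every row; the first two alone do not.
print(len(matched), len(unmatched), len(missing))
```

The rows in `missing` are exactly the ones that silently disappear when only the `== True` and `== False` filters are compared.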


Sample:

import numpy as np
import pandas as pd

order_data = pd.DataFrame({'ClientAccount':['DEMO ss','test f','dfd', None, np.nan, 'test']})
print (order_data)
  ClientAccount
0       DEMO ss
1        test f
2           dfd
3          None
4           NaN
5          test

mask = order_data['ClientAccount'].str.contains("DEMO|test", na=False)

test_users1 = order_data[mask]
test_users2 = order_data[~mask]

print (test_users1)
  ClientAccount
0       DEMO ss
1        test f
5          test

print (test_users2)
  ClientAccount
2           dfd
3          None
4           NaN

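Without the na parameter, the mask contains NaN values, and using it to index the DataFrame raises an error:

mask = order_data['ClientAccount'].str.contains("DEMO|test")
test_users1 = order_data[mask]

ValueError: cannot index with vector containing NA / NaN values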
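
Before filtering, it can help to count the missing values and to confirm that the mask and its inverse really split the frame into two complementary parts. The following is a rough sketch on the same kind of hypothetical sample data as above, again assuming an object-dtype column; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two missing account names.
order_data = pd.DataFrame(
    {"ClientAccount": ["DEMO ss", "test f", "dfd", None, np.nan, "test"]},
    dtype=object,
)

col = order_data["ClientAccount"]

# How many rows would get NaN from str.contains without na=...?
n_missing = int(col.isna().sum())

# na=False treats missing account names as "not a test account",
# so the mask is purely boolean and can be inverted safely with ~.
mask = col.str.contains("DEMO|test", na=False)

test_users = order_data[mask]
real_users = order_data[~mask]

# Sanity check: the two parts together cover every row.
assert len(test_users) + len(real_users) == len(order_data)
print(n_missing, len(test_users), len(real_users))
```

Whether missing accounts should count as test data is a choice: passing na=True instead would place those rows in `test_users`.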

Regarding python - Pandas filtering on a string column gives unexpected results, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45916652/
