Python Pandas - Drop multiple values based on a list

I am trying to drop rows from a data frame whose values fuzzy-match items in a list.

I have a data frame (test_df) that looks like this:

   id          email         created_at      
0  1   son@mail_a.com   2017-01-21 18:19:00  
1  2   boy@mail_b.com   2017-01-22 01:19:00  
2  3  girl@mail_c.com   2017-01-22 01:19:00

I have a list of a few hundred email domains, which I read from a txt file that looks like this:

mail_a.com
mail_d.com
mail_e.com

I am trying to drop every row of the data frame whose email contains a matching domain, using:

# Read one domain per line from the text file
with open('file.txt', 'r') as email_domains:
    to_drop = email_domains.read().splitlines()

dropped_df = test_df[~test_df['email'].isin(to_drop)]
print(dropped_df)

So the result should look like this:

   id          email         created_at       
0  2   boy@mail_b.com   2017-01-22 01:19:00  
1  3  girl@mail_c.com   2017-01-22 01:19:00

But the first row, the one with "son@mail_a.com", is not dropped. Any suggestions?

Best answer

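isin() compares whole values, so a full address such as son@mail_a.com never equals a bare domain such as mail_a.com. Parsing the domain out of an email address is easy, though, so we can first extract it with .str.split('@') and then use the isin() method: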

In [12]: df[~df.email.str.split('@').str[1].isin(domains.domain)]
Out[12]:
   id            email           created_at
1   2   boy@mail_b.com  2017-01-22 01:19:00
2   3  girl@mail_c.com  2017-01-22 01:19:00

where:

In [13]: domains
Out[13]:
       domain
0  mail_a.com
1  mail_d.com
2  mail_e.com
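
To connect the answer back to the variables in the question, here is a minimal end-to-end sketch. It assumes the domains are already in a plain Python list (to_drop stands in for the lines read from file.txt) and rebuilds test_df by hand from the sample shown above. The names full_match and email_domain are introduced here for illustration only, and rsplit with n=1 is one possible way to take the text after the last '@', not necessarily what the answer's author had in mind.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real test_df)
test_df = pd.DataFrame({
    "id": [1, 2, 3],
    "email": ["son@mail_a.com", "boy@mail_b.com", "girl@mail_c.com"],
    "created_at": pd.to_datetime([
        "2017-01-21 18:19:00",
        "2017-01-22 01:19:00",
        "2017-01-22 01:19:00",
    ]),
})

# Stand-in for the domains read from file.txt, one per line
to_drop = ["mail_a.com", "mail_d.com", "mail_e.com"]

# isin() tests whole values, so comparing full addresses to bare domains
# matches nothing; this is why the original attempt dropped no rows
full_match = test_df["email"].isin(to_drop)

# Keep only the text after the last '@' and test that against the list
email_domain = test_df["email"].str.rsplit("@", n=1).str[-1]
dropped_df = test_df[~email_domain.isin(to_drop)].reset_index(drop=True)

print(dropped_df)
```

The reset_index(drop=True) call renumbers the remaining rows from 0, which matches the expected output in the question; leave it out to keep the original index labels, as in the answer's output.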

This question, "Python Pandas - Drop multiple values based on a list", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/43596020/
