python - Filtering a DataFrame for null values in a list of columns

I have a df like this:

ID,A,B,C,D,E,F,G
1,123,30,3G,1,123,30,3G
2,456,40,4G,NaN,NaN,NaN,4G
3,789,35,5G,NaN,NaN,NaN,NaN

I also have a list containing a subset of the df's header names, as follows:

header_list = ["D","E","F","G"]

Now I want to get the records from df in which every column named in header_list is null.

Expected output:

ID,A,B,C,D,E,F,G
3,789,35,5G,NaN,NaN,NaN,NaN

I tried new_df = df[df[header_list].isnull()], but this throws an error: ValueError: Boolean array expected for the condition, not float64

I know I can do something like this:

new_df = df[(df['D'].isnull()) & (df['E'].isnull()) & (df['F'].isnull()) & (df['G'].isnull())]

But I don't want to hard-code the columns like this. Is there a better way?

Best answer

You can filter the rows with:

df[df[header_list].isnull().all(axis=1)]

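Here we check, for each row, whether all (.all()) of the values in the header_list columns are null (.isnull()).

For the given sample input, this gives the expected output: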

>>> df[df[header_list].isnull().all(axis=1)]
     A   B   C   D   E   F    G
3  789  35  5G NaN NaN NaN  NaN
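To reproduce this end to end, here is a minimal sketch that rebuilds the sample frame from the CSV text in the question. It assumes ID is used as the index (which matches the output shown above) and relies on read_csv treating the literal string NaN as a missing value by default. The names csv_text, mask, and new_df are illustrative only.

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = """ID,A,B,C,D,E,F,G
1,123,30,3G,1,123,30,3G
2,456,40,4G,NaN,NaN,NaN,4G
3,789,35,5G,NaN,NaN,NaN,NaN
"""

# Assumption: ID is the index, as in the printed output of the answer.
df = pd.read_csv(io.StringIO(csv_text), index_col="ID")

header_list = ["D", "E", "F", "G"]

# One boolean per row: True only if every listed column is null in that row.
mask = df[header_list].isnull().all(axis=1)

# Boolean indexing with a Series keeps the rows where the mask is True.
new_df = df[mask]
print(new_df)
```

Because the mask is a one-dimensional boolean Series rather than a two-dimensional boolean DataFrame, it selects whole rows, which is what the original attempt was missing.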

.all(axis=1) [pandas-doc] returns True for a row if all of the columns in that row are True, and False otherwise. So for the given sample input, we obtain:

>>> df[header_list]
     D      E     F    G
1  1.0  123.0  30.0   3G
2  NaN    NaN   NaN   4G
3  NaN    NaN   NaN  NaN
>>> df[header_list].isnull()
       D      E      F      G
1  False  False  False  False
2   True   True   True  False
3   True   True   True   True
>>> df[header_list].isnull().all(axis=1)
1    False
2    False
3     True
dtype: bool
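As a sanity check, the sketch below compares the .all(axis=1) mask with the hard-coded chain of & conditions from the question, generalized over header_list with functools.reduce, and contrasts it with .any(axis=1), which would also keep rows that are only partly null. The frame here is a hand-built stand-in for the relevant columns of the sample data, and the variable names are illustrative, not part of the original answer.

```python
import operator
from functools import reduce

import numpy as np
import pandas as pd

# Hand-built stand-in for the relevant columns of the sample data.
df = pd.DataFrame(
    {
        "A": [123, 456, 789],
        "D": [1.0, np.nan, np.nan],
        "E": [123.0, np.nan, np.nan],
        "F": [30.0, np.nan, np.nan],
        "G": ["3G", "4G", np.nan],
    },
    index=[1, 2, 3],
)
header_list = ["D", "E", "F", "G"]

# Row-wise "all columns are null" mask, as in the answer.
all_null = df[header_list].isnull().all(axis=1)

# The question's hard-coded chain, built from the list instead of by hand.
chained = reduce(operator.and_, (df[col].isnull() for col in header_list))

# "At least one column is null" is a weaker condition.
any_null = df[header_list].isnull().any(axis=1)

print(all_null.tolist())  # [False, False, True]
print(chained.tolist())   # [False, False, True]
print(any_null.tolist())  # [False, True, True]
```

Both formulations select the same rows here; .all(axis=1) is simply shorter and does not need the column names spelled out one by one.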

Regarding "python - Filtering a DataFrame for null values in a list of columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57240053/
