python - Drop NaNs only if values are present for the corresponding Ids in pandas

I have this DataFrame:

Id,ProductId,Product
1,100,a
1,100,x
1,100,NaN
2,150,NaN
3,150,NaN
4,100,a
4,100,x
4,100,NaN
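
To reproduce the problem, the sample can be loaded roughly as follows. This is only a sketch: it assumes the data arrives as CSV text, uses `io.StringIO` as a stand-in for a real file, and picks the name `df` to match the code in the answer below.

```python
import io

import pandas as pd

# Sample rows from the question; read_csv parses the literal "NaN" as a missing value.
csv_text = """Id,ProductId,Product
1,100,a
1,100,x
1,100,NaN
2,150,NaN
3,150,NaN
4,100,a
4,100,x
4,100,NaN
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
```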

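I want to drop some of the rows that contain NaN and keep others. The criterion is this: drop a NaN row only if its Id already has a value in the Product column. For example, Id 1 already has values in the Product column and also has a NaN row, so I want to drop that row. For Id 2, the Product column contains only NaN, so I don't want to drop it. Likewise, Id 3 has only NaN in the Product column, so I want to keep it too.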

The final output should look like this:

Id,ProductId,Product
1,100,a
1,100,x
2,150,NaN
3,150,NaN
4,100,a
4,100,x

Best answer

Avoid groupby when an alternative exists, because it is slow.

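# Ids that have at least one non-missing Product
vals = df.loc[df['Product'].notnull(), 'Id'].unique()
# Drop rows whose Product is NaN but whose Id appears in vals
df = df[~(df['Id'].isin(vals) & df['Product'].isnull())]
print(df)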
   Id  ProductId Product
0   1        100       a
1   1        100       x
3   2        150     NaN
4   3        150     NaN
5   4        100       a
6   4        100       x
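
For comparison, the groupby route that the note above advises against might look like the sketch below. It is not part of the original answer; the names `has_value`, `via_groupby` and `via_isin` are made up for illustration, and the DataFrame is rebuilt by hand so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 1, 1, 2, 3, 4, 4, 4],
    'ProductId': [100, 100, 100, 150, 150, 100, 100, 100],
    'Product': ['a', 'x', np.nan, np.nan, np.nan, 'a', 'x', np.nan],
})

# True for every row whose Id has at least one non-missing Product
has_value = df['Product'].notna().groupby(df['Id']).transform('any')

# Keep a row unless it is NaN inside a group that already has a value
via_groupby = df[~(has_value & df['Product'].isna())]

# The isin-based filter from the answer, applied to the same data
vals = df.loc[df['Product'].notna(), 'Id'].unique()
via_isin = df[~(df['Id'].isin(vals) & df['Product'].isna())]

# Both approaches should select the same rows
print(via_groupby.equals(via_isin))
```

Both filters express the same rule; the isin version simply avoids building per-group results.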

Explanation (each step below is shown on the original, unfiltered df):

First, get all Ids that have at least one non-missing value:

print(df.loc[df['Product'].notnull(), 'Id'].unique())
[1 4]

Then flag the rows in those groups that have a missing value:

print(df['Id'].isin(vals) & df['Product'].isnull())
0    False
1    False
2     True
3    False
4    False
5    False
6    False
7     True
dtype: bool

Invert the boolean mask:

print(~(df['Id'].isin(vals) & df['Product'].isnull()))
0     True
1     True
2    False
3     True
4     True
5     True
6     True
7    False
dtype: bool

Finally, filter with boolean indexing:

print(df[~(df['Id'].isin(vals) & df['Product'].isnull())])
   Id  ProductId Product
0   1        100       a
1   1        100       x
3   2        150     NaN
4   3        150     NaN
5   4        100       a
6   4        100       x
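
The four steps above can be folded into a small reusable function. This is a sketch, not code from the original answer: the function name `drop_redundant_nans` and its parameters `frame`, `key` and `col` are hypothetical, and the sample DataFrame is rebuilt by hand so the snippet is self-contained.

```python
import numpy as np
import pandas as pd


def drop_redundant_nans(frame, key, col):
    """Drop rows where `col` is NaN if the same `key` has a non-NaN `col` elsewhere."""
    # Step 1: keys that have at least one non-missing value in `col`
    keys_with_value = frame.loc[frame[col].notna(), key].unique()
    # Step 2: rows that are NaN although their key already has a value
    redundant = frame[key].isin(keys_with_value) & frame[col].isna()
    # Steps 3 and 4: invert the mask and filter with boolean indexing
    return frame[~redundant]


df = pd.DataFrame({
    'Id': [1, 1, 1, 2, 3, 4, 4, 4],
    'ProductId': [100, 100, 100, 150, 150, 100, 100, 100],
    'Product': ['a', 'x', np.nan, np.nan, np.nan, 'a', 'x', np.nan],
})

result = drop_redundant_nans(df, key='Id', col='Product')
print(result)
```

Returning a new frame instead of reassigning `df` keeps the original data available for inspecting the intermediate masks.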

Regarding "python - Drop NaNs only if values are present for the corresponding Ids in pandas", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53281147/
