python - How to filter consecutive data rows between NaN rows in a pandas DataFrame?

Tags: python pandas dataframe filter

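I have a DataFrame like the one below. It contains runs of one or more consecutive rows in which y_l is filled and y_h is NaN, or vice versa. Whenever a run between NaNs has more than one consecutive filled row, we want to keep only the row with the lowest y_l or the highest y_h. For example, in the df below, rows 1 to 3 form a run of y_l values, so we keep only the middle row (y_l = 95) and discard the other two. What is a smart way to achieve this?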

import numpy as np
import pandas as pd

df = pd.DataFrame({'y_l': [np.nan, 97, 95, 98, np.nan], 'y_h': [90, np.nan, np.nan, np.nan, 95]}, columns=['y_l', 'y_h'])

>>> df

    y_l   y_h
0   NaN  90.0
1  97.0   NaN
2  95.0   NaN
3  98.0   NaN
4   NaN  95.0

Expected result:

    y_l   y_h
0   NaN  90.0
1  95.0   NaN
2   NaN  95.0

Best answer

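You need to create a new column or Series that labels each consecutive run, then aggregate per run with groupby and agg, and finally restore the column order with reindex: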

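# True where y_l is missing, i.e. the row belongs to a y_h run
a = df['y_l'].isnull()
# A new run starts wherever the flag differs from the previous row;
# the cumulative sum turns those change points into run labels
b = a.ne(a.shift()).cumsum()
# Per run: lowest y_l and highest y_h (NaN where the run has no values)
df = (df.groupby(b)
        .agg({'y_l': 'min', 'y_h': 'max'})
        .reset_index(drop=True)
        .reindex(columns=['y_l', 'y_h']))
print(df)
    y_l   y_h
0   NaN  90.0
1  95.0   NaN
2   NaN  95.0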
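The groupby approach above aggregates each column separately, so the result gets a fresh index and any other columns are lost. If the original rows themselves should be kept, one possible variant is to look up the row label of the extreme value in each run and select those rows. This is only a sketch, not the answer's method: the function name keep_run_extremes and its parameters low and high are made up for illustration, and it assumes every row has exactly one of the two columns filled.

```python
import numpy as np
import pandas as pd


def keep_run_extremes(df, low="y_l", high="y_h"):
    """Keep one original row per consecutive run.

    Hypothetical helper: assumes each row has exactly one of
    `low` / `high` filled and the other NaN.
    """
    is_high_run = df[low].isna()
    # Label runs: the label increases whenever the flag changes
    run_id = is_high_run.ne(is_high_run.shift()).cumsum().rename("run_id")

    keep = []
    for _, run in df.groupby(run_id):
        if run[low].notna().any():
            keep.append(run[low].idxmin())   # row label of the lowest `low`
        else:
            keep.append(run[high].idxmax())  # row label of the highest `high`
    return df.loc[keep]


df = pd.DataFrame({"y_l": [np.nan, 97, 95, 98, np.nan],
                   "y_h": [90, np.nan, np.nan, np.nan, 95]})
result = keep_run_extremes(df)
print(result)
```

With the sample data this selects the rows labelled 0, 2 and 4, so the original index is preserved rather than renumbered.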

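Details (b assigns the same label to every row of a consecutive run; the integer dtype may show as int64 depending on the platform):

print(b)
0    1
1    2
2    2
3    2
4    3
Name: y_l, dtype: int32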
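To see why b comes out this way, it can help to break the one-liner a.ne(a.shift()).cumsum() into its intermediate steps. The following is a minimal sketch using the question's sample data; the names changed and steps are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"y_l": [np.nan, 97, 95, 98, np.nan],
                   "y_h": [90, np.nan, np.nan, np.nan, 95]})

a = df["y_l"].isnull()      # [True, False, False, False, True]
shifted = a.shift()         # previous row's flag; the first row has none (NaN)
changed = a.ne(shifted)     # [True, True, False, False, True]: a run starts here
b = changed.cumsum()        # [1, 2, 2, 2, 3]: number of run starts seen so far

# Side-by-side view of every step
steps = pd.DataFrame({"a": a, "shifted": shifted, "changed": changed, "b": b})
print(steps)
```

The first row always counts as a change because it is compared against a missing value, which is why the labels start at 1.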

Regarding "python - How to filter consecutive data rows between NaN rows in a pandas DataFrame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46876504/
