python - Set an entire group to NaN if it contains a single NaN, then combine columns

I have a df:

a  b  c    d
0  1  nan  1
0  2  2    nan     
0  2  3    4
1  3  1    nan
1  1  nan  3
1  1  2    3
1  1  2    4

I need to group by a and b, and then, if c or d contains one or more NaN values within a group, I want the whole group to be NaN in that particular column:

a  b  c    d
0  1  nan  1
0  2  2    nan     
0  2  3    nan
1  3  1    nan
1  1  nan  3
1  1  nan  3
1  1  nan  4

Then combine c and d into a new column e, so that no NaN values remain:

a  b  c    d    e
0  1  nan  1    1
0  2  2    nan  2   
0  2  3    nan  3
1  3  1    nan  1
1  1  nan  3    3
1  1  nan  3    3
1  1  nan  4    4

Best answer

You need to check whether each group contains a NaN, set the appropriate value (NaN or the existing values), and then combine the columns with combine_first().

from io import StringIO
import pandas as pd
import numpy as np
df = pd.read_csv(StringIO("""
a b c d
0 1 nan 1
0 2 2 nan
0 2 3 4
1 3 1 nan
1 1 nan 3
1 1 2 3
1 1 2 4
"""), sep=' ')

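# For each column, blank out every group (keyed by a and b) that contains a NaN.
for col in ['c', 'd']:
    # A scalar np.nan is broadcast to the whole group; otherwise the group is kept as-is.
    df[col] = df.groupby(['a', 'b'])[col].transform(lambda x: np.nan if x.isna().any() else x)

# Take c where it is present and fall back to d where c is NaN.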

df['e'] = df['c'].combine_first(df['d'])
df
    a   b   c   d   e
0   0   1   NaN 1.0 1.0
1   0   2   2.0 NaN 2.0
2   0   2   3.0 NaN 3.0
3   1   3   1.0 NaN 1.0
4   1   1   NaN 3.0 3.0
5   1   1   NaN 3.0 3.0
6   1   1   NaN 4.0 4.0
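
The per-group lambda above can also be written without a Python-level function. The following is a minimal sketch of that alternative, not part of the original answer: it assumes the same example data (built here directly with pd.DataFrame instead of read_csv), and the name has_nan is only illustrative. It computes a boolean "this group has a NaN" flag with transform("any") and then uses Series.mask to blank out the flagged rows.

```python
import numpy as np
import pandas as pd

# Same example data as in the question, built directly for self-containment.
df = pd.DataFrame({
    "a": [0, 0, 0, 1, 1, 1, 1],
    "b": [1, 2, 2, 3, 1, 1, 1],
    "c": [np.nan, 2, 3, 1, np.nan, 2, 2],
    "d": [1, np.nan, 4, np.nan, 3, 3, 4],
})

for col in ["c", "d"]:
    # True for every row whose (a, b) group has at least one NaN in this column.
    has_nan = df[col].isna().groupby([df["a"], df["b"]]).transform("any")
    # mask() replaces the values with NaN wherever the condition is True.
    df[col] = df[col].mask(has_nan)

# Take c where available, otherwise fall back to d.
df["e"] = df["c"].combine_first(df["d"])
print(df)
```

On this data the result should match the output shown above. Which variant is preferable probably depends on the data size; the built-in "any" aggregation avoids calling a lambda once per group.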

Regarding "python - Set an entire group to NaN if it contains a single NaN, then combine columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61563551/
