python - tricky within-column logic in pandas

I have a DataFrame with three columns, t, b and h:

              t          b           h
0           NaN      False           6
1      6.023448      False          38
2     12.996233      False          46
3      2.484907      False          67
4      5.062595      False          81
5      4.624973      False          82
6      3.367296      False          38
7      3.688879      False          53
8      6.926577       True          38
9     14.972346      False          81
10    14.442651      False          78
11     3.367296      False          67
12     5.236442      False          46
13     5.298317       True           8
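
For anyone who wants to reproduce the frame above, here is a minimal sketch that rebuilds it by hand. The values are copied from the table; the variable name df is just a convenient choice.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "t": [np.nan, 6.023448, 12.996233, 2.484907, 5.062595, 4.624973, 3.367296,
          3.688879, 6.926577, 14.972346, 14.442651, 3.367296, 5.236442, 5.298317],
    # b is True only in rows 8 and 13.
    "b": [False] * 8 + [True] + [False] * 4 + [True],
    "h": [6, 38, 46, 67, 81, 82, 38, 53, 38, 81, 78, 67, 46, 8],
})
print(df)
```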

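I want to generate a new column that propagates the value of h backward from each row where b == True, but only as far back as the previous such row or the first row (scanning backward) where t > 9.5, whichever comes first. As rows 2 and 10 in the example show, the t > 9.5 row itself still receives the value. Everything else is filled with NaN. Here is an example of the output I need: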

              t          b           h       i
0           NaN      False           6     NaN
1      6.023448      False          38     NaN
2     12.996233      False          46      38
3      2.484907      False          67      38
4      5.062595      False          81      38
5      4.624973      False          82      38
6      3.367296      False          38      38
7      3.688879      False          53      38
8      6.926577       True          38      38
9     14.972346      False          81     NaN
10    14.442651      False          78       8
11     3.367296      False          67       8
12     5.236442      False          46       8
13     5.298317       True           8       8
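
To pin the rule down, here is a slow row-by-row reference sketch. It is only meant for checking a vectorized solution on small inputs, since the question explicitly wants to avoid iterating over millions of rows. The function name reference_i and the threshold parameter are hypothetical, and the behavior when a b == True row itself has t > 9.5 (the run stops right after that row) is an assumption that the example data does not exercise.

```python
import numpy as np

def reference_i(t, b, h, threshold=9.5):
    """Walk from the last row to the first, carrying the current h value."""
    out = [np.nan] * len(t)
    current = np.nan                  # value being propagated backward
    for k in range(len(t) - 1, -1, -1):
        if b[k]:
            current = h[k]            # a b == True row starts a fresh run
        out[k] = current
        if t[k] > threshold:          # NaN > threshold is False, so NaN never stops a run
            current = np.nan          # stop propagating above this row
    return out

# Sample data copied from the question.
t = [np.nan, 6.023448, 12.996233, 2.484907, 5.062595, 4.624973, 3.367296,
     3.688879, 6.926577, 14.972346, 14.442651, 3.367296, 5.236442, 5.298317]
b = [False] * 8 + [True] + [False] * 4 + [True]
h = [6, 38, 46, 67, 81, 82, 38, 53, 38, 81, 78, 67, 46, 8]

result = reference_i(t, b, h)
print(result)
```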

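I want to avoid iterating over rows because I have millions of them. I tried using where to pick out the b == True rows and then fillna with the bfill option, but I could not tell it where the fill should start. Also, this will be applied to each group of a groupby, so I need a function that adds a column to its argument and returns the whole frame: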

def get_i(x):
    x['i']=x['h'].where(x['b']==True).fillna(value=None,method='backfill').dropna()
    return x
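
The following sketch shows why a plain back-fill is not enough: it has no notion of the t > 9.5 cutoff, so rows 0, 1 and 9 also receive a value. It uses Series.bfill(), which does the same thing as fillna(method='backfill'); recent pandas releases deprecate the method argument of fillna. Only the b and h columns from the question are needed here.

```python
import pandas as pd

# b and h columns copied from the question.
df = pd.DataFrame({
    "b": [False] * 8 + [True] + [False] * 4 + [True],
    "h": [6, 38, 46, 67, 81, 82, 38, 53, 38, 81, 78, 67, 46, 8],
})

# Keep h only where b is True, then back-fill without any stopping rule.
filled = df["h"].where(df["b"]).bfill()
print(filled.tolist())
```

Every row above a b == True row gets filled, so a second step is needed to blank out the rows that lie beyond the first t > 9.5 row.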

Best answer

You can use:

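import numpy as np
import pandas as pd

# keep h where b is True, NaN elsewhere
df['i'] = np.where(df.b, df.h, np.nan)
# back-fill every NaN from the next True row
df['i'] = df.i.bfill()

# scanning each filled run backward, flag the rows that lie beyond the first t > 9.5
# (group_keys=False keeps the original index so the mask aligns with df)
a = df[::-1].groupby('i', group_keys=False)['t'].apply(lambda x: (x > 9.5).shift().cumsum()) >= 1
# reset the flagged rows to NaN
df['i'] = df.i.mask(a, np.nan)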

print (df)
            t      b   h     i
0         NaN  False   6   NaN
1    6.023448  False  38   NaN
2   12.996233  False  46  38.0
3    2.484907  False  67  38.0
4    5.062595  False  81  38.0
5    4.624973  False  82  38.0
6    3.367296  False  38  38.0
7    3.688879  False  53  38.0
8    6.926577   True  38  38.0
9   14.972346  False  81   NaN
10  14.442651  False  78   8.0
11   3.367296  False  67   8.0
12   5.236442  False  46   8.0
13   5.298317   True   8   8.0
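
Since the question asks for a function that can be applied per group, here is one possible sketch of that wrapper. It is not the answer's exact code: instead of grouping on the filled values (which would merge two runs if two b == True rows happened to carry the same h), it labels each run with a reversed cumulative sum of b. The names add_i, threshold and the group column g are hypothetical.

```python
import numpy as np
import pandas as pd

def add_i(x, threshold=9.5):
    """Return a copy of x with column 'i' added."""
    x = x.copy()
    # Run label: rows share a label with the nearest b == True row at or below them.
    run = x["b"][::-1].cumsum()[::-1]
    # Back-fill h from each b == True row (rows after the last True stay NaN).
    filled = x["h"].where(x["b"]).bfill()
    # Per run, count the t > threshold rows strictly below each row.
    hit = (x["t"] > threshold).astype(int)
    below = hit[::-1].groupby(run[::-1].to_numpy()).cumsum()[::-1] - hit
    # Keep the filled value only where no cutoff row lies below in the same run.
    x["i"] = filled.where(below == 0)
    return x

# Sample data copied from the question.
df = pd.DataFrame({
    "t": [np.nan, 6.023448, 12.996233, 2.484907, 5.062595, 4.624973, 3.367296,
          3.688879, 6.926577, 14.972346, 14.442651, 3.367296, 5.236442, 5.298317],
    "b": [False] * 8 + [True] + [False] * 4 + [True],
    "h": [6, 38, 46, 67, 81, 82, 38, 53, 38, 81, 78, 67, 46, 8],
})
print(add_i(df))

# Per-group use, with a made-up key column g holding two copies of the data.
df2 = pd.concat([df.assign(g="a"), df.assign(g="b")], ignore_index=True)
out = pd.concat([add_i(part) for _, part in df2.groupby("g")])
```

Looping over the groups and concatenating keeps the group column in the result and sidesteps differences in how groupby.apply treats the grouping column across pandas versions.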

Regarding python - tricky within-column logic in pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37676413/
